Recovering From Strange Failures/Problems (optional)

Last update: 18 Jul 2021

When writing analysis code, as with any software, it is quite common that things don’t work. Usually that means there is a problem in your code or your configuration, and that is the first thing you should look for. However, sometimes you are sure that you are doing everything correctly, and your code still behaves strangely. There are a few things you can try to recover from these stranger kinds of failures.

The first thing to do is to rerun cmake and make and see if that fixes the problem. It is quite common to forget to recompile, so that your compiled code no longer matches your source code.
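One way to notice a forgotten recompile is to compare file timestamps. The following Python sketch is only a rough heuristic and not part of the tutorial's tooling: the function names are hypothetical, and it assumes that your sources and your build live in two separate directories. Rerunning cmake and make remains the authoritative check.

```python
from pathlib import Path


def newest_mtime(root):
    """Return the latest modification time of any file under root (0.0 if none)."""
    times = [p.stat().st_mtime for p in Path(root).rglob("*") if p.is_file()]
    return max(times, default=0.0)


def build_looks_stale(source_dir, build_dir):
    """Heuristic: True if some source file is newer than everything in build_dir."""
    return newest_mtime(source_dir) > newest_mtime(build_dir)
```

Calling `build_looks_stale("source", "build")` from the directory that holds both (the directory names here are assumptions) returns `True` when a source file was edited after the last build wrote anything. A `False` result does not prove the build is up to date, so when in doubt simply rebuild.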

The next thing to try is logging out and back in again, then setting up the release as described above. Sometimes your shell settings get messed up, your Kerberos tickets have expired, etc., and a fresh login rules out all of those problems in one swift move.
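If you want to check for an expired Kerberos ticket before logging out, a small Python sketch like the one below can help. It assumes the MIT Kerberos `klist` command is installed, where `klist -s` prints nothing and exits with status 0 only if the credentials cache is readable and not expired. The function name and its `command` parameter are hypothetical and exist only for illustration.

```python
import subprocess


def has_valid_kerberos_ticket(command=("klist", "-s")):
    """Return True/False for a valid/invalid ticket cache, None if the command is missing.

    Assumption: MIT Kerberos, where `klist -s` is silent and reports the
    state of the credentials cache through its exit status only.
    """
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # The command is not installed on this machine.
        return None
    return result.returncode == 0
```

If this returns `False`, obtaining a new ticket with `kinit` or simply logging out and back in should fix that particular problem. It says nothing about broken shell settings, which is why a fresh login is still the more thorough remedy.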

Another easy thing to try is to remove your entire build directory and recreate it as described here, usually combined with logging out and back in again. Sometimes you just mess up your build directory in one way or another (e.g. by setting up the wrong release). There are other ways to recover from this, but the easiest and most robust is to recreate your entire build directory. This is also the reason why we tell you not to put any files you create into the build directory: if you did, you would lose them when deleting it.
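The delete-and-recreate step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tutorial's official procedure: both function names are hypothetical, the directory paths are whatever your own setup uses, and the release must already be set up in the shell that runs it.

```python
import shutil
import subprocess
from pathlib import Path


def recreate_build_dir(build_dir):
    """Delete build_dir with everything in it, then create it again empty."""
    build_dir = Path(build_dir)
    if build_dir.exists():
        # Irreversible: anything stored inside the build directory is lost.
        shutil.rmtree(build_dir)
    build_dir.mkdir(parents=True)
    return build_dir


def configure_and_build(build_dir, source_dir):
    """Run cmake on source_dir and then make, both from inside build_dir."""
    source = str(Path(source_dir).resolve())
    # check=True raises CalledProcessError if either step fails.
    subprocess.run(["cmake", source], cwd=build_dir, check=True)
    subprocess.run(["make"], cwd=build_dir, check=True)
```

A typical call would be `configure_and_build(recreate_build_dir("build"), "source")`, where `build` and `source` are assumed directory names. Because `shutil.rmtree` removes everything without asking, double-check the path you pass in.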

If you are not working on lxplus, you can try running on lxplus, even if only as a test. While you may occasionally encounter an lxplus node that is not working correctly, overall they still represent our reference for correctly configured ATLAS systems. If software works there but not on your institute cluster, it probably means your institute machines are not properly configured.

If none of this helps, it is probably a good idea to ask for help.