Recovering From Strange Failures/Problems

Last update: 20 Nov 2024

Tip: This section references some information that will be introduced later. Don’t worry about that for now; just keep this section in mind for future reference.

When writing analysis code (just like any other software), it is quite common that things don’t work. Usually that means there is a problem in your code or your configuration, and that is the first thing you should look for. Sometimes, however, you are sure that you are doing everything correctly and your code still behaves strangely. There are a few things you can try to recover from these more inexplicable failures.

The first thing to do is to rerun cmake and make and see if that fixes the problem. It is easy to forget to recompile after making code changes, in which case your compiled code no longer matches your source code. You can also try make clean followed by make to tidy up your build area and rebuild, which can help if things have gotten into an unexpected state.
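As a rough sketch, the two rebuild variants could be wrapped in small shell helpers like the ones below. The function names rebuild and clean_rebuild are made up for illustration, and the directory names build and ../source are assumptions; substitute the layout you actually set up.

```shell
# Hypothetical helpers. BUILD_DIR and SOURCE_DIR are assumed names;
# adjust them to match your own working area.
BUILD_DIR="${BUILD_DIR:-build}"
SOURCE_DIR="${SOURCE_DIR:-../source}"   # path as seen from inside BUILD_DIR

# Re-run the configuration step and compile, so that the compiled
# code matches the current source code again.
rebuild() {
    ( cd "$BUILD_DIR" && cmake "$SOURCE_DIR" && make )
}

# Remove previously compiled output first, then compile again.
clean_rebuild() {
    ( cd "$BUILD_DIR" && make clean && make )
}
```

Each helper runs in a subshell, so your current directory is unchanged afterwards. Run rebuild first; fall back to clean_rebuild only if the problem persists.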

Next, try logging out and back in again, and then set up the release as described above. Sometimes your shell settings are messed up, your Kerberos tickets have expired, or something similar has gone wrong, and a fresh login avoids all of those problems in one swift move. This is the shell equivalent of unplugging it and plugging it back in.
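If you suspect an expired Kerberos ticket in particular, a quick check along these lines may save you a full logout. This is only a sketch: check_kerberos is a hypothetical helper name, and it assumes an MIT-style klist, where the -s option prints nothing and reports ticket validity through the exit status.

```shell
# Hypothetical helper: report whether a valid Kerberos ticket is present.
# Assumes `klist -s` exits with status 0 for a valid ticket and non-zero
# when the credentials cache is missing or expired.
check_kerberos() {
    if klist -s; then
        echo "Kerberos ticket is valid"
    else
        echo "No valid Kerberos ticket: run kinit, or log out and back in"
        return 1
    fi
}
```

This only covers the ticket; it says nothing about the state of your shell settings, for which logging out and back in remains the simplest fix.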

Another easy thing to try is to remove your entire build directory and recreate it as described here, usually combined with logging out and back in again. Sometimes the build directory simply ends up in a bad state (e.g., because you set up the wrong release). There are other ways to recover from this, but the easiest and most robust is to recreate the entire build directory. This is also why we tell you not to put any files you create into the build directory; if you did, you would lose them when deleting it.
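A minimal sketch of wiping and recreating the build directory is shown below. The helper name recreate_build_dir and the default directory name build are assumptions for illustration; pass your real build path as the first argument.

```shell
# Hypothetical helper: delete a build directory and create it again, empty.
# The default name "build" is an assumption; pass your own path as $1.
recreate_build_dir() {
    build_dir="${1:-build}"
    rm -rf -- "$build_dir"     # removes everything inside, including any files you put there
    mkdir -p -- "$build_dir"
}
```

Afterwards the directory is empty, so you need to run cmake and make in it again, exactly as when you first set it up.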

If you are not working on lxplus, you can try running on lxplus as a test. You may occasionally encounter an lxplus node that is not working correctly, but overall these machines are still our reference for correctly configured ATLAS systems. If software works there but not on your institute cluster, it probably means your institute machines are not properly configured.

If none of this helps, it is probably a good idea to ask for help.