Shift guide for release coordinators

Last update: 15 Mar 2024

Introduction

This page contains information about the workflows and procedures a release coordinator is concerned with. It is assumed that you are familiar with the general workflow for ATLAS Offline software development as summarised in the Workflow Quick Reference. Having an overview of the tasks of the software review shifters, as well as knowing the basics of git, is also helpful.

Checking the status of the release

Before accepting new MRs, you should check the status of the latest nightly. The first priority of a release coordinator should be to make sure that the release is in good shape, and obviously existing problems should usually be fixed before adding more MRs (though as release coordinator you should always feel free to use your judgement).

How to check the nightly build status?

The status of the nightly builds can be checked in two places:

The latter contains information on ART build jobs and other detailed information for the different build steps. Information on the nightly (and CI) build infrastructure is available on this twiki page.

For each nightly build check:

  1. All unit tests succeed. If there are failures, open a Jira ticket in the relevant Jira project and assign the “CI” label (see below). Also post a comment on the relevant MR that introduced the problem (if you can identify it). This also applies in case a unit test failure gets introduced during the day.
  2. No new failures in the ART build tests.
  3. Check the ART grid tests to see which tests are running / failing. It can happen that an MR has broken some tests because they aren’t part of the CI tests (the CI is necessarily a subset of all of the possible tests).

Any problem seen in the unit tests will very likely also affect the CI jobs. To help the review shifters and other developers please mark any relevant Jira tickets with the “CI” label. This will make the issue appear on the CI Status Board.

What to do if nightly is not yet available on cvmfs?

First check on Jenkins whether the nightly has finished or is still running. If it is still running and it is already afternoon, check that it is not stuck. The console log should give you a good idea of whether the nightly is progressing (select the nightly you are interested in and then click on the “Console” icon).

If the nightly finished and the status of the build is a Blue ball - the nightly should be available on CVMFS.

If the nightly finished and the status is a Red ball - the build failed for some reason, and you need to identify the exact cause. Every nightly executes the following steps, so check them one by one to see where the problem is:

  • git clone
  • building externals
  • building the main project
  • creating the RPMs
  • preparing the RPMs and copying them to EOS – once done the RPMs should be visible under http://atlas-software-dist-eos.web.cern.ch/atlas-software-dist-eos/RPMs/nightlies/
  • installation on CVMFS (at the end of the Console log there will be a line like the following):
    The logfile is copied under: /cvmfs/atlas-nightlies.cern.ch/repo/sw/master/2018-07-15T2059/master__Athena__x86_64-slc6-gcc62-dbg__2018-07-15T2059__1531720274.ayum.log
    

    All the information can be found inside the “Console” log file for the given nightly.

Warning Note that issues with git and with copying the RPMs to EOS are IT infrastructure related issues. If there were git problems you can restart the nightly. If there were RPM to EOS copy problems check with Atlas.Release to redo the copy.
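
As a minimal sketch, the log line quoted in the installation step above can be pulled out of a saved console log with standard shell tools. The helper names are hypothetical (they are not ATLAS tools), and the reading of the “__”-separated fields in the file name as branch, project, platform and timestamp is inferred from the single example shown, not from a documented format.

```shell
# Print the CVMFS installation log path announced in a Jenkins console
# log read from stdin (hypothetical helper, not an ATLAS tool).
cvmfs_logfile() {
    sed -n 's/^.*The logfile is copied under: *//p' | tail -n 1
}

# Split the log file name into its first four "__"-separated fields,
# one per line (assumed meaning: branch, project, platform, timestamp).
nightly_fields() {
    basename "$1" .ayum.log | awk -F'__' '{ for (i = 1; i <= 4; i++) print $i }'
}

# Example using the line quoted above:
line='The logfile is copied under: /cvmfs/atlas-nightlies.cern.ch/repo/sw/master/2018-07-15T2059/master__Athena__x86_64-slc6-gcc62-dbg__2018-07-15T2059__1531720274.ayum.log'
nightly_fields "$(printf '%s\n' "$line" | cvmfs_logfile)"
```

If no such line is present in the console log, cvmfs_logfile prints nothing, which suggests that the installation step was not reached.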

Handling new merge requests

Finding open merge requests for your release

In git, different releases are represented by different branches of the atlas/athena git repository. One subtle point is that what code we build from the branch is controlled by the project specific scripts that live in the Projects directory of the branch (e.g., here for the main branch). You are, however, the release coordinator for all projects built from a branch.

Upon creation, merge requests will be tagged automatically by the CI system with a label indicating their target branch. In addition, the CI system also adds a label indicating the stage and result of the software review process. Changes approved by the software review shifters are labeled as review-approved. Therefore, you should be able to select all ready-to-be-accepted merge requests from the GitLab merge request overview page. Simply filter the list of merge requests by selecting review-approved and your release branch name from the dropdown menu for labels.
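
The label filter described above can also be expressed as a URL, which may be convenient to bookmark. This is only a sketch: the function name is hypothetical, the project path atlas/athena on gitlab.cern.ch is taken from the clone URL used elsewhere on this page, and it assumes that the branch label is spelled exactly like the branch name.

```shell
# Print the GitLab merge request list URL filtered by the review-approved
# label and a target-branch label (hypothetical helper).
approved_mr_url() {
    # $1: branch label, assumed to be identical to the branch name (e.g. main, 24.0)
    printf 'https://gitlab.cern.ch/atlas/athena/-/merge_requests?state=opened&label_name[]=review-approved&label_name[]=%s\n' "$1"
}

approved_mr_url main
```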

Full rebuilds need to be triggered after updates for which an incremental rebuild may not pick up all changes, such as an LCG version update. Such updates should be performed in the late evening, with advance notification emailed to ATLAS robot so that the CI system administrators have enough time to schedule the cleaning of the build machines' local disks (to ensure builds are from scratch).

The CI job state is depicted by a small icon on the right (e.g. green checkmark for passed, red cross for failed, blue circle for running). This status sometimes does not get updated correctly (e.g. a CI job is indicated as still running while it has already finished). Therefore, please do not rely on this icon. If in doubt, check the discussion tab of this MR for the latest CI summary comment from the ATLAS robot.

Accepting merge requests

Before accepting any merge request, please have a quick look at the description and the discussion on the GitLab merge request page. They may contain more detailed information about the nature of this merge request and possibly also its relation to other merge requests (e.g. dependencies between MRs). You should also make sure that all discussions have been resolved.

Once you are sure that you want to merge these changes into your release, you can accept the merge request by pushing the green “Merge” button.

Warning Please do not use the command line merging procedure as described on GitLab.

Because updates of the CI pipeline status have been found to be unreliable in the past, release coordinators are discouraged from using the “Merge When Pipeline Succeeds” feature.

How to roll back bad merges?

If you want to undo the changes introduced by an (already accepted) merge request, you have to go to the GitLab webpage for this merge request. There is an orange button labeled Revert. Clicking on this button opens a new dialog where you should choose the target branch to be the same as the target branch of the initial merge request. Please leave the checkbox “Open a new merge request” checked. Confirm the settings by clicking on the green Revert button. This will open a new merge request undoing the changes introduced by the faulty merge request, and the usual CI jobs and review process will start automatically.

Output-changing merge requests

As of the time of writing, the policy for output-changing merge requests (i.e. MRs that change controlled output formats, and indicated by labels, e.g. Run3-XXX-output-changed) on main is that such MRs should not be accepted without explicit approval from Reconstruction and/or Software Coordinators.

In detail:

  • RC should tag Reconstruction (Simulation) and Software Coordinators on the MR, who may tag other experts if necessary.
  • The above should agree that the change is OK and also on the order in which concurrent output-changing MRs will be accepted.
  • All decisions should be recorded on the MR. Once the changes are agreed upon by the appropriate responsible persons according to the nature of the changes (Software/Reconstruction/Simulation Coordinators) the offline-sw-review-approved label should be added by those persons.
  • RC (or someone instructed by them) will update the reference files on eos and ask the developer to update the MR accordingly.
    • Due to the tricky sequencing related to reference file updates, other developers should not touch the reference files

Tag and deploy a release after a successful build

Once you are satisfied as the release coordinator that a nightly release is good enough to be deployed to the production CVMFS server then you need to do the following steps:

  1. Copy RPMS and Install to CVMFS
  2. Publish the release
  3. Update the release candidate number
  4. Announce the release by email

These steps are described in the following sections.

Copy RPMS and Install to CVMFS

Tip See here for instructions on how to set up tunnelling using browser plugins.

  • Log in with username “jobrest”. If you do not know the password, ask Alex Undrus.

  • Click “Build with parameters” on the left hand sidebar. Fill in the details on the screen where:

    • numbered_release: 23.0.42
    • cmtconfig can be: x86_64-centos7-gcc11-opt
    • project can be: Athena, AthSimulation, AnalysisBase etc
    • Executor_Label: leave it unchanged
    • nightly_name can be: main, 24.0, 22.0-mc20, ondemand and so on
    • rel_nightly is the timestamp for your nightly, e.g. 2023-04-02T2101

  • Then click “Build”.
  • Check the console to make sure nothing has gone wrong.
  • When everything is deployed, please announce the installed release by email as indicated below.

Repeat the above steps for any other platforms you may need (e.g. the opt and dbg builds typically have slightly different timestamps).
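
Because the form is filled in by hand once per platform, a quick sanity check of the two values that are easiest to mistype can help. The sketch below uses hypothetical helper names and assumes the YYYY-MM-DDTHHMM timestamp form used in the nightly tags on this page, together with the A.B.X[.Y] release numbering.

```shell
# Return success if the argument looks like a nightly timestamp of the
# form YYYY-MM-DDTHHMM, e.g. 2023-04-02T2101 (assumed format).
is_nightly_stamp() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{4}$'
}

# Return success if the argument looks like a numbered release A.B.X[.Y].
is_numbered_release() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+(\.[0-9]+)?$'
}

is_nightly_stamp 2023-04-02T2101 && is_numbered_release 23.0.42 \
    && echo "parameters look plausible"
```

The checks only look at the shape of the values; whether the nightly actually exists still has to be confirmed on the nightly status pages.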

Publish the release

There are two ways to publish a release: manually, or using the prepare_release_notes.py script. The recommended approach is to use the script, since this automates much of the process, but please note that it requires a GitLab token to work optimally (see here for instructions on how to create one; at minimum your token needs the API ‘scope’ or permission). Both approaches are described below.

Run the following commands to publish the release (making sure to add your gitlab API token instead of YOUR_GITLAB_API_TOKEN_HERE):

lsetup git
lsetup gitlab
git clone ssh://git@gitlab.cern.ch:7999/atlas/athena.git
cd athena
./Build/AtlasBuildScripts/prepare_release_notes.py release/23.0.42 nightly/23.0/2023-04-02T2101 -t YOUR_GITLAB_API_TOKEN_HERE

Warning Always use the latest version of the script from the main branch even when building releases from other branches.

The script will ask if you want to let it build the release for you. If you choose yes (which you should!), it will ask you for the associated JIRA ticket and a short description. This description is NOT the release notes (the script will make these for you), but rather a high-level summary of the purpose of the release e.g.

This is the latest build of the Tier-0 data taking branch that fixes rare crashes seen for high pt muons.

Alert If you did NOT let the script make the release, then you will need to follow the relevant instructions in the next section.

The script will now create the release tag, make the release in gitlab and fill in the detailed release notes.

You should now skip the next section and go straight to Update the release candidate number.

Fallback: Publishing a release manually

Assuming you chose not to use the recommended approach above, you will need to do the following steps manually.

Create a new tag for the release in the main repository. Enter the Tag name, which should always be release/A.B.X[.Y], i.e., the actual numbered release version; the Create from should be the nightly build tag that is being used, which has the format nightly/BRANCH_NAME/DATE_STAMP.

Tip In case you have built multiple platforms for the same release use the nightly stamp of the primary platform (e.g. gcc11-opt). In practice this should not matter as the nightly tags for different platforms should all point to the same git commit if started around the same time.

Alert You can also create the tag locally in any clone of the repository and push it to atlas/athena. It is not recommended unless you really know what you are doing.
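
The two tag formats above are easy to get slightly wrong when typed by hand, so the following sketch builds them from their parts. The function names are hypothetical, and the code only formats strings; it does not create or push any tag.

```shell
# Build the release tag name from a release number A.B.X[.Y].
release_tag() { printf 'release/%s\n' "$1"; }

# Build the nightly tag name from a branch name and a date stamp.
nightly_tag() { printf 'nightly/%s/%s\n' "$1" "$2"; }

# Recover the date stamp from a nightly tag (everything after the last "/").
nightly_stamp() { printf '%s\n' "${1##*/}"; }

release_tag 23.0.42                 # value for the "Tag name" field
nightly_tag 23.0 2023-04-02T2101    # value for the "Create from" field
```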

Then enter an informative message about the reason for deploying this particular release that will be useful to others (do not put the full release notes into the commit message; this is done in a separate step below). An example might be the following:

Even if you are publishing the release manually, we still recommend that you generate the release notes using the prepare_release_notes.py script with the following parameters: the tag of the release you are building, the tag of the nightly used to build it. Even without a token, the script can generate the release notes for you (just a slightly less informative version).

For example:

lsetup git
lsetup gitlab
git clone ssh://git@gitlab.cern.ch:7999/atlas/athena.git
cd athena
./Build/AtlasBuildScripts/prepare_release_notes.py release/23.0.42 nightly/23.0/2023-04-02T2101 
# Release notes generated in 'release_notes.md'

Warning Always use the latest version of the script from the main branch even when building releases from other branches.

Next you have to create the release manually:

  • Navigate to the tag that you just created
  • Click on the pen symbol next to the tag (“Create release”)
  • Fill in the form:
    • Release title: same as tag name (e.g. release/23.0.42)
    • Release date: today’s date
    • Release notes: copy the release_notes.md content here

You will now need to Update the release candidate number, as described below.

Update the release candidate number

Finally, now that a nightly has been promoted to a numbered release the release candidate number needs to be updated to be the next number in the release series for this branch + project. The easiest way to do this is via the “Web IDE” editor in GitLab itself. Navigate to the correct branch and project version.txt file and select “Open in Web IDE”:

Next, navigate to all version.txt files that need updating and increment the release number. Once done, create a single commit on the release branch with all updates:

Alert All Projects (except AnalysisBase, AthGeneration and AthAnalysis) should be kept in sync. So if, for example, you update the version of Projects/Athena/ to 23.0.43, you should also update AthDataQuality, AthSimulation, DetCommon and VP1Light.

Tip Only people with push rights to the branch can do this directly. You should have these rights as a release coordinator, otherwise you need to make a merge request.

Alternatively, you can do it entirely on the command line in your local athena checkout:

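git fetch upstream
git checkout upstream/BRANCH    # e.g. main, 24.0, ... (this leaves you on a detached HEAD)
grep "" Projects/*/version.txt  # show all project versions
old=23.0.4 new=23.0.5
# -F and the escaped dots make git grep and sed treat the version as a literal string
git grep -lF "$old" Projects/*/version.txt | xargs sed -i "s/${old//./\\.}/$new/"
git diff                        # verify the changes
git commit -a -m "Bump project versions to $new"
git push upstream HEAD:BRANCH   # a detached HEAD needs an explicit target branch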
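
The “next number in the release series” is simply the current version with its last component incremented. A minimal sketch of that arithmetic follows; the function name is hypothetical and it assumes purely numeric components without leading zeros.

```shell
# Print the next release candidate number by incrementing the last
# dot-separated component, e.g. 23.0.42 -> 23.0.43.
next_release_candidate() (
    prefix="${1%.*}"    # everything before the last dot
    last="${1##*.}"     # the last component
    printf '%s.%s\n' "$prefix" "$((last + 1))"
)

next_release_candidate 23.0.42
```

The same rule covers 4-digit point releases, so 22.0.41.1 becomes 22.0.41.2.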

Announce the release by email

After the release has been built, please announce the release by sending out an email:

mailto: atlas-sw-spmb@cern.ch, hn-atlas-recoIntegration@cern.ch, atlas-dp-proc@cern.ch, atlas-trigger-operation@cern.ch, hn-atlas-releaseKitAnnounce@cern.ch
subject: Release Athena,23.0.42

Dear all,

This is to let you know that the release Athena,23.0.42 has been built from the nightly
Athena,main,2023-04-02T2101 and is in the process of being distributed to CVMFS.

The JIRA ticket can be found here:
https://its.cern.ch/jira/browse/ATLINFR-4208

The release notes are available at:
https://gitlab.cern.ch/atlas/athena/-/releases/release%252F23.0.42

Regards, myName
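
To avoid copy-and-paste slips between the subject and the body, the variable parts of the template above can be filled in by a small shell function. This is a sketch only: the function name and argument order are hypothetical, and the recipients, the release notes link and the signature still have to be added by hand.

```shell
# Print the subject and the start of the announcement body.
# Arguments: project, release, nightly name, nightly stamp, JIRA ticket.
announce_release() {
    cat <<EOF
subject: Release $1,$2

Dear all,

This is to let you know that the release $1,$2 has been built from the nightly
$1,$3,$4 and is in the process of being distributed to CVMFS.

The JIRA ticket can be found here:
https://its.cern.ch/jira/browse/$5
EOF
}

announce_release Athena 23.0.42 main 2023-04-02T2101 ATLINFR-4208
```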

Manual merge of release branches

As of writing, changes to the 24.0 release branch are manually merged (~daily) into main by the main release coordinator. To ignore certain directories and files in the merge (e.g. version.txt) the atlas_git_merge.sh script should be used. The script will fetch the remote, create a local branch, merge 24.0 (but without the final commit) and reject changes to a predefined list of files (see atlas_git_merge.sh --help):

# in your atlas/athena clone:
./Build/AtlasBuildScripts/atlas_git_merge.sh 24.0
# if no conflicts are found
git commit
# if there are conflicts, resolve them and
git merge --continue
# push branch and create MR
git push origin
./Build/AtlasBuildScripts/prepare_release_notes.py --sweep -t YOUR_GITLAB_API_TOKEN_HERE

The last step creates the MR diff (release_notes.md) which will be used as the merge request description. The script will prompt if a pre-filled Draft MR should be created in GitLab. In case you had to apply conflict resolutions, you may want to amend the appropriate lines (or remove entire lines if you rejected some commits). The script will also warn you in case a MR with the sweep:ignore label was merged and provides instructions on how to remove it. The GitLab token (which must have at least API ‘scope’ or permission) is required to decorate the MRs with the domain labels and to allow the script to create the MR on your behalf.

Warning In order to preserve the commit history, do not squash the commits when finalizing the merge request.

Useful commands for resolving conflicts:

  • git merge --abort to abort the merge
  • git reset --hard upstream/main to start over with current branch
  • git [-n] revert HASH to revert a specific commit [without creating commit]
  • git checkout --ours FILE to keep version from main (or --theirs for 24.0) followed by git add FILE to mark conflict as resolved

On-demand release build

For emergencies the following procedure can be used to launch an on-demand release build. This would mainly be used in case a patch is required to an older version of the nightly branch. The procedure is very similar to building/deploying a regular nightly build, except that the release is being built from a fixed git tag.

  1. Prepare the git branch with the required patches
    • Create a new git branch with the name of the branching point, e.g. if a patch is required for release 23.0.29:
      git fetch upstream --tags
      git checkout -b 23.0.29-patches release/23.0.29
      

      In case there is already a patch release for this branch point, simply re-use the existing branch.

    • Apply the required patches to the branch and commit. In most cases you probably want to “cherry-pick” an existing merge request.
    • Warning Update the release number in version.txt to the next 4-digit number (e.g. 23.0.29.1, 23.0.29.2, etc.). In case you have been re-using a branch in step 1, check the list of already existing tags via git tag -l 'release/23.0.29.*'. This step has to be done before building the release.
    • Perform any validation that may be required in your local branch (e.g. build the affected packages and run relevant tests)
    • Push the branch to the athena repository (not your fork):
      git push upstream 23.0.29-patches
      
    • Tag the release in git (e.g. release/23.0.29.1). Make sure to explain in the “Release notes” which changes went into this tag.
    • After the tag has been created you can compare it to the original release.
  2. Use the Ondemand Jenkins job (tunnelling is necessary from outside CERN at present).
    • Log in with username “jobrest”. If you do not know the password, ask Alex Undrus.
    • Click “Build with parameters” on the left hand sidebar. Fill in the details on the screen where:
      • Executor_Label: leave it unchanged
      • project can be: Athena, AthSimulation, AnalysisBase, etc.
      • platform: x86_64-el9-gcc13-opt
      • git_tag: release/23.0.60.1
      • compiler: gcc13 (needs to match the compiler used in the nightly)
      • cmake version: 3.27.5 (>= regular nightly, check with cmake --version)
      • cuda version: 11.4 (check version in corresponding nightly via echo $CUDAXX)
      • Then click “Build” (depending on the project this can take up to 9 hours)
  3. Once finished check the ondemand build results on NICOS and note the datestamp of the build.
  4. Deploy the release to cvmfs with the usual procedure, but use:
    • nightly_name: ondemand
    • rel_nightly: the datestamp from the NICOS page of this build (e.g. 2023-05-16T2101). You can also find this datestamp in the console output of the Jenkins job from step 2.
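
When a patch branch is being re-used, the next 4-digit release number follows from the tags that already exist. The sketch below shows one way to compute it from the output of the git tag -l command mentioned in step 1; the function name is hypothetical, and it assumes that tags have the form release/A.B.X.Y with a numeric last component.

```shell
# Read existing tags on stdin (one per line) and print the next free
# 4-digit release number for the base release given as $1.
next_patch_release() {
    awk -F'.' -v base="$1" '
        # consider only tags of the form release/<base>.<N>
        index($0, "release/" base ".") == 1 && NF == 4 && $4 + 0 > max { max = $4 + 0 }
        END { printf "%s.%d\n", base, max + 1 }'
}

# In a real checkout: git tag -l 'release/23.0.29.*' | next_patch_release 23.0.29
printf 'release/23.0.29.1\nrelease/23.0.29.2\n' | next_patch_release 23.0.29
```

With no matching tags on stdin the function prints the first patch number (for example 23.0.29.1).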

Setting up a stable branch

To set up a production branch, first make the base release (as above) so there is a clear branching point, and so the release numbering makes sense. Next, make a branch from this release, naming it descriptively e.g. 22.0-mc20. Another acceptable option would be to name it after the base release e.g. 22.0.41.X.

You should then set up permissions for this branch, allowing only the branch managers to push and merge (look for Protected branches under Repository Settings, and have a look at other protected branches for examples).

Next, it is very important to update the release candidate version (recall that in order to make a release from a nightly, the nightly needs to know the release it will become), e.g. if you branched from 22.0.41, you should immediately change the version.txt in all relevant projects (i.e. all, except AnalysisBase and AthAnalysis) to 22.0.41.1.

Finally, do not forget to update the README.md with details about the new branch (ideally in all branches, but certainly in the branch and main).

Tagging a point (patch) release

In production branches we will need to make patch or point releases, e.g. 22.0.41.1.

The procedure should be exactly the same as above, namely:

  • Copy the RPMs of the nightly to EOS
  • Create a Jira ticket for CVMFS installation
  • Tag the release in git
  • Update the release candidate number to the next point release i.e. 22.0.41.1 becomes 22.0.41.2

Clean build directory on CI nodes

The CI compiles the code changes in the MRs incrementally. Occasionally, the build nodes end up in a bad state and need full rebuilds to recover; updates to the externals frequently cause this problem. A full rebuild can be forced by scheduling the clean-build-dir job on atlas-sit-ci.cern.ch after logging in via SSO. Ongoing CI jobs will be allowed to complete before the cleanup. All new CI jobs will wait in the queue until the clean-build-dir jobs have completed on all CI nodes.

Locate the clean-build-dir job and select Build with Parameters. The default parameters (*) will run a clean up for all branches and all build nodes. This can be customized via:

  • BRANCH: As the CI runs in separate build directories for each git branch, issues are usually restricted to one branch. Selecting a specific git branch (e.g. main, 24.0) for the cleanup avoids delays in other branches.
  • BUILD_NODE: Select the CI node (e.g. aibuild64-011) in case the problem only affects one particular node.

Please announce the clean-up of the CI nodes outside of the regular weekly cleanup on the MR shifters mattermost channel at the link.

LCG and TDAQ version updates

The following sections contain brief information about how to update the LCG layer version and/or the TDAQ/TDAQ-COMMON version in an Athena release in gitlab.

LCG layer update in Athena build

The LCG layer version is configured in the build_externals.sh or, in rare cases, the CMakeLists.txt files in the athena/Projects/*/ subdirectories - see the gitlab directory Projects. Set the two variables LCG_VERSION_NUMBER and LCG_VERSION_POSTFIX accordingly. There are a few exceptions: some Projects do not use LCG at all (e.g. AnalysisBase), and some only use LCG_VERSION_NUMBER (e.g. DetCommon).

Edit all of the (typically 7) relevant files by hand, or use the following one-liner for a search-and-replace in your local git area before creating the MR:

cd athena/Projects
git grep -l 'b_ATLAS_9' | xargs sed -i 's/b_ATLAS_9/b_ATLAS_11/g'
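
Note that a plain substitution of b_ATLAS_9 would also rewrite a longer postfix such as b_ATLAS_90. The sketch below shows a slightly stricter variant that can be used to preview the effect on a single line; the function name is hypothetical, and the sample assignment line only illustrates what an entry might look like (it is not copied from build_externals.sh).

```shell
# Replace an LCG postfix on stdin, matching only when the old value is
# not followed by another digit (so b_ATLAS_9 does not hit b_ATLAS_90).
replace_lcg_postfix() {
    # $1: old postfix, $2: new postfix; \1 restores the character after the match
    sed -E "s/$1([^0-9]|\$)/$2\\1/g"
}

echo 'LCG_VERSION_POSTFIX="b_ATLAS_9"' | replace_lcg_postfix b_ATLAS_9 b_ATLAS_11
```

In either form, reviewing the result with git diff before committing remains the real safeguard.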

Here are two example merge requests that update only a LCG layer or a combination of the LCG layer + TDAQ/TDAQ-COMMON:

  • Update to LCG layer LCG_102b_ATLAS_11 in MR 59722.
  • Full update to new LCG layer LCG_102b_ATLAS_2 and TDAQ/TDAQ-COMMON version 09-05-00 in MR 58081 + fix for DetCommon in MR 59092.

In the MR description please add a list of the updated packages and their versions and a link to the relevant SPI Jira ticket, e.g. https://sft.its.cern.ch/jira/browse/SPI-2277.

Please add the gitlab labels full-build, full-unit-tests and the label RC Attention Required with a small written reminder in the MR text to the current release coordinator to trigger a CI nodes clean-up after the MR has been merged.

TDAQ and TDAQ-Common update in Athena build

The TDAQ and TDAQ-COMMON versions are configured in the CMakeLists.txt files in the athena/Projects/*/ subdirectories and are set by the two variables TDAQ-COMMON_VERSION and TDAQ_VERSION. Not all Projects actually have a TDAQ and/or TDAQ-COMMON configuration since they don’t depend on them.

Here is one example merge request that updates only TDAQ/TDAQ-COMMON:

  • Update to TDAQ and TDAQ-COMMON versions 10-00-00 in MR 59665.

Please add the same gitlab labels as mentioned in the previous section.

Update reference files

Occasionally a MR will change either the digest output, and/or the pool file output (ESD/AOD/HITs etc). If so, the reference files will need to be updated before the CI can succeed. The procedure is explained on the Frozen Tier0 Policy twiki, but briefly:

  • Text reference files are stored in the ProcTools package, and should be updated as part of the MR;
  • Pool reference files are stored on EOS (/eos/atlas/atlascerngroupdisk/data-art/grid-input/WorkflowReferences/) and then replicated to cvmfs (/cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/WorkflowReferences/). To update these, you will need to increment the reference version in WorkflowTestRunner/python/References.py (i.e. change _vN to _vN+1, and then commit), and copy the new reference pool files to appropriately versioned directories in WorkflowReferences on EOS.
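
The version increment described in the last bullet amounts to adding one to the number after the final “v”. A minimal sketch of that step follows; the function name and the sample values are hypothetical, and the exact layout of the entries in References.py is not assumed here.

```shell
# Increment the number after the last "v" in a reference version string,
# e.g. v8 -> v9 or example_v9 -> example_v10 (sample values are illustrative).
bump_reference_version() (
    base="${1%v*}"     # everything before the last "v"
    n="${1##*v}"       # the number after the last "v"
    printf '%sv%s\n' "$base" "$((n + 1))"
)

bump_reference_version v8
```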

There is now a script available to take care of this for you, which just needs to be passed the URL of the MR CI summary page (which you find by scrolling through an MR and looking for a link labelled “Full details available on this CI monitor view”). It is suggested that you first run it in --test-run mode, to make sure everything is fine, e.g.

update_ci_reference_files.py --test-run https://bigpanda.cern.ch/ciview/?rel=MR-62281-2023-11-10-14-55
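
When several such links are open at once, it can help to confirm which MR a CI monitor URL belongs to before running the script. The sketch below extracts the MR number; the function name is hypothetical, and the MR-<number>-<timestamp> form of the rel parameter is inferred from the single example above.

```shell
# Extract the MR number from a CI monitor URL whose "rel" parameter has
# the form MR-<number>-<timestamp> (assumed from the example URL).
mr_number_from_ciview() {
    printf '%s\n' "$1" | sed -n 's/^.*[?&]rel=MR-\([0-9][0-9]*\)-.*$/\1/p'
}

mr_number_from_ciview 'https://bigpanda.cern.ch/ciview/?rel=MR-62281-2023-11-10-14-55'
```

The function prints nothing if the URL does not contain a rel parameter of that form.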

Please note that in order for References.py and the digest summaries to be updated, you will need to have the branch of the MR checked out. The script will explain, but it will involve steps like:

 $ git remote add <MR_AUTHOR> <URL_TO_FORK>
 $ git fetch <MR_AUTHOR>
 $ git switch -c <MR_BRANCH> <MR_AUTHOR>/<MR_BRANCH>
 $ git rebase upstream/main

The script will also give you the link to the Jenkins job which copies files from EOS to CVMFS. Since the MR will not succeed until the reference files are on CVMFS, you may want to trigger it manually.

Have a look at this presentation for more details on the script.