ATLAS Analysis-based Software Tutorial

Monte Carlo (MC) event generation in ATLAS uses AthGeneration, which provides a variety of generators and the detector simulation. While the generator software is available independently of ATLAS software, it is important to use it through the ATLAS interfaces to ensure consistent usage of these tools, reproducibility for other collaboration members, and consistent settings of program parameters.

There are numerous generators available for use in ATLAS, each with its own advantages and disadvantages. In many cases, new physics samples use the MadGraph5_aMC@NLO event generator (or just “MadGraph” for short).

Directory setup

The MC generation part of the tutorial consists of two parts: generation and validation. These need to be done in separate directories, using different release setups. To keep a sensible directory structure, use the following commands from your tutorial directory:

mkdir MCTutorial
cd MCTutorial
mkdir MCGeneration

For this tutorial, we’re working in an area that will be saved so that you have everything preserved in front of you and can play around for as long as you like. Normally, if you are running test jobs or a small production, you should use a responsive locally-mounted disk, like /tmp/$USER on lxplus or the scratch space available to you in the batch system. The general idea you should follow is to preserve what you need in a stable area like AFS or EOS, copy job inputs to the local area, run, and then copy outputs to stable storage again (preferring EOS for storing large outputs). Generally, AFS will be more responsive for storing software that you want to compile and run against, and EOS will be more performant (and has more space) for storing large input and output files. If you are running in that mode, don’t forget to also copy back any log files you need, or to check carefully for job failures. Incidentally, this is basically what grid jobs do as well! When in doubt, you should check if $TMPDIR is defined and use it for your jobs.
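The copy-in, run, copy-out pattern described above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `scratch_base` and `run_staged` are hypothetical, and `job` stands in for whatever you actually run in the scratch area. Outputs (including log files) are copied back even if the job raises, so failures can still be inspected.

```python
import os
import shutil
import tempfile
from pathlib import Path


def scratch_base():
    """Prefer $TMPDIR when it is defined, otherwise use the system temp dir."""
    return Path(os.environ.get("TMPDIR") or tempfile.gettempdir())


def run_staged(inputs, stable_out, job):
    """Stage inputs into a scratch directory, run job(workdir), copy new files back.

    `inputs` is a list of file paths kept in stable storage, `stable_out` is
    the stable output directory, and `job` is a callable taking the scratch
    directory. Returns the names of the files copied back.
    """
    stable_out = Path(stable_out)
    stable_out.mkdir(parents=True, exist_ok=True)
    workdir = Path(tempfile.mkdtemp(prefix="mcjob_", dir=scratch_base()))
    staged = set()
    copied = []
    try:
        for src in inputs:
            shutil.copy2(src, workdir)
            staged.add(Path(src).name)
        job(workdir)
    finally:
        # Copy back everything the job produced, logs included, then clean up.
        for item in sorted(workdir.iterdir()):
            if item.is_file() and item.name not in staged:
                shutil.copy2(item, stable_out / item.name)
                copied.append(item.name)
        shutil.rmtree(workdir, ignore_errors=True)
    return copied
```

Only plain files in the top level of the scratch directory are copied back here; a real job that writes into subdirectories would need a recursive copy.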

Basics of MC generation

Begin by moving to the MCGeneration directory and setting up a recent AthGeneration release. These use the 23.6 series.

tutorial/MCTutorial/MCGeneration

cd MCGeneration
setupATLAS

This will start a new shell with the container. When the new shell is open, use the following command to set up the release.

asetup AthGeneration,23.6.39

The MC generation tools make use of a jobOptions (often referred to as JOs) file that uses Python to define commands for AthGeneration to execute.

JOs are used for many different tasks using ATLAS software. The format and common methods used in JOs are very procedure-specific. If you need to write JOs for a task, it is helpful to look at existing examples.

We will create the JOs to produce a pair of leptoquarks with a mass of 1000 GeV that decay to first- and second-generation leptons/quarks, with a final state containing either two electrons or two muons.

In your MC generation work area, create a directory called 100000 and the file 100000/mc.MGPy8EG_A14N23LO_LO_LQ_S1_PairProd_SameFlav_m1000.py.

mkdir 100000
touch 100000/mc.MGPy8EG_A14N23LO_LO_LQ_S1_PairProd_SameFlav_m1000.py

The 6- or 7-digit directory name (100000) is known as a DSID (Dataset Identifier). This is used as a unique numerical identifier for the specific JOs. For local testing, you can use dummy 6- or 7-digit numbers, placing exactly one JO file in each DSID directory. A unique DSID is assigned to each of your JOs in the central sample production procedure. Smaller numbers (below 500000) are reserved already, so new production generally will use larger numbers.
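As a quick local sanity check of the layout rules above (a 6- or 7-digit directory name containing exactly one JO file), something like the following could be used. The function name `check_dsid_dir` is hypothetical and is not part of any ATLAS tool; the `mc.*.py` pattern is taken from the filename used in this tutorial.

```python
import re
from pathlib import Path


def check_dsid_dir(path):
    """Return the single JO file inside a DSID directory, or raise ValueError."""
    path = Path(path)
    # The directory name must be a 6- or 7-digit number.
    if not re.fullmatch(r"\d{6,7}", path.name):
        raise ValueError(f"{path.name!r} is not a 6- or 7-digit DSID")
    # Exactly one JO file (mc.<physics short>.py) is allowed per directory.
    jos = sorted(path.glob("mc.*.py"))
    if len(jos) != 1:
        raise ValueError(f"expected exactly one JO file in {path}, found {len(jos)}")
    return jos[0]
```

Calling `check_dsid_dir("100000")` from the MCGeneration directory would return the path of the JO file created above.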

The JOs filename is required to follow a certain format. The string between mc. and .py is known as the DID or “physics short” and provides a succinct description of the process described by the JOs. It is required to contain information about the generator(s) and PDF(s) used, and it should contain other information such as the model and other properties related to the production and decay mechanism. It must contain no more than 50 characters and should be unique to the sample if possible (e.g. each signal point in a scan should have a different physics short, and differently configured top-quark production samples should have different physics shorts). The first part (MGPy8EG) indicates which tools are used in the event generation. MG refers to LO MadGraph, which is used for the matrix element calculation (NLO MadGraph is denoted as aMC). Py8 indicates that Pythia8 is used for the parton shower and hadronization step. EG refers to EvtGen, which is an afterburner that ensures the decays of B hadrons are correctly modeled. A14N23LO refers to the LO NNPDF2.3 PDF set used with the A14 tune. LO_LQ_S1 is the MadGraph model that is used. See if you can parse the rest to understand what physics process (production mode, decay mode, and BSM particle masses) is being simulated by these JOs.
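To make the naming rules concrete, here is a small sketch that extracts the physics short from a JO filename, enforces the 50-character limit, and splits it into its underscore-separated parts. The helper `physics_short` is hypothetical; it only encodes the rules stated above and is not an official validation tool.

```python
def physics_short(jo_filename):
    """Extract and validate the physics short from a JO filename."""
    name = jo_filename.rsplit("/", 1)[-1]  # drop any directory part
    if not (name.startswith("mc.") and name.endswith(".py")):
        raise ValueError(f"{name!r} does not look like mc.<physics short>.py")
    short = name[len("mc."):-len(".py")]
    if len(short) > 50:
        raise ValueError(f"physics short has {len(short)} characters (limit is 50)")
    return short


short = physics_short(
    "100000/mc.MGPy8EG_A14N23LO_LO_LQ_S1_PairProd_SameFlav_m1000.py"
)
tokens = short.split("_")
print(len(short))  # 49, just under the limit
print(tokens[0])   # MGPy8EG: the generator chain
print(tokens[1])   # A14N23LO: tune and PDF
```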

Copy the code below into your JOs file:

# Import all of the necessary methods to use MadGraph
from MadGraphControl.MadGraphUtils import *
from MadGraphControl.MadGraph_NNPDF30NLO_Base_Fragment import *

# Some includes that are necessary to interface MadGraph with Pythia
include("Pythia8_i/Pythia8_A14_NNPDF23LO_EvtGen_Common.py")
include("Pythia8_i/Pythia8_MadGraph.py")

# Mass of the leptoquark
lq_mass = 500

# Number of events to produce
safety = 1.1 # safety factor to account for filter efficiency
nevents = int(runArgs.maxEvents * safety) # integer event count for the run card

# Make sure LQ PDG IDs are known to TestHepMC:
pdgfile = open("pdgid_extras.txt", "w+")
pdgfile.write("""
-9000005
9000005
""")
pdgfile.close()

# Here is where we define the commands that will be passed to MadGraph

# Import the LQ model
process = """
import model LO_LQ_S1
"""

# Define some multi-particle representations
process += """
define charm = c c~
define up = u u~
define q = u u~ d d~ c c~ s s~
define e = e- e+
define mu = mu- mu+
"""

# Define the physics process to be simulated
process += """
generate g g > e mu up charm
"""

# This defines the MadGraph outputs
process += """
output -f
"""

# Define the process and create the run card from a template
process_dir = new_process(process)
settings = {'ickkw': 0, 'nevents':nevents}
modify_run_card(process_dir=process_dir,runArgs=runArgs,settings=settings)

# Set some values in the param card
# BSM particle masses
masses={'9000005':lq_mass, #S1
        '1000021':1000000. } # chi10 - needed because of a bug in the model

# Leptoquark width
# This is hard-coded here, but could be calculated on the fly with a function
lq_width = 39.7887
decays={'9000005':"""DECAY 9000005  %g #leptoquark decay""" % lq_width}

# These are the couplings of the leptoquarks to first and second
# generation fermions
yuks1ll={'1   1':"""0.000000e-00 # yll1x1"""}
yuks1rr={'1   1':"""1.000000e-01 # yRR1x1"""}
yuks1rr={'2   2':"""1.000000e-01 # yRR2x2"""}

# Create the param card and modify some parameters from their default values
modify_param_card(process_dir=process_dir,params={'MASS':masses,'DECAY':decays,'YUKS1LL':yuks1ll,'YUKS1RR':yuks1rr})

# Do the event generation
generate(process_dir=process_dir,runArgs=runArgs)

# These details are important information about the JOs
evgenConfig.description = 'Single Leptoquark coupling lam1122. m_S1 = %s GeV' % (lq_mass)
evgenConfig.contact = [ "Jason Veatch <jason.veatch@cern.ch>" ]
evgenConfig.keywords += ['BSM','exotic', 'scalar', 'leptoquark']

arrange_output(process_dir=process_dir, runArgs=runArgs)

Note the lines that include the variable safety. The production system expects maxEvents events to come out of the generation step before they are passed down the simulation chain. This is not always equal to the number of events the generator is instructed to produce. There can be multiple reasons for this, but the main one is filters, which are discussed later. The MadGraph step produces nevents events; these then pass through Pythia8 and possibly through additional filters before being written into an output file. If the number of events in the output file is less than maxEvents, the job will fail with an error. The safety factor makes sure that MadGraph produces enough events to account for any losses due to filters and other inefficiencies.
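The arithmetic behind the safety factor can be illustrated with a short standalone sketch. The function `events_to_generate` and its `filter_efficiency` parameter are assumptions made for illustration; the JOs above fold everything into the single safety factor. The rounding step guards against floating-point noise (in Python, 100 * 1.1 evaluates to slightly more than 110) before rounding up to a whole number of events.

```python
import math


def events_to_generate(max_events, safety=1.1, filter_efficiency=1.0):
    """Number of events to request from the matrix-element generator.

    max_events is the number of events that must survive to the output
    file; filter_efficiency is the (hypothetical) fraction of generated
    events that passes all filters.
    """
    if max_events <= 0:
        raise ValueError("max_events must be positive")
    if not 0.0 < filter_efficiency <= 1.0:
        raise ValueError("filter_efficiency must be in (0, 1]")
    needed = max_events * safety / filter_efficiency
    # Remove floating-point noise, then round up to a whole event.
    return math.ceil(round(needed, 6))


print(events_to_generate(100))                         # 110
print(events_to_generate(100, filter_efficiency=0.5))  # 220
```

With no filter, 100 output events and a 10% margin mean asking MadGraph for 110 events; a filter that keeps half of the events doubles that.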

In the normal containers provided by setupATLAS, some text editors like emacs, nano, or pico will not work. It is generally recommended to separate the terminal in which you do your file editing from that in which you run athena. The editor vim will still work in the container.

Running a Gen_tf.py command will use the JOs to produce an events file. This is a truth-level description of the event before detector effects are taken into account.

tutorial/MCTutorial/MCGeneration

Use the following command:

Gen_tf.py --ecmEnergy=13600. \
          --maxEvents=100 \
          --randomSeed=123456 \
          --outputEVNTFile=evgen.root \
          --jobConfig=100000

Gen_tf.py is one of several transforms that Athena provides. A transform is a high-level configuration that combines multiple job option files for your job to build on. These transforms are used heavily on the grid for official samples, and they are designed to always be run from an empty directory. That means if you run two jobs in the same directory at the same time, you might see crashes.

This will take a few minutes to run and will use the JOs in 100000 to generate 100 events at a center-of-mass energy of 13.6 TeV. A large amount of text will be written to your screen as well as to log.generate. If the process runs successfully, you should see a message at the end saying:

INFO leaving with code 0: "successful run"

and a file called evgen.root in the EVNT format. You can also find the LHE-format events produced by MadGraph in a file called events.lhe.
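Rather than scrolling through the log by eye, the success message and the LHE event count can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, the log check simply searches for the text of the message quoted above, and the event count relies on each event in an LHE file opening with an `<event>` tag on its own line.

```python
def transform_succeeded(log_path):
    """Return True if the log contains the 'leaving with code 0' message."""
    with open(log_path, errors="replace") as log:
        return any("leaving with code 0" in line for line in log)


def count_lhe_events(lhe_path):
    """Count the <event> opening tags in an uncompressed LHE file."""
    count = 0
    with open(lhe_path, errors="replace") as lhe:
        for line in lhe:
            stripped = line.strip()
            if stripped == "<event>" or stripped.startswith("<event "):
                count += 1
    return count
```

For the job above, `transform_succeeded("log.generate")` should be true, and `count_lhe_events("events.lhe")` should return at least the requested number of output events.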

If you rerun the Gen_tf.py command in the same directory, it will overwrite all of the outputs.
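Because transforms are meant to run from an empty directory and a rerun overwrites earlier outputs, one option is to launch each run in its own fresh directory. The sketch below builds the same command line as above and runs it in a newly created directory. The helper names are hypothetical, and it assumes that --jobConfig accepts an absolute path to the DSID directory, which is needed once the working directory is no longer the one containing 100000.

```python
import subprocess
import tempfile
from pathlib import Path


def build_gen_tf_command(job_config, max_events=100, ecm_energy=13600.0,
                         seed=123456, output="evgen.root"):
    """Assemble the Gen_tf.py command line used in this tutorial."""
    return [
        "Gen_tf.py",
        f"--ecmEnergy={ecm_energy}",
        f"--maxEvents={max_events}",
        f"--randomSeed={seed}",
        f"--outputEVNTFile={output}",
        # Assumption: an absolute path to the DSID directory is accepted.
        f"--jobConfig={Path(job_config).resolve()}",
    ]


def run_in_fresh_dir(command, parent="."):
    """Run a command inside a newly created, empty directory under parent."""
    run_dir = Path(tempfile.mkdtemp(prefix="run_", dir=parent))
    result = subprocess.run(command, cwd=run_dir)
    return run_dir, result.returncode
```

Inside the release environment, `run_in_fresh_dir(build_gen_tf_command("100000"))` would leave each run's outputs in a separate run_* directory, so earlier results are not overwritten.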

Now let’s look at the output to see if we are generating the expected signal.
