MC Signal Generation

Last update: 16 Aug 2024

Monte Carlo (MC) event generation in ATLAS uses AthGeneration, which provides a variety of generators and the detector simulation. Although the generator software is available independently of the ATLAS software, it is important to use it through the ATLAS interfaces, both to ensure that these tools are used consistently and to allow access to ATLAS conditions.

There are numerous generators available for use in ATLAS, each with its own advantages and disadvantages. In most cases, however, new physics samples use the MadGraph event generator.

Directory setup

The MC generation part of the tutorial consists of two parts: generation and validation. These need to be done in separate directories, using different release setups. To keep a sensible directory structure, use the following commands from your tutorial directory:

mkdir MCTutorial
cd MCTutorial
mkdir MCGeneration

Basics of MC generation

Begin by moving to the MCGeneration directory and setting up a recent AthGeneration release. These use the 23.6 series.

tip The AthGeneration releases are currently incompatible with AlmaLinux 9, so it is necessary to use a CentOS 7 container.

cd MCGeneration
setupATLAS -c centos7
asetup AthGeneration,23.6.18

The MC generation tools make use of a jobOptions (often referred to as JOs) file that uses pseudo-python to define commands for AthGeneration to execute.

JOs are used for many different tasks using ATLAS software. The format and common methods used in JOs are very procedure-specific. If you need to write JOs for a task, it is helpful to look at existing examples.

We will create the JOs to produce a pair of leptoquarks with a mass of 1000 GeV that decay to first- and second-generation leptons and quarks, with a final state containing either two electrons or two muons.

In your MC generation work area, create a directory called 1000000 and the file 1000000/mc.MGPy8EG_A14N23LO_LO_LQ_S1_PairProd_SameFlav_m1000.py.

The 6- or 7-digit directory name (1000000) is known as a DSID (Dataset Identifier). This is used as a unique numerical identifier for the specific JOs. For local testing, you can use dummy 6- or 7-digit numbers, placing exactly one JOs in each DSID directory. A unique DSID is assigned to each of your JOs in the central sample production procedure. Smaller numbers (below 500000) are reserved already, so new production generally will use larger numbers.
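For local testing, the DSID directory convention described above can be checked with a small helper. The following is a minimal sketch and not part of any ATLAS tool: the function name check_dsid_dir is hypothetical, and it encodes only the rules stated here (a 6- or 7-digit directory name that contains exactly one JOs file, assumed to match the pattern mc.*.py).

```python
import re
from pathlib import Path


def check_dsid_dir(path):
    """Return a list of problems found in a local DSID directory (empty if none).

    Hypothetical helper for local testing only. It encodes the rules from the
    text above, not the checks performed by the ATLAS production system.
    """
    directory = Path(path)
    problems = []

    # The directory name must be a 6- or 7-digit number.
    if not re.fullmatch(r"\d{6,7}", directory.name):
        problems.append(f"'{directory.name}' is not a 6- or 7-digit DSID")

    # Exactly one jobOptions file (assumed pattern: mc.*.py) per directory.
    job_options = sorted(p.name for p in directory.glob("mc.*.py"))
    if len(job_options) != 1:
        problems.append(f"expected exactly one JOs file, found {len(job_options)}")

    return problems
```

Calling check_dsid_dir("1000000") from the MCGeneration directory should return an empty list once the JOs file has been created.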

The JOs filename is required to follow a certain format. The string between mc. and .py is known as the DID and provides a succinct description of the process described by the JOs. It must contain information about the generator(s) and PDF(s) used, and it should contain other information such as the model and other properties related to the production and decay mechanism. It must also contain no more than 50 characters.

The first part (MGPy8EG) indicates which tools are used in the event generation. MG refers to LO MadGraph, which is used for the matrix element calculation (N.B. NLO MadGraph is denoted as aMC). Py8 indicates that Pythia8 is used for the parton shower and hadronization step. EG refers to EvtGen, an afterburner that ensures the decays of B hadrons are correctly modeled. A14N23LO refers to the LO NNPDF2.3 PDF set used with the A14 tune. LO_LQ_S1 is the MadGraph model that is used. See if you can parse the rest to understand what physics process (production mode, decay mode, and BSM particle masses) is being simulated by this JOs.
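The filename rules can be illustrated with a short Python sketch. The helper parse_jo_filename is hypothetical and checks only what is described above: the mc. prefix, the .py suffix, and the 50-character limit. Treating the first underscore-separated field as the generators and the second as the PDF and tune is an assumption based on this example filename, not a general rule.

```python
MAX_DID_LENGTH = 50  # limit quoted in the text above


def parse_jo_filename(filename):
    """Split a JOs filename into its underscore-separated DID fields.

    Hypothetical helper: the field positions (generators first, PDF/tune
    second) are assumed from the example filename used in this tutorial.
    """
    if not (filename.startswith("mc.") and filename.endswith(".py")):
        raise ValueError("JOs filename must look like mc.<DID>.py")

    did = filename[len("mc."):-len(".py")]
    if len(did) > MAX_DID_LENGTH:
        raise ValueError(f"DID has {len(did)} characters, limit is {MAX_DID_LENGTH}")

    fields = did.split("_")
    if len(fields) < 2:
        raise ValueError("DID must name at least the generator(s) and the PDF")

    return {
        "did": did,
        "generators": fields[0],
        "pdf_tune": fields[1],
        "rest": fields[2:],
    }


info = parse_jo_filename("mc.MGPy8EG_A14N23LO_LO_LQ_S1_PairProd_SameFlav_m1000.py")
print(info["generators"], info["pdf_tune"], info["rest"])
```

The remaining fields in info["rest"] are the ones the exercise above asks you to interpret.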

Copy the code below into your JOs file:

# Import all of the necessary methods to use MadGraph
from MadGraphControl.MadGraphUtils import *
from MadGraphControl.MadGraph_NNPDF30NLO_Base_Fragment import *

# Some includes that are necessary to interface MadGraph with Pythia
include("Pythia8_i/Pythia8_A14_NNPDF23LO_EvtGen_Common.py")
include("Pythia8_i/Pythia8_MadGraph.py")

# Mass of the leptoquark
lq_mass = 1000. # GeV

# Number of events to produce
safety = 1.1 # safety factor to account for filter efficiency
nevents = int(runArgs.maxEvents * safety) # whole number of events for the run card

# Make sure LQ PDG IDs are known to TestHepMC:
pdgfile = open("pdgid_extras.txt", "w+")
pdgfile.write("""
-9000005
9000005
""")
pdgfile.close()

# Here is where we define the commands that will be passed to MadGraph

# Import the LQ model
process = """
import model LO_LQ_S1
"""

# Define some multi-particle representations
process += """
define charm = c c~
define up = u u~
define q = u u~ d d~ c c~ s s~
define e = e- e+
define mu = mu- mu+
"""

# Define the physics process to be simulated
process += """
generate g g > e mu up charm
"""

# This defines the MadGraph outputs
process += """
output -f
"""

# Define the process and create the run card from a template
process_dir = new_process(process)
settings = {'ickkw': 0, 'nevents':nevents}
modify_run_card(process_dir=process_dir,runArgs=runArgs,settings=settings)

# Set some values in the param card
# BSM particle masses
masses={'9000005':lq_mass, #S1
        '1000021':1000000. } # chi10 - needed because of a bug in the model

# Leptoquark width
# This is hard-coded here, but could be calculated on the fly with a function
lq_width = 39.7887
decays={'9000005':"""DECAY 9000005  %g #leptoquark decay""" % lq_width}

# These are the couplings of the leptoquarks to first and second
# generation fermions
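yuks1ll={'1   1':"""0.000000e-00 # yll1x1"""}
# Both entries go in one dictionary; a second assignment to yuks1rr
# would overwrite the first and drop the first-generation coupling
yuks1rr={'1   1':"""1.000000e-01 # yRR1x1""",
         '2   2':"""1.000000e-01 # yRR2x2"""}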

# Create the param card and modify some parameters from their default values
modify_param_card(process_dir=process_dir,params={'MASS':masses,'DECAY':decays,'YUKS1LL':yuks1ll,'YUKS1RR':yuks1rr})

# Do the event generation
generate(process_dir=process_dir,runArgs=runArgs)

# These details are important information about the JOs
evgenConfig.description = 'Single Leptoquark coupling lam1122. m_S1 = %s GeV' % (lq_mass)
evgenConfig.contact = [ "Jason Veatch <jason.veatch@cern.ch>" ]
evgenConfig.keywords += ['BSM','exotic', 'scalar', 'leptoquark']

arrange_output(process_dir=process_dir, runArgs=runArgs)

tip Note the lines that include the variable safety. The production system expects the generation step to deliver maxEvents events to the rest of the simulation chain. This is not always equal to the number of events the generator is instructed to produce. There can be multiple reasons for this, but the main one is filters, which are discussed later. The generation step produces nevents events and passes on only those that pass the filters to the next simulation steps. If fewer than maxEvents events remain, the production system throws an error and exits. The safety factor makes sure the generation step produces enough events to cover losses from filters and other sources.
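The arithmetic behind the safety factor can be sketched in a few lines of standalone Python. The function names and the margin parameter are illustrative assumptions, not ATLAS recommendations. The sketch shows how many events the generator is asked for and how large the safety factor would need to be for a given filter efficiency.

```python
import math


def events_to_generate(max_events, safety):
    """Number of events requested from the generator, rounded up to an integer.

    The product is rounded to 6 decimals first so that floating-point noise
    (100 * 1.1 evaluates to 110.00000000000001) does not add a spurious event.
    """
    return math.ceil(round(max_events * safety, 6))


def minimum_safety(filter_efficiency, margin=0.1):
    """Smallest safety factor that covers a given filter efficiency.

    filter_efficiency is the fraction of generated events that pass the
    filters; margin is an extra, illustrative cushion for statistical
    fluctuations (an assumption made for this sketch).
    """
    if not 0.0 < filter_efficiency <= 1.0:
        raise ValueError("filter efficiency must be in (0, 1]")
    return (1.0 + margin) / filter_efficiency


# No filter (efficiency 1.0): a safety factor of 1.1 requests 110 events for maxEvents = 100.
print(events_to_generate(100, 1.1))

# A filter that keeps half of the events needs a safety factor of about 2.2.
print(events_to_generate(100, minimum_safety(0.5)))
```

With no filter applied, as in this tutorial, a safety factor of 1.1 is enough; a filter that rejects a large fraction of the events needs a correspondingly larger factor.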

Running a Gen_tf.py command will use the JOs to produce an events file. This is a truth-level description of the events before detector effects are taken into account. Run the following command in your MCGeneration directory:

Gen_tf.py --ecmEnergy=13600. \
          --maxEvents=100 \
          --randomSeed=123456 \
          --outputEVNTFile=evgen.root \
          --jobConfig=1000000

tip Gen_tf.py is one of several transforms that Athena provides. A transform is a high-level configuration that combines multiple job option files that your job can build off of. These transforms are used heavily on the grid for official samples, and they are designed to always be run from an empty directory. That means if you run two jobs in the same directory at the same time, you might see crashes.

This will take a few minutes to run and will use the JOs in 1000000 to generate 100 events at a center-of-mass energy of 13.6 TeV. A large amount of text will be written to your screen and to log.generate. If the process runs successfully, you should see a message saying:

INFO leaving with code 0: "successful run"

and a file called evgen.root in the EVNT format.
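Because the log is long, it can be convenient to check for success programmatically. The sketch below is a hypothetical helper, not an ATLAS tool. It assumes that the success message quoted above appears verbatim in log.generate and searches only for the fragment "leaving with code 0", so it does not depend on the exact quoting of the rest of the line.

```python
from pathlib import Path


def run_succeeded(log_path="log.generate", output_path="evgen.root",
                  marker="leaving with code 0"):
    """Return True if the log contains the success marker and the output exists.

    Hypothetical helper: the default file names are those used in this
    tutorial, and the marker is a fragment of the success message shown above.
    """
    log = Path(log_path)
    output = Path(output_path)

    # The transform must have written a log and a non-empty output file.
    if not log.is_file() or not output.is_file():
        return False
    if output.stat().st_size == 0:
        return False

    # Decode leniently in case the log contains unexpected bytes.
    text = log.read_text(errors="replace")
    return marker in text


if __name__ == "__main__":
    print("success" if run_succeeded() else "check log.generate for errors")
```

Run from the MCGeneration directory after Gen_tf.py finishes, this reports whether both the log message and the output file are present.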

If you rerun the Gen_tf.py command in the same directory, it will overwrite all of the outputs.

Now let’s look at the output to see if we are generating the expected signal.