diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/README.md b/L1Trigger/L1CaloTrigger/test/egid_hgcal/README.md new file mode 100644 index 0000000000000..bc48bd2a7ac85 --- /dev/null +++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/README.md @@ -0,0 +1,25 @@ +# HGCal L1-Trigger e/gamma ID +This repository contains a series of scripts which are used to train the e/gamma ID BDT for the HGCal L1T. The workflow has been separated into subfolder (see below), which must be ran in the correct order. The output is two .xml files, which contain the info of the trained BDT in the low eta (1.5 < |eta| < 2.7) and high eta (2.7 < |eta| < 3.0) regions. These .xml files can then be placed directly in the HGCAL L1T TPG software. + + +## Installation +Follow the instructions for installation (users) of the [HGCAL L1T TPG software](https://twiki.cern.ch/twiki/bin/viewauth/CMS/HGCALTriggerPrimitivesSimulation#Installation_for_users) + +The next step is to clone this repository: + +``` +cd L1Trigger +git clone git@github.com:jonathon-langford/hgcal_l1t_egid +``` +In each new terminal, must set environment: `source setup.sh` + +## Contents +Each subfolder contains instructions in the form of a `README.md` which details how to run the scripts: + +* `ntuples`: contains the CMSSW config to create ntuples from the HGCAL L1T TPG software. The ntuple production uses CRAB and thus requires a grid certificate. After generating the signal ntuples, you should create the efficiency plots using `plotting/make_eff_plot.py` to check the drop in efficiency for the latest clustering algorithm. This will indicate how strongly a new e/gamma ID BDT is needed. +* `cl3d_selection`: takes as input the ntuples produced in the `ntuples` subfolder and outputs a flat ntuple of 3d clusters passing the selection criteria for training the e/gamma ID. +* `training`: scripts for training the e/gamma ID BDT, converting into .xml format, evaluating the newly-trained BDTs and summaries of the performance (working points,ROC curve) +* `plotting`: scripts to plot the efficiency curves and the 3D cluster variables. + +For the full workflow users should: `ntuples` -> `cl3d_selection` -> `training` + diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/README.md b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/README.md new file mode 100644 index 0000000000000..a161871a3af0b --- /dev/null +++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/README.md @@ -0,0 +1,29 @@ +# 3D Cluster selection +The next step is to run the cluster selection on the ntuples. To run on a single file, over all events... + +``` +python hgcal_l1t_cl3d_selection.py --sampleType electron_200PU --fileNumber +``` + +If the ntuples are not stored in the default location, then specify the path to the files using the `--inputPath` option. Also, if using a different clustering algorithm then add the long TDirectory name to the `clusteringAlgoDirDict`. + +The script matches 3D clusters to gen particles e.g. for electrons requires a gen particle with pdgID=11, with the requirement dR between the cluster and the gen particle is less than 0.2. There is no gen matching for background (neutrino). Additionally clusters are required to have a pT > 20 (10) GeV for signal (background). + +The output of this script is a flat ntuple with clusters corresponding to the input sampleType i.e. for signal you obtain an ntuple of gen-matched electron clusters, and for background you obtain an ntuple of pile-up initiated clusters. + +## Running in parallel +There is an additional script, `submit_cl3d_selection.py`, which allows you to run the cluster selection over each file in parallel. The number of files to run over can be set as an option, but the default is to run over all (specified in `totalFilesDict`). The jobs are submitted using the HTCondor batch. + +``` +python submit_cl3d_selection.py --sampleType electron_200PU +``` + +These jobs should not take long. You can specify the queue using the `--queue` option. + +## Adding the cluster ntuples: test, train and all +The final step is to combine the output flat ntuples, into test (10%), train (90%) and all (100%) samples. This has been automated by the `add_files.py` script. The script calculates the number of files in the relevant directory and combines accordingly. + +``` +python add_files.py --sampleType electron_200PU --deleteIndividualFiles 1 +``` +The `--deleteIndividualFiles` option deletes all the individual flat ntuples that have been used to create the test, train and all samples. This should be used to avoid taking up to much space. diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/add_files.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/add_files.py new file mode 100644 index 0000000000000..e624b425eac59 --- /dev/null +++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/add_files.py @@ -0,0 +1,57 @@ +# Python script for hadd-ing files for input sample type +# Combines files to have: full (100%), train (90%) and test (10%) + +import ROOT +import sys +import os +from optparse import OptionParser + +print "~~~~~~~~~~~~~~~~~~~~~~~~ ADD FILES ~~~~~~~~~~~~~~~~~~~~~~~~" + +def get_options(): + parser = OptionParser() + parser = OptionParser( usage="python add_files.py --sampleType=" ) + parser.add_option("--sampleType", dest='sampleType', default='electron_200PU', help="Sample to process, default signal is electron_200PU, default bkg is neutrino_200PU") + parser.add_option("--clusteringAlgo", dest="clusteringAlgo", default="Histomaxvardr", help="Clustering algorithm used in ntuple production") + parser.add_option("--deleteIndividualFiles", dest="deleteIndividualFiles", default=0, type="int", help="Delete individual files after hadding [1=yes,0=no (default)]") + return parser.parse_args() + +(opt,args) = get_options() + +#Check if directory exists: +if not os.path.isdir("./%s"%opt.sampleType): + print " --> [ERROR] Directory %s does not exist. Try running the cl3d selection first" + print "~~~~~~~~~~~~~~~~~~~~~ ADD FILES (END) ~~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) + +#Get number of files in folder and separate into test and train sample sizes +N_files = len(next(os.walk('./%s'%opt.sampleType))[2]) +if N_files == 0: + print " --> [ERROR] %s is empty. No files to add"%opt.sampleType + print "~~~~~~~~~~~~~~~~~~~~~ ADD FILES (END) ~~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) +N_test = int(0.1*N_files) +N_train = int(N_files-N_test) + +print " --> Making all (%g), test (%g) and train (%g) samples"%(N_files,N_test,N_train) + +#first add all files for full +os.system('mkdir %s/all'%opt.sampleType) +os.system('hadd %s/all/%s_%s_all.root %s/%s_%s_*.root'%(opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.sampleType,opt.sampleType,opt.clusteringAlgo)) + +#make test and train samples +os.system('mkdir %s/test %s/train'%(opt.sampleType,opt.sampleType)) +os.system('for fileNum in {1..%g}; do mv %s/%s_%s_${fileNum}.root %s/train; done'%(N_train,opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.sampleType)) +os.system('mv %s/%s_%s_*.root %s/test'%(opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.sampleType)) +os.system('hadd %s/%s_%s_train.root %s/train/*.root'%(opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.sampleType)) +os.system('hadd %s/%s_%s_test.root %s/test/*.root'%(opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.sampleType)) +os.system('mv %s/all/%s_%s_all.root %s/'%(opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.sampleType)) +os.system('rm -Rf %s/all'%opt.sampleType) +print " --> Successfully made files" + +if opt.deleteIndividualFiles: + print " --> Deleting individual files..." + os.system('rm -Rf %s/train'%opt.sampleType) + os.system('rm -Rf %s/test'%opt.sampleType) + +print "~~~~~~~~~~~~~~~~~~~~~ ADD FILES (END) ~~~~~~~~~~~~~~~~~~~~~" diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/hgcal_l1t_cl3d_selection.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/hgcal_l1t_cl3d_selection.py new file mode 100644 index 0000000000000..14fc6c69c8c66 --- /dev/null +++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/hgcal_l1t_cl3d_selection.py @@ -0,0 +1,216 @@ +# Python script for cluster selection: gen-matching + selection cuts + +import ROOT +import sys +import os +import math +from array import array +from optparse import OptionParser + +print "~~~~~~~~~~~~~~~~~~~~~~~~ Cl3D Selection ~~~~~~~~~~~~~~~~~~~~~~~~" + +def get_options(): + parser = OptionParser() + parser = OptionParser( usage="usage: python hgcal_l1t_cl3d_selection.py " ) + parser.add_option("--sampleType", dest='sampleType', default='electron_200PU', help="Sample to process, default signal is electron_200PU, default bkg is neutrino_200PU") + parser.add_option("--inputPath", dest="inputPath", default="%s/ntuples"%os.environ['HGCAL_L1T_BASE'], help="Path to directories which hold input ntuples") + parser.add_option("--fileNumber", dest="fileNumber", default=1, type="int", help="Input ntuple number") + parser.add_option("--maxEvents", dest="maxEvents", default=-1, type="int", help="Maximum number of events to process") + parser.add_option("--clusteringAlgo", dest="clusteringAlgo", default="Histomaxvardr", help="Clustering algorithm used in ntuple production") + # IF want to process with a different clustering alg: need to change the cms config script in ntuples/ directory + return parser.parse_args() + +(opt,args) = get_options() + +# Mapping to extract TDirectory for different clustering algo +# Add full chain name if using a different alg +clusteringAlgoDirDict = {"gen":"Floatingpoint8ThresholdDummyHistomaxvardrGenclustersntuple","Histomaxvardr":"Floatingpoint8ThresholdDummyHistomaxvardrGenclustersntuple","Histomaxvardr_stc":"Floatingpoint8SupertriggercellDummyHistomaxvardrClustersntuple"} + +# Mapping: sample type to pdgid used for gen matching +pdgIdDict = { + "electron":[11], + "photon":[22], + "pion":[211], + "neutrino":[] +} +pdgid = pdgIdDict[ opt.sampleType.split("_")[0] ] + +#Check: clustering exists in dir +clusteringAlgo = opt.clusteringAlgo +if clusteringAlgo not in clusteringAlgoDirDict: + print " --> [ERROR] not configured for %s clustering. Leaving..."%clusteringAlgo + print "~~~~~~~~~~~~~~~~~~~~ Cl3D Selection (END) ~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) + +# Check: if ntuple exists +if os.path.exists("%s/%s/ntuple_%g.root"%(opt.inputPath,opt.sampleType,opt.fileNumber)): + print " --> Processing %s/%s/ntuple_%g.root"%(opt.inputPath,opt.sampleType,opt.fileNumber) + f_in_name = "%s/%s/ntuple_%g.root"%(opt.inputPath,opt.sampleType,opt.fileNumber) + print " --> Clustering algorithm: %s"%clusteringAlgo + print " --> Events to be processed: %g"%opt.maxEvents +else: + print " --> [ERROR] %s/%s/ntuple_%g.root does not exist. Leaving..."%(opt.inputPath,opt.sampleType,opt.fileNumber) + print "~~~~~~~~~~~~~~~~~~~~ Cl3D Selection (END) ~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) + +# Open trees to read from +f_in = ROOT.TFile.Open( f_in_name ) +gen_tree = f_in.Get("%s/HGCalTriggerNtuple"%clusteringAlgoDirDict["gen"]) +cl3d_tree = f_in.Get("%s/HGCalTriggerNtuple"%clusteringAlgoDirDict[clusteringAlgo]) + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +# CLASS DEFINITIONS + +# class for 3d cluster variable: only initiate if cl3d passes selection +class Cluster3D: + + #Constructor method: takes event and cluster number as input + def __init__(self, _event, _ncl3d): + #initialise TLorentzVector + _p4 = ROOT.TLorentzVector() + _p4.SetPtEtaPhiE( _event.cl3d_pt[_ncl3d], _event.cl3d_eta[_ncl3d], _event.cl3d_phi[_ncl3d], _event.cl3d_energy[_ncl3d] ) + self.P4 = _p4 + self.clusters_n = _event.cl3d_clusters_n[_ncl3d] + self.showerlength = _event.cl3d_showerlength[_ncl3d] + self.coreshowerlength = _event.cl3d_coreshowerlength[_ncl3d] + self.firstlayer = _event.cl3d_firstlayer[_ncl3d] + self.maxlayer = _event.cl3d_maxlayer[_ncl3d] + self.seetot = _event.cl3d_seetot[_ncl3d] + self.seemax = _event.cl3d_seemax[_ncl3d] + self.spptot = _event.cl3d_spptot[_ncl3d] + self.sppmax = _event.cl3d_sppmax[_ncl3d] + self.szz = _event.cl3d_szz[_ncl3d] + self.srrtot = _event.cl3d_srrtot[_ncl3d] + self.srrmax = _event.cl3d_srrmax[_ncl3d] + self.srrmean = _event.cl3d_srrmean[_ncl3d] + self.emaxe = _event.cl3d_emaxe[_ncl3d] + self.bdteg = _event.cl3d_bdteg[_ncl3d] + self.quality = _event.cl3d_quality[_ncl3d] + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +# FUNCTION DEFINITIONS + +#function to fill tree +def fill_cl3d( _cl3d, _out_var ): + _out_var['pt'][0] = _cl3d.P4.Pt() + _out_var['eta'][0] = _cl3d.P4.Eta() + _out_var['phi'][0] = _cl3d.P4.Phi() + _out_var['clusters_n'][0] = _cl3d.clusters_n + _out_var['showerlength'][0] = _cl3d.showerlength + _out_var['coreshowerlength'][0] = _cl3d.coreshowerlength + _out_var['firstlayer'][0] = _cl3d.firstlayer + _out_var['maxlayer'][0] = _cl3d.maxlayer + _out_var['seetot'][0] = _cl3d.seetot + _out_var['seemax'][0] = _cl3d.seemax + _out_var['spptot'][0] = _cl3d.spptot + _out_var['sppmax'][0] = _cl3d.sppmax + _out_var['szz'][0] = _cl3d.szz + _out_var['srrtot'][0] = _cl3d.srrtot + _out_var['srrmax'][0] = _cl3d.srrmax + _out_var['srrmean'][0] = _cl3d.srrmean + _out_var['emaxe'][0] = _cl3d.emaxe + _out_var['bdteg'][0] = _cl3d.bdteg + _out_var['quality'][0] = _cl3d.quality + out_tree.Fill() + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +# CONFIGURE OUTPUT +print " --> Configuring output ntuple" + +#Check if output dir exists +if not os.path.isdir( "%s/cl3d_selection/%s"%(os.environ['HGCAL_L1T_BASE'],opt.sampleType) ): + print " --> Making directory: %s/cl3d_selection/%s"%(os.environ['HGCAL_L1T_BASE'],opt.sampleType) + os.system("mkdir %s/cl3d_selection/%s"%(os.environ['HGCAL_L1T_BASE'],opt.sampleType) ) + +# Create output file name +f_out_name = "%s/cl3d_selection/%s/%s_%s_%g.root"%(os.environ['HGCAL_L1T_BASE'],opt.sampleType,opt.sampleType,clusteringAlgo,opt.fileNumber) +print " --> Creating output file: %s"%f_out_name +f_out = ROOT.TFile( f_out_name, "RECREATE" ) + +# Initialise ttree and define output variables +sampleToTreeDict = { + "electron":"e_sig", + "photon":"g_sig", + "pion":"pu_sig", + "neutrino":"pu_bkg" +} +out_tree = ROOT.TTree( sampleToTreeDict[opt.sampleType.split("_")[0]], sampleToTreeDict[opt.sampleType.split("_")[0]] ) +out_var_names = ['pt','eta','phi','clusters_n','showerlength','coreshowerlength','firstlayer','maxlayer','seetot','seemax','spptot','sppmax','szz','srrtot','srrmax','srrmean','emaxe','bdteg','quality'] +out_var = {} +for var in out_var_names: out_var[var] = array('f',[0.]) +#Create branches in output tree +for var_name, var in out_var.iteritems(): out_tree.Branch( "cl3d_%s"%var_name, var, "cl3d_%s/F"%var_name ) + +print " --> Output configured" + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +# CL3D SELECTION: loop over events and write clusters passing selection to output file +for ev_idx in range(cl3d_tree.GetEntries()): + + if ev_idx == opt.maxEvents: break + if opt.maxEvents == -1: + if ev_idx % 100 == 0: print " --> Processing event: %g/%g"%(ev_idx+1,cl3d_tree.GetEntries()) + else: + if ev_idx % 100 == 0: print " --> Processing event: %g/%g"%(ev_idx+1,opt.maxEvents) + + #Extract event info from both gen and cluster tree + gen_tree.GetEntry( ev_idx ) + cl3d_tree.GetEntry( ev_idx ) + + #Extract number of gen particles + cl3d in event + N_gen = gen_tree.gen_n + N_cl3d = cl3d_tree.cl3d_n + + # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + # GEN-MATCHED CLUSTERS + if opt.sampleType.split("_")[0] in ['electron','photon','pion']: + + #Loop over gen-e/gamma in event + for gen_idx in range( N_gen ): + if abs( gen_tree.gen_pdgid[gen_idx] ) in pdgid: + #define TLorentzVector for gen particle + gen_p4 = ROOT.TLorentzVector() + gen_p4.SetPtEtaPhiE( gen_tree.gen_pt[gen_idx], gen_tree.gen_eta[gen_idx], gen_tree.gen_phi[gen_idx], gen_tree.gen_energy[gen_idx] ) + # require gen e/g/pi pT > 20 GeV + if gen_p4.Pt() < 20.: continue + + # loop overi 3d clusters: save index of max pt cluster if in + cl3d_genMatched_maxpt_idx = -1 + cl3d_genMatched_maxpt = -999 + for cl3d_idx in range( N_cl3d ): + #requre that cluster pt > 10 GeV + if cl3d_tree.cl3d_pt[cl3d_idx] < 10.: continue + #define TLorentxVector for cl3d + cl3d_p4 = ROOT.TLorentzVector() + cl3d_p4.SetPtEtaPhiE( cl3d_tree.cl3d_pt[cl3d_idx], cl3d_tree.cl3d_eta[cl3d_idx], cl3d_tree.cl3d_phi[cl3d_idx], cl3d_tree.cl3d_energy[cl3d_idx] ) + #Require cluster to be dR < 0.2 within gen particle + if cl3d_p4.DeltaR( gen_p4 ) < 0.2: + #If pT of cluster is > present max then set + if cl3d_p4.Pt() > cl3d_genMatched_maxpt: + cl3d_genMatched_maxpt = cl3d_p4.Pt() + cl3d_genMatched_maxpt_idx = cl3d_idx + + # if cl3d idx has been set then add fill cluster to tree + if cl3d_genMatched_maxpt_idx >= 0: + cl3d = Cluster3D( cl3d_tree, cl3d_genMatched_maxpt_idx ) + fill_cl3d( cl3d, out_var ) + + # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + #BACKGROUND CLUSTERS: PU + else: + + #Loop over 3d clusters: if pT > 20 GeV then fill as background + for cl3d_idx in range(0, N_cl3d ): + + if cl3d_tree.cl3d_pt[cl3d_idx] > 20.: + cl3d = Cluster3D( cl3d_tree, cl3d_idx ) + fill_cl3d( cl3d, out_var ) + + # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +# END OF EVENTS LOOP + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +f_out.Write() +f_out.Close() +print "~~~~~~~~~~~~~~~~~~~~ Cl3D Selection (END) ~~~~~~~~~~~~~~~~~~~~" diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/run.sh b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/run.sh new file mode 100755 index 0000000000000..0af13aad63922 --- /dev/null +++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/run.sh @@ -0,0 +1,18 @@ +#!/bin/bash + +# Set up environment +export HGCAL_L1T_BASE=$1 +cd $HGCAL_L1T_BASE + +#Script to run the HGCal L1T cluster definition +cd cl3d_selection +eval `scramv1 runtime -sh` + +#Input to cl3d selection +sample_type=$2 +input_path=$3 +file_number=$4 +clustering_algo=$5 + +# Run selection +python hgcal_l1t_cl3d_selection.py --sampleType $sample_type --inputPath $input_path --fileNumber $file_number --clusteringAlgo $clustering_algo diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/submit_cl3d_selection.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/submit_cl3d_selection.py new file mode 100644 index 0000000000000..f761883ea84b8 --- /dev/null +++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/submit_cl3d_selection.py @@ -0,0 +1,69 @@ +import os, sys +from optparse import OptionParser + +print "~~~~~~~~~~~~~~~~~~~~~~~~ Cl3D Selection Submission ~~~~~~~~~~~~~~~~~~~~~~~~" + +def get_options(): + parser = OptionParser() + parser.add_option('--sampleType', dest='sampleType', default='electron_200PU', help="Sample to process, default signal is electron_200PU, default bkg is neutrino_200PU" ) + parser.add_option("--inputPath", dest="inputPath", default="%s/ntuples"%os.environ['HGCAL_L1T_BASE'], help="Path to directories which hold input ntuples") + parser.add_option('--clusteringAlgo', dest='clusteringAlgo', default='Histomaxvardr', help="Clustering algorithm" ) + parser.add_option('--numberOfFiles', dest='numberOfFiles', default=-1, type="int", help="Number of files to process" ) + parser.add_option('--queue', dest='queue', default='microcentury', help="HTCondor Queue" ) + return parser.parse_args() + +(opt,args) = get_options() + +#Total number of files for each sample: use if want all files to be processes +totalFilesDict = { + "electron_0PU":22, + "electron_200PU":400, + "neutrino_200PU":2599 +} + +#Define path and +path = "%s/cl3d_selection"%os.environ['HGCAL_L1T_BASE'] +f_sub_name = "submit_%s_%s.sub"%(opt.sampleType,opt.clusteringAlgo) +sub_handle = "%s_%s_$(procID)"%(opt.sampleType,opt.clusteringAlgo) + +print " --> Submitting cl3d selection for %s"%opt.sampleType +if opt.numberOfFiles == -1: + print " --> Processing all files: %g"%totalFilesDict[opt.sampleType] + N_process = totalFilesDict[opt.sampleType] +elif opt.numberOfFiles > totalFilesDict[opt.sampleType]: + print " --> [WARNING] only %g files exist. Processing all files"%totalFilesDict[opt.sampleType] + N_process = totalFilesDict[opt.sampleType] +else: + print " --> Processing %g files"%opt.numberOfFiles + N_process = opt.numberOfFiles + +#Create condor submission file +print " --> Creating HTCondor submission file: %s"%f_sub_name +f_sub = open("%s"%f_sub_name,"w+") +f_sub.write("plusone = $(Process) + 1\n") +f_sub.write("procID = $INT(plusone,%d)\n\n") +f_sub.write("executable = %s/run.sh\n"%path) +f_sub.write("arguments = %s %s %s $(procID) %s\n"%(os.environ['HGCAL_L1T_BASE'],opt.sampleType,opt.inputPath,opt.clusteringAlgo)) +f_sub.write("output = %s/jobs/out/%s.out\n"%(path,sub_handle)) +f_sub.write("error = %s/jobs/err/%s.err\n"%(path,sub_handle)) +f_sub.write("log = %s/jobs/log/%s.log\n"%(path,sub_handle)) +f_sub.write("+JobFlavour = \"%s\"\n"%opt.queue) +f_sub.write("queue %s\n"%N_process) +f_sub.close() + +print " plusone = $(Process) + 1" +print " procID = $INT(plusone,%d)" +print " executable = %s/run.sh"%path +print " arguments = %s %s %s $(procID) %s"%(os.environ['HGCAL_L1T_BASE'],opt.sampleType,opt.inputPath,opt.clusteringAlgo) +print " output = %s/jobs/out/%s.out"%(path,sub_handle) +print " error = %s/jobs/err/%s.err"%(path,sub_handle) +print " log = %s/jobs/log/%s.log"%(path,sub_handle) +print " +JobFlavour = \"%s\""%opt.queue +print " queue %s"%N_process + +print " --> Submitting..." +os.system('condor_submit %s'%f_sub_name) +print " --> Deleting submission file" +os.system('rm %s'%f_sub_name) + +print "~~~~~~~~~~~~~~~~~~~~~ Cl3D Selection Submission (END)~~~~~~~~~~~~~~~~~~~~~~" diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/README.md b/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/README.md new file mode 100644 index 0000000000000..72de22e45e18b --- /dev/null +++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/README.md @@ -0,0 +1,36 @@ +# Making the ntuples + +This subfolder contains the CMSSW config script to create the signal and background ntuples for training the e/gamma ID. For standard ntuples you do not need to make changes to `hgcal_l1t_ntupliser_v9_cfg.py`. However if you wish to use additional clustering algorithms, please add to the trigger chains in [L94-96](https://github.com/jonathon-langford/hgcal_l1t_egid/blob/master/ntuples/hgcal_l1t_ntupliser_v9_cfg.py#L94-L96). + +All operations are automated with the `crab_interface.py` script. The script is currently configured for the standard signal (electron 200PU) and background (neutrino 200PU). To run over different samples you will need to add the LFN to `sampleDict`, the total number of files to `totalFilesDict` and a identifier tag to `datasetTagDict`. + +## Submission +CRAB is used to submit jobs to the grid. To run ntupliser over all samples for signal (background just replace electron_200PU with neutrino_200PU): + +``` +python crab_interface.py --mode sub --numberOfSamples -1 --sampleType electron_200PU +``` + +You need to use a storage site that you have write access for, using the option `--storageSite `. + +## Checking the status of submission +``` +python crab_interface.py --mode status --sampleType electron_200PU +``` + +If any jobs have failed then you can resubmit with... +``` +python crab_interface.py --mode resub --sampleType electron_200PU +``` + +## Extracting the ntuples +This mode should only be used when all jobs have finished. If so, then the following command will get the output ntuples and move to a directory. The default is in a directory with the name of the sampleType (e.g. electron_200PU) in the `ntuples` subfolder. To store in a user defined location then use the option `--outputPath `. + +``` +python crab_interface.py --mode extract --numberOfSamples -1 --sampleType electron_200PU +``` + +## Killing a crab submission +``` +python crab_interface.py --mode kill --sampleType electron_200PU +``` diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/crab_interface.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/crab_interface.py new file mode 100644 index 0000000000000..907bde53a23d1 --- /dev/null +++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/crab_interface.py @@ -0,0 +1,220 @@ +# Script to automate the submission, check status, resubmission and extraction of ntuples using crab + +import os, sys +from optparse import OptionParser + +def get_options(): + parser = OptionParser() + parser.add_option('--mode', dest='mode', default='sub', help="Option: [sub,status,resub,extract,kill]") + parser.add_option('--numberOfSamples', dest='numberOfSamples', default=1, type='int', help="Number of samples to process (used for submission and extraction only)") + parser.add_option('--sampleType', dest='sampleType', default='electron_200PU', help="Sample to process, default signal is electron_200PU, default bkg is neutrino_200PU") + parser.add_option('--storageSite', dest='storageSite', default='T2_UK_London_IC', help="User storage site") + parser.add_option('--outputPath', dest='outputPath', default='cwd', help="Output path to hold directory containing ntuples") #allows user to save ntuples in eos + return parser.parse_args() + +(opt,args) = get_options() + +#Mapping of sample type to dataset: this will need to be updated if moving to new geometry (new datasets) +sampleDict = { + "electron_0PU":"/SingleE_FlatPt-2to100/PhaseIIMTDTDRAutumn18DR-NoPU_103X_upgrade2023_realistic_v2-v1/FEVT", + "electron_200PU":"/SingleE_FlatPt-2to100/PhaseIIMTDTDRAutumn18DR-PU200_103X_upgrade2023_realistic_v2-v1/FEVT", + "neutrino_200PU":"/NeutrinoGun_E_10GeV/PhaseIIMTDTDRAutumn18DR-PU200_103X_upgrade2023_realistic_v2-v1/FEVT" +} + +#Total number of files for each sample: use if want all files to be processes +totalFilesDict = { + "electron_0PU":22, + "electron_200PU":400, + "neutrino_200PU":2599 +} + +#Output dataset tag +datasetTagDict = { + "electron_0PU":"SingleElectron_FlatPt-2to100_0PU_hgcal_l1t_v9", + "electron_200PU":"SingleElectron_FlatPt-2to100_0PU_hgcal_l1t_v9", + "neutrino_200PU":"SingleNeutrino_200PU_hgcal_l1t_v9" +} + +# Catch: check using available sample type +if opt.sampleType not in sampleDict: + print " --> [ERROR] sample type %s not supported. Exiting..."%opt.sampleType + sys.exit(1) + +#determine number of files to process +if opt.numberOfSamples == -1: N_process = totalFilesDict[opt.sampleType] +elif opt.numberOfSamples > totalFilesDict[opt.sampleType]: N_process = totalFilesDict[opt.sampleType] +else: N_process = opt.numberOfSamples + + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~# +# SUBMISSION +if opt.mode == "sub": + print "~~~~~~~~~~~~~~~~~~~~~~~~ SUBMISSION ~~~~~~~~~~~~~~~~~~~~~~~~" + print " --> Writing crab submission file" + + #determine number of files to process + if opt.numberOfSamples == -1: print " --> Processing all samples: %g"%N_process + elif opt.numberOfSamples > totalFilesDict[opt.sampleType]: print " --> [WARNING] Trying to process %g samples. Only %g exist, processing all samples"%(opt.numberOfSamples,N_process) + else: print " --> Processing %g sample(s)"%N_process + + #write crab submission script + f_sub_name = "crab_submit_%s_cfg.py"%opt.sampleType + f_sub = open("%s"%f_sub_name, "w+") + f_sub.write("from CRABClient.UserUtilities import config\n") + f_sub.write("config = config()\n\n") + f_sub.write("config.Debug.scheddName = \'crab3@vocms0198.cern.ch\'\n\n") + f_sub.write("config.General.requestName = \'%s\'\n"%opt.sampleType) + f_sub.write("config.General.workArea = \'crab_area\'\n") + f_sub.write("config.General.transferOutputs = True\n") + f_sub.write("config.General.transferLogs = True\n\n") + f_sub.write("config.JobType.pluginName = \'Analysis\'\n") + f_sub.write("config.JobType.psetName = \'hgcal_l1t_ntupliser_v9_cfg.py\'\n") + f_sub.write("config.JobType.maxMemoryMB = 2500\n\n") + f_sub.write("config.Data.inputDataset = \'%s\'\n"%sampleDict[opt.sampleType]) + f_sub.write("config.Data.inputDBS = \'global\'\n") + f_sub.write("config.Data.splitting = \'FileBased\'\n") + f_sub.write("config.Data.unitsPerJob = 1\n") + f_sub.write("NJOBS = %g\n"%N_process) + f_sub.write("config.Data.totalUnits = config.Data.unitsPerJob * NJOBS\n") + f_sub.write("config.Data.outLFNDirBase = \'/store/user/%s/\'\n"%os.environ['USER']) + f_sub.write("config.Data.publication = True\n") + f_sub.write("config.Data.outputDatasetTag = \'%s\'\n\n"%datasetTagDict[opt.sampleType]) + f_sub.write("config.Site.storageSite = \'%s\'"%opt.storageSite) + f_sub.close() + + #Check if submission already exists: if so then ask user if want to delete previous submission + if os.path.isdir("./crab_area/crab_%s"%opt.sampleType): + delete = input(" --> Submission %s already exists. Do you want to delete previous submission [yes=1,no=0]:"%opt.sampleType) + if delete: + print " --> Deleting previous submission" + os.system("rm -Rf crab_area/crab_%s"%opt.sampleType) + else: + print " --> Keeping previous submission. Leaving..." + print "~~~~~~~~~~~~~~~~~~~~~ SUBMISSION (END) ~~~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) + + #Submit file to crab server + print " --> Submitting to crab server..." + os.system("crab submit -c crab_submit_%s_cfg.py"%opt.sampleType) + print "~~~~~~~~~~~~~~~~~~~~~ SUBMISSION (END) ~~~~~~~~~~~~~~~~~~~~~~" + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~# +# STATUS +elif opt.mode == "status": + print "~~~~~~~~~~~~~~~~~~~~~~~~ STATUS ~~~~~~~~~~~~~~~~~~~~~~~~" + + #check if crab submission exists + if os.path.isdir("./crab_area/crab_%s"%opt.sampleType): + print " --> Checking the status of submission: %s"%opt.sampleType + os.system("crab status -d crab_area/crab_%s/"%opt.sampleType) + else: + print " --> [ERROR] No submission for %s. Use the sub option first"%opt.sampleType + print "~~~~~~~~~~~~~~~~~~~~~ STATUS (END) ~~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) + print "~~~~~~~~~~~~~~~~~~~~~ STATUS (END) ~~~~~~~~~~~~~~~~~~~~~" + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~# +# RESUBMIT +elif opt.mode == "resub": + print "~~~~~~~~~~~~~~~~~~~~~~~~ RESUBMISSION ~~~~~~~~~~~~~~~~~~~~~~~~" + + #check if crab submission exists + if os.path.isdir("./crab_area/crab_%s"%opt.sampleType): + print " --> Re-submitting failed jobs for submission: %s"%opt.sampleType + os.system("crab resubmit -d crab_area/crab_%s/"%opt.sampleType) + else: + print " --> [ERROR] No submission for %s. Use the sub option first"%opt.sampleType + print "~~~~~~~~~~~~~~~~~~~~~ RESUBMISSION (END) ~~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) + print "~~~~~~~~~~~~~~~~~~~~~ RESUBMISSION (END) ~~~~~~~~~~~~~~~~~~~~~" + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~# +# KILL +elif opt.mode == "kill": + print "~~~~~~~~~~~~~~~~~~~~~~~~ KILL ~~~~~~~~~~~~~~~~~~~~~~~~" + + #check if crab submission exists + if os.path.isdir("./crab_area/crab_%s"%opt.sampleType): + kill = input(" --> Are you sure you want to kill all jobs for %s [yes=1,no=0]:"%opt.sampleType) + if kill: + print " --> Killing all jobs for submission: %s"%opt.sampleType + os.system("crab kill -d crab_area/crab_%s/"%opt.sampleType) + else: + print " --> Leaving..." + print "~~~~~~~~~~~~~~~~~~~~~ KILL (END) ~~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) + else: + print " --> [ERROR] No submission for %s. Use the sub option first"%opt.sampleType + print "~~~~~~~~~~~~~~~~~~~~~ KILL (END) ~~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) + print "~~~~~~~~~~~~~~~~~~~~~ KILL (END) ~~~~~~~~~~~~~~~~~~~~~" + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~# +# EXTRACTION OF NTUPLES +elif opt.mode == "extract": + print "~~~~~~~~~~~~~~~~~~~~~~~~ EXTRACTION ~~~~~~~~~~~~~~~~~~~~~~~~" + + #check if crab submission exists + if os.path.isdir("./crab_area/crab_%s"%opt.sampleType): + extract = input(" --> Only use this mode when all crab jobs have finished. Have they finished [yes=1,no=0]:") + if extract: + print " --> Extracting ntuples for submission: %s"%opt.sampleType + #Crab only allows user to extract 500 ntuples at a time: therefore if N_process > 500 do multiple times + if N_process/500 > 0: + # FIXME: this should be parallelized as takes a long time for a lot of samples! + for jobblock in range(0,N_process/500+1): + jobids = "" + #For final block: + if jobblock == N_process/500: + for jobid in range(1,N_process%500+1): jobids += "%g,"%(500*jobblock+jobid) + else: + for jobid in range(1,501): jobids += "%g,"%(500*jobblock+jobid) + jobids = jobids[:-1] #remove last comma + os.system("crab getoutput -d crab_area/crab_%s/ --jobids %s\n"%(opt.sampleType,jobids)) + else: + os.system("crab getoutput -d crab_area/crab_%s/"%opt.sampleType) + + #If no ntuples to receive then leave + if not os.path.exists("crab_area/crab_%s/results/ntuple_1.root"%opt.sampleType): + print " --> No ntuples in folder. Leaving..." + print "~~~~~~~~~~~~~~~~~~~~~ EXTRACTION (END) ~~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) + + #Check if path to place output directory exists + if opt.outputPath == "cwd": outputPath = os.environ['PWD'] + else: + #Check if path exists + if os.path.isdir( opt.outputPath ): outputPath = opt.outputPath + else: + print " --> [ERROR] path %s does not exist. Ntuples will remain in crab_area/crab_%s/results"%(opt.outputPath,opt.sampleType) + print "~~~~~~~~~~~~~~~~~~~~~ EXTRACTION (END) ~~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) + + # Check if directory already exists + if os.path.isdir("%s/%s"%(outputPath,opt.sampleType)): + move = input("Output directory already exists. Do you want to move ntuples anyway [yes=1,no=0]:") + if move: + print " --> Moving ntuples to %s/%s"%(outputPath,opt.sampleType) + os.system("mv crab_area/crab_%s/results/ntuple*.root %s/%s"%(opt.sampleType,outputPath,opt.sampleType)) + else: + print "Ntuples will remain in crab_area/crab_%s/results"%opt.sampleType + # if not then make directory and move ntuples there + else: + os.system("mkdir %s/%s"%(outputPath,opt.sampleType)) + print " --> Moving ntuples to %s/%s"%(outputPath,opt.sampleType) + os.system("mv crab_area/crab_%s/results/ntuple*.root %s/%s"%(opt.sampleType,outputPath,opt.sampleType)) + + else: + print " --> Leaving" + print "~~~~~~~~~~~~~~~~~~~~~ EXTRACTION (END) ~~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) + else: + print " --> [ERROR] No submission for %s. Use the sub option first"%opt.sampleType + print "~~~~~~~~~~~~~~~~~~~~~ EXTRACTION (END) ~~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) + print "~~~~~~~~~~~~~~~~~~~~~ EXTRACTION (END) ~~~~~~~~~~~~~~~~~~~~~" + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~# +else: + print " --> [ERROR] mode %s is not supported. Please use [sub,status,resub,kill,extract]" + sys.exit(1) diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/hgcal_l1t_ntupliser_v9_cfg.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/hgcal_l1t_ntupliser_v9_cfg.py new file mode 100644 index 0000000000000..eac7b84f9067b --- /dev/null +++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/hgcal_l1t_ntupliser_v9_cfg.py @@ -0,0 +1,136 @@ +import FWCore.ParameterSet.Config as cms +from Configuration.StandardSequences.Eras import eras +from Configuration.ProcessModifiers.convertHGCalDigisSim_cff import convertHGCalDigisSim + +# For old samples use the digi converter +process = cms.Process('DIGI',eras.Phase2C4) + +# import of standard configurations +process.load('Configuration.StandardSequences.Services_cff') +process.load('SimGeneral.HepPDTESSource.pythiapdt_cfi') +process.load('FWCore.MessageService.MessageLogger_cfi') +process.load('Configuration.EventContent.EventContent_cff') +process.load('SimGeneral.MixingModule.mixNoPU_cfi') +process.load('Configuration.Geometry.GeometryExtended2023D35Reco_cff') +process.load('Configuration.Geometry.GeometryExtended2023D35_cff') +process.load('Configuration.StandardSequences.MagneticField_cff') +process.load('Configuration.StandardSequences.Generator_cff') +process.load('IOMC.EventVertexGenerators.VtxSmearedHLLHC14TeV_cfi') +process.load('GeneratorInterface.Core.genFilterSummary_cff') +process.load('Configuration.StandardSequences.SimIdeal_cff') +process.load('Configuration.StandardSequences.Digi_cff') +process.load('Configuration.StandardSequences.SimL1Emulator_cff') +process.load('Configuration.StandardSequences.DigiToRaw_cff') +process.load('Configuration.StandardSequences.EndOfProcess_cff') +process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff') + +process.maxEvents = cms.untracked.PSet( + input = cms.untracked.int32(10) +) + +# Input source +process.source = cms.Source("PoolSource", + fileNames = cms.untracked.vstring('/store/mc/PhaseIIMTDTDRAutumn18DR/NeutrinoGun_E_10GeV/FEVT/PU200_103X_upgrade2023_realistic_v2-v1/280000/13B217C2-4935-CE45-BBF7-EC843AFA3D8D.root'), + inputCommands=cms.untracked.vstring( + 'keep *', + 'drop l1tEMTFHit2016Extras_simEmtfDigis_CSC_HLT', + 'drop l1tEMTFHit2016Extras_simEmtfDigis_RPC_HLT', + 'drop l1tEMTFHit2016s_simEmtfDigis__HLT', + 'drop l1tEMTFTrack2016Extras_simEmtfDigis__HLT', + 'drop l1tEMTFTrack2016s_simEmtfDigis__HLT', + ) + ) + +process.options = cms.untracked.PSet( + +) + +# Production Info +process.configurationMetadata = cms.untracked.PSet( + version = cms.untracked.string('$Revision: 1.20 $'), + annotation = cms.untracked.string('SingleElectronPt10_cfi nevts:10'), + name = cms.untracked.string('Applications') +) + +# Output definition +process.TFileService = cms.Service( + "TFileService", + fileName = cms.string("ntuple.root") + ) + +# Other statements +from Configuration.AlCa.GlobalTag import GlobalTag +process.GlobalTag = GlobalTag(process.GlobalTag, 'auto:phase2_realistic', '') + +# load HGCAL TPG simulation +process.load('L1Trigger.L1THGCal.hgcalTriggerPrimitives_cff') +process.load('L1Trigger.L1THGCalUtilities.hgcalTriggerNtuples_cff') +from L1Trigger.L1THGCalUtilities.hgcalTriggerChains import HGCalTriggerChains +import L1Trigger.L1THGCalUtilities.vfe as vfe +import L1Trigger.L1THGCalUtilities.concentrator as concentrator +import L1Trigger.L1THGCalUtilities.clustering2d as clustering2d +import L1Trigger.L1THGCalUtilities.clustering3d as clustering3d +import L1Trigger.L1THGCalUtilities.customNtuples as ntuple + + +chains = HGCalTriggerChains() +# Register algorithms +## VFE +chains.register_vfe("Floatingpoint8", lambda p : vfe.create_compression(p, 4, 4, True)) +## ECON +chains.register_concentrator("Threshold", concentrator.create_threshold) +chains.register_concentrator("Supertriggercell", concentrator.create_supertriggercell) +## BE1 +chains.register_backend1("Dummy", clustering2d.create_dummy) +## BE2 +chains.register_backend2("Histomaxvardr", lambda p,i : clustering3d.create_histoMax_variableDr(p,i)) +# Register ntuples +# Store gen info only in the reference ntuple +ntuple_list_ref = ['event', 'gen', 'multiclusters'] +ntuple_list = ['event', 'multiclusters'] +chains.register_ntuple("Genclustersntuple", lambda p,i : ntuple.create_ntuple(p,i, ntuple_list_ref)) +chains.register_ntuple("Clustersntuple", lambda p,i : ntuple.create_ntuple(p,i, ntuple_list)) + +# Register trigger chains +concentrator_algos = ['Threshold','Supertriggercell'] +backend_algos = ['Histomaxvardr'] +## Make cross product fo ECON and BE algos +import itertools +ch_idx = 0 #add gen info to first chain +for cc,be in itertools.product(concentrator_algos,backend_algos): + if ch_idx == 0: chains.register_chain('Floatingpoint8', cc, 'Dummy', be, ntuple='Genclustersntuple') + else: chains.register_chain('Floatingpoint8', cc, 'Dummy', be, ntuple='Clustersntuple') + ch_idx += 1 + +#Add chains to process +process = chains.create_sequences(process) + +# Set MinPt threshold in each chain +for ch in chains.chain: + #Concatenate elements of chain up to backend algo + ch_str = "" + for element in ch[:4]: ch_str += element + #Treat Ref3d and Histomaxvardr separaterly + if ch[3] == "Histomaxvardr": + pset = getattr(process,ch_str).ProcessorParameters.C3d_parameters.histoMax_C3d_clustering_parameters + else: + print "[ERROR] Backend alg %s not supported. Exiting..."%ch[3] + sys.exit(1) + #Set minPt threshold + pset.minPt_multicluster = cms.double(10.) + +# Remove towers from sequence +process.hgcalTriggerPrimitives.remove(process.hgcalTowerMap) +process.hgcalTriggerPrimitives.remove(process.hgcalTower) + +process.hgcl1tpg_step = cms.Path(process.hgcalTriggerPrimitives) +process.ntuple_step = cms.Path(process.hgcalTriggerNtuples) + +# Schedule definition +process.schedule = cms.Schedule(process.hgcl1tpg_step,process.ntuple_step) + +# Add early deletion of temporary data products to reduce peak memory need +from Configuration.StandardSequences.earlyDeleteSettings_cff import customiseEarlyDelete +process = customiseEarlyDelete(process) +# End adding early deletion + diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/README.md b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/README.md new file mode 100644 index 0000000000000..21331553ad62b --- /dev/null +++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/README.md @@ -0,0 +1,29 @@ +# Plotting scripts + +## Efficiency plots +Plot efficiency of the e/gamma ID as a function of generator-level pT and eta. The efficiency is defined as the number of generator level electrons with a matching 3D cluster passing the e/gamma ID working point, divided by the number of generator level electrons. The gen electrons are required to have pT > 30 GeV, and the 3D clusters with pT > 20 GeV. Note, this script only accommodates the e/gamma ID in the TPG software (not any newly-trained BDTs). + +The script takes as input the original ntuples generated in the `ntuples` subfolder: + +``` +python make_eff_plot.py --signalType electron_200PU +``` + +You can set the minimum efficiency to plot using the `--minimumEff` option (default is 0.75). The output plots will be saved in the `plots` directory. + +## 3D cluster variable plots +Plot 3d cluster variable distributions from the flat ntuples either in the `cl3d_selection` subfolder, or to plot one of the new BDT scores, in the `training/results` directory. For example, to plot the cluster p_T distribution for both signal (electron 200PU) and background (neutrino 200PU): + +``` +python make_cl3dVar_plot.py --inputMap electron_200PU,Histomaxvardr,selection,2,23,Hist:neutrino_200PU,Histomaxvardr,selection,1,23,Hist --variable pt --legendMap "electron 200PU,2,23,L+pile-up,1,23,L" +``` + +The `--inputMap` has the following format: (sampleType),(cl3d algo.),(location of ntuples: selection/evaluation),(ROOT colour),(ROOT marker),(ROOT plotting style). + +The `--legendMap` option details the entries in the legend. Separate each entry with a "+" symbol. The format is: (text),(ROOT colour),(ROOT marker),(Entry style e.g. L=line) + +To suppress output to screen use the option: `--batch 1`. + +### Examples +Example cluster p_T distribution and efficiency vs gen eta plots are stored in the `plots` directory. + diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/make_cl3dVar_plot.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/make_cl3dVar_plot.py new file mode 100644 index 0000000000000..e246e55864cdf --- /dev/null +++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/make_cl3dVar_plot.py @@ -0,0 +1,213 @@ +# Script to plot cl3d variable (normalised) +# > Input: selected cl3d ntuple in cl3d_selection/ or cl3d ntuple in training/results/ +# > Output: plot of vl3d variable + +import ROOT +import sys +import os +from array import array +from optparse import OptionParser + +print "~~~~~~~~~~~~~~~~~~~~~~~~ CL3D VAR PLOTTER ~~~~~~~~~~~~~~~~~~~~~~~~" +def leave(): + print "~~~~~~~~~~~~~~~~~~~~~ CL3D VAR PLOTTER (END) ~~~~~~~~~~~~~~~~~~~~~" + sys.exit(1) + +# Get options from option parser +def get_options(): + parser = OptionParser() + parser = OptionParser( usage="usage: python make_cl3dVar_plot.py " ) + parser.add_option("--inputMap", dest="inputMap", default="electron_200PU,Histomaxvardr,selection,2,23,Hist", help="List of inputs to plot, of the form: sample type,clustering algo.,ntuple stage [selection,evaluation],colour,marker style,plotting option:..." ) + parser.add_option("--variable", dest="variable", default="pt", help="Variable to plot") + parser.add_option("--legendMap", dest="legendMap", default='', help="List of elements of legend. Separate each element with '+'. Format: ,,,