diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/README.md b/L1Trigger/L1CaloTrigger/test/egid_hgcal/README.md
new file mode 100644
index 0000000000000..bc48bd2a7ac85
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/README.md
@@ -0,0 +1,25 @@
+# HGCal L1-Trigger e/gamma ID
+This repository contains a series of scripts used to train the e/gamma ID BDT for the HGCal L1T. The workflow is split into subfolders (see below), which must be run in the correct order. The output is two .xml files, which hold the trained BDTs for the low-eta (1.5 < |eta| < 2.7) and high-eta (2.7 < |eta| < 3.0) regions. These .xml files can then be placed directly in the HGCAL L1T TPG software.
+
+
+## Installation
+Follow the user installation instructions for the [HGCAL L1T TPG software](https://twiki.cern.ch/twiki/bin/viewauth/CMS/HGCALTriggerPrimitivesSimulation#Installation_for_users).
+
+The next step is to clone this repository:
+
+```
+cd L1Trigger
+git clone git@github.com:jonathon-langford/hgcal_l1t_egid
+```
+In each new terminal, set up the environment with `source setup.sh`.
+
+## Contents
+Each subfolder contains instructions in the form of a `README.md` which details how to run the scripts:
+
+* `ntuples`: contains the CMSSW config to create ntuples from the HGCAL L1T TPG software. The ntuple production uses CRAB and thus requires a grid certificate. After generating the signal ntuples, you should create the efficiency plots using `plotting/make_eff_plot.py` to check the drop in efficiency for the latest clustering algorithm. This will indicate how strongly a new e/gamma ID BDT is needed.
+* `cl3d_selection`: takes as input the ntuples produced in the `ntuples` subfolder and outputs a flat ntuple of 3d clusters passing the selection criteria for training the e/gamma ID.
+* `training`: scripts for training the e/gamma ID BDT, converting it into .xml format, evaluating the newly trained BDTs and summarising the performance (working points, ROC curve).
+* `plotting`: scripts to plot the efficiency curves and the 3D cluster variables.
+
+For the full workflow, run the subfolders in this order: `ntuples` -> `cl3d_selection` -> `training`.
+
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/README.md b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/README.md
new file mode 100644
index 0000000000000..a161871a3af0b
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/README.md
@@ -0,0 +1,29 @@
+# 3D Cluster selection
+The next step is to run the cluster selection on the ntuples. To run over all events in a single file:
+
+```
+python hgcal_l1t_cl3d_selection.py --sampleType electron_200PU --fileNumber <number of file to process>
+```
+
+If the ntuples are not stored in the default location, specify the path to the files using the `--inputPath` option. If you are using a different clustering algorithm, add its full TDirectory name to `clusteringAlgoDirDict` in the script.
+
+The script matches 3D clusters to gen particles: for electrons, for example, it requires a gen particle with |pdgId| = 11 and pT > 20 GeV, and keeps the highest-pT cluster within dR < 0.2 of that gen particle. There is no gen matching for background (neutrino). Clusters are additionally required to have pT > 10 GeV for signal and pT > 20 GeV for background.
+
+The output of this script is a flat ntuple with clusters corresponding to the input sampleType i.e. for signal you obtain an ntuple of gen-matched electron clusters, and for background you obtain an ntuple of pile-up initiated clusters. 
+
+## Running in parallel
+There is an additional script, `submit_cl3d_selection.py`, which runs the cluster selection over each file in parallel. The number of files to run over can be set with the `--numberOfFiles` option; the default is to run over all files (as specified in `totalFilesDict`). The jobs are submitted to the HTCondor batch system.
+
+```
+python submit_cl3d_selection.py --sampleType electron_200PU
+```
+
+These jobs should not take long. You can specify the queue using the `--queue` option.
+
+## Adding the cluster ntuples: test, train and all
+The final step is to combine the output flat ntuples into test (10%), train (90%) and all (100%) samples. This is automated by the `add_files.py` script, which counts the files in the relevant directory and combines them accordingly.
+
+```
+python add_files.py --sampleType electron_200PU --deleteIndividualFiles 1
+```
+The `--deleteIndividualFiles` option deletes all the individual flat ntuples that have been used to create the test, train and all samples. Use it to avoid taking up too much disk space.
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/add_files.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/add_files.py
new file mode 100644
index 0000000000000..e624b425eac59
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/add_files.py
@@ -0,0 +1,57 @@
+# Python script for hadd-ing files for input sample type
+# Combines files to have: full (100%), train (90%) and test (10%)
+
+import ROOT
+import sys
+import os
+from optparse import OptionParser
+
+print "~~~~~~~~~~~~~~~~~~~~~~~~ ADD FILES ~~~~~~~~~~~~~~~~~~~~~~~~"
+
+def get_options():
+  # Build the option parser with a usage string
+  parser = OptionParser( usage="python add_files.py --sampleType=<sampleType>" )
+  parser.add_option("--sampleType", dest='sampleType', default='electron_200PU', help="Sample to process, default signal is electron_200PU, default bkg is neutrino_200PU")
+  parser.add_option("--clusteringAlgo", dest="clusteringAlgo", default="Histomaxvardr", help="Clustering algorithm used in ntuple production")
+  parser.add_option("--deleteIndividualFiles", dest="deleteIndividualFiles", default=0, type="int", help="Delete individual files after hadding [1=yes,0=no (default)]")
+  return parser.parse_args()
+
+(opt,args) = get_options()
+
+#Check if directory exists:
+if not os.path.isdir("./%s"%opt.sampleType):
+  print " --> [ERROR] Directory %s does not exist. Try running the cl3d selection first"%opt.sampleType
+  print "~~~~~~~~~~~~~~~~~~~~~ ADD FILES (END) ~~~~~~~~~~~~~~~~~~~~~"
+  sys.exit(1)
+
+#Get number of files in folder and separate into test and train sample sizes
+N_files = len(next(os.walk('./%s'%opt.sampleType))[2])
+if N_files == 0:
+  print " --> [ERROR] %s is empty. No files to add"%opt.sampleType
+  print "~~~~~~~~~~~~~~~~~~~~~ ADD FILES (END) ~~~~~~~~~~~~~~~~~~~~~"
+  sys.exit(1)
+N_test = int(0.1*N_files)
+N_train = int(N_files-N_test)
+
+print " --> Making all (%g), test (%g) and train (%g) samples"%(N_files,N_test,N_train)
+
+#first add all files for full
+os.system('mkdir %s/all'%opt.sampleType)
+os.system('hadd %s/all/%s_%s_all.root %s/%s_%s_*.root'%(opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.sampleType,opt.sampleType,opt.clusteringAlgo))
+
+#make test and train samples
+os.system('mkdir %s/test %s/train'%(opt.sampleType,opt.sampleType))
+os.system('for fileNum in {1..%g}; do mv %s/%s_%s_${fileNum}.root %s/train; done'%(N_train,opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.sampleType))
+os.system('mv %s/%s_%s_*.root %s/test'%(opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.sampleType))
+os.system('hadd %s/%s_%s_train.root %s/train/*.root'%(opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.sampleType))  
+os.system('hadd %s/%s_%s_test.root %s/test/*.root'%(opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.sampleType))
+os.system('mv %s/all/%s_%s_all.root %s/'%(opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.sampleType))
+os.system('rm -Rf %s/all'%opt.sampleType)
+print " --> Successfully made files"
+
+if opt.deleteIndividualFiles:
+  print " --> Deleting individual files..."
+  os.system('rm -Rf %s/train'%opt.sampleType)
+  os.system('rm -Rf %s/test'%opt.sampleType)
+
+print "~~~~~~~~~~~~~~~~~~~~~ ADD FILES (END) ~~~~~~~~~~~~~~~~~~~~~"
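A note on the split performed by `add_files.py`: the train/test sizes come from truncating 10% of the file count, and files numbered 1 to `N_train` go to train while the remainder go to test. The sketch below reproduces that arithmetic in plain Python so it can be checked without ROOT or `hadd`. The function names `split_counts` and `split_file_numbers` are hypothetical, and the sketch assumes the flat ntuples are numbered contiguously from 1, which the script itself does not verify.

```python
def split_counts(n_files, test_fraction=0.1):
    """Return (n_train, n_test) using the same truncation as add_files.py."""
    n_test = int(test_fraction * n_files)  # truncates towards zero
    n_train = n_files - n_test
    return n_train, n_test


def split_file_numbers(n_files, test_fraction=0.1):
    """Assign file numbers 1..n_train to train and the rest to test.

    Assumes contiguous numbering from 1 (an assumption of this sketch).
    """
    n_train, _ = split_counts(n_files, test_fraction)
    numbers = list(range(1, n_files + 1))
    return numbers[:n_train], numbers[n_train:]


if __name__ == "__main__":
    for n in (5, 22, 2599):
        print(n, split_counts(n))
```

With fewer than ten input files the test sample is empty under this arithmetic, so the later `hadd` over `test/*.root` would likely have nothing to combine; it may be worth guarding against that case.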
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/hgcal_l1t_cl3d_selection.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/hgcal_l1t_cl3d_selection.py
new file mode 100644
index 0000000000000..14fc6c69c8c66
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/hgcal_l1t_cl3d_selection.py
@@ -0,0 +1,216 @@
+# Python script for cluster selection: gen-matching + selection cuts
+
+import ROOT
+import sys
+import os
+import math
+from array import array
+from optparse import OptionParser
+
+print "~~~~~~~~~~~~~~~~~~~~~~~~ Cl3D Selection ~~~~~~~~~~~~~~~~~~~~~~~~"
+
+def get_options():
+  # Build the option parser with a usage string
+  parser = OptionParser( usage="usage: python hgcal_l1t_cl3d_selection.py <options>" )
+  parser.add_option("--sampleType", dest='sampleType', default='electron_200PU', help="Sample to process, default signal is electron_200PU, default bkg is neutrino_200PU")
+  parser.add_option("--inputPath", dest="inputPath", default="%s/ntuples"%os.environ['HGCAL_L1T_BASE'], help="Path to directories which hold input ntuples")
+  parser.add_option("--fileNumber", dest="fileNumber", default=1, type="int", help="Input ntuple number")
+  parser.add_option("--maxEvents", dest="maxEvents", default=-1, type="int", help="Maximum number of events to process")
+  parser.add_option("--clusteringAlgo", dest="clusteringAlgo", default="Histomaxvardr", help="Clustering algorithm used in ntuple production")
+  # To process a different clustering algorithm, the CMSSW config in the ntuples/ directory must also be changed
+  return parser.parse_args()
+
+(opt,args) = get_options()
+
+# Mapping to extract TDirectory for different clustering algo
+# Add full chain name if using a different alg
+clusteringAlgoDirDict = {"gen":"Floatingpoint8ThresholdDummyHistomaxvardrGenclustersntuple","Histomaxvardr":"Floatingpoint8ThresholdDummyHistomaxvardrGenclustersntuple","Histomaxvardr_stc":"Floatingpoint8SupertriggercellDummyHistomaxvardrClustersntuple"}
+
+# Mapping: sample type to pdgid used for gen matching
+pdgIdDict = {
+  "electron":[11],
+  "photon":[22],
+  "pion":[211],
+  "neutrino":[]
+}
+pdgid = pdgIdDict[ opt.sampleType.split("_")[0] ]
+
+#Check: clustering exists in dir
+clusteringAlgo = opt.clusteringAlgo
+if clusteringAlgo not in clusteringAlgoDirDict:
+  print " --> [ERROR] not configured for %s clustering. Leaving..."%clusteringAlgo  
+  print "~~~~~~~~~~~~~~~~~~~~ Cl3D Selection (END) ~~~~~~~~~~~~~~~~~~~~"
+  sys.exit(1)
+
+# Check: if ntuple exists
+if os.path.exists("%s/%s/ntuple_%g.root"%(opt.inputPath,opt.sampleType,opt.fileNumber)):
+  print " --> Processing %s/%s/ntuple_%g.root"%(opt.inputPath,opt.sampleType,opt.fileNumber)
+  f_in_name = "%s/%s/ntuple_%g.root"%(opt.inputPath,opt.sampleType,opt.fileNumber)
+  print " --> Clustering algorithm: %s"%clusteringAlgo
+  print " --> Events to be processed: %g"%opt.maxEvents
+else:
+  print " --> [ERROR] %s/%s/ntuple_%g.root does not exist. Leaving..."%(opt.inputPath,opt.sampleType,opt.fileNumber)
+  print "~~~~~~~~~~~~~~~~~~~~ Cl3D Selection (END) ~~~~~~~~~~~~~~~~~~~~"
+  sys.exit(1)
+
+# Open trees to read from
+f_in = ROOT.TFile.Open( f_in_name )
+gen_tree = f_in.Get("%s/HGCalTriggerNtuple"%clusteringAlgoDirDict["gen"])
+cl3d_tree = f_in.Get("%s/HGCalTriggerNtuple"%clusteringAlgoDirDict[clusteringAlgo])
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# CLASS DEFINITIONS
+
+# class for 3d cluster variables: only instantiate if cl3d passes selection
+class Cluster3D:
+
+  #Constructor method: takes event and cluster number as input
+  def __init__(self, _event, _ncl3d):
+    #initialise TLorentzVector
+    _p4 = ROOT.TLorentzVector()
+    _p4.SetPtEtaPhiE( _event.cl3d_pt[_ncl3d], _event.cl3d_eta[_ncl3d], _event.cl3d_phi[_ncl3d], _event.cl3d_energy[_ncl3d] )
+    self.P4 = _p4
+    self.clusters_n       = _event.cl3d_clusters_n[_ncl3d]
+    self.showerlength     = _event.cl3d_showerlength[_ncl3d]
+    self.coreshowerlength = _event.cl3d_coreshowerlength[_ncl3d]
+    self.firstlayer       = _event.cl3d_firstlayer[_ncl3d]
+    self.maxlayer         = _event.cl3d_maxlayer[_ncl3d]
+    self.seetot           = _event.cl3d_seetot[_ncl3d]
+    self.seemax           = _event.cl3d_seemax[_ncl3d]
+    self.spptot           = _event.cl3d_spptot[_ncl3d]
+    self.sppmax           = _event.cl3d_sppmax[_ncl3d]
+    self.szz              = _event.cl3d_szz[_ncl3d]
+    self.srrtot           = _event.cl3d_srrtot[_ncl3d]
+    self.srrmax           = _event.cl3d_srrmax[_ncl3d]
+    self.srrmean          = _event.cl3d_srrmean[_ncl3d]
+    self.emaxe            = _event.cl3d_emaxe[_ncl3d]
+    self.bdteg            = _event.cl3d_bdteg[_ncl3d]
+    self.quality          = _event.cl3d_quality[_ncl3d]
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# FUNCTION DEFINITIONS
+
+#function to fill tree
+def fill_cl3d( _cl3d, _out_var ):
+  _out_var['pt'][0] = _cl3d.P4.Pt()
+  _out_var['eta'][0] = _cl3d.P4.Eta()
+  _out_var['phi'][0] = _cl3d.P4.Phi()
+  _out_var['clusters_n'][0] = _cl3d.clusters_n
+  _out_var['showerlength'][0] = _cl3d.showerlength
+  _out_var['coreshowerlength'][0] = _cl3d.coreshowerlength
+  _out_var['firstlayer'][0] = _cl3d.firstlayer
+  _out_var['maxlayer'][0] = _cl3d.maxlayer
+  _out_var['seetot'][0] = _cl3d.seetot
+  _out_var['seemax'][0] = _cl3d.seemax
+  _out_var['spptot'][0] = _cl3d.spptot
+  _out_var['sppmax'][0] = _cl3d.sppmax
+  _out_var['szz'][0] = _cl3d.szz
+  _out_var['srrtot'][0] = _cl3d.srrtot
+  _out_var['srrmax'][0] = _cl3d.srrmax
+  _out_var['srrmean'][0] = _cl3d.srrmean
+  _out_var['emaxe'][0] = _cl3d.emaxe
+  _out_var['bdteg'][0] = _cl3d.bdteg
+  _out_var['quality'][0] = _cl3d.quality
+  out_tree.Fill()
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# CONFIGURE OUTPUT
+print " --> Configuring output ntuple"
+
+#Check if output dir exists
+if not os.path.isdir( "%s/cl3d_selection/%s"%(os.environ['HGCAL_L1T_BASE'],opt.sampleType) ):
+  print " --> Making directory: %s/cl3d_selection/%s"%(os.environ['HGCAL_L1T_BASE'],opt.sampleType)
+  os.system("mkdir %s/cl3d_selection/%s"%(os.environ['HGCAL_L1T_BASE'],opt.sampleType) )
+  
+# Create output file name
+f_out_name = "%s/cl3d_selection/%s/%s_%s_%g.root"%(os.environ['HGCAL_L1T_BASE'],opt.sampleType,opt.sampleType,clusteringAlgo,opt.fileNumber)
+print " --> Creating output file: %s"%f_out_name
+f_out = ROOT.TFile( f_out_name, "RECREATE" )
+
+# Initialise ttree and define output variables
+sampleToTreeDict = {
+  "electron":"e_sig",
+  "photon":"g_sig",
+  "pion":"pu_sig",
+  "neutrino":"pu_bkg"
+}
+out_tree = ROOT.TTree( sampleToTreeDict[opt.sampleType.split("_")[0]], sampleToTreeDict[opt.sampleType.split("_")[0]] )
+out_var_names = ['pt','eta','phi','clusters_n','showerlength','coreshowerlength','firstlayer','maxlayer','seetot','seemax','spptot','sppmax','szz','srrtot','srrmax','srrmean','emaxe','bdteg','quality']
+out_var = {}
+for var in out_var_names: out_var[var] = array('f',[0.])
+#Create branches in output tree
+for var_name, var in out_var.iteritems(): out_tree.Branch( "cl3d_%s"%var_name, var, "cl3d_%s/F"%var_name )
+
+print " --> Output configured"
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# CL3D SELECTION: loop over events and write clusters passing selection to output file
+for ev_idx in range(cl3d_tree.GetEntries()):
+
+  if ev_idx == opt.maxEvents: break
+  if opt.maxEvents == -1:
+    if ev_idx % 100 == 0: print "    --> Processing event: %g/%g"%(ev_idx+1,cl3d_tree.GetEntries())
+  else:
+    if ev_idx % 100 == 0: print "    --> Processing event: %g/%g"%(ev_idx+1,opt.maxEvents)
+
+  #Extract event info from both gen and cluster tree
+  gen_tree.GetEntry( ev_idx )
+  cl3d_tree.GetEntry( ev_idx )
+
+  #Extract number of gen particles + cl3d in event
+  N_gen = gen_tree.gen_n
+  N_cl3d = cl3d_tree.cl3d_n
+
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # GEN-MATCHED CLUSTERS
+  if opt.sampleType.split("_")[0] in ['electron','photon','pion']:
+
+    #Loop over gen-e/gamma in event
+    for gen_idx in range( N_gen ):
+      if abs( gen_tree.gen_pdgid[gen_idx] ) in pdgid:
+        #define TLorentzVector for gen particle
+        gen_p4 = ROOT.TLorentzVector()
+        gen_p4.SetPtEtaPhiE( gen_tree.gen_pt[gen_idx], gen_tree.gen_eta[gen_idx], gen_tree.gen_phi[gen_idx], gen_tree.gen_energy[gen_idx] )
+        # require gen e/g/pi pT > 20 GeV
+        if gen_p4.Pt() < 20.: continue
+
+        # loop over 3d clusters: save index of max pt cluster matched to the gen particle
+        cl3d_genMatched_maxpt_idx = -1
+        cl3d_genMatched_maxpt = -999
+        for cl3d_idx in range( N_cl3d ):
+          #require that cluster pt > 10 GeV
+          if cl3d_tree.cl3d_pt[cl3d_idx] < 10.: continue
+          #define TLorentzVector for cl3d
+          cl3d_p4 = ROOT.TLorentzVector()
+          cl3d_p4.SetPtEtaPhiE( cl3d_tree.cl3d_pt[cl3d_idx], cl3d_tree.cl3d_eta[cl3d_idx], cl3d_tree.cl3d_phi[cl3d_idx], cl3d_tree.cl3d_energy[cl3d_idx] )
+          #Require cluster to be within dR < 0.2 of gen particle
+          if cl3d_p4.DeltaR( gen_p4 ) < 0.2:
+            #If pT of cluster is > present max then update the best match
+            if cl3d_p4.Pt() > cl3d_genMatched_maxpt:
+              cl3d_genMatched_maxpt = cl3d_p4.Pt()
+              cl3d_genMatched_maxpt_idx = cl3d_idx
+
+        # if cl3d idx has been set then fill cluster into tree
+        if cl3d_genMatched_maxpt_idx >= 0:
+          cl3d = Cluster3D( cl3d_tree, cl3d_genMatched_maxpt_idx )
+          fill_cl3d( cl3d, out_var )
+
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  #BACKGROUND CLUSTERS: PU
+  else:
+
+    #Loop over 3d clusters: if pT > 20 GeV then fill as background
+    for cl3d_idx in range(0, N_cl3d ):
+
+      if cl3d_tree.cl3d_pt[cl3d_idx] > 20.:
+        cl3d = Cluster3D( cl3d_tree, cl3d_idx )
+        fill_cl3d( cl3d, out_var )
+
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+# END OF EVENTS LOOP
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+f_out.Write()
+f_out.Close()
+print "~~~~~~~~~~~~~~~~~~~~ Cl3D Selection (END) ~~~~~~~~~~~~~~~~~~~~"
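A note on the gen-matching in `hgcal_l1t_cl3d_selection.py`: for each gen particle the script keeps the highest-pT cluster above 10 GeV that lies within dR < 0.2. The sketch below expresses the same selection without ROOT, which makes the logic easy to unit-test. It is an illustration under stated assumptions, not the script's implementation: the names `delta_phi`, `delta_r` and `best_matched_cluster` and the dict keys `pt`, `eta`, `phi` are hypothetical, and `delta_r` stands in for `TLorentzVector.DeltaR`.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference phi1 - phi2 wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)  # now in [0, 2*pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def best_matched_cluster(gen, clusters, max_dr=0.2, min_cl_pt=10.0):
    """Return the index of the highest-pT cluster within max_dr of gen, or -1.

    gen and each cluster are dicts with 'pt', 'eta', 'phi' keys
    (a hypothetical layout chosen for this sketch).
    """
    best_idx, best_pt = -1, -999.0
    for idx, cl in enumerate(clusters):
        if cl["pt"] < min_cl_pt:
            continue  # cluster pT threshold
        dr = delta_r(cl["eta"], cl["phi"], gen["eta"], gen["phi"])
        if dr < max_dr and cl["pt"] > best_pt:
            best_idx, best_pt = idx, cl["pt"]
    return best_idx
```

Returning an index rather than the cluster itself mirrors the script, where the index is then used to read the remaining cluster variables from the tree.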
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/run.sh b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/run.sh
new file mode 100755
index 0000000000000..0af13aad63922
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/run.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+
+# Set up environment
+export HGCAL_L1T_BASE=$1
+cd $HGCAL_L1T_BASE
+
+#Script to run the HGCal L1T cluster definition
+cd cl3d_selection
+eval `scramv1 runtime -sh`
+
+#Input to cl3d selection
+sample_type=$2
+input_path=$3
+file_number=$4
+clustering_algo=$5
+
+# Run selection
+python hgcal_l1t_cl3d_selection.py --sampleType $sample_type --inputPath $input_path --fileNumber $file_number --clusteringAlgo $clustering_algo 
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/submit_cl3d_selection.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/submit_cl3d_selection.py
new file mode 100644
index 0000000000000..f761883ea84b8
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/cl3d_selection/submit_cl3d_selection.py
@@ -0,0 +1,69 @@
+import os, sys
+from optparse import OptionParser
+
+print "~~~~~~~~~~~~~~~~~~~~~~~~ Cl3D Selection Submission ~~~~~~~~~~~~~~~~~~~~~~~~"
+
+def get_options():
+  parser = OptionParser()
+  parser.add_option('--sampleType', dest='sampleType', default='electron_200PU', help="Sample to process, default signal is electron_200PU, default bkg is neutrino_200PU" )
+  parser.add_option("--inputPath", dest="inputPath", default="%s/ntuples"%os.environ['HGCAL_L1T_BASE'], help="Path to directories which hold input ntuples")
+  parser.add_option('--clusteringAlgo', dest='clusteringAlgo', default='Histomaxvardr', help="Clustering algorithm" )
+  parser.add_option('--numberOfFiles', dest='numberOfFiles', default=-1, type="int", help="Number of files to process" )
+  parser.add_option('--queue', dest='queue', default='microcentury', help="HTCondor Queue" )
+  return parser.parse_args()
+
+(opt,args) = get_options()
+
+#Total number of files for each sample: used if all files are to be processed
+totalFilesDict = {
+  "electron_0PU":22,
+  "electron_200PU":400,
+  "neutrino_200PU":2599
+}
+
+#Define path and submission file names
+path = "%s/cl3d_selection"%os.environ['HGCAL_L1T_BASE']
+f_sub_name = "submit_%s_%s.sub"%(opt.sampleType,opt.clusteringAlgo)
+sub_handle = "%s_%s_$(procID)"%(opt.sampleType,opt.clusteringAlgo)
+
+print " --> Submitting cl3d selection for %s"%opt.sampleType
+if opt.numberOfFiles == -1:
+  print " --> Processing all files: %g"%totalFilesDict[opt.sampleType]
+  N_process = totalFilesDict[opt.sampleType]
+elif opt.numberOfFiles > totalFilesDict[opt.sampleType]:
+  print " --> [WARNING] only %g files exist. Processing all files"%totalFilesDict[opt.sampleType]
+  N_process = totalFilesDict[opt.sampleType]
+else:
+  print " --> Processing %g files"%opt.numberOfFiles
+  N_process = opt.numberOfFiles
+
+#Create condor submission file
+print " --> Creating HTCondor submission file: %s"%f_sub_name
+f_sub = open("%s"%f_sub_name,"w+")
+f_sub.write("plusone = $(Process) + 1\n")
+f_sub.write("procID = $INT(plusone,%d)\n\n")
+f_sub.write("executable          = %s/run.sh\n"%path)
+f_sub.write("arguments           = %s %s %s $(procID) %s\n"%(os.environ['HGCAL_L1T_BASE'],opt.sampleType,opt.inputPath,opt.clusteringAlgo))
+f_sub.write("output              = %s/jobs/out/%s.out\n"%(path,sub_handle))
+f_sub.write("error               = %s/jobs/err/%s.err\n"%(path,sub_handle))
+f_sub.write("log                 = %s/jobs/log/%s.log\n"%(path,sub_handle))
+f_sub.write("+JobFlavour         = \"%s\"\n"%opt.queue)
+f_sub.write("queue %s\n"%N_process)
+f_sub.close()
+
+print "      plusone = $(Process) + 1"
+print "      procID = $INT(plusone,%d)"
+print "      executable          = %s/run.sh"%path
+print "      arguments           = %s %s %s $(procID) %s"%(os.environ['HGCAL_L1T_BASE'],opt.sampleType,opt.inputPath,opt.clusteringAlgo)
+print "      output              = %s/jobs/out/%s.out"%(path,sub_handle)
+print "      error               = %s/jobs/err/%s.err"%(path,sub_handle)
+print "      log                 = %s/jobs/log/%s.log"%(path,sub_handle)
+print "      +JobFlavour         = \"%s\""%opt.queue
+print "      queue %s"%N_process
+
+print " --> Submitting..."
+os.system('condor_submit %s'%f_sub_name)
+print " --> Deleting submission file"
+os.system('rm %s'%f_sub_name)
+
+print "~~~~~~~~~~~~~~~~~~~~~ Cl3D Selection Submission (END)~~~~~~~~~~~~~~~~~~~~~~"
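A note on `submit_cl3d_selection.py`: the submission file is written line by line and then echoed with a second, duplicated set of `print` statements, so the two can drift apart. One possible alternative is to build the text once and reuse it for both. The sketch below assumes a hypothetical helper `build_submit_text`; it only assembles the same lines the script writes and does not call HTCondor.

```python
def build_submit_text(base, sample_type, input_path, algo, queue, n_jobs):
    """Assemble the HTCondor submit description as a single string.

    The content mirrors what submit_cl3d_selection.py writes; the helper
    itself is an assumption of this sketch, not part of the patch.
    """
    path = "{}/cl3d_selection".format(base)
    handle = "{}_{}_$(procID)".format(sample_type, algo)
    lines = [
        "plusone = $(Process) + 1",
        "procID = $INT(plusone,%d)",  # literal %d: HTCondor integer format
        "",
        "executable          = {}/run.sh".format(path),
        "arguments           = {} {} {} $(procID) {}".format(
            base, sample_type, input_path, algo),
        "output              = {}/jobs/out/{}.out".format(path, handle),
        "error               = {}/jobs/err/{}.err".format(path, handle),
        "log                 = {}/jobs/log/{}.log".format(path, handle),
        '+JobFlavour         = "{}"'.format(queue),
        "queue {}".format(n_jobs),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_submit_text("/path/to/base", "electron_200PU",
                             "/path/to/base/ntuples", "Histomaxvardr",
                             "microcentury", 400)
    print(text)
```

The same string can then be written to the `.sub` file and printed for the user. The `jobs/out`, `jobs/err` and `jobs/log` directories are not created by the script, so they presumably need to exist before submission.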
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/README.md b/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/README.md
new file mode 100644
index 0000000000000..72de22e45e18b
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/README.md
@@ -0,0 +1,36 @@
+# Making the ntuples
+
+This subfolder contains the CMSSW config script to create the signal and background ntuples for training the e/gamma ID. For standard ntuples you do not need to make changes to `hgcal_l1t_ntupliser_v9_cfg.py`. However if you wish to use additional clustering algorithms, please add to the trigger chains in [L94-96](https://github.com/jonathon-langford/hgcal_l1t_egid/blob/master/ntuples/hgcal_l1t_ntupliser_v9_cfg.py#L94-L96).
+
+All operations are automated with the `crab_interface.py` script. The script is currently configured for the standard signal (electron 200PU) and background (neutrino 200PU). To run over different samples you will need to add the LFN to `sampleDict`, the total number of files to `totalFilesDict` and an identifier tag to `datasetTagDict`.
+
+## Submission
+CRAB is used to submit jobs to the grid. To run the ntupliser over all signal samples (for background, replace `electron_200PU` with `neutrino_200PU`):
+
+```
+python crab_interface.py --mode sub --numberOfSamples -1 --sampleType electron_200PU
+```
+
+You need to use a storage site that you have write access to, set with the option `--storageSite <name of site, e.g. T2_UK_London_IC>`.
+
+## Checking the status of submission
+```
+python crab_interface.py --mode status --sampleType electron_200PU
+```
+
+If any jobs have failed then you can resubmit with...
+```
+python crab_interface.py --mode resub --sampleType electron_200PU
+```
+
+## Extracting the ntuples
+This mode should only be used when all jobs have finished. The following command then retrieves the output ntuples and moves them to a directory. By default this is a directory named after the sampleType (e.g. electron_200PU) in the `ntuples` subfolder. To store them in a user-defined location, use the option `--outputPath <path to directory>`.
+
+```
+python crab_interface.py --mode extract --numberOfSamples -1 --sampleType electron_200PU
+```
+
+## Killing a crab submission
+```
+python crab_interface.py --mode kill --sampleType electron_200PU
+```
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/crab_interface.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/crab_interface.py
new file mode 100644
index 0000000000000..907bde53a23d1
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/crab_interface.py
@@ -0,0 +1,220 @@
+# Script to automate the submission, check status, resubmission and extraction of ntuples using crab
+
+import os, sys
+from optparse import OptionParser
+
+def get_options():
+  parser = OptionParser()
+  parser.add_option('--mode', dest='mode', default='sub', help="Option: [sub,status,resub,extract,kill]")
+  parser.add_option('--numberOfSamples', dest='numberOfSamples', default=1, type='int', help="Number of samples to process (used for submission and extraction only)")
+  parser.add_option('--sampleType', dest='sampleType', default='electron_200PU', help="Sample to process, default signal is electron_200PU, default bkg is neutrino_200PU")
+  parser.add_option('--storageSite', dest='storageSite', default='T2_UK_London_IC', help="User storage site")
+  parser.add_option('--outputPath', dest='outputPath', default='cwd', help="Output path to hold directory containing ntuples") #allows user to save ntuples in eos
+  return parser.parse_args()
+
+(opt,args) = get_options()
+
+#Mapping of sample type to dataset: this will need to be updated if moving to new geometry (new datasets)
+sampleDict = {
+  "electron_0PU":"/SingleE_FlatPt-2to100/PhaseIIMTDTDRAutumn18DR-NoPU_103X_upgrade2023_realistic_v2-v1/FEVT",
+  "electron_200PU":"/SingleE_FlatPt-2to100/PhaseIIMTDTDRAutumn18DR-PU200_103X_upgrade2023_realistic_v2-v1/FEVT",
+  "neutrino_200PU":"/NeutrinoGun_E_10GeV/PhaseIIMTDTDRAutumn18DR-PU200_103X_upgrade2023_realistic_v2-v1/FEVT"
+}
+
+#Total number of files for each sample: used if all files are to be processed
+totalFilesDict = {
+  "electron_0PU":22,
+  "electron_200PU":400,
+  "neutrino_200PU":2599
+}
+
+#Output dataset tag
+datasetTagDict = {
+  "electron_0PU":"SingleElectron_FlatPt-2to100_0PU_hgcal_l1t_v9",
+  "electron_200PU":"SingleElectron_FlatPt-2to100_200PU_hgcal_l1t_v9",
+  "neutrino_200PU":"SingleNeutrino_200PU_hgcal_l1t_v9"
+}
+
+# Catch: check using available sample type
+if opt.sampleType not in sampleDict: 
+  print " --> [ERROR] sample type %s not supported. Exiting..."%opt.sampleType
+  sys.exit(1)
+
+#determine number of files to process
+if opt.numberOfSamples == -1: N_process = totalFilesDict[opt.sampleType]
+elif opt.numberOfSamples > totalFilesDict[opt.sampleType]: N_process = totalFilesDict[opt.sampleType]
+else: N_process = opt.numberOfSamples
+
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
+# SUBMISSION
+if opt.mode == "sub":
+  print "~~~~~~~~~~~~~~~~~~~~~~~~ SUBMISSION ~~~~~~~~~~~~~~~~~~~~~~~~"
+  print " --> Writing crab submission file"
+  
+  #determine number of files to process
+  if opt.numberOfSamples == -1: print " --> Processing all samples: %g"%N_process
+  elif opt.numberOfSamples > totalFilesDict[opt.sampleType]: print " --> [WARNING] Trying to process %g samples. Only %g exist, processing all samples"%(opt.numberOfSamples,N_process)
+  else: print " --> Processing %g sample(s)"%N_process
+
+  #write crab submission script
+  f_sub_name = "crab_submit_%s_cfg.py"%opt.sampleType
+  f_sub = open("%s"%f_sub_name, "w+")
+  f_sub.write("from CRABClient.UserUtilities import config\n")
+  f_sub.write("config = config()\n\n")
+  f_sub.write("config.Debug.scheddName = \'crab3@vocms0198.cern.ch\'\n\n")
+  f_sub.write("config.General.requestName = \'%s\'\n"%opt.sampleType)
+  f_sub.write("config.General.workArea = \'crab_area\'\n")
+  f_sub.write("config.General.transferOutputs = True\n")
+  f_sub.write("config.General.transferLogs = True\n\n")
+  f_sub.write("config.JobType.pluginName = \'Analysis\'\n")
+  f_sub.write("config.JobType.psetName = \'hgcal_l1t_ntupliser_v9_cfg.py\'\n")
+  f_sub.write("config.JobType.maxMemoryMB = 2500\n\n")
+  f_sub.write("config.Data.inputDataset = \'%s\'\n"%sampleDict[opt.sampleType])
+  f_sub.write("config.Data.inputDBS = \'global\'\n")
+  f_sub.write("config.Data.splitting = \'FileBased\'\n")
+  f_sub.write("config.Data.unitsPerJob = 1\n")
+  f_sub.write("NJOBS = %g\n"%N_process)
+  f_sub.write("config.Data.totalUnits = config.Data.unitsPerJob * NJOBS\n")
+  f_sub.write("config.Data.outLFNDirBase = \'/store/user/%s/\'\n"%os.environ['USER'])
+  f_sub.write("config.Data.publication = True\n")
+  f_sub.write("config.Data.outputDatasetTag = \'%s\'\n\n"%datasetTagDict[opt.sampleType])
+  f_sub.write("config.Site.storageSite = \'%s\'"%opt.storageSite)
+  f_sub.close()
+
+  #Check if submission already exists: if so then ask user if want to delete previous submission
+  if os.path.isdir("./crab_area/crab_%s"%opt.sampleType):
+    delete = input(" --> Submission %s already exists. Do you want to delete previous submission [yes=1,no=0]:"%opt.sampleType)
+    if delete:
+      print " --> Deleting previous submission"
+      os.system("rm -Rf crab_area/crab_%s"%opt.sampleType)
+    else:
+      print " --> Keeping previous submission. Leaving..."
+      print "~~~~~~~~~~~~~~~~~~~~~ SUBMISSION (END) ~~~~~~~~~~~~~~~~~~~~~~"
+      sys.exit(1)
+
+  #Submit file to crab server
+  print " --> Submitting to crab server..."
+  os.system("crab submit -c crab_submit_%s_cfg.py"%opt.sampleType)
+  print "~~~~~~~~~~~~~~~~~~~~~ SUBMISSION (END) ~~~~~~~~~~~~~~~~~~~~~~"
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
+# STATUS
+elif opt.mode == "status":
+  print "~~~~~~~~~~~~~~~~~~~~~~~~ STATUS ~~~~~~~~~~~~~~~~~~~~~~~~"
+  
+  #check if crab submission exists
+  if os.path.isdir("./crab_area/crab_%s"%opt.sampleType):
+    print " --> Checking the status of submission: %s"%opt.sampleType
+    os.system("crab status -d crab_area/crab_%s/"%opt.sampleType) 
+  else:
+    print " --> [ERROR] No submission for %s. Use the sub option first"%opt.sampleType
+    print "~~~~~~~~~~~~~~~~~~~~~ STATUS (END) ~~~~~~~~~~~~~~~~~~~~~"
+    sys.exit(1)
+  print "~~~~~~~~~~~~~~~~~~~~~ STATUS (END) ~~~~~~~~~~~~~~~~~~~~~"
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
+# RESUBMIT
+elif opt.mode == "resub":
+  print "~~~~~~~~~~~~~~~~~~~~~~~~ RESUBMISSION ~~~~~~~~~~~~~~~~~~~~~~~~"
+  
+  #check if crab submission exists
+  if os.path.isdir("./crab_area/crab_%s"%opt.sampleType):
+    print " --> Re-submitting failed jobs for submission: %s"%opt.sampleType
+    os.system("crab resubmit -d crab_area/crab_%s/"%opt.sampleType) 
+  else:
+    print " --> [ERROR] No submission for %s. Use the sub option first"%opt.sampleType
+    print "~~~~~~~~~~~~~~~~~~~~~ RESUBMISSION (END) ~~~~~~~~~~~~~~~~~~~~~"
+    sys.exit(1)
+  print "~~~~~~~~~~~~~~~~~~~~~ RESUBMISSION (END) ~~~~~~~~~~~~~~~~~~~~~"
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
+# KILL
+elif opt.mode == "kill":
+  print "~~~~~~~~~~~~~~~~~~~~~~~~ KILL ~~~~~~~~~~~~~~~~~~~~~~~~"
+  
+  #check if crab submission exists
+  if os.path.isdir("./crab_area/crab_%s"%opt.sampleType):
+    kill = input(" --> Are you sure you want to kill all jobs for %s [yes=1,no=0]:"%opt.sampleType)
+    if kill: 
+      print " --> Killing all jobs for submission: %s"%opt.sampleType
+      os.system("crab kill -d crab_area/crab_%s/"%opt.sampleType)
+    else:
+      print " --> Leaving..."
+      print "~~~~~~~~~~~~~~~~~~~~~ KILL (END) ~~~~~~~~~~~~~~~~~~~~~"
+      sys.exit(1)
+  else:
+    print " --> [ERROR] No submission for %s. Use the sub option first"%opt.sampleType
+    print "~~~~~~~~~~~~~~~~~~~~~ KILL (END) ~~~~~~~~~~~~~~~~~~~~~"
+    sys.exit(1)
+  print "~~~~~~~~~~~~~~~~~~~~~ KILL (END) ~~~~~~~~~~~~~~~~~~~~~"
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
+# EXTRACTION OF NTUPLES
+elif opt.mode == "extract":
+  print "~~~~~~~~~~~~~~~~~~~~~~~~ EXTRACTION ~~~~~~~~~~~~~~~~~~~~~~~~"
+  
+  #check if crab submission exists
+  if os.path.isdir("./crab_area/crab_%s"%opt.sampleType):
+    extract = input(" --> Only use this mode when all crab jobs have finished. Have they finished [yes=1,no=0]:")
+    if extract:
+      print " --> Extracting ntuples for submission: %s"%opt.sampleType
+      #Crab only allows user to extract 500 ntuples at a time: therefore if N_process > 500 do multiple times
+      if N_process/500 > 0:
+        # FIXME: this should be parallelized as takes a long time for a lot of samples!
+        for jobblock in range(0,N_process/500+1):
+          jobids = ""
+          #For final block:
+          if jobblock == N_process/500:
+            for jobid in range(1,N_process%500+1): jobids += "%g,"%(500*jobblock+jobid)
+          else:
+            for jobid in range(1,501): jobids += "%g,"%(500*jobblock+jobid)
+          jobids = jobids[:-1] #remove last comma
+          if jobids != "": os.system("crab getoutput -d crab_area/crab_%s/ --jobids %s"%(opt.sampleType,jobids)) #skip the empty final block when N_process is a multiple of 500
+      else: 
+        os.system("crab getoutput -d crab_area/crab_%s/"%opt.sampleType)
+
+      #If no ntuples to receive then leave
+      if not os.path.exists("crab_area/crab_%s/results/ntuple_1.root"%opt.sampleType):
+        print " --> No ntuples in folder. Leaving..."
+        print "~~~~~~~~~~~~~~~~~~~~~ EXTRACTION (END) ~~~~~~~~~~~~~~~~~~~~~"  
+        sys.exit(1) 
+
+      #Check if path to place output directory exists
+      if opt.outputPath == "cwd": outputPath = os.environ['PWD']
+      else: 
+        #Check if path exists
+        if os.path.isdir( opt.outputPath ): outputPath = opt.outputPath
+        else:
+          print " --> [ERROR] path %s does not exist. Ntuples will remain in crab_area/crab_%s/results"%(opt.outputPath,opt.sampleType)
+          print "~~~~~~~~~~~~~~~~~~~~~ EXTRACTION (END) ~~~~~~~~~~~~~~~~~~~~~"  
+          sys.exit(1)
+
+      # Check if directory already exists
+      if os.path.isdir("%s/%s"%(outputPath,opt.sampleType)):
+        move = input("Output directory already exists. Do you want to move ntuples anyway [yes=1,no=0]:")
+        if move:
+          print " --> Moving ntuples to %s/%s"%(outputPath,opt.sampleType)
+          os.system("mv crab_area/crab_%s/results/ntuple*.root %s/%s"%(opt.sampleType,outputPath,opt.sampleType))
+        else:
+          print "Ntuples will remain in crab_area/crab_%s/results"%opt.sampleType
+      # if not then make directory and move ntuples there
+      else:
+        os.system("mkdir %s/%s"%(outputPath,opt.sampleType))
+        print " --> Moving ntuples to %s/%s"%(outputPath,opt.sampleType)
+        os.system("mv crab_area/crab_%s/results/ntuple*.root %s/%s"%(opt.sampleType,outputPath,opt.sampleType))
+        
+    else:
+      print " --> Leaving"
+      print "~~~~~~~~~~~~~~~~~~~~~ EXTRACTION (END) ~~~~~~~~~~~~~~~~~~~~~"  
+      sys.exit(1)
+  else:
+    print " --> [ERROR] No submission for %s. Use the sub option first"%opt.sampleType
+    print "~~~~~~~~~~~~~~~~~~~~~ EXTRACTION (END) ~~~~~~~~~~~~~~~~~~~~~"
+    sys.exit(1)
+  print "~~~~~~~~~~~~~~~~~~~~~ EXTRACTION (END) ~~~~~~~~~~~~~~~~~~~~~"
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
+else:
+  print " --> [ERROR] mode %s is not supported. Please use [sub,status,resub,kill,extract]"%opt.mode
+  sys.exit(1)
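Review note: the `extract` mode above retrieves output in blocks because, per the comment in the script, CRAB only allows 500 ntuples to be extracted at a time. A minimal sketch of that batching logic follows, written as a standalone Python 3 function rather than in the Python 2 style of the script. The helper name `batch_job_ids` and its `block_size` parameter are hypothetical and not part of the submission script; the sketch only builds the comma-separated id strings and does not call CRAB.

```python
def batch_job_ids(n_jobs, block_size=500):
    """Split job ids 1..n_jobs into comma-separated strings of at most block_size ids."""
    batches = []
    for first in range(1, n_jobs + 1, block_size):
        last = min(first + block_size - 1, n_jobs)
        batches.append(",".join(str(job_id) for job_id in range(first, last + 1)))
    return batches


if __name__ == "__main__":
    for ids in batch_job_ids(1200):
        # Each string is the kind of value the script passes to --jobids
        print("%d ids starting at %s" % (ids.count(",") + 1, ids.split(",")[0]))
```

Stepping the range by `block_size` means an exact multiple of 500 never produces an empty trailing block, so no special case is needed for the final batch.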
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/hgcal_l1t_ntupliser_v9_cfg.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/hgcal_l1t_ntupliser_v9_cfg.py
new file mode 100644
index 0000000000000..eac7b84f9067b
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/ntuples/hgcal_l1t_ntupliser_v9_cfg.py
@@ -0,0 +1,136 @@
+import FWCore.ParameterSet.Config as cms 
+from Configuration.StandardSequences.Eras import eras
+from Configuration.ProcessModifiers.convertHGCalDigisSim_cff import convertHGCalDigisSim
+
+# For old samples use the digi converter
+process = cms.Process('DIGI',eras.Phase2C4)
+
+# import of standard configurations
+process.load('Configuration.StandardSequences.Services_cff')
+process.load('SimGeneral.HepPDTESSource.pythiapdt_cfi')
+process.load('FWCore.MessageService.MessageLogger_cfi')
+process.load('Configuration.EventContent.EventContent_cff')
+process.load('SimGeneral.MixingModule.mixNoPU_cfi')
+process.load('Configuration.Geometry.GeometryExtended2023D35Reco_cff')
+process.load('Configuration.Geometry.GeometryExtended2023D35_cff')
+process.load('Configuration.StandardSequences.MagneticField_cff')
+process.load('Configuration.StandardSequences.Generator_cff')
+process.load('IOMC.EventVertexGenerators.VtxSmearedHLLHC14TeV_cfi')
+process.load('GeneratorInterface.Core.genFilterSummary_cff')
+process.load('Configuration.StandardSequences.SimIdeal_cff')
+process.load('Configuration.StandardSequences.Digi_cff')
+process.load('Configuration.StandardSequences.SimL1Emulator_cff')
+process.load('Configuration.StandardSequences.DigiToRaw_cff')
+process.load('Configuration.StandardSequences.EndOfProcess_cff')
+process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')
+
+process.maxEvents = cms.untracked.PSet(
+    input = cms.untracked.int32(10)
+)
+
+# Input source
+process.source = cms.Source("PoolSource",
+       fileNames = cms.untracked.vstring('/store/mc/PhaseIIMTDTDRAutumn18DR/NeutrinoGun_E_10GeV/FEVT/PU200_103X_upgrade2023_realistic_v2-v1/280000/13B217C2-4935-CE45-BBF7-EC843AFA3D8D.root'),
+       inputCommands=cms.untracked.vstring(
+           'keep *',
+           'drop l1tEMTFHit2016Extras_simEmtfDigis_CSC_HLT',
+           'drop l1tEMTFHit2016Extras_simEmtfDigis_RPC_HLT',
+           'drop l1tEMTFHit2016s_simEmtfDigis__HLT',
+           'drop l1tEMTFTrack2016Extras_simEmtfDigis__HLT',
+           'drop l1tEMTFTrack2016s_simEmtfDigis__HLT',
+           )
+       )
+
+process.options = cms.untracked.PSet(
+
+)
+
+# Production Info
+process.configurationMetadata = cms.untracked.PSet(
+    version = cms.untracked.string('$Revision: 1.20 $'),
+    annotation = cms.untracked.string('SingleElectronPt10_cfi nevts:10'),
+    name = cms.untracked.string('Applications')
+)
+
+# Output definition
+process.TFileService = cms.Service(
+    "TFileService",
+    fileName = cms.string("ntuple.root")
+    )
+
+# Other statements
+from Configuration.AlCa.GlobalTag import GlobalTag
+process.GlobalTag = GlobalTag(process.GlobalTag, 'auto:phase2_realistic', '')
+
+# load HGCAL TPG simulation
+process.load('L1Trigger.L1THGCal.hgcalTriggerPrimitives_cff')
+process.load('L1Trigger.L1THGCalUtilities.hgcalTriggerNtuples_cff')
+from L1Trigger.L1THGCalUtilities.hgcalTriggerChains import HGCalTriggerChains
+import L1Trigger.L1THGCalUtilities.vfe as vfe
+import L1Trigger.L1THGCalUtilities.concentrator as concentrator
+import L1Trigger.L1THGCalUtilities.clustering2d as clustering2d
+import L1Trigger.L1THGCalUtilities.clustering3d as clustering3d
+import L1Trigger.L1THGCalUtilities.customNtuples as ntuple
+
+
+chains = HGCalTriggerChains()
+# Register algorithms
+## VFE
+chains.register_vfe("Floatingpoint8", lambda p : vfe.create_compression(p, 4, 4, True))
+## ECON
+chains.register_concentrator("Threshold", concentrator.create_threshold)
+chains.register_concentrator("Supertriggercell", concentrator.create_supertriggercell)
+## BE1
+chains.register_backend1("Dummy", clustering2d.create_dummy)
+## BE2
+chains.register_backend2("Histomaxvardr", lambda p,i : clustering3d.create_histoMax_variableDr(p,i))
+# Register ntuples
+# Store gen info only in the reference ntuple
+ntuple_list_ref = ['event', 'gen', 'multiclusters']
+ntuple_list = ['event', 'multiclusters']
+chains.register_ntuple("Genclustersntuple", lambda p,i : ntuple.create_ntuple(p,i, ntuple_list_ref))
+chains.register_ntuple("Clustersntuple", lambda p,i : ntuple.create_ntuple(p,i, ntuple_list))
+
+# Register trigger chains
+concentrator_algos = ['Threshold','Supertriggercell']
+backend_algos = ['Histomaxvardr']
+## Make cross product of ECON and BE algos
+import itertools
+ch_idx = 0 #add gen info to first chain 
+for cc,be in itertools.product(concentrator_algos,backend_algos):
+    if ch_idx == 0: chains.register_chain('Floatingpoint8', cc, 'Dummy', be, ntuple='Genclustersntuple')
+    else: chains.register_chain('Floatingpoint8', cc, 'Dummy', be, ntuple='Clustersntuple')
+    ch_idx += 1
+
+#Add chains to process
+process = chains.create_sequences(process)
+
+# Set MinPt threshold in each chain
+for ch in chains.chain:
+  #Concatenate elements of chain up to backend algo
+  ch_str = ""
+  for element in ch[:4]: ch_str += element
+  #Treat Ref3d and Histomaxvardr separately
+  if ch[3] == "Histomaxvardr": 
+    pset = getattr(process,ch_str).ProcessorParameters.C3d_parameters.histoMax_C3d_clustering_parameters
+  else: 
+    print "[ERROR] Backend alg %s not supported. Exiting..."%ch[3]
+    raise SystemExit(1) #sys is not imported in this config, so avoid sys.exit
+  #Set minPt threshold
+  pset.minPt_multicluster = cms.double(10.)
+
+# Remove towers from sequence
+process.hgcalTriggerPrimitives.remove(process.hgcalTowerMap)
+process.hgcalTriggerPrimitives.remove(process.hgcalTower)
+
+process.hgcl1tpg_step = cms.Path(process.hgcalTriggerPrimitives)
+process.ntuple_step = cms.Path(process.hgcalTriggerNtuples)
+
+# Schedule definition
+process.schedule = cms.Schedule(process.hgcl1tpg_step,process.ntuple_step)
+
+# Add early deletion of temporary data products to reduce peak memory need
+from Configuration.StandardSequences.earlyDeleteSettings_cff import customiseEarlyDelete
+process = customiseEarlyDelete(process)
+# End adding early deletion
+
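Review note: the config above registers one trigger chain per (concentrator, backend) pair and attaches the gen-level ntuple only to the first chain. The sketch below reproduces just the naming logic in plain Python 3, with no CMSSW dependency. It assumes that the ntuple directory name is the concatenation of the five chain elements, which is consistent with the directory names hardcoded in `clusteringAlgoDirDict` in `make_eff_plot.py` but is not confirmed by the config itself. The function name `chain_directories` is hypothetical.

```python
import itertools

VFE_ALGO = "Floatingpoint8"
CONCENTRATOR_ALGOS = ["Threshold", "Supertriggercell"]
BACKEND1_ALGO = "Dummy"
BACKEND2_ALGOS = ["Histomaxvardr"]


def chain_directories():
    """Return the assumed ntuple directory name for each registered chain."""
    names = []
    pairs = itertools.product(CONCENTRATOR_ALGOS, BACKEND2_ALGOS)
    for idx, (cc, be) in enumerate(pairs):
        # Gen information is stored only in the first (reference) chain
        ntuple = "Genclustersntuple" if idx == 0 else "Clustersntuple"
        names.append(VFE_ALGO + cc + BACKEND1_ALGO + be + ntuple)
    return names


if __name__ == "__main__":
    for name in chain_directories():
        print(name)
```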
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/README.md b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/README.md
new file mode 100644
index 0000000000000..21331553ad62b
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/README.md
@@ -0,0 +1,29 @@
+# Plotting scripts
+
+## Efficiency plots
+Plot the efficiency of the e/gamma ID as a function of generator-level pT and eta. The efficiency is defined as the number of generator-level electrons with a matching 3D cluster passing the e/gamma ID working point, divided by the total number of generator-level electrons. The gen electrons are required to have pT > 30 GeV, and the 3D clusters pT > 20 GeV. Note that this script only supports the e/gamma ID in the TPG software (not any newly-trained BDTs).
+
+The script takes as input the original ntuples generated in the `ntuples` subfolder:
+
+```
+python make_eff_plot.py --signalType electron_200PU
+```
+
+You can set the minimum efficiency to plot using the `--minimumEff` option (default is 0.75). The output plots will be saved in the `plots` directory.
+
+## 3D cluster variable plots
+Plot 3D cluster variable distributions from the flat ntuples, either those in the `cl3d_selection` subfolder or, to plot one of the new BDT scores, those in the `training/results` directory. For example, to plot the cluster p_T distribution for both signal (electron 200PU) and background (neutrino 200PU):
+
+```
+python make_cl3dVar_plot.py --inputMap electron_200PU,Histomaxvardr,selection,2,23,Hist:neutrino_200PU,Histomaxvardr,selection,1,23,Hist --variable pt --legendMap "electron 200PU,2,23,L+pile-up,1,23,L"
+```
+
+The `--inputMap` has the following format: (sampleType),(cl3d algo.),(location of ntuples: selection/evaluation),(ROOT colour),(ROOT marker),(ROOT plotting style).
+
+The `--legendMap` option details the entries in the legend. Separate each entry with a "+" symbol. The format is: (text),(ROOT colour),(ROOT marker),(Entry style e.g. L=line)
+
+To suppress output to screen use the option: `--batch 1`.
+
+### Examples
+Example cluster p_T distribution and efficiency vs gen eta plots are stored in the `plots` directory.
+
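Review note: the `--inputMap` and `--legendMap` formats described in the README above can be illustrated with a small parser. This is a sketch in Python 3, not the code used by `make_cl3dVar_plot.py`; the helper `parse_map` is hypothetical, while the field names mirror the dictionary keys the plotting script uses.

```python
INPUT_FIELDS = ["sampleType", "cl3d_algo", "ntuple", "colour", "marker", "option"]
LEGEND_FIELDS = ["text", "colour", "marker", "option"]


def parse_map(map_string, separator, fields):
    """Split map_string on separator, then split each item into named fields."""
    entries = []
    for item in map_string.split(separator):
        values = item.split(",")
        if len(values) != len(fields):
            raise ValueError(
                "expected %d comma-separated fields, got %r" % (len(fields), item)
            )
        entries.append(dict(zip(fields, values)))
    return entries


if __name__ == "__main__":
    # Inputs are separated by ':' and legend entries by '+'
    inputs = parse_map(
        "electron_200PU,Histomaxvardr,selection,2,23,Hist"
        ":neutrino_200PU,Histomaxvardr,selection,1,23,Hist",
        ":",
        INPUT_FIELDS,
    )
    legend = parse_map("electron 200PU,2,23,L+pile-up,1,23,L", "+", LEGEND_FIELDS)
    print(inputs[0]["sampleType"], legend[1]["text"])
```

Because fields are split on commas, legend text containing a comma would be rejected; the README example avoids this.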
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/make_cl3dVar_plot.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/make_cl3dVar_plot.py
new file mode 100644
index 0000000000000..e246e55864cdf
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/make_cl3dVar_plot.py
@@ -0,0 +1,213 @@
+# Script to plot cl3d variable (normalised)
+#  > Input: selected cl3d ntuple in cl3d_selection/ or cl3d ntuple in training/results/
+#  > Output: plot of cl3d variable
+
+import ROOT
+import sys
+import os
+from array import array
+from optparse import OptionParser
+
+print "~~~~~~~~~~~~~~~~~~~~~~~~ CL3D VAR PLOTTER ~~~~~~~~~~~~~~~~~~~~~~~~"
+def leave():
+  print "~~~~~~~~~~~~~~~~~~~~~ CL3D VAR PLOTTER (END) ~~~~~~~~~~~~~~~~~~~~~"
+  sys.exit(1)
+
+# Get options from option parser
+def get_options():
+  # Build the parser once, with a usage string shown by --help
+  parser = OptionParser( usage="usage: python make_cl3dVar_plot.py <options>" )
+  parser.add_option("--inputMap", dest="inputMap", default="electron_200PU,Histomaxvardr,selection,2,23,Hist", help="List of inputs to plot, of the form:  sample type,clustering algo.,ntuple stage [selection,evaluation],colour,marker style,plotting option:..." )
+  parser.add_option("--variable", dest="variable", default="pt", help="Variable to plot")
+  parser.add_option("--legendMap", dest="legendMap", default='', help="List of elements of legend. Separate each element with '+'. Format: <text>,<colour>,<marker>,<option>+<text2>,<colour2>,<marker2>,<option2>+..." )
+  parser.add_option("--legendPosition", dest="legendPosition", default="std", help="Position to plot legend in plot [std,centre,bottom_right]")
+  parser.add_option("--batch", dest="batch", default=0, type='int', help="Suppress output of plots to screen")
+  return parser.parse_args()
+
+(opt,args) = get_options()
+
+# Define sample to tree mapping
+treeMap = {
+  "electron":"e_sig",
+  "photon":"g_sig",
+  "pion":"pi_bkg",
+  "neutrino":"pu_bkg"
+}
+
+# HARDCODED Variable plotting options: [bins,minimum,maximum,log scale]
+variable_plotting_options = {
+  'pt':[150,0,150,0], 
+  'eta':[50,-3.14,3.14,0], 
+  'phi':[50,-3.14,3.14,0], 
+  'clusters_n':[60,0,60,0], 
+  'showerlength':[60,0,60,0], 
+  'coreshowerlength':[30,0,30,0], 
+  'firstlayer':[20,0,20,0], 
+  'maxlayer':[50,0,50,0], 
+  'seetot':[100,0,0.20,0], 
+  'seemax':[50,0,0.1,0], 
+  'spptot':[100,0,0.2,1], 
+  'sppmax':[100,0,0.2,1], 
+  'szz':[100,0,100,1], 
+  'srrtot':[150,0,0.03,1], 
+  'srrmax':[150,0,0.03,1], 
+  'srrmean':[50,0,0.01,1], 
+  'emaxe':[60,0,1.2,0], 
+  'bdteg':[50,-1,1.,1], 
+  'quality':[6,-1,5,0]
+}
+if "bdt" in opt.variable: variable_plotting_options[opt.variable] = [50,-1,1.,1]
+if opt.variable not in variable_plotting_options: 
+  print " --> [ERROR] Variable (%s) not supported. Leaving..."%opt.variable
+  leave()
+
+# Dictionaries to store files, trees and histograms from inputs
+fileDict = {}
+treeDict = {}
+histDict = {}
+
+# Extract inputs from map and store in dictionary
+input_list = []
+for _input in opt.inputMap.split(":"):
+  inputInfo = _input.split(",")
+  if len( inputInfo )!=6:
+    print " --> [ERROR] Invalid input. Use the form: <sample type>,<clustering algo.>,<ntuple stage [selection,evaluation]>,<colour>,<marker style>,<plotting option>. Leaving..."
+    leave()
+  input_list.append({})
+  input_list[-1]['sampleType'] = inputInfo[0]
+  input_list[-1]['cl3d_algo'] = inputInfo[1]
+  input_list[-1]['ntuple'] = inputInfo[2]
+  input_list[-1]['colour'] = inputInfo[3]
+  input_list[-1]['marker'] = inputInfo[4]
+  input_list[-1]['option'] = inputInfo[5]
+
+# Store maximum value of all histograms for keeping on same plot
+maximum_value = 0
+#Output options
+if opt.batch: ROOT.gROOT.SetBatch(ROOT.kTRUE)
+# Plotting options
+binning = variable_plotting_options[opt.variable][:-1]
+setLogY = variable_plotting_options[opt.variable][-1]
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# CONFIGURE OUTPUT
+ROOT.gStyle.SetOptStat(0)
+canv = ROOT.TCanvas("c","c")
+if setLogY: canv.SetLogy()
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# LOOP OVER INPUTS: create histogram from var in tree
+for i in input_list:
+
+  key = "%s_%s_%s"%(i['sampleType'],i['cl3d_algo'],i['ntuple'])
+  if i['ntuple'] == 'selection': fileDict[key] = ROOT.TFile( "%s/cl3d_selection/%s/%s_%s_all.root"%(os.environ['HGCAL_L1T_BASE'],i['sampleType'],i['sampleType'],i['cl3d_algo']) )
+  elif i['ntuple'] == 'evaluation': fileDict[key] = ROOT.TFile( "%s/training/results/%s/%s_%s_all_eval.root"%(os.environ['HGCAL_L1T_BASE'],i['sampleType'],i['sampleType'],i['cl3d_algo']) )
+  else:
+    print " --> [ERROR] Invalid ntuple location. Please use selection or evaluation"
+    leave()
+
+  treeDict[key] = fileDict[key].Get( treeMap[i['sampleType'].split("_")[0]] )
+  histDict[key] = ROOT.TH1F("h_%s"%key, "", binning[0], binning[1], binning[2] )
+
+  for ev in treeDict[key]: histDict[key].Fill( getattr(ev,"cl3d_%s"%opt.variable) )
+
+  # normalise histograms
+  histDict[key].Scale(1./histDict[key].GetEntries())
+
+  # plotting options
+  histDict[key].SetLineWidth(2)
+  histDict[key].SetLineColor(int(i['colour']))
+  histDict[key].SetMarkerColor(int(i['colour']))
+  histDict[key].SetMarkerStyle(int(i['marker']))
+  histDict[key].SetMarkerSize(1.2)
+  histDict[key].GetYaxis().SetTitle("1/N dN/d(%s)"%opt.variable)
+  histDict[key].GetXaxis().SetTitle("%s"%opt.variable)
+  if i['ntuple'] == "evaluation" and "bdt" in opt.variable:
+    histDict[key].GetYaxis().SetTitleSize(0.03)
+    histDict[key].GetYaxis().SetTitleOffset(1.3)
+    histDict[key].GetXaxis().SetTitleSize(0.03)
+    histDict[key].GetXaxis().SetTitleOffset(1.3)
+  else: 
+    histDict[key].GetYaxis().SetTitleSize(0.05)
+    histDict[key].GetYaxis().SetTitleOffset(0.95)
+    histDict[key].GetXaxis().SetTitleSize(0.05)
+    histDict[key].GetXaxis().SetTitleOffset(0.95)
+
+  # if maximum > current: save new maximum
+  if histDict[key].GetMaximum() > maximum_value: maximum_value = histDict[key].GetMaximum()
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# LOOP OVER INPUTS AGAIN AND PLOT
+for _idx in range( len( input_list ) ):
+  i = input_list[_idx]
+  key = "%s_%s_%s"%(i['sampleType'],i['cl3d_algo'],i['ntuple'])
+  if _idx == 0:
+    if( setLogY ):
+      histDict[key].SetMaximum( 1.3*maximum_value )
+      histDict[key].SetMinimum( 1e-4 )
+    else:
+      histDict[key].SetMaximum( 1.1*maximum_value )
+    histDict[key].Draw("%s"%i['option'])
+  else:
+    histDict[key].Draw("SAME %s"%i['option'])
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# FORMAT CANVAS
+lat = ROOT.TLatex()
+lat.SetTextFont(42)
+lat.SetLineWidth(2)
+lat.SetTextAlign(11)
+lat.SetNDC()
+lat.SetTextSize(0.05)
+lat.DrawLatex(0.1,0.92,"#bf{HGCal L1T} #scale[0.75]{#it{Work in Progress}}")
+lat.DrawLatex(0.8,0.92,"14 TeV")
+
+# Legend: get from option
+entry_list = []
+#Entry of type: text,colour,option
+if opt.legendMap != "":
+  for _entry in opt.legendMap.split("+"):
+    entryInfo = _entry.split(",")
+    if len(entryInfo) != 4:
+      print "  --> [ERROR] Invalid legend entry. Exiting..."
+      leave()
+    entry_list.append({})
+    entry_list[-1]['text'] = entryInfo[0]
+    entry_list[-1]['colour'] = entryInfo[1]
+    entry_list[-1]['marker'] = entryInfo[2]
+    entry_list[-1]['option'] = entryInfo[3]
+
+  graph_list = []
+  #Create dummy graphs to place in legend
+  for entry in entry_list:
+    gr = ROOT.TGraph()
+    gr.SetFillColor( int(entry['colour']) )
+    gr.SetLineColor( int(entry['colour']) )
+    gr.SetLineWidth( 2 )
+    gr.SetMarkerColor( int(entry['colour']) )
+    gr.SetMarkerStyle( int(entry['marker']) )
+    gr.SetMarkerSize( 1.2 )
+    graph_list.append( gr )
+
+  # Create legend and add entries
+  if opt.legendPosition == "centre": leg = ROOT.TLegend(0.38,0.65,0.62,0.88)
+  elif opt.legendPosition == "bottom_right": leg = ROOT.TLegend(0.65,0.15,0.88,0.38)
+  else: leg = ROOT.TLegend(0.65,0.65,0.88,0.88)
+  #Create legend and add entries
+  leg.SetFillColor(0)
+  leg.SetLineColor(0)
+  for _idx in range( len( entry_list ) ):
+    entry = entry_list[_idx]
+    leg.AddEntry( graph_list[_idx], "%s"%entry['text'], "%s"%entry['option'] )
+  leg.Draw("Same")
+
+canv.Update()
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
+# Save canvas
+if not os.path.isdir("%s/plotting/plots"%os.environ['HGCAL_L1T_BASE']): os.makedirs("%s/plotting/plots"%os.environ['HGCAL_L1T_BASE'])
+canv.SaveAs( "%s/plotting/plots/cl3d_%s.png"%(os.environ['HGCAL_L1T_BASE'],opt.variable) )
+canv.SaveAs( "%s/plotting/plots/cl3d_%s.pdf"%(os.environ['HGCAL_L1T_BASE'],opt.variable) )
+
+if not opt.batch: raw_input("Press any key to continue...")
+leave()
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/make_eff_plot.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/make_eff_plot.py
new file mode 100644
index 0000000000000..eabdd1ab6d27d
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/make_eff_plot.py
@@ -0,0 +1,262 @@
+# Script for making efficiency plot
+#  > Input: l1t ntuples produced in ./ntuples directory
+#  > Output: eff plot vs pT/eta
+#  > Note, not currently configured to plot efficiency for newly trained egid. For this, please update
+#    L1T TPG software with the new models and remake the ntuples
+
+import ROOT
+import sys
+import os
+import math
+from array import array
+from optparse import OptionParser
+
+# Define eta regions for different trainings
+eta_regions = {"low":[1.5,2.7],"high":[2.7,3.0]}
+
+print "~~~~~~~~~~~~~~~~~~~~~~~~ EFFICIENCY PLOTTER ~~~~~~~~~~~~~~~~~~~~~~~~"
+def leave():
+  print "~~~~~~~~~~~~~~~~~~~~~ EFFICIENCY PLOTTER (END) ~~~~~~~~~~~~~~~~~~~~~"
+  sys.exit(1)
+
+def get_options():
+  # Build the parser once, with a usage string shown by --help
+  parser = OptionParser( usage="usage: python make_eff_plot.py <options>")
+  parser.add_option("--signalType", dest="signalType", default='electron_200PU', help="Signal sample to plot") 
+  parser.add_option("--clusteringAlgo", dest="clusteringAlgo", default="Histomaxvardr", help="Clustering algorithm used in ntuple production")
+  parser.add_option("--minimumEff", dest="minimumEff", default=0.75, type='float', help="Minimum efficiency to plot")
+  return parser.parse_args()
+
+(opt,args) = get_options()
+
+# Mapping to extract TDirectory for different clustering algo
+clusteringAlgoDirDict = {"gen":"Floatingpoint8ThresholdDummyHistomaxvardrGenclustersntuple","Histomaxvardr":"Floatingpoint8ThresholdDummyHistomaxvardrGenclustersntuple","Histomaxvardr_stc":"Floatingpoint8SupertriggercellDummyHistomaxvardrClustersntuple"}
+
+# Mapping: sample type to pdgid used for gen matching
+pdgIdDict = {
+  "electron":[11],
+  "photon":[22]
+}
+pdgid = pdgIdDict[ opt.signalType.split("_")[0] ]
+
+#Check: clustering exists in dir
+clusteringAlgo = opt.clusteringAlgo
+if clusteringAlgo not in clusteringAlgoDirDict:
+  print " --> [ERROR] not configured for %s clustering. Leaving..."%clusteringAlgo
+  leave()
+
+# Check ntuples exist. If so then make combined ntuple using hadd
+if os.path.isdir("%s/ntuples/%s"%(os.environ['HGCAL_L1T_BASE'],opt.signalType)):
+  if not os.path.exists("%s/ntuples/%s/all.root"%(os.environ['HGCAL_L1T_BASE'],opt.signalType)):
+    print " --> Making combined ntuple: ntuples/%s/all.root"%opt.signalType
+    os.system("hadd %s/ntuples/%s/all.root %s/ntuples/%s/ntuple*"%(os.environ['HGCAL_L1T_BASE'],opt.signalType,os.environ['HGCAL_L1T_BASE'],opt.signalType))
+else:
+  print " --> [ERROR] No input ntuples for %s detected. Please make. Leaving..."%opt.signalType
+  leave()
+
+# Extract trees from input ntuple
+f_in = ROOT.TFile("%s/ntuples/%s/all.root"%(os.environ['HGCAL_L1T_BASE'],opt.signalType))
+genTree = f_in.Get("%s/HGCalTriggerNtuple"%clusteringAlgoDirDict["gen"])
+cl3dTree = f_in.Get("%s/HGCalTriggerNtuple"%clusteringAlgoDirDict[ opt.clusteringAlgo ])
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# Define histograms for total, gen matched, passing quality flag for pt and eta
+hist_tot_genpt = ROOT.TH1F("h_tot_genpt","",20,0,100)
+hist_tot_geneta = ROOT.TH1F("h_tot_geneta","",64,-3.2,3.2)
+hist_matched_genpt = ROOT.TH1F("h_matched_genpt","",20,0,100)
+hist_matched_geneta = ROOT.TH1F("h_matched_geneta","",64,-3.2,3.2)
+hist_egid_genpt = ROOT.TH1F("h_egid_genpt","",20,0,100)
+hist_egid_geneta = ROOT.TH1F("h_egid_geneta","",64,-3.2,3.2)
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# Loop over events
+print " --> Matching gen particles to cl3d"
+for ev_idx in range( genTree.GetEntries() ):
+  if ev_idx % 10000 == 0: print " --> Processing event: %g/%g"%(ev_idx,genTree.GetEntries())
+  genTree.GetEntry(ev_idx)
+  cl3dTree.GetEntry(ev_idx)
+
+  #Extract number of gen particles + cl3d in event
+  N_gen = genTree.gen_n
+  N_cl3d = cl3dTree.cl3d_n
+
+  # Loop over gen particles
+  for gen_idx in range( N_gen ):
+
+    # Apply selection and add to total histogram
+    # If electron ...
+    if abs( genTree.gen_pdgid[gen_idx] ) in pdgid:
+      if genTree.gen_pt[gen_idx] > 30.:
+        
+          #Apply eta selection: slightly tighter
+          if abs( genTree.gen_eta[gen_idx] ) > (eta_regions['low'][0]+0.05) and abs( genTree.gen_eta[gen_idx] ) <= (eta_regions['high'][1]-0.05):
+
+          hist_tot_genpt.Fill( genTree.gen_pt[gen_idx] )
+          hist_tot_geneta.Fill( genTree.gen_eta[gen_idx] )
+
+          # Define 4 vector for gen particle
+          gen_p4 = ROOT.TLorentzVector()
+          gen_p4.SetPtEtaPhiE( genTree.gen_pt[gen_idx], genTree.gen_eta[gen_idx], genTree.gen_phi[gen_idx], genTree.gen_energy[gen_idx] )
+  
+          # Loop over clusters in events: pT > 20 GeV (choose highest pT cluster) and dR < 0.2
+          cl3d_genmatched_maxpt_idx = -1
+          cl3d_genmatched_maxpt = -999.
+          for cl3d_idx in range(N_cl3d):
+            # Apply selection
+            if cl3dTree.cl3d_pt[cl3d_idx] < 20.: continue
+            # Define 4 vector for cl3d
+            cl3d_p4 = ROOT.TLorentzVector()
+            cl3d_p4.SetPtEtaPhiE( cl3dTree.cl3d_pt[cl3d_idx], cl3dTree.cl3d_eta[cl3d_idx], cl3dTree.cl3d_phi[cl3d_idx], cl3dTree.cl3d_energy[cl3d_idx] )
+
+            # Require cluster to be in dR < 0.2
+            if gen_p4.DeltaR(cl3d_p4) < 0.2:
+            
+              # If cluster pT > present max then set
+              if cl3d_p4.Pt() > cl3d_genmatched_maxpt:
+                cl3d_genmatched_maxpt = cl3d_p4.Pt()
+                cl3d_genmatched_maxpt_idx = cl3d_idx
+
+          if cl3d_genmatched_maxpt_idx >= 0: #gen matched idx set: fill gen matched histo
+            hist_matched_genpt.Fill( genTree.gen_pt[gen_idx] )
+            hist_matched_geneta.Fill( genTree.gen_eta[gen_idx] )
+
+            # egid: use quality flag
+            if cl3dTree.cl3d_quality[cl3d_genmatched_maxpt_idx] > 0:
+              hist_egid_genpt.Fill( genTree.gen_pt[gen_idx] )
+              hist_egid_geneta.Fill( genTree.gen_eta[gen_idx] )
+         
+# END OF EVENTS LOOP
+print " --> Finished processing events..."
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# MAKE EFFICIENCY PLOTS
+print " --> Making efficiency plots: function of gen pt and gen eta..." 
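+# Start from the numerator histograms (gen particles whose matched cluster passes the egid)
+hist_eff_genpt = hist_egid_genpt.Clone("h_eff_genpt")
+hist_eff_geneta = hist_egid_geneta.Clone("h_eff_geneta")
+# Deal with uncertainties correctly: store the sum of squared weights per bin
+# before dividing, so that Divide() can propagate the bin errors
+hist_eff_genpt.Sumw2()
+hist_eff_geneta.Sumw2()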
+# Divide by total histogram
+hist_eff_genpt.Divide( hist_tot_genpt )
+hist_eff_geneta.Divide( hist_tot_geneta )
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# Make canvases
+ROOT.gStyle.SetOptStat(0)
+#genpt
+canv_genpt = ROOT.TCanvas("c_genpt","c_genpt")
+hist_eff_genpt.GetYaxis().SetRangeUser(opt.minimumEff,1.1)
+hist_eff_genpt.GetYaxis().SetTitle("(L1>thr. & matched to GEN)/GEN")
+hist_eff_genpt.GetYaxis().SetLabelSize(0.03)
+hist_eff_genpt.GetYaxis().SetTitleSize(0.04)
+hist_eff_genpt.GetYaxis().SetTitleOffset(1.0)
+hist_eff_genpt.GetXaxis().SetRangeUser(25,100)
+hist_eff_genpt.GetXaxis().SetTitle("p_{T}^{GEN}  [GeV]")
+hist_eff_genpt.GetXaxis().SetLabelSize(0.03)
+hist_eff_genpt.GetXaxis().SetTitleSize(0.04)
+hist_eff_genpt.GetXaxis().SetTitleOffset(0.95)
+
+hist_eff_genpt.SetMarkerColor(2)
+hist_eff_genpt.SetMarkerSize(1.3)
+hist_eff_genpt.SetMarkerStyle(34)
+hist_eff_genpt.SetLineColor(2)
+hist_eff_genpt.SetLineWidth(2)
+
+hist_eff_genpt.Draw("P Hist")
+
+#Draw a line at 100% efficiency
+line_genpt = ROOT.TLine(25,1,100,1)
+line_genpt.SetLineWidth(2)
+line_genpt.SetLineStyle(2)
+line_genpt.Draw("Same")
+
+#Latex
+lat_genpt = ROOT.TLatex()
+lat_genpt.SetTextFont(42)
+lat_genpt.SetLineWidth(2)
+lat_genpt.SetTextAlign(11)
+lat_genpt.SetNDC()
+lat_genpt.SetTextSize(0.04)
+lat_genpt.DrawLatex(0.12,0.85,"%s, %s, p_{T}^{L1} > 20GeV, p_{T}^{GEN} > 30GeV"%(opt.signalType.split("_")[1],opt.clusteringAlgo))
+
+#Legend
+leg_genpt = ROOT.TLegend(0.6,0.2,0.89,0.35)
+leg_genpt.SetFillColor(0)
+leg_genpt.SetLineColor(0)
+leg_genpt.AddEntry(hist_eff_genpt,"TPG (quality flag)","P")
+leg_genpt.Draw("Same")
+
+# Check if directory exists to make plots
+if not os.path.isdir("./plots"): os.system("mkdir plots")
+canv_genpt.SaveAs("./plots/efficiency_vs_genpt.png")
+canv_genpt.SaveAs("./plots/efficiency_vs_genpt.pdf")
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# gen eta
+canv_geneta = ROOT.TCanvas("c_geneta","c_geneta")
+hist_eff_geneta.GetYaxis().SetRangeUser(opt.minimumEff,1.1)
+hist_eff_geneta.GetYaxis().SetTitle("(L1>thr. & matched to GEN)/GEN")
+hist_eff_geneta.GetYaxis().SetLabelSize(0.03)
+hist_eff_geneta.GetYaxis().SetTitleSize(0.04)
+hist_eff_geneta.GetYaxis().SetTitleOffset(1.0)
+hist_eff_geneta.GetXaxis().SetRangeUser(-3.2,3.2)
+hist_eff_geneta.GetXaxis().SetTitle("#eta^{GEN}")
+hist_eff_geneta.GetXaxis().SetLabelSize(0.03)
+hist_eff_geneta.GetXaxis().SetTitleSize(0.04)
+hist_eff_geneta.GetXaxis().SetTitleOffset(0.95)
+
+hist_eff_geneta.SetMarkerColor(2)
+hist_eff_geneta.SetMarkerSize(1.3)
+hist_eff_geneta.SetMarkerStyle(34)
+hist_eff_geneta.SetLineColor(2)
+hist_eff_geneta.SetLineWidth(2)
+
+hist_eff_geneta.Draw("P Hist")
+
+#Draw a line at 100% efficiency
+line_geneta = ROOT.TLine(-3.2,1,3.2,1)
+line_geneta.SetLineWidth(2)
+line_geneta.SetLineStyle(2)
+line_geneta.Draw("Same")
+
+# Also lines for different eta regions
+eta_lines = {}
+eta_values = []
+for reg in eta_regions:
+  for bound in [0,1]:
+    if eta_regions[reg][bound] in eta_values: continue #avoid double lines
+    else: 
+      eta_values.append( eta_regions[reg][bound] )
+      eta_lines["%s_%g_neg"%(reg,bound)] = ROOT.TLine( -1*eta_regions[reg][bound], opt.minimumEff, -1*eta_regions[reg][bound], 1.03 )
+      eta_lines["%s_%g_neg"%(reg,bound)].SetLineWidth(2)
+      eta_lines["%s_%g_neg"%(reg,bound)].SetLineStyle(2)
+      eta_lines["%s_%g_neg"%(reg,bound)].Draw("Same")
+      eta_lines["%s_%g_pos"%(reg,bound)] = ROOT.TLine( eta_regions[reg][bound], opt.minimumEff, eta_regions[reg][bound], 1.03 )
+      eta_lines["%s_%g_pos"%(reg,bound)].SetLineWidth(2)
+      eta_lines["%s_%g_pos"%(reg,bound)].SetLineStyle(2)
+      eta_lines["%s_%g_pos"%(reg,bound)].Draw("Same")
+
+#Latex
+lat_geneta = ROOT.TLatex()
+lat_geneta.SetTextFont(42)
+lat_geneta.SetLineWidth(2)
+lat_geneta.SetTextAlign(11)
+lat_geneta.SetNDC()
+lat_geneta.SetTextSize(0.04)
+lat_geneta.DrawLatex(0.12,0.85,"%s, %s, p_{T}^{L1} > 20GeV, p_{T}^{GEN} > 30GeV"%(opt.signalType.split("_")[1],opt.clusteringAlgo))
+
+#Legend
+leg_geneta = ROOT.TLegend(0.6,0.2,0.89,0.35)
+leg_geneta.SetFillColor(0)
+leg_geneta.SetLineColor(0)
+leg_geneta.AddEntry(hist_eff_geneta,"TPG (quality flag)","P")
+leg_geneta.Draw("Same")
+
+# ./plots directory was already created above (before saving the gen pT plots)
+canv_geneta.SaveAs("./plots/efficiency_vs_geneta.png")
+canv_geneta.SaveAs("./plots/efficiency_vs_geneta.pdf")
+
+print " --> All plots saved!"
+
+leave()
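The plotting script above creates `./plots` by shelling out to `mkdir`. A minimal sketch of a portable alternative is given below; the helper name `ensure_dir` is hypothetical and not part of the repository, and the code is written to run under both Python 2 and Python 3.

```python
import errno
import os


def ensure_dir(path):
    """Create `path` (and any missing parents) if it does not already exist."""
    try:
        os.makedirs(path)
    except OSError as err:
        # Ignore "already exists"; re-raise anything else (e.g. permissions)
        if err.errno != errno.EEXIST or not os.path.isdir(path):
            raise
    return path
```

Calling `ensure_dir("./plots")` once before the first `SaveAs` would cover both the gen pT and the gen eta plots.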
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/plots/cl3d_pt_example.pdf b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/plots/cl3d_pt_example.pdf
new file mode 100644
index 0000000000000..4542289c2ba65
Binary files /dev/null and b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/plots/cl3d_pt_example.pdf differ
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/plots/efficiency_vs_geneta_example.pdf b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/plots/efficiency_vs_geneta_example.pdf
new file mode 100644
index 0000000000000..3ec607e482d95
Binary files /dev/null and b/L1Trigger/L1CaloTrigger/test/egid_hgcal/plotting/plots/efficiency_vs_geneta_example.pdf differ
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/setup.sh b/L1Trigger/L1CaloTrigger/test/egid_hgcal/setup.sh
new file mode 100644
index 0000000000000..db48ba138d9f7
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/setup.sh
@@ -0,0 +1,8 @@
+# HGCal L1T egid setup script
+
+# Setup environment
+source /cvmfs/cms.cern.ch/crab3/crab.sh
+export HGCAL_L1T_BASE=$PWD
+
+#set up grid proxy
+voms-proxy-init --rfc --voms cms
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/README.md b/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/README.md
new file mode 100644
index 0000000000000..4c2bd5b80b7ab
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/README.md
@@ -0,0 +1,45 @@
+# Training the e/gamma ID
+This subfolder contains the scripts to train the e/gamma ID BDT, convert it to an xml file, evaluate the new BDT, and extract the working points and ROC curves.
+
+The BDTs are trained with the `_train.root` output from the `cl3d_selection` subfolder. To train with electron as signal and pile-up as background:
+
+```
+python egid_training.py --signalType electron_200PU --backgroundType neutrino_200PU --bdtConfig baseline
+```
+
+The `--bdtConfig` option selects the set of input features. There are currently two options: `baseline` and `full`. The list of input features is defined at [L36](https://github.com/jonathon-langford/hgcal_l1t_egid/blob/master/training/egid_training.py#L36). To add a new configuration, add an entry to the `egid_vars` dictionary, keyed by `<signalType>_vs_<backgroundType>_<bdtConfig>`.
+
+By default, the signal and background clusters are reweighted so that the signal and background samples are in effect the same size. This can be turned off using `--reweighting 0`. The hyperparameters of the BDT can be changed using the `--trainParams` option, which takes a comma-separated list of colon-separated (hyper)parameter pairs.
+
+The `egid_training.py` script trains a BDT separately in the high and low eta regions, and outputs both as `.model` files in the `models` directory.
+
+## Converting to xml
+To convert the `.model` files to `.xml`:
+
+```
+python egid_to_xml.py --signalType electron_200PU --backgroundType neutrino_200PU --bdtConfig baseline
+```
+
+The output `.xml` files are stored in the `xml` directory. These can be added directly into the HGCal L1T TPG software in the [data](https://github.com/PFCal-dev/cmssw/tree/v3.13.4_1061p2/L1Trigger/L1THGCal/data) directory (keep the same naming convention e.g. egamma_id_histomax_370_higheta_v0.xml, where 370 corresponds to the version of the TPG software). You will then need to update [egammaIdentification.py](https://github.com/PFCal-dev/cmssw/blob/v3.13.4_1061p2/L1Trigger/L1THGCal/python/egammaIdentification.py) accordingly.
+
+## Evaluating the newly-trained BDTs
+To evaluate the trained BDTs on the test/all samples you can use the `egid_evaluate.py` script. This uses the output of `egid_to_xml.py` to describe the models. For example, if you have trained BDTs for both the `baseline` and `full` options, and want to evaluate both on the test sample:
+
+```
+python egid_evaluate.py --sampleType electron_200PU --dataset test --bdts electron_200PU_vs_neutrino_200PU:baseline,electron_200PU_vs_neutrino_200PU:full
+```
+
+The output is a flat ntuple of 3d clusters, with the new BDT scores. These are placed in the `results` directory.
+
+## Summarising the performance
+The `egid_summary.py` script prints the performance of a BDT to the screen, in terms of the background efficiency at various signal efficiency working points. The script can be used to compare different BDTs. For example, to look at the performance of the `baseline`, `full`, and original `tpg` BDTs you would run:
+
+```
+python egid_summary.py --inputMap electron_200PU,neutrino_200PU,Histomaxvardr,test --bdts electron_200PU_vs_neutrino_200PU:tpg:black,electron_200PU_vs_neutrino_200PU:baseline:blue,electron_200PU_vs_neutrino_200PU:full:red --outputROC 1
+```
+
+This takes as input the `_eval.root` files in the `results` directory, so make sure that you have evaluated all the BDTs that you want to summarise. The `--inputMap` option has the following format (signalType),(backgroundType),(cl3dAlgo),(sample), where sample can be train, test or all.
+
+The script will output the BDT score working points to the `wp` directory, for each BDT whose name contains the signal and background types given in `--inputMap`. These working points can then be added to the [egammaIdentification.py](https://github.com/PFCal-dev/cmssw/blob/v3.13.4_1061p2/L1Trigger/L1THGCal/python/egammaIdentification.py) script in the TPG software.
+
+Finally, if the `--outputROC` option is used (on by default), the script will draw the ROC curves on the same canvas for all BDTs specified. The curves are plotted with the colours defined in the `--bdts` option. 
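The `--bdts` and `--inputMap` formats described in this README can be sketched as two small parsers. This is only an illustration of the string formats: the function names `parse_bdts` and `parse_input_map` are hypothetical, and the scripts themselves split the strings inline rather than through helpers like these.

```python
def parse_bdts(bdts):
    """Split '<discriminator>:<config>[:<colour>],...' into (bdt_name, colour) pairs."""
    parsed = []
    for item in bdts.split(","):
        fields = item.split(":")
        if len(fields) < 2:
            raise ValueError("Expected <discriminator>:<config>[:<colour>], got '%s'" % item)
        # BDT name is the discriminator and the config joined by an underscore
        bdt_name = "%s_%s" % (fields[0], fields[1])
        # The colour is only given for the summary script; None otherwise
        colour = fields[2] if len(fields) > 2 else None
        parsed.append((bdt_name, colour))
    return parsed


def parse_input_map(input_map):
    """Split '<signalType>,<backgroundType>,<cl3dAlgo>,<sample>' into a dict."""
    fields = input_map.split(",")
    if len(fields) != 4:
        raise ValueError("Expected 4 comma-separated fields, got %d" % len(fields))
    return dict(zip(["signal", "background", "cl3d_algo", "dataset"], fields))
```

Validating the field count up front gives a clear error message for a malformed option, rather than an `IndexError` later in the script.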
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/egid_evaluate.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/egid_evaluate.py
new file mode 100644
index 0000000000000..3acd476fe90b8
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/egid_evaluate.py
@@ -0,0 +1,205 @@
+# Script to evaluate newly trained egid(s)
+#  > Input: selected clusters in cl3d_selection directory
+#  > Output: copy of ntuples + new BDT scores
+#  > Can evaluate multiple BDTs, just need xml input
+
+#usual imports
+import ROOT
+import numpy as np
+import pandas as pd
+import xgboost as xg
+import matplotlib.pyplot as plt
+from matplotlib import colors as mcolors
+import pickle
+from sklearn.preprocessing import LabelEncoder
+from sklearn.metrics import roc_auc_score, roc_curve
+from os import path, system
+import os
+import sys
+from array import array
+
+#Additional functions (if needed)
+from root_numpy import tree2array, fill_hist
+
+# Extract input variables to BDT from egid_training.py: if BDT config not defined there then will fail
+from egid_training import egid_vars, eta_regions
+
+# Define sample to tree mapping
+treeMap = {
+  "electron":"e_sig",
+  "photon":"g_sig",
+  "pion":"pi_bkg",
+  "neutrino":"pu_bkg"
+}
+
+# Configure options
+from optparse import OptionParser
+def get_options():
+  parser = OptionParser()
+  parser.add_option('--clusteringAlgo', dest='clusteringAlgo', default='Histomaxvardr', help="Clustering algorithm with which BDT was trained" )
+  parser.add_option('--sampleType', dest='sampleType', default='electron_200PU', help="Input sample type" )
+  parser.add_option('--bdts', dest='bdts', default='electron_200PU_vs_neutrino_200PU:baseline', help="Comma separated list of BDTs to evaluate. Format is <discriminator>:<config>,... e.g. electron_200PU_vs_neutrino_200PU:baseline,electron_200PU_vs_neutrino_200PU:full" )
+  parser.add_option('--dataset', dest='dataset', default='test', help="Ntuple to evaluate on [test,train,all]" )
+  return parser.parse_args()
+
+(opt,args) = get_options()
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# FUNCTIONS TO INITIATE AND EVALUATE BDTs
+
+# Initialisation: returns BDT and dict of input variables. Takes xml file name as input
+def initialise_egid_BDT( in_xml, in_var_names ):
+  # book mva reader with input variables
+  in_var = {}
+  for var in in_var_names: in_var[var] = array( 'f', [0.] )
+  # initialise TMVA reader and add variables
+  bdt_ = ROOT.TMVA.Reader()
+  for var in in_var_names: bdt_.AddVariable( var, in_var[var] )
+  # book mva with xml file
+  bdt_.BookMVA( "BDT", in_xml )
+  # return initialised BDT and input variables
+  return bdt_, in_var
+
+# Evaluation: calculates BDT score for 3D cluster taking bdt as input
+def evaluate_egid_BDT( _bdt, _bdt_var, in_cl3d, in_var_names ):
+  # Loop over input vars and extract values from tree
+  for var in in_var_names: _bdt_var[var][0]=getattr( in_cl3d, "%s"%var ) 
+  #return BDT score
+  return _bdt.EvaluateMVA("BDT")
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+def evaluate_egid():
+
+  print "~~~~~~~~~~~~~~~~~~~~~~~~ egid EVALUATE ~~~~~~~~~~~~~~~~~~~~~~~~"
+
+  # Extract bdt names from input list
+  bdt_list = []
+  for bdt in opt.bdts.split(","): bdt_list.append( "%s_%s"%(bdt.split(":")[0],bdt.split(":")[1]) )
+
+  # Check there is at least one input BDT
+  if len(bdt_list) == 0: 
+    print " --> [ERROR] No input BDT. Leaving..."
+    print "~~~~~~~~~~~~~~~~~~~~~ egid EVALUATE (END) ~~~~~~~~~~~~~~~~~~~~~"
+    sys.exit(1)
+
+  # Check bdts exist (as xml files), if so then add to dict
+  model_xmls = {}
+  for bdt_name in bdt_list:
+    #check if input variables are defined
+    if not bdt_name in egid_vars:
+      print " --> [ERROR] Input variables for BDT %s are not defined. Add key to egid_vars in training. Leaving..."%bdt_name
+      print "~~~~~~~~~~~~~~~~~~~~~ egid EVALUATE (END) ~~~~~~~~~~~~~~~~~~~~~"
+      sys.exit(1)
+    for reg in ['low','high']:
+      if not os.path.exists("./xml/egid_%s_%s_%seta.xml"%(bdt_name,opt.clusteringAlgo,reg)):
+        print " --> [ERROR] no xml file for BDT: %s. Leaving..."%bdt_name
+        print "~~~~~~~~~~~~~~~~~~~~~ egid EVALUATE (END) ~~~~~~~~~~~~~~~~~~~~~"
+        sys.exit(1)
+      else:
+        # passed checks: add xml to dict
+        model_xmls[ "%s_%seta"%(bdt_name,reg) ] = "./xml/egid_%s_%s_%seta.xml"%(bdt_name,opt.clusteringAlgo,reg)
+
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # CONFIGURE INPUT NTUPLE
+  # Define input ntuple
+  f_in_name = "%s/cl3d_selection/%s/%s_%s_%s.root"%(os.environ['HGCAL_L1T_BASE'],opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.dataset)
+  if not os.path.exists( f_in_name ):
+    print " --> [ERROR] Input ntuple %s does not exist. Leaving..."%f_in_name
+    print "~~~~~~~~~~~~~~~~~~~~~ egid EVALUATE (END) ~~~~~~~~~~~~~~~~~~~~~"
+    sys.exit(1)
+
+  # Extract trees
+  f_in = ROOT.TFile.Open( f_in_name )
+  t_in = f_in.Get( treeMap[ opt.sampleType.split("_")[0] ] )
+  print " --> Input ntuple %s read successfully"%f_in_name
+
+  
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # CONFIGURE OUTPUT NTUPLE
+  if not os.path.isdir("./results"):
+    print " --> Making ./results directory"
+    os.system("mkdir results")
+  if not os.path.isdir("./results/%s"%opt.sampleType):
+    print " --> Making ./results/%s directory to store ntuples with evaluated bdts"%opt.sampleType
+    os.system("mkdir results/%s"%opt.sampleType)
+  f_out_name = "./results/%s/%s_%s_%s_eval.root"%(opt.sampleType,opt.sampleType,opt.clusteringAlgo,opt.dataset)
+
+  # Variables to store in output ntuple
+  out_var_names = ['pt','eta','phi','clusters_n','showerlength','coreshowerlength','firstlayer','maxlayer','seetot','seemax','spptot','sppmax','szz','srrtot','srrmax','srrmean','emaxe']
+  # Add bdt score from TPG: i.e. one that was calculated in ntuple production
+  out_var_names.append( "bdt_tpg" )
+  # Add new bdt scores
+  for bdt_name in bdt_list: out_var_names.append( "bdt_%s"%bdt_name )
+
+  # Define dict to store output var
+  out_var = {}
+  for var in out_var_names: out_var[var] = array('f',[0.])
+  
+  #Open file: check if already exists (if so ask user if they want to rewrite)
+  if os.path.exists( f_out_name ):
+    recreate = int(raw_input("Output file %s already exists. Do you want to write over file [yes=1,no=0]:"%f_out_name))
+    if not recreate:
+      print " --> Move %s to a new folder then run again. Leaving..."%f_out_name
+      print "~~~~~~~~~~~~~~~~~~~~~ egid EVALUATE (END) ~~~~~~~~~~~~~~~~~~~~~"
+      sys.exit(1)
+  
+  f_out = ROOT.TFile.Open( f_out_name, "RECREATE" )
+  t_out = ROOT.TTree( treeMap[opt.sampleType.split("_")[0]], treeMap[opt.sampleType.split("_")[0]] )
+    
+  #Add branches to tree
+  for var_name, var in out_var.iteritems(): t_out.Branch("cl3d_%s"%var_name, var, "cl3d_%s/F"%var_name) 
+
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # INITIALISE BDTS
+  bdts = {}
+  bdt_input_variables = {} #dict of dicts to store input var for each bdt
+  # Loop over bdts
+  for b in bdt_list:
+    # Loop over eta regions
+    for reg in ['low','high']:
+      bdts["%s_%seta"%(b,reg)], bdt_input_variables["%s_%seta"%(b,reg)] = initialise_egid_BDT( model_xmls["%s_%seta"%(b,reg)], egid_vars[b] )
+      print " --> Initialised BDT (%s) in %s eta region"%(b,reg)
+
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # EVALUATE BDTS: + store output variables in tree
+  #Loop over clusters in input tree
+  for cl3d in t_in:
+  
+    #evaluate bdts
+    for b in bdt_list:
+ 
+      #Low eta region: use low eta bdt
+      if (abs(cl3d.cl3d_eta) > eta_regions['low'][0]) and (abs(cl3d.cl3d_eta) <= eta_regions['low'][1]):
+        out_var["bdt_%s"%b][0] = evaluate_egid_BDT( bdts["%s_loweta"%b], bdt_input_variables["%s_loweta"%b], cl3d, egid_vars[b] )
+
+      #High eta region: use high eta bdt
+      elif (abs(cl3d.cl3d_eta) > eta_regions['high'][0]) and (abs(cl3d.cl3d_eta) <= eta_regions['high'][1]):
+        out_var["bdt_%s"%b][0] = evaluate_egid_BDT( bdts["%s_higheta"%b], bdt_input_variables["%s_higheta"%b], cl3d, egid_vars[b] )
+
+      # Else: outside allowed eta range, give value of -999
+      else: out_var["bdt_%s"%b][0] = -999.
+
+    # Add all other variables to output ntuple
+    for var in out_var_names[:-1*len(bdt_list)]:
+      if "bdt_tpg" in var: out_var[var][0] = getattr(cl3d,"cl3d_bdteg")
+      else: out_var[var][0] = getattr(cl3d,"cl3d_%s"%var)
+
+    # Write cluster with new BDT scores to tree
+    t_out.Fill()
+
+  #END of loop over clusters
+  print " --> Evaluated BDT scores and saved in output: %s"%f_out_name
+
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # CLOSE FILES
+  f_in.Close()
+  f_out.Write()
+  f_out.Close()
+
+  print "~~~~~~~~~~~~~~~~~~~~~ egid EVALUATE (END) ~~~~~~~~~~~~~~~~~~~~~"
+# END OF EVALUATION FUNCTION
+
+# Main function for running program
+if __name__ == "__main__": evaluate_egid()
+
+
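The evaluation loop above picks the low- or high-eta BDT from the cluster's |eta| and falls back to -999 outside both regions. A minimal sketch of that selection is shown below. The boundaries in `ETA_REGIONS` are an assumption taken from the top-level README (1.5 < |eta| < 2.7 and 2.7 < |eta| < 3.0); the scripts import the real dictionary from `egid_training.py`. The names `select_eta_region` and `score_cluster` are hypothetical.

```python
# Assumed region boundaries (from the README); the real values live in egid_training.py
ETA_REGIONS = {"low": [1.5, 2.7], "high": [2.7, 3.0]}


def select_eta_region(eta, eta_regions=ETA_REGIONS):
    """Return 'low' or 'high' for lower < |eta| <= upper, or None if outside both."""
    abs_eta = abs(eta)
    for reg in ("low", "high"):
        lower, upper = eta_regions[reg]
        if lower < abs_eta <= upper:
            return reg
    return None


def score_cluster(eta, evaluators, default=-999.0):
    """Evaluate the region-specific callable, or return `default` outside the eta range."""
    reg = select_eta_region(eta)
    if reg is None:
        return default
    return evaluators[reg]()
```

Here `evaluators` stands in for the per-region TMVA readers: any mapping from region name to a zero-argument callable returning a score would do.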
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/egid_summary.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/egid_summary.py
new file mode 100644
index 0000000000000..79da524f13233
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/egid_summary.py
@@ -0,0 +1,272 @@
+# Script to summarise performance of egid(s)
+#  > Can be used to directly compare performance of trained and tpg egids
+#  > Input: *_eval.root files, output from egid_evaluate.py
+#  > Output: screen + .txt file defining working points
+
+#usual imports
+import ROOT
+import numpy as np
+import pandas as pd
+import xgboost as xg
+import matplotlib.pyplot as plt
+from matplotlib import colors as mcolors
+import pickle
+from sklearn.preprocessing import LabelEncoder
+from sklearn.metrics import roc_auc_score, roc_curve
+from os import path, system
+import os
+import sys
+from array import array
+
+#Additional functions (if needed)
+from root_numpy import tree2array, fill_hist
+
+# Extract input variables to BDT from egid_training.py: if BDT config not defined there then will fail
+from egid_training import egid_vars, eta_regions
+
+# Define sample to tree mapping
+treeMap = {
+  "electron":"e_sig",
+  "photon":"g_sig",
+  "pion":"pi_bkg",
+  "neutrino":"pu_bkg"
+}
+
+# HARDCODED: working points would like to output
+working_points = [0.995,0.975,0.95,0.9]
+
+# Configure options
+from optparse import OptionParser
+def get_options():
+  parser = OptionParser()
+  parser.add_option('--inputMap', dest='inputMap', default='electron_200PU,neutrino_200PU,Histomaxvardr,all', help='Comma separated list of input info. Format is <signalType>,<backgroundType>,<clustering Algo.>,<dataset [test,train,all]>')
+  parser.add_option('--bdts', dest='bdts', default='electron_200PU_vs_neutrino_200PU:baseline:blue', help="Comma separated list of BDTs to evaluate. Format is <discriminator>:<config>:<plot colour>... e.g. electron_200PU_vs_neutrino_200PU:baseline:blue,electron_200PU_vs_neutrino_200PU:full:red" )
+  parser.add_option('--outputROC', dest='outputROC', default=1, type='int', help="Display output ROC curves for egids [1=yes,0=no]" )
+  return parser.parse_args()
+
+(opt,args) = get_options()
+
+def leave():
+  print "~~~~~~~~~~~~~~~~~~~~~ egid SUMMARY (END) ~~~~~~~~~~~~~~~~~~~~~"
+  sys.exit(1)
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# FUNCTION TO EXTRACT PATH TO FILES
+def get_path( _i, _proc ): return "%s/training/results/%s/%s_%s_%s_eval.root"%(os.environ['HGCAL_L1T_BASE'],_i[_proc],_i[_proc],_i['cl3d_algo'],_i['dataset'])
+
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+def summary_egid():
+
+  print "~~~~~~~~~~~~~~~~~~~~~~~~ egid SUMMARY ~~~~~~~~~~~~~~~~~~~~~~~~"
+
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # CONFIGURE INPUTS
+
+  # Extract input info
+  info = {}
+  _i = opt.inputMap.split(",")
+  if len(_i)!=4:
+    print " --> [ERROR] Incorrect number of input elements. Please use format: <signalType>,<backgroundType>,<clustering Algo.>,<dataset [test,train,all]>"
+    leave()
+  info['signal'] = _i[0]
+  info['background'] = _i[1]
+  info['cl3d_algo'] = _i[2]
+  info['dataset'] = _i[3]
+
+  # Extract bdts names from input list and save plotting colour in map
+  bdt_list = []
+  bdt_colours = {}
+  for bdt in opt.bdts.split(","): 
+    bdt_name = "%s_%s"%(bdt.split(":")[0],bdt.split(":")[1])
+    bdt_list.append( bdt_name )
+    bdt_colours[ bdt_name ] = bdt.split(":")[2] 
+  #Check there is at least one input bdt
+  if len(bdt_list) == 0:
+    print " --> [ERROR] No input BDT. Leaving..."
+    leave()
+
+  # Define variables to store in dataFrame
+  stored_vars = ["cl3d_eta"]
+  for b in bdt_list: 
+    if "tpg" in b: stored_vars.append( "cl3d_bdt_tpg" )
+    else: stored_vars.append( "cl3d_bdt_%s"%b )
+
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # EXTRACT DATAFRAME FROM INPUT SIG AND BKG FILES
+  frames = {}
+  for proc in ['signal','background']:
+    # Extract signal and background files
+    if not os.path.exists( get_path(info,proc) ):
+      print " --> [ERROR] Input %s ntuple does not exist: %s. Please run egid_evaluate first! Leaving..."%(proc,get_path(info,proc))
+      leave()
+    iFile = ROOT.TFile( get_path(info,proc) )
+    iTree = iFile.Get( treeMap[ info[proc].split("_")[0] ] )
+    # Initialise new tree with frame variables
+    _file = ROOT.TFile("tmp.root","RECREATE")
+    _tree = ROOT.TTree("tmp","tmp")
+    _vars = {}
+    for var in stored_vars:
+      _vars[var] = array('f',[-1.])
+      _tree.Branch( '%s'%var, _vars[var], '%s/F'%var )
+    # Loop over clusters in input tree and fill new tree
+    for cl3d in iTree:
+      for var in stored_vars: _vars[ var ][0] = getattr( cl3d, '%s'%var )
+      _tree.Fill()
+    # Convert tree to dataFrame
+    frames[proc] = pd.DataFrame( tree2array(_tree) )
+    del _file
+    del _tree
+    system('rm tmp.root')      
+
+    # Add columns to dataFrame to label clusters
+    frames[proc]['proc'] = proc
+    frames[proc]['type'] = info[proc]
+
+  print " --> Extracted dataframes from signal and background input ntuples"
+  # Make one combined dataFrame
+  frames_list = []
+  for proc in ['signal','background']: frames_list.append( frames[proc] )
+  frameTotal = pd.concat( frames_list, sort=False )
+
+  # Split into eta regions
+  frames_splitByEta = {}
+  for reg in eta_regions: 
+    frames_splitByEta[reg] = frameTotal[ abs(frameTotal['cl3d_eta']) > eta_regions[reg][0] ]
+    frames_splitByEta[reg] = frames_splitByEta[reg][ abs(frames_splitByEta[reg]['cl3d_eta']) <= eta_regions[reg][1] ]
+  print " --> Dataframes split into eta regions"
+
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # DEFINE DICTS TO STORE EFFS FOR EACH BDT
+  eff_signal = {}
+  eff_background = {}
+  bdt_points = {}
+  wp_idx = {}
+
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # LOOP OVER BDTS CONFIGS
+  for b in bdt_list:
+
+    print " --> Calculating efficiencies for BDT: %s"%b
+
+    # sort frame according to BDT score
+    if "tpg" in b: bdt_var = "cl3d_bdt_tpg"
+    else: bdt_var = "cl3d_bdt_%s"%b
+
+    # Loop over eta regions
+    for reg, fr in frames_splitByEta.iteritems():
+
+      # Sort frame
+      fr = fr.sort_values(bdt_var)
+
+      # Create key name
+      key = "%s_%s"%(b,reg) 
+
+      # Initiate lists to store efficiencies
+      eff_signal[key] = [1.]
+      eff_background[key] = [1.]
+      bdt_points[key] = [-9999.]
+      wp_idx[key] = []
+
+      # Total number of signal and bkg events in eta region
+      N_sig_total, N_bkg_total = float(len(fr[fr['proc']=='signal'])), float(len(fr[fr['proc']=='background'])) 
+
+      # Iterate over rows in frame and calc eff for given bdt_points
+      N_sig_running, N_bkg_running = 0., 0.
+      for index, row in fr.iterrows():
+        # Add one to running counters depending on proc
+        if row['proc'] == 'signal': N_sig_running += 1.
+        elif row['proc'] == 'background': N_bkg_running += 1.
+        eff_s, eff_b = 1.-(N_sig_running/N_sig_total), 1.-(N_bkg_running/N_bkg_total)
+        # Only add one entry for each bdt output value: i.e. if same as previous then remove last entry
+        if row[bdt_var] == bdt_points[key][-1]:
+          bdt_points[key] = bdt_points[key][:-1]
+          eff_signal[key] = eff_signal[key][:-1] 
+          eff_background[key] = eff_background[key][:-1] 
+        # Add entry
+        bdt_points[key].append( row[bdt_var] )
+        eff_signal[key].append( eff_s )
+        eff_background[key].append( eff_b )
+
+      # Convert lists into numpy arrays
+      bdt_points[key] = np.asarray(bdt_points[key])
+      eff_signal[key] = np.asarray(eff_signal[key])
+      eff_background[key] = np.asarray(eff_background[key])
+
+      # Extract indices of working points
+      for wp in working_points: wp_idx[key].append( abs((eff_signal[key]-wp)).argmin() )
+      print " --> Extracted working points for BDT: %s, eta_region = %s"%(b,reg)
+
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # PRINT INFO TO USER
+  print " ~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~"  
+  print " --> INPUT: * signal          = %s"%info['signal']
+  print "            * background      = %s"%info['background']
+  print "            * cl3d_algo       = %s"%info['cl3d_algo']
+  print "            * dataset         = %s"%info['dataset']
+  print " ~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~"  
+  print ""
+  print "   ~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~>,~.,~.,~.,~"
+  for b in bdt_list:
+    print "   --> BDT:   * discriminator = %s"%("_".join(b.split("_")[:-1]))
+    print "              * config        = %s"%b.split("_")[-1]
+    for reg in eta_regions:
+      key = "%s_%s"%(b,reg)
+      print ""
+      print "   --> Eta region: %s --> %.2f < |eta| < %.2f"%(reg,eta_regions[reg][0],eta_regions[reg][1])
+      print "      --> Working points:"
+      for wp_itr in range(len(working_points)):
+        wp = working_points[wp_itr]
+        print "                  * At epsilon_s = %4.3f ::: BDT cut = %8.7f, epsilon_b = %5.4f"%(wp,bdt_points[key][wp_idx[key][wp_itr]],eff_background[key][wp_idx[key][wp_itr]])
+    
+    print "   ~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~.,~>,~.,~.,~.,~"
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # SAVE WORKING POINTS TO TXT FILE
+  # only save if signal and background match what was used to train BDT
+  if not os.path.isdir("./wp"): os.system("mkdir wp")
+  for b in bdt_list:
+
+    if ( info['signal'] in b ) and ( info['background'] in b ):
+      print " --> Saving working points to .txt files: %s"%b
+      f_out = open("./wp/%s_wp.txt"%b,"w")
+      f_out.write("Working Points: %s\n"%b)
+      for reg in eta_regions:
+        key = "%s_%s"%(b,reg)
+        f_out.write(" --> Eta region: %s\n"%reg)
+        for wp_itr in range(len(working_points)):
+          wp = working_points[wp_itr]
+          f_out.write("          * %.1f : %8.7f\n"%((wp*100),bdt_points[key][wp_idx[key][wp_itr]])) 
+      f_out.close()
+    else: continue
+
+  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # PLOT ROC CURVES
+  if opt.outputROC:
+
+    if not os.path.isdir("%s/plotting/plots"%os.environ['HGCAL_L1T_BASE']): os.system("mkdir %s/plotting/plots"%os.environ['HGCAL_L1T_BASE'])
+
+    print " --> Plotting ROC curves"
+    # Plot high and low eta regions separately
+    plt_itr = 1
+    for reg in eta_regions:
+      plt.figure(plt_itr)
+      for b in bdt_list:
+        key = "%s_%s"%(b,reg)
+        _label = b
+        plt.plot( eff_signal[key], 1-eff_background[key], label=_label, color=bdt_colours[b] )
+      plt.xlabel(r'Signal Eff. ($\epsilon_s$)')
+      plt.ylabel(r'1 - Background Eff. ($1-\epsilon_b$)')
+      plt.title(r'%.2f$ < |\eta| < $%.2f'%(eta_regions[reg][0],eta_regions[reg][1]))
+      axes = plt.gca()
+      axes.set_xlim([0.5,1.1])
+      axes.set_ylim([0.5,1.1])
+      plt.legend(bbox_to_anchor=(0.05,0.1), loc='lower left')
+      plt.savefig( "%s/plotting/plots/ROC_%seta.png"%(os.environ['HGCAL_L1T_BASE'],reg) )
+      plt.savefig( "%s/plotting/plots/ROC_%seta.pdf"%(os.environ['HGCAL_L1T_BASE'],reg) )
+      plt_itr += 1
+      print " --> Saved plot: %s/plotting/plots/ROC_%seta.(png/pdf)"%(os.environ['HGCAL_L1T_BASE'],reg)
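+  print "~~~~~~~~~~~~~~~~~~~~~ egid SUMMARY (END) ~~~~~~~~~~~~~~~~~~~~~"
+  sys.exit(0) # END OF SUMMARY FUNCTION: exit status 0 on success (leave() exits with status 1)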
+
+# Main function for running program
+if __name__ == "__main__": summary_egid()
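The summary script finds each working point by scanning clusters in order of increasing BDT score and picking the cut whose signal efficiency is closest to the target. A minimal pure-Python sketch of the same idea is given below, assuming a hypothetical function name `working_point` and plain lists of scores instead of a pandas dataframe. As in the script, the efficiency at a cut is the fraction of clusters with score strictly above it, and the scan starts from a sentinel cut of -9999 with both efficiencies equal to 1.

```python
from bisect import bisect_right


def working_point(sig_scores, bkg_scores, target_eff):
    """Return (cut, signal_eff, background_eff) with signal_eff closest to target_eff."""
    sig = sorted(sig_scores)
    bkg = sorted(bkg_scores)
    # Candidate cuts: every distinct score value seen in either sample
    cuts = sorted(set(sig) | set(bkg))
    # Sentinel starting point: no cut applied, both efficiencies are 1
    best = (abs(1.0 - target_eff), -9999.0, 1.0, 1.0)
    for cut in cuts:
        # bisect_right counts clusters with score <= cut, i.e. those that fail the cut
        eff_s = 1.0 - bisect_right(sig, cut) / float(len(sig))
        eff_b = 1.0 - bisect_right(bkg, cut) / float(len(bkg))
        dist = abs(eff_s - target_eff)
        # Strict comparison keeps the first (lowest) cut among equally close candidates
        if dist < best[0]:
            best = (dist, cut, eff_s, eff_b)
    return best[1:]
```

Using binary search on pre-sorted lists avoids the row-by-row running counters, at the cost of assuming both samples fit in memory as flat lists.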
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/egid_to_xml.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/egid_to_xml.py
new file mode 100644
index 0000000000000..c75915a5ec8f5
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/egid_to_xml.py
@@ -0,0 +1,81 @@
+# Script for converting egid xgboost model to xml file (to be used directly in TPG software)
+
+#usual imports
+import numpy as np
+import xgboost as xg
+import pickle
+import pandas as pd
+import ROOT as r
+from root_numpy import tree2array, testdata, list_branches, fill_hist
+from os import system, path
+import os
+import sys
+from optparse import OptionParser
+
+# Extract input variables to BDT from egid_training.py: if BDT config not defined there then will fail
+from egid_training import egid_vars
+
+# Configure options
+def get_options():
+  parser = OptionParser()
+  parser.add_option('--clusteringAlgo', dest='clusteringAlgo', default='Histomaxvardr', help="Clustering algorithm with which to optimise BDT" )
+  parser.add_option('--signalType', dest='signalType', default='electron_200PU', help="Input signal type" )
+  parser.add_option('--backgroundType', dest='backgroundType', default='neutrino_200PU', help="Input background type" )
+  parser.add_option('--bdtConfig', dest='bdtConfig', default='baseline', help="BDT config (accepted values: baseline/full)" )
+  return parser.parse_args()
+
+(opt,args) = get_options()
+
+# Function to convert model into xml
+def egid_to_xml():
+
+  print "~~~~~~~~~~~~~~~~~~~~~~~~ egid TO XML ~~~~~~~~~~~~~~~~~~~~~~~~"
+
+  #Define BDT name
+  bdt_name = "%s_vs_%s_%s"%(opt.signalType,opt.backgroundType,opt.bdtConfig)
+  # Check if model exists
+  if not os.path.exists("./models/egid_%s_%s_loweta.model"%(bdt_name,opt.clusteringAlgo)):
+    print " --> [ERROR] No model exists for this BDT: ./models/egid_%s_%s_loweta.model. Train first! Leaving..."%(bdt_name,opt.clusteringAlgo)
+    print "~~~~~~~~~~~~~~~~~~~~~ egid TO XML (END) ~~~~~~~~~~~~~~~~~~~~~"
+    sys.exit(1)
+  
+  elif not os.path.exists("./models/egid_%s_%s_higheta.model"%(bdt_name,opt.clusteringAlgo)):
+    print " --> [ERROR] No model exists for this BDT: ./models/egid_%s_%s_higheta.model. Train first! Leaving..."%(bdt_name,opt.clusteringAlgo)
+    print "~~~~~~~~~~~~~~~~~~~~~ egid TO XML (END) ~~~~~~~~~~~~~~~~~~~~~"
+    sys.exit(1) 
+
+  # Check if input vars for BDT name are defined
+  if not bdt_name in egid_vars: 
+    print " --> [ERROR] Input variables for BDT %s are not defined. Add key to egid_vars dict. Leaving..."%bdt_name
+    print "~~~~~~~~~~~~~~~~~~~~~ egid TO XML (END) ~~~~~~~~~~~~~~~~~~~~~"
+    sys.exit(1)
+
+  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # LOOP OVER ETA REGIONS
+  for reg in ['low','high']:
+  
+    print " --> Loading model for %s eta region: ./models/egid_%s_%s_%seta.model"%(reg,bdt_name,opt.clusteringAlgo,reg)    
+    egid = xg.Booster()
+    egid.load_model( "./models/egid_%s_%s_%seta.model"%(bdt_name,opt.clusteringAlgo,reg) )
+ 
+    #Define name of xml file to save
+    if not os.path.isdir("./xml"):
+      print " --> Making ./xml directory to store models as xml files"
+      os.system("mkdir xml")
+    f_xml = "./xml/egid_%s_%s_%seta.xml"%(bdt_name,opt.clusteringAlgo,reg)
+
+    # Convert to xml: using mlglue.tree functions
+    from mlglue.tree import tree_to_tmva, BDTxgboost, BDTsklearn
+    target_names = ['background','signal']
+    # FIXME: add options for saving BDT with user specified hyperparams
+    bdt = BDTxgboost( egid, egid_vars[bdt_name], target_names, kind='binary', max_depth=6, learning_rate=0.3 )
+    bdt.to_tmva( f_xml )
+
+    print " --> Converted to xml: ./xml/egid_%s_%s_%seta.xml"%(bdt_name,opt.clusteringAlgo,reg)
+  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # END OF LOOP OVER ETA REGIONS
+  print "~~~~~~~~~~~~~~~~~~~~~ egid TO XML (END) ~~~~~~~~~~~~~~~~~~~~~"
+# END OF TO_XML FILE
+
+# Main function for running program
+if __name__ == "__main__": egid_to_xml()
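The conversion script checks the low- and high-eta `.model` files one after the other before loading them. A minimal sketch of the naming convention and of a combined existence check follows; the helper names `model_paths` and `missing_models` are hypothetical, and the `exists` argument is only there so the check can be exercised without touching the filesystem.

```python
import os


def model_paths(signal, background, config, algo, base="./models"):
    """Build the per-region model file names used by the training scripts."""
    bdt_name = "%s_vs_%s_%s" % (signal, background, config)
    return dict(
        (reg, "%s/egid_%s_%s_%seta.model" % (base, bdt_name, algo, reg))
        for reg in ("low", "high")
    )


def missing_models(paths, exists=os.path.exists):
    """Return the sorted list of model files that are not present."""
    return sorted(p for p in paths.values() if not exists(p))
```

Reporting all missing files at once tells the user in a single run whether one or both eta regions still need to be trained.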
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/egid_training.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/egid_training.py
new file mode 100644
index 0000000000000..263469a25da66
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/egid_training.py
@@ -0,0 +1,223 @@
+# Train egid (BDT: xgboost) for hgcal l1t: using shower shape variables
+# > Takes as input clusters which pass selection 
+# > Trains separately in different eta regions (1.5-2.7 and 2.7-3.0)
+
+#usual imports
+import ROOT
+import numpy as np
+import pandas as pd
+import xgboost as xg
+import matplotlib.pyplot as plt
+import pickle
+from sklearn.preprocessing import LabelEncoder
+from sklearn.metrics import roc_auc_score, roc_curve
+from os import path, system
+import os
+import sys
+from array import array
+from optparse import OptionParser
+
+#Additional functions (if needed)
+from root_numpy import tree2array, fill_hist
+
+
+# Configure options
+def get_options():
+  parser = OptionParser()
+  parser.add_option('--clusteringAlgo', dest='clusteringAlgo', default='Histomaxvardr', help="Clustering algorithm with which to optimise BDT" )
+  parser.add_option('--signalType', dest='signalType', default='electron_200PU', help="Input signal type" )
+  parser.add_option('--backgroundType', dest='backgroundType', default='neutrino_200PU', help="Input background type" )
+  parser.add_option('--bdtConfig', dest='bdtConfig', default='baseline', help="BDT config (accepted values: baseline/full)" )
+  parser.add_option('--reweighting', dest='reweighting', default=1, type='int', help="Boolean to perform re-weighting of clusters to equalise signal and background [yes=1 (default), no=0]" )
+  parser.add_option('--trainParams',dest='trainParams', default=None, help='Comma-separated list of colon-separated pairs corresponding to (hyper)parameters for the training')
+  return parser.parse_args()
+
+# HARDCODED: input variables to the BDT for each config. Specify the config in the options. To try a new BDT with different input variables, add another key to the dict
+egid_vars = {"electron_200PU_vs_neutrino_200PU_baseline":['cl3d_coreshowerlength','cl3d_firstlayer','cl3d_maxlayer','cl3d_srrmean'],'electron_200PU_vs_neutrino_200PU_full':['cl3d_coreshowerlength','cl3d_showerlength','cl3d_firstlayer','cl3d_maxlayer','cl3d_szz','cl3d_srrmean','cl3d_srrtot','cl3d_seetot','cl3d_spptot']}
+
+# Define eta regions for different trainings
+eta_regions = {"low":[1.5,2.7],"high":[2.7,3.0]}
+
+
+#Function to train xgboost model for HGCal L1T egid
+def train_egid():
+
+  (opt,args) = get_options()
+  print "~~~~~~~~~~~~~~~~~~~~~~~~ egid TRAINING ~~~~~~~~~~~~~~~~~~~~~~~~"
+
+  #Set numpy random seed
+  np.random.seed(123456)
+
+  # Training and validation fractions
+  trainFrac = 0.9
+  validFrac = 0.1
+
+  #Define BDT name
+  bdt_name = "%s_vs_%s_%s"%(opt.signalType,opt.backgroundType,opt.bdtConfig)
+  # Check if input vars for BDT name are defined
+  if bdt_name in egid_vars: print " --> Training BDT: %s"%bdt_name
+  else:
+    print " --> [ERROR] Input variables for BDT %s are not defined. Add key to egid_vars dict. Leaving..."%bdt_name
+    print "~~~~~~~~~~~~~~~~~~~~~ egid TRAINING (END) ~~~~~~~~~~~~~~~~~~~~~"
+    sys.exit(1)
+
+  #Dictionaries for sig+bkg type mappings
+  treeMap = {"electron":"e_sig","photon":"g_sig","pion":"pi_bkg","neutrino":"pu_bkg"}
+  procMap = {"electron":"signal", "photon":"signal", "pion":"background", "neutrino":"background"}
+
+  # Add input files to map
+  procFileMap = {}
+  procFileMap[ opt.signalType.split("_")[0] ] = "%s/cl3d_selection/%s/%s_%s_train.root"%(os.environ['HGCAL_L1T_BASE'],opt.signalType,opt.signalType,opt.clusteringAlgo)
+  procFileMap[ opt.backgroundType.split("_")[0] ] = "%s/cl3d_selection/%s/%s_%s_train.root"%(os.environ['HGCAL_L1T_BASE'],opt.backgroundType,opt.backgroundType,opt.clusteringAlgo)
+  procs = procFileMap.keys()
+
+  # Check if models and frames directories exist
+  if not os.path.isdir("./models"):
+    print " --> Making ./models directory to store trained egid models"
+    os.makedirs("./models")
+  if not os.path.isdir("./frames"):
+    print " --> Making ./frames directory to store pandas dataFrames"
+    os.makedirs("./frames")
+
+  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # EXTRACT DATAFRAMES FROM INPUT SELECTED CLUSTERS
+  trainTotal = None
+  trainFrames = {}
+  #extract the trees: turn them into arrays
+  for proc,fileName in procFileMap.iteritems():
+    trainFile = ROOT.TFile("%s"%fileName)
+    trainTree = trainFile.Get( treeMap[proc] )
+    #initialise new tree with only relevant variables
+    _file = ROOT.TFile("tmp.root","RECREATE")
+    _tree = ROOT.TTree("tmp","tmp")
+    _vars = {}
+    for var in egid_vars[bdt_name]:
+      _vars[ var ] = array( 'f', [-1.] )
+      _tree.Branch( '%s'%var, _vars[ var ], '%s/F'%var )
+    #Also add cluster eta to do eta splitting
+    _vars['cl3d_eta'] = array( 'f', [-999.] )
+    _tree.Branch( 'cl3d_eta', _vars['cl3d_eta'], 'cl3d_eta/F' )  
+
+    #loop over events in tree and add to tmp tree
+    for ev in trainTree:
+      for var in egid_vars[bdt_name]: _vars[ var ][0] = getattr( ev, '%s'%var )
+      _vars['cl3d_eta'][0] = getattr( ev, 'cl3d_eta' )
+      _tree.Fill()
+  
+    #Convert tmp tree to pandas dataFrame and delete tmp files
+    trainFrames[proc] = pd.DataFrame( tree2array( _tree ) )
+    del _file
+    del _tree
+    os.system('rm tmp.root')
+
+    #Add column to dataFrame to label clusters
+    trainFrames[proc]['proc'] = procMap[ proc ]
+    print " --> Extracted %s dataFrame from file: %s"%(proc,fileName)
+
+  #Create one total frame: i.e. concatenate signal and bkg
+  trainList = []
+  for proc in procs: trainList.append( trainFrames[proc] )
+  trainTotal = pd.concat( trainList, sort=False )
+  del trainFrames
+  print " --> Created total dataFrame: signal (%s) and background (%s)"%(opt.signalType,opt.backgroundType)
+
+  # Save dataFrames as pkl file
+  pd.to_pickle( trainTotal, "./frames/%s.pkl"%bdt_name )
+
+  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  # TRAIN MODEL: loop over different eta regions
+  print ""
+  for reg in eta_regions:
+
+    print " ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
+    print " --> Training for %s eta region: %2.1f < |eta| < %2.1f"%(reg,eta_regions[reg][0],eta_regions[reg][1])
+
+    #Impose eta cuts
+    train_reg = trainTotal[ abs(trainTotal['cl3d_eta'])>eta_regions[reg][0] ]
+    train_reg = train_reg[ abs(train_reg['cl3d_eta'])<=eta_regions[reg][1] ].copy() # copy so that a weight column can be added safely
+
+    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    # REWEIGHTING: 
+    if opt.reweighting:
+      print " --> Reweighting: equalise signal and background samples (same sum of weights)"
+      sum_sig = len( train_reg[ train_reg['proc'] == "signal" ].index )
+      sum_bkg = len( train_reg[ train_reg['proc'] == "background" ].index )
+      weights = list( map( lambda a: float(sum_sig+sum_bkg)/sum_sig if a == "signal" else float(sum_sig+sum_bkg)/sum_bkg, train_reg['proc'] ) ) # float() avoids integer division in python 2
+      train_reg['weight'] = weights 
+    else:
+      print " --> No reweighting: assuming same S/B as in input ntuples"
+    
+    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    # CONFIGURE DATASETS: shuffle to get train and validation
+    print " --> Configuring training and validation datasets"
+    label_encoder = LabelEncoder()
+    theShape = train_reg.shape[0]  
+    theShuffle = np.random.permutation( theShape )
+    egid_trainLimit = int(theShape*trainFrac)
+    egid_validLimit = int(theShape*validFrac)
+  
+    #Set up dataFrames for training BDT
+    egid_X = train_reg[ egid_vars[bdt_name] ].values
+    egid_y = label_encoder.fit_transform( train_reg['proc'].values )
+    if opt.reweighting: egid_w = train_reg['weight'].values
+
+    #Perform shuffle
+    egid_X = egid_X[theShuffle]
+    egid_y = egid_y[theShuffle]
+    if opt.reweighting: egid_w = egid_w[theShuffle]
+
+    #Define training and validation sets
+    egid_train_X, egid_valid_X, dummy_X = np.split(egid_X, [egid_trainLimit, egid_validLimit+egid_trainLimit] )
+    egid_train_y, egid_valid_y, dummy_y = np.split(egid_y, [egid_trainLimit, egid_validLimit+egid_trainLimit] )
+    if opt.reweighting: egid_train_w, egid_valid_w, dummy_w = np.split(egid_w, [egid_trainLimit, egid_validLimit+egid_trainLimit] )
+ 
+    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    # BUILDING THE MODEL
+    if opt.reweighting:
+      training_egid = xg.DMatrix( egid_train_X, label=egid_train_y, weight=egid_train_w, feature_names=egid_vars[bdt_name] )
+      validation_egid = xg.DMatrix( egid_valid_X, label=egid_valid_y, weight=egid_valid_w, feature_names=egid_vars[bdt_name] )
+    else:
+      training_egid = xg.DMatrix( egid_train_X, label=egid_train_y, feature_names=egid_vars[bdt_name] )
+      validation_egid = xg.DMatrix( egid_valid_X, label=egid_valid_y, feature_names=egid_vars[bdt_name] )
+
+    # extract training hyper-parameters for model from input option
+    trainParams = {}
+    trainParams['objective'] = 'binary:logistic'
+    trainParams['nthread'] = 1
+    paramExt = ''
+    if opt.trainParams:
+      paramExt = '__'
+      # Parse "param1:value1,param2:value2" into the xgboost parameter dict
+      for paramPair in opt.trainParams.split(","):
+        param, value = paramPair.split(":")
+        trainParams[param] = value
+        paramExt += '%s_%s__'%(param,value)
+      paramExt = paramExt[:-2]
+
+    # Train the model
+    print " --> Training the model: %s"%trainParams
+    egid = xg.train( trainParams, training_egid )
+    print " --> Done."
+
+    # Save the model
+    egid.save_model( './models/egid_%s_%s_%seta.model'%(bdt_name,opt.clusteringAlgo,reg) )
+    print " --> Model saved: ./models/egid_%s_%s_%seta.model"%(bdt_name,opt.clusteringAlgo,reg)
+
+    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    # CHECKING PERFORMANCE OF MODEL: using training and validation sets
+    egid_train_predy = egid.predict( training_egid )
+    egid_valid_predy = egid.predict( validation_egid )
+
+    print "    *************************************************"
+    print "    --> Performance: in %s eta region (%2.1f < |eta| < %2.1f)"%(reg,eta_regions[reg][0],eta_regions[reg][1])
+    print "      * Training set   ::: AUC = %5.4f"%roc_auc_score( egid_train_y, egid_train_predy )
+    print "      * Validation set ::: AUC = %5.4f"%roc_auc_score( egid_valid_y, egid_valid_predy )
+    print "    *************************************************"
+    print ""
+
+  #END OF LOOP OVER ETA REGIONS
+  print "~~~~~~~~~~~~~~~~~~~~~ egid TRAINING (END) ~~~~~~~~~~~~~~~~~~~~~"
+# END OF TRAINING FUNCTION
+
+# Main function for running program
+if __name__ == "__main__": train_egid()
diff --git a/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/mlglue/tree.py b/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/mlglue/tree.py
new file mode 100644
index 0000000000000..00a57887e18af
--- /dev/null
+++ b/L1Trigger/L1CaloTrigger/test/egid_hgcal/training/mlglue/tree.py
@@ -0,0 +1,514 @@
+import re
+import numpy as np
+from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
+from sklearn.tree import _tree
+
+class Tree:
+    """Represents a node in a decision tree, identified by a unique integer id
+    
+    Attributes:
+        children (list of int): The id-s of the children associated with this node
+        depth (int): The depth of this node in the tree
+        id (int): The unique id of this node
+        parent (int): The id of the parent node
+        payload (tuple): Describes what this node does,
+            i.e. is it a non-terminal node (cut) or a terminal (leaf) 
+    """
+    def __init__(self, id, children, parent, depth, payload):
+        self.id = id
+        self.children = children
+        self.parent = parent
+        self.depth = depth
+        self.payload = payload
+
+    def __repr__(self):
+        return "Tree id:{id} children:{children} parent:{parent}, depth:{depth}, payload:{payload}".format(**{
+            "id": self.id,
+            "children": str(self.children),
+            "parent": self.parent,
+            "depth": self.depth,
+            "payload": self.payload,
+        })
+
+    def print_out(self, node_dict):
+        """Recursively prints a node and its children, given a dictionary with all the available nodes
+        
+        Args:
+            node_dict (dict id->node): All the available nodes
+        
+        Returns:
+            nothing
+        """
+        print (self.depth + 1) * "-" + str(self)
+        for ch in self.children:
+            node_dict[ch].print_out(node_dict)
+
+    def to_tmva(self, nodetree, scale):
+        """Writes out a TMVA-compatible XML string for a given node in the decision tree
+        
+        Args:
+            nodetree (dict int->Tree): The dictionary of the full tree
+            scale (float): A scaling coefficient for the TMVA leaves (TMVA = sklearn * scale)
+        
+        Returns:
+            string: XML with the node
+        
+        """
+
+        kind = "c"
+        if self.parent != -1:
+            idx = nodetree[self.parent].children.index(self.id)
+            if idx == 0:
+                kind = "l"
+            elif idx == 1:
+                kind = "r"
+        
+        #handle leaf (terminal) node
+        if len(self.children) == 0:
+
+            return '<Node pos="{0}" depth="{1}" NCoef="0" \
+    IVar="{2}" Cut="{3:17E}" cType="1" \
+    res="{4:17E}" rms="0.0e-00" \
+    purity="{5:.8E}" nType="-99">'.format(
+                kind,
+                self.depth + 1,
+                -1,
+                0.0,
+                self.payload[1] * scale,
+                0.0
+            )
+        #handle non-leaf node
+        else:
+            return '<Node pos="{0}" depth="{1}" NCoef="0" \
+    IVar="{2}" Cut="{3:17E}" cType="1" \
+    res="{4:17E}" rms="0.0" \
+    purity="{5:.8E}" nType="0">'.format(
+            kind,
+            self.depth + 1,
+            self.payload[1],
+            self.payload[2],
+            0.0, 0.0
+        )
+
+def sklearn_to_nodetree(cls, nodetree, sklearn_tree, node_id=0, parent_id=-1, depth=-1):
+    """Recursively converts a sklearn GradientBoosting{Classifier,Regressor} to a generic representation
+    
+    Args:
+        nodetree (dict id->Node): The output dictionary with the nodes
+        sklearn_tree (DecisionTreeRegressor): The input decision tree
+        node_id (int): the id of the root node
+        parent_id (int): the id of the parent node
+        depth (int): The current depth
+    
+    Returns:
+        dict int->Tree: The output node tree
+    """
+
+    #if the left (or right) child node id is -1, then this node is already a leaf node
+    if sklearn_tree.children_left[node_id] == _tree.TREE_LEAF:
+        n = Tree(
+            node_id,
+            [],
+            parent_id,
+            depth,
+            ("val", sklearn_tree.value[node_id][0,0]/cls.n_estimators)
+        )
+        nodetree[node_id] = n
+        if parent_id in nodetree:
+            nodetree[parent_id].children += [node_id]
+    #this is not a leaf node
+    else:
+        n = Tree(
+            node_id,
+            [],
+            parent_id,
+            depth,
+            ("cut", sklearn_tree.feature[node_id], sklearn_tree.threshold[node_id])
+        )
+        nodetree[node_id] = n
+        if parent_id in nodetree:
+            nodetree[parent_id].children += [node_id]
+
+    left_child = sklearn_tree.children_left[node_id]
+    right_child = sklearn_tree.children_right[node_id]
+    if left_child != _tree.TREE_LEAF:
+        sklearn_to_nodetree(cls, nodetree, sklearn_tree, left_child, node_id, depth+1)
+    if right_child != _tree.TREE_LEAF:
+        sklearn_to_nodetree(cls, nodetree, sklearn_tree, right_child, node_id, depth+1)
+
+    return nodetree
+
+def xgbtree_to_nodetree(tree):
+    """Converts an xgboost tree dump to an internal Tree representation
+    
+    Args:
+        tree (string): The model dump from xgboost using model.booster().get_dump()[ntree]
+    
+    Returns:
+        dict int->Tree: The tree structure
+    """
+    _NODEPAT = re.compile(r'(\d+):\[(.+)\]')
+    _LEAFPAT = re.compile(r'(\d+):(leaf=.+)')
+
+    parent_stack = []
+    prev_depth = -1
+    prev_index = -1
+    nodes = {}
+
+    #print 'ED DEBUG do we even get here?'
+    for node in tree.split("\n"):
+        node_depth = node.count("\t")
+
+        is_node = False
+        is_leaf = False
+
+        match = _NODEPAT.match(node.strip())
+        if match is not None:
+            node_index = int(match.group(1))
+            node_variable, threshold = match.group(2).split("<")
+            node_variable = int(node_variable.replace("f", ""))
+            threshold = float(threshold)
+            is_node = True
+
+        match = _LEAFPAT.match(node.strip())
+        if match is not None:
+            node_index = int(match.group(1))
+            val = float(match.group(2).split("=")[1])
+            is_leaf = True
+
+        if not (is_node or is_leaf):
+            continue
+
+        #keep track of the parent of this node
+        istack = prev_depth
+        while istack < node_depth:
+            parent_stack += [prev_index]
+            istack += 1
+        istack = node_depth
+        while istack < prev_depth:
+            parent_stack.pop()
+            istack += 1
+        my_parent = parent_stack[-1]
+
+        #create the node
+        if is_node:
+            nodes[node_index] = Tree(node_index, [], my_parent, node_depth, ("cut", node_variable, threshold))
+        elif is_leaf:
+            nodes[node_index] = Tree(node_index, [], my_parent, node_depth, ("val", val))
+
+        #insert node into final node dict
+        if my_parent in nodes:
+            nodes[my_parent].children += [node_index]
+
+        prev_depth = node_depth
+        prev_index = node_index
+        #print 'ED DEBUG: have made a tree with is_node = %s and is_leaf = %s'%(is_node,is_leaf)
+
+    #nodes[0].print_out(nodes)
+
+    return nodes
+
+class BDT(object):
+    def __init__(self, trees, kind, feature_names, target_names, max_depth, learning_rate):
+        self.trees = trees
+        self.kind = kind
+        self.ntrees = len(trees)
+
+        self.feature_names = feature_names
+        self.target_names = target_names
+
+        self.max_depth = max_depth
+        self.learning_rate = learning_rate
+
+
+
+    def to_tmva(self, outfile_name, mva_name="bdt"):
+
+        #Create list of variables
+        #we assume that all variables are 'simple', that is, not expressions
+        varstring = ""
+        for i in range(len(self.feature_names)):
+            varstring += '<Variable VarIndex="{0}" Expression="{1}" Label="{1}" Title="{1}" Unit="" Internal="{1}" Type="F" Min="{2:.64E}" Max="{3:.64E}"/>\n'.format(
+                i, self.feature_names[i], 0, 0
+            )
+
+        if self.kind == "regression":
+            class_string = ""
+            num_classes = 1
+            analysis_type = "Regression"
+
+            #for regression, just one class
+            for icls, clsname in enumerate(["Regression"]):
+                class_string += '<Class Name="{0}" Index="{1}"/>\n'.format(
+                    clsname, icls
+                )
+
+            #as many targets as given (n>1: vector valued regression)
+            target_string = ""
+            num_targets = len(self.target_names)
+            if num_targets > 1:
+                raise Exception("TMVA does not support regression with vector values, need to specify a scalar target")
+            for itgt, tgtname in enumerate(self.target_names):
+                target_string += '<Target Name="{0}" TargetIndex="{1}" Expression="{0}" Label="{0}" Title="{0}" Unit="" Internal="{0}" Type="F" Min="{2:.64E}" Max="{3:.64E}"/>\n'.format(
+                    tgtname, itgt, 0.0, 0.0
+                )
+
+        elif self.kind == "binary" or self.kind == "multiclass":
+            class_string = ""
+            num_classes = len(self.target_names)
+
+            #Decide between multiclass or binary
+            if self.kind == "binary":
+                analysis_type = "Classification"
+            elif self.kind == "multiclass":
+                analysis_type = "Multiclass"
+
+            for icls, clsname in enumerate(self.target_names):
+                class_string += '<Class Name="{0}" Index="{1}"/>\n'.format(
+                    clsname, icls
+                )
+            num_targets = 0
+            target_string = ""
+
+          
+        outfile = open(outfile_name, "w")
+        outfile.write(
+        """
+        <?xml version="1.0"?>
+        <MethodSetup Method="BDT::{mva_name}">
+        <GeneralInfo>
+        <Info name="TMVA Release" value=""/>
+        <Info name="ROOT Release" value=""/>
+        <Info name="Creator" value="mlglue"/>
+        <Info name="Date" value=""/>
+        <Info name="Host" value=""/>
+        <Info name="Dir" value=""/>
+        <Info name="Training events" value="-1"/>
+        <Info name="TrainingTime" value="-1"/>
+        <Info name="AnalysisType" value="{analysis_type}"/>
+        </GeneralInfo>
+        <Options>
+        <Option name="NTrees" modified="Yes">{ntrees}</Option>
+        <Option name="MaxDepth" modified="Yes">{maxdepth}</Option>
+        <Option name="BoostType" modified="Yes">Grad</Option>
+        <Option name="Shrinkage" modified="Yes">{learnrate}</Option>
+        <Option name="UseNvars" modified="Yes">{usenvars}</Option>
+        </Options>
+
+        <Variables NVar="{nvars}">
+        {varstring}
+        </Variables>
+
+        <Classes NClass="{nclasses}">
+        {class_string}
+        </Classes>
+
+        <Targets NTrgt="{ntargets}">
+        {target_string}
+        </Targets>
+
+        <Transformations NTransformations="0"/>
+        <MVAPdfs/>
+        <Weights NTrees="{ntrees}" AnalysisType="1">
+        """.format(**{
+                "analysis_type": analysis_type,
+                "mva_name": mva_name,
+                "ntrees": self.ntrees,
+                "maxdepth": self.max_depth,
+                "usenvars": len(self.feature_names),
+                "nvars": len(self.feature_names),
+                "varstring": varstring,
+                "learnrate": self.learning_rate,
+                
+                "nclasses": num_classes,
+                "class_string": class_string,
+
+                "ntargets": num_targets,
+                "target_string": target_string
+
+                }
+            )
+        )
+
+        #Loop over decision trees, in scikit that's a 2D array (N_estimators, N_classes)
+        #if binary classification, N_classes = 1
+        itree = 0
+        for tree in self.trees:
+            outfile.write(
+                '<BinaryTree type="DecisionTree" boostWeight="0.0" itree="{0}">\n'.format(
+                    itree
+                )
+            )
+
+            #convert internal representation to TMVA tree
+            #leaf values are written as stored (scale = 1.0): no extra 1/N re-weighting is applied here
+            tree_to_tmva(outfile, tree, 0, 1.0)
+
+            outfile.write('</BinaryTree>\n')
+            itree += 1
+
+        #done with output
+        outfile.write("""
+          </Weights>
+        </MethodSetup>
+        """)
+        outfile.close()
+
+    def setup_tmva(self, bdtfile):
+        from ROOT import TMVA
+        self.reader = TMVA.Reader("!V")
+
+        self.vardict = {}
+        #all variables must be float32
+        for ivar in range(0, len(self.feature_names)):
+            self.vardict[ivar] = np.array([0], dtype=np.float32)
+            self.reader.AddVariable("f{0}".format(ivar), self.vardict[ivar])
+        self.tmva = self.reader.BookMVA("bdt", bdtfile)
+
+    def eval_tmva(self, features):
+        for ivar, varname in enumerate(self.feature_names):
+            self.vardict[ivar][0] = features[0, ivar]
+
+        if self.kind == "multiclass":
+            ret = self.reader.EvaluateMulticlass("bdt")
+            ret = np.array([r for r in ret])
+        elif self.kind == "binary":
+            ret = self.reader.EvaluateMVA("bdt")
+        elif self.kind == "regression":
+            ret = self.reader.EvaluateRegression("bdt")
+            ret = np.array([r for r in ret])
+        return ret
+
+class BDTxgboost(BDT):
+    def __init__(self, model, feature_names, target_names, kind=None, max_depth=None, learning_rate=None):
+        #print 'ED DEBUG: initiating the xgboost bdt conversion class'
+        self.model = model
+        if not kind:
+            if model.objective.startswith("binary:logistic"):
+                kind = "binary"
+            elif model.objective.startswith("multi:"):
+                kind = "multiclass"
+            else:
+                kind = "regression"
+            print model.objective, kind
+       
+        if not max_depth:
+            max_depth = model.max_depth
+
+        if not learning_rate:
+            learning_rate = model.learning_rate
+
+        trees = []
+        try: 
+            full_dump = model.booster().get_dump()
+        except (AttributeError, TypeError):
+            full_dump = model.get_dump()
+          
+        for tree_dump in full_dump:
+            tree = xgbtree_to_nodetree(tree_dump)
+            trees += [tree]
+
+        super(BDTxgboost, self).__init__(trees, kind, feature_names, target_names, max_depth, learning_rate)
+
+    def eval(self, features):
+        proba = self.model.predict(features)[:, 1]
+
+        #invert sigmoid
+        proba = -np.log(1.0/proba - 1.0)
+
+        #apply TMVA transformation
+        proba = 2.0 / (1.0 + np.exp(-2.0*proba)) - 1
+        
+        return proba
+
+class BDTsklearn(BDT):
+
+    def __init__(self, model, feature_names, target_names):
+        
+        self.model = model
+
+        kind = None
+        if isinstance(model, GradientBoostingRegressor):
+            kind = "regression"
+        elif isinstance(model, GradientBoostingClassifier):
+            if len(target_names) == 2:
+                kind = "binary"
+            else:
+                kind = "multiclass"
+
+        trees = []
+        #Loop over decision trees, in scikit that's a 2D array (N_estimators, N_classes)
+        for sklearn_trees in model.estimators_:
+             #write trees for different classes next to each other
+            for class_tree in sklearn_trees:
+                nodetree = {}
+                sklearn_to_nodetree(model, nodetree, class_tree.tree_, 0, -1, -1)
+                trees += [nodetree]
+
+        super(BDTsklearn, self).__init__(trees, kind, feature_names, target_names, model.max_depth, model.learning_rate)
+
+
+    def eval(self, vals):
+        """A TMVA-compatible evaluation function for a scikit-learn classifier
+        
+        Args:
+            vals (numpy array): An array (n_samples, n_features) of the input variables
+        
+        Returns:
+            numpy array: (n_samples, n_classes) array of the output
+        """
+        
+        #need to scale the same way as done in TMVA    
+        scale = 1.0 / self.model.n_estimators
+
+        if isinstance(self.model, GradientBoostingClassifier):
+            #multiclass classification
+            #according to TMVA::MethodBDT::GetMulticlassValues()
+            if self.model.n_classes_ > 2:
+                ret = np.zeros((vals.shape[0], self.model.n_classes_))
+                for iclass in range(self.model.n_classes_):
+                    for itree, t in enumerate(self.model.estimators_[:, iclass]):
+                        r = t.predict(vals)
+                        ret[:, iclass] += r * scale
+
+                norm = np.zeros(ret.shape)
+                for i in range(self.model.n_classes_):
+                    for j in range(self.model.n_classes_):
+                        if i != j:
+                            norm[:, i] += np.exp(ret[:, j] - ret[:, i])
+
+                ret = 1.0 / (1.0 + norm)        
+                return ret
+            #binary classification
+            elif self.model.n_classes_ == 2:
+                ret = np.zeros(vals.shape[0])
+
+                for itree, t in enumerate(self.model.estimators_[:, 0]):
+                    r = t.predict(vals)
+                    ret += r * scale
+                return 2.0/(1.0 + np.exp(-2.0 * ret)) - 1
+        elif isinstance(self.model, GradientBoostingRegressor):
+            ret = np.zeros((vals.shape[0], self.model.n_classes_))
+            for iclass in range(self.model.n_classes_):
+                for itree, t in enumerate(self.model.estimators_[:, iclass]):
+                    r = t.predict(vals)
+                    ret[:, iclass] += r * scale
+            return ret
+
+def tree_to_tmva(outfile, nodetree, current_node, scale):
+    """Recursively writes out a decision tree as XML
+    
+    Args:
+        outfile (file): Output file, must be writeable
+        nodetree (dict int->Tree): The dictionary with the nodes
+        current_node (int): current node ID
+        scale (float): The scale factor for each leaf
+    
+    Returns:
+        nothing
+    """
+    outfile.write((nodetree[current_node].depth + 1)*"    " + nodetree[current_node].to_tmva(nodetree, scale) + "\n")
+    for child in nodetree[current_node].children:
+        tree_to_tmva(outfile, nodetree, child, scale)
+    outfile.write((nodetree[current_node].depth + 1)*"    " + "</Node>\n")
+