# Electrons on Target
The number of "Electrons on Target" (EoT) of a simulated or collected data sample
is an important measure of the quantity of data the sample represents.
Estimating this number depends on how the sample was constructed.

## Real Data Collection
For real data, the accelerator team at SLAC constructs a beam characterized by a few
parameters that we can measure (or that they measure and report to us).
- duty cycle \\(d\\): the fraction of time that the beam is delivering bunches
- rate \\(r\\): how quickly bunches are delivered
- multiplicity \\(\mu\\): the Poisson average number of electrons per bunch

The first two parameters can be used to estimate the number of bunches the beam delivered to LDMX
given the amount of time we collected data \\(t\\).
\\[
  N_\text{bunch} = d \times r \times t
\\]

For an analysis that only inspects single-electron events, we can use the Poisson fraction of these
bunches that contain exactly one electron to estimate the number of EoT the analysis is inspecting.
\\[
  N_\text{EoT} \approx \mu e^{-\mu} N_\text{bunch}
\\]
If we are able to include all bunches (regardless of the number of electrons), then we can replace
the Poisson fraction with the Poisson average.
\\[
  N_\text{EoT} \approx \mu N_\text{bunch}
\\]
A more precise estimate is certainly possible, but has not been investigated or written down
to my knowledge.
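As a quick sketch, the two estimates above can be computed directly from the three beam parameters. The parameter values here are hypothetical stand-ins for illustration, not real beam settings:

```python
import math

def n_bunch(duty_cycle: float, rate_hz: float, time_s: float) -> float:
    """Number of bunches delivered: N_bunch = d * r * t."""
    return duty_cycle * rate_hz * time_s

def n_eot_single_electron(mu: float, n_bunches: float) -> float:
    """EoT visible to a single-electron analysis: mu * exp(-mu) * N_bunch."""
    return mu * math.exp(-mu) * n_bunches

def n_eot_all_bunches(mu: float, n_bunches: float) -> float:
    """EoT if all bunches are included: mu * N_bunch."""
    return mu * n_bunches

# hypothetical values for illustration only
nb = n_bunch(duty_cycle=0.5, rate_hz=46.4e6, time_s=3600.0)
print(f"bunches delivered : {nb:.3g}")
print(f"single-e EoT      : {n_eot_single_electron(1.0, nb):.3g}")
print(f"all-bunch EoT     : {n_eot_all_bunches(1.0, nb):.3g}")
```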

## Simulation
For the simulation, we configure the sample generation with a certain number of electrons
per event (usually one) and we decide how many events to run for.
Complexity is introduced when we use more complicated generation procedures in order to
access rarer processes.

~~~admonish warning title="Single Electron Events"
All of the discussion below is for single-electron events.
Multiple-electron events are more complicated and have not been studied in as much detail.
~~~

In general, a good first estimate for the number of EoT a simulation sample represents is
\\[
N_\text{EoT}^\text{equiv} = \frac{N_\text{sampled}}{\sum w} N_\text{attempt}
\\]
where
- \\(N_\text{sampled}\\) is the number of events in the file(s) that constitute the sample
- \\(\sum w\\) is the sum of the event weights of those same events
- \\(N_\text{attempt}\\) is the number of events that were attempted when simulating
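As a sketch, this general estimate is a one-liner; the function name is my own for illustration, not an ldmx-sw API:

```python
def equivalent_eot(n_sampled: int, weight_sum: float, n_attempt: int) -> float:
    """General equivalent-EoT estimate: (N_sampled / sum(w)) * N_attempt."""
    return (n_sampled / weight_sum) * n_attempt

# inclusive or purely-filtered case: every weight is one, so sum(w) == N_sampled
print(equivalent_eot(n_sampled=1_000, weight_sum=1_000.0, n_attempt=50_000))  # -> 50000.0
```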

~~~admonish note title="Finding Number of Attempted Events"
Currently, the number of attempted events is stored as the `num_tries_` member of
the `RunHeader`.

Samples created with ldmx-sw versions newer than v3.3.4 (>= v3.3.5) have an update
to the processing framework that stores this information more directly
(in the `numTries_` member of the `RunHeader` for <= v4.4.7 and the `num_tries_` member
for newer versions).
Samples created with ldmx-sw versions older than v3.1.12 (<= v3.1.11) have access
to the "Events Began" field of the `intParameters_` member of the `RunHeader`.

The easiest way to know for certain the number of tries is to set the maximum
number of events to your desired number of tries \\(N_\mathrm{attempt}\\) and limit the
number of tries per output event to one (`p.maxTriesPerEvent = 1` and `p.maxEvents = Nattempt`
in the config script).
~~~

### Inclusive
The number of EoT of an inclusive sample (a sample that has neither biasing nor filtering)
is just the number of events in the file(s) that constitute the sample.

The general equation above still works for this case.
Since there is no biasing, the weights for all of the events are one (\\(w=1\\)), so the sum
of the weights is equal to the number of events sampled \\(N_\text{sampled}\\).
Since there is no filtering, the number of events attempted \\(N_\text{attempt}\\) is also equal
to the number of events sampled.
\\[
  N_\text{EoT}^\text{inclusive} = N_\text{sampled}
\\]

### Filtered
Often, we filter out "uninteresting" events, but that should not change the EoT the sample
represents since the "uninteresting" events still represent data volume that was processed.
This is the motivation behind using the number of events attempted \\(N_\text{attempt}\\) instead
of just using the number in the file(s).
Without any biasing, the weights for all of the events are one, so the sum of the weights
is again equal to the number of events sampled and the general equation simplifies to
just the total number of attempts.
\\[
  N_\text{EoT}^\text{filtered} = N_\text{attempt}
\\]


### Biased
Generally, we find the equivalent electrons on target (\\(N_\text{EoT}^\text{equiv}\\))
by multiplying the attempted number of electrons on target (\\(N_\mathrm{attempt}\\))
by the biasing factor (\\(B\\)) that increases the rate of the process the sample focuses on.

In the thin-target regime, this is satisfactory since the process we care about either
happens (with the biasing factor applied) or does not (and is filtered out).
In thicker-target scenarios (like the Ecal), we need to account for events where
unbiased processes happen to a particle _before_ the biased process.
Geant4 accounts for this while stepping tracks through volumes with
biasing operators attached, and we include the weights of all steps in the overall event
weight in our simulation.
We can then calculate an effective biasing factor by dividing the total number of events
in the output sample (\\(N_\mathrm{sampled}\\)) by the sum of their event weights (\\(\sum w\\)).
In the thin-target regime (where nothing happens to a biased particle besides the
biased process), this equation reduces to the simpler \\(B N_\mathrm{attempt}\\) used in other
analyses since biased tracks in Geant4 begin with a weight of \\(1/B\\).
\\[
N_\text{EoT}^\text{equiv} = \frac{N_\mathrm{sampled}}{\sum w}N_\mathrm{attempt}
\\]
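A small numerical check of the thin-target limit (all numbers hypothetical): if every sampled event carries weight \\(1/B\\), the general formula collapses to \\(B N_\mathrm{attempt}\\).

```python
import math

def equivalent_eot(n_sampled: int, weights: list[float], n_attempt: int) -> float:
    """N_EoT^equiv = N_sampled / sum(w) * N_attempt."""
    return n_sampled / sum(weights) * n_attempt

B = 1.0e3                   # biasing factor (hypothetical value)
n_attempt = 10_000          # number of events attempted
weights = [1.0 / B] * 500   # thin-target: every selected event has weight 1/B

# reduces to B * N_attempt, up to floating-point rounding
assert math.isclose(equivalent_eot(500, weights, n_attempt), B * n_attempt)
```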

## Event Yield Estimation
Each of the simulated events has a weight that accounts for the biasing that was applied.
This weight quantitatively estimates how many _unbiased_ events this single
_biased_ event represents.
Thus, if we want to estimate the total number of events produced for a desired EoT
(the "event yield"), we would sum the weights and then scale this weight sum by the ratio
between our desired EoT \\(N_\text{EoT}\\) and our actual simulated EoT \\(N_\text{attempt}\\).
\\[
N_\text{yield} = \frac{N_\text{EoT}}{N_\text{attempt}}\sum w
\\]
Notice that \\(N_\text{yield} = N_\text{sampled}\\) if we use
\\(N_\text{EoT} = N_\text{EoT}^\text{equiv}\\) from before.
Of particular importance, the scaling factor out front is constant across all events
for any single simulation sample, so (for example) we can apply it to the contents of a
histogram so that the bin heights represent the event yield within \\(N_\text{EoT}\\) events
rather than just the weight sum (which is equivalent to \\(N_\text{EoT} = N_\text{attempt}\\)).
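Since the factor \\(N_\text{EoT}/N_\text{attempt}\\) is the same for every event, it can be applied after histogramming. A sketch of that idea (the weights, observable values, and sample sizes are hypothetical stand-ins):

```python
import numpy as np

desired_eot = 1.0e14   # EoT we want the histogram to represent (hypothetical)
n_attempt   = 1.0e10   # EoT actually simulated (hypothetical)

# per-event weights and an observable to histogram (stand-ins for real data)
weights = np.array([1e-3, 2e-3, 1e-3, 5e-4])
observable = np.array([0.5, 1.5, 2.5, 2.7])

# fill a weighted histogram, then scale every bin by the constant factor
counts, edges = np.histogram(observable, bins=4, range=(0.0, 4.0), weights=weights)
counts *= desired_eot / n_attempt   # bin heights now estimate yield at desired_eot
print(counts)
```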

Generally, it's a bad idea to scale too far above the equivalent EoT of the sample, so usually
we keep generating more of a specific simulation sample until \\(N_\text{EoT}^\text{equiv}\\)
is above the desired \\(N_\text{EoT}\\) for the analysis.

## More Detail
This is a copy of work done by Einar Elén for [a software development meeting in Jan 2024](https://indico.fnal.gov/event/63045/).

The number of selected events in a sample \\(M = N_\text{sampled}\\) should be binomially distributed
with two parameters: the number of attempted events \\(N = N_\text{attempt}\\) and probability \\(p\\).
To make an EoT estimate from a biased sample with \\(N\\) events, we need to know
how the probability in the biased sample differs from the one in an inclusive sample.

Below, "i" stands for "inclusive" and "b" stands for "biased".
There are two options that we have used in LDMX.
1. \\(p_\text{b} = B p_\text{i}\\) where \\(B\\) is the biasing factor.
2. \\(p_\text{b} = W p_\text{i}\\) where \\(W\\) is the ratio of the average event weights between the two samples. Since the inclusive sample has all event weights equal to one, \\(W = \sum_\text{b} w / N\\), so it represents the EoT estimate described above.

~~~admonish note title="Binomial Basics"
- Binomials are valid for distributions corresponding to some number of binary yes/no questions.
- When we select \\(M\\) events out of \\(N\\) generated, the probability estimate is just \\(M/N\\).

We want \\(C = p_\text{b} / p_\text{i}\\).

The ratio of two probability parameters is not usually well behaved, but the binomial distribution is special.
A 95% confidence interval can be reliably calculated for this ratio:
\\[
  \text{CI}[\ln(C)] = 1.96 \sqrt{\frac{1}{M_i} - \frac{1}{N_i} + \frac{1}{M_b} - \frac{1}{N_b}}
\\]
This is good news since now we can extrapolate a 95% CI up to a large enough sample size using smaller
samples that are easier to generate.
~~~
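A sketch of how this interval could be evaluated, assuming the standard error-propagation form for the log of a ratio of binomial proportions (each proportion estimated by \\(M\\) selected out of \\(N\\) attempted); the counts below are hypothetical:

```python
import math

def ci95_ln_ratio(m_i: int, n_i: int, m_b: int, n_b: int) -> float:
    """95% CI half-width on ln(C) for C = p_b / p_i,
    where each p is estimated by M selected out of N attempted."""
    return 1.96 * math.sqrt(1 / m_i - 1 / n_i + 1 / m_b - 1 / n_b)

# hypothetical counts: inclusive and biased samples of 1e6 attempts each
half_width = ci95_ln_ratio(m_i=200, n_i=1_000_000, m_b=150_000, n_b=1_000_000)
print(f"ln(C) 95% CI half-width: {half_width:.4f}")
```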

### Biasing and Filtering
For "normal" samples that use some combination of biasing and/or filtering, \\(W\\) is a good (unbiased)
estimator of \\(C\\) and thus the EoT estimate described above is also good (unbiased).

For example, a very common sample is the so-called "Ecal PN" sample where we bias and filter for a high-energy
photon to be produced in the target and then have a photon-nuclear interaction in the Ecal mimicking our missing
momentum signal.
We can use this sample (along with an inclusive sample of the same size) to directly calculate \\(C\\) with
\\(M/N\\) and compare that value to our two estimators.
Up to a sample size \\(N\\) of \\(10^8\\), both options for estimating \\(C\\) look okay (first image),
but we can extrapolate out another order of magnitude and observe that only the second option \\(W\\) stays within
the CI (second image).

### Photon-Nuclear Re-sampling
Often we want to not only require there to be a photon-nuclear interaction in the Ecal, but we also want
that photon-nuclear interaction to produce a specific type of interaction (usually specific types of particles
and/or how energetic those particles are) -- known as the "event topology".

In order to support this style of sample generation, ldmx-sw can be configured such that when the
PN interaction is happening, it is repeatedly re-sampled until the configured event topology is produced,
and then the simulation continues.
The estimate \\(W\\) ends up being a slight over-estimate for samples produced via this more complicated process.
Specifically, a "Kaon" sample where there is no explicit biasing of the photon-nuclear interaction but
the photon-nuclear interaction is re-sampled until kaons are produced already shows a difference between the
"true" value for \\(C\\) and our two short-hand estimates from before (image below).

The overly-simple naive expected bias \\(B\\) is wrong because there is no biasing,
but the average event weight ratio estimate \\(W\\) is also wrong
in this case because the current (ldmx-sw v4.5.2) implementation of the re-sampling procedure updates the event
weights incorrectly.
[Issue #1858](https://github.com/LDMX-Software/ldmx-sw/issues/1858) documents what we believe is incorrect
and a path forward to fixing it.
In the meantime, just remember that if you are using this configuration of the simulation, the estimate for the
EoT explained above will be slightly higher than the "true" EoT.
You can repeat this experiment on a medium sample size (here 5e7 = 50M events) where an inclusive sample can
be produced to calculate \\(C\\) directly.