Commit f224027
notes on eot (#56)

* start EoT page with appendix from EaT internal note
* update eot notes while drafting
* copy in Einar's notes from almost 2yr ago
* update EoT estimate description based off Einar's response
* link to issue

1 parent 53f0e6c commit f224027

File tree

5 files changed: +206 −0 lines changed

src/SUMMARY.md

Lines changed: 1 addition & 0 deletions

@@ -26,6 +26,7 @@
 - [Averages](physics/stats/averages.md)
 - [Resolution](physics/stats/resolution.md)
 - [Multi-Bin Exclusion with Combine](physics/stats/multi-bin-combine.md)
+- [Electrons on Target (EoT)](physics/stats/electrons-on-target.md)
 - [ECal](physics/ecal/intro.md)
 - [Layer Weights](physics/ecal/layer-weights.md)

src/physics/stats/electrons-on-target.md

Lines changed: 205 additions & 0 deletions
# Electrons on Target
The number of "Electrons on Target" (EoT) of a simulated or collected data sample
is an important measure of the quantity of data the sample represents.
Estimating this number depends on how the sample was constructed.

## Real Data Collection
For real data, the accelerator folks at SLAC construct a beam that has a few parameters we can
measure (or they can measure and then tell us):
- duty cycle \\(d\\): the fraction of time that the beam is delivering bunches
- rate \\(r\\): how quickly bunches are delivered
- multiplicity \\(\mu\\): the Poisson average number of electrons per bunch

The first two parameters can be used to estimate the number of bunches the beam delivered to LDMX
given the amount of time we collected data \\(t\\).
\\[
N_\text{bunch} = d \times r \times t
\\]

For an analysis that only inspects single-electron events, we can use the Poisson fraction of these
bunches that corresponds to one electron to estimate the number of EoT the analysis is inspecting.
\\[
N_\text{EoT} \approx \mu e^{-\mu} N_\text{bunch}
\\]
If we are able to include all bunches (regardless of the number of electrons), then we can replace
the Poisson fraction with the Poisson average.
\\[
N_\text{EoT} \approx \mu N_\text{bunch}
\\]
A more precise estimate is certainly possible, but has not been investigated or written down
to my knowledge.
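
For illustration, here is a minimal Python sketch of these three estimates; all the
parameter values are invented for the example, not measured LDMX beam parameters.

```python
import math

# Illustrative beam parameters (invented values, not LDMX measurements)
d = 0.5      # duty cycle: fraction of time the beam delivers bunches
r = 1.0e6    # bunch rate [bunches / second]
mu = 1.0     # Poisson average number of electrons per bunch
t = 3600.0   # data-collection time [seconds]

n_bunch = d * r * t                          # N_bunch = d * r * t
n_eot_single = mu * math.exp(-mu) * n_bunch  # only single-electron bunches
n_eot_all = mu * n_bunch                     # all bunches, any multiplicity

print(f'{n_bunch = :.3g} {n_eot_single = :.3g} {n_eot_all = :.3g}')
```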

## Simulation
For the simulation, we configure the sample generation with a certain number of electrons
per event (usually one) and we decide how many events to run for.
Complexity is introduced when we do more complicated generation procedures in order to
access rarer processes.

~~~admonish warning title="Single Electron Events"
All of the discussion below is for single-electron events.
Multiple-electron events are more complicated and have not been studied in as much detail.
~~~

In general, a good first estimate for the number of EoT a simulation sample represents is
\\[
N_\text{EoT}^\text{equiv} = \frac{N_\text{sampled}}{\sum w} N_\text{attempt}
\\]
where (see the helper sketched after this list)
- \\(N_\text{sampled}\\) is the number of events in the file(s) that constitute the sample
- \\(\sum w\\) is the sum of the event weights of those same events
- \\(N_\text{attempt}\\) is the number of events that were attempted when simulating
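
A one-line helper makes this bookkeeping explicit (a sketch; the function name and
signature are mine, not part of ldmx-sw):

```python
def n_eot_equiv(n_sampled: int, weight_sum: float, n_attempt: int) -> float:
    """Equivalent EoT of a sample: (N_sampled / sum of weights) * N_attempt."""
    return n_sampled / weight_sum * n_attempt
```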

~~~admonish note title="Finding Number of Attempted Events"
Currently, the number of attempted events is stored as the `num_tries_` member of
the `RunHeader`.

Samples created with ldmx-sw versions newer than v3.3.4 (>= v3.3.5) have an update
to the processing framework to store this information more directly
(in the `numTries_` member of the RunHeader for <= v4.4.7 and the `num_tries_` member
for newer).
Samples created with ldmx-sw versions older than v3.1.12 (<= v3.1.11) have access
to the "Events Began" field of the `intParameters_` member of the RunHeader.

The easiest way to know for certain the number of tries is to just set the maximum
number of events to your desired number of tries \\(N_\mathrm{attempt}\\) and limit the
number of tries per output event to one (`p.maxTriesPerEvent = 1` and `p.maxEvents = Nattempt`
in the config script).
~~~
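
For example, a minimal config sketch following the note above; the `Process` setup is
the usual ldmx-sw pattern, and every value besides the two parameter names quoted in
the note is illustrative.

```python
from LDMX.Framework import ldmxcfg

p = ldmxcfg.Process('sim')   # usual ldmx-sw config pattern
p.maxEvents = 1_000_000      # N_attempt: run exactly this many events...
p.maxTriesPerEvent = 1       # ...with a single try each, so tries == events
```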

### Inclusive
The number of EoT of an inclusive sample (a sample that has neither biasing nor filtering)
is just the number of events in the file(s) that constitute the sample.

The general equation above still works for this case.
Since there is no biasing, the weights for all of the events are one \\(w=1\\), so the sum
of the weights is equal to the number of events sampled \\(N_\text{sampled}\\).
Since there is no filtering, the number of events attempted \\(N_\text{attempt}\\) is also equal
to the number of events sampled.
\\[
N_\text{EoT}^\text{inclusive} = N_\text{sampled}
\\]
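
Plugging the inclusive case into the helper from above confirms the reduction
(the numbers are arbitrary):

```python
# inclusive: sum of weights == N_sampled and N_attempt == N_sampled
assert n_eot_equiv(n_sampled=10_000, weight_sum=10_000.0, n_attempt=10_000) == 10_000
```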
81+
82+
### Filtered
83+
Often, we filter out "uninteresting" events, but that should not change the EoT the sample
84+
represents since the "uninteresting" events still represent data volume that was processed.
85+
This is the motivation behind using number of events attempted \\(N_\text{attempt}\\) instead
86+
of just using the number in the file(s).
87+
Without any biasing, the weights for all of the events are one so the sum of the weights
88+
is again equal to the number of events sampled and the general equation simplifies to
89+
just the total number of attempts.
90+
\\[
91+
N_\text{EoT}^\text{filtered} = N_\text{attempt}
92+
\\]
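
Again with the helper from above (numbers arbitrary): unbiased weights make the first
factor one, so only the attempt count survives.

```python
# filtered but unbiased: sum of weights == N_sampled
assert n_eot_equiv(n_sampled=5_000, weight_sum=5_000.0, n_attempt=1_000_000) == 1_000_000
```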

### Biased
Generally, we find the equivalent electrons on target (\\(N_\text{EoT}^\text{equiv}\\))
by multiplying the attempted number of electrons on target (\\(N_\mathrm{attempt}\\))
by the biasing factor (\\(B\\)) that increases the rate of the process the sample focuses on.

In the thin-target regime, this is satisfactory since the process we care about either
happens (with the biasing factor applied) or does not (and is filtered out).
In thicker-target scenarios (like the Ecal), we need to account for events where
unbiased processes happen to a particle _before_ the biased process.
Geant4 accounts for this while stepping tracks through volumes with
biasing operators attached, and we include the weights of all steps in the overall event
weight in our simulation.
We can then calculate an effective biasing factor by dividing the total number of events
in the output sample (\\(N_\mathrm{sampled}\\)) by the sum of their event weights (\\(\sum w\\)).
In the thin-target regime (where nothing happens to a biased particle besides the
biased process), this equation reduces to the simpler \\(B N_\mathrm{attempt}\\) used in other
analyses since biased tracks in Geant4 begin with a weight of \\(1/B\\).
\\[
N_\text{EoT}^\text{equiv} = \frac{N_\mathrm{sampled}}{\sum w}N_\mathrm{attempt}
\\]
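
A quick numeric check of the thin-target reduction with the helper from above
(the biasing factor and sample sizes are arbitrary):

```python
import math

B = 1e8                      # arbitrary biasing factor for the check
n_sampled, n_attempt = 2_000, 1_000_000
weight_sum = n_sampled / B   # in this limit every event weight is 1/B
assert math.isclose(n_eot_equiv(n_sampled, weight_sum, n_attempt), B * n_attempt)
```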

## Event Yield Estimation
Each of the simulated events has a weight that accounts for the biasing that was applied.
This weight quantitatively estimates how many _unbiased_ events this single
_biased_ event represents.
Thus, if we want to estimate the total number of events produced for a desired EoT
(the "event yield"), we would sum the weights and then scale this weight sum by the ratio
between our desired EoT \\(N_\text{EoT}\\) and our actual simulated EoT \\(N_\text{attempt}\\).
\\[
N_\text{yield} = \frac{N_\text{EoT}}{N_\text{attempt}}\sum w
\\]
Notice that \\(N_\text{yield} = N_\text{sampled}\\) if we use
\\(N_\text{EoT} = N_\text{EoT}^\text{equiv}\\) from before.
Of particular importance, the scaling factor out front is constant across all events
for any single simulation sample, so (for example) we can apply it to the contents of a
histogram so that the bin heights represent the event yield within \\(N_\text{EoT}\\) events
rather than just the weight sum (which is equivalent to \\(N_\text{EoT} = N_\text{attempt}\\)).
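
As a sketch of the histogram version of this scaling (numpy-based; all names and
numbers here are illustrative, not from an actual sample):

```python
import numpy as np

n_attempt = 1_000_000        # simulated EoT of the sample
n_eot_desired = 4.0e14       # EoT we want the histogram to represent
scale = n_eot_desired / n_attempt   # constant across all events in the sample

rng = np.random.default_rng(seed=1)
observable = rng.normal(size=5_000)            # stand-in per-event observable
weights = np.full(observable.shape, 1.0e-8)    # stand-in per-event weights

# bin heights now estimate the event yield within n_eot_desired EoT
yield_hist, edges = np.histogram(observable, bins=50, weights=scale * weights)
```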

Generally, it's a bad idea to scale too far above the equivalent EoT of the sample, so usually
we keep generating more of a specific simulation sample until \\(N_\text{EoT}^\text{equiv}\\)
is above the desired \\(N_\text{EoT}\\) for the analysis.

## More Detail
This is a copy of work done by Einar Elén for [a software development meeting in Jan 2024](https://indico.fnal.gov/event/63045/).

The number of selected events in a sample \\(M = N_\text{sampled}\\) should be binomially distributed
with two parameters: the number of attempted events \\(N = N_\text{attempt}\\) and probability \\(p\\).
To make an EoT estimate from a biased sample with \\(N\\) events, we need to know
how the probability in the biased sample differs from the one in an inclusive sample.

Using "i" to stand for "inclusive" and "b" to stand for "biased",
there are two options that we have used in LDMX.
1. \\(p_\text{b} = B p_\text{i}\\) where \\(B\\) is the biasing factor.
2. \\(p_\text{b} = W p_\text{i}\\) where \\(W\\) is the ratio of the average event weights between the two samples. Since the inclusive sample has all event weights equal to one, \\(W = \sum_\text{b} w / N\\), so it represents the EoT estimate described above.

~~~admonish note title="Binomial Basics"
- Binomials are valid for distributions corresponding to some number of binary yes/no questions.
- When we select \\(M\\) events out of \\(N\\) generated, the probability estimate is just \\(M/N\\).

We want \\(C = p_\text{b} / p_\text{i}\\).

The ratio of two probability parameters is not usually well behaved, but the binomial distribution is special.
A 95% Confidence Interval can be reliably calculated for this ratio:
\\[
\text{CI}[\ln(C)] = 1.96 \sqrt{\frac{1}{M_i} - \frac{1}{N_i} + \frac{1}{M_b} - \frac{1}{N_b}}
\\]
This is good news since now we can extrapolate a 95% CI up to a large enough sample size using smaller
samples that are easier to generate.
~~~
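
To make the note concrete, a small sketch that computes the direct estimate of \\(C\\)
and its 95% CI from the four counts (the counts below are invented for illustration):

```python
import math

def c_with_ci(m_i: int, n_i: int, m_b: int, n_b: int):
    """Direct estimate of C = p_b / p_i with a 95% CI from the formula above."""
    ln_c = math.log((m_b / n_b) / (m_i / n_i))
    half_width = 1.96 * math.sqrt(1/m_i - 1/n_i + 1/m_b - 1/n_b)
    return math.exp(ln_c), (math.exp(ln_c - half_width), math.exp(ln_c + half_width))

# invented counts: M selected out of N attempted for each sample
print(c_with_ci(m_i=120, n_i=100_000_000, m_b=90_000, n_b=100_000_000))
```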

### Biasing and Filtering
For "normal" samples that use some combination of biasing and/or filtering, \\(W\\) is a good (unbiased)
estimator of \\(C\\) and thus the EoT estimate described above is also good (unbiased).

For example, a very common sample is the so-called "Ecal PN" sample where we bias and filter for a high-energy
photon to be produced in the target and then have a photon-nuclear interaction in the Ecal mimicking our missing
momentum signal.
We can use this sample (along with an inclusive sample of the same size) to directly calculate \\(C\\) with
\\(M/N\\) and compare that value to our two estimators.
Up to a sample size \\(N\\) of 1e8 (\\(10^8\\)), both options for estimating \\(C\\) look okay (first image),
but we can extrapolate out another order of magnitude and observe that only the second option \\(W\\) stays within
the CI (second image).

![EoT Estimate for ECal PN Sample on a Linear Scale](figs/eot/ecal-pn-linear-scale.png)
![EoT Estimate for ECal PN Sample Extrapolated on a LogLog Scale](figs/eot/ecal-pn-extrapolate-loglog-scale.png)

### Photon-Nuclear Re-sampling
Often we not only require a photon-nuclear interaction in the Ecal, but we also want
that photon-nuclear interaction to produce a specific type of interaction (usually specific types of particles
and/or how energetic those particles are), known as the "event topology".

In order to support this style of sample generation, ldmx-sw can be configured such that when the
PN interaction is happening, it is repeatedly re-sampled until the configured event topology is produced
and then the simulation continues.
The estimate \\(W\\) ends up being a slight over-estimate for samples produced via this more complicated process.
Specifically, a "Kaon" sample where there is no explicit biasing of the photon-nuclear interaction but
the photon-nuclear interaction is re-sampled until kaons are produced already shows a difference between the
"true" value for \\(C\\) and our two short-hand estimates from before (image below).

![EoT Estimate Failure for Kaon PN Resampling](figs/eot/kaon-pn-resampling.png)

The naive expected bias \\(B\\) is wrong because there is no biasing,
but the average event weight ratio estimate \\(W\\) is also wrong
in this case because the current (ldmx-sw v4.5.2) implementation of the re-sampling procedure updates the event
weights incorrectly.
[Issue #1858](https://github.com/LDMX-Software/ldmx-sw/issues/1858) documents what we believe is incorrect
and a path forward to fixing it.
In the meantime, just remember that if you are using this configuration of the simulation, the estimate for the
EoT explained above will be slightly higher than the "true" EoT.
You can repeat this experiment on a medium sample size (here 5e7 = 50M events) where an inclusive sample can
be produced to calculate \\(C\\) directly.
