Percentiles for multi-select are inaccurate #44

@jakemsnyder

Description

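Right now, when a user selects multiple counties, the app averages their percentiles to get the combined value. This is mathematically incorrect: a percentile is a rank within the full set of areas, not a quantity that can be averaged. We need to recalculate it from the underlying estimates.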

Per the SVI documentation, CDC uses the Excel function PERCENTRANK.INC on the corresponding EP field with 4 significant digits. Unfortunately, most of the EP fields are themselves percentages, so they cannot simply be summed or averaged across areas either.
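For reference, here is a minimal Python sketch of how PERCENTRANK.INC could be reproduced outside Excel. The function name `percentrank_inc` is hypothetical. Two behaviors are assumptions rather than something confirmed by the SVI documentation: that the result is truncated (not rounded) to the requested number of digits, and that values not present in the data are linearly interpolated between their neighbors.

```python
import bisect
import math


def percentrank_inc(values, x, significance=4):
    """Approximate Excel's PERCENTRANK.INC(values, x, significance).

    Assumptions: the result is truncated to `significance` digits, and a
    value that is not in `values` is interpolated between its neighbors.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    if x < data[0] or x > data[-1]:
        raise ValueError("x is outside the range of the data")

    # Number of values strictly smaller than x.
    below = bisect.bisect_left(data, x)

    if data[below] == x:
        # Exact match: rank is the share of the other values below x.
        rank = below / (n - 1)
    else:
        # x lies between two data points: interpolate between their ranks.
        lower, upper = data[below - 1], data[below]
        rank_lower = bisect.bisect_left(data, lower) / (n - 1)
        rank_upper = below / (n - 1)
        rank = rank_lower + (x - lower) / (upper - lower) * (rank_upper - rank_lower)

    # Truncate to the requested number of digits; the small epsilon guards
    # against floating-point results such as 6999.999999.
    scale = 10 ** significance
    return math.floor(rank * scale + 1e-9) / scale


print(percentrank_inc([1, 2, 3, 4, 5], 3))    # exact match
print(percentrank_inc([1, 2, 3, 4, 5], 2.5))  # interpolated
```

The interpolation branch matters for multi-select, because a recalculated percentage for a combined selection will usually not match any single county's value.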

To be fully accurate, we will need to go back to the estimate field, recalculate the percentage for the multi-county/tract selection, then calculate the percentile of that new percentage as it compares to the rest of the counties or tracts. The EP field calculation is also in the documentation above.
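The steps above could be sketched roughly as follows. All names and numbers here are made up for illustration; in particular, each area is assumed to carry an `(estimate, denominator)` pair, and the correct denominator for each EP field has to come from the SVI documentation. One design choice to flag as an assumption: the combined selection is treated as a single area ranked against the remaining areas, so its rank is the share of remaining areas with a smaller percentage, truncated to 4 digits.

```python
import math


def pooled_percentile(areas, selected_ids, significance=4):
    """Return (pooled percentage, percentile) for a multi-area selection.

    `areas` maps a hypothetical area id to (estimate, denominator).
    """
    selected = set(selected_ids)

    # 1. Sum the raw estimates and denominators over the selection.
    numerator = sum(areas[i][0] for i in selected)
    denominator = sum(areas[i][1] for i in selected)

    # 2. Recalculate the percentage for the combined selection.
    pooled_pct = 100 * numerator / denominator

    # 3. Rank the pooled percentage against the remaining areas.
    rest = [100 * e / d for i, (e, d) in areas.items() if i not in selected]
    below = sum(1 for pct in rest if pct < pooled_pct)
    rank = below / len(rest)

    scale = 10 ** significance
    return pooled_pct, math.floor(rank * scale + 1e-9) / scale


# Made-up counties: (estimate, denominator)
counties = {
    "A": (10, 100),     # 10%
    "B": (300, 1000),   # 30%
    "C": (150, 1000),   # 15%
    "D": (50, 1000),    # 5%
    "E": (400, 1000),   # 40%
}

pct, percentile = pooled_percentile(counties, ["A", "B"])
print(pct, percentile)
```

In this made-up example, A and B combine to 310 / 1100, about 28.2%, whereas averaging their two percentages would give 20%. The small county A pulls the simple average down far more than its population justifies, which is the same kind of distortion that averaging percentiles introduces.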

Including the estimate fields will not be possible at the tract level, as our mbtiles file is already at the maximum file size. We could include them at the county level, though.

At the tract level, we will still be able to do this for some fields. EP_PCI is the estimate, not a percentage, so we can aggregate it accurately. For some other fields, we just need to multiply the percentage by the population estimate to recover the underlying estimate. But there will be some fields for which we cannot calculate an accurate percentile (e.g. EP_CROWD, which requires the estimated household units as the denominator).
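A rough sketch of the two recoverable cases, using hypothetical keys `ep`, `pci`, and `totpop` on each tract. It assumes the EP field's denominator really is the total population, which needs to be checked field by field against the SVI documentation, and it assumes a per capita value is combined as a population-weighted mean. If the published percentages are rounded, the recovered estimates are only approximate.

```python
def recover_estimate(ep_percent, population):
    """Back out an approximate count from a percentage and its denominator."""
    return ep_percent / 100 * population


def pooled_percentage(tracts):
    """Recalculate a percentage for a selection of tracts.

    Only valid when the EP field's denominator is the total population.
    """
    total_pop = sum(t["totpop"] for t in tracts)
    total_est = sum(recover_estimate(t["ep"], t["totpop"]) for t in tracts)
    return 100 * total_est / total_pop


def pooled_per_capita(tracts):
    """Combine a per capita value as a population-weighted mean."""
    total_pop = sum(t["totpop"] for t in tracts)
    return sum(t["pci"] * t["totpop"] for t in tracts) / total_pop


# Made-up tracts for illustration only.
selection = [
    {"ep": 10.0, "pci": 20000, "totpop": 1000},
    {"ep": 30.0, "pci": 40000, "totpop": 3000},
]

print(pooled_percentage(selection))
print(pooled_per_capita(selection))
```

A field like EP_CROWD does not fit this pattern, because multiplying by the population does not recover a count whose denominator is household units.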

I'll work on identifying which fields we can calculate with the data we already have (and how to do so), and which fields we cannot calculate accurately with our current data.
