-
Notifications
You must be signed in to change notification settings - Fork 212
chore: refactor sensitive data capture #2647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Hey @luke-belton! 👋 |
|
Size Change: -3.68 kB (-0.07%) Total Size: 5.07 MB
ℹ️ View Unchanged
|
| export function getAugmentPropertiesFromElement(elem: Element): Properties { | ||
| const shouldCaptureEl = shouldCaptureElement(elem) | ||
| if (!shouldCaptureEl) { | ||
| if (isExplicitNoCapture(elem)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically, let ph-no-capture take precedence here. But otherwise if a user explicitly sets an attribute on an element then we should capture it as a property on the event.
| /* | ||
| * Check whether a DOM event should be "captured" or if it may contain sensitive data | ||
| * using a variety of heuristics. | ||
| * Check whether a DOM event should be "captured" using a variety of heuristics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasn't really checking for sensitive data
| expect(getSafeText(el)).toBe(`Mixed "double" and 'single' quotes`) | ||
| }) | ||
|
|
||
| it(`should collect text from sensitive elements with ph-include class`, () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding tests for a couple of cases we weren't really checking
| // don't include hidden or password fields | ||
| const type = (el as HTMLInputElement).type || '' | ||
| if (isString(type)) { | ||
| // it's possible for el.type to be a DOM element if el is a form with a child input[name="type"] | ||
| switch (type.toLowerCase()) { | ||
| case 'hidden': | ||
| return false | ||
| case 'password': | ||
| return false | ||
| } | ||
| } | ||
|
|
||
| // filter out data from fields that look like sensitive fields | ||
| const name = (el as HTMLInputElement).name || el.id || '' | ||
| // See https://github.com/posthog/posthog-js/issues/165 | ||
| // Under specific circumstances a bug caused .replace to be called on a DOM element | ||
| // instead of a string, removing the element from the page. Ensure this issue is mitigated. | ||
| if (isString(name)) { | ||
| // it's possible for el.name or el.id to be a DOM element if el is a form with a child input[name="name"] | ||
| const sensitiveNameRegex = | ||
| /^cc|cardnum|ccnum|creditcard|csc|cvc|cvv|exp|pass|pwd|routing|seccode|securitycode|securitynum|socialsec|socsec|ssn/i | ||
| if (sensitiveNameRegex.test(name.replace(/[^a-zA-Z0-9]/g, ''))) { | ||
| return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a more broader scope (i.e. safer) check for all of this in isSensitiveElement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing we'll need to consider here is whether we're introducing breaking behaviour
if someone is relying on old behaviour for PII avoidance we need to be sure they're still not capturing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a bit of a dilemma because I think the current behavior is broken anyway, meaning that in some cases we'd still capture sensitive data even when defended elsewhere by shouldCaptureElement. I still need to test a bit more to confirm exactly what conditions the current 'gap' happens under. But knowing it's broken in at least some cases already, this leaves us with a couple of options:
- fix the bug that could leak sensitive data based on the previous implementation
- just remove the previous implementation and try to have one path for sensitive data identification/capture going forwards (and don't rely on fuzzy heuristics like looking for suspect sub-strings in element
id/nameattributes!) - something else I'm missing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the joy of a public API in an SDK :)
|
|
||
| if (shouldCaptureElement(el) && !isSensitiveElement(el) && el.childNodes && el.childNodes.length) { | ||
| if ( | ||
| (isExplicitCapture(el) || (!isExplicitNoCapture(el) && !isSensitiveElement(el))) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see note below too, but the checks we removed from shouldCaptureElement are covered more broadly inside isSensitiveElement
| it(`should return true for elements with class "ph-include"`, () => { | ||
| const el = document!.createElement(`div`) | ||
| el.className = `test1 ph-include test2` | ||
| expect(isExplicitCapture(el)).toBe(true) | ||
| }) | ||
|
|
||
| it(`should return false for elements without class "ph-include"`, () => { | ||
| const el = document!.createElement(`div`) | ||
| el.className = `test1 test2` | ||
| expect(isExplicitCapture(el)).toBe(false) | ||
| }) | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we actually don't document ph-include anymore as far as I can see, but keeping it around for backward compatibility
| it(`should not include fields with sensitive names`, () => { | ||
| const sensitiveNames = [ | ||
| `cc_name`, | ||
| `card-num`, | ||
| `ccnum`, | ||
| `credit-card_number`, | ||
| `credit_card[number]`, | ||
| `csc num`, | ||
| `CVC`, | ||
| `Expiration`, | ||
| `password`, | ||
| `pwd`, | ||
| `routing`, | ||
| `routing-number`, | ||
| `security code`, | ||
| `seccode`, | ||
| `security number`, | ||
| `social sec`, | ||
| `SsN`, | ||
| ] | ||
| sensitiveNames.forEach((name) => { | ||
| input.name = '' | ||
| expect(shouldCaptureElement(input)).toBe(true) | ||
|
|
||
| input.name = name | ||
| expect(shouldCaptureElement(input)).toBe(false) | ||
| }) | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was in some ways redundant because we'd still capture the element with sensitive names inside elements_chain
| it('should collect augment from the hidden element value', () => { | ||
| const props = getAugmentPropertiesFromElement(hidden) | ||
|
|
||
| expect(props).toStrictEqual({ 'on-the-hidden': 'is on the hidden' }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a user has added a data-ph-capture-attribute attribute, then we should take that as explicit opt-in that they want the attribute captured - same for tests below
| const elTarget = document.createElement('span') | ||
| const elAnchor = document.createElement('a') | ||
| elAnchor.appendChild(elTarget) | ||
| elAnchor.setAttribute('id', 'password') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this probably would have failed before because of the checks inside shouldCaptureElement on 'sensitive names'
d7dfaa5 to
f3eb314
Compare
f3eb314 to
12f8b83
Compare
The logic for whether or not to capture certain elements (including sensitive data) was a bit confusing to follow, so trying to clear that up in this PR.
Originally I opened #2643 but that didn't really address the core issue.
In particular there were some cases such as that identified in #42648 where we'd identify sensitive data and prevent capturing of attributes, but still capture the actual elements (including attributes) inside
$elements_chain.Changes
shouldCaptureElement(the remaining logic is now split intoisExplicitNoCapture/isExplicitCapture)password,ccnum,ssnand blocked captureisSensitiveElementalready protects ALL input/select/textarea elements by typeshouldCaptureValuestill filters actual CC/SSN patterns from values$elements_chain) and caused false positives (e.g., blocking href capture anddata-ph-capture-attributeattributes from<a id="password" data-ph-capture-attribute-key="value">Reset password</a>)ph-no-capture/ph-sensitiveclasses still works as expectedRelease info Sub-libraries affected
Libraries affected
Checklist
If releasing new changes
pnpm changesetto generate a changeset file