Shared: Improvements to SensitiveDataHeuristics.qll #21806
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| --- | ||
| category: minorAnalysis | ||
| --- | ||
| * The sensitive data heuristics used to identify code that handles passwords and private data have been improved. Most of the changes permit more variations of established patterns, thereby finding more sensitive data. Queries that use the sensitive data library (for example `js/clear-text-logging`) may find more correct results and fewer false positive results after these changes. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| --- | ||
| category: minorAnalysis | ||
| --- | ||
| * The sensitive data heuristics used to identify code that handles passwords and private data have been improved. Most of the changes permit more variations of established patterns, thereby finding more sensitive data. Queries that use the sensitive data library (for example `py/clear-text-logging-sensitive-data`) may find more correct results and fewer false positive results after these changes. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| --- | ||
| category: minorAnalysis | ||
| --- | ||
| * The sensitive data heuristics used to identify code that handles passwords and private data have been improved. Most of the changes permit more variations of established patterns, thereby finding more sensitive data. Queries that use the sensitive data library (for example `rust/cleartext-logging`) may find more correct results and fewer false positive results after these changes. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -76,7 +76,7 @@ module HeuristicNames { | |
| string maybePassword() { | ||
| result = | ||
| "(?is).*(pass(wd|word|code|.?phrase)(?!.*question)|(auth(entication|ori[sz]ation)?).?key|oauth|" | ||
| + "api.?(key|token)|([_-]|\\b)mfa([_-]|\\b)).*" | ||
| + "api.?(key|tok)|([_-]|\\b)mfa([_-]|\\b)).*" | ||
| } | ||
|
|
||
| /** | ||
|
|
@@ -104,8 +104,9 @@ module HeuristicNames { | |
| // Geographic location - where the user is (or was) | ||
| "latitude|longitude|nationality|" + | ||
| // Financial data - such as credit card numbers, salary, bank accounts, and debts | ||
| "(credit|debit|bank|visa).?(card|num|no|acc(ou)?nt)|acc(ou)?nt.?(no|num|credit)|routing.?num|" | ||
| "(credit|debit|bank|visa).?(card|num|no|acc(ou)?nt)|(card|acc(ou)?nt).?(no|num|credit)|routing.?num|" | ||
| + "salary|billing|beneficiary|credit.?(rating|score)|([_-]|\\b)(ccn|cvv|iban)([_-]|\\b)|" + | ||
|
Contributor
Nit: The new regex accepts strings like
||
| "security.?code|" + | ||
| // Communications - e-mail addresses, private e-mail messages, SMS text messages, chat logs, etc. | ||
| // "e(mail|_mail)|" + // this seems too noisy | ||
| // Health - medical conditions, insurance status, prescription records | ||
|
|
@@ -145,13 +146,13 @@ module HeuristicNames { | |
| * suggesting nouns within the string do not represent the meaning of the whole string (e.g. a URL or a SQL query). | ||
| * | ||
| * We also filter out common words like `certain` and `concert`, since otherwise these could | ||
| * be matched by the certificate regular expressions. Same for `accountable` (account), or | ||
| * `secretarial` (secret). | ||
| * be matched by the certificate regular expressions. Same for `accountable` (account), | ||
| * `secretarial` (secret), `wildcard` (card), `coauthor` (oauth). | ||
|
geoffw0 marked this conversation as resolved.
|
||
| */ | ||
| string notSensitiveRegexp() { | ||
| result = | ||
| "(?is).*([^\\w$.-]|redact|censor|obfuscate|hash|md5|sha|random|((?<!un)(en))?(crypt|(?<!pass)code)|" | ||
| + "certain|concert|secretar|account(ant|ab|ing|ed)|file|path|([_-]|\\b)url).*" | ||
| "(?is).*([^\\w$.-]|redact|censor|obfuscate|hash|md5|sha|random|(?<!unen)crypt|(?<!un)encode|" + | ||
| "certain|concert|secretar|wildcard|coauthor|account(ant|ab|ing|ed)|(?<!pro)file|path|([_-]|\\b)url).*" | ||
|
Contributor
The new regex no longer accepts
||
| } | ||
|
|
||
| /** | ||
|
|
||
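The effect of the `notSensitiveRegexp()` change can be explored outside CodeQL. The following is a minimal sketch, assuming that `regexpMatch` requires the whole string to match (approximated here with Python's `re.fullmatch`) and that Python's `re` treats these particular constructs the same way. The two patterns are copied from the hunk above; the helper `is_filtered` and the sample identifiers are hypothetical and chosen only for illustration.

```python
import re

# Old and new bodies of notSensitiveRegexp(), copied from the diff.
# QL string escapes such as "\\w" become "\w" in a Python raw string.
OLD = (
    r"(?is).*([^\w$.-]|redact|censor|obfuscate|hash|md5|sha|random|"
    r"((?<!un)(en))?(crypt|(?<!pass)code)|"
    r"certain|concert|secretar|account(ant|ab|ing|ed)|file|path|([_-]|\b)url).*"
)
NEW = (
    r"(?is).*([^\w$.-]|redact|censor|obfuscate|hash|md5|sha|random|"
    r"(?<!unen)crypt|(?<!un)encode|"
    r"certain|concert|secretar|wildcard|coauthor|account(ant|ab|ing|ed)|"
    r"(?<!pro)file|path|([_-]|\b)url).*"
)


def is_filtered(pattern: str, name: str) -> bool:
    """Return True if `name` would be treated as not sensitive by `pattern`.

    Assumption: re.fullmatch stands in for CodeQL's whole-string regexpMatch.
    """
    return re.fullmatch(pattern, name) is not None


if __name__ == "__main__":
    # Hypothetical identifiers, not taken from the PR's test suite.
    samples = [
        "wildcard",
        "coauthor",
        "profile",
        "encoded_password",
        "unencrypted_password",
    ]
    for name in samples:
        print(f"{name:22} old={is_filtered(OLD, name)} new={is_filtered(NEW, name)}")
```

Under these assumptions, `wildcard` and `coauthor` are filtered only by the new pattern, while `profile` and `unencrypted_password` are filtered only by the old one; `encoded_password` is filtered by both.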
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| --- | ||
| category: minorAnalysis | ||
| --- | ||
| * The sensitive data heuristics used to identify code that handles passwords and private data have been improved. Most of the changes permit more variations of established patterns, thereby finding more sensitive data. Queries that use the sensitive data library (for example `swift/cleartext-logging`) may find more correct results and fewer false positive results after these changes. |
This no longer accepts `token`, e.g. `api-token`, but does accept `api-tok`, which seems somewhat strange. Should `tok` be `tok(en)?`?
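The behaviour of the `api.?(key|tok)` alternative in `maybePassword()` may be worth checking directly. Below is a minimal sketch, assuming `regexpMatch` is a whole-string match (approximated with Python's `re.fullmatch`). Only the `api` alternative is reproduced, wrapped in the same `(?is).*( ... ).*` frame as the original; the helper `matches` and the sample names are hypothetical.

```python
import re

# Only the "api" alternative of maybePassword(), before and after the change,
# kept inside the same (?is).*( ... ).* frame that the full pattern uses.
OLD = r"(?is).*(api.?(key|token)).*"
NEW = r"(?is).*(api.?(key|tok)).*"


def matches(pattern: str, name: str) -> bool:
    """Whole-string match, standing in for CodeQL's regexpMatch (assumption)."""
    return re.fullmatch(pattern, name) is not None


if __name__ == "__main__":
    # Hypothetical identifiers chosen for illustration.
    for name in ["api-token", "apiToken", "api_tok", "API_KEY", "capital"]:
        print(f"{name:10} old={matches(OLD, name)} new={matches(NEW, name)}")
```

Because the pattern ends in `.*`, the shorter `tok` still matches `api-token` in this sketch (the trailing `en` is absorbed by `.*`), and it additionally matches `api_tok`. Under the same assumption, `tok(en)?` would accept the same set of strings as `tok`.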