I need to hide some specific events from my search results. I am currently running a search to find out whether any credit card data is present in the logs, using a luhn lookup to fetch the results. But this creates hundreds of false positive alerts a day. Most of the false positives come from the session id or support id, which we know for sure are not card numbers.
As a solution, I extracted the session id and support id with a regex and assigned the value to a field named CARD. Then I use that field to check whether the result is a false positive; if it is, the result should not be displayed.
Here is my logic. Could you please correct it, as it is not working?
Note: the session id and support id are 18-20 alphanumeric characters, for example u8956742397238567a. If the digits pass the PAN check, a false positive alert is triggered.
rex field=orig_raw "\sessiond_id:\s<[a-z]?(?\d+) [a-z]?>" | eval falsepositive = if (PAN == CARD, "0" ,"1") | where falsepositive = "0"
Why not just do something like this:
index=*
| regex _raw = "(?:\d{4}-\d{4}-\d{4}-\d{4})|\d{16}"
| rex max_match=0 "(?<PotentialCCN>(?:\d{4}-\d{4}-\d{4}-\d{4})|\d{16})"
| table _raw "Support id" "Session id" And Any Other Field That Looks Like A CCN HERE
| mvexpand PotentialCCN
| eval FALSE_POSITIVE=case(PotentialCCN = 'Support id', 1,
PotentialCCN = 'Session id', 1,
...,
true(), 0)
| search FALSE_POSITIVE=0
hooray it worked.. thanks a lot Woodcock..
The RegEx should probably be adjusted so that a match must not be immediately preceded or followed by another digit.
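One way to read that suggestion, sketched in Python rather than SPL so it can be run outside Splunk: lookarounds reject a 16-digit run that is part of a longer digit string. The pattern and the helper name `find_candidates` are illustrative assumptions, not the regex used in the search above.

```python
import re

# Candidate card numbers: 16 contiguous digits or four dash-separated groups,
# not immediately preceded or followed by another digit.
CANDIDATE = re.compile(r"(?<!\d)(?:\d{4}-\d{4}-\d{4}-\d{4}|\d{16})(?!\d)")

def find_candidates(text):
    """Return every candidate card number found in text."""
    return CANDIDATE.findall(text)

# A 19-digit support id no longer yields a 16-digit slice.
print(find_candidates("Support id: 1235398573867863879"))   # → []
# An exact 16-digit run is still reported.
print(find_candidates("session_id:<5178081683798099>"))     # → ['5178081683798099']
```

The same `(?<!\d)` and `(?!\d)` lookarounds should carry over to the rex pattern, since Splunk uses PCRE, but that has not been verified here.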
If you were to share representative events, I am sure that we could create an optimal search that does what you need.
Hi Woodcock,
I hope this helps. Please let me know if you need any further information.
2017-02-16T08:30:25+00:00 abc-f5-01 crit dcc[12345]:0131000 [secsv] Request violations: HTTP protocol compliance failed. HTTP protocol compliance sub violations:N/A. Virus name:N/A. Support id: 1235398573867863879, source ip:10.50.20.20, xff, ip 10.228.200.1, source port: 56753, destination ip: 10.228.200.1, destination port:80, route_domain:0, HTTP classifer: /Common/mso-appfarm-MRP, scheme HTTP, geographic locations: request:,session_id:<340586286825705c>
Output of the alert:
Host- abc-f5-01
PAN - 340586286825705
IIN_issue - American Express
2017-02-16T09:30:25+00:00 abc-f5-01 crit dcc[5667]:0131078 [secsv] Request violations: HTTP protocol compliance failed. HTTP protocol compliance sub violations:N/A. Virus name:N/A. Support id: 8967869087298456298, source ip:10.50.20.20, xff, ip 10.228.200.1, source port: 56729, destination ip: 10.228.200.1, destination port:80, route_domain:0, HTTP classifer: /Common/mso-appfarm-MRP, scheme HTTP, geographic locations: request:,session_id:<5178081683798099>
Host- abc-f5-01
PAN - 5178081683798099
IIN_issue - Master Card
What I would do is identify the session Id at the very beginning, and then sed it out of the _raw so that none of the other algorithms need to be consulted. Probably save you a whole lot of lookup and cleaning time.
Do something like this at the very front of the search -
| eval orig_raw=_raw
| rex field=_raw mode=sed "s/session_id:\s*<?\w+>?//g"
Thanks DalJeanis. Valid point, but it was initially turned down by the Security team; I forget the actual reason. That's why I am going through this painful process.
Um, I'm not saying to change the actual _raw. I'm saying that the session ID is not relevant to what you are trying to detect, so remove the session ID from the data to be inspected before you pass it through the credit card number detector.
If Security has a problem with that strategy, just ask, bluntly, "What method would be used to hide a credit card number within a session ID, that could not be used to hide the credit card more thoroughly than that?"
If a bad actor has the ability to use the session ID or other fields for encryption, then they could use ANYTHING for encryption and you will never detect the card number.
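A minimal Python sketch of that "scrub first, then inspect" idea, for illustration outside Splunk. The session_id layout (optional whitespace and angle brackets around an alphanumeric token) is assumed from the sample events in this thread, and `digits_to_inspect` is a hypothetical helper name.

```python
import re

# Assumed session_id layout, based on the sample events: optional whitespace
# and optional angle brackets around an alphanumeric token.
SESSION_ID = re.compile(r"session_id:\s*<?\w+>?")
# Runs of 15 or 16 digits that are not part of a longer digit string.
DIGIT_RUN = re.compile(r"(?<!\d)\d{15,16}(?!\d)")

def digits_to_inspect(raw):
    """Strip session_id from a copy of the event, then list 15-16 digit runs."""
    scrubbed = SESSION_ID.sub("", raw)  # raw itself is left unchanged
    return DIGIT_RUN.findall(scrubbed)

event = "source port: 56753, geographic locations: request:,session_id:<340586286825705c>"
print(digits_to_inspect(event))  # → []
```

Only the scrubbed copy is inspected, so the original event text is still available for display in the alert.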
Where is falsepositive coming from? And are you extracting "orig_raw" correctly? I think you might be using rex wrong. You need to specify which field to run rex on with "rex field=", and that's probably _raw if you haven't extracted orig_raw. Then you'll want to specify a field name in your capture group "\sessiond_id:\s<[a-z]?(?\d+) [a-z]?>" etc.
not sure where you're getting this, I used this example:
| stats count | eval foo="tom,sally,foo"| makemv foo delim="," | eval bar="tom, bob, world"| makemv foo delim="," | makemv bar delim="," | mvexpand foo | mvexpand bar | eval equals=if(foo==bar,"1", "0") | where equals="1"
this gives the result:
count bar equals foo
0 tom 1 tom
you should also be able to skip a step and do:
| stats count | eval foo="tom,sally,foo"| makemv foo delim="," | eval bar="tom, bob, world"| makemv foo delim="," | makemv bar delim="," | mvexpand foo | mvexpand bar | where foo=bar
and forget the eval. If you need to escape the field name, wrap it in single quotes:
| stats count | eval foo="tom,sally,foo"| makemv foo delim="," | eval bar="tom, bob, world"| makemv foo delim="," | makemv bar delim="," | mvexpand foo | mvexpand bar | where foo='bar'
seems like your rex may not be mirroring the fields.
Hi bbingham, thanks for your prompt response. My actual search is this (there was a typo in the earlier one):
NOT sourcetype=stash | get_integer_seq
| lookup luhn_lookup _raw OUTPUTNEW pii,pii_clean | eval pii_length=len(pii_clean) | lookup iin_lookup iin as pii_clean,length as pii_length OUTPUTNEW iin_issuer | search iin_issuer=* | get_event_id
| rename event_id as orig_event_id | eval orig_raw=_raw | fields - _raw | fields + orig_event_id,orig_raw,host,pii,iin_issuer | eval pii_hash=sha1(pii) | eval orig_time=_time
| rex field=orig_raw "\sessiond_id:\s<[a-z]?(?CARD\d+) [a-z]?>" | eval falsepositive = if (PAN == CARD, "0" ,"1") | where falsepositive = "0" rename pii as PAN, orig_raw AS "Event log"
Alerts that are generating false positives:
2017-02-16T08:30:25+00:00 abc-f5-01 crit dcc[12345]................... session_id:
2017-02-16T09:30:25+00:00 abc-f5-01 crit dcc[12345]................... session_id:
2017-02-16T10:30:25+00:00 abc-f5-01 crit dcc[12345]................... session_id:<745648609869456bc>
2017-02-16T11:30:25+00:00 abc-f5-01 crit dcc[12345]................... session_id:<765756896789367389>
As per your reply I tried this (correct me if I am wrong): rex field=orig_raw "\sessiond_id:\s<(?[a-z]?)(?\d+)(?[a-z]?>" | eval CARD="char1,CARD,char2" | makemv CARD delim="" allowempty=t | --- here I am confused about which one I need to compare against.
k so what I did was make "fake" events, I was just proving your logic functions. all of this "| stats count | eval foo="tom,sally,foo"| makemv foo delim="," | eval bar="tom, bob, world"| makemv foo delim="," | makemv bar delim="," | mvexpand foo | mvexpand bar | " is just to make fake events, then I run the where clause as you had. In your case, I have a feeling the where clause is breaking due to something in the event piece of the search not extracting the vars right. Can you send a couple of events that have the fields you're looking for broken out? like:
timestamp host level pid session
2017-02-16T08:30:25+00:00 abc-f5-01 crit dcc[12345]................... session_id
and then explain what makes the false positive? Is it the fact there is no session id?
Thanks a lot for your prompt response. I will explain in detail. We are using the luhn_lookup command to check whether any given log contains a 16-digit credit card number (the 16 digits must be in sequence, like the 16-digit number on a credit card). To check whether a 16-digit number is a credit card number, use this URL (https://planetcalc.com/2465/)
For example, if a URL contains a 16-digit number and it passes the luhn_lookup check, an alert is created. It is a bit weird, as it generates hundreds of false positive alerts a day. It is not restricted to URLs; it also happens with the session_id/support id in a log and with kernel errors.
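For readers unfamiliar with the check itself, here is a generic Luhn checksum in Python. It is a sketch of the standard algorithm only, not the implementation behind luhn_lookup, and `luhn_valid` is an illustrative name. The three sample numbers are the session_id digit runs quoted in this thread.

```python
def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0

for candidate in ("340586286825705", "5178081683798099", "0123456789101112"):
    print(candidate, luhn_valid(candidate))
```

The first two pass the checksum and the third does not, which is why an arbitrary run of digits inside a session id can be reported as a card number.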
Currently, I am interested in the session_id in the logs generated by F5 devices. The session_id comes in 4 forms:
1. starts with a letter, then only digits, like a68789758978678
2. starts and ends with a letter, with digits in the middle, like h8676589765897608976c
3. starts with digits and ends with a letter, like 979679067896078609784a
4. only digits, like 4586749867496749409
For any of the above 4 forms, if the luhn lookup finds a match with a credit card number, it triggers an alert.
Our business units agreed that this is a false alert, and as a team we have to make sure that, for the F5 logs, an alert triggered on session_id is suppressed. If the alert is triggered by anything other than the session_id field, we need to display it as a notable event.
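The suppression rule can be sketched in Python as follows. The pattern assumes the session_id is wrapped in angle brackets as in the sample events, and `is_session_id_hit` is a hypothetical helper; one optional leading letter, a digit run, and one optional trailing letter cover the four forms listed above.

```python
import re

# Optional leading letter, a run of digits, optional trailing letter.
# Assumes the value is wrapped in angle brackets, as in the sample events.
SESSION_ID = re.compile(r"session_id:\s*<([A-Za-z]?)(\d+)([A-Za-z]?)>")

def is_session_id_hit(raw, pan):
    """True if pan occurs inside the digit run of a session_id in raw."""
    return any(pan in m.group(2) for m in SESSION_ID.finditer(raw))

raw = "username:, session_id:<340586286825705c>"
print(is_session_id_hit(raw, "340586286825705"))   # → True, suppress the alert
print(is_session_id_hit(raw, "5178081683798099"))  # → False, keep as notable
```

Containment is used instead of strict equality because a session id can be longer than the detected number, in which case the detected number would be only part of the digit run.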
As an example: 2017-02-16T08:30:25+00:00 abc-f5-01 crit dcc[12345]:0131000 [secsv] Request violations: HTTP protocol compliance failed........ HTTP1.1\r\n\Content-type:text:xml; charset=uf.,username:, session_id:<340586286825705c>
luhn_lookup matches the above session_id (if you copy and paste these digits into the planetcalc.com link, it says the issuer is American Express). But we know this is a session_id and not related to any credit card number, so it's a false positive.
One more: session_id:<5178081683798099> is also a false positive.
If I get session_id:<0123456789101112>, it won't trigger any alert, as it is not a credit card number according to luhn_lookup.
Hope this helps.