I'm essentially looking to compare my index field values against an index that has known-bad field values to determine if these bad field values exist in my environment - namely ip values. I have a functioning search, however, the limitations of the join command [50,000] result kind of takes away from the effectiveness of the search. It outputs the following... "Subsearch produced 50000 results, truncating to maxout 50000"
.
Which other options do I have? Writing the second field to a lookup table? Using a different approach to the SPL?
My example code is below:
index=foo source=oof
| fields blah, blah1, blah2
| rex field=_raw "(?\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
| join ip
[ search index=doo sourcetype="aaf" NOT ip="NULL"
| dedup ip
| fields ip]
| table host ip
Unless I limit by time I'll hit the 50k cap using this, is there another approach anyone is aware of? Ideally this should output bad ip addresses that have been fond in my data.
Is index foo is your data good ips?
Is index doo is your known bad ips?
Are you regexing your good ips?
If you combine both sets of data with the OR statement then you'll have all data in one searchable bucket. Then dedup by index and ip so that you have an ip from each index to compare. Then compare those two by counting by ip (if each index contains an ip then the count will be 2).
(index=foo source=foo) OR (index=doo sourcetype="aaf" NOT ip="NULL)
| rex field=_raw "(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
| dedup index ip
| stats count by ip
| where count > 1
Each result in this search means that it matches your good data (your indexed data) and is also on the known bad list.
Is index foo is your data good ips?
Is index doo is your known bad ips?
Are you regexing your good ips?
If you combine both sets of data with the OR statement then you'll have all data in one searchable bucket. Then dedup by index and ip so that you have an ip from each index to compare. Then compare those two by counting by ip (if each index contains an ip then the count will be 2).
(index=foo source=foo) OR (index=doo sourcetype="aaf" NOT ip="NULL)
| rex field=_raw "(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
| dedup index ip
| stats count by ip
| where count > 1
Each result in this search means that it matches your good data (your indexed data) and is also on the known bad list.
Yes, the foo index contains good ips and the doo index contains bad ips. The regex on good ips pulls out all ip addresses from the index. The index with bad ips already has ips that are formatted and wouldn't need regex. Would this still work?
There are many ways to do a lot of things in Splunk. This should work but that rex will be attempting to extract from all events, including both indexes. If you can get ips extracted through props/transforms then that would be much better. Give it a shot and provide feedback.
This SPL works. Thanks jazzy!
Try this
index=foo source=oof OR ( index=doo sourcetype="aaf" NOT ip="NULL")
| dedup ip | rex field=_raw "(?<ip1>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})" | eval ip=coalesce(ip1,ip)| stats values(host) as host , values(blah) as blah, values(blah1) as blah1 by ip
Where does the actual comparison take place? I want to see if any values inside index2 exist in index1.
Could you explain the purpose of the first "OR"?
@bmoody3 - for getting values that reside in index2 based on index 1 you could use eventstats .If the count of foo>0 and doo>0 from below result, then the ip is present in both the index
index=foo source=oof OR ( index=doo sourcetype="aaf" NOT ip="NULL")
| dedup ip | rex field=_raw "(?<ip1>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})" | eval ip=coalesce(ip1,ip)| eventstats count(eval(index=foo)) as foo, count eval(index=doo) as doo by ip| where foo>0 and doo>0 | table <your fields>