Getting Data In

Identify Two Digit Year Timestamp Events Query (2020 Timestamp Issue)

Path Finder

We are trying to identify how much of our data is impacted by the latest timestamp bug. I was wondering if there was a query that used regex to search _raw for events that have 2 digit years. This will greatly help us analyze the risk of doing a last minute production upgrade...

1 Solution

SplunkTrust
SplunkTrust

Hi jordanking1992,

I posted that in the slack channel #splunky2k channel:

index=* TERM(19) 
| regex _raw="[\\\/\|-](19)" 
| rex "(?<myField>[^\s]+19)" 
| search myField!="*2019*" 
| stats count by index sourcetype

It later got this little enhancement:

index=* TERM(19)
| eval sample=substr(_raw,0,128), search="index=".index." sourcetype=".sourcetype." TERM(19)"
| regex sample="(((?:^|\D)\d{1,2}[-\/]\d{1,2}[-\/]19[^\d])|((?:^|\D)19[-\/]\d{1,2}[-\/]\d{1,2}[^\d])|((?:^|\D)\d{1,2}\s[-\/]\s\d{1,2}\s[-\/]\s19[^\d])|((?:^|\D)19\s[-\/]\s\d{1,2}\s[-\/]\s\d{1,2}[^\d])|((?:^|\D)([a-zA-Z]{3}[- \/]+\d{1,2}[- \/]+19[^:\d]))|((?:^|\D)19[- \/][a-zA-Z]{3}[- \/]\d{1,2}[^:\d])|((?:^|\D)\d{1,2}[- \/]+[a-zA-Z]{3}[- \/]+19[^:\d]))"
| stats count last(sample) as sample by search

Please be aware that this is a very hungry, resource intensive search!

Hope this helps ...

cheers, MuS

UPDATE modifications to the regex and the substr() uses the first 128 characters of the event.

View solution in original post

Esteemed Legend

You can use this search to find potentially problematic events. DISClAIMER: This is NOT a guarantee because we have no way to tell with SPL whether Indexers are using datetime.xml or proper Magic 6 settings. It will show you events that IF the indexers are using datetime.xml, will be broken without the fix.

index="*" AND sourcetype="*" AND timestartpos="*" earliest=-7d latest=now
| dedup punct sourcetype index
| eval timestr=substr(_raw, timestartpos+1, timeendpos-timestartpos)
| regex timestr="(((?:^|\D)\d{1,2}[-\/]\d{1,2}[-\/]19[^\d])|((?:^|\D)19[-\/]\d{1,2}[-\/]\d{1,2}[^\d])|((?:^|\D)\d{1,2}\s[-\/]\s\d{1,2}\s[-\/]\s19[^\d])|((?:^|\D)19\s[-\/]\s\d{1,2}\s[-\/]\s\d{1,2}[^\d])|((?:^|\D)([a-zA-Z]{3}[- \/]+\d{1,2}[- \/]+19[^:\d]))|((?:^|\D)19[- \/][a-zA-Z]{3}[- \/]\d{1,2}[^:\d])|((?:^|\D)\d{1,2}[- \/]+[a-zA-Z]{3}[- \/]+19[^:\d]))"
| table punct sourcetype index timestr time*pos _time _raw time*
| stats list(*) AS * BY index sourcetype

If this search returns nothing, then you have nothing to fix. Do note that this search will return the same results BEFORE and AFTER you deploy the fix. It only shows your potential risk, not your actual.

0 Karma

SplunkTrust
SplunkTrust

Hi jordanking1992,

I posted that in the slack channel #splunky2k channel:

index=* TERM(19) 
| regex _raw="[\\\/\|-](19)" 
| rex "(?<myField>[^\s]+19)" 
| search myField!="*2019*" 
| stats count by index sourcetype

It later got this little enhancement:

index=* TERM(19)
| eval sample=substr(_raw,0,128), search="index=".index." sourcetype=".sourcetype." TERM(19)"
| regex sample="(((?:^|\D)\d{1,2}[-\/]\d{1,2}[-\/]19[^\d])|((?:^|\D)19[-\/]\d{1,2}[-\/]\d{1,2}[^\d])|((?:^|\D)\d{1,2}\s[-\/]\s\d{1,2}\s[-\/]\s19[^\d])|((?:^|\D)19\s[-\/]\s\d{1,2}\s[-\/]\s\d{1,2}[^\d])|((?:^|\D)([a-zA-Z]{3}[- \/]+\d{1,2}[- \/]+19[^:\d]))|((?:^|\D)19[- \/][a-zA-Z]{3}[- \/]\d{1,2}[^:\d])|((?:^|\D)\d{1,2}[- \/]+[a-zA-Z]{3}[- \/]+19[^:\d]))"
| stats count last(sample) as sample by search

Please be aware that this is a very hungry, resource intensive search!

Hope this helps ...

cheers, MuS

UPDATE modifications to the regex and the substr() uses the first 128 characters of the event.

View solution in original post

Path Finder

Thank you so much. No more headaches!

0 Karma
State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!