With regex, can you help us extract the first word that comes after the timestamp?

Communicator

I want to extract the first word that comes after the timestamp.

The timestamps come in varied formats.

Example event 1:

2019-02-05 11:89:17,642 EST BROCOD bla bla bla ......

Example event 2:

2019-02-05 19:35:18,642 MARC bla bla bla........

I want to extract BROCOD and MARC.

I tried the following. It should work, but I'm not sure why it is not showing me any results:

| rex "^(?:[^ \n]* ){3}(?P<level>\w+)" | table  level 
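
For reference, the behaviour of this pattern can be checked outside Splunk. The sketch below runs it through Python's `re` module, which is an assumption made purely for illustration (Splunk's `rex` uses PCRE, but this particular pattern is valid in both). It suggests that skipping exactly three space-delimited tokens only lands on the right word when a timezone is present.

```python
import re

# Pattern from the post: skip exactly three space-terminated tokens,
# then capture the next word as "level".
pattern = re.compile(r"^(?:[^ \n]* ){3}(?P<level>\w+)")

events = [
    "2019-02-05 11:89:17,642 EST BROCOD bla bla bla",  # has a timezone token
    "2019-02-05 19:35:18,642 MARC bla bla bla",        # no timezone token
]

levels = [pattern.match(e).group("level") for e in events]
print(levels)
```

In the second event the third token is MARC itself, so the capture slides one word too far. A fixed token count cannot handle an optional timezone.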

Re: With regex, can you help us extract the first word that comes after the timestamp?

SplunkTrust

You can check this out - https://regex101.com/r/cQF8aS/1
You need something like

^.*,\d+\s+(?:EST)?\s?(?\w+)


Re: With regex, can you help us extract the first word that comes after the timestamp?

Communicator

Thanks Lakshman.
When I try this, it says "unrecognized character after (? or (?-".
Also, what is the name of the field where the extraction is stored?


Re: With regex, can you help us extract the first word that comes after the timestamp?

SplunkTrust

Hey zacksoft,

This one is a bit complicated, as you can never be sure whether there will be an abbreviated timezone or not.

https://regex101.com/r/n1RYOu/2

So I found this solution for you. It might look a bit convoluted at first, but it basically matches all the timezone abbreviations in use at the moment, and only if they are present.

So please give it a careful look and ask me questions about it if you have any.

Regards,
pyro_wood


Re: With regex, can you help us extract the first word that comes after the timestamp?

Communicator

Thanks @pyro_wood

Just to confirm, this is the regex, right? I am a bit new to regex.

index=DEMOhost=anything sourcetype="something.something"
rex "^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d+\s(?:\b(?:ACDT|ACST|ACT|ACT|ACWST|ADT|AEDT|AEST|AFT|AKDT|AKST|AMST|AMT|AMT|ART|AST|AST|AWST|AZOST|AZOT|AZT|BDT|BIOT|BIT|BOT|BRST|BRT|BST|BST|BST|BTT|CAT|CCT|CDT|CDT|CEST|CET|CHADT|CHAST|CHOT|CHOST|CHST|CHUT|CIST|CIT|CKT|CLST|CLT|COST|COT|CST|CST|CST|CT|CVT|CWST|CXT|DAVT|DDUT|DFT|EASST|EAST|EAT|ECT|ECT|EDT|EEST|EET|EGST|EGT|EIT|EST|FET|FJT|FKST|FKT|FNT|GALT|GAMT|GET|GFT|GILT|GIT|GMT|GST|GST|GYT|HDT|HAEC|HST|HKT|HMT|HOVST|HOVT|ICT|IDLW|IDT|IOT|IRDT|IRKT|IRST|IST|IST|IST|JST|KALT|KGT|KOST|KRAT|KST|LHST|LHST|LINT|MAGT|MART|MAWT|MDT|MET|MEST|MHT|MIST|MIT|MMT|MSK|MST|MST|MUT|MVT|MYT|NCT|NDT|NFT|NPT|NST|NT|NUT|NZDT|NZST|OMST|ORAT|PDT|PET|PETT|PGT|PHOT|PHT|PKT|PMDT|PMST|PONT|PST|PST|PYST|PYT|RET|ROTT|SAKT|SAMT|SAST|SBT|SCT|SDT|SGT|SLST|SRET|SRT|SST|SST|SYOT|TAHT|THA|TFT|TJT|TKT|TLT|TMT|TRT|TOT|TVT|ULAST|ULAT|UTC|UYST|UYT|UZT|VET|VLAT|VOLT|VOST|VUT|WAKT|WAST|WAT|WEST|WET|WIT|WST|YAKT|YEKT)\b\s)?(?\w+)"
| table match

If yes, I tried this, but it yielded no results. 😞


Re: With regex, can you help us extract the first word that comes after the timestamp?

SplunkTrust

Hi @zacksoft,

Try this one and tell me if it works.

index=DEMO host=anything sourcetype=something 
| rex "^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s(?:\b(?:ACDT|ACST|ACT|ACT|ACWST|ADT|AEDT|AEST|AFT|AKDT|AKST|AMST|AMT|AMT|ART|AST|AST|AWST|AZOST|AZOT|AZT|BDT|BIOT|BIT|BOT|BRST|BRT|BST|BST|BST|BTT|CAT|CCT|CDT|CDT|CEST|CET|CHADT|CHAST|CHOT|CHOST|CHST|CHUT|CIST|CIT|CKT|CLST|CLT|COST|COT|CST|CST|CST|CT|CVT|CWST|CXT|DAVT|DDUT|DFT|EASST|EAST|EAT|ECT|ECT|EDT|EEST|EET|EGST|EGT|EIT|EST|FET|FJT|FKST|FKT|FNT|GALT|GAMT|GET|GFT|GILT|GIT|GMT|GST|GST|GYT|HDT|HAEC|HST|HKT|HMT|HOVST|HOVT|ICT|IDLW|IDT|IOT|IRDT|IRKT|IRST|IST|IST|IST|JST|KALT|KGT|KOST|KRAT|KST|LHST|LHST|LINT|MAGT|MART|MAWT|MDT|MET|MEST|MHT|MIST|MIT|MMT|MSK|MST|MST|MUT|MVT|MYT|NCT|NDT|NFT|NPT|NST|NT|NUT|NZDT|NZST|OMST|ORAT|PDT|PET|PETT|PGT|PHOT|PHT|PKT|PMDT|PMST|PONT|PST|PST|PYST|PYT|RET|ROTT|SAKT|SAMT|SAST|SBT|SCT|SDT|SGT|SLST|SRET|SRT|SST|SST|SYOT|TAHT|THA|TFT|TJT|TKT|TLT|TMT|TRT|TOT|TVT|ULAST|ULAT|UTC|UYST|UYT|UZT|VET|VLAT|VOLT|VOST|VUT|WAKT|WAST|WAT|WEST|WET|WIT|WST|YAKT|YEKT)\b\s*)?(?<level>\w+)"

Re: With regex, can you help us extract the first word that comes after the timestamp?

Communicator

@pyro_wood - This is the most insane-looking query, but it is awesome. It works perfectly.
You're a genius. Thank you very much.


Re: With regex, can you help us extract the first word that comes after the timestamp?

SplunkTrust

@zacksoft,

I agree that it looks complicated at first and I'm glad that it works out for you.

But it's not as complicated as it might look. Here is what each part does:
^ is an anchor that matches the start of the line (will always be there).
\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s steps over the date and time fields (will always be there).
(?:\b(?:ACDT|ACST|ACT|ACWST...|BOT|...|WST|YAKT|YEKT)\b\s*)? looks for a valid timezone abbreviation, using a list of timezone abbreviations I found on the web. It is basically an OR list: if it doesn't find ACDT, it checks for ACST, then ACT, and so on. The final ? makes the entire parenthesized group optional, meaning the timezone may or may not be there.
(?<level>\w+) captures the word that follows, whether or not the optional timezone was present (will always be there). The result is stored in the field level.
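
To make the breakdown above concrete, here is a minimal sketch that assembles the same three parts in Python's `re` module. It rests on a few assumptions: Python stands in for Splunk's PCRE engine, the timezone list is cut down to a handful of entries for readability, and the names `TIMESTAMP`, `TIMEZONE`, `LEVEL` and `first_word` are made up for this example. Note that Python needs `(?P<level>...)` where Splunk accepts `(?<level>...)`.

```python
import re

# The three building blocks described above.
TIMESTAMP = r"^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s"
# Shortened, illustrative list; the posted regex uses a far longer one.
TIMEZONE = r"(?:\b(?:EST|EDT|CST|CDT|PST|PDT|UTC|GMT)\b\s*)?"
LEVEL = r"(?P<level>\w+)"

pattern = re.compile(TIMESTAMP + TIMEZONE + LEVEL)

def first_word(event):
    """Return the first word after the timestamp, or None if no match."""
    m = pattern.search(event)
    return m.group("level") if m else None

print(first_word("2019-02-05 11:89:17,642 EST BROCOD bla bla bla"))
print(first_word("2019-02-05 19:35:18,642 MARC bla bla bla"))
```

Both sample events from the question should yield the expected word, with and without the timezone.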

You might have noticed the \b in the regex. \b marks a word boundary. Long story short, it makes sure that the timezone part doesn't match the beginning of words such as "ACTION", "BOTTOM", "PETS", "PHOTO" or "WESTWARDS".
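
The effect of \b can be seen by comparing the pattern with and without it. This is a small sketch in Python's `re` module (an assumption for illustration, as is the three-entry timezone list and the sample event); it is not the exact Splunk query.

```python
import re

PREFIX = r"^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s"

# Same optional timezone group, once with word boundaries and once without.
with_boundary = re.compile(PREFIX + r"(?:\b(?:ACT|BOT|PET)\b\s*)?(?P<level>\w+)")
without_boundary = re.compile(PREFIX + r"(?:(?:ACT|BOT|PET)\s*)?(?P<level>\w+)")

# Hypothetical event whose first word merely starts with a timezone abbreviation.
event = "2019-02-05 19:35:18,642 ACTION started"

guarded = with_boundary.search(event).group("level")
unguarded = without_boundary.search(event).group("level")
print(guarded, unguarded)
```

Without \b, the leading "ACT" is consumed as a timezone and only the remainder of the word is captured.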

Hope this helps a bit.
Regards,
pyro_wood


Re: With regex, can you help us extract the first word that comes after the timestamp?

Communicator

Thanks for explaining each step. Now I understand.


Re: With regex, can you help us extract the first word that comes after the timestamp?

Influencer

I tried the following and it worked for me:

rex field=x "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)"

Example:

|makeresults| eval x="2019-02-05 11:89:17,642 EST BROCOD bla bla bla" |appendpipe[|eval x="2019-02-05 19:35:18,642 MARC bla bla bla"]| rex field=x "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)"
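
As a rough cross-check of this shorter pattern, the sketch below runs it in Python's `re` module (an assumption for illustration; the named group is written `(?P<level>...)` because Python does not accept `(?<level>...)`). The last two events are hypothetical and are included only to show where the `\w{0,3}` shortcut might behave differently from an explicit timezone list.

```python
import re

pattern = re.compile(
    r"\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?P<level>\w+)"
)

def level(event):
    """Return the captured word, or None if the pattern does not match."""
    m = pattern.search(event)
    return m.group("level") if m else None

# The two events from the question.
print(level("2019-02-05 11:89:17,642 EST BROCOD bla bla bla"))
print(level("2019-02-05 19:35:18,642 MARC bla bla bla"))

# Hypothetical edge cases: a three-letter first word is treated as a timezone,
# and a four-letter timezone is captured as the word itself.
print(level("2019-02-05 19:35:18,642 ERR disk full"))
print(level("2019-02-05 11:89:17,642 CEST BROCOD bla"))
```

So the short form is fine when timezones are always three letters or fewer and the first word is always longer than three letters; otherwise the explicit list is the safer choice.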