Splunk Search

With regex, can you help us extract the first word that comes after the timestamp?

zacksoft
Contributor

I wanted to extract the first word that comes after the timestamp.

The timestamps vary in format: some events include a time zone abbreviation after the milliseconds and some don't.

Example event 1:

2019-02-05 11:89:17,642 EST BROCOD bla bla bla ......

Example event 2:

2019-02-05 19:35:18,642 MARC bla bla bla........

I want to extract BROCOD and MARC.

I tried the following. I think it should work, but I'm not sure why it isn't showing me any results:

| rex "^(?:[^ \n]* ){3}(?P<level>\w+)" | table level
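One way to see what this pattern does is to try it outside Splunk. The sketch below uses Python's `re` module purely for illustration (Splunk's `rex` uses PCRE, but this particular pattern behaves the same way in both) on the two sample events as written above. Skipping a fixed number of three space-separated tokens lands on BROCOD only when a time zone is present; for the second event it skips MARC and captures the next word instead.

```python
import re

# The attempted pattern: skip exactly three space-separated tokens,
# then capture the next word into the group "level".
ATTEMPT = re.compile(r"^(?:[^ \n]* ){3}(?P<level>\w+)")

# The two sample events from the question.
EVENTS = [
    "2019-02-05 11:89:17,642 EST BROCOD bla bla bla ......",
    "2019-02-05 19:35:18,642 MARC bla bla bla........",
]


def first_word_fixed_skip(event):
    """Return the word captured after skipping three tokens, or None."""
    m = ATTEMPT.search(event)
    return m.group("level") if m else None


for event in EVENTS:
    print(first_word_fixed_skip(event))
# Event 1: the skipped tokens are date, time, EST -> BROCOD
# Event 2: no time zone, so the third skipped token is MARC -> bla
```

The fixed count of three is the weak point: it only holds when every event carries a time zone.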


Vijeta
Influencer

I tried the following and it worked for me:

rex field=x "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)"

Example:

| makeresults | eval x="2019-02-05 11:89:17,642 EST BROCOD bla bla bla" | appendpipe [| eval x="2019-02-05 19:35:18,642 MARC bla bla bla"] | rex field=x "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)"
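To see why this pattern works, and where it might not, here is a minimal Python sketch (for illustration only; Python's `re` needs the named group spelled `(?P<level>...)` rather than `(?<level>...)`). The `\s{0,1}\w{0,3}\s` part treats any word of zero to three letters after the milliseconds as an optional time zone. The last two events are hypothetical inputs, not from the question: a four-letter zone such as AEST gets captured itself, and a three-letter first word gets skipped as if it were a zone.

```python
import re

# Vijeta's pattern, with the named group written as (?P<level>...)
# because Python's re module does not accept the (?<level>...) spelling.
PATTERN = re.compile(
    r"\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?P<level>\w+)"
)


def extract_level(event):
    """Return the captured word, or None if the pattern does not match."""
    m = PATTERN.search(event)
    return m.group("level") if m else None


# Sample events from the question.
print(extract_level("2019-02-05 11:89:17,642 EST BROCOD bla bla bla"))  # BROCOD
print(extract_level("2019-02-05 19:35:18,642 MARC bla bla bla"))        # MARC

# Hypothetical events showing the limits of \w{0,3}:
print(extract_level("2019-02-05 19:35:18,642 AEST BROCOD bla"))  # AEST
print(extract_level("2019-02-05 19:35:18,642 FOO bla"))           # bla
```

For the two sample formats in the question this is enough; the edge cases only matter if other time zones or short first words can appear.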

zacksoft
Contributor

Thanks, Vijeta.
I'm wondering how to apply this to all my events.
Instead of .......|appendpipe[|eval x="2019-02-05 19: ...........
I used ...|appendpipe[|eval x=_raw ...........
so that it scans all events, but it gives many errors:

index=myIndex host=myhost sourcetype="my.source.type"  |makeresults| eval x=_raw |appendpipe[|eval x=_raw]| rex field=x "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)" | table level

Vijeta
Influencer

@zacksoft - did you try the query below?

You don't need makeresults; I only used it to create sample events. Your query can be:

index=myIndex host=myhost sourcetype="my.source.type" | rex field=_raw "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)" | table level

horsefez
Motivator

Hey zacksoft,

This one is a bit complicated, because you can never be sure whether there will be an abbreviated time zone or not.

https://regex101.com/r/n1RYOu/2

So I came up with this solution for you. It might look a bit convoluted at first, but it basically matches all the time zone abbreviations I could find, and only if one is actually present.

Please give it a careful look and ask me if you have any questions about it.

Regards,
pyro_wood

zacksoft
Contributor

Thanks @horsefez

Just to confirm, is this the right regex? I'm a bit new to regex!

index=DEMOhost=anything sourcetype="something.something"
rex "^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s(?:\b(?:ACDT|ACST|ACT|ACT|ACWST|ADT|AEDT|AEST|AFT|AKDT|AKST|AMST|AMT|AMT|ART|AST|AST|AWST|AZOST|AZOT|AZT|BDT|BIOT|BIT|BOT|BRST|BRT|BST|BST|BST|BTT|CAT|CCT|CDT|CDT|CEST|CET|CHADT|CHAST|CHOT|CHOST|CHST|CHUT|CIST|CIT|CKT|CLST|CLT|COST|COT|CST|CST|CST|CT|CVT|CWST|CXT|DAVT|DDUT|DFT|EASST|EAST|EAT|ECT|ECT|EDT|EEST|EET|EGST|EGT|EIT|EST|FET|FJT|FKST|FKT|FNT|GALT|GAMT|GET|GFT|GILT|GIT|GMT|GST|GST|GYT|HDT|HAEC|HST|HKT|HMT|HOVST|HOVT|ICT|IDLW|IDT|IOT|IRDT|IRKT|IRST|IST|IST|IST|JST|KALT|KGT|KOST|KRAT|KST|LHST|LHST|LINT|MAGT|MART|MAWT|MDT|MET|MEST|MHT|MIST|MIT|MMT|MSK|MST|MST|MUT|MVT|MYT|NCT|NDT|NFT|NPT|NST|NT|NUT|NZDT|NZST|OMST|ORAT|PDT|PET|PETT|PGT|PHOT|PHT|PKT|PMDT|PMST|PONT|PST|PST|PYST|PYT|RET|ROTT|SAKT|SAMT|SAST|SBT|SCT|SDT|SGT|SLST|SRET|SRT|SST|SST|SYOT|TAHT|THA|TFT|TJT|TKT|TLT|TMT|TRT|TOT|TVT|ULAST|ULAT|UTC|UYST|UYT|UZT|VET|VLAT|VOLT|VOST|VUT|WAKT|WAST|WAT|WEST|WET|WIT|WST|YAKT|YEKT)\b\s*)?(?\w+)"
| table match

If so, I tried it, but it returned no results. 😞


horsefez
Motivator

Hi @zacksoft,

Try this one and tell me if it works.

index=DEMO host=anything sourcetype=something 
| rex "^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s(?:\b(?:ACDT|ACST|ACT|ACT|ACWST|ADT|AEDT|AEST|AFT|AKDT|AKST|AMST|AMT|AMT|ART|AST|AST|AWST|AZOST|AZOT|AZT|BDT|BIOT|BIT|BOT|BRST|BRT|BST|BST|BST|BTT|CAT|CCT|CDT|CDT|CEST|CET|CHADT|CHAST|CHOT|CHOST|CHST|CHUT|CIST|CIT|CKT|CLST|CLT|COST|COT|CST|CST|CST|CT|CVT|CWST|CXT|DAVT|DDUT|DFT|EASST|EAST|EAT|ECT|ECT|EDT|EEST|EET|EGST|EGT|EIT|EST|FET|FJT|FKST|FKT|FNT|GALT|GAMT|GET|GFT|GILT|GIT|GMT|GST|GST|GYT|HDT|HAEC|HST|HKT|HMT|HOVST|HOVT|ICT|IDLW|IDT|IOT|IRDT|IRKT|IRST|IST|IST|IST|JST|KALT|KGT|KOST|KRAT|KST|LHST|LHST|LINT|MAGT|MART|MAWT|MDT|MET|MEST|MHT|MIST|MIT|MMT|MSK|MST|MST|MUT|MVT|MYT|NCT|NDT|NFT|NPT|NST|NT|NUT|NZDT|NZST|OMST|ORAT|PDT|PET|PETT|PGT|PHOT|PHT|PKT|PMDT|PMST|PONT|PST|PST|PYST|PYT|RET|ROTT|SAKT|SAMT|SAST|SBT|SCT|SDT|SGT|SLST|SRET|SRT|SST|SST|SYOT|TAHT|THA|TFT|TJT|TKT|TLT|TMT|TRT|TOT|TVT|ULAST|ULAT|UTC|UYST|UYT|UZT|VET|VLAT|VOLT|VOST|VUT|WAKT|WAST|WAT|WEST|WET|WIT|WST|YAKT|YEKT)\b\s*)?(?<level>\w+)"
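The alternation in this regex repeats some entries (ACT, AMT, AST, BST, CDT and CST, among others, appear more than once). Repeats never change what matches, but if you maintain such a list yourself it can be easier to generate the pattern. The Python sketch below is an illustration under that assumption: it uses only a short, hypothetical subset of the abbreviations, removes duplicates, and reproduces the structure of the regex above (Python spells the named group `(?P<level>...)`).

```python
import re

# Hypothetical, shortened subset of the abbreviation list, with a
# deliberate duplicate (CST) like the repeats in the full list.
TZ_ABBREVIATIONS = ["ACT", "AEST", "CST", "CST", "EST", "GMT", "IST", "UTC"]


def build_pattern(abbreviations):
    """Compile the timestamp + optional time zone + word pattern."""
    # dict.fromkeys drops duplicates while keeping the original order.
    unique = list(dict.fromkeys(abbreviations))
    tz = "|".join(re.escape(a) for a in unique)
    return re.compile(
        r"^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s"  # date, time, milliseconds
        rf"(?:\b(?:{tz})\b\s*)?"                          # optional time zone
        r"(?P<level>\w+)"                                 # the word we want
    )


PATTERN = build_pattern(TZ_ABBREVIATIONS)


def extract_level(event):
    """Return the first word after the timestamp (and time zone), or None."""
    m = PATTERN.search(event)
    return m.group("level") if m else None


print(extract_level("2019-02-05 11:89:17,642 EST BROCOD bla bla bla ......"))  # BROCOD
print(extract_level("2019-02-05 19:35:18,642 MARC bla bla bla........"))       # MARC
```

In Splunk you would still paste the final expanded pattern into `rex`; generating it is only a convenience for keeping the list tidy.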

zacksoft
Contributor

@pyro_wood - This is the most insane-looking query, but it's awesome. It works perfectly!
You're a genius. Thank you very much.


horsefez
Motivator

@zacksoft,

I agree that it looks complicated at first, and I'm glad it works for you.

But it's really not that complicated. Let me explain why, piece by piece:
^ is called an anchor; it matches the start of the line (always present).
\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s walks over the date and time fields, including the milliseconds and the space after them (always present).
(?:\b(?:ACDT|ACST|ACT|ACWST...|BOT|...|WST|YAKT|YEKT)\b\s*)? looks for a valid time zone abbreviation, taken from a list of abbreviations I found on the web.
It is basically an OR list: if it doesn't find ACDT, it tries ACST; if that fails, it tries ACT, and so on. The final ? makes the entire parenthesized group optional, so the time zone may or may not be present.
(?<level>\w+) captures the word you want into a field called level. Whether or not the optional time zone is present, this word always comes next.
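The effect of that final ? can be checked directly. This Python sketch is an illustration, not part of the original answer: it uses a hypothetical three-entry time zone list and Python's `(?P<level>...)` group syntax, and builds the pattern from the pieces described above, once with the ? and once without it.

```python
import re

TIMESTAMP = r"^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s"
# Hypothetical shortened time zone list, for illustration only.
TZ_GROUP = r"(?:\b(?:EST|GMT|UTC)\b\s*)"
LEVEL = r"(?P<level>\w+)"

# Same building blocks, with and without the trailing "?" on the time zone group.
OPTIONAL_TZ = re.compile(TIMESTAMP + TZ_GROUP + "?" + LEVEL)
REQUIRED_TZ = re.compile(TIMESTAMP + TZ_GROUP + LEVEL)


def extract(pattern, event):
    """Return the captured word, or None if the pattern does not match."""
    m = pattern.search(event)
    return m.group("level") if m else None


with_tz = "2019-02-05 11:89:17,642 EST BROCOD bla bla bla ......"
without_tz = "2019-02-05 19:35:18,642 MARC bla bla bla........"

print(extract(OPTIONAL_TZ, with_tz), extract(OPTIONAL_TZ, without_tz))  # BROCOD MARC
print(extract(REQUIRED_TZ, with_tz), extract(REQUIRED_TZ, without_tz))  # BROCOD None
```

Without the ?, events that have no time zone simply do not match, so no field is extracted for them.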

You might have noticed the \b in the regex. \b marks a word boundary. Long story short, it makes sure the time zone part doesn't match the beginning of words such as "ACTION", "BOTTOM", "PETS", "PHOTO" or "WESTWARDS".
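A small Python sketch can show what \b prevents, using the five example words from this post. It is an illustration only: the time zone list is a hypothetical subset chosen so that each abbreviation is a prefix of one of the words. Without \b, the optional group happily consumes the first few letters and the rest of the word ends up in level.

```python
import re

# Hypothetical short list: each entry is a prefix of one example word.
TZ = "ACT|BOT|PET|PHOT|WEST"
PREFIX = r"^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s"

WITH_BOUNDARY = re.compile(PREFIX + rf"(?:\b(?:{TZ})\b\s*)?(?P<level>\w+)")
WITHOUT_BOUNDARY = re.compile(PREFIX + rf"(?:(?:{TZ})\s*)?(?P<level>\w+)")


def extract(pattern, event):
    """Return the captured word, or None if the pattern does not match."""
    m = pattern.search(event)
    return m.group("level") if m else None


for word in ["ACTION", "BOTTOM", "PETS", "PHOTO", "WESTWARDS"]:
    event = f"2019-02-05 19:35:18,642 {word} bla bla"
    print(word, extract(WITH_BOUNDARY, event), extract(WITHOUT_BOUNDARY, event))
# ACTION ACTION ION
# BOTTOM BOTTOM TOM
# PETS PETS S
# PHOTO PHOTO O
# WESTWARDS WESTWARDS WARDS
```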

Hope this helps a bit.
Regards,
pyro_wood


zacksoft
Contributor

Thanks for explaining each step. Now I understand.


lakshman239
Influencer

You can check this out: https://regex101.com/r/cQF8aS/1
You need something like:

^.*,\d+\s+(?:EST)?\s?(?\w+)
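As posted, the last group reads `(?\w+)` with no name, which is not valid regex syntax; the name seems to have been lost when the answer was posted. This Python sketch (illustration only) confirms that the unnamed form fails to compile and restores a name. `level` is a hypothetical choice, not from the original post; in Splunk, whatever name you put in `(?<name>...)` becomes the field the value is stored in. The last event is also hypothetical: because `.*` is greedy, a later comma followed by digits in the message shifts the match.

```python
import re

# As posted, the group "(?\w+)" has no name, which is not valid syntax.
try:
    re.compile(r"^.*,\d+\s+(?:EST)?\s?(?\w+)")
    broken_compiles = True
except re.error:
    broken_compiles = False

# With a name restored; "level" is a hypothetical choice.
PATTERN = re.compile(r"^.*,\d+\s+(?:EST)?\s?(?P<level>\w+)")


def extract_level(event):
    """Return the captured word, or None if the pattern does not match."""
    m = PATTERN.search(event)
    return m.group("level") if m else None


print(broken_compiles)  # False
print(extract_level("2019-02-05 11:89:17,642 EST BROCOD bla bla bla ......"))  # BROCOD
print(extract_level("2019-02-05 19:35:18,642 MARC bla bla bla........"))       # MARC
# Hypothetical event with a later "comma + digits": greedy .* jumps to it.
print(extract_level("2019-02-05 19:35:18,642 MARC count,5 x"))                 # x
```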


zacksoft
Contributor

Thanks, Lakshman.
When I try this, I get the error "unrecognized character after (? or (?-".
Also, what is the name of the field the extracted value is stored in?
