Splunk Search

How to extract two fields from a group?

kmhanson
Explorer

I am new to regular expressions and trying to figure them out.

I am trying to extract two sections of the following log field: 

5002:fromhost=999.99.99.99:fromport=3299:sid=92ac3498-d95d-11ed-af19-92eb6037d638:respcode=OK:resptime=7:node=999999ss03:nodePort=5002:cosId=asasasa

I want the IP address that appears after fromhost and the cosId value (asasasa) at the end of the field, and I am not having much luck.

0 Karma

woodcock
Esteemed Legend

Install the TA.  It will do all of this.

Otherwise you can do this:
... | rex "fromhost=(?<fromhost>[^=:]+).*:cosId=(?<cosId>.*)$"

But it would be better to set up a sourcetype-based global extraction (which the TA surely does), like this:
(?<_KEY_1>[^=:]+)=(?<_VAL_1>[^=:]+)
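
To see what the generic key/value pattern above captures, here is a minimal Python sketch run against the sample from the question. It is only an illustration outside Splunk: the `_KEY_1`/`_VAL_1` group naming is Splunk-specific and is not reproduced, so the groups are left unnamed, and the names `SAMPLE`, `PAIR`, and `fields` are made up for this example.

```python
import re

# Sample field value copied from the question.
SAMPLE = ("5002:fromhost=999.99.99.99:fromport=3299:"
          "sid=92ac3498-d95d-11ed-af19-92eb6037d638:respcode=OK:"
          "resptime=7:node=999999ss03:nodePort=5002:cosId=asasasa")

# Same character classes as the pattern above, with unnamed groups.
PAIR = re.compile(r"([^=:]+)=([^=:]+)")

# findall returns (key, value) tuples; dict() turns them into a lookup table.
fields = dict(PAIR.findall(SAMPLE))

print(fields["fromhost"], fields["cosId"])  # 999.99.99.99 asasasa
```

The leading 5002 has no "=" after it, so it never forms a pair and is skipped.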

0 Karma

yuanliu
SplunkTrust

Is there any reason why you must use regex?  For rigidly formatted strings like this, the easiest, and in fact the cheapest, solution is kv (aka extract).  Assuming your field name is log:

 

| rename _raw as temp, log as _raw
| kv pairdelim=":" kvdelim="="
| rename _raw as log, temp as _raw

 

Your sample data should give you

cosId    fromhost      fromport  node        nodePort  respcode  resptime  sid
asasasa  999.99.99.99  3299      999999ss03  5002      OK        7         92ac3498-d95d-11ed-af19-92eb6037d638
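
For readers who want to check the delimiter logic outside Splunk, the following Python sketch roughly mimics what pairdelim=":" and kvdelim="=" do to the sample. It is an approximation for illustration, not Splunk's implementation; the function name `kv` and its parameters are simply borrowed from the command for readability.

```python
# Sample field value copied from the question.
SAMPLE = ("5002:fromhost=999.99.99.99:fromport=3299:"
          "sid=92ac3498-d95d-11ed-af19-92eb6037d638:respcode=OK:"
          "resptime=7:node=999999ss03:nodePort=5002:cosId=asasasa")


def kv(text, pairdelim=":", kvdelim="="):
    """Split text into pairs on pairdelim, then each pair on kvdelim."""
    out = {}
    for chunk in text.split(pairdelim):
        key, sep, value = chunk.partition(kvdelim)
        if sep:  # chunks without kvdelim (the leading "5002") are skipped
            out[key] = value
    return out


result = kv(SAMPLE)
print(sorted(result))  # field names, in the same order as the table above
```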

 

0 Karma

kmhanson
Explorer

So is this the full command? | rex field=port mode=sed fromhost=(?<fromhost>[^:]+)

0 Karma

dtsariapkin
Splunk Employee

@kmhanson 
1) If you are adamant about doing it all in a single expression, you can do it like this:
fromhost=(?<fromhost>[^:]+)(.*cosId=(?<cosid>.*))?

Notice that I put the second part in parentheses and added a question mark at the end. That makes the parenthesized group optional: it can match once or not at all, so fromhost is still extracted when cosId is missing.

2) Stick with the basic mode first. mode=sed is for replacing things.
3) And you do not want the field port, do you? Your post does not state that exactly, or I am misreading it.
4) So I assume you will be extracting from the raw log, i.e. the original event.

And your final test search would be:

| rex field=_raw "fromhost=(?<fromhost>[^:]+)" | rex field=_raw "cosId=(?<cosid>.*)" 


OR! 

| rex field=_raw "fromhost=(?<fromhost>[^:]+)(.*cosId=(?<cosid>.*))?" 



All in all, in this command you say which field you want to extract from. "_raw" gives you the whole event. Then you place the regular expression inside the quotes.
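
The effect of the optional group can be checked outside Splunk with a small Python sketch. Two assumptions to note: Python's `re` module writes named groups as `(?P<name>...)` instead of the `(?<name>...)` form used in rex, and the two events below are shortened, hypothetical variants of the sample, one with cosId and one without.

```python
import re

# Same expression as in the rex command, in Python's named-group syntax.
PATTERN = re.compile(r"fromhost=(?P<fromhost>[^:]+)(.*cosId=(?P<cosid>.*))?")

with_cos = "5002:fromhost=999.99.99.99:fromport=3299:nodePort=5002:cosId=asasasa"
without_cos = "5002:fromhost=999.99.99.99:fromport=3299:nodePort=5002"

for event in (with_cos, without_cos):
    m = PATTERN.search(event)
    # An optional group that did not take part in the match yields None.
    print(m.group("fromhost"), m.group("cosid"))
# 999.99.99.99 asasasa
# 999.99.99.99 None
```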

If you find any of the solutions good, do not forget to mark it as answered/solved.

Dmitrii T.
0 Karma

ITWhisperer
SplunkTrust

No - mode=sed is for stream editing, which is not required when you are just extracting fields. Assuming you have already extracted the port field holding all this information (which was clear from your original post):

| rex field=port "fromhost=(?<fromhost>[^:]+)"
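
As a loose analogy for the difference between the two modes, here is a Python sketch: extraction pulls a value out and leaves the text untouched, while substitution rewrites the text, which is what a sed-style s/regex/replacement/ expression is for. The shortened `port` value and the x.x.x.x mask are made up for illustration, and Python's named-group syntax `(?P<name>...)` stands in for the `(?<name>...)` form that rex uses.

```python
import re

# Shortened, hypothetical value of the port field.
port = "5002:fromhost=999.99.99.99:fromport=3299"

# Extraction (comparable to rex in its default mode): read a value out.
m = re.search(r"fromhost=(?P<fromhost>[^:]+)", port)
fromhost = m.group("fromhost")

# Substitution (comparable to a sed-style s/.../.../): rewrite the text itself.
masked = re.sub(r"fromhost=[^:]+", "fromhost=x.x.x.x", port)

print(fromhost)  # 999.99.99.99
print(masked)    # 5002:fromhost=x.x.x.x:fromport=3299
```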

 

0 Karma

ITWhisperer
SplunkTrust

fromhost=(?<fromhost>[^:]+).*cosId=(?<cosid>.*)

https://regex101.com/r/rZq5Gn/1

0 Karma

dtsariapkin
Splunk Employee

That is a very elegant solution by @ITWhisperer. Depending on how many logs you have and how far you go with your regex learning, you might want to start using more tightly defined groups too, e.g.:
fromhost=(?<fromhost>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*cosId=(?<cosid>[^\s]+)

Which now looks for the exact pattern of an IP address. Or something more convoluted that does much the same thing:
fromhost=(?<fromhost>(\d{1,3}(\.)?){4}).+cosId=(?<cosid>[^\s]+)

Where the numbers in curly brackets tell you how many times the preceding pattern may repeat:
e.g. {1,3} means from one to three times, so \d{1,3} means any digit one to three times, as in each part of an IP address.

Why are things like that important? Because the more you work with Splunk and the more events you parse this way, the more patterns with .* might start to impact you and hog your CPU. I have seen too many issues when the patterns start to get out of control.
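
A quick way to compare the two IP patterns is to test them on their own, outside Splunk. The Python sketch below uses `fullmatch` on just the IP portion of each regex; the second test string is a made-up run of digits. It suggests the two patterns are close but not identical: the shorter form treats the dot as optional, so it also accepts a plain run of digits. Neither pattern checks that each part is at most 255.

```python
import re

# IP portion of the first pattern: four digit groups separated by literal dots.
STRICT = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# IP portion of the second pattern: four digit groups, each with an optional dot.
LOOSE = re.compile(r"(\d{1,3}(\.)?){4}")

for candidate in ("999.99.99.99", "1234567890"):
    print(candidate,
          bool(STRICT.fullmatch(candidate)),
          bool(LOOSE.fullmatch(candidate)))
# 999.99.99.99 True True
# 1234567890 False True
```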

Good luck and welcome to the wonderful world of Regular Expressions. 

Dmitrii T.
0 Karma

kmhanson
Explorer

I have hundreds of thousands of logs. I can put them in Excel but would rather get it done in Splunk so the rest of the team can run the same command.  It is more important to get that IP address; the COSID isn't so important.

0 Karma

kmhanson
Explorer

If I just want the IP address and not the COSID, what do I cut out?  Turns out COSID isn't always there

 

0 Karma

ITWhisperer
SplunkTrust

Use two separate expressions

fromhost=(?<fromhost>[^:]+)

cosId=(?<cosid>.*)

That way, you will get the field if the anchor matches, and it will be null if the anchor isn't found
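
The difference from a single combined expression can be sketched in Python (again using `(?P<name>...)` for named groups, since that is how Python spells them). The event below is a shortened, hypothetical variant of the sample with no cosId.

```python
import re

# Shortened, hypothetical event without a cosId pair.
event = "5002:fromhost=999.99.99.99:fromport=3299:nodePort=5002"

# One combined expression: the whole match fails, so fromhost is lost too.
combined = re.search(r"fromhost=(?P<fromhost>[^:]+).*cosId=(?P<cosid>.*)", event)

# Two separate expressions: each one succeeds or fails on its own.
host = re.search(r"fromhost=(?P<fromhost>[^:]+)", event)
cos = re.search(r"cosId=(?P<cosid>.*)", event)

print(combined)                # None
print(host.group("fromhost"))  # 999.99.99.99
print(cos)                     # None
```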

0 Karma

kmhanson
Explorer

rex field=user mode=sed and then the expression?

 

0 Karma

ITWhisperer
SplunkTrust

No - mode=sed is for stream editing, which is not required when you are just extracting fields

0 Karma

kmhanson
Explorer

I did play and get it to work.  A big help. Thanks so much to both of you for the help.

0 Karma

dtsariapkin
Splunk Employee

@kmhanson Good to know. Don't forget to mark the answer as the solution, and give Karma to the people who helped. That keeps them going!

Dmitrii T.
0 Karma