Splunk Search

How to split field into multiple fields for comparison

owie6466
Explorer

Hello, need help from the experts.

My search results (_raw) is this:
Event
1 minute ago, vmrit-c4ca0001.lm.lmig.com, windows 6.3.9600.

I would like to split this into 3 different fields so I can do some comparison. Something like:

Field1 Field2 Field3
1 minute ago vmrit-c4ca001.lm.lmig.com windows 6.3.9600.

Appreciate any help you can give to this newbie!

Thank you.

0 Karma
1 Solution

rbechtold
Communicator

Hey Owie,

This can easily be accomplished using regular expression to extract the fields from your data (using the rex command)!

Try adding this to your search:

...Base Search...
| rex field=_raw "(?<Field1>[^\,]+)\,(?<Field2>[^\,]+)\,(?<Field3>.*)\."

This should work based on the example data you provided, however if you run into any trouble or it doesn't work let me know and I can help you write a more precise extraction.

Rex command documentation: https://docs.splunk.com/Documentation/Splunk/7.3.1/SearchReference/Rex
Also, if you're interested in learning regex, I recommend checking out this site: https://regexone.com/

I hope this helps!

View solution in original post

0 Karma

rbechtold
Communicator

Hey Owie,

This can easily be accomplished using regular expression to extract the fields from your data (using the rex command)!

Try adding this to your search:

...Base Search...
| rex field=_raw "(?<Field1>[^\,]+)\,(?<Field2>[^\,]+)\,(?<Field3>.*)\."

This should work based on the example data you provided, however if you run into any trouble or it doesn't work let me know and I can help you write a more precise extraction.

Rex command documentation: https://docs.splunk.com/Documentation/Splunk/7.3.1/SearchReference/Rex
Also, if you're interested in learning regex, I recommend checking out this site: https://regexone.com/

I hope this helps!

0 Karma

owie6466
Explorer

hi there, didn't yield any results for me. 😞

0 Karma

rbechtold
Communicator

That is strange!

I'm trying to think of why this wouldn't work. Would you mind sharing your base search with me? I'm wondering if we got rid of the _raw field somewhere in the search.

Here is a search showing how the extraction should have worked on the data you provided (copy and paste this into Splunk):

|makeresults count=1
| eval _raw = "1 minute ago, vmrit-c4ca0001.lm.lmig.com, windows 6.3.9600."
| rex field=_raw "(?<Field1>[^\,]+)\,(?<Field2>[^\,]+)\,(?<Field3>.*)\."
0 Karma

owie6466
Explorer

hi rbechtold here's the base search:

| search index=os sourcetype=chef:csv host="vxkip-v87k6btx" AND source=/home/hab/node_status.csv
| eval n=split(_raw,",")
| eval Name=mvindex(n, 1)
| eval OS=mvindex(n, 2)
| search OS = "windows"
| search OS != "ubuntu"
| where len(Name) > 2
| eval Name=lower(Name)
| eval Name = trim(replace(Name,".lm.lmig.com.",""))
| replace ".lm.lmig.com" WITH "" IN Name
| eval Name = trim(replace(Name,".kc.lmig.com",""))
| eval Name = trim(replace(Name,".lmx.lmig.com",""))
| eval Name = trim(replace(Name,".lmxt.lmig.com",""))
| eval Name = trim(replace(Name,".lmig.com",""))
| eval Name = trim(replace(Name,".lm",""))
| eval Name = trim(replace(Name,".dsm.pin.safeco.com",""))
| table Name OS
| sort Name
| rename Name as host
|join type=left host
[search index=wineventlog* sourcetype=WinEventLog:Application SourceName=Chef
| stats values() as * by host
| rex field=host "(?[^.]+)."
| rex field=_raw "(?[^\,]+)\,(?[^\,]+)\,(?.
)."

| eval host = lower(host)]

0 Karma

rbechtold
Communicator

Ahh, that makes more sense!

We're losing the _raw field with this line:

...
|stats values(*) AS * by host
...

Just as a heads up, I've slightly modified the regex to make it more specific to your dataset.
I have two possible solutions for you --

The first is to put the extraction before the stats command to make sure we have the _raw field available for extracting like this:

...BASE SEARCH...
|join type=left host
[search index=wineventlog* sourcetype=WinEventLog:Application SourceName=Chef 
| rex field=_raw "(?<Field1>\d+\s(?:seconds?|minutes?|hours?|days?)\sago)\,\s(?<Field2>[^\,]+)\,\s(?<Field3>[^\n]+)\."
| stats values(*) as * by host
...

The second solution would be to check to see what fields we're actually working with after we run the stats command.

I would need you to run this for me:

index=wineventlog* sourcetype=WinEventLog:Application SourceName=Chef 
| stats values() as by host

and then tell me what the field name is that contains the logs that look like this:
1 minute ago, vmrit-c4ca0001.lm.lmig.com, windows 6.3.9600.

It's possible we should be extracting from a field other than _raw.

Let me know how it goes!

0 Karma

owie6466
Explorer

hi i ran this:
index=wineventlog* sourcetype=WinEventLog:Application SourceName=Chef
| stats values() by host

search results:
in the statistics tab and events tab there's no field that actually shows what we're looking for.

but if i include the base search coming from the csv file,
search results:
statistics tab does not show any field that includes the log that we're looking for.
in the events tab, _raw format shows the log as the first line. and when i expand a specific event, field "n" has all these information

n
0 minutes ago

vmpit-hctxdb01.lm.lmig.com

windows 6.3.9600

thank you so much for the help!

i thought this was going to be an easy search and chart.

0 Karma

rbechtold
Communicator

Excellent feedback! I appreciate you working with me to figure this out. I think we've arrived at the root of the problem.

I mistakenly assumed the data we were trying to extract was in the joined dataset, but it appears that it's actually coming from the main search!

The "n" fields is created in the first eval pipe, where the _raw is split on commas. This "n" field has all of the data we're looking for -- just in multi-value format.

In order to index these multi-value fields so we can work with them, we use the mvindex. You can see in your base search you have two of these values indexed already -- Name and OS. In order to add the time field to the results, we'll need to mvindex that field as well.

First, try this for me:

| search index=os sourcetype=chef:csv host="vxkip-v87k6btx" AND source=/home/hab/node_status.csv
| eval n=split(_raw,",")
| eval Time_Field=mvindex(n, 0)
| eval Name=mvindex(n, 1)
| eval OS=mvindex(n, 2)
| search OS = "windows"
| search OS != "ubuntu"
| where len(Name) > 2
| eval Name=lower(Name) 
| eval Name = trim(replace(Name,".lm.lmig.com.","")) 
| replace ".lm.lmig.com" WITH "" IN Name 
| eval Name = trim(replace(Name,".kc.lmig.com",""))
| eval Name = trim(replace(Name,".lmx.lmig.com",""))
| eval Name = trim(replace(Name,".lmxt.lmig.com",""))
| eval Name = trim(replace(Name,".lmig.com",""))
| eval Name = trim(replace(Name,".lm",""))
| eval Name = trim(replace(Name,".dsm.pin.safeco.com",""))
| table Time_Field Name OS 
| sort Name
| rename Name as host

Once you've confirmed that your three fields are there, go ahead and add the join statement, and everything should show up as expected.

As a bonus in the case that you're interested, you could use the rex command to accomplish the same thing (in place of the split/mvindex method) like this:

| search index=os sourcetype=chef:csv host="vxkip-v87k6btx" AND source=/home/hab/node_status.csv
 | rex field=_raw "(?<Time_Field>\d+\s(?:seconds?|minutes?|hours?|days?)\sago)\,\s(?<Name>[^\,]+)\,\s(?<OS>[^\n]+)\." 
| search OS = "windows"
| search OS != "ubuntu"
| where len(Name) > 2
| eval Name=lower(Name) 
| eval Name = trim(replace(Name,".lm.lmig.com.","")) 
| replace ".lm.lmig.com" WITH "" IN Name 
| eval Name = trim(replace(Name,".kc.lmig.com",""))
| eval Name = trim(replace(Name,".lmx.lmig.com",""))
| eval Name = trim(replace(Name,".lmxt.lmig.com",""))
| eval Name = trim(replace(Name,".lmig.com",""))
| eval Name = trim(replace(Name,".lm",""))
| eval Name = trim(replace(Name,".dsm.pin.safeco.com",""))
| table Time_Field Name OS 
| sort Name
| rename Name as host

You also have much more flexibility with regular expressions, but both ways should work for this scenario.

I apologize for misunderstanding the question, I hope this is the solution that ends up working for you!

0 Karma

owie6466
Explorer

i got the Time_Field showing now! thank you!

looks i have to strip the Time_Field to only get the number and then do a comparison

you're great! might have to ask you one more thing. i will run my search and will let you know!

0 Karma

owie6466
Explorer

here's what the Time_Field looks like.

Time_Field
0 minutes ago

now i have to take '0' and compare if >4 then Process_Down

0 Karma

rbechtold
Communicator

Perfect! I'm glad to hear everything is working as expected.

In order to convert the string "x minutes ago" into an integer we can work with, we can use a combination of rex to extract the data we need from the Time_Field, and eval to create a new "Converted_Minutes" field based on those values.

We'll then use another eval statement to create a "Status" field that decides if the process is up or down based on the output of the Converted_Minutes field.

Go ahead and change your base search to this:

| search index=os sourcetype=chef:csv host="vxkip-v87k6btx" AND source=/home/hab/node_status.csv 
| eval n=split(_raw,",") 
| eval Time_Field=mvindex(n, 0) 
| eval Name=mvindex(n, 1) 
| eval OS=mvindex(n, 2) 
| search OS = "windows" 
| search OS != "ubuntu" 
| where len(Name) > 2 
| eval Name=lower(Name) 
| eval Name = trim(replace(Name,".lm.lmig.com.","")) 
| replace ".lm.lmig.com" WITH "" IN Name 
| eval Name = trim(replace(Name,".kc.lmig.com","")) 
| eval Name = trim(replace(Name,".lmx.lmig.com","")) 
| eval Name = trim(replace(Name,".lmxt.lmig.com","")) 
| eval Name = trim(replace(Name,".lmig.com","")) 
| eval Name = trim(replace(Name,".lm","")) 
| eval Name = trim(replace(Name,".dsm.pin.safeco.com","")) 
| rex field=Time_Field "(?<digit>\d+)\s(?<time_unit>second|minute|hour|day)s?" 
| eval Converted_Minutes = if(time_unit = "second", round(digit / 60,0), if(time_unit = "minute", digit, if(time_unit = "hour", digit * 60, if(time_unit = "day", digit * 1440, "Time Length not Accounted For")))) 
| eval Status = if(Converted_Minutes > 4, "Process_Down", "Process_Up") 
| table Time_Field Converted_Minutes Status Name OS
| sort Name 
| rename Name as host

The Converted_Minutes field will translate any time from the Time_Field (Seconds, Minutes, Hours, Days) into minutes. You can use this field for visualizations and comparisons since it will always output as an integer.

You should be able to add your join after you confirm the "Converted_Minutes" field and the "Status" field are showing up properly!

0 Karma

owie6466
Explorer

YOU ARE AWESOME! this is working perfectly now! thank you so much for all the help and patience!

owie6466
Explorer

thanks rbechtold! i will go ahead and try this one. will also check out the site you recommended.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...