Solved: Re: How to split a string into multiple fields usi...

cedmarjls32 · ‎06-26-2015

Hi,
I have a field called "details" with the following value:

details

GAP 16 GAP PLI 31                               
MR 400 AGAP V41.81 PLI 31                               
MR 400 AGAP V39.80 PLI 31                               
MR 300 AGAP V89.96 PLI 31                               
MR 400 AGAP V89.95 Oos.                                 
MR 400 AGAP V89.95 PLI 31                               
MR 400 AGAP V89.95 PLI 31                               
MR 400 AGAP V40.80 Oos.                                 
MR 300 AGAP V87.91 PLI 31                               
MR 300 AGAP V87.91 PLI 31                               
Mobile 100 AGAP V64.60 PLI 31                               
GAP GAP PLI 31                              
Mobile 200 AGAP V51.43 PLI 31                               
MR 400 AGAP V40.80 PLI 31                               
Mobile 200 AGAP V52.43 PLI 31                               
Mobile 200 AGAP V51.43 PLI 31                               
TSC UA V01.3C PLI 31

I'd like to split these values into several fields and regroup them so that the final result would look like this:

alt text

I tried using ..| eval temp=split(details," ") | eval field1=mvindex(temp,0) | etc.. but it looked dirty and wrong..

Would a regular expression do the trick?

richgalloway · ‎06-26-2015

If you can make some assumptions about the values in the Details field, the following regex string will parse it, at least using your sample data.

(?P<field1>.*) (?P<field2>GAP|AGAP|UA) (?P<field3>\w+\.\w+)?\s*(?P<field4>.*$)

---
If this reply helps you, Karma would be appreciated.

View solution in original post

martin_mueller · ‎06-26-2015

While that may indeed be a CSV file, the data in question appears to be in one column with no separators between your individual values. You're in deep deep ahem

richgalloway · ‎06-26-2015

If you can make some assumptions about the values in the Details field, the following regex string will parse it, at least using your sample data.

(?P<field1>.*) (?P<field2>GAP|AGAP|UA) (?P<field3>\w+\.\w+)?\s*(?P<field4>.*$)

---
If this reply helps you, Karma would be appreciated.

cedmarjls32 · ‎06-26-2015

Ok, just found the solution using richgalloways regex string with split/mvindex commands

  ..  | rex field=details "(?P<field1>.*) (?P<field2>GAP|AGAP|UA) (?P<field3>\w+\.\w+)?\s*(?P<field4>.*$)" | eval temp=split(field4,"  ") | eval field5=mvindex(temp,0) | eval field6=mvindex(temp,1) | table details field1 field2 field5 field6 | fillnull | eval field7=if(field6=0, "PLI 31", 'field6') | replace "PLI 31" with " " in field5 | table details field1 field2 field5 field7

alt text

aholzel · ‎06-26-2015

Like martin_mueller already said, your data is ambiguous there is no way to create a catch all regex, also split wil not work correct I think (at least it will not deliver the data in the form that you want). The problem is mainly in rows 1, 12 and 17.
Row 1: misses a field and there is no way to determine that because there is just one space between field 2 and 4.
- Split will probably have this problem to.
Row 17: The layout of the first field is different than in all the other fields, all other fields are < word >< space >< digit > these two are just < word >
- Again split will probably also have this problem because these rows have "less fields" (based on the amount of spaces in the row).
Row 12: Is a combination of the above two points, the layout of the first field is different and it is missing a field
If the above problems are resolved by for example filling empty fields with a "-", making sure that the first field always has the same layout, OR by putting a field separator in the data life will be al lot easier... till that time the below regex will hit on most rows correct (it will also hit on row 1...)

For rows 2-11 and 13-16:

(?ms)^(?<field1>\w+\s\d+)\s(?<field2>[^\s]+)\s(?<field3>[^\s]+)\s(?<field4>.+?)$

cedmarjls32 · ‎06-26-2015

These values comes from a csv file which is formatted as follows:

alt text

This csv file is generated by a script running on a device, then is indexed by Splunk so there's no way for me to make any modification.

martin_mueller · ‎06-26-2015

It means your data is ambiguous. Do you have a source with commas, tabs, etc. separating the values or quoted values to distinguish spaces within a value from spaces separating two values?

martin_mueller · ‎06-26-2015

A regular expression cannot distinguish a value that contains two words separated by a space from two one-word values separated by a space.

cedmarjls32 · ‎06-26-2015

So it means I'm stuck with the split / mvindex command then?

How to split a string into multiple fields using whitespace as delimiter

Pro Tips for First-Time .conf Attendees: Advice from SplunkTrust

Raise Your Skills at the .conf25 Builder Bar: Your Splunk Developer Destination

Hunt Smarter, Not Harder: Discover New SPL “Recipes” in Our Threat Hunting Webinar

Are you a member of the Splunk Community?

How to split a string into multiple fields using whitespace as delimiter

Pro Tips for First-Time .conf Attendees: Advice from SplunkTrust

Raise Your Skills at the .conf25 Builder Bar: Your Splunk Developer Destination

Hunt Smarter, Not Harder: Discover New SPL “Recipes” in Our Threat Hunting Webinar