Splunk Search

How to split a string into multiple fields using whitespace as delimiter

cedmarjls32
New Member

Hi,
I have a field called "details" with the following value:

details

GAP 16 GAP PLI 31                               
MR 400 AGAP V41.81 PLI 31                               
MR 400 AGAP V39.80 PLI 31                               
MR 300 AGAP V89.96 PLI 31                               
MR 400 AGAP V89.95 Oos.                                 
MR 400 AGAP V89.95 PLI 31                               
MR 400 AGAP V89.95 PLI 31                               
MR 400 AGAP V40.80 Oos.                                 
MR 300 AGAP V87.91 PLI 31                               
MR 300 AGAP V87.91 PLI 31                               
Mobile 100 AGAP V64.60 PLI 31                               
GAP GAP PLI 31                              
Mobile 200 AGAP V51.43 PLI 31                               
MR 400 AGAP V40.80 PLI 31                               
Mobile 200 AGAP V52.43 PLI 31                               
Mobile 200 AGAP V51.43 PLI 31                               
TSC UA V01.3C PLI 31                                

I'd like to split these values into several fields and regroup them so that the final result would look like this:

alt text

I tried using ..| eval temp=split(details," ") | eval field1=mvindex(temp,0) | etc.. but it looked dirty and wrong..

Would a regular expression do the trick?

Tags (4)
0 Karma
1 Solution

richgalloway
SplunkTrust
SplunkTrust

If you can make some assumptions about the values in the Details field, the following regex string will parse it, at least using your sample data.

(?P<field1>.*) (?P<field2>GAP|AGAP|UA) (?P<field3>\w+\.\w+)?\s*(?P<field4>.*$)
---
If this reply helps you, Karma would be appreciated.

View solution in original post

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

While that may indeed be a CSV file, the data in question appears to be in one column with no separators between your individual values. You're in deep deep ahem

0 Karma

richgalloway
SplunkTrust
SplunkTrust

If you can make some assumptions about the values in the Details field, the following regex string will parse it, at least using your sample data.

(?P<field1>.*) (?P<field2>GAP|AGAP|UA) (?P<field3>\w+\.\w+)?\s*(?P<field4>.*$)
---
If this reply helps you, Karma would be appreciated.
0 Karma

cedmarjls32
New Member

Ok, just found the solution using richgalloways regex string with split/mvindex commands

  ..  | rex field=details "(?P<field1>.*) (?P<field2>GAP|AGAP|UA) (?P<field3>\w+\.\w+)?\s*(?P<field4>.*$)" | eval temp=split(field4,"  ") | eval field5=mvindex(temp,0) | eval field6=mvindex(temp,1) | table details field1 field2 field5 field6 | fillnull | eval field7=if(field6=0, "PLI 31", 'field6') | replace "PLI 31" with " " in field5 | table details field1 field2 field5 field7

alt text

0 Karma

aholzel
Communicator

Like martin_mueller already said, your data is ambiguous there is no way to create a catch all regex, also split wil not work correct I think (at least it will not deliver the data in the form that you want). The problem is mainly in rows 1, 12 and 17.
Row 1: misses a field and there is no way to determine that because there is just one space between field 2 and 4.
- Split will probably have this problem to.
Row 17: The layout of the first field is different than in all the other fields, all other fields are < word >< space >< digit > these two are just < word >
- Again split will probably also have this problem because these rows have "less fields" (based on the amount of spaces in the row).
Row 12: Is a combination of the above two points, the layout of the first field is different and it is missing a field
If the above problems are resolved by for example filling empty fields with a "-", making sure that the first field always has the same layout, OR by putting a field separator in the data life will be al lot easier... till that time the below regex will hit on most rows correct (it will also hit on row 1...)

For rows 2-11 and 13-16:

(?ms)^(?<field1>\w+\s\d+)\s(?<field2>[^\s]+)\s(?<field3>[^\s]+)\s(?<field4>.+?)$
0 Karma

cedmarjls32
New Member

These values comes from a csv file which is formatted as follows:

alt text

This csv file is generated by a script running on a device, then is indexed by Splunk so there's no way for me to make any modification.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

It means your data is ambiguous. Do you have a source with commas, tabs, etc. separating the values or quoted values to distinguish spaces within a value from spaces separating two values?

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

A regular expression cannot distinguish a value that contains two words separated by a space from two one-word values separated by a space.

0 Karma

cedmarjls32
New Member

So it means I'm stuck with the split / mvindex command then?

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In September, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...

New in Observability - Improvements to Custom Metrics SLOs, Log Observer Connect & ...

The latest enhancements to the Splunk observability portfolio deliver improved SLO management accuracy, better ...

Improve Data Pipelines Using Splunk Data Management

  Register Now   This Tech Talk will explore the pipeline management offerings Edge Processor and Ingest ...