Hi,
I have a field called "details" with the following value:
details
GAP 16 GAP PLI 31
MR 400 AGAP V41.81 PLI 31
MR 400 AGAP V39.80 PLI 31
MR 300 AGAP V89.96 PLI 31
MR 400 AGAP V89.95 Oos.
MR 400 AGAP V89.95 PLI 31
MR 400 AGAP V89.95 PLI 31
MR 400 AGAP V40.80 Oos.
MR 300 AGAP V87.91 PLI 31
MR 300 AGAP V87.91 PLI 31
Mobile 100 AGAP V64.60 PLI 31
GAP GAP PLI 31
Mobile 200 AGAP V51.43 PLI 31
MR 400 AGAP V40.80 PLI 31
Mobile 200 AGAP V52.43 PLI 31
Mobile 200 AGAP V51.43 PLI 31
TSC UA V01.3C PLI 31
I'd like to split these values into several fields and regroup them so that the final result would look like this:
I tried using ..| eval temp=split(details," ") | eval field1=mvindex(temp,0) | etc..
but it looked dirty and wrong..
Would a regular expression do the trick?
If you can make some assumptions about the values in the Details field, the following regex string will parse it, at least using your sample data.
(?P<field1>.*) (?P<field2>GAP|AGAP|UA) (?P<field3>\w+\.\w+)?\s*(?P<field4>.*$)
While that may indeed be a CSV file, the data in question appears to be in one column with no separators between your individual values. You're in deep deep ahem
If you can make some assumptions about the values in the Details field, the following regex string will parse it, at least using your sample data.
(?P<field1>.*) (?P<field2>GAP|AGAP|UA) (?P<field3>\w+\.\w+)?\s*(?P<field4>.*$)
Ok, just found the solution using richgalloways regex string with split/mvindex commands
.. | rex field=details "(?P<field1>.*) (?P<field2>GAP|AGAP|UA) (?P<field3>\w+\.\w+)?\s*(?P<field4>.*$)" | eval temp=split(field4," ") | eval field5=mvindex(temp,0) | eval field6=mvindex(temp,1) | table details field1 field2 field5 field6 | fillnull | eval field7=if(field6=0, "PLI 31", 'field6') | replace "PLI 31" with " " in field5 | table details field1 field2 field5 field7
Like martin_mueller already said, your data is ambiguous there is no way to create a catch all regex, also split wil not work correct I think (at least it will not deliver the data in the form that you want). The problem is mainly in rows 1, 12 and 17.
Row 1: misses a field and there is no way to determine that because there is just one space between field 2 and 4.
- Split will probably have this problem to.
Row 17: The layout of the first field is different than in all the other fields, all other fields are < word >< space >< digit > these two are just < word >
- Again split will probably also have this problem because these rows have "less fields" (based on the amount of spaces in the row).
Row 12: Is a combination of the above two points, the layout of the first field is different and it is missing a field
If the above problems are resolved by for example filling empty fields with a "-", making sure that the first field always has the same layout, OR by putting a field separator in the data life will be al lot easier... till that time the below regex will hit on most rows correct (it will also hit on row 1...)
For rows 2-11 and 13-16:
(?ms)^(?<field1>\w+\s\d+)\s(?<field2>[^\s]+)\s(?<field3>[^\s]+)\s(?<field4>.+?)$
These values comes from a csv file which is formatted as follows:
This csv file is generated by a script running on a device, then is indexed by Splunk so there's no way for me to make any modification.
It means your data is ambiguous. Do you have a source with commas, tabs, etc. separating the values or quoted values to distinguish spaces within a value from spaces separating two values?
A regular expression cannot distinguish a value that contains two words separated by a space from two one-word values separated by a space.
So it means I'm stuck with the split / mvindex command then?