Solved: Help with field extraction from table values

dhirendra761 · ‎12-01-2022

HI Splunkers,

We are getting below value inside one of field "data" in tabular format:

Source success Total_Count
0 abc.csv True 200
1 some_string_1 False 34
2 some_string_2 True 12
3 some_string_3 False 4
4 some_string_4 True 63
5 some_string_5 False 2
6 some_string_6 True 108

Can we extract these values in different fields. Thank you in advance for your reply

yuanliu · ‎12-02-2022

multikv is your friend. But you need to handle the tabulation's ordinal column.

| rename _raw as temp, data AS _raw
| rex mode=sed "s/^Source/ord Source/"
| multikv
| rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```

Using your sample data, I get

Source	TotalCount	org	success
abc.csv	200	0	True
some_string_1	34	1	False
some_string_2	12	2	True
some_string_3	4	3	False
some_string_4	63	4	True
some_string_5	2	5	False
some_string_6	108	6	True

View solution in original post

nehamvinchankar · ‎12-02-2022

you can try expand and split

dhirendra761 · ‎12-02-2022

This is not multivalue field so mvexpand is not working.

yuanliu · ‎12-02-2022

multikv is your friend. But you need to handle the tabulation's ordinal column.

| rename _raw as temp, data AS _raw
| rex mode=sed "s/^Source/ord Source/"
| multikv
| rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```

Using your sample data, I get

Source	TotalCount	org	success
abc.csv	200	0	True
some_string_1	34	1	False
some_string_2	12	2	True
some_string_3	4	3	False
some_string_4	63	4	True
some_string_5	2	5	False
some_string_6	108	6	True

dhirendra761 · ‎12-02-2022

Its working as expected.

Thank you so much.

dhirendra761 · ‎12-01-2022

NOTE: values contains multiple white space in it.

ex: 1<multple_space> some_string_1<multple_space>False<multple_space>34

yuanliu · ‎12-02-2022

Unless the developer of your data source has considered data structure, you have no way of doing this. Period. You'll have to reject this source altogether and send them back to developer's desk.

One way the developer may have enforced structure is by having all the strings with no space in between, i.e., any number of space is a field separator. In that case, you can do

| rename _raw as temp, data AS _raw
| rex mode=sed "s/^Source/ord Source/"
| rex mode=sed "s/ +/ /g"
| multikv
| rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```

Another way structure may have been enforced is, for example, to have a maximum number of consecutive spaces in legitimate strings, and use a field separator made of spaces larger than this maximum. For example, if maximum legitimate space is 1, you can do

| rename _raw as temp, data AS _raw
| rex mode=sed "s/^Source/ord Source/"
| rex mode=sed "s/  +/,/g"
| multikv
| rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```

A common problem, given that your ordinal has no corresponding header, could be that the tabular data is poorly converted from a spreadsheet, when original field separator is tab character but got "expanded" into normal spaces during conversion. In that case, you'll have to go back to the source person and demand a conformant conversion.

No amount of SPL can save extremely poor input.

PickleRick · ‎12-02-2022

As long as you can variable number of spaces in the middle of just one field, you can get around it by "fixing" the other fields' positions and just picking the "remains" as the field value. But that's ugly and prone to errors in case something changes.

But yes, I fully agree with "No amount of SPL can save extremely poor input."

PickleRick · ‎12-02-2022

If the data was a bit more "strictly" structured, you could do indexed extractions and could get away with the softwared shuffling column order (that can be a problem with csv if suddenly the format changes).

If your fields order is constant, you can define your extraction taking into account possible multiple spaces (you usually define extractions this way anyway).

Something like:

(?<id>\d+)\s+(?<Source>.*?)\s+(?<success>True|False)\s+(?<count>\d+)

But I'd be very cautious about naming your field "Source" since "source" is one of the default fields.

Help with field extraction from table values

field extraction

Introducing the 2024 SplunkTrust!

Introducing the 2024 Splunk MVPs!

Splunk Custom Visualizations App End of Life