Splunk Search

Help with field extraction from table values

dhirendra761
Contributor

HI Splunkers,

We are getting below value inside one of field "data" in tabular format:

Source success Total_Count
0 abc.csv True 200
1 some_string_1 False 34
2 some_string_2 True 12
3 some_string_3 False 4
4 some_string_4 True 63
5 some_string_5 False 2
6 some_string_6 True 108

Can we extract these values in different fields. Thank you in advance for your reply

Labels (1)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

multikv is your friend. But you need to handle the tabulation's ordinal column.

| rename _raw as temp, data AS _raw
| rex mode=sed "s/^Source/ord Source/"
| multikv
| rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```

Using your sample data, I get

SourceTotalCountorgsuccess
abc.csv2000True
some_string_1341False
some_string_2122True
some_string_343False
some_string_4634True
some_string_525False
some_string_61086True

View solution in original post

Tags (1)

nehamvinchankar
Engager

you can try expand and split

0 Karma

dhirendra761
Contributor

This is not multivalue field so mvexpand is not working.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

multikv is your friend. But you need to handle the tabulation's ordinal column.

| rename _raw as temp, data AS _raw
| rex mode=sed "s/^Source/ord Source/"
| multikv
| rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```

Using your sample data, I get

SourceTotalCountorgsuccess
abc.csv2000True
some_string_1341False
some_string_2122True
some_string_343False
some_string_4634True
some_string_525False
some_string_61086True
Tags (1)

dhirendra761
Contributor

Its working as expected.

Thank you so much.

0 Karma

dhirendra761
Contributor

NOTE: values contains multiple white space in it.

ex: 1<multple_space> some_string_1<multple_space>False<multple_space>34

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Unless the developer of your data source has considered data structure, you have no way of doing this. Period. You'll have to reject this source altogether and send them back to developer's desk.

One way the developer may have enforced structure is by having all the strings with no space in between, i.e., any number of space is a field separator. In that case, you can do

| rename _raw as temp, data AS _raw
| rex mode=sed "s/^Source/ord Source/"
| rex mode=sed "s/ +/ /g" | multikv | rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```

Another way structure may have been enforced is, for example, to have a maximum number of consecutive spaces in legitimate strings, and use a field separator made of spaces larger than this maximum. For example, if maximum legitimate space is 1, you can do

| rename _raw as temp, data AS _raw
| rex mode=sed "s/^Source/ord Source/"
| rex mode=sed "s/ +/,/g" | multikv | rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```

A common problem, given that your ordinal has no corresponding header, could be that the tabular data is poorly converted from a spreadsheet, when original field separator is tab character but got "expanded" into normal spaces during conversion. In that case, you'll have to go back to the source person and demand a conformant conversion.

No amount of SPL can save extremely poor input.

0 Karma

PickleRick
Ultra Champion

As long as you can variable number of spaces in the middle of just one field, you can get around it by "fixing" the other fields' positions and just picking the "remains" as the field value. But that's ugly and prone to errors in case something changes.

But yes, I fully agree with "No amount of SPL can save extremely poor input."

0 Karma

PickleRick
Ultra Champion

If the data was a bit more "strictly" structured, you could do indexed extractions and could get away with the softwared shuffling column order (that can be a problem with csv if suddenly the format changes).

If your fields order is constant, you can define your extraction taking into account possible multiple spaces (you usually define extractions this way anyway).

Something like:

(?<id>\d+)\s+(?<Source>.*?)\s+(?<success>True|False)\s+(?<count>\d+)

But I'd be very cautious about naming your field "Source" since "source" is one of the default fields.

0 Karma
Get Updates on the Splunk Community!

Observability Highlights | January 2023 Newsletter

 January 2023New Product Releases Splunk Network Explorer for Infrastructure MonitoringSplunk unveils Network ...

Security Highlights | January 2023 Newsletter

January 2023 Splunk Security Essentials (SSE) 3.7.0 ReleaseThe free Splunk Security Essentials (SSE) 3.7.0 app ...

Platform Highlights | January 2023 Newsletter

 January 2023Peace on Earth and Peace of Mind With Business ResilienceAll organizations can start the new year ...