Splunk Search

Help with field extraction from table values

dhirendra761
Contributor

HI Splunkers,

We are getting below value inside one of field "data" in tabular format:

Source success Total_Count
0 abc.csv True 200
1 some_string_1 False 34
2 some_string_2 True 12
3 some_string_3 False 4
4 some_string_4 True 63
5 some_string_5 False 2
6 some_string_6 True 108

Can we extract these values in different fields. Thank you in advance for your reply

Labels (1)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

multikv is your friend. But you need to handle the tabulation's ordinal column.

| rename _raw as temp, data AS _raw
| rex mode=sed "s/^Source/ord Source/"
| multikv
| rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```

Using your sample data, I get

SourceTotalCountorgsuccess
abc.csv2000True
some_string_1341False
some_string_2122True
some_string_343False
some_string_4634True
some_string_525False
some_string_61086True

View solution in original post

Tags (1)

nehamvinchankar
Path Finder

you can try expand and split

0 Karma

dhirendra761
Contributor

This is not multivalue field so mvexpand is not working.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

multikv is your friend. But you need to handle the tabulation's ordinal column.

| rename _raw as temp, data AS _raw
| rex mode=sed "s/^Source/ord Source/"
| multikv
| rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```

Using your sample data, I get

SourceTotalCountorgsuccess
abc.csv2000True
some_string_1341False
some_string_2122True
some_string_343False
some_string_4634True
some_string_525False
some_string_61086True
Tags (1)

dhirendra761
Contributor

Its working as expected.

Thank you so much.

0 Karma

dhirendra761
Contributor

NOTE: values contains multiple white space in it.

ex: 1<multple_space> some_string_1<multple_space>False<multple_space>34

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Unless the developer of your data source has considered data structure, you have no way of doing this. Period. You'll have to reject this source altogether and send them back to developer's desk.

One way the developer may have enforced structure is by having all the strings with no space in between, i.e., any number of space is a field separator. In that case, you can do

| rename _raw as temp, data AS _raw
| rex mode=sed "s/^Source/ord Source/"
| rex mode=sed "s/ +/ /g" | multikv | rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```

Another way structure may have been enforced is, for example, to have a maximum number of consecutive spaces in legitimate strings, and use a field separator made of spaces larger than this maximum. For example, if maximum legitimate space is 1, you can do

| rename _raw as temp, data AS _raw
| rex mode=sed "s/^Source/ord Source/"
| rex mode=sed "s/ +/,/g" | multikv | rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```

A common problem, given that your ordinal has no corresponding header, could be that the tabular data is poorly converted from a spreadsheet, when original field separator is tab character but got "expanded" into normal spaces during conversion. In that case, you'll have to go back to the source person and demand a conformant conversion.

No amount of SPL can save extremely poor input.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

As long as you can variable number of spaces in the middle of just one field, you can get around it by "fixing" the other fields' positions and just picking the "remains" as the field value. But that's ugly and prone to errors in case something changes.

But yes, I fully agree with "No amount of SPL can save extremely poor input."

0 Karma

PickleRick
SplunkTrust
SplunkTrust

If the data was a bit more "strictly" structured, you could do indexed extractions and could get away with the softwared shuffling column order (that can be a problem with csv if suddenly the format changes).

If your fields order is constant, you can define your extraction taking into account possible multiple spaces (you usually define extractions this way anyway).

Something like:

(?<id>\d+)\s+(?<Source>.*?)\s+(?<success>True|False)\s+(?<count>\d+)

But I'd be very cautious about naming your field "Source" since "source" is one of the default fields.

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...