HI Splunkers,
We are getting below value inside one of field "data" in tabular format:
Source success Total_Count
0 abc.csv True 200
1 some_string_1 False 34
2 some_string_2 True 12
3 some_string_3 False 4
4 some_string_4 True 63
5 some_string_5 False 2
6 some_string_6 True 108
Can we extract these values in different fields. Thank you in advance for your reply
multikv is your friend. But you need to handle the tabulation's ordinal column.
| rename _raw as temp, data AS _raw | rex mode=sed "s/^Source/ord Source/" | multikv | rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```
Using your sample data, I get
Source | TotalCount | org | success |
abc.csv | 200 | 0 | True |
some_string_1 | 34 | 1 | False |
some_string_2 | 12 | 2 | True |
some_string_3 | 4 | 3 | False |
some_string_4 | 63 | 4 | True |
some_string_5 | 2 | 5 | False |
some_string_6 | 108 | 6 | True |
you can try expand and split
This is not multivalue field so mvexpand is not working.
multikv is your friend. But you need to handle the tabulation's ordinal column.
| rename _raw as temp, data AS _raw | rex mode=sed "s/^Source/ord Source/" | multikv | rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```
Using your sample data, I get
Source | TotalCount | org | success |
abc.csv | 200 | 0 | True |
some_string_1 | 34 | 1 | False |
some_string_2 | 12 | 2 | True |
some_string_3 | 4 | 3 | False |
some_string_4 | 63 | 4 | True |
some_string_5 | 2 | 5 | False |
some_string_6 | 108 | 6 | True |
Its working as expected.
Thank you so much.
NOTE: values contains multiple white space in it.
ex: 1<multple_space> some_string_1<multple_space>False<multple_space>34
Unless the developer of your data source has considered data structure, you have no way of doing this. Period. You'll have to reject this source altogether and send them back to developer's desk.
One way the developer may have enforced structure is by having all the strings with no space in between, i.e., any number of space is a field separator. In that case, you can do
| rename _raw as temp, data AS _raw | rex mode=sed "s/^Source/ord Source/"
| rex mode=sed "s/ +/ /g" | multikv | rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```
Another way structure may have been enforced is, for example, to have a maximum number of consecutive spaces in legitimate strings, and use a field separator made of spaces larger than this maximum. For example, if maximum legitimate space is 1, you can do
| rename _raw as temp, data AS _raw | rex mode=sed "s/^Source/ord Source/"
| rex mode=sed "s/ +/,/g" | multikv | rename temp as _raw ``` assuming field data is no longer useful but you still need _raw for some reason ```
A common problem, given that your ordinal has no corresponding header, could be that the tabular data is poorly converted from a spreadsheet, when original field separator is tab character but got "expanded" into normal spaces during conversion. In that case, you'll have to go back to the source person and demand a conformant conversion.
No amount of SPL can save extremely poor input.
As long as you can variable number of spaces in the middle of just one field, you can get around it by "fixing" the other fields' positions and just picking the "remains" as the field value. But that's ugly and prone to errors in case something changes.
But yes, I fully agree with "No amount of SPL can save extremely poor input."
If the data was a bit more "strictly" structured, you could do indexed extractions and could get away with the softwared shuffling column order (that can be a problem with csv if suddenly the format changes).
If your fields order is constant, you can define your extraction taking into account possible multiple spaces (you usually define extractions this way anyway).
Something like:
(?<id>\d+)\s+(?<Source>.*?)\s+(?<success>True|False)\s+(?<count>\d+)
But I'd be very cautious about naming your field "Source" since "source" is one of the default fields.