
Extract fields from single event(s) consisting of multiple lines

Contributor

I have a CSV file that has only one column and no header. The data set contains values for userid, property1, property2, property3, then again userid, property1, property2, property3, and so on. How can I extract the fields userid, property1, property2 and property3?

I tried something like the following (e.g. for userid), but it does not work:

.....| rex field=_raw "(?<userid>^(.*)\n)"

SplunkTrust

Hi ashabc,

Take a look at this answer http://answers.splunk.com/answers/305727/why-is-my-rex-statement-unable-to-extract-the-fiel.html#ans... to learn about pcregextest and how you can test a regex in Splunk.

Cheers, MuS

SplunkTrust

Based on the examples you just provided, you can try this:

| gentimes start=-1 | eval foo="user1
101253
DTZ
Penrith, Cumberland
user2
2151614
FCC
Balnd, Temora" | rex max_match=0 field=foo "user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)" | table userID, property1, property2
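To check offline what that rex string captures, here is a minimal sketch that uses Python's re module as a stand-in for PCRE. It assumes one syntax difference: Python spells named groups (?P<name>...) instead of (?<name>...). The other parts of this pattern (user\d, [\r\n], [^\r\n]*) behave the same in both engines. finditer plays the role of max_match=0 by returning every match instead of only the first.

```python
import re

# Same pattern as the rex command above, translated to Python syntax:
# PCRE's (?<name>...) is written (?P<name>...) in Python's re module.
PATTERN = re.compile(
    r"user\d[\r\n](?P<userID>[^\r\n]*)"
    r"[\r\n](?P<property1>[^\r\n]*)"
    r"[\r\n](?P<property2>[^\r\n]*)"
)

# The sample data from the thread, one value per line.
SAMPLE = "user1\n101253\nDTZ\nPenrith, Cumberland\nuser2\n2151614\nFCC\nBalnd, Temora"


def extract_all(text):
    """Return one dict per match, like rex max_match=0 collecting every match."""
    return [m.groupdict() for m in PATTERN.finditer(text)]


for row in extract_all(SAMPLE):
    print(row)
```

Each match starts at a userN line, so the two sample records come back as two dicts.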

or use the internal pcregextest like this:

$SPLUNK_HOME/bin/splunk cmd pcregextest mregex="user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)" test_str="user1
>     101253
>     DTZ
>     Penrith, Cumberland
>     user2
>     2151614
>     FCC
>     Balnd, Temora"

Original Pattern: 'user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)'
Expanded Pattern: 'user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)'
Regex compiled successfully. Capture group count = 3. Named capturing groups = 3.
SUCCESS - match against: 'user1
    101253
    DTZ
    Penrith, Cumberland
    user2
    2151614
    FCC
    Balnd, Temora'

#### Capturing group data ##### 
Group |            Name | Value
--------------------------------------
    1 |          userID |     101253
    2 |       property1 |     DTZ
    3 |       property2 |     Penrith, Cumberland
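In the capture table above, userID is 101253. That is the line after user1, which the question's layout (userid, then property1, property2, property3) would call property1. The variant below is a hypothetical pattern, not part of the original answer. It captures the username line itself as userid and names all four lines. It uses [\r\n]+ so that both \n and \r\n line endings work, and \d+ so that multi-digit user numbers match. It assumes, like the original pattern, that usernames look like the sample's userN.

```python
import re

# Hypothetical variant of the answer's pattern (not from the thread):
# - captures the "userN" line itself as userid (assumes usernames look like the sample)
# - adds property3 so all four lines of a record are named
# - uses [\r\n]+ so both \n and \r\n line endings are accepted
RECORD = re.compile(
    r"(?P<userid>user\d+)[\r\n]+"
    r"(?P<property1>[^\r\n]*)[\r\n]+"
    r"(?P<property2>[^\r\n]*)[\r\n]+"
    r"(?P<property3>[^\r\n]*)"
)

# Sample data with Windows-style line endings, to exercise [\r\n]+.
SAMPLE = "user1\r\n101253\r\nDTZ\r\nPenrith, Cumberland\r\nuser2\r\n2151614\r\nFCC\r\nBalnd, Temora"

records = [m.groupdict() for m in RECORD.finditer(SAMPLE)]
for rec in records:
    print(rec["userid"], rec["property1"], rec["property2"], rec["property3"], sep=" | ")
```

In the Splunk search, the same idea would presumably be written with PCRE group syntax. This is untested in Splunk: | rex max_match=0 field=_raw "(?<userid>user\d+)[\r\n]+(?<property1>[^\r\n]*)[\r\n]+(?<property2>[^\r\n]*)[\r\n]+(?<property3>[^\r\n]*)"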

Contributor

It kind of works.

What I still don't get is why you used eval foo="data_string". That's OK for two sets of sample data, but when I have thousands of records in the CSV file, how can I tackle that?


Motivator

He was using gentimes and eval only to test the method on sample data. If you run the search against your indexed data like this:

... | rex max_match=0 field=foo "user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)" | table userID, property1, property2

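It should work. Be sure to point the base search (the "..." part) at the right sourcetype, and change the field= argument of the rex command to the field that contains the data. rex reads _raw by default, so for raw multi-line events, field=_raw (or no field= at all) is the usual choice.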
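One thing to expect with thousands of records: if a single event holds many records, rex max_match=0 returns each field as a multivalue list within one result row, rather than one row per user. In SPL that reshaping is commonly done with the mvzip() eval function followed by mvexpand. The Python below is only a hedged illustration of the zipping logic, not SPL syntax. The dict of parallel lists is a hypothetical stand-in for the extracted multivalue fields.

```python
# Sketch of the "one row per user" step, assuming rex max_match=0 has already
# produced parallel multivalue fields (one list per field, same order).
# In SPL this is typically done with mvzip() and mvexpand; this only shows the idea.


def to_rows(fields):
    """Turn {"userID": [...], "property1": [...], ...} into a list of row dicts."""
    names = list(fields)
    lengths = {len(values) for values in fields.values()}
    if len(lengths) > 1:
        # If the lists differ in length they no longer line up, so refuse to guess.
        raise ValueError(f"multivalue fields have different lengths: {sorted(lengths)}")
    # zip(*lists) walks the lists in parallel: the i-th value of each field forms row i.
    return [dict(zip(names, values)) for values in zip(*fields.values())]


# Hypothetical extraction result for the two sample users.
extracted = {
    "userID": ["101253", "2151614"],
    "property1": ["DTZ", "FCC"],
    "property2": ["Penrith, Cumberland", "Balnd, Temora"],
}

for row in to_rows(extracted):
    print(row)
```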

Motivator

Can you post an example of the data? Does it contain only the values, or is there something unique on each line that the extraction could key in on?


Contributor

Here is sample data for two users. It is basically a set of strings and numbers. The userid is a string, followed by another kind of ID (property1) as a number, then two more properties (property2 and property3), both strings, and so on.

user1
101253
DTZ
Penrith, Cumberland
user2
2151614
FCC
Balnd, Temora
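Because every record has the same four-line layout, the file can also be regrouped by position instead of by regex, for example as a preprocessing step outside Splunk. The sketch below is a minimal example under that assumption. It reads whole lines rather than splitting on commas, because values such as "Penrith, Cumberland" contain a comma and are not quoted in the sample as posted. It also treats a non-numeric property1 as a sign that the four-line stride has drifted. The function and field names are hypothetical.

```python
import io

# Hypothetical field names, in the order described in the thread.
FIELDS = ("userid", "property1", "property2", "property3")


def group_records(lines):
    """Group a flat sequence of lines into records of four lines each.

    Assumes every record has exactly one line per field, in FIELDS order,
    and that property1 is numeric (used as an alignment check).
    """
    values = [line.rstrip("\r\n") for line in lines if line.strip()]  # skip blank lines
    size = len(FIELDS)
    if len(values) % size:
        raise ValueError(f"{len(values)} non-blank lines is not a multiple of {size}")
    records = []
    for start in range(0, len(values), size):
        record = dict(zip(FIELDS, values[start:start + size]))
        if not record["property1"].isdigit():
            raise ValueError(f"record beginning at value {start + 1} looks misaligned: {record}")
        records.append(record)
    return records


# The sample data from the post; io.StringIO stands in for open("your_file.csv").
sample = io.StringIO(
    "user1\n101253\nDTZ\nPenrith, Cumberland\n"
    "user2\n2151614\nFCC\nBalnd, Temora\n"
)

for rec in group_records(sample):
    print(rec)
```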
