Solved: Extract fields from single event(s) consisting of ...

ashabc · ‎09-11-2015

I have a CSV file that has only one column and no header. The data set contains values for userid, property1, property2, property3, then again userid, property1, property2, property3, and so on. How can I extract the fields userid, property1, property2 and property3?

I tried something like the following (e.g. for userid), but it does not work:

.....| rex field=_raw "(?<userid>^(.*)\n)"

MuS · ‎09-12-2015

Hi ashabc,

Take a look at this answer http://answers.splunk.com/answers/305727/why-is-my-rex-statement-unable-to-extract-the-fiel.html#ans... to learn about pcregextest and how you can test a regex in Splunk.

cheers, MuS

MuS · ‎09-13-2015

Based on the examples you just provided, you can try this:

| gentimes start=-1 | eval foo="user1
101253
DTZ
Penrith, Cumberland
user2
2151614
FCC
Balnd, Temora" | rex max_match=0 field=foo "user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)" | table userID, property1, property2

Or use the built-in pcregextest like this:

$SPLUNK_HOME/bin/splunk cmd pcregextest mregex="user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)" test_str="user1
>     101253
>     DTZ
>     Penrith, Cumberland
>     user2
>     2151614
>     FCC
>     Balnd, Temora"
Original Pattern: 'user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)'
Expanded Pattern: 'user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)'
Regex compiled successfully. Capture group count = 3. Named capturing groups = 3.
SUCCESS - match against: 'user1
    101253
    DTZ
    Penrith, Cumberland
    user2
    2151614
    FCC
    Balnd, Temora'

#### Capturing group data ##### 
Group |            Name | Value
--------------------------------------
    1 |          userID |     101253
    2 |       property1 |     DTZ
    3 |       property2 |     Penrith, Cumberland
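
For anyone who wants to check the idea outside Splunk, here is a minimal Python sketch. It is an illustration only, not part of the original answer. Assumptions: Python's re module needs (?P<name>...) instead of (?<name>...), re.finditer stands in for rex max_match=0, and the pattern is an adapted variant that captures the "userN" line itself as userid plus three property lines (the pattern above labels the line after "userN" as userID). The [\r\n]+ separators are also an assumption, added to tolerate Windows line endings.

```python
import re

# Sample data from the thread: four lines per user, no header.
sample = (
    "user1\n101253\nDTZ\nPenrith, Cumberland\n"
    "user2\n2151614\nFCC\nBalnd, Temora"
)

# Adapted pattern (an assumption, not the thread's exact regex):
# one named group per line of a four-line record.
pattern = re.compile(
    r"(?P<userid>user\d+)[\r\n]+"
    r"(?P<property1>[^\r\n]*)[\r\n]+"
    r"(?P<property2>[^\r\n]*)[\r\n]+"
    r"(?P<property3>[^\r\n]*)"
)

# finditer yields every match in the string, similar in spirit
# to rex with max_match=0.
records = [m.groupdict() for m in pattern.finditer(sample)]

for rec in records:
    print(rec)
```

Each match becomes one dictionary, so the same pattern handles two users or thousands without changes.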

ashabc · ‎09-16-2015

It kind of works.

What I still don't get is why you used eval foo="data_string". That is OK for 2 sets of sample data, but when I have thousands of records in the CSV file, how can I tackle that?

Runals · ‎09-16-2015

He was using gentimes and the eval as a way to test the methodology. If you do the search as

... | rex max_match=0 field=foo "user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)" | table userID, property1, property2

it should work. Just be sure to change the "field=" part of the rex command to the field that contains the data for your sourcetype.

Runals · ‎09-11-2015

Can you post an example of the data? Does the data contain just the values, or is there something unique to each line that could be useful to key in on for the extraction process?

ashabc · ‎09-13-2015

Here is sample data for 2 users. It basically contains a set of strings and numbers: the userid is a string, followed by another form of ID (property1) as a number, then 2 other properties, both strings, and so on for each user.

user1
101253
DTZ
Penrith, Cumberland
user2
2151614
FCC
Balnd, Temora
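
Since every record in this sample is exactly four lines, the layout can also be read positionally. The following Python sketch shows that reading outside Splunk. It assumes no record has a missing or blank line; the helper name group_records and the FIELDS tuple are hypothetical and used only for illustration.

```python
from typing import Dict, Iterable, List

# Field order as described in the post (names taken from the question).
FIELDS = ("userid", "property1", "property2", "property3")


def group_records(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Group a flat sequence of lines into four-field records."""
    # Drop empty lines and surrounding whitespace (including stray \r).
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    if len(cleaned) % len(FIELDS) != 0:
        raise ValueError("line count is not a multiple of 4")
    # Take consecutive slices of four lines and label them by position.
    return [
        dict(zip(FIELDS, cleaned[i:i + len(FIELDS)]))
        for i in range(0, len(cleaned), len(FIELDS))
    ]


sample_lines = [
    "user1", "101253", "DTZ", "Penrith, Cumberland",
    "user2", "2151614", "FCC", "Balnd, Temora",
]
rows = group_records(sample_lines)
for row in rows:
    print(row)
```

If a record can be incomplete, this positional approach breaks down, and keying on the "userN" line is the safer choice.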
