Splunk Search

How can I dynamically split my sample data using regex or any other available option?

Shan
Builder

I have data in a log file as shown below. Can I split it using regex, or are any other options available?

0010213002040538

I want to split the data above like this:

001 02 13 
002 04 0538 

For example, we can take:

001 02 13 

001 is a transaction code
02 is the length of the next value
13 is the value

Based on the length, I need to split the value dynamically.

So, how can I write a rex search that splits it dynamically? If "02" appears as the length, I need to use that length to split off the next value, "13".
If the length is "04", I need to split based on that length to get "0538".

Thanks in advance
Kindly help me.


alacercogitatus
SplunkTrust

This cannot currently be done with rex alone. A regular expression can't match a value whose length is given by an earlier capture, and using .* captures way too much data to be useful. The only fix here is to edit the source of the data (or preprocess it with a script) so that it is split correctly.

Here is a sample bash script that will separate out the portions you need.

#!/bin/bash
data="001021300204053800309123d5-78900404data00503get"
myIndex=0
while [ "$myIndex" -lt "${#data}" ]
do
  # 3-character transaction code
  txnid=${data:$myIndex:3}
  myIndex=$((myIndex + 3))
  # 2-digit length; 10# forces base 10 so "08" and "09" are not read as octal
  txnlen=$((10#${data:$myIndex:2}))
  myIndex=$((myIndex + 2))
  # value of exactly txnlen characters
  txnstr=${data:$myIndex:$txnlen}
  myIndex=$((myIndex + txnlen))
  echo "txnid=$txnid txnlen=$txnlen txnstr=\"$txnstr\""
done

This can be set up as a scripted input (passing in the correct values for data from the command line), or you can run it on the logs on the server, write the output to a new location, and point the forwarder at the new logs with proper parsing. The output is then indexed and can be searched like this:

<your_scripted_input> | table txnid txnlen txnstr
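
The second option above (running the script over existing logs and writing the output somewhere new) could look roughly like the sketch below. It assumes one concatenated string per log line, laid out as in the question: a 3-character code, a 2-digit decimal length, then the value. The function name split_records and the file paths in the usage line are made up for illustration, and the error handling is only one possible choice.

```shell
#!/bin/bash
# Sketch only: split each line of the given log files into
# (code, length, value) records, one key=value line per record.
# Assumed layout: 3-char code, 2-digit decimal length, then the value.

split_records() {
  local data=$1 pos=0 code len value
  while [ "$pos" -lt "${#data}" ]; do
    code=${data:pos:3}
    len=${data:pos+3:2}
    # Stop on a malformed or missing length instead of looping forever.
    if ! [[ $len =~ ^[0-9]{2}$ ]]; then
      echo "malformed record at offset $pos" >&2
      return 1
    fi
    len=$((10#$len))   # force base 10 so "08" and "09" are not read as octal
    value=${data:pos+5:len}
    # A value shorter than its declared length means the line was cut off.
    if [ "${#value}" -lt "$len" ]; then
      echo "truncated value at offset $pos" >&2
      return 1
    fi
    echo "txnid=$code txnlen=$len txnstr=\"$value\""
    pos=$((pos + 5 + len))
  done
}

# Process every file named on the command line, line by line.
for logfile in "$@"; do
  while IFS= read -r line; do
    split_records "$line"
  done < "$logfile"
done
```

Saved under a hypothetical name such as split_records.sh, it could be run as ./split_records.sh /path/to/raw.log > /path/to/parsed.log, with the forwarder monitoring the parsed file.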

woodcock
Esteemed Legend

Not with a single rex but with this chain of commands:

 ... | rex "(?<TransactionCode>.{3})(?<FieldValueLen>.{2})(?<FieldValue>.*)" | eval FieldValue=substr(FieldValue,1,FieldValueLen)

Shan
Builder

Woodcock,

First of all, thank you very much for your valuable reply.
When I use the above rex search, it splits the first value and stops there. How can I use the same rex to separate multiple values?

Sample data:

001021300204053800309123d5-78900404data00503get

Current Search:

sourcetype=testrex | table * | rex field=_raw "(?<TransactionCode>.{3})(?<FieldValueLen>.{2})(?<FieldValue>.*)" | eval FieldValue=substr(FieldValue,1,FieldValueLen) | table TransactionCode FieldValueLen FieldValue

Desired Result:

001 02 13
002 04 0538
003 09 123d5-789
004 04 data
005 03 get

Current Result:

TransactionCode FieldValueLen FieldValue
001 02 13
001 02 13

Regards,
Shankar


woodcock
Esteemed Legend

Hopefully you have a limited chain; otherwise an iterative approach like mine won't work. Let's assume you can have at most 4 in a chain; this should work:

... | rex "(?<TransactionCode>.{3})(?<FieldValueLen>.{2})(?<TempFieldValue>.*)"
| eval FieldValue=substr(TempFieldValue,1,FieldValueLen)
| eval TempFieldValue=substr(TempFieldValue,1+FieldValueLen)
| eval subevent=TransactionCode . ":::" . FieldValueLen . ":::" . FieldValue
| rex field=TempFieldValue "(?<TempTransactionCode>.{3})(?<TempFieldValueLen>.{2})(?<TempFieldValue>.*)"
| eval TransactionCode=mvappend(TransactionCode, TempTransactionCode)
| eval FieldValueLen=mvappend(FieldValueLen, TempFieldValueLen)
| eval FieldValue=mvappend(FieldValue, substr(TempFieldValue,1,TempFieldValueLen))
| eval subevent=mvappend(subevent, TempTransactionCode . ":::" . TempFieldValueLen . ":::" . substr(TempFieldValue,1,TempFieldValueLen))
| eval TempFieldValue=substr(TempFieldValue,1+TempFieldValueLen)
| rex field=TempFieldValue "(?<TempTransactionCode>.{3})(?<TempFieldValueLen>.{2})(?<TempFieldValue>.*)"
| eval TransactionCode=mvappend(TransactionCode, TempTransactionCode)
| eval FieldValueLen=mvappend(FieldValueLen, TempFieldValueLen)
| eval FieldValue=mvappend(FieldValue, substr(TempFieldValue,1,TempFieldValueLen))
| eval subevent=mvappend(subevent, TempTransactionCode . ":::" . TempFieldValueLen . ":::" . substr(TempFieldValue,1,TempFieldValueLen))
| eval TempFieldValue=substr(TempFieldValue,1+TempFieldValueLen)
| rex field=TempFieldValue "(?<TempTransactionCode>.{3})(?<TempFieldValueLen>.{2})(?<TempFieldValue>.*)"
| eval TransactionCode=mvappend(TransactionCode, TempTransactionCode)
| eval FieldValueLen=mvappend(FieldValueLen, TempFieldValueLen)
| eval FieldValue=mvappend(FieldValue, substr(TempFieldValue,1,TempFieldValueLen))
| eval subevent=mvappend(subevent, TempTransactionCode . ":::" . TempFieldValueLen . ":::" . substr(TempFieldValue,1,TempFieldValueLen))

Each event now has several new multivalued fields. If you need to break each subevent out into a separate event, add this:

| mvexpand subevent | rex field=subevent "(?<TransactionCode>.*?):::(?<FieldValueLen>.*?):::(?<FieldValue>.*)"  | table TransactionCode FieldValueLen FieldValue

Shan
Builder

Hi Woodcock,

Thank you very much.
I will try it with another sample file.


woodcock
Esteemed Legend

Don't forget to "Accept" the answer to close this question.
