Index selected lines in a multiline event

gbv
Explorer

Hi,

We currently index multiline events, and searches are slow even over a time window as short as today. We need this data to drive a real-time alert, so we need to reduce the time spent searching.
Each multiline event has between 150 and 250 lines, but we only need about 10 of them. If we could filter the event and keep only those lines, we would expect a large improvement.
The problem is that only the first line carries the timestamp. If we index each line as a separate event, the lines are not indexed in order, because they are written to the log very close together in time, and we can no longer tell which lines belong together.
Does anybody know a way to index only the lines we want, in order?

One example of an event is:

09:58:12:859 DATA (82373276236368) = {
request: 1111, type: 'x' - [238.11025]->{ [238.12] [238.28] [238.29] } (0)
userType = 6
DataReply (456476567560) = {
request: 221212, type: 'x' - [233.10]->233.44
userType = 6
<--------------------------->
<---------- REPLY ---------->
<--------------------------->
Fixed fields = { key : 0 - no : 995 - typeMessage: 88 'O' - classOrder : 'O'
typeReply : 65 'A' - index : 243376 - nbRequestReply : 1
}
Record (54353453) = {
0 (aa) = "VALUE1"
1 (bb) = "VALUE2"
2 (cc) = "VALUE3"
...
51 (abv) = "VALUE4"
52 (sdf) = "VALUE5"
53 (erf) = "VALUE6"
...
240 (wer) = "VALUE7"
241 (tyr) = "VALUE8"
242 (yhr) = "VALUE9"
}
}
}

In this example, we only want the first line (with the time), the line with the request, and the record lines with codes 1, 52, 241 and 242.

We would appreciate any help.
Thanks in advance

1 Solution

gbv
Explorer

After a lot of testing, we found a solution. Instead of keeping only the lines we want, we replace the raw text with a concatenation of the fields extracted from those lines. The configuration is:

TRANSFORMS.CONF
[change_raw]
REGEX = (?:(?:INFO)\s+)(\d{2}:\d{2}:\d{2}:\d{3}).[\n\r]+.request:\s([^,]+).(?:(?:[\n\r]+.?)+(1\s(bb)\s=\s\"\w+\")+)?(?:(?:[\n\r]+.?)+(52\s(sdf)\s=\s\"?\w+\"?)+)?(?:(?:[\n\r]+.?)+(241\s(tyr)\s=\s\"[^\"]+\")+)?(?:(?:[\n\r]+.?)+(242\s(yhr)\s=\s\"\w+\")+)?(?:(?:[\n\r]+.*?)+)?
DEST_KEY = _raw
LOOKAHEAD = 100000
MATCH_LIMIT = 1000000
FORMAT = $1 request: $2 $3 $4 $5 $6
WRITE_META = true

PROPS.CONF
[sourcetype]
BREAK_ONLY_BEFORE = ^\w*\s+\d+:\d+:\d+:\d+\s+\w+\s+(\d+x[^)]+)\s+=\s+{
MAX_TIMESTAMP_LOOKAHEAD = 12
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = true
TIME_FORMAT = %H:%M:%S:%3N
TRUNCATE = 0
MAX_EVENTS = 512
TIME_PREFIX = ^\w*\s+
DATETIME_CONFIG =
disabled = false
pulldown_type = true
TRANSFORMS-filtro = change_raw
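
To try the capture-and-concatenate idea outside Splunk, a minimal Python sketch could look like the one below. It is only an illustration: the pattern is a simplified one written for Python's `re` module, not the REGEX from the post, the names `EVENT_RE` and `rewrite_raw` are hypothetical, and the sample is an abridged copy of the event from the question. Splunk applies its own regex engine at index time, so behavior there may differ.

```python
import re

# Simplified pattern for Python's re module. It is NOT the REGEX from the post,
# only an illustration of capturing a few fields and concatenating them.
EVENT_RE = re.compile(
    r'(\d{2}:\d{2}:\d{2}:\d{3})'                    # $1: time on the first line
    r'.*?request:\s([^,]+)'                         # $2: first request number
    r'(?:.*?^[ \t]*(1\s\(bb\)\s=\s"[^"]*"))?'       # $3: record 1 (optional)
    r'(?:.*?^[ \t]*(52\s\(sdf\)\s=\s"[^"]*"))?'     # $4: record 52 (optional)
    r'(?:.*?^[ \t]*(241\s\(tyr\)\s=\s"[^"]*"))?'    # $5: record 241 (optional)
    r'(?:.*?^[ \t]*(242\s\(yhr\)\s=\s"[^"]*"))?',   # $6: record 242 (optional)
    re.DOTALL | re.MULTILINE,
)

# Abridged copy of the sample event from the question.
SAMPLE = """09:58:12:859 DATA (82373276236368) = {
request: 1111, type: 'x' - [238.11025]->{ [238.12] [238.28] [238.29] } (0)
userType = 6
Record (54353453) = {
0 (aa) = "VALUE1"
1 (bb) = "VALUE2"
2 (cc) = "VALUE3"
52 (sdf) = "VALUE5"
241 (tyr) = "VALUE8"
242 (yhr) = "VALUE9"
}
}"""


def rewrite_raw(event):
    """Build a one-line event in the spirit of FORMAT = $1 request: $2 $3 $4 $5 $6."""
    m = EVENT_RE.search(event)
    if m is None:
        return event  # leave the event untouched when the pattern does not match
    ts, request, *records = m.groups()
    # Unlike a FORMAT string, missing records are dropped rather than left empty.
    return " ".join([ts, "request:", request] + [r for r in records if r])


if __name__ == "__main__":
    print(rewrite_raw(SAMPLE))
```

Running a candidate pattern against a few saved events this way is a cheap check that every wanted field is captured before the transform is deployed.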

richgalloway
SplunkTrust

@gbv If your problem is resolved, please accept an answer to help future readers.

---
If this reply helps you, Karma would be appreciated.

woodcock
Esteemed Legend

I am completely unclear on what is to be kept and what is to be stripped. Replace the ellipses with the real text, then clearly mark which lines are to stay and which are to be removed. The real problem here is that you are doing real-time searches. See here: https://answers.splunk.com/answers/734767/why-are-realtime-searches-disliked-in-the-splunk-w.html

nickhills
Ultra Champion

First, I would get your event breaking working nicely. Can you share your props?
In the meantime, you probably want to try something like this:

LINE_BREAKER=([\r\n]+)\d{2}:\d{2}:\d{2}:\d{3}\sDATA
TIME_FORMAT=%H:%M:%S:%3N

This should break your events nicely, albeit with all 250 values.
This may improve searching on its own, but let's tackle that separately.
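
As a rough way to check the breaking rule before touching props, the Python sketch below splits a log stream in front of every line that starts with a timestamp followed by DATA, then parses the millisecond timestamp. It only mimics the idea and is not how Splunk implements LINE_BREAKER. The function names are hypothetical, and the second event in `STREAM` is invented for the example.

```python
import re
from datetime import datetime

# Zero-width split point in front of every line that starts a new event.
# Splitting on empty matches requires Python 3.7 or later.
EVENT_START = re.compile(r"(?m)^(?=\d{2}:\d{2}:\d{2}:\d{3}\sDATA)")

# Two shortened events; the second one is made up for illustration.
STREAM = (
    "09:58:12:859 DATA (82373276236368) = {\n"
    "request: 1111, type: 'x'\n"
    "}\n"
    "09:58:12:861 DATA (82373276236369) = {\n"
    "request: 1112, type: 'x'\n"
    "}\n"
)


def break_events(stream):
    """Split the stream into events, dropping the empty leading chunk."""
    return [chunk for chunk in EVENT_START.split(stream) if chunk.strip()]


def event_time(event):
    """Parse the leading HH:MM:SS:mmm stamp of an event."""
    # %f accepts 1 to 6 digits, so "859" is read as 859 milliseconds.
    return datetime.strptime(event[:12], "%H:%M:%S:%f").time()


if __name__ == "__main__":
    for ev in break_events(STREAM):
        print(event_time(ev), len(ev.splitlines()), "lines")
```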

If my comment helps, please give it a thumbs up!

ehowardl3
Path Finder

If you don’t need the full events in Splunk, you should write a script to parse out the lines you need before ingesting into Splunk. This will speed up your search time and greatly reduce your licensing costs.
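
A pre-ingest filter of this kind could be sketched in Python as follows. This is a minimal illustration under assumptions, not a tested production script: the record codes come from the question, keeping only the first `request:` line of each event is a guess at what is wanted, the names `filter_event`, `HEADER_RE` and `RECORD_RE` are hypothetical, and the sample is an abridged copy of the event from the question.

```python
import re

# Record codes to keep, taken from the question (1, 52, 241 and 242).
WANTED_CODES = {"1", "52", "241", "242"}

# First line of an event: a timestamp followed by DATA, as in the sample.
HEADER_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:\d{3}\s+DATA\b")
# Record line such as: 52 (sdf) = "VALUE5"
RECORD_RE = re.compile(r"^\s*(\d+)\s+\(\w+\)\s+=")

# Abridged copy of the sample event from the question.
SAMPLE = """09:58:12:859 DATA (82373276236368) = {
request: 1111, type: 'x' - [238.11025]->{ [238.12] [238.28] [238.29] } (0)
userType = 6
DataReply (456476567560) = {
request: 221212, type: 'x' - [233.10]->233.44
userType = 6
Record (54353453) = {
0 (aa) = "VALUE1"
1 (bb) = "VALUE2"
2 (cc) = "VALUE3"
52 (sdf) = "VALUE5"
241 (tyr) = "VALUE8"
242 (yhr) = "VALUE9"
}
}
}"""


def filter_event(lines):
    """Keep the header, the first request line and the wanted record lines."""
    kept = []
    seen_request = False
    for line in lines:
        if HEADER_RE.match(line):
            kept.append(line)
            seen_request = False  # a new event starts here
        elif line.lstrip().startswith("request:") and not seen_request:
            kept.append(line)
            seen_request = True
        else:
            m = RECORD_RE.match(line)
            if m and m.group(1) in WANTED_CODES:
                kept.append(line)
    return kept


if __name__ == "__main__":
    print("\n".join(filter_event(SAMPLE.splitlines())))
```

Because the lines of one event stay together and in their original order, the filtered output can still be ingested as a single, much smaller multiline event.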

gbv
Explorer

Thanks for your answer. The problem is that we need real-time results, and if we use a script, we lose that.
Also, the source is not ours, so we can only change the configuration on the indexers.
Any suggestions?
