Solved: Index selected lines in a multiline event

gbv · ‎03-22-2019

Hi,

We currently index multiline events, and searches take a long time even over a time window as short as today. We need this data to drive a real-time alert, so we need to reduce the time spent searching.
Each multiline event has between 150 and 250 lines, but we only need 10 of them. If we could filter the event and keep only those lines, search performance should improve considerably.
The problem is that only the first line carries the timestamp. If we index each line as a separate event, the lines are not indexed in order, because they are written to the log very close together in time, and we can no longer tell which lines belong to the same event.
Does anybody know a way to index only the lines we want, in order?

One example of an event is:

09:58:12:859 DATA (82373276236368) = {
request: 1111, type: 'x' - [238.11025]->{ [238.12] [238.28] [238.29] } (0)
userType = 6
DataReply (456476567560) = {
request: 221212, type: 'x' - [233.10]->233.44
userType = 6
<--------------------------->
<---------- REPLY ---------->
<--------------------------->
Fixed fields = { key : 0 - no : 995 - typeMessage: 88 'O' - classOrder : 'O'
typeReply : 65 'A' - index : 243376 - nbRequestReply : 1
}
Record (54353453) = {
0 (aa) = "VALUE1"
1 (bb) = "VALUE2"
2 (cc) = "VALUE3"
...
51 (abv) = "VALUE4"
52 (sdf) = "VALUE5"
53 (erf) = "VALUE6"
...
240 (wer) = "VALUE7"
241 (tyr) = "VALUE8"
242 (yhr) = "VALUE9"
}
}
}

In this example, we only want the first line (with the time), the line with the request, and the lines with codes 1, 52, 241 and 242.

Any help would be appreciated.
Thanks in advance.

gbv · ‎03-28-2019

After a lot of testing, we found a solution. Instead of keeping only the lines we want, we rewrite the raw text as the concatenation of the fields extracted from those lines. The configuration is:

TRANSFORMS.CONF
[change_raw]
REGEX = (?:(?:INFO)\s+)(\d{2}:\d{2}:\d{2}:\d{3}).[\n\r]+.request:\s([^,]+).(?:(?:[\n\r]+.?)+(1\s(bb)\s=\s\"\w+\")+)?(?:(?:[\n\r]+.?)+(52\s(sdf)\s=\s\"?\w+\"?)+)?(?:(?:[\n\r]+.?)+(241\s(tyr)\s=\s\"[^\"]+\")+)?(?:(?:[\n\r]+.?)+(242\s(yhr)\s=\s\"\w+\")+)?(?:(?:[\n\r]+.*?)+)?
DEST_KEY = _raw
LOOKAHEAD = 100000
MATCH_LIMIT = 1000000
FORMAT = $1 request: $2 $3 $4 $5 $6
WRITE_META = true

PROPS.CONF
[sourcetype]
BREAK_ONLY_BEFORE = ^\w*\s+\d+:\d+:\d+:\d+\s+\w+\s+(\d+x[^)]+)\s+=\s+{
MAX_TIMESTAMP_LOOKAHEAD = 12
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = true
TIME_FORMAT = %H:%M:%S:%3N
TRUNCATE = 0
MAX_EVENTS = 512
TIME_PREFIX = ^\w*\s+
DATETIME_CONFIG =
disabled = false
pulldown_type = true
TRANSFORMS-filtro = change_raw
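
To see what this kind of transform is meant to produce, here is a minimal Python sketch (not Splunk configuration) that applies a simplified version of the same idea to a shortened copy of the sample event from the question. The pattern is an assumption written for this illustration: it escapes the literal parentheses, anchors on the bare timestamp shown in the sample rather than on a log-level prefix, and accepts any quoted value. The names `wanted` and `rewrite_raw` are hypothetical.

```python
import re

# Shortened copy of the sample event from the question.
SAMPLE_EVENT = '''09:58:12:859 DATA (82373276236368) = {
request: 1111, type: 'x' - [238.11025]->{ [238.12] [238.28] [238.29] } (0)
userType = 6
Record (54353453) = {
0 (aa) = "VALUE1"
1 (bb) = "VALUE2"
51 (abv) = "VALUE4"
52 (sdf) = "VALUE5"
241 (tyr) = "VALUE8"
242 (yhr) = "VALUE9"
}
}'''


def wanted(code, name):
    """Optional group: skip ahead lazily to a line starting with `code (name)`."""
    return r'(?:.*?^(' + code + r'\s\(' + name + r'\)\s=\s"[^"]*"))?'


PATTERN = re.compile(
    r'^(\d{2}:\d{2}:\d{2}:\d{3})'   # group 1: timestamp on the first line
    r'.*?^request:\s([^,]+)'        # group 2: first request id
    + wanted('1', 'bb')             # groups 3-6: the four wanted record lines
    + wanted('52', 'sdf')
    + wanted('241', 'tyr')
    + wanted('242', 'yhr'),
    re.DOTALL | re.MULTILINE,
)


def rewrite_raw(event):
    """Return the condensed text, mimicking FORMAT = $1 request: $2 $3 $4 $5 $6."""
    match = PATTERN.search(event)
    if match is None:
        return event  # leave events that do not match untouched
    timestamp, request, *codes = match.groups()
    # Lines missing from the event leave their group unset and are skipped.
    return ' '.join([timestamp, 'request: ' + request] + [c for c in codes if c])


print(rewrite_raw(SAMPLE_EVENT))
```

Running a pattern against a handful of real events in this way, before deploying it to the indexers, is a cheap way to check that each capture group picks up the line it is supposed to.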

richgalloway · ‎03-28-2019

@gbv If your problem is resolved, please accept an answer to help future readers.

woodcock · ‎03-25-2019

I am completely unclear what is to be kept and what is to be stripped. Instead of the ellipses, put the real text back in. Then clearly mark which lines are to stay and which ones are to be removed. The real problem here is that you are doing realtime searches. See here: https://answers.splunk.com/answers/734767/why-are-realtime-searches-disliked-in-the-splunk-w.html

nickhills · ‎03-25-2019

First, I would get your event breaking working nicely. Can you share your props?
That said, you probably want to try something like this:

LINE_BREAKER=([\r\n]+)\d{2}:\d{2}:\d{2}:\d{3}\sDATA
TIME_FORMAT=%H:%M:%S:%3N

This should break your events nicely, albeit with all the 250 values.
This may improve searching on its own, but let's tackle that separately.
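
The effect of breaking on the timestamp line can be checked offline. The following is a minimal Python sketch, not Splunk itself: it splits raw text wherever a line starts with `HH:MM:SS:mmm DATA` and parses the leading timestamp. The function names `break_events` and `event_time` and the two-event `LOG` sample are assumptions made for this illustration, and Python's `%f` is used only as a rough stand-in for a millisecond time format. Splitting on a zero-width match needs Python 3.7 or later.

```python
import re
from datetime import datetime

# Zero-width match at the start of every line that opens a new event.
EVENT_START = re.compile(r'^(?=\d{2}:\d{2}:\d{2}:\d{3}\sDATA)', re.MULTILINE)


def break_events(text):
    """Split raw log text into one string per event."""
    chunks = EVENT_START.split(text)
    return [chunk.rstrip('\r\n') for chunk in chunks if chunk.strip()]


def event_time(event):
    """Parse the 12-character HH:MM:SS:mmm prefix of an event."""
    return datetime.strptime(event[:12], '%H:%M:%S:%f').time()


# Hypothetical two-event log, shaped like the sample in the question.
LOG = (
    "09:58:12:859 DATA (1) = {\n"
    "request: 1111, type: 'x'\n"
    "}\n"
    "09:58:12:861 DATA (2) = {\n"
    "request: 2222, type: 'x'\n"
    "}\n"
)

events = break_events(LOG)
for event in events:
    print(event_time(event), len(event.splitlines()), 'lines')
```

Each event keeps all of its lines after its own timestamp line, so the ordering problem described in the question does not arise at this stage.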

ehowardl3 · ‎03-23-2019

If you don’t need the full events in Splunk, you should write a script to parse out the lines you need before ingesting into Splunk. This will speed up your search time and greatly reduce your licensing costs.
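
As a rough idea of what such a pre-processing script could look like, here is a minimal Python sketch. It assumes the layout of the sample event in the question: a header line that starts with `HH:MM:SS:mmm`, lines that start with `request:`, and record lines that start with a numeric code followed by a name in parentheses. The function name `filter_lines` and the choice to keep every `request:` line are assumptions for this illustration.

```python
import re
from typing import Iterable, Iterator

# Header line of an event: starts with HH:MM:SS:mmm followed by whitespace.
HEADER = re.compile(r'^\d{2}:\d{2}:\d{2}:\d{3}\s')
# Lines to keep inside an event: request lines and codes 1, 52, 241, 242.
KEEP = re.compile(r'^(?:request:|(?:1|52|241|242)\s\()')


def filter_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the header lines and the wanted lines, in their original order."""
    for line in lines:
        stripped = line.strip()
        if HEADER.match(stripped) or KEEP.match(stripped):
            yield stripped


sample = [
    '09:58:12:859 DATA (82373276236368) = {',
    "request: 1111, type: 'x' - [238.11025]->{ [238.12] [238.28] [238.29] } (0)",
    'userType = 6',
    '0 (aa) = "VALUE1"',
    '1 (bb) = "VALUE2"',
    '51 (abv) = "VALUE4"',
    '52 (sdf) = "VALUE5"',
    '241 (tyr) = "VALUE8"',
    '242 (yhr) = "VALUE9"',
    '}',
]

for kept in filter_lines(sample):
    print(kept)
```

Because lines are emitted in the order they are read, every kept line still follows the header of the event it belongs to, so the reduced output can be broken into events on the timestamp line as before.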

gbv · ‎03-25-2019

Thanks for your answer. The problem is that we need real-time results, and with a script we would lose that.
Also, the source is not ours, so we can only change the configuration on the indexers.
Any suggestions?
