Hi,
We currently index multiline events, and when we search, for example, over a time window covering today, Splunk takes a long time. We need this information to create a real-time alert, so we need to reduce the time spent searching.
Each multiline event has between 150 and 250 lines, but we only need 10 of them; if we could filter and keep only those lines, we would get a big improvement.
The problem is that only the first line contains the time. If we index the events split line by line, the lines are indexed out of order, because they are written to the log very close together in time, and we cannot recover the relationship between them.
Does anybody know of a way to index only the lines we want, in order?
One example of an event is:
09:58:12:859 DATA (82373276236368) = {
request: 1111, type: 'x' - [238.11025]->{ [238.12] [238.28] [238.29] } (0)
userType = 6
DataReply (456476567560) = {
request: 221212, type: 'x' - [233.10]->233.44
userType = 6
<--------------------------->
<---------- REPLY ---------->
<--------------------------->
Fixed fields = { key : 0 - no : 995 - typeMessage: 88 'O' - classOrder : 'O'
typeReply : 65 'A' - index : 243376 - nbRequestReply : 1
}
Record (54353453) = {
0 (aa) = "VALUE1"
1 (bb) = "VALUE2"
2 (cc) = "VALUE3"
...
51 (abv) = "VALUE4"
52 (sdf) = "VALUE5"
53 (erf) = "VALUE6"
...
240 (wer) = "VALUE7"
241 (tyr) = "VALUE8"
242 (yhr) = "VALUE9"
}
}
}
In this example, we only want the first line with the time, the line with the request, and the lines with codes 1, 52, 241 and 242.
Any help is appreciated.
Thanks in advance
After a lot of testing, we found a solution. Instead of keeping only the lines we want, we replace the raw text with a concatenation of the fields extracted from those lines. The configuration is:
transforms.conf
[change_raw]
REGEX = (?:(?:INFO)\s+)(\d{2}:\d{2}:\d{2}:\d{3}).[\n\r]+.request:\s([^,]+).(?:(?:[\n\r]+.?)+(1\s(bb)\s=\s\"\w+\")+)?(?:(?:[\n\r]+.?)+(52\s(sdf)\s=\s\"?\w+\"?)+)?(?:(?:[\n\r]+.?)+(241\s(tyr)\s=\s\"[^\"]+\")+)?(?:(?:[\n\r]+.?)+(242\s(yhr)\s=\s\"\w+\")+)?(?:(?:[\n\r]+.*?)+)?
DEST_KEY = _raw
LOOKAHEAD = 100000
MATCH_LIMIT = 1000000
FORMAT = $1 request: $2 $3 $4 $5 $6
WRITE_META = true
props.conf
[sourcetype]
BREAK_ONLY_BEFORE = ^\w*\s+\d+:\d+:\d+:\d+\s+\w+\s+(\d+x[^)]+)\s+=\s+{
MAX_TIMESTAMP_LOOKAHEAD = 12
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = true
TIME_FORMAT = %H:%M:%S:%3N
TRUNCATE = 0
MAX_EVENTS = 512
TIME_PREFIX = ^\w*\s+
DATETIME_CONFIG =
disabled = false
pulldown_type = true
TRANSFORMS-filtro = change_raw
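Outside Splunk, the effect of this transform can be sketched in Python, with a simplified regex and an abridged sample event (not the exact expression above), to check that the rebuilt _raw keeps only the timestamp, the request and the wanted numbered fields:

```python
import re

# Abridged sample event from the question, and a simplified stand-in for
# the change_raw transform: extract the timestamp, the request id and the
# numbered record fields 1, 52, 241 and 242, then rebuild _raw from them.
EVENT = """09:58:12:859 DATA (82373276236368) = {
request: 1111, type: 'x'
userType = 6
Record (54353453) = {
0 (aa) = "VALUE1"
1 (bb) = "VALUE2"
52 (sdf) = "VALUE5"
241 (tyr) = "VALUE8"
242 (yhr) = "VALUE9"
}
}"""

def rewrite_raw(event: str) -> str:
    ts = re.match(r"(\d{2}:\d{2}:\d{2}:\d{3})", event)
    req = re.search(r"request:\s*([^,\n]+)", event)
    parts = [ts.group(1), "request: " + req.group(1)]
    # Keep only the record numbers we care about.
    for m in re.finditer(r'^(1|52|241|242)\s+\(\w+\)\s+=\s+("[^"]*")', event, re.M):
        parts.append(f"{m.group(1)} {m.group(2)}")
    return " ".join(parts)

print(rewrite_raw(EVENT))
# -> 09:58:12:859 request: 1111 1 "VALUE2" 52 "VALUE5" 241 "VALUE8" 242 "VALUE9"
```

The real REGEX above has to be more defensive (optional groups, variable line content), but the rewritten event is the same kind of one-line concatenation.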
@gbv If your problem is resolved, please accept an answer to help future readers.
It is completely unclear to me what is to be kept and what is to be stripped. Instead of the ellipses, put the real text back in, then clearly mark which lines should stay and which should be removed. The real problem here is that you are doing real-time searches. See here: https://answers.splunk.com/answers/734767/why-are-realtime-searches-disliked-in-the-splunk-w.html
First, I would get your event breaking working nicely - can you share your props?
You probably want to try something like this:
LINE_BREAKER=(^)\d{2}:\d{2}:\d{2}:\d{3}\sDATA
TIME_FORMAT=%H:%M:%S:%3N
This should break your events nicely, albeit still with all 250 values.
That alone may improve searching, but let's tackle that separately.
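The splitting behaviour of that LINE_BREAKER can be checked outside Splunk with a small Python sketch. Splunk discards the text captured in group 1 as the event delimiter; here the group is empty, so each event simply starts at the timestamp:

```python
import re

# Two abridged raw events concatenated as they would arrive in the log stream.
stream = (
    "09:58:12:859 DATA (1) = {\nrequest: 1111\n}\n"
    "09:58:13:101 DATA (2) = {\nrequest: 2222\n}\n"
)

# Emulate LINE_BREAKER=(^)\d{2}:\d{2}:\d{2}:\d{3}\sDATA with a zero-width
# split just before each "HH:MM:SS:mmm DATA" at the start of a line.
events = [e for e in
          re.split(r"(?m)^(?=\d{2}:\d{2}:\d{2}:\d{3}\sDATA)", stream)
          if e]

print(len(events))     # -> 2
print(events[1][:12])  # -> 09:58:13:101
```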
If you don’t need the full events in Splunk, you should write a script to parse out the lines you need before ingesting into Splunk. This will speed up your search time and greatly reduce your licensing costs.
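A minimal sketch of such a pre-filter, assuming the layout from the question (the record numbers and the `request:` prefix come from the example event; a real deployment would read the log continuously and write the kept lines to a file monitored by a forwarder):

```python
import re

WANTED = {"1", "52", "241", "242"}             # record numbers to keep
TS = re.compile(r"\d{2}:\d{2}:\d{2}:\d{3}\s")  # timestamp on the first line
FIELD = re.compile(r"(\d+)\s+\(\w+\)\s+=")     # "NN (name) = ..." record lines

def filter_event(lines):
    """Keep the timestamp line, the request line and the wanted record fields."""
    kept = []
    for line in lines:
        stripped = line.strip()
        m = FIELD.match(stripped)
        if (TS.match(stripped)
                or stripped.startswith("request:")
                or (m and m.group(1) in WANTED)):
            kept.append(line)
    return kept

# Abridged event from the question: four of the six lines survive.
event = [
    '09:58:12:859 DATA (82373276236368) = {',
    "request: 1111, type: 'x'",
    'userType = 6',
    '1 (bb) = "VALUE2"',
    '2 (cc) = "VALUE3"',
    '52 (sdf) = "VALUE5"',
]
print(filter_event(event))
```

Because the whole event is reduced before indexing, both search time and licensed ingest volume shrink by the same ratio.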
Thanks for your answer. The problem is that we need real-time data, and if we use a script, we lose that.
On the other hand, the source is not ours, so we can only change the configuration on the indexers.
Any suggestions?