break up events based on pid?

builder
Path Finder

I assume there is no way to do what I want, but I figured I'd ask anyway. I have a background job processor that logs data about the jobs, as follows.

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18959)] KtEventReport 82245248 completed after 0.1432
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18937)] KtEventReport 82245253 completed after 0.1443
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18959)] ,host:ip-10-126-214-147 pid:18959] 1 jobs processed at 5.9643 j/s, 0 failed ...
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18937)] ,host:ip-10-126-214-147 pid:18937] 2 jobs processed at 6.2623 j/s, 0 failed ...
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18944)] KtEventReport 82245256 completed after 0.1428
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18925)] KtUserReport 82245257 completed after 0.1455
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18944)] ,host:ip-10-126-214-147 pid:18944] 1 jobs processed at 6.5080 j/s, 0 failed ...
[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245275 completed after 0.1444
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18925)] ,host:ip-10-126-214-147 pid:18925] 1 jobs processed at 5.9259 j/s, 0 failed ...
[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245282 completed after 0.1435
[2011-06-13 12:10:37,INFO,ip-10-126-214-147(18952)] ,host:ip-10-126-214-147 pid:18952] 2 jobs processed at 6.4665 j/s, 0 failed ..

Note that the number in parentheses at the end of the bracketed header (right after the hostname) is the process ID of the process handling the job. I would like to break up my events into one batch per process ID. However, the batches for different process IDs are interleaved in the logs. If I understand correctly, the logs can only be broken into events serially, so what I want probably isn't possible. If I'm wrong about that and there is a way to do this, please let me know! To be clear, the log above would be broken into the following events.

event 1:

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18959)] KtEventReport 82245248 completed after 0.1432
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18959)] ,host:ip-10-126-214-147 pid:18959] 1 jobs processed at 5.9643 j/s, 0 failed ...

event 2:

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18937)] KtEventReport 82245253 completed after 0.1443
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18937)] ,host:ip-10-126-214-147 pid:18937] 2 jobs processed at 6.2623 j/s, 0 failed ...

event 3:

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18944)] KtEventReport 82245256 completed after 0.1428
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18944)] ,host:ip-10-126-214-147 pid:18944] 1 jobs processed at 6.5080 j/s, 0 failed ...

event 4:

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18925)] KtUserReport 82245257 completed after 0.1455
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18925)] ,host:ip-10-126-214-147 pid:18925] 1 jobs processed at 5.9259 j/s, 0 failed ...

event 5:

[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245275 completed after 0.1444
[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245282 completed after 0.1435
[2011-06-13 12:10:37,INFO,ip-10-126-214-147(18952)] ,host:ip-10-126-214-147 pid:18952] 2 jobs processed at 6.4665 j/s, 0 failed ..

mw
Splunk Employee

Assuming that you've extracted out that field as "pid", it's just a matter of using a transaction:

sourcetype=my_sourcetype | transaction pid
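To make the grouping concrete outside Splunk, here is a rough Python emulation of what the asker wants from the sample log. It is not how `transaction` is implemented: the regex `PID_RE`, the helper `group_by_pid`, and the rule that a batch ends at that pid's "jobs processed" summary line are all assumptions for illustration. In Splunk itself, `transaction` options such as `endswith` serve a similar purpose, but check the `transaction` documentation for the exact semantics.

```python
import re

# Hypothetical pattern for the header "[date time,LEVEL,host(pid)]": the pid is
# the number in parentheses immediately before the first "]".
PID_RE = re.compile(r"^\[[^\]]*\((\d+)\)\]")

# The sample log from the question, in its original (interleaved) order.
SAMPLE = [
    "[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18959)] KtEventReport 82245248 completed after 0.1432",
    "[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18937)] KtEventReport 82245253 completed after 0.1443",
    "[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18959)] ,host:ip-10-126-214-147 pid:18959] 1 jobs processed at 5.9643 j/s, 0 failed ...",
    "[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18937)] ,host:ip-10-126-214-147 pid:18937] 2 jobs processed at 6.2623 j/s, 0 failed ...",
    "[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18944)] KtEventReport 82245256 completed after 0.1428",
    "[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18925)] KtUserReport 82245257 completed after 0.1455",
    "[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18944)] ,host:ip-10-126-214-147 pid:18944] 1 jobs processed at 6.5080 j/s, 0 failed ...",
    "[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245275 completed after 0.1444",
    "[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18925)] ,host:ip-10-126-214-147 pid:18925] 1 jobs processed at 5.9259 j/s, 0 failed ...",
    "[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245282 completed after 0.1435",
    "[2011-06-13 12:10:37,INFO,ip-10-126-214-147(18952)] ,host:ip-10-126-214-147 pid:18952] 2 jobs processed at 6.4665 j/s, 0 failed ..",
]


def group_by_pid(lines):
    """Group interleaved log lines into per-pid batches.

    Assumption: a batch for a pid ends with that pid's "jobs processed"
    summary line. Batches are returned in the order they end; batches still
    open at the end of the input are appended last.
    """
    open_batches = {}  # pid -> lines seen so far for that pid's current batch
    closed = []        # (pid, lines) tuples for finished batches
    for line in lines:
        m = PID_RE.match(line)
        if m is None:
            continue  # ignore lines without a recognizable pid header
        pid = m.group(1)
        open_batches.setdefault(pid, []).append(line)
        if "jobs processed" in line:
            closed.append((pid, open_batches.pop(pid)))
    closed.extend(open_batches.items())
    return closed


if __name__ == "__main__":
    for n, (pid, batch) in enumerate(group_by_pid(SAMPLE), start=1):
        print(f"event {n} (pid {pid}):")
        for line in batch:
            print("  " + line)
```

Run against the sample, this yields the same five groups the question lists, in the same order, because each pid's batch closes when its summary line appears.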

builder
Path Finder

By the way, I modified the logging so that the pid would be auto-extracted (e.g., pid=X) and then ran your search. It's perfect!
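The thread doesn't show the new log format or what language the job processor is written in. Purely as a hypothetical illustration, Python's standard `logging` module can write the process ID into each line as a `pid=<n>` pair using the `%(process)d` record attribute; the logger name `job_worker`, the helper `make_worker_logger`, and the format string are made up. Splunk extracts simple key=value pairs at search time by default, which is likely why `pid=X` worked without any extra extraction configuration.

```python
import io
import logging
import os
import sys

# Hypothetical log format: the process ID is written as a "pid=<n>" key=value
# pair followed by a space, so Splunk's automatic key/value extraction can
# expose it as a field named "pid" without a custom regex.
LOG_FORMAT = "[%(asctime)s,%(levelname)s] pid=%(process)d %(message)s"


def make_worker_logger(stream=sys.stdout) -> logging.Logger:
    """Return the hypothetical "job_worker" logger, writing LOG_FORMAT lines to stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger = logging.getLogger("job_worker")
    logger.setLevel(logging.DEBUG)
    logger.handlers[:] = [handler]  # replace handlers so repeated calls don't duplicate output
    logger.propagate = False        # don't also pass records up to the root logger
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_worker_logger(buf)
    log.debug("KtEventReport 82245248 completed after 0.1432")
    line = buf.getvalue()
    assert f"pid={os.getpid()} " in line
    print(line, end="")
```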

builder
Path Finder

That's what I figured, but I'm new to this so I wanted to confirm. Thanks for the speedy replies!

dwaddle
SplunkTrust

mw's approach is the best you are going to do. When Splunk is parsing a file, each event has to consist of contiguous bytes of the file. Event separation and line breaking configuration basically allows you to define how many contiguous bytes are in an event. Once the end of that event is reached and the next event starts, you can't go back and append to the prior event.
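A toy model of this point in Python: if event boundaries can only be placed between contiguous runs of text, then even a favorable boundary rule (here, hypothetically, ending an event right after each "jobs processed" summary line) still produces events that mix pids. This is a simplified illustration, not Splunk's actual `LINE_BREAKER` logic; `SUMMARY_END`, `PID_RE`, and `break_after_summaries` are hypothetical names, and only the first four sample lines from the question are used.

```python
import re

# Hypothetical boundary rule: end an event right after each "jobs processed"
# summary line. Group 1 captures the newline that separates two events.
SUMMARY_END = re.compile(r"jobs processed[^\n]*(\n)")
# Pid in the "[date time,LEVEL,host(pid)]" header at the start of each line.
PID_RE = re.compile(r"^\[[^\]]*\((\d+)\)\]", re.MULTILINE)

# First four lines of the question's sample log, as one raw text stream.
RAW = "\n".join([
    "[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18959)] KtEventReport 82245248 completed after 0.1432",
    "[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18937)] KtEventReport 82245253 completed after 0.1443",
    "[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18959)] ,host:ip-10-126-214-147 pid:18959] 1 jobs processed at 5.9643 j/s, 0 failed ...",
    "[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18937)] ,host:ip-10-126-214-147 pid:18937] 2 jobs processed at 6.2623 j/s, 0 failed ...",
])


def break_after_summaries(raw: str) -> list[str]:
    """Cut raw text into contiguous events; an event can never skip over lines."""
    events, start = [], 0
    for m in SUMMARY_END.finditer(raw):
        events.append(raw[start:m.start(1)])  # event ends just before the newline
        start = m.end(1)                      # next event starts right after it
    if start < len(raw):
        events.append(raw[start:])            # trailing text with no closing newline
    return events


if __name__ == "__main__":
    for event in break_after_summaries(RAW):
        print("pids in event:", sorted(set(PID_RE.findall(event))))
```

The first event contains lines from both 18959 and 18937, and 18937's two lines end up in different events: with contiguous slicing there is no boundary placement that pulls the non-adjacent lines of one pid together, which is why the grouping has to happen at search time.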

mw
Splunk Employee

It would be possible if the lines weren't interleaved as they are, but since each pid's lines are spread across a period of time, you'd have to do it at search time.

builder
Path Finder

But what you're suggesting is a way of doing a search on already parsed events, right? I'm looking for a way of defining the sourcetype stanza in the props.conf so that the events are broken out this way in the first place.
