Solved: Re: break up events based on pid?

builder · ‎06-13-2011

I assume there is no way to do what I want, but I figured I'd ask anyway. I have a background job processor that logs data about the jobs, as follows.

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18959)] KtEventReport 82245248 completed after 0.1432
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18937)] KtEventReport 82245253 completed after 0.1443
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18959)] ,host:ip-10-126-214-147 pid:18959] 1 jobs processed at 5.9643 j/s, 0 failed ...
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18937)] ,host:ip-10-126-214-147 pid:18937] 2 jobs processed at 6.2623 j/s, 0 failed ...
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18944)] KtEventReport 82245256 completed after 0.1428
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18925)] KtUserReport 82245257 completed after 0.1455
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18944)] ,host:ip-10-126-214-147 pid:18944] 1 jobs processed at 6.5080 j/s, 0 failed ...
[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245275 completed after 0.1444
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18925)] ,host:ip-10-126-214-147 pid:18925] 1 jobs processed at 5.9259 j/s, 0 failed ...
[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245282 completed after 0.1435
[2011-06-13 12:10:37,INFO,ip-10-126-214-147(18952)] ,host:ip-10-126-214-147 pid:18952] 2 jobs processed at 6.4665 j/s, 0 failed ..

Note that the number in parentheses at the end of the bracketed prefix is the process ID of the process handling the job. I would like to break up my events into batches, one per run of each process ID. However, the batches for the different process IDs are interleaved in the logs. If I understand correctly, you can only break up the logs into events serially, so what I want to do probably isn't possible. However, if I am wrong about that and there is a way to do this, let me know! To be clear, the above log would be broken into the following events.

event 1:

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18959)] KtEventReport 82245248 completed after 0.1432
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18959)] ,host:ip-10-126-214-147 pid:18959] 1 jobs processed at 5.9643 j/s, 0 failed ...

event 2:

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18937)] KtEventReport 82245253 completed after 0.1443
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18937)] ,host:ip-10-126-214-147 pid:18937] 2 jobs processed at 6.2623 j/s, 0 failed ...

event 3:

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18944)] KtEventReport 82245256 completed after 0.1428
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18944)] ,host:ip-10-126-214-147 pid:18944] 1 jobs processed at 6.5080 j/s, 0 failed ...

event 4:

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18925)] KtUserReport 82245257 completed after 0.1455
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18925)] ,host:ip-10-126-214-147 pid:18925] 1 jobs processed at 5.9259 j/s, 0 failed ...

event 5:

[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245275 completed after 0.1444
[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245282 completed after 0.1435
[2011-06-13 12:10:37,INFO,ip-10-126-214-147(18952)] ,host:ip-10-126-214-147 pid:18952] 2 jobs processed at 6.4665 j/s, 0 failed ..

mw · ‎06-13-2011

Assuming that you've extracted that field as "pid", it's just a matter of using a transaction:

sourcetype=my_sourcetype | transaction pid
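
If the pid has not been extracted yet, something along these lines might work as a sketch. The `rex` pattern is an assumption based on the sample lines above (the digits in parentheses just before the closing bracket), and `endswith="jobs processed"` assumes each batch closes on its summary line; adjust both to the real data.

```
sourcetype=my_sourcetype
| rex "\((?<pid>\d+)\)\]"
| transaction pid endswith="jobs processed"
```

Without an end condition (or a `maxspan`/`maxpause` limit), `transaction pid` would tend to merge successive batches from the same long-lived process into one transaction.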

builder · ‎06-14-2011

By the way, I modified the logging so that the pid would be auto-extracted (e.g., pid=X) and then ran your search. It's perfect!

builder · ‎06-14-2011

That's what I figured, but I'm new to this so I wanted to confirm. Thanks for the speedy replies!

dwaddle · ‎06-13-2011

mw's approach is the best you are going to do. When Splunk parses a file, each event has to consist of contiguous bytes of that file. Event separation and line breaking configuration only lets you define where each run of contiguous bytes ends. Once the end of an event is reached and the next event starts, you can't go back and append to the prior event.
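
As a rough illustration of that limit, a `props.conf` stanza along these lines would keep every log line as its own event and leave the grouping to search time. The stanza name `my_sourcetype` is borrowed from the search in this thread, and the timestamp settings and the `EXTRACT-pid` regex are assumptions based on the sample lines, not a tested configuration.

```
[my_sourcetype]
# One event per line; index-time breaking cannot stitch non-adjacent lines together.
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# Timestamp directly follows the opening bracket, e.g. [2011-06-13 12:10:31,
TIME_PREFIX = ^\[
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
# Search-time extraction of the pid from "(18959)]"
EXTRACT-pid = \((?<pid>\d+)\)\]
```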

mw · ‎06-13-2011

It would be possible if the lines weren't interleaved as they are, but because each pid's lines are spread over a period of time, you'd have to do it at search time.

builder · ‎06-13-2011

But what you're suggesting is a way of doing a search on already parsed events, right? I'm looking for a way of defining the sourcetype stanza in the props.conf so that the events are broken out this way in the first place.
