Break up events based on pid?

Path Finder

I assume there is no way to do what I want, but I figured I'd ask anyway. I have a background job processor that logs data about the jobs, as follows.

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18959)] KtEventReport 82245248 completed after 0.1432
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18937)] KtEventReport 82245253 completed after 0.1443
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18959)] ,host:ip-10-126-214-147 pid:18959] 1 jobs processed at 5.9643 j/s, 0 failed ...
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18937)] ,host:ip-10-126-214-147 pid:18937] 2 jobs processed at 6.2623 j/s, 0 failed ...
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18944)] KtEventReport 82245256 completed after 0.1428
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18925)] KtUserReport 82245257 completed after 0.1455
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18944)] ,host:ip-10-126-214-147 pid:18944] 1 jobs processed at 6.5080 j/s, 0 failed ...
[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245275 completed after 0.1444
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18925)] ,host:ip-10-126-214-147 pid:18925] 1 jobs processed at 5.9259 j/s, 0 failed ...
[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245282 completed after 0.1435
[2011-06-13 12:10:37,INFO,ip-10-126-214-147(18952)] ,host:ip-10-126-214-147 pid:18952] 2 jobs processed at 6.4665 j/s, 0 failed ..

Note that the number in parentheses at the end of the bracketed prefix is the ID of the process handling the job. I would like to break up my events into batches, one per process ID. However, the batches for the different process IDs are interleaved in the logs. If I understand correctly, logs can only be broken into events serially, so what I want to do probably isn't possible. However, if I am wrong about that and there is a way to do this, let me know! To be clear, the above log would be broken into the following events.

event 1:

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18959)] KtEventReport 82245248 completed after 0.1432
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18959)] ,host:ip-10-126-214-147 pid:18959] 1 jobs processed at 5.9643 j/s, 0 failed ...

event 2:

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18937)] KtEventReport 82245253 completed after 0.1443
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18937)] ,host:ip-10-126-214-147 pid:18937] 2 jobs processed at 6.2623 j/s, 0 failed ...

event 3:

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18944)] KtEventReport 82245256 completed after 0.1428
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18944)] ,host:ip-10-126-214-147 pid:18944] 1 jobs processed at 6.5080 j/s, 0 failed ...

event 4:

[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18925)] KtUserReport 82245257 completed after 0.1455
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18925)] ,host:ip-10-126-214-147 pid:18925] 1 jobs processed at 5.9259 j/s, 0 failed ...

event 5:

[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245275 completed after 0.1444
[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245282 completed after 0.1435
[2011-06-13 12:10:37,INFO,ip-10-126-214-147(18952)] ,host:ip-10-126-214-147 pid:18952] 2 jobs processed at 6.4665 j/s, 0 failed ..

Splunk Employee

Assuming that you've extracted that field as "pid", it's just a matter of using the transaction command:

sourcetype=my_sourcetype | transaction pid
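
For readers who want to see the grouping idea outside of Splunk, here is a minimal Python sketch of what "group events by pid after they are already parsed" means. It is only an illustration, not how the transaction command is implemented. The regex and the names PID_RE and group_by_pid are assumptions made up for this example; the sample lines are copied from the question.

```python
import re

# First four log lines from the question, in their original interleaved order.
LOG = """\
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18959)] KtEventReport 82245248 completed after 0.1432
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18937)] KtEventReport 82245253 completed after 0.1443
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18959)] ,host:ip-10-126-214-147 pid:18959] 1 jobs processed at 5.9643 j/s, 0 failed ...
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18937)] ,host:ip-10-126-214-147 pid:18937] 2 jobs processed at 6.2623 j/s, 0 failed ...
"""

# Hypothetical pattern: the pid is the number in parentheses just before "]".
PID_RE = re.compile(r"\((\d+)\)\]")


def group_by_pid(lines):
    """Collect already-separated lines into one group per pid.

    Groups are keyed by pid in order of first appearance. Lines with no
    recognizable pid are skipped in this sketch.
    """
    groups = {}
    for line in lines:
        match = PID_RE.search(line)
        if match is None:
            continue
        groups.setdefault(match.group(1), []).append(line)
    return groups


if __name__ == "__main__":
    for pid, lines in group_by_pid(LOG.splitlines()).items():
        print(pid, len(lines))
    # prints:
    # 18959 2
    # 18937 2
```

The point is that the grouping happens after each line is already its own event, which is why the interleaving in the file doesn't matter.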

Path Finder

By the way, I modified the logging so that the pid would be auto-extracted (e.g., pid=X) and then ran your search. It's perfect!

Path Finder

That's what I figured, but I'm new to this so I wanted to confirm. Thanks for the speedy replies!

SplunkTrust

mw's approach is the best you are going to do. When Splunk parses a file, each event has to consist of contiguous bytes of the file. The event separation and line breaking configuration basically lets you define how many contiguous bytes make up an event. Once the end of that event is reached and the next event starts, you can't go back and append to the prior event.
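
To make the contiguity limit concrete, here is a small Python sketch. It is not Splunk's parser; it applies a hypothetical rule ("start a new event whenever the pid differs from the previous line") that can only ever join adjacent lines. The names PID_RE and break_contiguous are assumptions for this example, and the sample lines are the first four from the question.

```python
import re

# First four log lines from the question; the two pids alternate.
LINES = [
    "[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18959)] KtEventReport 82245248 completed after 0.1432",
    "[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18937)] KtEventReport 82245253 completed after 0.1443",
    "[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18959)] ,host:ip-10-126-214-147 pid:18959] 1 jobs processed at 5.9643 j/s, 0 failed ...",
    "[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18937)] ,host:ip-10-126-214-147 pid:18937] 2 jobs processed at 6.2623 j/s, 0 failed ...",
]

# Hypothetical pattern: the pid is the number in parentheses just before "]".
PID_RE = re.compile(r"\((\d+)\)\]")


def break_contiguous(lines):
    """Split lines into events made only of adjacent lines sharing a pid.

    Once a line with a different pid arrives, the previous event is closed
    and can never be appended to again.
    """
    events = []
    last_pid = None
    for line in lines:
        match = PID_RE.search(line)
        # A line without a pid is treated as a continuation of the previous one.
        pid = match.group(1) if match else last_pid
        if events and pid == last_pid:
            events[-1].append(line)
        else:
            events.append([line])
        last_pid = pid
    return events


if __name__ == "__main__":
    print(len(break_contiguous(LINES)))  # prints 4: two pids, but four events
```

Because the two pids alternate, every line ends up as its own event, so no rule of this contiguous kind can produce one event per pid from this file.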

Splunk Employee

It would be possible if the lines weren't intertwined as they are, but because the lines for a given pid are spread across a period of time, you'd have to do it at search time.

Path Finder

But what you're suggesting is a way of doing a search on already parsed events, right? I'm looking for a way of defining the sourcetype stanza in props.conf so that the events are broken out this way in the first place.
