I assume there is no way to do what I want, but I figured I'd ask anyway. I have a background job processor that logs data about the jobs, as follows.
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18959)] KtEventReport 82245248 completed after 0.1432
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18937)] KtEventReport 82245253 completed after 0.1443
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18959)] ,host:ip-10-126-214-147 pid:18959] 1 jobs processed at 5.9643 j/s, 0 failed ...
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18937)] ,host:ip-10-126-214-147 pid:18937] 2 jobs processed at 6.2623 j/s, 0 failed ...
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18944)] KtEventReport 82245256 completed after 0.1428
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18925)] KtUserReport 82245257 completed after 0.1455
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18944)] ,host:ip-10-126-214-147 pid:18944] 1 jobs processed at 6.5080 j/s, 0 failed ...
[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245275 completed after 0.1444
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18925)] ,host:ip-10-126-214-147 pid:18925] 1 jobs processed at 5.9259 j/s, 0 failed ...
[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245282 completed after 0.1435
[2011-06-13 12:10:37,INFO,ip-10-126-214-147(18952)] ,host:ip-10-126-214-147 pid:18952] 2 jobs processed at 6.4665 j/s, 0 failed ..
Note that the number in parentheses at the end of the bracketed prefix (e.g. 18959) is the ID of the process handling the job. I would like to group the lines into events, one per batch run by a given process ID. However, the batches for the different process IDs are interleaved in the log. If I understand correctly, a log can only be broken into events serially (each event being a contiguous run of lines), so what I want probably isn't possible. However, if I'm wrong about that and there is a way to do this, let me know! To be clear, the above log would be broken into the following events:
event 1:
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18959)] KtEventReport 82245248 completed after 0.1432
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18959)] ,host:ip-10-126-214-147 pid:18959] 1 jobs processed at 5.9643 j/s, 0 failed ...
event 2:
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18937)] KtEventReport 82245253 completed after 0.1443
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18937)] ,host:ip-10-126-214-147 pid:18937] 2 jobs processed at 6.2623 j/s, 0 failed ...
event 3:
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18944)] KtEventReport 82245256 completed after 0.1428
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18944)] ,host:ip-10-126-214-147 pid:18944] 1 jobs processed at 6.5080 j/s, 0 failed ...
event 4:
[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18925)] KtUserReport 82245257 completed after 0.1455
[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18925)] ,host:ip-10-126-214-147 pid:18925] 1 jobs processed at 5.9259 j/s, 0 failed ...
event 5:
[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245275 completed after 0.1444
[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245282 completed after 0.1435
[2011-06-13 12:10:37,INFO,ip-10-126-214-147(18952)] ,host:ip-10-126-214-147 pid:18952] 2 jobs processed at 6.4665 j/s, 0 failed ..
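Outside of Splunk, the grouping described above can be sketched in a few lines of Python. This is only an illustration of the desired result, not how Splunk does it; it assumes the pid is always the number in parentheses right before the "]" that closes the prefix, and group_by_pid is a hypothetical helper name.

```python
import re

# Assumption: the pid is the number in parentheses just before the "]" that
# closes the bracketed prefix, e.g. "...ip-10-126-214-147(18959)]".
PID_RE = re.compile(r"\((\d+)\)\]")

LOG = [
    "[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18959)] KtEventReport 82245248 completed after 0.1432",
    "[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18937)] KtEventReport 82245253 completed after 0.1443",
    "[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18959)] ,host:ip-10-126-214-147 pid:18959] 1 jobs processed at 5.9643 j/s, 0 failed ...",
    "[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18937)] ,host:ip-10-126-214-147 pid:18937] 2 jobs processed at 6.2623 j/s, 0 failed ...",
    "[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18944)] KtEventReport 82245256 completed after 0.1428",
    "[2011-06-13 12:10:31,DEBUG,ip-10-126-214-147(18925)] KtUserReport 82245257 completed after 0.1455",
    "[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18944)] ,host:ip-10-126-214-147 pid:18944] 1 jobs processed at 6.5080 j/s, 0 failed ...",
    "[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245275 completed after 0.1444",
    "[2011-06-13 12:10:31,INFO,ip-10-126-214-147(18925)] ,host:ip-10-126-214-147 pid:18925] 1 jobs processed at 5.9259 j/s, 0 failed ...",
    "[2011-06-13 12:10:37,DEBUG,ip-10-126-214-147(18952)] KtEventReport 82245282 completed after 0.1435",
    "[2011-06-13 12:10:37,INFO,ip-10-126-214-147(18952)] ,host:ip-10-126-214-147 pid:18952] 2 jobs processed at 6.4665 j/s, 0 failed ..",
]


def group_by_pid(lines):
    """Collect interleaved log lines into one list per pid, in first-seen order."""
    groups = {}  # plain dicts keep insertion order (Python 3.7+)
    for line in lines:
        m = PID_RE.search(line)
        if m is None:
            continue  # skip lines without a recognizable pid in the prefix
        groups.setdefault(m.group(1), []).append(line)
    return groups


for pid, lines in group_by_pid(LOG).items():
    print(f"pid {pid}: {len(lines)} line(s)")
```

This prints five groups (18959, 18937, 18944, 18925 with two lines each, and 18952 with three), matching events 1 to 5 above. The point is that the grouping needs the whole log in hand, which is why it belongs at search time rather than at event-breaking time.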
Assuming that you've extracted out that field as "pid", it's just a matter of using a transaction:
sourcetype=my_sourcetype | transaction pid
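One caveat: grouping on pid alone merges every batch a process ever ran into one group, and in a longer log the same pid will normally handle many batches. Since each batch in the sample ends with the INFO "N jobs processed" summary line, that line can serve as the batch boundary. In Splunk itself, the transaction command's endswith option (alongside maxspan/maxpause) is the documented way to express such boundaries; check the command reference for the exact syntax. The Python below is only a sketch of the idea, using hypothetical, shortened lines in the same prefix format and a hypothetical helper batches_by_pid.

```python
import re

PID_RE = re.compile(r"\((\d+)\)\]")
# Assumption: every batch ends with the INFO summary line "N jobs processed ...".
END_MARKER = "jobs processed"

# Hypothetical, shortened lines in the question's prefix format (not real output).
SAMPLE = [
    "[12:10:31,DEBUG,host(100)] job A completed",
    "[12:10:31,DEBUG,host(200)] job B completed",
    "[12:10:31,INFO,host(100)] 1 jobs processed",
    "[12:10:32,DEBUG,host(100)] job C completed",
    "[12:10:32,INFO,host(200)] 1 jobs processed",
    "[12:10:33,INFO,host(100)] 1 jobs processed",
]


def batches_by_pid(lines):
    """Split interleaved lines into per-pid batches, closing a batch at its summary line."""
    open_batches = {}  # pid -> lines of the batch currently in progress
    finished = []      # (pid, lines) tuples, in the order the batches complete
    for line in lines:
        m = PID_RE.search(line)
        if m is None:
            continue
        pid = m.group(1)
        open_batches.setdefault(pid, []).append(line)
        if END_MARKER in line:
            # The summary line closes this pid's batch; the next line for
            # the same pid starts a new one.
            finished.append((pid, open_batches.pop(pid)))
    # Anything still open has not seen its summary line yet.
    return finished, open_batches


finished, pending = batches_by_pid(SAMPLE)
for pid, batch in finished:
    print(pid, len(batch))
```

Here pid 100 produces two separate batches instead of one merged group of four lines.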
By the way, I modified the logging so that the pid would be auto-extracted (e.g., pid=X) and then ran your search. It's perfect!
That's what I figured, but I'm new to this so I wanted to confirm. Thanks for the speedy replies!
mw's approach is the best you are going to do. When Splunk parses a file, each event has to consist of contiguous bytes of the file. The event-breaking and line-merging configuration essentially lets you define how many contiguous bytes make up an event. Once the end of that event is reached and the next event starts, you can't go back and append to the prior event.
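To see why the contiguity constraint rules this out, here is a rough Python model (not Splunk's actual line breaker): even the most favorable contiguous split, which merges adjacent lines that share a pid, still fragments the sample log. The placeholder lines below keep only the pid order of the eleven lines in the question; contiguous_runs and pid_of are hypothetical helpers.

```python
import re
from itertools import groupby

PID_RE = re.compile(r"\((\d+)\)\]")

# The pid order of the eleven lines in the question, rebuilt as short
# placeholder lines (timestamps and message text omitted).
PIDS = ["18959", "18937", "18959", "18937", "18944", "18925",
        "18944", "18952", "18925", "18952", "18952"]
LOG = [f"[...,ip-10-126-214-147({p})] ..." for p in PIDS]


def pid_of(line):
    m = PID_RE.search(line)
    return m.group(1) if m else None


def contiguous_runs(lines):
    """Best case for a serial splitter: merge only *adjacent* lines with the same pid.

    groupby never looks back, so a pid that reappears after another pid's
    line starts a new event, just as an event cannot reach back in the file.
    """
    return [list(run) for _, run in groupby(lines, key=pid_of)]


runs = contiguous_runs(LOG)
print(len(runs), "contiguous events vs", len(set(PIDS)), "pids")
```

This prints "10 contiguous events vs 5 pids": only the last two 18952 lines end up together, and every other pid is split in two. Reassembling them requires looking across the whole result set, which is what a search-time command like transaction does.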
It would be possible at index time if the lines weren't interleaved the way they are, but because each pid's lines are spread across a period of time, you'd have to do the grouping at search time.
But what you're suggesting is a search over already-parsed events, right? I'm looking for a way to define the sourcetype stanza in props.conf so that the events are broken out this way in the first place.