Solved: Re: Is there an easy way to pair two events with t...

ssiat479 · ‎09-11-2018

I am looking for an elegant solution to the following problem:
I want to summarize data from two different events that share the same sourcetype/index/etc., where a value in one field of the first event matches the value in a different field of the second (below, the PPID of Event A equals the PID of Event B).

Event A:
sourcetype=foo
ComputerName=homepc
FileName=example.exe
PID=3333
PPID=2222

Event B:
sourcetype=foo
ComputerName=homepc
FileName=parent.exe
PID=2222
PPID=1111

I want to group data from both events into one summarized line, as follows:

ComputerName  FileName     PID   ParentFileName  PPID
homepc        example.exe  3333  parent.exe      2222

I have attempted to accomplish this via JOIN and it does seem to work, but I am aware this is not an ideal solution:

index=_internal sourcetype=foo
| table ComputerName FileName PID PPID
| rename FileName as Child_FileName, PID as Child_PID, PPID as Parent_PID
| join Parent_PID ComputerName
    [ search index=_internal sourcetype=foo
    | table ComputerName FileName PID
    | rename FileName as Parent_FileName, PID as Parent_PID ]

If the sourcetypes in the two searches were different, I know I could easily accomplish this via a string of 'eval's and stats. Thanks for any suggestions!

zonistj · ‎09-11-2018

Hello,

What would you like to see improved in the SPL you posted?

I wrote a solution that appears to work, then realized that it uses the same logic as your example:

| makeresults 
| eval sourcetype="foo",ComputerName = "homepc", FileName="example.exe",PID="3333",PPID="2222" 
| append 
    [| makeresults 
    | eval sourcetype="foo",ComputerName = "homepc", FileName="parent.exe",PID="2222",PPID="1111"] 
| join PPID 
    [| makeresults 
    | eval sourcetype="foo",ComputerName = "homepc", FileName="parent.exe",PID="2222",PPID="1111" 
    | rename PPID AS parent_PPID, FileName AS parent_FileName 
    | rename PID AS PPID 
    | fields parent_FileName,PPID,parent_PPID]

ssiat479 · ‎09-11-2018

Hi!

I guess my question was to identify if there was in fact a better way than running the same search twice and joining them together. I was taught that 'join' should be avoided if at all possible. However, if it is the best solution I will keep it.

Thanks for the help!

zonistj · ‎09-11-2018

Thanks for the clarification!

join is the most efficient method that I know of for joining the two data sets in a "real-time" manner.

If you had a specific time period of data, you could use a lookup table, which would be more resource efficient. For example, if you were conducting a forensic investigation into a system and had a timeline of the processes that ran, with PID and PPID, then you could run one query to create the lookup table and a second query to get your results.

It would look like this:

CREATE LOOKUP TABLE

| makeresults
| eval sourcetype="foo", ComputerName="homepc", FileName="example.exe", PID="3333", PPID="2222"
| append
    [| makeresults
    | eval sourcetype="foo", ComputerName="homepc", FileName="parent.exe", PID="2222", PPID="1111"]
| append
    [| makeresults
    | eval sourcetype="foo", ComputerName="homepc", FileName="grandparent.exe", PID="1111", PPID="0"]
| stats values(FileName) AS FileName by PID
| outputlookup foo_bar_data.csv

SEARCH FOR DATA

| makeresults
| eval sourcetype="foo", ComputerName="homepc", FileName="example.exe", PID="3333", PPID="2222"
| append
    [| makeresults
    | eval sourcetype="foo", ComputerName="homepc", FileName="parent.exe", PID="2222", PPID="1111"]
| append
    [| makeresults
    | eval sourcetype="foo", ComputerName="homepc", FileName="grandparent.exe", PID="1111", PPID="0"]
| lookup foo_bar_data.csv PID AS PPID OUTPUTNEW FileName AS Parent_FileName

This might be a better approach depending on your exact use case.
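
Against indexed events rather than makeresults, the two searches might look roughly like the sketch below. It reuses the index, sourcetype, and lookup file name from this thread, and it adds ComputerName to the lookup key, which is an assumption on top of the example above: without it, the same PID on two different hosts would be mixed together.

```spl
index=_internal sourcetype=foo
| stats values(FileName) AS FileName by ComputerName PID
| outputlookup foo_bar_data.csv
```

```spl
index=_internal sourcetype=foo
| lookup foo_bar_data.csv ComputerName PID AS PPID OUTPUTNEW FileName AS Parent_FileName
| table ComputerName FileName PID Parent_FileName PPID
```

If a PID is reused on the same host within the time range, values(FileName) becomes multivalued for that PID, so the time window used to build the lookup probably needs to be kept narrow.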

kamlesh_vaghela · ‎09-11-2018

@ssiat479

Can you please try this?

index=_internal sourcetype=foo
| fields ComputerName FileName PID
| append
    [ search index=_internal sourcetype=foo
    | fields ComputerName FileName PID
    | eval PID=PPID
    | eval ParentFileName=FileName
    | fields - FileName ]
| stats values(*) as * by ComputerName

ssiat479 · ‎09-11-2018

Thanks!

It does not seem to work as well as the above query which used JOIN.

The results include only three columns: ComputerName, FileName, and PID. PPID is not included, and since the results are grouped by ComputerName only, there is no way to correlate a PID with its PPID/ParentFileName.
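
One join-free pattern that keeps the PID-to-PPID pairing is to emit each event twice, once in a "child" role keyed on its PPID and once in a "parent" role keyed on its PID, and then let eventstats copy the parent's FileName onto the child that shares the key. The following is only a sketch, run against the two sample events via makeresults. The field names role, link_pid, and candidate_parent are made up for this illustration, and it assumes a PID is not reused on the same host within the search window.

```spl
| makeresults
| eval ComputerName="homepc", FileName="example.exe", PID="3333", PPID="2222"
| append
    [| makeresults
    | eval ComputerName="homepc", FileName="parent.exe", PID="2222", PPID="1111"]
| eval role=mvappend("child", "parent")
| mvexpand role
| eval link_pid=if(role="child", PPID, PID)
| eval candidate_parent=if(role="parent", FileName, null())
| eventstats values(candidate_parent) AS ParentFileName by ComputerName link_pid
| where role="child" AND isnotnull(ParentFileName)
| table ComputerName FileName PID ParentFileName PPID
```

On the two sample events this should leave a single row pairing example.exe (PID 3333) with parent.exe (PPID 2222). Because mvexpand doubles the event count and can truncate results when it hits its memory limit, whether this actually outperforms join on real data would need to be measured.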

Is there an easy way to pair two events with the same sourcetype that have the same values in different fields?
