Splunk Search

Is there an easy way to pair two events with the same sourcetype that have the same values in different fields?

ssiat479
Engager

I am looking for an elegant solution to the following problem:
I want to summarize data from two different events that share the same sourcetype/index/etc., where the value of one field in the first event (PPID) matches the value of a different field in the second event (PID).

Event A:
sourcetype=foo
ComputerName=homepc
FileName=example.exe
PID=3333
PPID=2222

Event B:
sourcetype=foo
ComputerName=homepc
FileName=parent.exe
PID=2222
PPID=1111

I want to group data from both events into one summarized line as follows:

ComputerName  FileName     PID   ParentFileName  PPID
homepc        example.exe  3333  parent.exe      2222

I have attempted to accomplish this via JOIN and it does seem to work, but I am aware this is not an ideal solution:

index=_internal sourcetype=foo
| table ComputerName FileName PID PPID 
| rename FileName as Child_FileName, PID as Child_PID, PPID as Parent_PID
| join Parent_PID ComputerName
[ search index=_internal sourcetype=foo
| table ComputerName FileName  PID
| rename FileName as Parent_FileName, PID as Parent_PID ]

If the sourcetypes in the two searches were different, I know I could easily accomplish this via a string of 'eval's and stats. Thanks for any suggestions!

0 Karma
1 Solution

zonistj
Path Finder

Hello,

What would you like to see improved in the SPL you posted?

I wrote a solution that appears to work, then realized that it uses the same logic as your example:

| makeresults 
| eval sourcetype="foo",ComputerName = "homepc", FileName="example.exe",PID="3333",PPID="2222" 
| append 
    [| makeresults 
    | eval sourcetype="foo",ComputerName = "homepc", FileName="parent.exe",PID="2222",PPID="1111"] 
| join PPID 
    [| makeresults 
    | eval sourcetype="foo",ComputerName = "homepc", FileName="parent.exe",PID="2222",PPID="1111" 
    | rename PPID AS parent_PPID, FileName AS parent_FileName 
    | rename PID AS PPID 
    | fields parent_FileName,PPID,parent_PPID]

0 Karma

ssiat479
Engager

Hi!

My question was really whether there is a better way than running the same search twice and joining the results. I was taught that 'join' should be avoided if at all possible. However, if it is the best solution, I will keep it.

Thanks for the help!

0 Karma

zonistj
Path Finder

Thanks for the clarification!

join is the most efficient method that I know of for joining the two data sets in a "real-time" manner.

If you had a specific time period of data, you could use a lookup table, which would be more resource-efficient. For example, if you were conducting a forensic investigation into a system and had a timeline of the processes that ran, with PID and PPID, then you could run one query to create the lookup table and a second query to get your results.

It would look like this:

CREATE LOOKUP TABLE

| makeresults 
| eval sourcetype="foo",ComputerName = "homepc", FileName="example.exe",PID="3333",PPID="2222" 
| append 
    [| makeresults 
    | eval sourcetype="foo",ComputerName = "homepc", FileName="parent.exe",PID="2222",PPID="1111"]
| append 
    [| makeresults 
    | eval sourcetype="foo",ComputerName = "homepc", FileName="grandparent.exe",PID="1111",PPID="0"]
| stats values(FileName) AS FileName by PID
| outputlookup foo_bar_data.csv

SEARCH FOR DATA

| makeresults 
| eval sourcetype="foo",ComputerName = "homepc", FileName="example.exe",PID="3333",PPID="2222" 
| append 
    [| makeresults 
    | eval sourcetype="foo",ComputerName = "homepc", FileName="parent.exe",PID="2222",PPID="1111"]
| append 
    [| makeresults 
    | eval sourcetype="foo",ComputerName = "homepc", FileName="grandparent.exe",PID="1111",PPID="0"]
| lookup foo_bar_data.csv PID AS PPID OUTPUTNEW FileName AS Parent_FileName

This might be a better approach depending on your exact use case.
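
One caveat with the lookup above: it is keyed on PID alone, so if the data covers more than one host, the same PID on two machines would be merged into one row. A possible variant, sketched here on the same sample values and assuming ComputerName plus PID is enough to identify a process within the time range searched, keys the lookup on both fields. First build the table:

```
| makeresults
| eval ComputerName="homepc", FileName="parent.exe", PID="2222", PPID="1111"
| stats values(FileName) AS FileName by ComputerName PID
| outputlookup foo_bar_data.csv
```

Then match each event's ComputerName and PPID against the lookup's ComputerName and PID:

```
| makeresults
| eval ComputerName="homepc", FileName="example.exe", PID="3333", PPID="2222"
| lookup foo_bar_data.csv ComputerName, PID AS PPID OUTPUTNEW FileName AS Parent_FileName
```

In a real search, the makeresults lines would be replaced by the base search over the actual events. PIDs can also be reused over time on one host, so this is only as reliable as the time window is narrow.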

0 Karma

kamlesh_vaghela
SplunkTrust

@ssiat479

Can you please try this?

index=_internal sourcetype=foo 
| fields ComputerName FileName PID 
| append 
    [ search index=_internal sourcetype=foo 
    | fields ComputerName FileName PID 
    | eval PID=PPID 
    | eval ParentFileName=FileName 
    | fields - FileName ] 
| stats values(*) as * by ComputerName
0 Karma

ssiat479
Engager

Thanks!

It does not seem to work as well as the above query which used JOIN.

The results only include three columns: ComputerName, FileName, and PID. PPID is not included and since results are grouped by ComputerName only, there is no way to correlate a PID to PPID/ParentFileName.

0 Karma
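
For anyone looking for a join-free version of the original question, here is one possible sketch. It is not from the posts above and has only been reasoned through on the two sample events, so treat it as a starting point. The idea is to read the data once, duplicate every event into a "child" copy and a "parent" copy, give both copies a shared key (the child copy uses its PPID, the parent copy uses its PID), and let eventstats carry the parent's FileName onto the matching child rows. The field names role, key, and parent_name are made up for this illustration.

```
| makeresults
| eval ComputerName="homepc", FileName="example.exe", PID="3333", PPID="2222"
| append
    [| makeresults
    | eval ComputerName="homepc", FileName="parent.exe", PID="2222", PPID="1111"]
| eval role=mvappend("child", "parent")
| mvexpand role
| eval key=if(role="child", PPID, PID)
| eval parent_name=if(role="parent", FileName, null())
| eventstats values(parent_name) AS ParentFileName by ComputerName key
| where role="child" AND isnotnull(ParentFileName)
| table ComputerName FileName PID ParentFileName PPID
```

With real data, the makeresults and append lines would be replaced by a single base search such as index=_internal sourcetype=foo. Because eventstats keeps one row per child, several children of the same parent stay on separate lines, which a plain stats by key would collapse. The trade-offs: mvexpand doubles the row count and is subject to memory limits on large result sets, and if a PID is reused on the same host within the search window, ParentFileName may come back with more than one value.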