Splunk Search

How to do a "dedup" within a single event?

ZacEsa
Communicator

I have events that contain the same field multiple times, sometimes with different values.

E.g.:
Event 1: deviceName="device1" appName="app1" appName="app1" appName="app1" appName="app2" appName="app2" appName="app3"
Event 2: deviceName="device1" appName="app1" appName="app2" appName="app2"
Event 3: deviceName="device2" appName="app1" appName="app2" appName="app3"

I want to "dedup" only within each event, not across multiple events. How do I do so?

Current table output:
| deviceName | appName | count |
| device1 | app2 | 4 |
| device1 | app1 | 3 |
| device1 | app3 | 1 |
| device2 | app1 | 1 |
| device2 | app2 | 1 |
| device2 | app3 | 1 |

Table I want:
| deviceName | appName | count |
| device1 | app1 | 2 |
| device1 | app2 | 2 |
| device1 | app3 | 1 |
| device2 | app1 | 1 |
| device2 | app2 | 1 |
| device2 | app3 | 1 |

My apologies if the question is unclear. Not really sure how to explain it.

0 Karma

ZacEsa
Communicator

I think I managed to get it working! Posting my answer here. If anyone has suggestions for optimizing it, or thinks it may not work, please do let me know!

(search)                                                  //My initial search string
 | rename threatInfo.indicators{}.applicationName as "appName", threatInfo.indicators{}.indicatorName as "indicatorName" //Renames the JSON fields to something easier (mv commands also sometimes don't work with JSON field names)
 | eval appindName=mvzip(appName, indicatorName)          //Zips up the two related fields
 | mvexpand appindName                                    //Expands the zipped multi-value field into one event per value
 | makemv appindName delim=","                            //Converts each zipped value back into a multi-value field
 | eval appName=mvindex(appindName, 0)                    //Assigns the first value to appName
 | eval indicatorName=mvindex(appindName, 1)              //Assigns the second value to indicatorName
 | top deviceInfo.deviceName appName indicatorName        //Top by deviceName, appName and indicatorName
 | fields - percent                                       //Removes the percent field from the table
 | where indicatorName="DETECTED_MALWARE_APP" OR indicatorName="DETECTED_SUSPECT_APP" OR indicatorName="DETECTED_PUP_APP" //Only shows rows that have one of the stated values as indicatorName
0 Karma

ZacEsa
Communicator

Unaccepting as answer as it seems indicatorName="DETECTED_SUSPECT_APP" is not showing if I select a wider time range. Not sure what the issue is.
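
One possible explanation, offered as an assumption rather than something confirmed in this thread: top keeps only the 10 most frequent combinations by default (limit=10), and the where filter in the search above runs after it. Over a wider time range, other indicatorName values can crowd DETECTED_SUSPECT_APP out of those 10 rows before the filter sees it. A minimal run-anywhere sketch that filters first and lifts the limit; the sample event and its values are made up for illustration:

```spl
| makeresults
| eval _raw="{\"deviceInfo\":{\"deviceName\":\"device1\"},\"threatInfo\":{\"indicators\":[{\"applicationName\":\"Setup.exe\",\"indicatorName\":\"RUN_SUSPECT_APP\"},{\"applicationName\":\"Setup.exe\",\"indicatorName\":\"DETECTED_SUSPECT_APP\"},{\"applicationName\":\"WinRAR.exe\",\"indicatorName\":\"MODIFY_PROCESS\"}]}}"
| spath
| rename "threatInfo.indicators{}.applicationName" as appName
| rename "threatInfo.indicators{}.indicatorName" as indicatorName
| eval appindName=mvzip(appName, indicatorName)
| mvexpand appindName
| eval appName=mvindex(split(appindName, ","), 0), indicatorName=mvindex(split(appindName, ","), 1)
| rename COMMENT AS "Filter BEFORE aggregating so unwanted indicators cannot use up the result rows"
| where indicatorName="DETECTED_MALWARE_APP" OR indicatorName="DETECTED_SUSPECT_APP" OR indicatorName="DETECTED_PUP_APP"
| rename COMMENT AS "limit=0 asks top to return every combination instead of the default 10"
| top limit=0 deviceInfo.deviceName appName indicatorName
| fields - percent
```

To try it on real data, replace the first three commands with the base search. If the default limit is indeed the cause, the missing rows should reappear regardless of the time range.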

0 Karma

woodcock
Esteemed Legend

I am not sure I understand but what I think you are saying is that you desire to dedup the appName field within each event, right?
Try this:

| makeresults 
| eval raw="deviceName=device1 appName=app1,app1,app1,app2,app2,app3::deviceName=device1 appName=app1,app2,app2::deviceName=device2 appName=app1,app2,app3"
| makemv delim="::" raw
| mvexpand raw
| rename raw AS _raw
| rex "^deviceName=(?<deviceName>\S+)\s+appName=(?<appName>\S+)$"
| fields - _*
| makemv delim="," appName

| rename COMMENT AS "Everything above generates sample event data; everything below is your solution"

| streamstats current=t window=1 values(appName) AS appName
| stats count by appName deviceName
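
For comparison, a sketch of the same per-event dedup using the eval function mvdedup instead of streamstats. It assumes appName is already a multivalue field on each event, as it is after the makemv step above. The sample data mirrors the three events from the question in a simplified "device apps" layout chosen for this example:

```spl
| makeresults
| eval raw="device1 app1,app1,app1,app2,app2,app3::device1 app1,app2,app2::device2 app1,app2,app3"
| makemv delim="::" raw
| mvexpand raw
| rex field=raw "^(?<deviceName>\S+)\s+(?<appName>\S+)$"
| makemv delim="," appName
| rename COMMENT AS "mvdedup drops repeated values inside one event; other events are untouched"
| eval appName=mvdedup(appName)
| stats count by deviceName appName
```

Because stats counts each value of a multivalue by-field separately, every appName is counted at most once per event, which should reproduce the "Table I want" counts from the question.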
0 Karma

ZacEsa
Communicator

Not exactly. Refer to the other answers, which have more info about the event. Each event has multiple appName fields, each of which comes with an indicatorName field. What I'm trying to do is get the appNames that have an indicatorName of "DETECTED_SUSPECT_APP". But right now, whichever way I put it, I'm getting all the appNames in the event, even those that don't have "DETECTED_SUSPECT_APP" as their indicatorName.

0 Karma

niketn
Legend

@ZacEsa, from the sample data in your question, your current output, and your expected output, it is not quite clear which stats you want to compute for the dedup. However, here is a run-anywhere search for one of the sample JSON events in your comment. See if this is what you need. If not, you can adjust the final stats to suit.

| makeresults
| eval _raw="{\"threatInfo\": { 
   \"incidentId\":  \"123ABC\", 
   \"indicators\": [ 
    { 
       \"applicationName\":  \"Setup.exe\", 
       \"indicatorName\":  \"RUN_SUSPECT_APP\", 
       \"sha256Hash\":  123456
    }, 
    { 
       \"applicationName\":  \"Setup.exe\", 
       \"indicatorName\":  \"DETECTED_SUSPECT_APP\", 
       \"sha256Hash\":  123456 
    }, 
    { 
       \"applicationName\":  \"WinRAR.exe\", 
       \"indicatorName\":  \"MODIFY_PROCESS\", 
       \"sha256Hash\":  654321 
    }, 
    { 
       \"applicationName\":  \"iexplore.exe\", 
       \"indicatorName\":  \"RUN_UNKNOWN_APP\", 
       \"sha256Hash\":  987654
    }, 
    { 
       \"applicationName\":  \"WinRAR.exe\", 
       \"indicatorName\":  \"UNKNOWN_APP\", 
       \"sha256Hash\":  654321 
    }, 
    {  
       \"applicationName\":  \"iexplore.exe\", 
       \"indicatorName\":  \"RUN_ANOTHER_APP\", 
       \"sha256Hash\":  987654 
    }, 
    { 
       \"applicationName\":  \"Setup.exe\", 
       \"indicatorName\":  \"POLICY_TERMINATE\", 
       \"sha256Hash\":  123456 
    } 
  ]
  }
}"
| spath
| rename "threatInfo.indicators{}.*" as "*"
| rename "threatInfo.*" as "*"
| eval data=mvzip(applicationName,mvzip(indicatorName,sha256Hash))
| table incidentId data
| mvexpand data
| eval data=split(data,",")
| eval applicationName=mvindex(data,0)
| eval indicatorName=mvindex(data,1)
| eval sha256Hash=mvindex(data,2)
| fields - data
| stats count(indicatorName) as Count values(indicatorName) as indicatorNames by incidentId applicationName
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

ZacEsa
Communicator

That got it much closer; however, it's still showing the other indicatorNames for Setup.exe. I used where indicatorNames="DETECTED_SUSPECT_APP" to remove the other applicationNames that do not have that indicatorName.

0 Karma

niketn
Legend

Sorry, I think I missed this requirement. In that case, though, I don't know whether you need the final stats command. Can you try the following:

| makeresults
| eval _raw="{\"threatInfo\": { 
   \"incidentId\":  \"123ABC\", 
   \"indicators\": [ 
    { 
       \"applicationName\":  \"Setup.exe\", 
       \"indicatorName\":  \"RUN_SUSPECT_APP\", 
       \"sha256Hash\":  123456
    }, 
    { 
       \"applicationName\":  \"Setup.exe\", 
       \"indicatorName\":  \"DETECTED_SUSPECT_APP\", 
       \"sha256Hash\":  123456 
    }, 
    { 
       \"applicationName\":  \"WinRAR.exe\", 
       \"indicatorName\":  \"MODIFY_PROCESS\", 
       \"sha256Hash\":  654321 
    }, 
    { 
       \"applicationName\":  \"iexplore.exe\", 
       \"indicatorName\":  \"RUN_UNKNOWN_APP\", 
       \"sha256Hash\":  987654
    }, 
    { 
       \"applicationName\":  \"WinRAR.exe\", 
       \"indicatorName\":  \"UNKNOWN_APP\", 
       \"sha256Hash\":  654321 
    }, 
    {  
       \"applicationName\":  \"iexplore.exe\", 
       \"indicatorName\":  \"RUN_ANOTHER_APP\", 
       \"sha256Hash\":  987654 
    }, 
    { 
       \"applicationName\":  \"Setup.exe\", 
       \"indicatorName\":  \"POLICY_TERMINATE\", 
       \"sha256Hash\":  123456 
    } 
  ]
  }
}"
| spath
| rename "threatInfo.indicators{}.*" as "*"
| rename "threatInfo.*" as "*"
| eval data=mvzip(applicationName,mvzip(indicatorName,sha256Hash))
| table incidentId data
| mvexpand data
| eval data=split(data,",")
| eval applicationName=mvindex(data,0)
| eval indicatorName=mvindex(data,1)
| eval sha256Hash=mvindex(data,2)
| fields - data
| search indicatorName="DETECTED_SUSPECT_APP"

Replace the commands before | spath with your base search and test.

0 Karma

ZacEsa
Communicator

This one removed the rest, but the event came out as two identical events instead. I managed to change it to produce a table, but that made it worse, as the count became 4.

 | spath
 | rename "threatInfo.indicators{}.*" as "*"
 | rename "threatInfo.*" as "*"
 | eval data=mvzip(applicationName,indicatorName)
 | top deviceInfo.deviceName data
 | mvexpand data
 | eval data=split(data,",")
 | eval applicationName=mvindex(data,0)
 | eval indicatorName=mvindex(data,1)
 | fields - data percent
 | search indicatorName="DETECTED_SUSPECT_APP"
 | table deviceInfo.deviceName applicationName indicatorName count
0 Karma

niketn
Legend

What do you need to count? Please give an example from this data. Was the count in my first query correct, except that you needed only one indicatorName?

0 Karma

HiroshiSatoh
Champion

Try this!

(your search)
|eval Event_number=1| accum Event_number
|rex field=appName max_match=0 "appName=\"(?<appName>[^\s]*)\""
|mvexpand appName
|dedup Event_number deviceName appName
|stats count by deviceName appName

I add a sequence number to each event to make it unique, so that dedup only removes duplicates within the same event.
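
The sequence number is what scopes dedup to a single event: a duplicate is removed only when its Event_number matches too. A small sketch of that idea with made-up data, using streamstats count as an assumed alternative to the eval/accum pair above:

```spl
| makeresults count=3
| rename COMMENT AS "Number the events 1..3 so each one has a unique key"
| streamstats count AS Event_number
| eval appName=split("app1,app1,app2", ",")
| mvexpand appName
| rename COMMENT AS "Without this dedup the final counts would be app1=6 and app2=3"
| dedup Event_number appName
| stats count by appName
```

With the dedup in place, each of the three events contributes one app1 and one app2, so both counts come out as 3.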

0 Karma

ZacEsa
Communicator

Thanks! It sort of works. Haha. Unfortunately, my event turns out to be more complicated. 😕

Below is an example of the event:

threatInfo: {
  incidentId:  123ABC
  indicators: [
   {
      applicationName:  Setup.exe
      indicatorName:  RUN_SUSPECT_APP
      sha256Hash:  123456
   }
   {
      applicationName:  Setup.exe
      indicatorName:  DETECTED_SUSPECT_APP
      sha256Hash:  123456
   }
   {
      applicationName:  WinRAR.exe
      indicatorName:  MODIFY_PROCESS
      sha256Hash:  654321
   }
   {
      applicationName:  iexplore.exe
      indicatorName:  RUN_UNKNOWN_APP
      sha256Hash:  987654
   }
   {
      applicationName:  WinRAR.exe
      indicatorName:  UNKNOWN_APP
      sha256Hash:  654321
   }
   {
      applicationName:  iexplore.exe
      indicatorName:  RUN_ANOTHER_APP
      sha256Hash:  987654
   }
   {
      applicationName:  Setup.exe
      indicatorName:  POLICY_TERMINATE
      sha256Hash:  123456
   }
 ]
}

Basically, I only want to show the applicationName values that have DETECTED_SUSPECT_APP as their indicatorName.

0 Karma

HiroshiSatoh
Champion

Please extract the field with rex in the same way.

0 Karma

ZacEsa
Communicator

Doesn't seem to work. Each applicationName keeps taking the other indicatorNames too.
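
A likely reason is that applicationName and indicatorName are extracted as two independent multivalue fields, so expanding or filtering one of them does not touch the other. A sketch that zips them together first, keeps only the DETECTED_SUSPECT_APP pairs, and dedups within the event. The "::" delimiter, the field names pair and hit, and the trimmed sample event (with a deliberately repeated indicator) are illustrative assumptions:

```spl
| makeresults
| eval _raw="{\"threatInfo\":{\"indicators\":[{\"applicationName\":\"Setup.exe\",\"indicatorName\":\"RUN_SUSPECT_APP\"},{\"applicationName\":\"Setup.exe\",\"indicatorName\":\"DETECTED_SUSPECT_APP\"},{\"applicationName\":\"WinRAR.exe\",\"indicatorName\":\"MODIFY_PROCESS\"},{\"applicationName\":\"Setup.exe\",\"indicatorName\":\"DETECTED_SUSPECT_APP\"}]}}"
| spath
| rename "threatInfo.indicators{}.*" as "*"
| rename COMMENT AS "Zip by position so every application stays attached to its own indicator"
| eval pair=mvzip(applicationName, indicatorName, "::")
| rename COMMENT AS "Keep only the wanted pairs, then drop repeats inside this one event"
| eval hit=mvdedup(mvfilter(match(pair, "::DETECTED_SUSPECT_APP$")))
| where isnotnull(hit)
| mvexpand hit
| eval applicationName=mvindex(split(hit, "::"), 0)
| stats count by applicationName
```

For this sample the result should be a single row, Setup.exe with a count of 1, even though the indicator appears twice. Note that mvzip pairs values purely by position, so the sketch assumes every entry in indicators carries both fields.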

0 Karma