Splunk Search

Reduce a list of possible names to the most common values?


I have a list of malware vendors and associated malware names, each in its own field from spath JSON output. Is there a way to parse and extract the most common names from the list?

For example, given these values:

Gen:Variant.Barys.10219         Gen:Variant.Barys.10219 null    Trojan.Disfa!6vVRDVDpUts    Win-Trojan/Zbot.24064           Trojan/MSIL.Disfa       MSIL:GenMalicious-AV [Trj]  BDS/Bladabindi.ajoqp        Gen:Variant.Barys.10219 W32.FipaletAAK.Trojan   null    Backdoor.Bladabindi.AL3 null    null        Backdoor.MSIL.Bladabindi.A  null    BackDoor.Bladabindi.1056        Gen:Variant.Barys.10219 (B) W32/MSIL_Bladabindi.G.gen!Eldorado  Gen:Variant.Barys.10219     Gen:Variant.Barys.10219     null    Trojan ( 700000121 )    Trojan ( 700000121 )    Trojan.MSIL.Disfa.bop   Win32.Troj.Undef.(kcloud)   Backdoor.Bladabindi.Gen BehavesLike.Win32.BackdoorNJRat.mm  BackDoor-NJRat!EC56BB70A034 Gen:Variant.Barys.10219 Backdoor:MSIL/Bladabindi.AJ Trojan.Win32.DownLoader11.cxfbrl    Bladabindi.JQ           Trojan.Agent/Gen-Bladabindi Troj/DotNet-P   Backdoor.Ratenjay   null    null    Win32/DotNetDl.A!generic    BKDR_BLBINDI.SMN    BKDR_BLBINDI.SMN        Backdoor.MSIL.Bladabindi.a (v)  null    Trojan.Disfa.Win32.10565        Trojan/W32.Agent.24064.TS
I would like to reduce this to the most common names:

  1. Bladabindi
  2. Barys
  3. NJRat


If these are actually field names and not values, you could do this:

| fieldsummary 
| rename field as malware
| search malware!="date_*" AND malware!="source" AND malware!="host" AND 
     malware!="sourcetype" AND malware!="index" AND malware!="linecount" AND 
     malware!="splunk_server" AND malware!="timeendpos" AND malware!="timestartpos"
| fields malware count | sort -count

The search in the middle removes all the fields that you don't want - it might not be a complete list, but I included the typical default fields. If your list of fields is very long, you might use a lookup table instead of the search.
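
A minimal sketch of the lookup variant, assuming a hypothetical lookup file named ignored_fields.csv with a single column called malware that lists the field names to leave out (wildcard entries such as date_* included). Both the file name and the column name are made up for illustration:

```spl
| fieldsummary
| rename field as malware
| search NOT [| inputlookup ignored_fields.csv | fields malware]
| fields malware count
| sort - count
```

The subsearch expands to a series of malware="..." terms joined with OR, so the NOT in front of it drops every field named in the lookup. Keeping the list in a lookup means you can edit it without touching the search.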



Each name is in its own field? That doesn't seem desirable for your use case.

If it's just a list of names separated by white space, I think you should extract them all into one field called "name" first. Then we can look at the list and help you rex the name field into the common names (Bladabindi, Barys, etc.). Once the data is normalized into common names, we can tell Splunk to look for the non-rare values.

What happens if you do something like: yoursearch | rex "(?<name>\S+)\s+" max_match=0
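
If that extraction works, the normalize-then-count idea could look roughly like the sketch below. It is a heuristic, not a definitive method: it assumes the capture group is named name, treats every run of five or more letters as a candidate family name, and filters out generic words with a stop list. The token field and the stop list entries are illustrative and would need tuning against your data:

```spl
yoursearch
| rex "(?<name>\S+)\s+" max_match=0
| mvexpand name
| rex field=name "(?<token>[A-Za-z]{5,})" max_match=0
| mvexpand token
| eval token=lower(token)
| search NOT token IN ("trojan", "backdoor", "variant", "generic", "agent", "malicious", "behaveslike", "downloader", "dotnet")
| stats count by token
| sort - count
```

Lowercasing before the stats merges spellings such as BackDoor and Backdoor. Names that vendors glue together or abbreviate (for example BackdoorNJRat or BLBINDI) will not be merged by this alone and would need their own rex or eval replace rules.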



Unfortunately, these come in JSON with individual field names:

scans.ALYac.result scans.AVG.result scans.AVware.result scans.Ad-Aware.result scans.AegisLab.result scans.Agnitum.result scans.AhnLab-V3.result scans.Alibaba.result scans.AntiVir.result scans.Antiy-AVL.result scans.Arcabit.result scans.Avast.result scans.Avira.result scans.Baidu-International.result scans.BitDefender.result scans.Bkav.result scans.ByteHero.result scans.CAT-QuickHeal.result scans.CMC.result scans.ClamAV.result scans.Commtouch.result scans.Comodo.result scans.Cyren.result scans.DrWeb.result scans.ESET-NOD32.result scans.Emsisoft.result scans.F-Prot.result scans.F-Secure.result scans.Fortinet.result scans.GData.result scans.Ikarus.result scans.Jiangmin.result scans.K7AntiVirus.result scans.K7GW.result scans.Kaspersky.result scans.Kingsoft.result scans.Malwarebytes.result scans.McAfee-GW-Edition.result scans.McAfee.result scans.MicroWorld-eScan.result scans.Microsoft.result scans.NANO-Antivirus.result scans.Norman.result scans.PCTools.result scans.Rising.result scans.SUPERAntiSpyware.result scans.Sophos.result scans.Symantec.result scans.Tencent.result scans.TheHacker.result scans.TotalDefense.result scans.TrendMicro-HouseCall.result scans.TrendMicro.result scans.VBA32.result scans.VIPRE.result scans.ViRobot.result scans.Zillya.result scans.Zoner.result scans.nProtect.result
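
Since every vendor result lands in its own scans.*.result field, one possible way to get them into a single field is foreach with a wildcard. This is a sketch under a few assumptions: spath has already produced (or will produce) the fields listed above, the combined field is called name (an arbitrary choice), and JSON nulls show up as the literal string null, as they appear to in the sample values:

```spl
yoursearch
| spath
| foreach scans.*.result
    [ eval name=mvappend(name, '<<FIELD>>') ]
| mvexpand name
| where name!="null"
| stats count by name
| sort - count
```

The single quotes around <<FIELD>> matter here, because the field names contain dots and hyphens that eval would otherwise misread. This counts the raw detection strings as they are; reducing them to family names would be a further step on the name field.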
