Splunk Search

Reduce a list of possible names to the most common values?

david_rundle_fi
Explorer

I have a list of antivirus vendors and the malware name each one reports, each in its own field from spath JSON output. Is there a way to parse the list and extract the most common names?

For example:

Gen:Variant.Barys.10219         Gen:Variant.Barys.10219 null    Trojan.Disfa!6vVRDVDpUts    Win-Trojan/Zbot.24064           Trojan/MSIL.Disfa       MSIL:GenMalicious-AV [Trj]  BDS/Bladabindi.ajoqp        Gen:Variant.Barys.10219 W32.FipaletAAK.Trojan   null    Backdoor.Bladabindi.AL3 null    null        Backdoor.MSIL.Bladabindi.A  null    BackDoor.Bladabindi.1056        Gen:Variant.Barys.10219 (B) W32/MSIL_Bladabindi.G.gen!Eldorado  Gen:Variant.Barys.10219     Gen:Variant.Barys.10219     null    Trojan ( 700000121 )    Trojan ( 700000121 )    Trojan.MSIL.Disfa.bop   Win32.Troj.Undef.(kcloud)   Backdoor.Bladabindi.Gen BehavesLike.Win32.BackdoorNJRat.mm  BackDoor-NJRat!EC56BB70A034 Gen:Variant.Barys.10219 Backdoor:MSIL/Bladabindi.AJ Trojan.Win32.DownLoader11.cxfbrl    Bladabindi.JQ           Trojan.Agent/Gen-Bladabindi Troj/DotNet-P   Backdoor.Ratenjay   null    null    Win32/DotNetDl.A!generic    BKDR_BLBINDI.SMN    BKDR_BLBINDI.SMN        Backdoor.MSIL.Bladabindi.a (v)  null    Trojan.Disfa.Win32.10565        Trojan/W32.Agent.24064.TS

I would like to reduce this to the most common names:

  1. Bladabindi
  2. Barys
  3. NJRat

lguinn2
Legend

If these are actually field names and not values, you could do this:

yoursearchhere
| fieldsummary 
| rename field as malware
| search malware!="date_*" AND malware!="source" AND malware!="host" AND 
     malware!="sourcetype" AND malware!="index" AND malware!="linecount" AND 
     malware!="splunk_server" AND malware!="timeendpos" AND malware!="timestartpos"
| fields malware count | sort -count

The search command in the middle filters out the fields that you don't want. The list might not be complete, but I included the typical default fields. If your list of fields is very long, you might use a lookup table instead of the search.

masonmorales
Influencer

Each name is in its own field? That doesn't seem desirable for your use case.

If it's just a list of names separated by white space, I think you should extract them all into one field called "name" first. Then, we can look at the list and help you rex the name field into the common names (Bladabindi, Barys, etc.). After the data is normalized into common names, we can tell Splunk to look for non-rare values.

What happens if you do something like: yoursearch | rex "(?<name>\S+)\s+" max_match=0
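
The three steps described above (collect everything into name, normalize, then count) could be sketched roughly as follows. This is only an illustration and assumes the names really are whitespace-separated in the raw event. The token field, the four-letter minimum, and the list of generic words to discard are all made-up assumptions that would need tuning against real data.

```spl
yoursearch
| rex max_match=0 "(?<name>\S+)"
| mvexpand name
| where name!="null"
| rex field=name max_match=0 "(?<token>[A-Za-z]{4,})"
| mvexpand token
| eval token=lower(token)
| where NOT match(token, "^(trojan|backdoor|variant|msil|generic|agent)$")
| top limit=10 token
```

The second rex splits each detection name into alphabetic runs of four or more letters, so a family name that several vendors embed in differently formatted strings is counted under one lowercase token. The match filter is a crude stop-word list for generic terms and will likely need more entries.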

david_rundle_fi
Explorer

Unfortunately, these come in JSON with individual field names:

scans.ALYac.result scans.AVG.result scans.AVware.result scans.Ad-Aware.result scans.AegisLab.result scans.Agnitum.result scans.AhnLab-V3.result scans.Alibaba.result scans.AntiVir.result scans.Antiy-AVL.result scans.Arcabit.result scans.Avast.result scans.Avira.result scans.Baidu-International.result scans.BitDefender.result scans.Bkav.result scans.ByteHero.result scans.CAT-QuickHeal.result scans.CMC.result scans.ClamAV.result scans.Commtouch.result scans.Comodo.result scans.Cyren.result scans.DrWeb.result scans.ESET-NOD32.result scans.Emsisoft.result scans.F-Prot.result scans.F-Secure.result scans.Fortinet.result scans.GData.result scans.Ikarus.result scans.Jiangmin.result scans.K7AntiVirus.result scans.K7GW.result scans.Kaspersky.result scans.Kingsoft.result scans.Malwarebytes.result scans.McAfee-GW-Edition.result scans.McAfee.result scans.MicroWorld-eScan.result scans.Microsoft.result scans.NANO-Antivirus.result scans.Norman.result scans.PCTools.result scans.Rising.result scans.SUPERAntiSpyware.result scans.Sophos.result scans.Symantec.result scans.Tencent.result scans.TheHacker.result scans.TotalDefense.result scans.TrendMicro-HouseCall.result scans.TrendMicro.result scans.VBA32.result scans.VIPRE.result scans.ViRobot.result scans.Zillya.result scans.Zoner.result scans.nProtect.result
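
With field names shaped like this, one possible way to get everything into a single field is foreach with a wildcard. The following is a sketch only, assuming the scans.*.result fields are already extracted on each event, that no other field called name exists, and that missing detections arrive as the literal string "null" as in the sample above.

```spl
yoursearchhere
| foreach scans.*.result [ eval name=mvappend(name, '<<FIELD>>') ]
| mvexpand name
| where name!="null"
| top limit=20 name
```

The single quotes around <<FIELD>> are there because several of the field names contain dots and hyphens. This counts the full detection strings, so the names would still need normalizing into family names, as suggested earlier in the thread, before the counts become meaningful.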
