I have a list of malware vendors and associated malware names, each in its own field from spath JSON output. Is there a way to parse and extract the most common names from the list?
For example:
Gen:Variant.Barys.10219 Gen:Variant.Barys.10219 null Trojan.Disfa!6vVRDVDpUts Win-Trojan/Zbot.24064 Trojan/MSIL.Disfa MSIL:GenMalicious-AV [Trj] BDS/Bladabindi.ajoqp Gen:Variant.Barys.10219 W32.FipaletAAK.Trojan null Backdoor.Bladabindi.AL3 null null Backdoor.MSIL.Bladabindi.A null BackDoor.Bladabindi.1056 Gen:Variant.Barys.10219 (B) W32/MSIL_Bladabindi.G.gen!Eldorado Gen:Variant.Barys.10219 Gen:Variant.Barys.10219 null Trojan ( 700000121 ) Trojan ( 700000121 ) Trojan.MSIL.Disfa.bop Win32.Troj.Undef.(kcloud) Backdoor.Bladabindi.Gen BehavesLike.Win32.BackdoorNJRat.mm BackDoor-NJRat!EC56BB70A034 Gen:Variant.Barys.10219 Backdoor:MSIL/Bladabindi.AJ Trojan.Win32.DownLoader11.cxfbrl Bladabindi.JQ Trojan.Agent/Gen-Bladabindi Troj/DotNet-P Backdoor.Ratenjay null null Win32/DotNetDl.A!generic BKDR_BLBINDI.SMN BKDR_BLBINDI.SMN Backdoor.MSIL.Bladabindi.a (v) null Trojan.Disfa.Win32.10565 Trojan/W32.Agent.24064.TS
If these are actually field names and not values, you could do this:
yoursearchhere
| fieldsummary
| rename field as malware
| search malware!="date_*" AND malware!="source" AND malware!="host" AND
malware!="sourcetype" AND malware!="index" AND malware!="linecount" AND
malware!="splunk_server" AND malware!="timeendpos" AND malware!="timestartpos"
| fields malware count | sort -count
The search command in the middle removes all the fields that you don't want. It might not be a complete list, but I included the typical default fields. If your list of fields is very long, you might use a lookup table instead of the search.
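As a rough sketch of the lookup variant: the lookup file name ignored_fields.csv and its single column named malware are hypothetical, so you would need to create them yourself with one unwanted field name (or wildcard pattern such as date_*) per row.

```
yoursearchhere
| fieldsummary
| rename field as malware
| search NOT [| inputlookup ignored_fields.csv | fields malware]
| fields malware count
| sort -count
```

The subsearch expands into a series of malware="..." terms joined by OR, so the NOT drops every field listed in the lookup.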
Each name is in its own field? That doesn't seem desirable for your use case.
If it's just a list of names separated by white space, I think you should extract them all into one field called "name" first. Then, we can look at the list and help you rex the name field into the common names (Bladabindi, Barys, etc.). After the data is normalized into common names, then we can tell Splunk to look for non-rare values.
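A minimal sketch of what that normalization could look like, assuming a multivalue field called name already exists. The family patterns below are guesses based only on the sample in the question, not a vetted mapping, and the field name family is made up for illustration.

```
yoursearch
| mvexpand name
| search name!="null"
| eval family=case(
    match(name, "(?i)bladabindi|blbindi"), "Bladabindi",
    match(name, "(?i)barys"), "Barys",
    match(name, "(?i)disfa"), "Disfa",
    match(name, "(?i)njrat"), "NJRat",
    true(), "other")
| top family
```

Anything that lands in "other" shows which patterns still need to be added.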
What happens if you do something like: yoursearch | rex "(?<name>\S+)\s+" max_match=0
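If that extraction works, one possible follow-up is to expand the values and count them. This sketch assumes the names sit in _raw separated by white space and that the literal string null marks a missing result. The trailing \s+ is left out here so the last token in the event also matches.

```
yoursearch
| rex max_match=0 "(?<name>\S+)"
| mvexpand name
| search name!="null"
| top limit=20 name
```

Note that names containing spaces, such as Trojan ( 700000121 ), will be split into several tokens by this pattern.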
Unfortunately, these come in JSON with individual field names:
scans.ALYac.result scans.AVG.result scans.AVware.result scans.Ad-Aware.result scans.AegisLab.result scans.Agnitum.result scans.AhnLab-V3.result scans.Alibaba.result scans.AntiVir.result scans.Antiy-AVL.result scans.Arcabit.result scans.Avast.result scans.Avira.result scans.Baidu-International.result scans.BitDefender.result scans.Bkav.result scans.ByteHero.result scans.CAT-QuickHeal.result scans.CMC.result scans.ClamAV.result scans.Commtouch.result scans.Comodo.result scans.Cyren.result scans.DrWeb.result scans.ESET-NOD32.result scans.Emsisoft.result scans.F-Prot.result scans.F-Secure.result scans.Fortinet.result scans.GData.result scans.Ikarus.result scans.Jiangmin.result scans.K7AntiVirus.result scans.K7GW.result scans.Kaspersky.result scans.Kingsoft.result scans.Malwarebytes.result scans.McAfee-GW-Edition.result scans.McAfee.result scans.MicroWorld-eScan.result scans.Microsoft.result scans.NANO-Antivirus.result scans.Norman.result scans.PCTools.result scans.Rising.result scans.SUPERAntiSpyware.result scans.Sophos.result scans.Symantec.result scans.Tencent.result scans.TheHacker.result scans.TotalDefense.result scans.TrendMicro-HouseCall.result scans.TrendMicro.result scans.VBA32.result scans.VIPRE.result scans.ViRobot.result scans.Zillya.result scans.Zoner.result scans.nProtect.result
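With field names of that shape, one approach that may work is to gather every scans.*.result value into a single multivalue field with foreach and then count the values. This is only a sketch: the field name malware is made up here, and it assumes spath has already produced the fields listed above and that missing results show up as the string null.

```
yoursearch
| spath
| foreach scans.*.result
    [ eval malware=mvappend(malware, '<<FIELD>>') ]
| mvexpand malware
| search malware!="null"
| stats count by malware
| sort -count
```

The single quotes around <<FIELD>> matter because the field names contain dots and hyphens.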