Splunk Search

How to filter a field by common words

parkz
Explorer

I have a field of titles, each a sentence explaining why a test failed in a security audit, but the titles are recorded separately for each asset. So two different assets can have the same failure reason worded differently. For example, one might say "Login Password is empty" and another might say "Login password did not meet requirements". If I could aggregate them on words like "password", I could get more value from the data. I can't hardcode the keywords because I don't know all the possible groupings.

Here is what I have so far, and I'm open to any feedback:

earliest=-1d@d latest=@d index=cdb_summary sourcetype=cfg_summary source=CDM_*_Daily_Summary
| search hva=*
| eval FailedSTIGs=mvsort(split(FailedSTIGs,","))
| stats values(fismaid) as fismaid dc(asset_id) as Affected by FailedSTIGs,hva
| lookup DHS_Expected_Checks "STIG ID" as FailedSTIGs output "Rule Title"
| fit TFIDF "Rule Title" as rule_tfidf ngram_range=1-12 max_df=0.6 min_df=0.2 stop_words=english
| fit KMeans rule_tfidf* k=8
| fields cluster "Rule Title"
| sample 6 by cluster
| sort by cluster


yuanliu
SplunkTrust

This is similar to varied logs from different applications that share common business and technology domains, just more "freehand". We tried to "encourage" standardization, but that only went so far; I still couldn't predict what the developers would throw at me. I had to tune my aggregation strategies manually and update them from time to time.

Ideally, you'll have a natural language model to deal with them. Failing that, you can use ML to do some clustering and start tuning from there. Either way, the groupings will keep shifting as new titles appear, so expect to revisit them.
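
Before tuning any clustering, it can help to see which words the titles actually share. The following is only a rough sketch, not a tested solution: it assumes the same "Rule Title" and FailedSTIGs fields produced by the search in the question, the leading "..." stands for that search up to and including the lookup, and the names word, rules and titles, as well as the length cutoff of 3, are arbitrary choices for illustration.

```
... | lookup DHS_Expected_Checks "STIG ID" as FailedSTIGs output "Rule Title"
| eval word=split(replace(lower('Rule Title'), "[^a-z0-9 ]", " "), " ")
| mvexpand word
| where len(word) > 3
| stats dc(FailedSTIGs) as rules values("Rule Title") as titles by word
| where rules > 1
| sort - rules
```

Each row is one word with the number of distinct STIG IDs whose title contains it. Words near the top, such as "password" in the example above, are candidates for aggregation keys. The length filter is a crude stand-in for a stop-word list, so very common long words will still appear and need to be excluded by hand.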
