Splunk Search

How to filter a field by common words

parkz
Explorer

I have a field of titles containing sentences that explain why a test failed in a security audit, but they are recorded per asset. So two different assets can fail for the same reason while describing it in different words. For example, one might say "Login Password is empty" and another asset's failure might say "Login password did not meet requirements". If I could aggregate them on shared words like "password", I would get more value from the data. I can't hardcode the groups because I don't know all the possible ones in advance.
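One way to avoid hardcoding is to let the data pick the keywords. Count how many titles each word appears in, and treat any word shared by at least two titles as a group key. The Python sketch below shows the idea outside Splunk. The stop-word list, the `min_titles` cutoff, the helper names, and the sample titles are illustrative assumptions, not part of the original data.

```python
import re
from collections import Counter, defaultdict

# Illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "did", "not", "to", "of", "and", "or"}


def tokenize(title: str) -> set[str]:
    """Lowercase a title and return its distinct words, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOP_WORDS}


def group_by_shared_words(titles: list[str], min_titles: int = 2) -> dict[str, list[str]]:
    """Group titles under every word that occurs in at least `min_titles` titles.

    The group keys are discovered from the data rather than hardcoded.
    A title can appear in more than one group.
    """
    token_sets = [tokenize(t) for t in titles]
    # Document frequency: how many titles contain each word.
    doc_freq = Counter(w for tokens in token_sets for w in tokens)
    groups: dict[str, list[str]] = defaultdict(list)
    for title, tokens in zip(titles, token_sets):
        for word in sorted(tokens):
            if doc_freq[word] >= min_titles:
                groups[word].append(title)
    return dict(groups)


# Hypothetical sample titles.
titles = [
    "Login Password is empty",
    "Login password did not meet requirements",
    "Audit logging is disabled",
    "Audit logging is not configured",
    "Password history must be enforced",
]

for word, members in sorted(group_by_shared_words(titles).items()):
    print(f"{word}: {members}")
```

Here "Password history must be enforced" lands in the `password` group alongside the two login titles, and a title can belong to several groups at once. No stemming is done, so variants such as "log" and "logging" would count as different words.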

Here is what I have so far, and I'm open to any feedback:

earliest=-1d@d latest=@d index=cdb_summary sourcetype=cfg_summary source=CDM_*_Daily_Summary
| search hva=*
| eval FailedSTIGs=mvsort(split(FailedSTIGs,","))
| stats values(fismaid) as fismaid dc(asset_id) as Affected by FailedSTIGs,hva
| lookup DHS_Expected_Checks "STIG ID" as FailedSTIGs output "Rule Title"
| fit TFIDF "Rule Title" as rule_tfidf ngram_range=1-12 max_df=0.6 min_df=0.2 stop_words=english
| fit KMeans rule_tfidf* k=8
| fields cluster "Rule Title"
| sample 6 by cluster
| sort 0 cluster
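To see what the `max_df`, `min_df`, and `stop_words` options in the `fit TFIDF` step do, here is a simplified single-word TF-IDF in plain Python. It is loosely modeled on scikit-learn's `TfidfVectorizer` defaults (smoothed IDF, L2-normalized rows), which, as far as I understand, MLTK's TFIDF is built on; exact MLTK output may differ. As in scikit-learn, fractional `min_df`/`max_df` values are treated as proportions of documents. The function name, stop-word list, and sample titles are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption).
STOP_WORDS = {"a", "an", "the", "is", "are", "did", "not", "to", "of", "and", "or"}


def tokenize(text):
    """Lowercase text and return its words (with repeats), minus stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def tfidf(docs, min_df=0.0, max_df=1.0):
    """Unigram TF-IDF with proportional document-frequency filtering.

    Loosely modeled on scikit-learn's TfidfVectorizer defaults:
    idf = ln((1 + n) / (1 + df)) + 1, then each row is L2-normalized.
    A word is kept only if min_df * n <= df <= max_df * n, where df is the
    number of documents containing it and n is the number of documents.
    Returns (vocabulary, one vector per document).
    """
    n = len(docs)
    tokenized = [tokenize(d) for d in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    vocab = sorted(w for w, c in df.items() if min_df * n <= c <= max_df * n)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        row = [tf[w] * (math.log((1 + n) / (1 + df[w])) + 1) for w in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # avoid dividing an all-zero row by 0
        vectors.append([x / norm for x in row])
    return vocab, vectors


# Hypothetical rule titles.
docs = [
    "Login Password is empty",
    "Login password did not meet requirements",
    "Audit logging is disabled",
    "Audit logging is not configured",
    "Password history must be enforced",
]

print(len(tfidf(docs)[0]))  # 13: with no filtering, every non-stop word is a feature
vocab, vectors = tfidf(docs, min_df=0.4, max_df=0.5)
print(vocab)                # ['audit', 'logging', 'login']
print(vectors[4])           # [0.0, 0.0, 0.0]: the last title keeps no features
```

With `min_df=0.4` and `max_df=0.5` on five titles, "password" (present in three of them) exceeds the `max_df` cap and is dropped, and the last title becomes an all-zero vector that carries no word information into the clustering step. The query's `max_df=0.6 min_df=0.2` works the same way on real data: a word found in more than 60% or fewer than 20% of the rule titles is discarded, so it may be worth checking which terms survive before tuning `k`. This sketch uses single words only; `ngram_range=1-12` also creates phrase features up to 12 words long.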

yuanliu
SplunkTrust

This is similar to the varied logs from different applications that share common business and technology domains, just more "freehand". We tried to "encourage" standardization, but that only went so far; I still couldn't predict what the developers would throw at me. I had to tune my aggregation strategies manually and update them from time to time.

Ideally, you would have a natural language model to deal with them. Failing that, you can use ML to cluster the titles and start tuning from there. Either way, expect to keep adjusting the groupings over time.
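If choosing the number of clusters up front (as `fit KMeans ... k=8` requires) is awkward, a simpler alternative to try is threshold-based grouping: put two titles together when the overlap of their word sets (Jaccard similarity) is high enough. This is a swapped-in technique, not what MLTK's KMeans does. The sketch below is a minimal stdlib Python version; the 0.3 threshold, helper names, and sample titles are assumptions.

```python
import re

# Illustrative stop-word list (an assumption).
STOP_WORDS = {"a", "an", "the", "is", "are", "did", "not", "to", "of", "and", "or"}


def words(title):
    """Lowercase a title and return its distinct words, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOP_WORDS}


def jaccard(a, b):
    """Fraction of distinct words two sets share: |a & b| / |a | b| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def threshold_cluster(titles, threshold=0.3):
    """Greedy single-pass clustering by Jaccard similarity.

    Each cluster is represented by the word set of its first title (its seed).
    A title joins the first cluster whose seed is at least `threshold` similar;
    otherwise it starts a new cluster. Results depend on input order.
    """
    seeds = []      # word set of each cluster's first title
    clusters = []   # titles in each cluster, in input order
    for title in titles:
        ws = words(title)
        for seed, members in zip(seeds, clusters):
            if jaccard(ws, seed) >= threshold:
                members.append(title)
                break
        else:  # no seed was similar enough
            seeds.append(ws)
            clusters.append([title])
    return clusters


# Hypothetical rule titles.
titles = [
    "Login Password is empty",
    "Login password did not meet requirements",
    "Audit logging is disabled",
    "Audit logging is not configured",
    "Password history must be enforced",
]

for i, members in enumerate(threshold_cluster(titles)):
    print(i, members)
```

Here the two login titles (similarity 0.4) and the two audit-logging titles (0.5) each form a cluster, while "Password history must be enforced" stays on its own because it shares only one of seven distinct words with the login seed. With `threshold=0.1` it would join the login cluster instead. Lowering the threshold merges more titles at the cost of looser groups, which is the sort of ongoing tuning described above.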
