Splunk Search

How to filter a field by common words

parkz
Explorer

I have a field of titles, each a sentence describing why a test failed in a security audit. The titles are recorded per asset, so two assets can fail for the same underlying reason but describe it in different words. For example, one might say "Login Password is empty" and another might say "Login password did not meet requirements". If I could aggregate them on shared words like "password", I could get more value from the data. I can't hardcode the words because I don't know all the possible groupings in advance.

Here is what I have so far, and I'm open to any feedback:

earliest=-1d@d latest=@d index=cdb_summary sourcetype=cfg_summary source=CDM_*_Daily_Summary
| search hva=*
| eval FailedSTIGs=mvsort(split(FailedSTIGs,","))
| stats values(fismaid) as fismaid dc(asset_id) as Affected by FailedSTIGs,hva
| lookup DHS_Expected_Checks "STIG ID" as FailedSTIGs output "Rule Title"
| fit TFIDF "Rule Title" as rule_tfidf ngram_range=1-12 max_df=0.6 min_df=0.2 stop_words=english
| fit KMeans rule_tfidf* k=8
| fields cluster "Rule Title"
| sample 6 by cluster
| sort by cluster


yuanliu
SplunkTrust

This is similar to dealing with varied logs from different applications that share common business and technology domains, just more "freehand". We tried to "encourage" standardization, but that only went so far: I still couldn't predict what the developers would throw at me. I had to tune my aggregation strategies manually and update them from time to time.

Ideally, you'd have a natural language model to deal with them. Failing that, you can use ML to do some clustering and start tuning from there. Either way, this is going to be an ongoing, dynamic process rather than a one-time setup.
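As a baseline to tune from, here is a minimal sketch in Python (outside Splunk, standard library only) of a much simpler stand-in for clustering: group titles under any keyword that appears in at least two of them. This is plain document-frequency grouping, not TF-IDF or KMeans. The stop-word list, the function names, and the third sample title are made up for illustration; only the two "Login password" titles come from the question.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; extend it for your own data.
STOP_WORDS = {"a", "an", "and", "did", "is", "must", "not", "of", "the", "to"}


def tokenize(title):
    """Lowercase a title and return its set of non-stop-word tokens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w for w in words if w not in STOP_WORDS}


def group_by_shared_keyword(titles, min_titles=2):
    """Map each keyword found in at least min_titles titles to those titles."""
    # Count how many distinct titles contain each token.
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(tokenize(title))

    # Attach each title to every keyword it shares with enough other titles.
    groups = defaultdict(list)
    for title in titles:
        for word in tokenize(title):
            if doc_freq[word] >= min_titles:
                groups[word].append(title)
    return dict(groups)


titles = [
    "Login Password is empty",
    "Login password did not meet requirements",
    "Audit log retention is too short",  # made-up example title
]

groups = group_by_shared_keyword(titles)
for word in sorted(groups):
    print(word, groups[word])
```

A title can land under several keywords (here both "login" and "password"), so this is an overlapping grouping rather than a partition. Raising `min_titles` or growing the stop-word list is the kind of manual tuning described above.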
