Splunk Search

Add stop words to the TFIDF command in Splunk

parkz
Explorer

I have the following search:

earliest=-1d@d latest=@d index=cdb_summary sourcetype=cfg_summary source=CDM_*_Daily_Summary
| search hva=*
| eval FailedSTIGs=mvsort(split(FailedSTIGs,","))
| stats values(fismaid) as fismaid dc(asset_id) as Affected by FailedSTIGs,hva
| lookup DHS_Expected_Checks "STIG ID" as FailedSTIGs output "Rule Title"
| fit TFIDF "Rule Title" as rule_tfidf ngram_range=1-12 max_df=0.8 min_df=0.2 stop_words=english
| fit KMeans rule_tfidf* k=8
| stats values(FailedSTIGs), values("Rule Title") by cluster

How can I add my own stop words to the stop_words argument? In Python I would write the following:

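from sklearn.feature_extraction import text

my_additional_stop_words = ["windows"]  # example extra terms; lowercase, since the vectorizer lowercases text by default
stop_words = text.ENGLISH_STOP_WORDS.union(my_additional_stop_words)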

Obviously I can't use Python here, and I am not familiar enough with Splunk searches to know whether the english keyword can be extended in a similar way so that it also takes additional words such as "Windows".
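
To make the desired effect concrete, here is a minimal standard-library Python sketch of what extending a stop word list does to a rule title before any TF-IDF weighting happens. Everything in it is an assumption for illustration: BASE_STOP_WORDS is a tiny hypothetical stand-in (not the real English list), remove_stop_words is a made-up helper, and the sample title is invented. It says nothing about what the stop_words argument of fit TFIDF accepts.

```python
import re

# Hypothetical stand-in for a built-in English stop word list (NOT the real list).
BASE_STOP_WORDS = frozenset({"the", "a", "an", "of", "to", "must", "be"})


def remove_stop_words(title, extra_stop_words=()):
    """Lowercase the title, split it into word tokens, and drop every token
    found in the base list or in the caller-supplied extra list."""
    stop_words = BASE_STOP_WORDS.union(w.lower() for w in extra_stop_words)
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(t for t in tokens if t not in stop_words)


# Invented sample title; "Windows" is the extra stop word from the question.
cleaned = remove_stop_words(
    "Windows 10 must be configured to audit logon events",
    extra_stop_words=["Windows"],
)
print(cleaned)
```

The extra words are lowercased before the comparison so that "Windows" and "windows" are treated the same, which mirrors the case-insensitive matching a vectorizer applies when it lowercases its input.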
