I have the following search:
earliest=-1d@d latest=@d index=cdb_summary sourcetype=cfg_summary source=CDM_*_Daily_Summary
| search hva=*
| eval FailedSTIGs=mvsort(split(FailedSTIGs,","))
| stats values(fismaid) as fismaid dc(asset_id) as Affected by FailedSTIGs,hva
| lookup DHS_Expected_Checks "STIG ID" as FailedSTIGs output "Rule Title"
| fit TFIDF "Rule Title" as rule_tfidf ngram_range=1-12 max_df=0.8 min_df=0.2 stop_words=english
| fit KMeans rule_tfidf* k=8
| stats values(FailedSTIGs), values("Rule Title") by cluster
How can I add my own words to the stop_words argument of fit TFIDF? In Python I would write the following:
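from sklearn.feature_extraction import text
my_additional_stop_words = ["windows"]  # extra words to ignore (lowercase, since TfidfVectorizer lowercases tokens by default)
stop_words = text.ENGLISH_STOP_WORDS.union(my_additional_stop_words)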
Obviously I can't use Python inside the search, but I am not familiar enough with Splunk searches to know whether stop_words=english can be extended in a similar way so that it also filters out additional words such as "Windows".