How to add custom stop words to fit TFIDF in Splunk

parkz
Explorer

I have the following search:

earliest=-1d@d latest=@d index=cdb_summary sourcetype=cfg_summary source=CDM_*_Daily_Summary
| search hva=*
| eval FailedSTIGs=mvsort(split(FailedSTIGs,","))
| stats values(fismaid) as fismaid dc(asset_id) as Affected by FailedSTIGs,hva
| lookup DHS_Expected_Checks "STIG ID" as FailedSTIGs output "Rule Title"
| fit TFIDF "Rule Title" as rule_tfidf ngram_range=1-12 max_df=0.8 min_df=0.2 stop_words=english
| fit KMeans rule_tfidf* k=8
| stats values(FailedSTIGs), values("Rule Title") by cluster
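For readers coming from Python, the two fit steps above correspond roughly to the scikit-learn sketch below. It assumes that MLTK's TFIDF and KMeans behave like the scikit-learn TfidfVectorizer and KMeans classes they are built on, and that ngram_range=1-12 maps to ngram_range=(1, 12). The rule titles are made-up placeholders, and k is lowered from 8 to 2 because the sample has only six rows.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for the "Rule Title" values returned by the lookup.
rule_titles = [
    "Windows audit policy must be configured",
    "Windows firewall must be enabled",
    "Windows password policy must be configured",
    "Linux audit daemon must be configured",
    "Linux firewall must be enabled",
    "Linux password policy must be configured",
]

# Mirrors: fit TFIDF "Rule Title" ngram_range=1-12 max_df=0.8 min_df=0.2 stop_words=english
vectorizer = TfidfVectorizer(
    ngram_range=(1, 12),
    max_df=0.8,  # drop terms found in more than 80% of the rows
    min_df=0.2,  # drop terms found in fewer than 20% of the rows
    stop_words="english",
)
X = vectorizer.fit_transform(rule_titles)

# Mirrors: fit KMeans rule_tfidf* k=8 (k reduced to 2 for six sample rows)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# "windows" is not in the built-in English list, so it survives as a feature.
print("windows" in vectorizer.vocabulary_)  # prints True
```

The last line is the problem in a nutshell: the built-in list is fixed, so a product name like "Windows" stays in the vocabulary and takes part in the clustering.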

How can I add my own words to the stop_words argument? In Python I would write the following:

from sklearn.feature_extraction import text

my_additional_stop_words = ["windows"]  # example extra word(s)
stop_words = text.ENGLISH_STOP_WORDS.union(my_additional_stop_words)

Obviously I can't use Python here, but I'm not familiar enough with Splunk searches to know whether the english keyword can be extended in a similar way, so that it also drops additional words such as "Windows".
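A minimal scikit-learn sketch of what the question is after: extend the built-in English list with extra words and pass the result to the vectorizer. Two scikit-learn details matter here: the extra words should be lowercase, because the text is lowercased before tokens are compared with the stop list, and recent versions validate stop_words as "english", a list, or None, so the union is converted with list(). The sketch also checks that deleting the same words from the text before vectorizing gives an identical vocabulary. That hints at one possible workaround inside Splunk, namely rewriting the "Rule Title" field (for example with eval replace() or rex mode=sed) before fit TFIDF, but this is an untested assumption, not a documented MLTK option. The rule titles are hypothetical.

```python
import re

from sklearn.feature_extraction import text
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for the "Rule Title" values.
rule_titles = [
    "Windows audit policy must be configured",
    "Windows firewall must be enabled",
    "Windows password policy must be configured",
    "Linux audit daemon must be configured",
    "Linux firewall must be enabled",
    "Linux password policy must be configured",
]

# Extra stop words, lowercase because TfidfVectorizer lowercases text first.
my_additional_stop_words = ["windows"]

# Same settings as the search, apart from the stop word handling.
common = dict(ngram_range=(1, 12), max_df=0.8, min_df=0.2)

# Option 1: built-in English list plus the extra words, passed as a list.
custom_stop_words = list(text.ENGLISH_STOP_WORDS.union(my_additional_stop_words))
vec_custom = TfidfVectorizer(stop_words=custom_stop_words, **common)
vec_custom.fit(rule_titles)

# Option 2: delete the extra words from the text, then use the built-in list.
pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, my_additional_stop_words)) + r")\b",
    re.IGNORECASE,
)
stripped_titles = [pattern.sub(" ", title) for title in rule_titles]
vec_stripped = TfidfVectorizer(stop_words="english", **common)
vec_stripped.fit(stripped_titles)

print("windows" in vec_custom.vocabulary_)  # prints False
print(set(vec_custom.vocabulary_) == set(vec_stripped.vocabulary_))  # prints True
```

Option 1 is the direct equivalent of the Python snippet above. Option 2 needs nothing from the stop_words argument, which is why it is the shape a search-side workaround would take; for single-word entries like these the two produce the same n-grams, because the vectorizer drops stop words before building n-grams.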
