list of the URLs the contractors have access to which is the csv file. The firewall team wants to remove any URLs that aren't used in a period of time. Thus, I have to compare the firewall URLs t...
See more...
list of the URLs the contractors have access to which is the csv file. The firewall team wants to remove any URLs that aren't used in a period of time. Thus, I have to compare the firewall URLs to the csv So, the firewall team wants to update that CSV file so it will not contain entries that haven't had matching events for a given time period. Is this correct? This seems to be the opposite of what the Splunk search is doing. Some more points you need to clarify. What are field name(s) the index search and the lookup file use to indicate URLs? Based on your code snippet, I assume that they both use url. Does the CSV file contain additional fields? Based on your code snippet, I will assume none. Is there some significance of trailing slash (/)? Do all url values end with one trailing slash? This may not be relevant, but some SPL manipulations may ruin your convention. So, I'd like to be cautious. A more important question is the use of asterisk (*). Are the last two domains (root and second level) the only parts of interest? Given all the illustrations, I have to assume yes. In other words, no differentiation is needed between *.microsoft.com/ and microsoft.com/. Additionally, I will assume that every url in the CSV needs to be paired with a wildcard entry. Using the above assumptions, the following can show you second level domains that have not been used. index=my_index sourcetype=my_sourcetype (rule=policy_1 OR rule=policy_2 OR rule=policy_3)
| eval url = mvjoin(mvindex(split(url, "."), -2,-1), ".")
| dedup url
| inputlookup append=true my_list_of_urls.csv
| fillnull sourcetype value=CSV
| stats values(sourcetype) as sourcetype by url
| where mvcount(sourcetype) == 1 AND sourcetype == "CSV"
| eval url = mvappend(url, "*." . url)
| mvexpand url The output contains a list of second level domains affixed with a trailing slash, and these same strings prefixed with "*.". These would be the ones to be removed. If you have lots of events with URLs that have no match in the CSV, you can also use the subsearch as a filter to improve efficiency. Like index=my_index sourcetype=my_sourcetype (rule=policy_1 OR rule=policy_2 OR rule=policy_3)
[ | inputlookup my_list_of_urls.csv ]
| eval url = mvjoin(mvindex(split(url, "."), -2,-1), ".")
| dedup url
| inputlookup append=true my_list_of_urls.csv
| fillnull sourcetype value=CSV
| stats values(sourcetype) as sourcetype by url
| where mvcount(sourcetype) == 1 AND sourcetype == "CSV"
| eval url = mvappend(url, "*." . url)
| mvexpand url Hope this helps.