I was asked to pull data from several public IP spam/malicious-host blocklists.
I've written a shell script that downloads the files, runs them through sed to format the output into three separate CSV files, and then deletes the original downloads.
Since these are lookup files, the script needs to run on the search heads; however, I'd expect replication of lookups/files between the search heads to cause problems.
What I'd like is to run the script on only one search head to create the lookup table files, and have them replicated to the other two. Is that possible, or do I have to somehow disable replication for this app and have all three search heads run the script?
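For reference, the download-and-format cycle described above can be sketched like this. The feed URL, line format, and output names are hypothetical; real blocklists vary, and the actual download step is shown only as a comment so the formatting logic is self-contained:

```shell
#!/bin/sh
# Sketch of the format step. In the real script the feed would come
# from a download, e.g. (hypothetical URL):
#   curl -s -o feed.txt "https://example.com/blocklist.txt"
# Here we fake a downloaded feed so the sed pipeline can be shown end to end.
printf '# comment line\n203.0.113.5\n198.51.100.7\n\n' > feed.txt

# Strip comment and blank lines, tag each IP with its source feed,
# and emit a two-column CSV with a header row.
{
  echo "ip,source"
  sed -e '/^#/d' -e '/^$/d' -e 's/$/,example_feed/' feed.txt
} > blocklist.csv

# Delete the original download, keeping only the formatted CSV.
rm feed.txt
```

The same strip-comments/append-source pattern repeats per feed, one output CSV each.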
Solved the issue here:
Create the new lookup file and drop it into /opt/splunk/var/run/splunk/lookup_tmp/
On Linux, chown the file so the Splunk user can read it.
Then use the REST API to delete the current lookup, upload the new one, and open up its permissions:
curl -k -u admin:changeme --request DELETE https://server:8089/servicesNS/admin/app/data/lookup-table-files/file.csv
curl -k -u admin:changeme https://server:8089/servicesNS/admin/app/data/lookup-table-files -d eai:data=/opt/splunk/var/run/splunk/lookup_tmp/file.csv -d name=file.csv
curl -k -u admin:changeme https://server:8089/servicesNS/admin/app/data/lookup-table-files/file.csv/acl -d owner=admin -d sharing=global -d 'perms.read=*'
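If you run this on a schedule, it may help to pull the server, credentials, and file name into variables so the three calls stay in sync. A minimal sketch, using the same placeholder values as above; RUN defaults to echo so a first run just prints the commands instead of executing them:

```shell
#!/bin/sh
# Placeholders: substitute your own server, credentials, app, and file.
SPLUNK="https://server:8089"
AUTH="admin:changeme"
APP_PATH="servicesNS/admin/app/data/lookup-table-files"
FILE="file.csv"
STAGED="/opt/splunk/var/run/splunk/lookup_tmp/$FILE"

# Dry run by default; set RUN= (empty) to actually execute the calls.
RUN="${RUN:-echo}"

# 1. Delete the current lookup, 2. upload the staged file, 3. open permissions.
$RUN curl -k -u "$AUTH" --request DELETE "$SPLUNK/$APP_PATH/$FILE"
$RUN curl -k -u "$AUTH" "$SPLUNK/$APP_PATH" -d "eai:data=$STAGED" -d "name=$FILE"
$RUN curl -k -u "$AUTH" "$SPLUNK/$APP_PATH/$FILE/acl" -d owner=admin -d sharing=global -d 'perms.read=*'
```

Quoting perms.read=* keeps the shell from ever glob-expanding the asterisk.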
Another approach you could take:
- write your script/app on your forwarders
- index the data to your indexers, with a field or timestamp carrying today's value
- run a search on the indexed field and output it as CSV. This way you don't have to worry about deletion, and you can also keep track of your lookup file over time. You can configure the index with a retention of 30 days, etc.
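That scheduled search could look something like this in savedsearches.conf; the index, field, and lookup names are all placeholders:

```ini
# Sketch only: rebuild the lookup every hour from recently indexed entries.
[rebuild_blocklist_lookup]
search = index=blocklist earliest=-24h | dedup ip | table ip, source | outputlookup blocklist.csv
cron_schedule = 0 * * * *
enableSched = 1
```

Tightening `earliest=` is what controls how quickly removed IPs age out of the lookup.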
Yeah, my original plan was to make this a scripted app on the heavy forwarder, but the problem I ran into is that the indexed data stays even when the open-source blocklists update to remove an IP.
In other words, even with a 30-day retention: say CNN gets infected and one of their public web IPs becomes a known attacker IP. My incident response team will be running lookups on our logs against the known-threat list. If that IP leaves the blacklist within 24 hours because CNN fixed the issue, it will still be stuck in the blacklist index for 30 days.
I'd rather have a near-real-time list, grabbing this data once every hour or two.
A very interesting problem. I have a similar need and haven't decided on a solution. Just thinking a bit about it leads me to maybe try something like this:
SH1 - build the lookup (this is the "master" lookup)
Index the lookup into a 'transportation' index (mine is called test):
| inputlookup test | collect index=test
SH2 - populate the lookup:
index=test | outputlookup test
Anyway, I'll poke around a bit with this and repost if I come up with my own solution.
The three search heads, are they in a cluster or a pool? What problems do you think replication would cause?
Yes, three search heads. I would expect file-locking issues, and/or replication causing unexpected results if two or more search heads run the script at the same time or staggered (for example, one search head is finishing the script when another starts it, and incomplete results get copied over the top of complete ones).
How are you running the script? If your search heads are in a cluster (SHC), set up the script as the alert action of a scheduled search (any dummy search that returns a result), so that it runs on only one search head per schedule. Let the SHC replicate the lookup to the other search heads for you.
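A sketch of that setup in savedsearches.conf, again with placeholder names; the dummy search always returns one row, so the alert fires on every schedule, and the script must live in the app's bin directory on all members so whichever member runs the search can execute it:

```ini
# Sketch only: dummy scheduled search whose alert action runs the update script.
[update_threat_lookup]
search = | makeresults
cron_schedule = 0 */2 * * *
enableSched = 1
counttype = number of events
relation = greater than
quantity = 0
action.script = 1
action.script.filename = update_threat_lookup.sh
```

The scheduler dispatches each run on a single SHC member, and the captain then replicates the rewritten lookup to the rest of the cluster.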