Based on some advice from the cloud team I did this:
Getting auto-generated lookups into Splunk Cloud.
There isn’t a simple interface for getting lookups (like asset and identity lists) that are periodically updated by a script or some other means onto a Splunk Cloud search head. Here is one option:
HF - A heavy forwarder that the customer owns/runs and can scp/ftp/copy files to (a UF would also work, of course, but it will complicate your life; see the Notes at the end).
SH - the search head in the cloud
IDX - some arbitrary index name you create to hold lookup data
DS - The deployment server the customer uses to manage the HF
In this example I am grabbing two kinds of lookup files (myorg_assets.csv and myorg_identities.csv), but you could add any number of lookups this way.
Step 0 (via ssh or opening a ticket with Splunk):
- Make sure an index called <IDX> is created on the indexers. If you have ssh access you can do this on the cluster master; otherwise open a ticket with Splunk Support. The index should probably have a short frozenTimePeriodInSecs value so that data doesn’t live too long in the index.
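If you do have access, the index definition might look something like the following indexes.conf sketch on the cluster master (the paths and the 7-day retention are example values, not prescriptions):

```
[<IDX>]
homePath   = $SPLUNK_DB/<IDX>/db
coldPath   = $SPLUNK_DB/<IDX>/colddb
thawedPath = $SPLUNK_DB/<IDX>/thaweddb
# short retention: lookup staging data only needs to survive a few runs
frozenTimePeriodInSecs = 604800
```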
Step 1 (on the HF):
- Create a drop site directory that the customer will use to drop files into, by whatever means they like. The files they drop should have no header row, to match the .conf settings below.
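Since the customer can deliver the files by any means, the only hard requirement is the no-header rule. A minimal sketch of the drop step (the drop path and export filename here are assumptions, not fixed names):

```shell
# Drop-site path is an example; use whatever location the customer agrees to
DROP_DIR=${DROP_DIR:-/tmp/splunk_drop}
mkdir -p "$DROP_DIR"

# Stand-in for the customer's real export, which typically has a header row
printf 'identity,first,last\njdoe,John,Doe\n' > /tmp/identities_export.csv

# The .conf settings below expect headerless CSVs, so strip the header
# row while dropping the file in
tail -n +2 /tmp/identities_export.csv > "$DROP_DIR/myorg_identities.csv"
```

In practice the customer would script this (e.g. in the cron job or transfer script that produces the export) rather than run it by hand.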
Step 2 (on the DS):
- Create an app called myorg_all_es_inputs with the following content:
inputs.conf:

[monitor://<path to the drop directory>/myorg_assets.csv]
index = <IDX>
sourcetype = es_assets

[monitor://<path to the drop directory>/myorg_identities.csv]
index = <IDX>
sourcetype = es_identities

props.conf:

[es_identities]
SHOULD_LINEMERGE = false
REPORT-es_identities_fields = es_identities_fields

[es_assets]
SHOULD_LINEMERGE = false
REPORT-es_assets_fields = es_assets_fields

transforms.conf:

[es_identities_fields]
DELIMS = ","
FIELDS = "identity","prefix","nick","first","last","suffix","email","phone","phone2","managedBy","priority","bunit","category","watchlist","startDate","endDate"

[es_assets_fields]
DELIMS = ","
FIELDS = "ip","mac","nt_host","dns","owner","priority","lat","long","city","country","bunit","category","pci_domain","is_expected","should_timesync","should_update","requires_av"
Step 3 (On the SH):
- Upload the myorg_all_es_inputs app to your search head as well (this is needed for the delimited fields to be extracted from the data).
- You should now be able to search the index with a search like index=<IDX>.
For each lookup file being pushed to Splunk, do the following:
- Create the lookup definition as you normally would under settings->Lookups
- Create a scheduled report that looks something like this (this example is for the identities lookup in Enterprise Security):
index=<IDX> sourcetype=es_identities
| join _time type=inner [search index=<IDX> sourcetype=es_identities | head 1]
| table identity, prefix, nick, first, last, suffix, email, phone, phone2, managedBy, priority, bunit, category, watchlist, startDate, endDate
| outputlookup myorg_identities
(the head 1 subsearch returns the newest event, so the inner join on _time keeps only the events from the most recent file drop)
(the report should run on a schedule that puts it a little after the csv file is dropped; for example, a midnight drop might mean scheduling the report for 1am)
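Events from a single file drop should share an index time, which is what the join relies on. An equivalent sketch that avoids the subsearch uses eventstats to keep only the newest batch (the field list and the lookup name myorg_identities here follow the identities example and are illustrative):

```
index=<IDX> sourcetype=es_identities
| eventstats max(_time) as latest_time
| where _time == latest_time
| table identity, prefix, nick, first, last, suffix, email, phone, phone2, managedBy, priority, bunit, category, watchlist, startDate, endDate
| outputlookup myorg_identities
```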
Notes:
- Make sure you set your index up so that it doesn’t keep data around for long.
- If you are concerned about the .csv files hanging around on the HF, you could change the monitor input to a batch input with move_policy = sinkhole to delete them after reading.
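That swap would look something like this in the inputs.conf of the app above (sketch; note that sinkhole deletes the source file once it has been read):

```
[batch://<path to the drop directory>/myorg_assets.csv]
index = <IDX>
sourcetype = es_assets
move_policy = sinkhole
```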
- A heavy forwarder is much simpler because you control the index-time extractions there (i.e. you don’t need to be concerned with the indexers except for creating the index). If you must use a UF, bear in mind that you’ll also need to push the myorg_all_es_inputs app to the indexers (by opening a support ticket) so that they have the SHOULD_LINEMERGE = false setting for the sourcetypes you add.
- Size your index so that it can hold at least a couple of runs of each sourcetype, i.e. if you have 200K assets and 300K identities, make sure the index is big enough to hold a few days’ worth of that data volume.
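As a rough back-of-envelope sketch (the ~150 bytes per row figure is an assumption, not a measurement): 200K assets + 300K identities is about 500K rows per drop; at ~150 bytes per row that is roughly 75 MB per day, so a week of daily drops needs on the order of 500 MB. In indexes.conf terms that might be:

```
[<IDX>]
# headroom for about a week of daily ~75 MB drops
maxTotalDataSizeMB = 1024
```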