Getting Data In

How do I create a job to update a CSV file frequently using dbxquery?

LearningGuy
Builder

Hello,
How do I create a job to update a CSV file frequently using dbxquery?
I have a company.csv file that was created by running a dbxquery search and uploaded into Splunk lookups. Whenever the data is updated, I have to manually run the dbxquery search, download the CSV file, delete the existing CSV file, and re-upload it to Splunk lookups. To automate the process, I am thinking of using a scheduled job that executes dbxquery and outputlookup. Please suggest an approach.
Thank you all for your help.

Query:
| dbxquery query="select * from tableCompany"

Table 1: company.csv

ip_address   company  location
192.168.1.1  Comp-A   Loc-A
192.168.1.2  Comp-B   Loc-B
192.168.1.5  Comp-E   Loc-E

LearningGuy
Builder

Hello,
I have accepted your suggestion no. 1 as a solution. Thank you for your assistance.
For your suggestion no. 2, you said "Get a trigger on the SQL side to run a script that executes a REST API search". I read some of the documentation you sent about the Splunk REST API.
1) What do you mean by "on the SQL side"?
My understanding is that we can call a REST API endpoint such as https://splunkserverIP:8089/search (I am not sure of the exact URL) that runs the dbxquery search and outputs the result to a CSV file.

2) How do I see which endpoints are available to access?
I tried to access https://SplunkIP:8089/servicesNS/admin/search/saved/searches,
but it wasn't reachable. I am assuming I am not an admin.

3) The data from dbxquery was obtained from another source. We provided a JSON file to the Splunk admin to import into the Splunk DB. The JSON file was generated by a Python script pulling the data via an API.
Is it possible for me to create a Python script that parses the JSON file to generate the CSV file, and have a job running to pull that CSV file? If yes, which is the better solution compared to generating the CSV from a Splunk search?
Thank you!!


bowesmana
SplunkTrust

So, for "on the SQL side", I am not a SQL DB expert, but I assume it's possible to detect changes made to a table, so if that occurs, use some database "magic" to run a process (I don't know how that is done) that calls the rest API.

As for the endpoints for search, they are all listed here

https://docs.splunk.com/Documentation/Splunk/9.1.0/RESTREF/RESTsearch

You can run _some_ of the REST commands in the Splunk UI, e.g.

| rest /servicesNS/-/-/saved/searches
| table title

will give you a list of all the saved searches,

but when running the REST API via an HTTP request you need to authenticate, so you would call

https://SplunkIP:8089/services/auth/login

https://docs.splunk.com/Documentation/Splunk/9.1.0/RESTREF/RESTaccess#auth.2Flogin

with username and password arguments, and you will get back a sessionKey.

You can then use that sessionKey value as an Authorization header.

See this documentation

https://docs.splunk.com/Documentation/Splunk/9.1.0/RESTUM/RESTusing#Authentication_and_authorization
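
To make that concrete, here is a minimal Python sketch of the login-then-dispatch flow described above. The host, credentials, and saved search name are all assumptions for illustration; the endpoints are the documented auth/login and saved/searches/{name}/dispatch ones.

import requests
import xml.etree.ElementTree as ET

# Assumed values - replace with your own environment's details.
SPLUNK = "https://SplunkIP:8089"
SAVED_SEARCH = "update_company_lookup"  # hypothetical saved search name

# Step 1: POST username/password to auth/login to get a sessionKey back.
resp = requests.post(SPLUNK + "/services/auth/login",
                     data={"username": "your_user", "password": "your_password"},
                     verify=False)  # verify=False only if using a self-signed cert
resp.raise_for_status()
session_key = ET.fromstring(resp.text).findtext("sessionKey")

# Step 2: use the sessionKey as an Authorization header to dispatch the saved search.
resp = requests.post(SPLUNK + "/servicesNS/-/-/saved/searches/" + SAVED_SEARCH + "/dispatch",
                     headers={"Authorization": "Splunk " + session_key},
                     verify=False)
resp.raise_for_status()
print("Dispatched search, sid:", ET.fromstring(resp.text).findtext("sid"))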

It is easy enough to manipulate JSON into another format, so if you have a search that is getting the data, then ask another question about how you can convert it to a format suitable for CSV export.
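
For your question 3, a flat JSON array converts to CSV in just a few lines of Python. A minimal sketch, assuming the file is a simple list of objects with consistent keys (the file names are made up):

import csv
import json

# Assumed file names - adjust to your environment.
with open("company.json") as f:
    rows = json.load(f)  # expects a list of flat objects, e.g. [{"ip_address": ...}, ...]

with open("company.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

Whether that beats generating the CSV from a Splunk search mostly comes down to where you want the moving parts: the saved search keeps everything inside Splunk, while the script adds an external dependency you have to schedule and monitor yourself.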

bowesmana
SplunkTrust

There are a couple of ways to get the updates

1. Make a saved search based on your dbxquery that runs on a cron schedule (hourly/daily/weekly) and does your search + outputlookup. Note that because this runs on a schedule, it cannot guarantee a lookup hit if the Splunk vuln events contain a new IP that is not yet in the lookup, and if you are using DHCP and IP addresses change, it may actually give the wrong company data, but that's an issue in general.

2. Get a trigger on the SQL side to run a script that executes a REST API search on Splunk, so the lookup is updated immediately. The only issue with this is that it may end up triggering so often that it just runs all the time.

I notice all the IPs are 192.168.x.x local addresses. Do you have a company policy that allocates IPs based on subnets? If so, when the lookup cannot find the address, you could fall back to a calculation on the subnet to determine a 'default' company, as in the sketch below.
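
A rough SPL illustration of that fallback; the subnets and company names here are invented for the example:

| lookup your_csv_file.csv ip_address OUTPUT company location
| eval company=coalesce(company, case(
    cidrmatch("192.168.1.0/24", ip_address), "Comp-A",
    cidrmatch("192.168.2.0/24", ip_address), "Comp-B",
    true(), "unknown"))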

Note that any time you're using replicated data, there's potential for stale results - you will have the DB copy, the local search head copy and the copy replicated to the indexer. Depending on how often your data changes, you may need to factor this in.

LearningGuy
Builder

Hello,
Can you provide an example, or a website that shows an example, for your suggestions no. 1 and 2?

Thank you for your help.

bowesmana
SplunkTrust

You've already written the SQL:

| dbxquery query="select * from tableCompany"
| table ip_address company location
| outputlookup your_csv_file.csv

Then just 'Save As' a Report, and in the Schedule window select the frequency you want. For example, for a cron schedule that runs just after 11pm every day, set the schedule to

07 23 * * *

which runs at 7 minutes past 11pm.

https://docs.splunk.com/Documentation/Splunk/9.1.0/Report/Configurethepriorityofscheduledreports

and then it will run daily and create your lookup.
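
If you'd rather see what that produces behind the scenes, the scheduled report ends up as a stanza in savedsearches.conf roughly like this; the stanza name is assumed, and the UI's 'Save As' flow writes the equivalent for you:

# Assumed stanza name for illustration
[update_company_lookup]
search = | dbxquery query="select * from tableCompany" \
| table ip_address company location \
| outputlookup your_csv_file.csv
enableSched = 1
cron_schedule = 07 23 * * *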

As for running it from the REST API, see

https://docs.splunk.com/Documentation/Splunk/9.1.0/RESTREF/RESTsearch#saved.2Fsearches.2F.7Bname.7D....

and 

https://community.splunk.com/t5/forums/searchpage/tab/message?advanced=false&allow_punctuation=false...

which has a bunch of questions about running saved searches.

 
