Dashboards & Visualizations

Generating .csv-Lookups from XML-File

lrudolph
Path Finder

Hi,

I'd like to use Splunk itself to to generate a .csv-lookupfile to enrich one of my data sources. The data for this lookup comes from an XML-file that is undergoing several changes per week.

Now I want to automate this as good as possible and thought about the following solution:

a) Configure Splunk to monitor a folder in which I'd copy the new XML-file once it has undergone one or more changes. The sourcetype for that file is configured to see every line of that XML-file as a new event.

b) Use a schedules saved search (I already finished the searchstring) to filter down and process the events from the XML-file and generate a new lookupfile (overwrite the old one) with the | outputlookup-command.

My problem is that I somehow have to differentiate in the scheduled search to only use the newest data from the XML-file. Let's say I'd run a scheduled search every day for the last 24 hours and there was no new XML-file uploaded - outputlook would overwrite the .csv-file with an empty one. And in case I'd upload two new XML files per day, I'd end up with an .csv-file having double entries.

Someone has an idea how to solve this problem?

Regards,

Leo

Tags (2)
0 Karma
1 Solution

gkanapathy
Splunk Employee
Splunk Employee

I happened to work this out for someone else yesterday. What you really want is "lookup file contains the results of the last non-empty job run, right? So basically, you include in the lookup the time of the run (or of the data set or whatever). In mine below, I just take it as the job run time:

... | eval jobtime=now() | inputlookup append=true mylookup | sort 0 - jobtime | streamstats max(jobtime) as latestjobtime | head (jobtime==latestjobtime) null=t | fields - latestjobtime | outputlookup mylookup

Now the trick is that the part of the search before eval (the ...) needs to either return the last valid set, or else no results at all. So if you have a daily upload, and you run your base search from -1d@d to @d each day, that should work fine.

View solution in original post

0 Karma

gkanapathy
Splunk Employee
Splunk Employee

I happened to work this out for someone else yesterday. What you really want is "lookup file contains the results of the last non-empty job run, right? So basically, you include in the lookup the time of the run (or of the data set or whatever). In mine below, I just take it as the job run time:

... | eval jobtime=now() | inputlookup append=true mylookup | sort 0 - jobtime | streamstats max(jobtime) as latestjobtime | head (jobtime==latestjobtime) null=t | fields - latestjobtime | outputlookup mylookup

Now the trick is that the part of the search before eval (the ...) needs to either return the last valid set, or else no results at all. So if you have a daily upload, and you run your base search from -1d@d to @d each day, that should work fine.

0 Karma

lrudolph
Path Finder

Ah! Thanks for this comment. I think I know solved it:

sourcetype=xml_input "Service" | eval hostname=substr(name, 1, len(name)-25) | rex field=hostname "^(?[^-]+)" | table id hostname city | rename id AS AreaId | eval jobtime=now() | inputlookup append=true id_lookup.csv | sort - jobtime | streamstats max(jobtime) as latestjobtime | head (jobtime==latestjobtime) | fields - latestjobtime | outputlookup id_lookup.csv

Thanks for your help!

0 Karma

gkanapathy
Splunk Employee
Splunk Employee

Looks to me like you left out the part about inputlookup. Also, would recommend you do your table, rename, and other field manipulations before adding eval jobtime, etc.

0 Karma

lrudolph
Path Finder

Hm, sounds logical but doesn't work quite yet. I modified my searchstring so it now looks like this:

sourcetype=xml_input "Service" | eval hostname=substr(name, 1, len(name)-25) | rex field=hostname "^(?[^-]+)" | eval jobtime=now() | sort - jobtime | streamstats max(jobtime) as latestjobtime | head (jobtime==latestjobtime) | fields - latestjobtime | table id hostname city | rename id AS AreaId | outputlookup id_lookup.csv

If I run it over a time period in which I have not uploaded new data, the lookupfile "id_lookup.csv" gets overwritten with an empy file. I'd like to avoid this.

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Observability Simplified: Combining User Experience, Application Performance & ...

Tech Talk Observability Simplified: Combining User Experience, Application Performance & Network ...

Event Series May & June: From Network Visibility to Service Intelligence

Unifying the Network: Moving from Alert Noise to Service Intelligence with Splunk ITSI In today’s hybrid ...

Global Splunk User Group Events: May + June 2026

Your Splunk Community Awaits: Discover Upcoming User Group Events Worldwide    Staying ahead in the fast-paced ...