Dashboards & Visualizations

Generating .csv-Lookups from XML-File

lrudolph
Path Finder

Hi,

I'd like to use Splunk itself to to generate a .csv-lookupfile to enrich one of my data sources. The data for this lookup comes from an XML-file that is undergoing several changes per week.

Now I want to automate this as good as possible and thought about the following solution:

a) Configure Splunk to monitor a folder in which I'd copy the new XML-file once it has undergone one or more changes. The sourcetype for that file is configured to see every line of that XML-file as a new event.

b) Use a schedules saved search (I already finished the searchstring) to filter down and process the events from the XML-file and generate a new lookupfile (overwrite the old one) with the | outputlookup-command.

My problem is that I somehow have to differentiate in the scheduled search to only use the newest data from the XML-file. Let's say I'd run a scheduled search every day for the last 24 hours and there was no new XML-file uploaded - outputlook would overwrite the .csv-file with an empty one. And in case I'd upload two new XML files per day, I'd end up with an .csv-file having double entries.

Someone has an idea how to solve this problem?

Regards,

Leo

Tags (2)
0 Karma
1 Solution

gkanapathy
Splunk Employee
Splunk Employee

I happened to work this out for someone else yesterday. What you really want is "lookup file contains the results of the last non-empty job run, right? So basically, you include in the lookup the time of the run (or of the data set or whatever). In mine below, I just take it as the job run time:

... | eval jobtime=now() | inputlookup append=true mylookup | sort 0 - jobtime | streamstats max(jobtime) as latestjobtime | head (jobtime==latestjobtime) null=t | fields - latestjobtime | outputlookup mylookup

Now the trick is that the part of the search before eval (the ...) needs to either return the last valid set, or else no results at all. So if you have a daily upload, and you run your base search from -1d@d to @d each day, that should work fine.

View solution in original post

0 Karma

gkanapathy
Splunk Employee
Splunk Employee

I happened to work this out for someone else yesterday. What you really want is "lookup file contains the results of the last non-empty job run, right? So basically, you include in the lookup the time of the run (or of the data set or whatever). In mine below, I just take it as the job run time:

... | eval jobtime=now() | inputlookup append=true mylookup | sort 0 - jobtime | streamstats max(jobtime) as latestjobtime | head (jobtime==latestjobtime) null=t | fields - latestjobtime | outputlookup mylookup

Now the trick is that the part of the search before eval (the ...) needs to either return the last valid set, or else no results at all. So if you have a daily upload, and you run your base search from -1d@d to @d each day, that should work fine.

0 Karma

lrudolph
Path Finder

Ah! Thanks for this comment. I think I know solved it:

sourcetype=xml_input "Service" | eval hostname=substr(name, 1, len(name)-25) | rex field=hostname "^(?[^-]+)" | table id hostname city | rename id AS AreaId | eval jobtime=now() | inputlookup append=true id_lookup.csv | sort - jobtime | streamstats max(jobtime) as latestjobtime | head (jobtime==latestjobtime) | fields - latestjobtime | outputlookup id_lookup.csv

Thanks for your help!

0 Karma

gkanapathy
Splunk Employee
Splunk Employee

Looks to me like you left out the part about inputlookup. Also, would recommend you do your table, rename, and other field manipulations before adding eval jobtime, etc.

0 Karma

lrudolph
Path Finder

Hm, sounds logical but doesn't work quite yet. I modified my searchstring so it now looks like this:

sourcetype=xml_input "Service" | eval hostname=substr(name, 1, len(name)-25) | rex field=hostname "^(?[^-]+)" | eval jobtime=now() | sort - jobtime | streamstats max(jobtime) as latestjobtime | head (jobtime==latestjobtime) | fields - latestjobtime | table id hostname city | rename id AS AreaId | outputlookup id_lookup.csv

If I run it over a time period in which I have not uploaded new data, the lookupfile "id_lookup.csv" gets overwritten with an empy file. I'd like to avoid this.

0 Karma
Get Updates on the Splunk Community!

Harnessing Splunk’s Federated Search for Amazon S3

Managing your data effectively often means balancing performance, costs, and compliance. Splunk’s Federated ...

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

We’ve been buzzing with excitement about the recent validation of Splunk Education! The 2024 Splunk Career ...

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...