Hi,
I'd like to use Splunk itself to generate a .csv lookup file to enrich one of my data sources. The data for this lookup comes from an XML file that changes several times per week.
Now I want to automate this as far as possible, and I thought of the following solution:
a) Configure Splunk to monitor a folder into which I'd copy the new XML file whenever it has changed. The sourcetype for that file is configured to treat every line of the XML file as a separate event (see the config sketch below).
b) Use a scheduled saved search (I have already finished the search string) to filter and process the events from the XML file and generate a new lookup file (overwriting the old one) with the `| outputlookup` command.
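Roughly, I'm thinking of something like this for the monitoring part in a); the folder path and index are just placeholders, not my real setup:

```
# inputs.conf -- watch the drop folder for new XML files (path is a placeholder)
[monitor:///opt/splunk_drop/xml_lookup]
sourcetype = xml_input
index = main

# props.conf -- treat every line of the XML file as its own event
[xml_input]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
```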
My problem is that I somehow have to restrict the scheduled search to only use the newest data from the XML file. Say I run a scheduled search every day over the last 24 hours and no new XML file was uploaded: outputlookup would overwrite the .csv file with an empty one. And if I uploaded two new XML files in one day, I'd end up with a .csv file containing duplicate entries.
Does anyone have an idea how to solve this?
Regards,
Leo
(Splunk Employee)
I happened to work this out for someone else yesterday. What you really want is "the lookup file contains the results of the last non-empty job run", right? So basically, you include in the lookup the time of the run (or of the data set, or whatever). In the example below, I just use the job run time:
```
... 
| eval jobtime=now() 
| inputlookup append=true mylookup 
| sort 0 - jobtime 
| streamstats max(jobtime) as latestjobtime 
| head (jobtime==latestjobtime) null=t 
| fields - latestjobtime 
| outputlookup mylookup
```
Now the trick is that the part of the search before `eval` (the `...`) needs to either return the last valid set, or else no results at all. So if you have a daily upload and you run your base search from `-1d@d` to `@d` each day, that should work fine.
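If it helps, here's a sketch of how the scheduling side could look in savedsearches.conf; the stanza name and cron schedule are just examples:

```
# savedsearches.conf -- example schedule: run once a day, shortly after midnight
[update_mylookup]
enableSched = 1
cron_schedule = 15 0 * * *
# search only yesterday's data, so a run sees the latest upload or nothing at all
dispatch.earliest_time = -1d@d
dispatch.latest_time = @d
search = <your base search> | eval jobtime=now() | inputlookup append=true mylookup | sort 0 - jobtime | streamstats max(jobtime) as latestjobtime | head (jobtime==latestjobtime) null=t | fields - latestjobtime | outputlookup mylookup
```

That way each run only sees the most recent day's upload (if any), and the jobtime logic keeps the previous lookup contents when the base search comes back empty.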
Ah! Thanks for this comment. I think I have now solved it:
```
sourcetype=xml_input "Service" | eval hostname=substr(name, 1, len(name)-25) | rex field=hostname "^(?
```
Thanks for your help!
(Splunk Employee)
It looks to me like you left out the `inputlookup` part. Also, I'd recommend doing your `table`, `rename`, and other field manipulations before adding `eval jobtime`, etc. Roughly like the sketch below.
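A rough sketch of the whole thing in that order, assuming your lookup is id_lookup.csv; the `rex` extraction and the `host_id` field are placeholders, not your actual regex:

```
sourcetype=xml_input "Service"
| eval hostname=substr(name, 1, len(name)-25)
| rex field=hostname "^(?<host_id>\w+)"
| table host_id hostname
| eval jobtime=now()
| inputlookup append=true id_lookup.csv
| sort 0 - jobtime
| streamstats max(jobtime) as latestjobtime
| head (jobtime==latestjobtime) null=t
| fields - latestjobtime
| outputlookup id_lookup.csv
```

Note that the field manipulations (`rex`, `table`) come before `eval jobtime`, and `inputlookup append=true` pulls the previous lookup contents back in so they survive runs where the base search returns nothing.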
Hm, sounds logical, but it doesn't quite work yet. I modified my search string so it now looks like this:
```
sourcetype=xml_input "Service" | eval hostname=substr(name, 1, len(name)-25) | rex field=hostname "^(?
```
If I run it over a time period in which I haven't uploaded new data, the lookup file "id_lookup.csv" gets overwritten with an empty one. I'd like to avoid this.