I have sanitized the index names.
I have users who populate a lookup from their dashboards, and it has become a major issue: the lookup file has grown so large (a whopping 512 MB) that it is causing bundle replication errors. They do not use append=true in their dashboards, and in my opinion it is bad practice to build a lookup table from a dashboard in the first place; a scheduled search should populate the lookup, and the dashboard should just reference it.
I'd like a few sets of eyes on this as a sanity check. Or am I looking at this totally wrong?
BTW, do I need commas to separate the field names in the OUTPUT clause?
The user wants just certain values updated when it runs every 15 minutes: pcenter, office, and externalIP.
Their lookup command within the dashboard: | lookup agentsessions.csv sessionId OUTPUTNEW pcenter office externalIP
Now, my theory is they wanted to update just the data in the fields for pcenter, office, and externalIP.
IIRC, OUTPUTNEW only fills a field that previously had no value (was blank).
OUTPUT, IIRC, replaces the existing values with the lookup's values, so the new dashboard command should look like this: | lookup agentsessions.csv sessionId OUTPUT pcenter office externalIP
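For example, on a throwaway event where office is already populated (made-up sessionId, and assuming agentsessions.csv has a matching row with sessionId, pcenter, office, and externalIP columns):

| makeresults
| eval sessionId="abc123", office="HQ"
| lookup agentsessions.csv sessionId OUTPUTNEW pcenter office externalIP

OUTPUTNEW leaves office="HQ" and fills in only pcenter and externalIP; swapping in OUTPUT would overwrite office with the lookup's value as well.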
I created a scheduled search that should keep the whole table updated (I renamed the lookup for testing):
index IN (one, two, three, four) source="wineventlog:custom" SourceName=DesktopAgentService action timestamp sessionId Heartbeat
| table Message
| spath input=Message
| dedup sessionId sortby +_time
| lookup agentsessions2.csv sessionId OUTPUT sessionId as existingSessionId
| where isnull(existingSessionId)
| fields - action existingSessionId Message
| outputlookup agentsessions2.csv append=true
OR I could modify the scheduled search like this:
index IN (one, two, three, four) source="wineventlog:custom" SourceName=DesktopAgentService action timestamp sessionId Heartbeat
| table Message
| spath input=Message
| dedup sessionId sortby +_time
| lookup agentsessions2.csv sessionId OUTPUT sessionId as existingSessionId pcenter office externalIP
| where isnull(existingSessionId)
| fields - action existingSessionId Message
| outputlookup agentsessions2.csv append=true
Also, with append=true, won't that create duplicate entries each time it runs? Or will it just update the table with fresh data?
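(I suppose I could sanity-check for duplicates after a few runs with something like:

| inputlookup agentsessions2.csv
| stats count as totalRows, dc(sessionId) as uniqueSessions

and compare the two numbers; if totalRows is higher than uniqueSessions, the appends have written the same sessionId more than once.)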
I ran both of my scheduled searches and they do seem to work; I just want to verify that I am doing this correctly and getting the updated data, instead of them trying to do all of this in a dashboard that runs every 15 minutes. Or should I have them create a dataset table to do all this more efficiently? https://docs.splunk.com/Documentation/Splunk/9.0.4/Knowledge/Aboutdatasets
Thanks, I figured I was overthinking it all.
With 'outputlookup', append=true appends the new results to the end of the file; without it, the command overwrites the file entirely. Your where isnull(existingSessionId) filter already drops sessions that are in the file, so your scheduled search should not create duplicates, but note that appending never updates the values of rows already in the file. If you wish to keep the old data and merge in the new results, then something like the below might be more feasible:
.. | inputlookup append=true agentsessions2.csv
| outputlookup agentsessions2.csv
But if the goal is simply to fetch new data and overwrite the CSV file on each run, then outputlookup without append=true is fine.
For duplicates, you can just add a dedup command (for example, dedup sessionId) before the final 'outputlookup'.
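And if the real goal is to refresh pcenter, office, and externalIP for sessions that are already in the file (not just add new ones), here is a rough sketch reusing your base search, assuming spath extracts those fields from Message:

index IN (one, two, three, four) source="wineventlog:custom" SourceName=DesktopAgentService action timestamp sessionId Heartbeat
| table Message
| spath input=Message
| fields - action Message
| inputlookup append=true agentsessions2.csv
| dedup sessionId
| outputlookup agentsessions2.csv

The fresh results come first in the pipeline and the old lookup rows are appended after them, so dedup sessionId keeps the newest row for each session; outputlookup without append=true then writes the merged set back, replacing the file.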