Splunk Search

How do you hash an entire lookup file (detecting change to lookups)?

awmorris
Path Finder

I have several critical lookup files that I want to monitor to determine if they are altered in ANY capacity (via the Lookup Editor, the outputlookup command, the command line, etc.).

One idea I had was to call something like an MD5 function on the ENTIRE lookup file, but I can't seem to do that. My current method is to calculate the length of every field and sum them all up for a total byte count. That wouldn't detect a net-zero change in total bytes, but absent a better solution, it may be my best hope.

Ideas?

1 Solution

awmorris
Path Finder

SOLVED IT! Here is the query I landed on:

| inputlookup historyOfHashes.csv
| append [
    | inputlookup lookupFileToMonitor.csv
    | fieldsummary
    | eval foo=md5(values)
    | stats values(foo) AS foofoo
    | nomv foofoo
    | rex mode=sed field=foofoo "s/ //g"
    | eval finalfoo=md5(foofoo)
    | eval hashTimestamp=now()
    | convert ctime(hashTimestamp)
    | fields hashTimestamp, finalfoo]
| outputlookup historyOfHashes.csv
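For anyone who wants to reproduce the check outside Splunk, the same column-wise scheme can be sketched in Python (a hypothetical standalone helper, not part of the original answer): hash each field's sorted distinct values, then hash the sorted concatenation of those per-field hashes, mirroring what fieldsummary, stats values, and mdd5 do in the SPL above.

```python
import csv
import hashlib

def lookup_hash(path):
    """Mirror the SPL approach: md5 each column's distinct values,
    then md5 the concatenation of the per-column hashes."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return hashlib.md5(b"").hexdigest()
    field_hashes = []
    for field in rows[0]:
        # fieldsummary reports each field's distinct values
        distinct = sorted({row[field] for row in rows})
        field_hashes.append(
            hashlib.md5(" ".join(distinct).encode()).hexdigest())
    # stats values() returns hashes in sorted order, so sort before joining
    return hashlib.md5("".join(sorted(field_hashes)).encode()).hexdigest()
```

Any change to any cell in the CSV changes at least one per-column hash and therefore the final hash.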



woodcock
Esteemed Legend

This would make a GREAT project for an intern: write a custom Splunk command. Python can trivially compute the MD5 of a file. It would probably take only a week to get done (the Python itself is under 20 lines) and would give your team a powerful tool for future supra-Splunk solutions. That is what I would do (and I might even put one of my people on this task as a learning exercise).
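As a rough sketch of what the core of such a command might look like (a hypothetical helper, not an actual shipped Splunk command), the Python really is short: read the lookup file in chunks and feed it to hashlib, so even large lookups never have to fit in memory.

```python
import hashlib

def md5_of_file(path, chunk_size=8192):
    """Compute the MD5 of a file's raw bytes, reading in chunks
    so large lookup files are hashed without loading them whole."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Turning this into a real custom search command would mean wrapping it with the Splunk Python SDK's command framework; that plumbing, plus testing, is where the rest of the week goes.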

0 Karma


awmorris
Path Finder

Note - this really only works well if there are fewer than 500 distinct values in each field, since fieldsummary caps how many distinct values it reports per field... but there is enough direction here that you can work with the idea yourself.


DalJeanis
SplunkTrust
SplunkTrust

Depending on the size of the lookup table, you might consider just making a summary index that duplicates the contents of the lookup.

This way, you can use set diff to compare the most recently saved copy from the index against the current lookup; if there are any changes, alert ... and save a new copy to the summary index.

Seems like you'd just need the summary events to contain all the lookup fields, plus _time and the lookup file name. You'd compare against all the lookup fields for the highest _time in the index for that lookup name. I'd make the lookup name an indexed field in that summary index, for obvious reasons.


DalJeanis
SplunkTrust
SplunkTrust

Something like this...

| set diff
    [ search index=summaryfoo lookup=mylookup
        [ search index=summaryfoo lookup=mylookup | head 1 | table _time ]
      | fields - _time lookup ]
    [| inputlookup mylookup ]
| stats count
| where count > 0

awmorris
Path Finder

For the benefit of anyone else, here is the query I am using to determine the total byte count (the result is the total bytes in each column):

| inputlookup myLookupDefinition | foreach * [eval len_'<>'=len('<>')] | stats sum(len_*)

The benefit is that the addition or subtraction of a single byte will trigger an alert... but I do prefer a hash of the entire table. 😞
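A quick Python illustration (with made-up data) of why the byte count is weaker than a hash: two files can differ in content while their total length stays the same, so the length check stays quiet while a hash catches the change.

```python
import hashlib

# Same columns, same total byte count -- the roles are merely swapped.
a = "user,role\nalice,admin\nbob,viewer\n"
b = "user,role\nalice,viewer\nbob,admin\n"

same_length = len(a) == len(b)  # byte-count check: no alert
same_hash = (hashlib.md5(a.encode()).hexdigest()
             == hashlib.md5(b.encode()).hexdigest())
print(same_length, same_hash)  # True False -> only the hash sees the change
```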
