Splunk Search

How do you hash an entire lookup file (detecting change to lookups)?

awmorris
Path Finder

I have several critical lookup files that I want to monitor to determine if they are altered in ANY capacity (via the Lookup Editor, the outputlookup command, the command line, etc.).

One idea I had was to call something like an MD5 function on the ENTIRE lookup file, but I can't seem to do that. My current method is to calculate the length of every field and sum them all up for a total byte count. That wouldn't detect a net-zero change in total bytes, but absent a better solution, it may be my best hope.

Ideas?

1 Solution

awmorris
Path Finder

SOLVED IT! Here is the query I landed on:

| inputlookup historyOfHashes.csv
| append [
    | inputlookup lookupFileToMonitor.csv
    | fieldsummary
    | eval foo=md5(values)
    | stats values(foo) AS foofoo
    | nomv foofoo
    | rex mode=sed field=foofoo "s/ //g"
    | eval finalfoo=md5(foofoo)
    | eval hashTimestamp=now()
    | convert ctime(hashTimestamp)
    | fields hashTimestamp, finalfoo]
| outputlookup historyOfHashes.csv
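For anyone who wants to reproduce the check outside Splunk, the same column-wise scheme can be sketched in Python (a hypothetical standalone helper, not part of the original answer): hash each field's sorted distinct values, then hash the sorted concatenation of those per-field hashes, mirroring what fieldsummary, stats values, and mdd5 do in the SPL above.

```python
import csv
import hashlib

def lookup_hash(path):
    """Mirror the SPL approach: md5 each column's distinct values,
    then md5 the concatenation of the per-column hashes."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return hashlib.md5(b"").hexdigest()
    field_hashes = []
    for field in rows[0]:
        # fieldsummary reports each field's distinct values
        distinct = sorted({row[field] for row in rows})
        field_hashes.append(
            hashlib.md5(" ".join(distinct).encode()).hexdigest())
    # stats values() returns hashes in sorted order, so sort before joining
    return hashlib.md5("".join(sorted(field_hashes)).encode()).hexdigest()
```

Any change to any cell in the CSV changes at least one per-column hash and therefore the final hash.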



woodcock
Esteemed Legend

This would make a GREAT project for an intern: write a custom Splunk command. Python can trivially compute the MD5 of a file. It would probably take only a week to get done (the Python itself is under 20 lines) and would give your team a powerful tool for future supra-Splunk solutions. That is what I would do (and I might even put one of my people on this task as a learning exercise).
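As a rough sketch of what the core of such a command might look like (a hypothetical helper, not an actual shipped Splunk command), the Python really is short: read the lookup file in chunks and feed it to hashlib, so even large lookups never have to fit in memory.

```python
import hashlib

def md5_of_file(path, chunk_size=8192):
    """Compute the MD5 of a file's raw bytes, reading in chunks
    so large lookup files are hashed without loading them whole."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Turning this into a real custom search command would mean wrapping it with the Splunk Python SDK's command framework; that plumbing, plus testing, is where the rest of the week goes.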

0 Karma


awmorris
Path Finder

Note - this really only works well if there are fewer than 500 distinct values in each field, since fieldsummary caps how many distinct values it reports per field... but there is enough direction here that you can work with the idea yourself.


DalJeanis
SplunkTrust
SplunkTrust

Depending on the size of the lookup table, you might consider just making a summary index that duplicates the contents of the lookup.

This way, you can use set diff to compare the most recently saved copy from the index against the current lookup; if there are any changes, alert ... and save a new copy to the summary index.

Seems like you'd just need the summary events to contain all the lookup fields, plus _time and the lookup file name. You'd compare against all the lookup fields for the highest _time in the index for that lookup name. I'd make the lookup name an indexed field in that summary index, for obvious reasons.


DalJeanis
SplunkTrust
SplunkTrust

Something like this...

| set diff
    [ search index=summaryfoo lookup=mylookup
        [ search index=summaryfoo lookup=mylookup | head 1 | table _time ]
      | fields - _time lookup ]
    [| inputlookup mylookup ]
| stats count
| where count > 0

awmorris
Path Finder

For the benefit of anyone else, here is the query I am using to determine the total byte count (the result is the total bytes in each column):

| inputlookup myLookupDefinition | foreach * [eval len_'<>'=len('<>')] | stats sum(len_*)

The benefit is that the addition or subtraction of a single byte will trigger an alert... but I do prefer a hash of the entire table. 😞
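A quick Python illustration (with made-up data) of why the byte count is weaker than a hash: two files can differ in content while their total length stays the same, so the length check stays quiet while a hash catches the change.

```python
import hashlib

# Same columns, same total byte count -- the roles are merely swapped.
a = "user,role\nalice,admin\nbob,viewer\n"
b = "user,role\nalice,viewer\nbob,admin\n"

same_length = len(a) == len(b)  # byte-count check: no alert
same_hash = (hashlib.md5(a.encode()).hexdigest()
             == hashlib.md5(b.encode()).hexdigest())
print(same_length, same_hash)  # True False -> only the hash sees the change
```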
