Solved: How to compare the row count of CSV or XLS files d...

meghnak · ‎03-22-2017

How to compare the row count of CSV or XLS files day over day and generate an alert if the threshold is greater than 10%?

I want to compare row count of current day file vs previous day file and generate an alert for threshold.
Basically the number of records in the files day over day should be within the threshold of 10%. How to achieve this via Splunk?

DalJeanis · ‎03-22-2017

The exact code is going to depend on where you are putting the output. Here's one way, assuming that some other process is generating the file and you can determine what the names were...

your search for todays file
| stats count as todayscount 
| append [| inputcsv yesterdaysfile.csv | stats count as yesterdayscount]
| eval change= 100*abs(todayscount-yesterdayscount))/yesterdayscount
| where change>10

Or, perhaps this way if you want to send the alert out of the actual report that creates the file, and use the same file name every day...

 your search that creates todays file
 | eventstats count as todayscount 
 | append [| inputcsv thefile.csv | stats count as yesterdayscount]
 | appendpipe [| table ...list each field you DO want on the file... | outputcsv thefile.csv | where false() ]
 | stats max(todayscount) as todayscount max(yesterdayscount) as yesterdayscount
 | eval change= abs(100*(todayscount-yesterdayscount)/yesterdayscount)
 | where change>10

View solution in original post

DalJeanis · ‎03-22-2017

The exact code is going to depend on where you are putting the output. Here's one way, assuming that some other process is generating the file and you can determine what the names were...

your search for todays file
| stats count as todayscount 
| append [| inputcsv yesterdaysfile.csv | stats count as yesterdayscount]
| eval change= 100*abs(todayscount-yesterdayscount))/yesterdayscount
| where change>10

Or, perhaps this way if you want to send the alert out of the actual report that creates the file, and use the same file name every day...

 your search that creates todays file
 | eventstats count as todayscount 
 | append [| inputcsv thefile.csv | stats count as yesterdayscount]
 | appendpipe [| table ...list each field you DO want on the file... | outputcsv thefile.csv | where false() ]
 | stats max(todayscount) as todayscount max(yesterdayscount) as yesterdayscount
 | eval change= abs(100*(todayscount-yesterdayscount)/yesterdayscount)
 | where change>10

meghnak · ‎03-23-2017

So, suppose the file name will be X at the C:\Files location\X and the previous day file will be at C:\Files\Archive\X_previousdate location.

How do I pull yesterday's file from the C:\Files\Archive location as it will have files for all previous day. How to pull the latest file from that location

DalJeanis · ‎03-23-2017

Are these files being ingested and indexed by splunk?

Come to think of it, if you only need the count from yesterday, there's no sense keeping the whole file. Try this...

  your search that creates todays file
  | append [| inputcsv thefile.csv | table yesterdayscount]
  | appendpipe [| where isnull(yesterdayscount) | stats count as todayscount]
  | stats max(todayscount) as todayscount max(yesterdayscount) as yesterdayscount
  | appendpipe [| table todayscount | rename todayscount as yesterdayscount | outputcsv thefile.csv | where false() ]      
  | eval change= abs(100*(todayscount-yesterdayscount)/yesterdayscount)
  | where change>10

meghnak · ‎03-30-2017

Is there a command which can get results for last two events? That would be helpful

There are two files at C:\Temp
File 1: X.csv
File 2: X_20170330.csv(basically y'day's file)

I want to get these two file searched and compare the row count of them after which I can use the
| eval change= abs(100*(todayscount-yesterdayscount)/yesterdayscount)
| where change>10 command
I am stuck on it since many days, any guidance will be a great help

How to compare the row count of CSV or XLS files day over day and generate an alert if the threshold is greater than 10%?

Introducing Splunk Enterprise 9.2

Adoption of RUM and APM at Splunk

Routing logs with Splunk OTel Collector for Kubernetes