Splunk Search

Why is the "diff" search command not reliable for large events containing several hundred lines?

hexx
Splunk Employee

When I use the "diff" search command to compare events that contain several hundred lines, I notice that differences located at "the bottom" of the event (after about 500 lines of event content) are not picked up.

Why is that?

Is there a way to circumvent this limitation?


hexx
Splunk Employee

When used from the Search app, the diff search command calls a Python script located in $SPLUNK_HOME/etc/apps/search/bin/diff.py which uses the difflib Python library.

As it ships with Splunk, this script truncates its input at 9,000 characters:


# less $SPLUNK_HOME/etc/apps/search/bin/diff.py

# Copyright (C) 2005-2010 Splunk Inc.  All Rights Reserved.  Version 4.0
import sys,splunk.Intersplunk
import difflib,time
import splunk.mining.dcutils as dcu

logger = dcu.getLogger()

##  COMPARE TWO RESULTS
##  ARGS [pos1 pos2] [attribute to compare]
##
##  DEFAULTS = 1 2 _raw
##

(...)

maxlen = 9000

(...)

  if len(val1) > maxlen or len(val2) > maxlen:
      # cut text off at maxlen
      val1 = val1[:maxlen]
      val2 = val2[:maxlen]

(...)
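To see why differences near the bottom of a long event go unnoticed, here is a minimal standalone sketch (not the shipped diff.py) that applies the same truncation as the excerpt above and then compares the two strings line by line with difflib.ndiff. The helper name compare_truncated and the sample events are hypothetical.

```python
import difflib


def compare_truncated(val1, val2, maxlen=9000):
    """Mimic the truncation in diff.py, then diff the two values line by line.

    Hypothetical helper: returns only the changed lines, i.e. those that
    difflib.ndiff prefixes with '- ' or '+ '.
    """
    if len(val1) > maxlen or len(val2) > maxlen:
        # cut text off at maxlen, exactly as in the excerpt
        val1 = val1[:maxlen]
        val2 = val2[:maxlen]
    diff = difflib.ndiff(val1.splitlines(), val2.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Two 600-line events (about 17,400 characters) that differ only in their last line.
lines = ["line %03d: some event content" % i for i in range(600)]
event1 = "\n".join(lines)
event2 = "\n".join(lines[:-1] + ["line 599: CHANGED"])

print(compare_truncated(event1, event2))                # → [] (change is past the cap)
print(compare_truncated(event1, event2, maxlen=30000))  # → ['- line 599: some event content', '+ line 599: CHANGED']
```

With the default cap, both events are cut to the same first 9,000 characters, so difflib sees two identical inputs and reports nothing.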

This limitation is explained in the Search Reference Manual as being roughly equivalent to 500 lines:

http://www.splunk.com/base/Documentation/latest/SearchReference/Diff#Description
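The "roughly 500 lines" figure assumes fairly short lines: 9,000 characters spread over 500 lines is 18 characters per line, newline included. The hypothetical helper below counts how many lines of a given event fall entirely within the cap, i.e. how much of it diff actually compares. It is a sketch that assumes "\n" line endings.

```python
def lines_within_cap(text, maxlen=9000):
    """Count the lines of `text` that survive a cut at `maxlen` characters intact.

    Hypothetical helper: a line counts only if its last character lies
    within the first `maxlen` characters; a line sliced mid-way does not.
    """
    kept = 0
    end = 0  # index just past the current line's last character
    for line in text.split("\n"):
        end += len(line)
        if end > maxlen:
            break  # this line is cut (and every later line is dropped)
        kept += 1
        end += 1  # account for the "\n" separator
    return kept


short = "\n".join(["x" * 17] * 600)  # 17 characters + newline = 18 per line
event = "\n".join("line %03d: some event content" % i for i in range(600))  # 28 + newline

print(lines_within_cap(short))         # → 500
print(lines_within_cap(event))         # → 310
print(lines_within_cap(event, 12000))  # → 413
```

In other words, events with longer lines hit the cap well before line 500.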

You can raise the value of "maxlen" to let diff.py compare larger events. The best way to do this is to make a copy of diff.py (so that your version isn't overwritten during a Splunk upgrade), increase "maxlen" in the copy according to your needs, and declare the copy in commands.conf so that it replaces "diff".

Example:

  • cp $SPLUNK_HOME/etc/apps/search/bin/diff.py $SPLUNK_HOME/etc/apps/search/bin/mydiff.py
  • vi $SPLUNK_HOME/etc/apps/search/bin/mydiff.py
  • change the line "maxlen = 9000" to, for example, "maxlen = 12000"
  • vi $SPLUNK_HOME/etc/apps/search/local/commands.conf
  • add the following stanza:

[diff]
filename = mydiff.py
supports_getinfo = true
enableheader = false
retainsevents = true
changes_colorder = false
overrides_timeorder = true

When you invoke "diff" in the Search app, comparisons will now be limited to 12,000 characters per event instead of 9,000.
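If you need to redo the copy-and-edit steps after each upgrade, they can be scripted. The sketch below assumes, as in the excerpt, that diff.py contains a single top-level "maxlen = <number>" line; the function names patch_maxlen and make_mydiff are hypothetical. It only creates mydiff.py; the commands.conf stanza still has to be added by hand.

```python
import os
import re


def patch_maxlen(source, new_maxlen):
    """Return `source` with its first top-level 'maxlen = <n>' line set to new_maxlen."""
    patched, count = re.subn(
        r"^maxlen\s*=\s*\d+",  # anchored at column 0, so indented uses of maxlen are untouched
        "maxlen = %d" % new_maxlen,
        source,
        count=1,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("no top-level 'maxlen = <number>' line found; edit the copy by hand")
    return patched


def make_mydiff(splunk_home, new_maxlen, name="mydiff.py"):
    """Copy search/bin/diff.py to search/bin/<name> with a new maxlen value."""
    bin_dir = os.path.join(splunk_home, "etc", "apps", "search", "bin")
    with open(os.path.join(bin_dir, "diff.py")) as f:
        source = f.read()
    target = os.path.join(bin_dir, name)
    with open(target, "w") as f:
        f.write(patch_maxlen(source, new_maxlen))
    return target
```

For example, make_mydiff(os.environ["SPLUNK_HOME"], 12000) would write $SPLUNK_HOME/etc/apps/search/bin/mydiff.py. Raising an error when no match is found is deliberate: if a future diff.py changes how maxlen is set, you find out instead of silently shipping an unmodified copy.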

CAUTION: This cap exists to prevent the "diff" command from consuming excessive amounts of memory, for example if you feed it tens of thousands of very long events. Be aware that raising this limit puts your system's resources at risk.


Marinus
Communicator

I'd make a copy of diff.py and add a new "maxlen" option to it, so that you can tweak the limit as you need to. You can add the option as follows:

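import sys  # diff.py already imports sys

# poor man's option parsing; place this after the "maxlen = 9000" line in your copy
for a in sys.argv[1:]:
    if a.startswith("maxlen="):
        # convert to int, otherwise "len(val1) > maxlen" compares an int with a string
        maxlen = int(a.split("=", 1)[1])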
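A slightly more defensive variant of this option parsing, as a sketch: the hypothetical helper split_maxlen_option below validates the value and also removes maxlen=<n> from the argument list. That second part rests on an assumption: that diff.py would otherwise try to read the extra argument as one of its positional arguments ([pos1 pos2] [attribute to compare]).

```python
def split_maxlen_option(args, default=9000):
    """Pull a maxlen=<n> option out of args (hypothetical helper).

    Returns (maxlen, remaining_args) so the positional arguments can be
    parsed as before, without the maxlen option mixed in.
    """
    maxlen = default
    remaining = []
    for a in args:
        if a.startswith("maxlen="):
            value = a.split("=", 1)[1]
            # reject empty, negative, non-numeric or zero values early
            if not value.isdigit() or int(value) <= 0:
                raise ValueError("maxlen must be a positive integer, got %r" % value)
            maxlen = int(value)
        else:
            remaining.append(a)
    return maxlen, remaining


print(split_maxlen_option(["1", "2", "maxlen=20000", "_raw"]))  # → (20000, ['1', '2', '_raw'])
print(split_maxlen_option(["1", "2"]))                          # → (9000, ['1', '2'])
```

In the copied script you would call something like maxlen, args = split_maxlen_option(sys.argv[1:]) and let the rest of the script work from args. The part of diff.py that reads its arguments isn't shown in this thread, so check your copy before wiring this in.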

