I have a scripted input that checks disk space used by directories (--max-depth=1 though!).
So example output looks like this:
Bytes Path
40395574 /tmp
0 /net
4952729895 /usr
134266341 /dev
9517345 /bin
39047719 /etc
1124063352 /opt
4363858083 /esupport
4096 /selinux
28472166 /lib64
4096 /srv
8655 /.dbus
4096 /cgroup
1064258524604 /mnt
16384 /lost+found
4096 /media
1333026929 /var
0 /misc
9788227 /home
448967744 /lib
18931889 /sbin
457042 /root
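The original script isn't shown in the thread, so here is a hypothetical Python stand-in that produces the same "Bytes Path" layout. It is only a sketch: the function names are made up, sizes come from summing os.lstat() st_size values rather than from du (so the numbers won't match du exactly; du also counts the directory entries themselves, for example), and pseudo-filesystems such as /proc are not skipped, which is why it defaults to the current directory instead of /. The one design choice worth noting is that it builds the whole report before writing anything, so no partial output is emitted while a large directory is still being totaled.

```python
import os
import sys
from typing import Iterable, List, Tuple


def tree_bytes(path: str) -> int:
    """Sum the apparent sizes (st_size) of files below `path`.

    Symlinked directories are not followed and unreadable entries are skipped.
    This approximates, but does not reimplement, du.
    """
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk does not follow symlinks by default
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                continue  # file vanished or permission denied
    return total


def format_report(rows: Iterable[Tuple[int, str]]) -> str:
    """Render (bytes, path) rows in the 'Bytes Path' layout shown above."""
    lines: List[str] = ["Bytes Path"]
    lines.extend(f"{size} {path}" for size, path in rows)
    return "\n".join(lines) + "\n"


def usage_report(top: str) -> str:
    """Total each immediate subdirectory of `top` (the --max-depth=1 view)."""
    with os.scandir(top) as it:
        subdirs = sorted((e for e in it if e.is_dir(follow_symlinks=False)),
                         key=lambda e: e.name)
    return format_report((tree_bytes(e.path), e.path) for e in subdirs)


if __name__ == "__main__":
    top = sys.argv[1] if len(sys.argv) > 1 else "."
    # Compute everything first, then emit the report in a single write.
    sys.stdout.write(usage_report(top))
    sys.stdout.flush()
```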
Initially, all the lines showed up in one event. I didn't have anything in props.conf that would break each line into its own event at search time. (I later did that on a test machine and it worked fine.)
I then put the script on a production machine and noticed that separate events were being created, even though I had not put anything into props.conf for the sourcetype.
I thought maybe it was a special character in a directory name, or some odd whitespace character, causing Splunk to interpret the output as separate events.
I wanted to see what running the script looked like on the production system and noticed that it stalled. I knew this was the result of recursively adding up the usage of the many subdirectories below the directory being measured.
So what I end up with is three separate events, all with the same timestamp, with each break falling at the point where the script "stalled" while calculating space.
5/1/19 2:28:46.000 PM
40395574 /tmp
0 /net
4952729895 /usr
134266341 /dev
9517345 /bin
39047719 /etc
1124063352 /opt
4363858083 /esupport
4096 /selinux
28472166 /lib64
4096 /srv
8655 /.dbus
4096 /cgroup

5/1/19 2:28:46.000 PM
1064258524604 /mnt
16384 /lost+found

5/1/19 2:28:46.000 PM
1333026929 /var
0 /misc
9788227 /home
448967744 /lib
18931889 /sbin
457042 /root
How do I prevent Splunk from splitting the output of this script into multiple events (before I have a chance to split them into individual events with a props.conf config)?
Try these settings in inputs.conf:

time_before_close = <integer>
* The amount of time, in seconds, that the file monitor must wait for modifications before closing a file after reaching an End-of-File (EOF) marker.
* Tells the input not to close files that have been updated in the past 'time_before_close' seconds.
* Default: 3.

multiline_event_extra_waittime = <boolean>
* By default, the file monitor sends an event delimiter when:
  * It reaches EOF of a file it monitors and
  * The last character it reads is a newline.
* In some cases, it takes time for all lines of a multiple-line event to arrive.
* Set to "true" to delay sending an event delimiter until the time that the file monitor closes the file, as defined by the 'time_before_close' setting, to allow all event lines to arrive.
* Default: false.
I really thought you had my solution. I put these into place and it did not change how Splunk handled the data.
So I looked up the options, and it seems they apply only to monitor:// stanzas, not script:// stanzas.
Okay... it may not be the "stall" time at all. I copied and pasted the data into a text file and uploaded it to Splunk. It still breaks the events at the same locations.
So I thought it was related to the numbers (possibly being interpreted as timestamp data?). I played around with the data and, sure enough, I could cause event breaks to occur when the number of bytes is large.
Even when I set DATETIME_CONFIG = NONE in props.conf, the events still break at the large byte-count locations.
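A quick sanity check on the timestamp theory: the two byte counts that start the second and third events, 1064258524604 (/mnt) and 1333026929 (/var), both decode to plausible dates if read as Unix epoch milliseconds and seconds respectively. The sketch assumes 13-digit values are milliseconds and shorter ones are seconds; that rule is purely illustrative and is not Splunk's timestamp recognizer. Note that 1124063352 (/opt) also decodes to a 2005 date without causing a break, so whatever Splunk does must involve more than this; the arithmetic only shows why the theory is plausible.

```python
from datetime import datetime, timezone


def as_epoch(value: int) -> datetime:
    """Interpret an integer as a Unix epoch timestamp in UTC.

    Illustrative assumption: 13-digit values are milliseconds, anything
    shorter is seconds. This is NOT how Splunk recognizes timestamps.
    """
    seconds = value / 1000 if len(str(value)) == 13 else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    for size, path in [(1064258524604, "/mnt"), (1333026929, "/var"), (1124063352, "/opt")]:
        print(f"{size} {path} -> {as_epoch(size):%Y-%m-%d %H:%M:%S} UTC")
    # 1064258524604 /mnt -> 2003-09-22 19:22:04 UTC
    # 1333026929 /var -> 2012-03-29 13:15:29 UTC
    # 1124063352 /opt -> 2005-08-14 23:49:12 UTC
```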
Try this in props.conf on your Indexers/HFs:
[YourSourcetypeHere]
DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
LINE_BREAKER = (?!)
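To see why those two settings keep everything together, here is a deliberately simplified Python model of the two stages they control; it is a toy, not Splunk's code, and split_events and its parameters are hypothetical. Stage one splits the raw text wherever LINE_BREAKER matches, dropping the text of the first capturing group. Stage two, only when SHOULD_LINEMERGE is true, glues each line onto the previous event unless that line "starts a new event"; in Splunk that decision is driven by settings such as BREAK_ONLY_BEFORE_DATE, but here it is just a caller-supplied predicate. Because (?!) can never match, stage one never splits, so the whole chunk stays one event.

```python
import re
from typing import Callable, List


def split_events(raw: str,
                 line_breaker: str = r"([\r\n]+)",
                 should_linemerge: bool = False,
                 starts_new_event: Callable[[str], bool] = lambda line: False) -> List[str]:
    """Toy model of LINE_BREAKER followed by optional line merging."""
    pattern = re.compile(line_breaker)
    lines: List[str] = []
    pos = 0
    for m in pattern.finditer(raw):
        # The first capturing group is treated as the delimiter and dropped;
        # if the regex has no group, the whole match is dropped instead.
        start, end = m.span(1) if pattern.groups else m.span()
        if start == end:
            continue  # ignore zero-width or non-participating matches
        lines.append(raw[pos:start])
        pos = end
    lines.append(raw[pos:])
    lines = [line for line in lines if line.strip()]

    if not should_linemerge:
        return lines

    # Merge each line into the previous event unless the predicate flags it.
    events: List[str] = []
    for line in lines:
        if events and not starts_new_event(line):
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


if __name__ == "__main__":
    raw = ("40395574 /tmp\n0 /net\n1064258524604 /mnt\n"
           "16384 /lost+found\n1333026929 /var\n0 /misc\n")
    # LINE_BREAKER = (?!) never matches, so the whole chunk is one event.
    print(len(split_events(raw, line_breaker=r"(?!)")))  # 1
    # Break on newlines with no merging: one event per line.
    print(len(split_events(raw)))  # 6
    # Hypothetical predicate flagging the two lines where the thread saw breaks.
    flagged = {"/mnt", "/var"}
    print(len(split_events(raw, should_linemerge=True,
                           starts_new_event=lambda line: line.split()[-1] in flagged)))  # 3
```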
Deploy this to the first full instance of Splunk that handles the data, restart Splunk there, send in NEW events (old events will stay wrong), and search using _index_earliest=-5m to be sure that you are only seeing recently indexed events.
I am hoping that I can accomplish this at search time, not index time, so I did not put this on the indexers.
I changed the props.conf file on the search head where I do the searches for this data and it had no effect.
No, these settings are ONLY for indexers. I will give you another answer for search time, but it will not be satisfactory.
index=YouShouldAlwaysSpecifyAnIndex AND sourcetype=AndSourcetypeToo | stats list(_raw) AS _raw BY _time host
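For readers less familiar with stats, the grouping that search performs can be pictured with a small Python analogue: every event sharing the same _time and host ends up in one row, and list(_raw) keeps each event's raw text as a separate value of a multivalue field rather than concatenating them into one string. The field names mirror Splunk's, but the function, the host name, and the sample events are hypothetical; this models the shape of the result, not Splunk's implementation (it ignores result ordering and any cap on how many values list() keeps).

```python
from typing import Dict, List, Tuple

Event = Dict[str, str]


def stats_list_raw_by_time_host(events: List[Event]) -> Dict[Tuple[str, str], List[str]]:
    """Rough analogue of: | stats list(_raw) AS _raw BY _time host"""
    rows: Dict[Tuple[str, str], List[str]] = {}
    for event in events:
        key = (event["_time"], event["host"])
        # Each event's _raw is appended as its own value; nothing is merged.
        rows.setdefault(key, []).append(event["_raw"])
    return rows


if __name__ == "__main__":
    ts = "5/1/19 2:28:46.000 PM"
    # "prod01" is a made-up host name for illustration.
    events = [
        {"_time": ts, "host": "prod01", "_raw": "40395574 /tmp\n0 /net"},
        {"_time": ts, "host": "prod01", "_raw": "1064258524604 /mnt\n16384 /lost+found"},
        {"_time": ts, "host": "prod01", "_raw": "1333026929 /var\n0 /misc"},
    ]
    for (time, host), raws in stats_list_raw_by_time_host(events).items():
        print(time, host, len(raws), "values")
```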
That search produced one list of all the directories at the timestamp shown in my previous post. However, when I held the mouse over the directory items in the list(_raw) column, the background color changed for groups of directories that exactly matched how they are "separated" into different events.