I have a scripted input that reports the disk space used by directories (only with --max-depth=1, though).
Example output looks like this:
Bytes                  Path
40395574               /tmp
0                      /net
4952729895             /usr
134266341              /dev
9517345                /bin
39047719               /etc
1124063352             /opt
4363858083             /esupport
4096                   /selinux
28472166               /lib64
4096                   /srv
8655                   /.dbus
4096                   /cgroup
1064258524604          /mnt
16384                  /lost+found
4096                   /media
1333026929             /var
0                      /misc
9788227                /home
448967744              /lib
18931889               /sbin
457042                 /root
Initially, all the lines showed up in one event. I didn't have anything in the props file that would break each line into its own event at search time. (I later did that for a test machine and it worked fine)
I then proceeded to put the script on a production machine and noticed that separate events were being created even though I did not put anything into the props.conf for the source type.
I thought maybe it was a special character in a directory file, or some odd white space character causing Splunk to interpret the output as separate events.
I wanted to see what running the script looked like on the production system and noticed that it stalled partway through. I knew this was the result of recursively adding up the usage of the many sub-directories below that point.
So what I end up with are three separate events, all with the same timestamp, with the breaks falling at the points in time where the script "stalled" while calculating space.
 5/1/19 2:28:46.000 PM
 40395574               /tmp
 0                      /net
 4952729895             /usr
 134266341              /dev
 9517345                /bin
 39047719               /etc
 1124063352             /opt
 4363858083             /esupport
 4096                   /selinux
 28472166               /lib64
 4096                   /srv
 8655                   /.dbus
 4096                   /cgroup
5/1/19 2:28:46.000 PM
1064258524604          /mnt
16384                  /lost+found
5/1/19 2:28:46.000 PM
1333026929             /var
0                      /misc
9788227                /home
448967744              /lib
18931889               /sbin
457042                 /root
How do I prevent Splunk from splitting the output of this script into multiple events (before I have a chance to split them into individual events with a props.conf config)?
Thanks
Without giving a very specific reason for why I was having this problem, Splunk support asked me to make this change to the props.conf file on the indexer:
BREAK_ONLY_BEFORE = Bytes
I was reluctant because I wanted search-time changes to the data, not index-time changes. In the end I had to do it, as they told me there was no other option.
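For anyone who finds this later, here is a minimal sketch of how that setting might sit in a props.conf stanza. The stanza name is a placeholder, not the real sourcetype. BREAK_ONLY_BEFORE only applies when line merging is on, so SHOULD_LINEMERGE is spelled out here even though true is its default. This also assumes the "Bytes" header line is actually present in the script output.

```
# props.conf on the indexer (or the first full Splunk instance that parses the data)
# [YourSourcetypeHere] is a placeholder; use the sourcetype set on the scripted input
[YourSourcetypeHere]
# BREAK_ONLY_BEFORE is only honored when line merging is enabled
SHOULD_LINEMERGE = true
# Start a new event only at a line matching this regex (the "Bytes  Path" header)
BREAK_ONLY_BEFORE = Bytes
```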
Try this:
index=YouShouldAlwaysSpecifyAnIndex AND sourcetype=AndSourcetypeToo
| stats list(_raw) AS _raw BY _time host
That search produced one list of all the directories at the timestamp shown in my previous post. However, when I hold the mouse over the directory items in the list(_raw) column, the background color changes for groups of directories that exactly match how they were "separated" into different events.
OK, so add this to the end:
... | nomv _raw
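Put together, the whole search-time workaround would look roughly like the sketch below. The index and sourcetype names are the same placeholders as above, not real values. The stats command gathers the fragments that share a _time and host into one multivalue _raw, and nomv then flattens that multivalue field into a single newline-separated value.

```
index=YouShouldAlwaysSpecifyAnIndex AND sourcetype=AndSourcetypeToo
| stats list(_raw) AS _raw BY _time host
| nomv _raw
```

This only changes what the search returns; the events stay split in the index.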
Try this in props.conf on your Indexers/HFs:
[YourSourcetypeHere]
DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
LINE_BREAKER = (?!)
Deploy to the first full instance of Splunk that handles the data, restart Splunk there, send in NEW events (old events will stay wrong), and search using _index_earliest=-5m to be sure that you are only seeing recently indexed events.
I am hoping that I can accomplish this at search time, not index time. So I did not put this on the indexers.
I changed the props.conf file on the search head where I do the searches for this data and it had no effect.
No, these settings are ONLY for indexers. I will give you another answer for search-time but it will not be satisfactory.
Okay... it may not be the "stall" time at all. I just copied and pasted the data into a text file and uploaded it to Splunk. It still breaks the events at the same locations.
So I thought it was related to the numbers (possibly being interpreted as timestamp data?). I played around with the data, and sure enough, I can cause event breaks to occur where the number of bytes is large.
Even when I set DATETIME_CONFIG = NONE in the props file, the events still break at the large byte count locations.
Try these settings in inputs.conf:
time_before_close = <integer>
* The amount of time, in seconds, that the file monitor must wait for
  modifications before closing a file after reaching an End-of-File
  (EOF) marker.
* Tells the input not to close files that have been updated in the
  past 'time_before_close' seconds.
* Default: 3.
multiline_event_extra_waittime = <boolean>
* By default, the file monitor sends an event delimiter when:
  * It reaches EOF of a file it monitors and
  * The last character it reads is a newline.
* In some cases, it takes time for all lines of a multiple-line event to
  arrive.
* Set to "true" to delay sending an event delimiter until the time that the
  file monitor closes the file, as defined by the 'time_before_close' setting,
  to allow all event lines to arrive.
* Default: false.
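As a rough sketch, the two settings would go in an input stanza like this. The monitored path and the 10-second value are made-up placeholders for illustration, so adjust both to your input.

```
# inputs.conf on the forwarder that reads the file
# The path below is hypothetical
[monitor:///path/to/your/output.log]
# Keep the file open for 10 seconds after EOF instead of the default 3
time_before_close = 10
# Hold the event delimiter until the file is closed
multiline_event_extra_waittime = true
```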
I really thought you had my solution. I put these into place and it did not change how Splunk handled the data.
So I looked up the options, and it seems they apply only to monitor:// stanzas, not script:// stanzas.
