Getting Data In

How to keep data together as one event

hiddenkirby
Contributor

If I can pre-process the data (wrap it in tags or something), is there a good way to keep data that usually splits into multiple events together as one event?

1 Solution

hexx
Splunk Employee
Splunk Employee

Provided that the data you are trying to consolidate into a single event comes from the same file input and is adjacent (i.e., lines that follow each other in the source file), what you want to do is configure line breaking to merge those lines into a single event.

The general instructions regarding line breaking can be found in our online documentation:

http://www.splunk.com/base/Documentation/latest/Admin/Indexmulti-lineevents

If you can include tags as delimiters for your events, this will make things easier: you can inform Splunk of the delimiters by setting LINE_BREAKER (in props.conf) to an appropriate regex. From http://www.splunk.com/base/Documentation/latest/Admin/Propsconf :

LINE_BREAKER = <regular expression>
* Specifies a regex that determines how the raw text stream is broken into initial events, before line merging takes place. (See SHOULD_LINEMERGE)
* Defaults to ([\r\n]+), meaning data is broken into an event for each line, delimited by \r or \n. 
* The regex must contain a matching group. 
* Wherever the regex matches, the start of the first matching group is considered the end of the previous event, and the end of the first matching group is considered the start of the next event.
* The contents of the first matching group is ignored as event text.
* NOTE: There is a significant speed boost by using the LINE_BREAKER to delimit multiline events, rather than using line merging to reassemble individual lines into events.
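For example, if you can wrap each event in a tag pair such as `<event>...</event>` (the tag name and sourcetype below are illustrative, not from the original post), a props.conf stanza along these lines should break the stream on the boundaries between events:

```
[my_tagged_sourcetype]
# Break between a closing and an opening tag; the newlines captured by
# the first matching group are discarded, the tags stay with their events.
LINE_BREAKER = </event>([\r\n]+)<event>
# With event boundaries fully defined by LINE_BREAKER, line merging can
# be disabled for speed, per the NOTE above.
SHOULD_LINEMERGE = false
```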

There are other settings you may need to specify in your props.conf.

Make sure SHOULD_LINEMERGE is set to true:

SHOULD_LINEMERGE = true | false
* When set to true, Splunk combines several lines of data into a single event, based on the following configuration attributes.
* Defaults to true.

If you are trying to include more than 256 lines in a single event, make sure that you tweak MAX_EVENTS and TRUNCATE accordingly:

MAX_EVENTS = <integer>
* Specifies the maximum number of input lines to add to any event. 
* Splunk breaks after the specified number of lines are read.
* Defaults to 256.

TRUNCATE = <non-negative integer>
* Change the default maximum line length.  
* Set to 0 if you do not want truncation ever (very long lines are, however, often a sign of garbage data).
* Defaults to 10000.
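Putting the three settings together, a sketch of a stanza for events that may span up to a few thousand lines (the sourcetype name and the exact limits are illustrative):

```
[my_multiline_sourcetype]
# Reassemble individual lines into events...
SHOULD_LINEMERGE = true
# ...allowing up to 2000 lines per event instead of the default 256...
MAX_EVENTS = 2000
# ...and raising the truncation limit well above the default 10000.
TRUNCATE = 100000
```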


lguinn2
Legend

Update: check out this answer to the same question: Each File as One Single Splunk Event

[mysinglefilesourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ((*FAIL))
TRUNCATE = 99999999

I think this is newer information


ftk
Motivator

For a practical example of how to index entire files, have a look at this answer. The example indexes entire Splunk config files.

http://answers.splunk.com/questions/2882/using-fschange-to-monitor-windows-filesystem/3620#3620

hulahoop
Splunk Employee
Splunk Employee

Hello Hiddenkirby,

I think the easiest thing to do is preface the event with a well-formatted timestamp:

09-16-2010 11:41:00.000 PST my awesome event
that breaks
over multiple lines
09-16-2010 11:42:00.000 PST another very cool event
that breaks
over many lines

Then set line breaking rules in props.conf for your data source:

[my_sourcetype]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true

Actually, these are the default settings for any data source so you shouldn't have to add any configuration.

hulahoop
Splunk Employee
Splunk Employee

Lowell, that is an excellent suggestion.


Lowell
Super Champion

Sometimes it's nice to be explicit in your custom config files. It helps make it clear what behavior you expect, and protects you if the defaults ever change (due to a config screwup or otherwise).



hexx
Splunk Employee
Splunk Employee

You might find this Splunk Answer interesting, as it most certainly covers your use case: http://answers.splunk.com/questions/5426/entire-file-contents-as-a-single-event
