Problem with line breaks


I am trying to index a file that looks like the following:

1,"Location" 2,"Attack Type" 3,"Impact" 4,"Exploit" 5,"OSVDB" 6,"Solution" 7,"Disclosure"

I want to force Splunk to treat each line as a separate event. I have tried every combination I can think of for SHOULD_LINEMERGE, BREAK_ONLY_BEFORE, BREAK_ONLY_AFTER, and MAX_EVENTS. The best result I've had is with MAX_EVENTS = 1. The only problem is that the first two lines of the input file are always treated as one event.

What do I need to do to force Splunk to treat each line in the above file as a single event?



As per the original question, try to set


and don't specify any LINE_BREAKER, BREAK_ONLY_BEFORE, ... option. This way Splunk will break events at every new line.
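The setting named after "try to set" is not shown in the text above. A minimal sketch of a props.conf stanza that gives the behaviour described, assuming the intended setting was SHOULD_LINEMERGE = false and that the sourcetype is the osvdb_vulnerabilities one used later in this thread:

```ini
# props.conf
# Assumption: the stanza name is the sourcetype used elsewhere in this thread.
[osvdb_vulnerabilities]
# Assumed setting: with line merging off and no LINE_BREAKER override,
# the default LINE_BREAKER, ([\r\n]+), makes every line its own event.
SHOULD_LINEMERGE = false
```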

For the messy formats: those are basically not CSV then... If every event always starts with the two numbers, like:

18776,18776,"Apple Mac OS X A....

then try setting:

LINE_BREAKER= ([\r\n]+)\d+,\d+,

You might also want to play around with TIME_PREFIX and TIME_FORMAT to ensure Splunk marks the events with the correct timestamp.
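Put together, the suggestion might look like the sketch below. The stanza name is the sourcetype from this thread. The TIME_PREFIX, TIME_FORMAT and MAX_TIMESTAMP_LOOKAHEAD values are assumptions based on the sample events, where the first timestamp is the fourth field and the title contains no embedded double quotes. Treat them as a starting point, not a tested configuration.

```ini
# props.conf
[osvdb_vulnerabilities]
# Let LINE_BREAKER alone decide the event boundaries.
SHOULD_LINEMERGE = false
# Only the first capture group (the newline run) is discarded; the
# "digits,digits," that follows stays at the start of the next event.
LINE_BREAKER = ([\r\n]+)\d+,\d+,
# Assumption: skip the two IDs and the quoted title to reach the first timestamp.
TIME_PREFIX = ^\d+,\d+,"[^"]+","
# Assumption: timestamps look like 2009-12-06 13:30:58 (19 characters).
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
```

Note that a newline inside a field would still cause a break if the text right after it happens to look like two comma-separated numbers.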


Some of my inputs are still having problems. The file is supposed to be in a comma delimited format, but the contents of some columns contain \n and commas as part of the "data".

Some events are clean:

1080071,61191,"Cisco ASA Clientless SSL VPN URL Rewriting Cross Domain Same Origin Policy Bypass","2009-12-06 13:30:58","2009-12-19 02:04:43","1970-01-01 00:00:00","2006-06-08 00:00:00","1970-01-01 00:00:00","","","","",\N,"1970-01-01 00:00:00"

Some events are messy and have commas and backslashes in the body:

18776,18776,"Apple Mac OS X AppKit Error Condition Local Account Creation","2005-08-16 23:54:36","2010-11-02 06:59:20","1970-01-01 00:00:00","2005-08-16 23:54:39","1970-01-01 00:00:00","Mac OS X 10.3 - 10.4.2 AppKit Error Condition Local Account Creation","Mac OS X contains a flaw that may allow a malicious user to gain access to unauthorized privileges. The issue is triggered when an attacker triggers an error condition at the login screen which allows new accounts to be created. This flaw may lead to a loss of integrity.",\N,"Currently, there are no known workarounds or upgrades to correct this issue. However, Apple has released a patch to address this vulnerability.",\N,"1970-01-01 00:00:00"

I tried a number of different settings in the props.conf, but I couldn't find any combination of settings that would work:

[osvdb_vulnerabilities]
SHOULD_LINEMERGE = TRUE
MUST_BREAK_AFTER = ,\N,"\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d"


SHOULD_LINEMERGE = TRUE
BREAK_ONLY_BEFORE = \d+,\d+,"[^"]+","\d+-\d+-\d+\s\d+:\d+:\d+","\d+-\d+-\d+\s\d+:\d+:\d+","\d+-\d+-\d+\s\d+:\d+:\d+","\d+-\d+-\d+\s\d+:\d+:\d+","\d+-\d+-\d+\s\d+:\d+:\d"


[osvdb_vulnerabilities]
SHOULD_LINEMERGE = TRUE
BREAK_ONLY_BEFORE = ^\d+,\d+,"


[osvdb_vulnerabilities]
SHOULD_LINEMERGE = TRUE
BREAK_ONLY_BEFORE = ^\d+,\d+,"
AUTO_LINEMERGE = FALSE
MUST_BREAK_AFTER = \s00:00:00"

It seems like the \ character is causing Splunk to break the event. Is it possible to turn that off somehow?

I'm still getting events like this however:

\ http://[target]/admin/MembersAreaManager/components/SecurityLevelManager/upload_image_security_level.asp?cid=-12312312 union select 1,Security_AdminPassword,3,4,5,6 from tblConfig","1970-01-01 00:00:00"

Eventually I added the following transforms and I'm good to go:

[vulnerabilities_index]
REGEX = ^\d+,\d+,"[^"]+"
DEST_KEY = queue
FORMAT = indexQueue

[vulnerabilities_null]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

I only needed the first two fields out of the message anyway...
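For the two transforms to take effect, they also have to be referenced from props.conf. A minimal sketch, assuming the sourcetype is osvdb_vulnerabilities as in the stanzas above; the class name "set" is only an illustrative label:

```ini
# props.conf
[osvdb_vulnerabilities]
# Order matters: vulnerabilities_null first sends every event to nullQueue,
# then vulnerabilities_index routes events that start with
# digits,digits,"title" back to indexQueue.
# "set" is a hypothetical class name; any unique label works.
TRANSFORMS-set = vulnerabilities_null, vulnerabilities_index
```

With this ordering, the stray fragments that don't begin with the two numeric IDs and a quoted title are dropped, and only well-formed event starts are indexed.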


In the example you posted everything is on one single line. Is that correct or is it just a formatting problem in your post?

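That said, Splunk's defaults are SHOULD_LINEMERGE = true and LINE_BREAKER = ([\r\n]+). It first splits the input on every newline and then, because line merging is on, joins lines back together until it finds a line that starts a new event (by default, one containing a timestamp). If that's not the behaviour you're getting, I suspect you're using a sourcetype with other defaults set.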