Good times....good times. Splunk is refusing to index past 42 lines of my data regardless of what I do.
This is WC on my file.
5929 86549 2439613 report.csv
I have tried the following:
 1. Set linemerge to false.  This allowed Splunk to break my data down into separate events properly.  It is 5929 lines, one event per line.
 2. Set this item up as a monitor input on my universal forwarder.  42 great lines of everything OK, then CRUNK, it stops.
 3. Looked at the file - there are no special characters whatsoever other than EOL at the breaking point.
 4. Netcatted the file to a TCP port on the indexer itself.  Gave me 42 beautiful events before exploding.
 5.  Oneshot that thing.  No help.  
I KNOW that 42 is the meaning of life...but....but...
Well, I'm more than a bit curious. Either you've found a bug, or there is something wrong with the parsing of the timestamps, is my guess. Did you try searching 'All Time', just to make sure that the events are not in the index?
One thing that might be worth testing is to create a new test-index, just to ensure that there is nothing else in it.
Then import the data again into that index. Take note of any DateParserVerbose errors/warnings in splunkd.log.
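If eyeballing splunkd.log is tedious, a small filter can pull out just the timestamp-parser messages. This is only a sketch: the function names `date_parser_lines` and `scan_splunkd_log` are made up for illustration, and it assumes nothing about the log format beyond the component name DateParserVerbose appearing somewhere on the line.

```python
from pathlib import Path


def date_parser_lines(lines, component="DateParserVerbose"):
    """Return the log lines that mention the given splunkd component."""
    return [line.rstrip("\n") for line in lines if component in line]


def scan_splunkd_log(path):
    """Read a splunkd.log file and keep only the DateParserVerbose lines."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return date_parser_lines(handle)
```

On a default install the log normally lives at $SPLUNK_HOME/var/log/splunk/splunkd.log, so something like scan_splunkd_log("/opt/splunk/var/log/splunk/splunkd.log") would be the usual call; adjust the path to your own setup.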
Have a look at the Manager -> indexes. How many events are there in the new index?
Take a close look at the events you see in the search app. Are the (correct) timestamps parsed correctly? In some cases, where the timestamp in an event is ambiguous/partly missing, Splunk will make a best-effort of finding timestamp information in the event data.
e.g. 2012-09-25 1300hours 14.38 kb transferred blah blah
could be interpreted as 2012-09-25 14:38
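To make that kind of misread concrete, here is a toy illustration in Python. It is not Splunk's timestamp algorithm, just a naive guesser I made up (`guess_timestamp` and the three patterns are hypothetical) that takes the first date it finds and then the first thing after it that looks like a time. With a loose time pattern it latches on to "14.38"; with a pattern that spells out where the time really is, it gets 13:00.

```python
import re
from datetime import datetime

# Hypothetical patterns for this illustration only.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
LOOSE_TIME_RE = re.compile(r"\b(\d{1,2})[:.](\d{2})\b")   # anything like HH:MM or HH.MM
STRICT_TIME_RE = re.compile(r"\b(\d{2})(\d{2})hours\b")   # only the "1300hours" style


def guess_timestamp(event, time_re):
    """Find a date, then the first plausible time after it, using time_re."""
    date = DATE_RE.search(event)
    if date is None:
        return None
    year, month, day = (int(g) for g in date.groups())
    # Only look at text after the date, and skip matches that cannot be a time.
    for match in time_re.finditer(event, date.end()):
        hour, minute = int(match.group(1)), int(match.group(2))
        if hour < 24 and minute < 60:
            return datetime(year, month, day, hour, minute)
    # No usable time found: fall back to midnight on that date.
    return datetime(year, month, day)


event = "2012-09-25 1300hours 14.38 kb transferred blah blah"
loose = guess_timestamp(event, LOOSE_TIME_RE)    # picks up 14.38 as 14:38
strict = guess_timestamp(event, STRICT_TIME_RE)  # picks up 1300hours as 13:00
```

The point of the strict variant is the same idea as telling the parser exactly where and in what format the timestamp sits, rather than letting it guess.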
Also, it'd be good to see your props.conf from the indexer
And yes, a few sample events, even if you have to edit them would really help.
Hope this helps,
Kristian
It was the timestamp algo...but in a super strange fashion. It locked on to another timestamp and everything blew up.
No - exploding is just the term that popped to mind after spending a week building the event extraction for this data. The indexer still runs afterwards.
Unfortunately, I cannot provide a sample data set. This data is highly (instantly fired) confidential, and scrubbing it of proprietary information would likely kill any usefulness due to changing it up.
I can tell you:
a) it is a very large text file, CSV formatted.
b) field extraction does not occur on the indexer, only on the search head.
c) the file usually has a header and then 5000+ lines of single line/single event data.
4) The sourcetype is custom, and the only modifier on the sourcetype is no linemerging.
V) I have piped the data in over a network port with identical results.
f) No clue what goes here, but the differences in line number schemes makes this entry into my list utterly necessary.
Very strange, did you check the splunkd logs?
the question is :
So this is an indexing time issue.
Please provide the sourcetype applied to your file, and a test sample.
You say 'exploding' - does the splunkd on the indexer actually stop working?
No NQF, have moved from using the forwarder to netcatting straight to the indexer, so I know it's dying at the indexing engine and not at the forwarder.
