Log file indexes correctly on initial run but considers 8K chunks a single record

brianirwin
Path Finder

Greetings

I am pretty new to Splunk and am having trouble indexing some of our files. They are written by Java code I do not own, and judging by the file sizes, they grow in 8K chunks.

On the first pass everything works great: each line is parsed and stored as a unique record. Once Splunk has caught up with the file, however, it starts combining multiple lines from the log file into a single Splunk event with a linecount of roughly 20-30.

The log file is a .csv, and on the filesystem it appears to grow in exact 8K chunks. Our best guess is that, since records do not always end on an 8192-byte boundary, Splunk is treating each 8K block write as a single event. I have pasted some of the multi-line records from SplunkWeb into Microsoft Office and I get a word count of right about 8K.
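To make the hypothesis above concrete, here is a minimal Python sketch of what fixed-size flushes do to line-oriented data. It assumes the Java library flushes in 8192-byte blocks, which the thread does not confirm; the record size and the `split_into_flushes` helper are made up for illustration.

```python
# Hypothetical illustration: CHUNK_SIZE and the 200-byte records are assumptions,
# not values taken from the poster's system.
CHUNK_SIZE = 8192  # assumed flush size, matching the observed 8K file growth


def split_into_flushes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split a byte stream into the fixed-size blocks a buffered writer would flush."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# Build a fake CSV stream of 200-byte records (199 characters plus a newline).
record = b"x" * 199 + b"\n"
stream = record * 100  # 20,000 bytes in total

flushes = split_into_flushes(stream)
for n, block in enumerate(flushes, start=1):
    newlines = block.count(b"\n")
    ends_on_boundary = block.endswith(b"\n")
    print(f"flush {n}: {len(block)} bytes, {newlines} newlines, "
          f"ends on a record boundary: {ends_on_boundary}")
```

Unless the record length happens to divide 8192 evenly, a full block ends in the middle of a record, so a reader that picks up the file right after a flush sees a partial last line.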

I don't own this system (nor do I have root), so I cannot upgrade from Splunk 3.4.10 (build 60883).

Any suggestions you can offer would be greatly appreciated. The 8K chunks may be a red herring, but it is the best guess we have.

Brian

@gkanapathy, sample lines (including the CSV header) are below. The current setup is six indexers (in two data centers) and one search head.

Message ID,WS-Security ID,Source System ID,Source System User ID,Source Server ID,Tracking ID,Transaction Key,Market,Farm Name,Server Node Name,Managed Node Name,Service Name,Service Version,Service Operation Name,Back End Name,API Name,Start Date/Time,End Date/Time,Hour of Day,Response Time,Cache Hit,Fault,Fault Code,Fault Message,Connection Timeout,Connection Timeout Retries,Socket/Read Timeout,Socket/Read Timeout Retries,Internal Timeout,Internal Timeout Retries

a7289d9b-14f9-ce47-a263-a1b08dd37c4b,client_user,IVR-2009,IVRUser,DAS_Server,XbgCWmmc1MLvirO4,,,F2CON,farm2host,domain1,TroubleManagementService,10.06,query,Billling.System.1.Dispatch,GetWipInfo,8/19/2010 14:27,8/19/2010 14:27,14,76,TRUE,TRUE,,,FALSE,0,FALSE,0,FALSE,0
540762e4-83a3-3bd7-f721-a699fd502a71,client_user,IVR-2009,IVRUser,DAS_Server,8mO3mD7ThheFj4os,,,F2CON,farm2host,domain1,TroubleManagementService,10.06,query,Billling.System.1.Dispatch,GetWipInfo,8/19/2010 14:27,8/19/2010 14:27,14,82,TRUE,TRUE,,,FALSE,0,FALSE,0,FALSE,0
c7ac5267-62e7-f603-0c65-36759db1a2e9,client_user,IVR-2009,IVRUser,DAS_Server,FvgTFtQJfx1vDEqD,,,F2CON,farm2host,domain1,TroubleManagementService,10.06,query,Billling.System.1.Dispatch,GetWipInfo,8/19/2010 14:27,8/19/2010 14:27,14,76,TRUE,TRUE,,,FALSE,0,FALSE,0,FALSE,0
c7ac5267-62e7-f603-0c65-36759db1a2e9,client_user,IVR-2009,IVRUser,DAS_Server,FvgTFtQJfx1vDEqD,,,F2CON,farm2host,domain1,TroubleManagementService,10.06,query,Billling.System.1.Dispatch,GetWipInfo,8/19/2010 14:27,8/19/2010 14:27,14,74,TRUE,TRUE,,,FALSE,0,FALSE,0,FALSE,0
5dd84b60-eea7-7cc0-a10a-c5f8856427ee,client_user,IVR-2009,IVRUser,DAS_Server,lAoOTh4ebgqdqLK5,,,F2CON,farm2host,domain1,TroubleManagementService,10.06,query,Billling.System.1.Dispatch,GetWipInfo,8/19/2010 14:27,8/19/2010 14:27,14,72,TRUE,TRUE,,,FALSE,0,FALSE,0,FALSE,0
5dd84b60-eea7-7cc0-a10a-c5f8856427ee,client_user,IVR-2009,IVRUser,DAS_Server,lAoOTh4ebgqdqLK5,,,F2CON,farm2host,domain1,TroubleManagementService,10.06,query,Billling.System.1.Dispatch,GetWipInfo,8/19/2010 14:27,8/19/2010 14:27,14,75,TRUE,TRUE,,,FALSE,0,FALSE,0,FALSE,0
bd62210e-7070-7b77-7565-a8fa93497b1f,client_user,IVR-2009,IVRUser,DAS_Server,3hh5nhBe3sJ2mXzx,,,F2CON,farm2host,domain1,TroubleManagementService,10.06,query,Billling.System.1.Dispatch,GetWipInfo,8/19/2010 14:30,8/19/2010 14:30,14,62,TRUE,TRUE,,,FALSE,0,FALSE,0,FALSE,0
2219fd91-df32-5d04-1f71-34efff65bc49,client_user,IVR-2009,IVRUser,DAS_Server,H0V6dvWpcyoDK1kB,,,F2CON,farm2host,domain1,ChannelService,10.07,queryChannelsInCustArea,Billling.System.1.Dispatch,GetHouseInfo,8/19/2010 14:30,8/19/2010 14:30,14,0,TRUE,FALSE,,,FALSE,0,FALSE,0,FALSE,0
2219fd91-df32-5d04-1f71-34efff65bc49,client_user,IVR-2009,IVRUser,DAS_Server,H0V6dvWpcyoDK1kB,,,F2CON,farm2host,domain1,ChannelService,10.07,queryChannelsInCustArea,Billling.System.1.Dispatch,GetOrderParams,8/19/2010 14:30,8/19/2010 14:30,14,122,TRUE,FALSE,,,FALSE,0,FALSE,0,FALSE,0
b38a1308-698c-c2ee-05cb-0c0c22524e57,client_user,IVR-2009,IVRUser,DAS_Server,YKcN5dGxaYQ5L05J,,,F2CON,farm2host,domain1,TroubleManagementService,10.06,query,Billling.System.1.Dispatch,GetWipInfo,8/19/2010 14:30,8/19/2010 14:30,14,59,TRUE,TRUE,,,FALSE,0,FALSE,0,FALSE,0
b38a1308-698c-c2ee-05cb-0c0c22524e57,client_user,IVR-2009,IVRUser,DAS_Server,YKcN5dGxaYQ5L05J,,,F2CON,farm2host,domain1,TroubleManagementService,10.06,query,Billling.System.1.Dispatch,GetWipInfo,8/19/2010 14:30,8/19/2010 14:30,14,88,TRUE,TRUE,,,FALSE,0,FALSE,0,FALSE,0
36437008-dd71-1598-70df-89e23c51eda6,client_user,IVR-2009,IVRUser,DAS_Server,88uYn1yt6ITNb3TO,,,F2CON,farm2host,domain1,TroubleManagementService,10.06,query,Billling.System.1.Dispatch,GetWipInfo,8/19/2010 14:30,8/19/2010 14:30,14,83,TRUE,FALSE,,,FALSE,0,FALSE,0,FALSE,0
b38a1308-698c-c2ee-05cb-0c0c22524e57,client_user,IVR-2009,IVRUser,DAS_Server,YKcN5dGxaYQ5L05J,,,F2CON,farm2host,domain1,TroubleManagementService,10.06,query,Billling.System.1.Dispatch,GetWipInfo,8/19/2010 14:30,8/19/2010 14:30,14,61,TRUE,TRUE,,,FALSE,0,FALSE,0,FALSE,0
36437008-dd71-1598-70df-89e23c51eda6,client_user,IVR-2009,IVRUser,DAS_Server,88uYn1yt6ITNb3TO,,,F2CON,farm2host,domain1,TroubleManagementService,10.06,query,Billling.System.1.Dispatch,GetWipInfo,8/19/2010 14:30,8/19/2010 14:30,14,54,TRUE,TRUE,,,FALSE,0,FALSE,0,FALSE,0


Config Files in custom App Directory

ls -lart

drwxr-xr-x 4 <removed>  4096 Jul 27 15:41 ..
-rw-r--r-- 1 <removed>  596 Aug 23 22:44 transforms.conf
-rw-r--r-- 1 <removed>  556 Aug 23 22:47 inputs.conf
-r--r--r-- 1 <removed>  6390 Aug 23 23:11 datetime.xml
-rw-r--r-- 1 <removed>  297 Aug 23 23:38 props.conf
drwxr-xr-x 2 <removed>  4096 Aug 23 23:38 .

more inputs.conf

disable = false
_blacklist = \.(txt|gz|\d+)$

host = <removed>
[monitor:///opt/logs/*prd/*-access.log]
sourcetype=ESPAccessLog

[monitor:///opt/logs/*.prd/*/ProvisionMitigation-10.07-FrontEndAudit.csv]
sourcetype=ProvisionMitigation-10.07-FrontEnd
index=cust_esp_application

[monitor:///opt/logs/*.prd/*/ServicesService-10.07-FrontEndAudit.csv]
sourcetype=ServicesService-10.07-FrontEnd
index=cust_esp_application

[monitor:///opt/logs/*.prd/*/HttpConnectorService-10.07-BackEndAudit.csv]
sourcetype=HttpCon-10.07-BackEnd
index=cust_esp_application

more props.conf

[HttpCon-10.07-BackEnd]
KV_MODE = none
MAX_TIMESTAMP_LOOKAHEAD = 1000
REPORT-AutoHeader = AutoHeader-1
TIME_PREFIX = ,(?=\d+/\d+/\d{4} \d\d:\d\d)
MAX_TIMESTAMP_LOOKAHEAD = 1000
SHOULD_LINEMERGE = False
MUST_BREAK_AFTER = <\n>
DATETIME_CONFIG = /opt/instance/splunk/etc/apps/esp/local/datetime.xml

more transforms.conf

[AutoHeader-1]
DELIMS = ","
FIELDS = "Message ID", "WS-Security ID", "Source System ID", "Source System User ID", "Source Server ID", "Tracking ID", "Transaction Key", "Market", "Farm Name", "Server Node Name", "Managed Node Name", "Service Name", "Service Version", "Service Operation Name", "Back End Name", "API Name", "Start Date/Time", "End Date/Time", "Hour of Day", "Response Time", "Cache Hit", "Fault", "Fault Code", "Fault Message", "Connection Timeout", "Connection Timeout Retries", "Socket/Read Timeout", "Socket/Read Timeout Retries", "Internal Timeout", "Internal Timeout Retries"

Solution

Stephen_Sorkin
Splunk Employee

First, in inputs.conf, you set the sourcetype to be "HttpCon-10.07-BackEnd" but your props.conf configures "HttpCon-10.07-BackEnd-1". What do searches report the sourcetype as? If they are different, then the SHOULD_LINEMERGE = False will have no effect.

Next, that line wouldn't necessarily matter if timestamps are properly extracted. Are the timestamps correct in each line? If not, you may have to set a TIME_PREFIX regex to properly locate each timestamp in each line.
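For anyone wanting to check this kind of mismatch mechanically, here is a rough Python sketch that compares the sourcetypes assigned in inputs.conf with the stanza names in props.conf. The `props_conf` text is a hypothetical reconstruction of the mismatch described above, and the helper names are invented for illustration. It only checks exact stanza names, so it ignores `source::` and `host::` stanzas and any pattern matching Splunk applies.

```python
import re


def stanza_names(conf_text: str) -> set:
    """Return the set of [stanza] names found in a .conf-style text."""
    return set(re.findall(r"^\[([^\]]+)\][ \t]*$", conf_text, flags=re.MULTILINE))


def sourcetypes(inputs_text: str) -> set:
    """Return every value assigned to `sourcetype` in an inputs.conf-style text."""
    pattern = r"^[ \t]*sourcetype[ \t]*=[ \t]*(\S+)[ \t]*$"
    return set(re.findall(pattern, inputs_text, flags=re.MULTILINE))


# Stanza copied from the inputs.conf in the question.
inputs_conf = """\
[monitor:///opt/logs/*.prd/*/HttpConnectorService-10.07-BackEndAudit.csv]
sourcetype=HttpCon-10.07-BackEnd
index=cust_esp_application
"""

# Hypothetical props.conf with the "-1" stanza name mentioned in this reply.
props_conf = """\
[HttpCon-10.07-BackEnd-1]
SHOULD_LINEMERGE = False
"""

missing = sourcetypes(inputs_conf) - stanza_names(props_conf)
for name in sorted(missing):
    print(f"sourcetype {name!r} has no matching stanza in props.conf")
```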

brianirwin
Path Finder

Stephen, thank you for all the input. Now that we have updated to Splunk 4 on some of our LW Forwarders, the files are working great.

It also seems that my configs were not being picked up because of a combination of functions missing from the 3.4.10 forwarder and multiple files matching the regex in a single directory.

Again, thank you for your help 🙂

brianirwin
Path Finder

Stephen,

I agree that my props.conf must not be getting linked correctly. Clearly the inputs.conf is being parsed (files are being picked up, the sourcetype is correct, etc.), but my link to props.conf must be wrong. I removed all references from the /system config files and moved them all to the app directory to hopefully reduce confusion.

Stephen_Sorkin
Splunk Employee

First, your regex should consume the ",", because the TIME_PREFIX must take you right up to the time format itself. However, the more important question is why the lines are still sticking together even though you set SHOULD_LINEMERGE = false. Are you certain that this is set for the relevant stanza? If it is, then this configuration is just not getting referenced.

brianirwin
Path Finder

Stephen,

I dropped your suggested string into my props.conf and had no success. I went back and blew away any other references to HTTPCon- in my props.conf files, leaving just the one in my custom app directory, but the only timestamps I am seeing are NONE.

It would be much easier if I could just say (?im)^(?:[^,],){16}(?P[^,])(?=,) but that is not to be.

For a similar file I tried utilizing 'splunk train dates ', but I had no luck with that either.

I think I need to go back to basics. I must be missing something fundamental in my configs; not even SHOULD_LINEMERGE is working.

Stephen_Sorkin
Splunk Employee

The best technique for TIME_PREFIX in this case is the positive lookahead assertion. For example: "TIME_PREFIX = ,(?=\d+/\d+/\d{4} \d\d:\d\d)". In other words, take us to just before something that looks like the date.
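As a quick sanity check of that pattern, here is a small Python sketch run against the first sample record from the question. Python's `re` module is only a stand-in for Splunk's regex engine here, so this shows how the lookahead behaves, not how Splunk applies TIME_PREFIX internally.

```python
import re

# The TIME_PREFIX pattern quoted above.
TIME_PREFIX = r",(?=\d+/\d+/\d{4} \d\d:\d\d)"

# First sample record from the question.
line = (
    "a7289d9b-14f9-ce47-a263-a1b08dd37c4b,client_user,IVR-2009,IVRUser,DAS_Server,"
    "XbgCWmmc1MLvirO4,,,F2CON,farm2host,domain1,TroubleManagementService,10.06,query,"
    "Billling.System.1.Dispatch,GetWipInfo,8/19/2010 14:27,8/19/2010 14:27,14,76,"
    "TRUE,TRUE,,,FALSE,0,FALSE,0,FALSE,0"
)

match = re.search(TIME_PREFIX, line)
assert match is not None, "no date-like text found after a comma"

# The lookahead is zero-width, so the match consumes only the comma and the
# text immediately after it begins at the timestamp.
rest = line[match.end():]
print(repr(match.group(0)))  # the consumed prefix
print(rest[:15])             # where timestamp parsing would begin
```

The first comma that satisfies the lookahead is the one in front of the Start Date/Time field, so the prefix ends exactly where the timestamp starts.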

brianirwin
Path Finder

Hey Stephen,
The queries show sourcetype=HttpCon-10.07-BackEnd. A very good catch on the configs: I had been working on the auto-generated -1 stanza and did not notice that there was a "HttpCon-10.07-BackEnd" stanza I was not updating.

As for timestamps, they are not being read correctly; they all show as NONE in SplunkWeb.

My regex skills are not very strong. To hit the field prior to the comma and date I can use "(?im)^(?:[^,],){15}(?P[^,])(?=,)", but as I read the docs for TIME_PREFIX, I need to land on that comma, and that is not working. Any additional help you can offer would be appreciated.

gkanapathy
Splunk Employee

It would be helpful to see some of these lines, and also if you would indicate whether you are forwarding data or it is being read locally.
