Getting Data In

Why are timestamps parsed correctly for only one of two inputs? Do TIME_FORMAT & TIME_PREFIX work on a Universal Forwarder?

flle
Path Finder

I stumbled across an interesting issue and need some advice / hints here.

I have two sourcetypes where I need some TIME_FORMAT and TIME_PREFIX mangling to correctly parse the timestamps.
When setting up the new data input (a batch input for files), I put the configuration in props.conf on the universal forwarder for the first input, and after some regex tuning of TIME_PREFIX, it worked fine.
For the second input, however, I could not get it to work. The difference is that I have INDEXED_EXTRACTIONS for the first input.
I then remembered that TIME_FORMAT & TIME_PREFIX are applied in the parsing phase and thus can only be done on the indexers or heavy forwarders (see also: http://wiki.splunk.com/Where_do_I_configure_my_Splunk_settings).

So now I am confused: do TIME_FORMAT & TIME_PREFIX also work on a universal forwarder and I am just overlooking an error in my second props.conf, or did Splunk miraculously fix the timestamp extraction on its own, regardless of my props.conf changes? :-)
Or does TIME_* only work in conjunction with INDEXED_EXTRACTIONS?

Log sample input 1: (desired timestamp is bold)

"hostname_10.1.2.3_**2015-07-22T15_01_43Z**","HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run","2014-05-28T10:28:33Z","expand_sz","SysTrayApp","C:\Program Files\IDT\WDM\sttray64.exe","TRUE","FALSE","2013-11-06T22:07:24Z","2014-05-28T09:03:46Z","2014-05-28T09:03:46Z","C:\Program Files\IDT\WDM\sttray64.exe","1703424","IDT, Inc.","IDT PC Audio","1.0.6496.0","IDT PCA","Copyright © 2004 - 2009 IDT, Inc.","sttray64.exe","IDT PC Audio","1.0.6496.0","FALSE","FALSE","TRUST_E_NOSIGNATURE","The file is not signed","","","1f918ddae59e246b8f48ce5aa400b3aa","8896809e855ae08b43e41b25a6bdca8ed1905bbfc59e7b779070eaa0bbc1b319"

Log sample input 2: (desired timestamp is bold)

"05.08.2015 10:22:36";"3";"Network connection detected:;SequenceNumber: 161522;UtcTime: **05.08.2015 08:17:49.149 AM**;ProcessGuid: {6B887E38-96AB-55AC-0000-0010EB030000};ProcessId: 4;Image: System;User: NT-AUTORIT\xC4T\SYSTEM;Protocol: udp;Initiated: false;SourceIsIpv6: false;SourceIp: 10.1.2.3;SourceHostname: ;SourcePort: 137;SourcePortName: netbios-ns;DestinationIsIpv6: false;DestinationIp: 10.1.2.3;DestinationHostname: myhostnamet;DestinationPort: 137;DestinationPortName: netbios-ns"

inputs.conf
[batch://d:\Splunk\sysmon\]
disabled = 0
sourcetype = sysmon
move_policy = sinkhole
index=testing

[batch://d:\Splunk\regdump\]
disabled = 0
sourcetype = regdump
move_policy = sinkhole
crcSalt = <SOURCE>
index=testing

props.conf
[regdump]
INDEXED_EXTRACTIONS = CSV
TIME_FORMAT = %Y-%m-%dT%H_%M_%S%Z
TIME_PREFIX = ^([^_]*_){2}

[sysmon]
TIME_FORMAT = %d.%m.%Y %I:%M:%S.%3N %p
TIME_PREFIX = ^([^;]*;){4}UtcTime:\s+

[source::...sysmon*csv]
TZ = UTC
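The two TIME_PREFIX regexes and the sysmon TIME_FORMAT can be sanity-checked outside Splunk. The sketch below uses Python's `re` and `datetime.strptime` against truncated copies of the sample events; note that Splunk's `%3N` (milliseconds) has no direct strptime equivalent, so Python's `%f` (fractional seconds) is used as the closest analogue — this is an illustration, not Splunk's actual parser.

```python
import re
from datetime import datetime

# Truncated sample events from the question (full events shortened here)
regdump = '"hostname_10.1.2.3_2015-07-22T15_01_43Z","HKLM...'
sysmon = ('"05.08.2015 10:22:36";"3";"Network connection detected:;'
          'SequenceNumber: 161522;UtcTime: 05.08.2015 08:17:49.149 AM;...')

# TIME_PREFIX for regdump: skip everything up to and including the second '_'
m = re.match(r'^([^_]*_){2}', regdump)
print(regdump[m.end():m.end() + 20])   # 2015-07-22T15_01_43Z

# TIME_PREFIX for sysmon: skip four ';'-delimited fields, then 'UtcTime: '
m = re.match(r'^([^;]*;){4}UtcTime:\s+', sysmon)
raw_ts = sysmon[m.end():m.end() + 26]  # 05.08.2015 08:17:49.149 AM

# Splunk's %3N -> Python's %f (closest analogue for the fractional part)
print(datetime.strptime(raw_ts, '%d.%m.%Y %I:%M:%S.%f %p'))
```

Both prefixes land exactly on the desired timestamps, so the regexes themselves are not the problem — which points at where the settings are applied rather than how they are written.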

The regdump sourcetype was the first one I integrated, and by default Splunk extracted a timestamp from further down in the event (2013-11-06T22:07:24Z). After I configured a matching TIME_PREFIX regex and TIME_FORMAT, the desired timestamp is now extracted.
For the sysmon sourcetype, however, it does not work.

So what is the deal here? What am I missing?

Thanks for any hints.

1 Solution

flle
Path Finder

woodcock, thanks for the update. I figured out the issue in the meantime and there was more to it, hence I am adding an answer myself 🙂
If you forward structured data from a forwarder to an indexer, the indexer does NOT parse those events again (the parsing, aggregation, and typing queues are skipped). See the "Caveats" section here: http://docs.splunk.com/Documentation/Splunk/6.2.6/Data/Extractfieldsfromfileheadersatindextime

In my case, INDEXED_EXTRACTIONS on the Universal Forwarder turns the data into structured data, so the indexer ignores any props or transforms on the indexers for this data.
For the timestamp issue I could actually work around that by adding TIMESTAMP_FIELDS on the forwarder, but only if Splunk can auto-identify the time format.
Since the parsing capabilities of a universal forwarder are limited to the parsing functions of the INPUT phase (see http://wiki.splunk.com/Where_do_I_configure_my_Splunk_settings%3F), and the data is not parsed again on the indexer when using indexed extractions on the UF, this puts some constraints on my overall parsing capabilities. I basically lose all capabilities of the PARSING phase.

Conclusion: When using INDEXED_EXTRACTIONS on a UF, be sure that you can achieve all desired parsing with the capabilities of the INPUT phase. Otherwise you have to use a Heavy Forwarder and do all the parsing there, or forward unparsed data from the UF to the indexer and do all the parsing on the indexer.
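The TIMESTAMP_FIELDS workaround mentioned above could look roughly like this on the Universal Forwarder. This is a sketch, not a tested config: the sample CSV has no header row, so the column names here (and the FIELD_NAMES line, shortened to the first few columns) are hypothetical.

```
# props.conf on the Universal Forwarder -- a sketch, not a tested config.
# FIELD_NAMES and the "key_name_ts" column name are assumed/hypothetical;
# a real config would list every column of the CSV.
[regdump]
INDEXED_EXTRACTIONS = CSV
FIELD_NAMES = key_name_ts, registry_path, last_write_ts
TIMESTAMP_FIELDS = key_name_ts
# Per the findings above, this only worked when Splunk could
# auto-identify the time format in the chosen field.
```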

woodcock
Esteemed Legend

In addition to the other 2 answers (both important), I do not see that you have a TIMESTAMP_FIELDS= line. You need to add this to the [regdump] stanza (and deploy to the forwarder and restart Splunk there) and it should work fine.


woodcock
Esteemed Legend

When you use INDEXED_EXTRACTIONS, your Universal Forwarder acts more like a Heavy Forwarder for this input, in that some of the indexing work is now done on the forwarder instead of the indexers, which necessitates that you deploy your props.conf file to your forwarder. But you still also need to deploy it to your indexers so that your normal timestamping functions (which have not moved) can be done properly. So put your props.conf on both your forwarders and your indexers, restart all the Splunk instances there, and it should work.


somesoni2
Revered Legend

Yes, the event breaking and the timestamp parsing happen only on indexers/heavy forwarders. Your first log got lucky, as its date format matches one of Splunk's default timestamp formats. Move the configuration to your indexers and try again.
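Splunk's automatic timestamp recognition is internal to Splunk, but the "got lucky" contrast can be illustrated with Python's ISO 8601 parser as a rough analogy (not Splunk's actual logic): the timestamp Splunk picked up by default in log 1 is ISO-like, while the sysmon one is not.

```python
from datetime import datetime

# The timestamp auto-extracted from log 1 is ISO 8601-like,
# so a generic parser recognizes it without a format string:
print(datetime.fromisoformat('2013-11-06T22:07:24'))

# The sysmon timestamp is not in a standard layout, so automatic
# recognition fails and an explicit format is required:
try:
    datetime.fromisoformat('05.08.2015 08:17:49.149 AM')
except ValueError:
    print('needs an explicit TIME_FORMAT')
```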
