How to extract the timestamp from a filename at index-time to use as _time?

Path Finder

I have searched Splunk Answers high and low for a way to extract the timestamp from my filename at index time, but I still cannot get that timestamp used as _time.

Summary:
- The filename contains a timestamp in %Y%m%d%H%M format (myfile_201510210345.txt)
- While the events in the file do contain a date, this is not the date I want to use for the timestamp.
- Sample data: [IP|1.2.3.4/32|proxy|75|||2015/10/19|some server hostname: server]

I have read the blog post as well as numerous other answers regarding custom datetime.xml usage, but I still cannot crack this nut: each time, the data is indexed using the server time (the last resort in timestamp handling). I have tried adding these definitions to a copy of the original datetime.xml as well as creating a blank datetime.xml containing only these definitions. Neither worked.

My props.conf

[mysource]
DATETIME_CONFIG = /etc/apps/myapp/local/datetime.xml
FIELD_DELIMITER = |
FIELD_NAMES = F1,F2,F3,F4,F5,F6,F7,F8
TIME_FORMAT = %Y%m%d%H%M
TZ = UTC
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false

My datetime.xml modifications (attempt 1 - extracting date & time together)

<define name="_mydatetime" extract="year, month, day, hour, minute">
        <text><![CDATA[(?:^|source:).*?.*?(20\d\d)(0\d|1[012])([012]\d|3[01])([01]\d|2[0123])([0-6]\d)]]></text>
</define>

<timePatterns>
  <use name="_mydatetime"/>
</timePatterns>

<datePatterns>
  <use name="_mydatetime"/>
</datePatterns>

My datetime.xml modifications (attempt 2 - extracting date & time separately after reading an example in another answer)

<define name="_mydate" extract="year, month, day">
        <text><![CDATA[(?:^|source:).*?.*?(20\d\d)(0\d|1[012])([012]\d|3[01])(?:[01]\d|2[0123])(?:[0-6]\d)]]></text>
</define>

<define name="_mytime" extract="hour, minute">
        <text><![CDATA[(?:^|source:).*?.*?(?:20\d\d)(?:0\d|1[012])(?:[012]\d|3[01])([01]\d|2[0123])([0-6]\d)]]></text>
</define>

<timePatterns>
  <use name="_mydate"/>
</timePatterns>

<datePatterns>
  <use name="_mytime"/>
</datePatterns>

There have been a lot of Answers questions about datetime.xml and about extracting the date and time from filenames, but there do not appear to be many definitive answers. Can anyone who is successfully extracting a timestamp from the filename provide a working example of using the source at index time for this? I have been trying to get this working on and off for a long time now.

Thanks,
Ash

1 Solution

Path Finder

I logged a support case [283416] for this problem and, unfortunately for me and everyone following along here, extracting a full timestamp from a filename is not currently supported in Splunk.

You can capture the date from the filename, but not the time, so in essence each line in the file needs to have its own time record.

Apparently several other customers have requested this feature, and I have been added to the bottom of that enhancement request.

I was hoping that I wouldn't have to modify the data before indexing it, but it looks like I'm going to have to get around this with some pre-processing of the log.

This will prepend the timestamp from the filename to each line of the file, followed by a pipe "|" separator. At least this way the data will index with automatic timestamp extraction, without having to define any time format strings.

find . -name '*201510210345.txt' -type f -print | xargs sed -i 's/^/201510210345\|/'
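For anyone who needs this for more than one hard-coded filename, here is a rough Python sketch of the same pre-processing idea. It is not part of the original workaround. It assumes filenames end in an underscore followed by 12 digits and .txt, as in myfile_201510210345.txt, and the function name prefix_lines_with_stamp is made up for illustration.

```python
import re
import tempfile
from pathlib import Path

# Assumption: the filename ends in _<12 digits>.txt (%Y%m%d%H%M).
STAMP_RE = re.compile(r"_(\d{12})\.txt$")


def prefix_lines_with_stamp(path: Path) -> bool:
    """Prepend '<stamp>|' to every line of the file, like the sed one-liner.

    Returns False and leaves the file untouched when the filename
    carries no recognisable timestamp.
    """
    match = STAMP_RE.search(path.name)
    if match is None:
        return False
    stamp = match.group(1)
    with path.open() as fh:
        lines = fh.readlines()
    with path.open("w") as fh:
        fh.writelines(f"{stamp}|{line}" for line in lines)
    return True


# Small demo on a throwaway copy of the sample data.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "myfile_201510210345.txt"
    sample.write_text("IP|10.0.0.1/32|proxy|65|||2015/10/19|proxy server\n")
    prefix_lines_with_stamp(sample)
    print(sample.read_text(), end="")
```

Like the sed command, this rewrites the file in place, so running it twice on the same file would prefix each line twice.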

Splunk Employee

This is possible in Splunk Enterprise 7.2, making use of the new ingest-time eval. Full documentation is at https://docs.splunk.com/Documentation/Splunk/latest/Data/IngestEval.

Example

File Name: Log_I15_13092018183001.txt
File Name Format: Log_I15_%d%m%Y%H%M%S.txt

props.conf

[mysourcetype]
TRANSFORMS=timestampeval

transforms.conf

[timestampeval]
INGEST_EVAL = _time=strptime(replace(source,".*(?=/)/",""),"Log_I15_%d%m%Y%H%M%S.txt")

This takes the "source" metadata value (which is the path and file name), removes the path, then extracts the date and time from the filename.

All events in the file will have the same _time when imported.
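To sanity-check the regex and the format string before touching transforms.conf, the same two steps can be tried outside Splunk. The Python sketch below only mirrors the logic of the eval expression. It says nothing about how Splunk handles time zones, the /var/log/ path is a made-up example, and time_from_source is a hypothetical helper name.

```python
import re
from datetime import datetime


def time_from_source(source: str) -> datetime:
    # Same regex as the replace() call: drop everything up to the last "/".
    filename = re.sub(r".*(?=/)/", "", source)
    # Same format string as the strptime() call in the transform.
    return datetime.strptime(filename, "Log_I15_%d%m%Y%H%M%S.txt")


# Hypothetical path; only the filename comes from the example above.
print(time_from_source("/var/log/Log_I15_13092018183001.txt"))
# prints 2018-09-13 18:30:01
```

If the filename does not fit the format, Python raises ValueError here, which at least makes a mismatch obvious while testing.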

SplunkTrust

I noticed that in the last attempt the names under timePatterns and datePatterns are swapped.

Splunk Employee

I downvoted the accepted answer because, although it was correct at the time, it is no longer accurate.

Splunk Employee

I downvoted the accepted answer because it is no longer accurate given the 7.2 enhancement.

Esteemed Legend

First of all, if you are the author of the app, you should use default, not local. Second, you do not need the TIME_FORMAT line. Third: did you put the datetime.xml file in the correct place (does it match your DATETIME_CONFIG line)? Lastly, try this:

<datetime>
<define name="_mydatetime" extract="year, month, day, hour, minute">
   <text><![CDATA[source::.*?_(\d{4})(\d{2})(\d{2})(\d{2})(\d{2}).txt]]></text>
</define>
<timePatterns>
   <use name="_mydatetime"/>
</timePatterns>
<datePatterns>
   <use name="_mydatetime"/>
</datePatterns>
</datetime>

You must deploy this to your Indexers (or Heavy Forwarder) and restart all Splunk instances running there. This will only affect new data that comes in after the restarts; already-indexed data will remain broken.

SplunkTrust

@woodcock - Names in the XML for timePatterns and datePatterns are swapped. Any chance that is related to the issue?

Esteemed Legend

They are not swapped, they are shared. But the problem is that H/M/S from file is not supported (see the accepted answer). Bummer.

Explorer

Hello,
I have a question about this answer. Where can I find the keywords for the extract attribute, and how do I define the timePatterns and datePatterns for a Unix timestamp?

Regards,
Sven

Esteemed Legend

Search for datetime.xml. It is not necessary but everybody uses that same filename. Learn from working examples posted on the internet.

Path Finder

Thanks again for your help woodcock - but this still failed

Here are all the details, in case you can reproduce the issue.

The Data

root@testbox:/opt/data# cat myfile_201510210345.txt
IP|10.0.0.1/32|proxy|65|||2015/10/19|proxy server 
root@testbox:/opt/data# 

props.conf and datetime.xml

root@testbox:/opt/splunk/etc/apps/myapp/local# cat props.conf 
[mysource]
DATETIME_CONFIG = /etc/apps/myapp/local/datetime.xml
FIELD_DELIMITER = |
FIELD_NAMES = F1,F2,F3,F4,F5,F6,F7,F8
TZ = UTC
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
root@testbox:/opt/splunk/etc/apps/myapp/local# 

root@testbox:/opt/splunk/etc/apps/myapp/local# cat datetime.xml
<!--   Version 4.0 -->

<!-- datetime.xml -->
<!-- This file contains the general formulas for parsing date/time formats. -->

<datetime>

<define name="_mydatetime" extract="year, month, day, hour, minute">
        <text><![CDATA[source::.*?_(\d{4})(\d{2})(\d{2})(\d{2})(\d{2}).txt]]></text>
</define>


<timePatterns>
      <use name="_mydatetime"/>
</timePatterns>
<datePatterns>
      <use name="_mydatetime"/>
</datePatterns>

</datetime>
root@testbox:/opt/splunk/etc/apps/myapp/local#

btool output showing full config

root@ashlubuntu:/opt/splunk/etc/apps/myapp/local# /opt/splunk/bin/splunk btool props list mysource
[mysource]
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE = 
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG = /etc/apps/myapp/local/datetime.xml
FIELD_DELIMITER = |
FIELD_NAMES = F1,F2,F3,F4,F5,F6,F7,F8
HEADER_MODE = 
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 128
MUST_BREAK_AFTER = 
MUST_NOT_BREAK_AFTER = 
MUST_NOT_BREAK_BEFORE = 
NO_BINARY_CHECK = true
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = false
TRANSFORMS = 
TRUNCATE = 10000
TZ = UTC
detect_trailing_nulls = false
maxDist = 100
priority = 
sourcetype = 
root@ashlubuntu:/opt/splunk/etc/apps/myapp/local# 

oneshot import

root@ashlubuntu:/opt/data# /opt/splunk/bin/splunk add oneshot myfile_201510210345.txt -sourcetype "mysource" -index "testindex" -host "myhost"
Your session is invalid.  Please login.
Splunk username: admin
Password: 
Oneshot '/opt/data/myfile_201510210345.txt' added
root@ashlubuntu:/opt/data# 

splunkd.log DEBUG output - showing it recognises the config, but fails to parse the timestamp

10-28-2015 22:56:39.543 DEBUG REST_Calls - app=search POST data/inputs/oneshot/ id=/opt/data/myfile_201510210345.txt: host -> [myhost], index -> [testindex], sourcetype -> [mysource]
10-28-2015 22:56:39.543 DEBUG AdminManager - Validating argument values...
10-28-2015 22:56:39.543 DEBUG AdminManager - Validating rule='validate(len(name) < 1024, 'Parameter "name" must be less than 1024 characters.')' for arg='name'.
10-28-2015 22:56:39.589 DEBUG FilesystemFilter - Testing path=/opt/data/myfile_201510210345.txt(real=/opt/data/myfile_201510210345.txt) with global blacklisted paths
10-28-2015 22:56:39.590 INFO  AdminManager - feedName=oneshotinput, atomUrl=services
10-28-2015 22:56:39.590 INFO  UserManager - Unwound user context: admin -> NULL
10-28-2015 22:56:39.590 DEBUG InThreadActor - this=0x7f8454016b50 waitForActorToComplete start actor=0x7f844a3fcdf0
10-28-2015 22:56:39.592 DEBUG InThreadActor - this=0x7f8454016b50 waitForActorToComplete end actor=0x7f844a3fcdf0
10-28-2015 22:56:39.593 DEBUG ArchiveContext - /opt/data/myfile_201510210345.txt is NOT an archive file.
10-28-2015 22:56:39.593 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/opt/data/myfile_201510210345.txt
10-28-2015 22:56:39.593 DEBUG OneShotWriter - Got new entry in the archive: /opt/data/myfile_201510210345.txt
10-28-2015 22:56:39.593 DEBUG OneShotWriter - Will call classifier with given_type="mysource".
10-28-2015 22:56:39.593 DEBUG FileClassifierManager - Finding type for file: /opt/data/myfile_201510210345.txt
10-28-2015 22:56:39.593 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/opt/data/myfile_201510210345.txt
10-28-2015 22:56:39.593 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/opt/data/myfile_201510210345.txt|mysource
10-28-2015 22:56:39.593 DEBUG PropertiesMapConfig - Pattern 'mysource' matches with priority 100
10-28-2015 22:56:39.593 DEBUG PropertiesMapConfig - Pattern 'mysource' matches with priority 100
10-28-2015 22:56:39.593 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/opt/data/myfile_201510210345.txt|host::myhost|mysource|
10-28-2015 22:56:39.594 DEBUG PropertiesMapConfig - Pattern 'mysource' matches with priority 100
10-28-2015 22:56:39.594 DEBUG OneShotWriter - Setting sourcetype="sourcetype::mysource" 
10-28-2015 22:56:39.594 DEBUG OneShotWriter - Setting channelKey="2" 
10-28-2015 22:56:39.594 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/opt/data/myfile_201510210345.txt|host::myhost|mysource|2
10-28-2015 22:56:39.594 DEBUG PropertiesMapConfig - Pattern 'mysource' matches with priority 100
10-28-2015 22:56:39.594 DEBUG StructuredDataHeaderExtractor - Read configuration: configured=1 mode=6 HEADER_FIELD_LINE_NUMBER=0 HEADER_FIELD_DELIMITER='|' HEADER_FIELD_QUOTE='"' FIELD_DELIMITER='|' FIELD_QUOTE='"'.
10-28-2015 22:56:39.594 DEBUG OneShotWriter - Structured data configurations loaded
10-28-2015 22:56:39.594 INFO  UTF8Processor - Converting using CHARSET="UTF-8" for conf "source::/opt/data/myfile_201510210345.txt|host::myhost|mysource|2"
10-28-2015 22:56:39.594 INFO  LineBreakingProcessor - Using truncation length 10000 for conf "source::/opt/data/myfile_201510210345.txt|host::myhost|mysource|2"
10-28-2015 22:56:39.594 INFO  LineBreakingProcessor - Using lookbehind 100 for conf "source::/opt/data/myfile_201510210345.txt|host::myhost|mysource|2"
10-28-2015 22:56:39.594 DEBUG StructuredDataHeaderExtractor - Read configuration: configured=1 mode=6 HEADER_FIELD_LINE_NUMBER=0 HEADER_FIELD_DELIMITER='|' HEADER_FIELD_QUOTE='"' FIELD_DELIMITER='|' FIELD_QUOTE='"'.
10-28-2015 22:56:39.594 INFO  AggregatorMiningProcessor - Setting up line merging apparatus for: source::/opt/data/myfile_201510210345.txt|host::myhost|mysource|2
10-28-2015 22:56:39.595 DEBUG LoadDateParserRegexes - put _mydatetime regex=source::.*?_(\d{4})(\d{2})(\d{2})(\d{2})(\d{2}).txt
10-28-2015 22:56:39.595 DEBUG LoadDateParserRegexes -     * year
10-28-2015 22:56:39.595 DEBUG LoadDateParserRegexes -     * month
10-28-2015 22:56:39.595 DEBUG LoadDateParserRegexes -     * day
10-28-2015 22:56:39.595 DEBUG LoadDateParserRegexes -     * hour
10-28-2015 22:56:39.595 DEBUG LoadDateParserRegexes -     * minute
10-28-2015 22:56:39.595 INFO  DateParser - Set timezone to: UTC
10-28-2015 22:56:39.595 DEBUG AggregatorMiningProcessor - Failed to parse timestamp. Defaulting to time specified by data input. - data_source="/opt/data/myfile_201510210345.txt", data_host="myhost", data_sourcetype="mysource"

Here is the data in Splunk. You can see the same time for _time and _indextime: Splunk didn't even use the file time, it reverted to the index time.

I'm wondering whether this needs to be a support case, since I cannot seem to get the timestamp from a filename.

hxxps://www.dropbox.com/s/f40xx78v5zctpon/mysource.PNG?dl=0

Esteemed Legend

I would definitely open a support case on this. The logs clearly indicate that it is using your datetime.xml file. We know the RegEx works but still the parser is failing.

Path Finder

One thought also sprang to mind: are we all trying to perform something out of the processing order?

When I re-read http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Overviewofeventprocessing

When Splunk Enterprise indexes events, it:

- Configures character set encoding.
- Configures linebreaking for multi-line events.
- Identifies event timestamps (and applies timestamps to events if they do not exist).
- Extracts a set of useful standard fields such as host, source, and sourcetype.
- Segments events.
- Dynamically assigns metadata to events, if specified.
- Anonymizes data, if specified.

To me, this indicates that Splunk applies timestamps BEFORE extracting the source field. So if I'm trying to get the timestamp FROM source before it has been extracted, am I trying to tear a hole in the space-time continuum?

Path Finder

An ugly search-time hack makes the results appear correct when returning searches, but because the indexed timestamp is incorrect, I can't actually search on a specific day; I have to search over All Time.

Adding the following to props.conf overwrites _time at search time - as I said - U.G.L.Y .....

EXTRACT-timestamp = \w_(?<srctimestamp>\d+) in source
EVAL-_time = strptime(srctimestamp, "%Y%m%d%H%M")
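For reference, here is roughly what those two lines compute, sketched in Python so it can be tried against a sample source path. This only illustrates the extraction logic and is not how Splunk runs it. search_time_override is a hypothetical name, and Python spells the named group (?P<name>...) where the Splunk regex uses (?<name>...).

```python
import re
from datetime import datetime

# Same pattern as the EXTRACT line, using Python's named-group syntax.
SRC_RE = re.compile(r"\w_(?P<srctimestamp>\d+)")


def search_time_override(source: str):
    """Return the datetime encoded in the source path, or None if absent."""
    match = SRC_RE.search(source)
    if match is None:
        return None
    # Same format string as the EVAL-_time line.
    return datetime.strptime(match.group("srctimestamp"), "%Y%m%d%H%M")


print(search_time_override("/opt/data/myfile_201510210345.txt"))
# prints 2015-10-21 03:45:00
```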

Path Finder

I have also tried this on 6.2.3 & 6.3 with no success in either version.

Path Finder

I ran the import while Splunk was running in debug mode, and it appears to pull in the config OK (I updated the regex based on some other Answers questions, and confirmed that it matches on regex101).

10-23-2015 12:36:49.101 DEBUG FilesystemFilter - Testing path=/sourcedata/mydatafiles/myfile_201510212245.txt(real=/sourcedata/mydatafiles/myfile_201510212245.txt) with global blacklisted paths
10-23-2015 12:36:49.101 INFO  AdminManager - feedName=oneshotinput, atomUrl=services
10-23-2015 12:36:49.101 INFO  UserManager - Unwound user context: admin -> NULL
10-23-2015 12:36:49.101 DEBUG InThreadActor - this=0x7f472c417150 waitForActorToComplete start actor=0x7f4723bfcdf0
10-23-2015 12:36:49.103 DEBUG InThreadActor - this=0x7f472c417150 waitForActorToComplete end actor=0x7f4723bfcdf0
10-23-2015 12:36:49.104 DEBUG ArchiveContext - /sourcedata/mydatafiles/myfile_201510212245.txt is NOT an archive file.
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/sourcedata/mydatafiles/myfile_201510212245.txt
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Pattern 'source::....(0t|a|ali|asa|au|bmp|cg|cgi|class|d|dat|deb|del|dot|dvi|dylib|elc|eps|exe|ftn|gif|hlp|hqx|hs|icns|ico|inc|iso|jame|jin|jpeg|jpg|kml|la|lhs|lib|lo|lock|mcp|mid|mp3|mpg|msf|nib|o|obj|odt|ogg|ook|opt|os|pal|pbm|pdf|pem|pgm|plo|png|po|pod|pp|ppd|ppm|ppt|prc|ps|psd|psym|pyc|pyd|rast|rb|rde|rdf|rdr|rgb|ro|rpm|rsrc|so|ss|stg|strings|tdt|tif|tiff|tk|uue|vhd|xbm|xlb|xls|xlw)' matches with lowest priority
10-23-2015 12:36:49.104 DEBUG OneShotWriter - Got new entry in the archive: /sourcedata/mydatafiles/myfile_201510212245.txt
10-23-2015 12:36:49.104 DEBUG OneShotWriter - Will call classifier with given_type="mysource".
10-23-2015 12:36:49.104 DEBUG FileClassifierManager - Finding type for file: /sourcedata/mydatafiles/myfile_201510212245.txt
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/sourcedata/mydatafiles/myfile_201510212245.txt
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Pattern 'source::....(0t|a|ali|asa|au|bmp|cg|cgi|class|d|dat|deb|del|dot|dvi|dylib|elc|eps|exe|ftn|gif|hlp|hqx|hs|icns|ico|inc|iso|jame|jin|jpeg|jpg|kml|la|lhs|lib|lo|lock|mcp|mid|mp3|mpg|msf|nib|o|obj|odt|ogg|ook|opt|os|pal|pbm|pdf|pem|pgm|plo|png|po|pod|pp|ppd|ppm|ppt|prc|ps|psd|psym|pyc|pyd|rast|rb|rde|rdf|rdr|rgb|ro|rpm|rsrc|so|ss|stg|strings|tdt|tif|tiff|tk|uue|vhd|xbm|xlb|xls|xlw)' matches with lowest priority
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/sourcedata/mydatafiles/myfile_201510212245.txt|mysource
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Pattern 'source::....(0t|a|ali|asa|au|bmp|cg|cgi|class|d|dat|deb|del|dot|dvi|dylib|elc|eps|exe|ftn|gif|hlp|hqx|hs|icns|ico|inc|iso|jame|jin|jpeg|jpg|kml|la|lhs|lib|lo|lock|mcp|mid|mp3|mpg|msf|nib|o|obj|odt|ogg|ook|opt|os|pal|pbm|pdf|pem|pgm|plo|png|po|pod|pp|ppd|ppm|ppt|prc|ps|psd|psym|pyc|pyd|rast|rb|rde|rdf|rdr|rgb|ro|rpm|rsrc|so|ss|stg|strings|tdt|tif|tiff|tk|uue|vhd|xbm|xlb|xls|xlw)' matches with lowest priority
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Pattern 'mysource' matches with priority 100
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Pattern 'mysource' matches with priority 100
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/sourcedata/mydatafiles/myfile_201510212245.txt|host::server|mysource|
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Pattern 'source::....(0t|a|ali|asa|au|bmp|cg|cgi|class|d|dat|deb|del|dot|dvi|dylib|elc|eps|exe|ftn|gif|hlp|hqx|hs|icns|ico|inc|iso|jame|jin|jpeg|jpg|kml|la|lhs|lib|lo|lock|mcp|mid|mp3|mpg|msf|nib|o|obj|odt|ogg|ook|opt|os|pal|pbm|pdf|pem|pgm|plo|png|po|pod|pp|ppd|ppm|ppt|prc|ps|psd|psym|pyc|pyd|rast|rb|rde|rdf|rdr|rgb|ro|rpm|rsrc|so|ss|stg|strings|tdt|tif|tiff|tk|uue|vhd|xbm|xlb|xls|xlw)' matches with lowest priority
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Pattern 'mysource' matches with priority 100
10-23-2015 12:36:49.104 DEBUG OneShotWriter - Setting sourcetype="sourcetype::mysource" 
10-23-2015 12:36:49.104 DEBUG OneShotWriter - Setting channelKey="2" 
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Performing pattern matching for: source::/sourcedata/mydatafiles/myfile_201510212245.txt|host::server|mysource|2
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Pattern 'source::....(0t|a|ali|asa|au|bmp|cg|cgi|class|d|dat|deb|del|dot|dvi|dylib|elc|eps|exe|ftn|gif|hlp|hqx|hs|icns|ico|inc|iso|jame|jin|jpeg|jpg|kml|la|lhs|lib|lo|lock|mcp|mid|mp3|mpg|msf|nib|o|obj|odt|ogg|ook|opt|os|pal|pbm|pdf|pem|pgm|plo|png|po|pod|pp|ppd|ppm|ppt|prc|ps|psd|psym|pyc|pyd|rast|rb|rde|rdf|rdr|rgb|ro|rpm|rsrc|so|ss|stg|strings|tdt|tif|tiff|tk|uue|vhd|xbm|xlb|xls|xlw)' matches with lowest priority
10-23-2015 12:36:49.104 DEBUG PropertiesMapConfig - Pattern 'mysource' matches with priority 100
10-23-2015 12:36:49.104 DEBUG StructuredDataHeaderExtractor - Read configuration: configured=1 mode=6 HEADER_FIELD_LINE_NUMBER=0 HEADER_FIELD_DELIMITER='|' HEADER_FIELD_QUOTE='"' FIELD_DELIMITER='|' FIELD_QUOTE='"'.
10-23-2015 12:36:49.104 DEBUG OneShotWriter - Structured data configurations loaded
10-23-2015 12:36:49.104 INFO  UTF8Processor - Converting using CHARSET="UTF-8" for conf "source::/sourcedata/mydatafiles/myfile_201510212245.txt|host::server|mysource|2"
10-23-2015 12:36:49.104 INFO  LineBreakingProcessor - Using truncation length 10000 for conf "source::/sourcedata/mydatafiles/myfile_201510212245.txt|host::server|mysource|2"
10-23-2015 12:36:49.104 INFO  LineBreakingProcessor - Using lookbehind 100 for conf "source::/sourcedata/mydatafiles/myfile_201510212245.txt|host::server|mysource|2"
10-23-2015 12:36:49.104 DEBUG StructuredDataHeaderExtractor - Read configuration: configured=1 mode=6 HEADER_FIELD_LINE_NUMBER=0 HEADER_FIELD_DELIMITER='|' HEADER_FIELD_QUOTE='"' FIELD_DELIMITER='|' FIELD_QUOTE='"'.
10-23-2015 12:36:49.105 INFO  AggregatorMiningProcessor - Setting up line merging apparatus for: source::/sourcedata/mydatafiles/myfile_201510212245.txt|host::server|mysource|2
10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes - put _mydate regex=source::.*?_(\d{4})(\d{2})(\d{2})
10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes -     * year
10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes -     * month
10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes -     * day
10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes - put _mytime regex=source::.*?_\d{8}(\d{2})(\d{2})
10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes -     * hour
10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes -     * minute
10-23-2015 12:36:49.105 INFO  DateParser - Set timezone to: UTC
10-23-2015 12:36:49.105 DEBUG AggregatorMiningProcessor - Failed to parse timestamp. Defaulting to time specified by data input. - data_source="/sourcedata/mydatafiles/myfile_201510212245.txt", data_host="server", data_sourcetype="mysource"
10-23-2015 12:36:49.105 DEBUG AggregatorMiningProcessor - Failed to parse timestamp. Defaulting to time specified by data input. - data_source="/sourcedata/mydatafiles/myfile_201510212245.txt", data_host="server", data_sourcetype="mysource"
10-23-2015 12:36:49.105 DEBUG AggregatorMiningProcessor - Failed to parse timestamp. Defaulting to time specified by data input. - data_source="/sourcedata/mydatafiles/myfile_201510212245.txt", data_host="server", data_sourcetype="mysource"
10-23-2015 12:36:49.105 DEBUG AggregatorMiningProcessor - Failed to parse timestamp. Defaulting to time specified by data input. - data_source="/sourcedata/mydatafiles/myfile_201510212245.txt", data_host="server", data_sourcetype="mysource"
10-23-2015 12:36:49.105 DEBUG AggregatorMiningProcessor - Failed to parse timestamp. Defaulting to time specified by data input. - data_source="/sourcedata/mydatafiles/myfile_201510212245.txt", data_host="server", data_sourcetype="mysource"
10-23-2015 12:36:49.105 DEBUG AggregatorMiningProcessor - Failed to parse timestamp. Defaulting to time specified by data input. - data_source="/sourcedata/mydatafiles/myfile_201510212245.txt", data_host="server", data_sourcetype="mysource"

To me, the following lines indicate that the config was successfully loaded from datetime.xml:

10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes - put _mydate regex=source::.*?_(\d{4})(\d{2})(\d{2})
10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes -     * year
10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes -     * month
10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes -     * day
10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes - put _mytime regex=source::.*?_\d{8}(\d{2})(\d{2})
10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes -     * hour
10-23-2015 12:36:49.105 DEBUG LoadDateParserRegexes -     * minute

The regex listed does match & return the fields for the following source:

source::/sourcedata/mydatafiles/myfile_201510212245.txt|host::server|mysource|2
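The same check can be repeated outside regex101 with a few lines of Python. This is only a quick sketch: Python's re module is not the regex engine Splunk uses, although these two patterns stick to syntax that should behave the same in both. The variable names are made up for illustration.

```python
import re

# Source key copied from the DEBUG output above.
SOURCE_KEY = (
    "source::/sourcedata/mydatafiles/myfile_201510212245.txt"
    "|host::server|mysource|2"
)

# Patterns exactly as reported by LoadDateParserRegexes.
DATE_RE = re.compile(r"source::.*?_(\d{4})(\d{2})(\d{2})")
TIME_RE = re.compile(r"source::.*?_\d{8}(\d{2})(\d{2})")

date_parts = DATE_RE.search(SOURCE_KEY).groups()
time_parts = TIME_RE.search(SOURCE_KEY).groups()

print(date_parts)  # ('2015', '10', '21')
print(time_parts)  # ('22', '45')
```

So the patterns do capture year, month, day, hour and minute from that string, which suggests the failure is not in the regex itself.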

So I'm really at a bit of a loss at this point 😞
