Timestamp extraction from filename not working as specified in docs?

Contributor

Another project group has an XML-based sourcetype called "openscap", for which I've created the definition (in its own app). These are the result files from each server's OpenSCAP tool run. Unfortunately, each file is roughly 3 MB and around 22,000 lines, and each file is ingested as a single event. While somewhat unwieldy, the parsing does work for their purposes. The problem is that they want to timestamp the events with a "start-time=" entry from each file, which doesn't appear until roughly line 18,921.

While I can certainly create the proper TIME_FORMAT and TIME_PREFIX in props.conf easily enough, reaching the 18,921st line of each file would seem to need a MAX_TIMESTAMP_LOOKAHEAD value in the millions. Even if Splunk permits that, it seems to be discouraged performance-wise.

So, instead, I've had them generate the result files with the name format "hostname-xxxxxxxx.xml", where xxxxxxxx has been (throughout my testing today), in turn: straight epoch time (10 digits), epoch plus milliseconds (13 digits), and finally YYYYMMDDHHMMSS.
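For anyone who wants to sanity-check the three filename variants outside Splunk, here is a minimal Python sketch. It is not how Splunk parses anything; the function name, the sample hostname "web01", and the rule of picking the format by digit count are all assumptions made for illustration.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern: "<host>-<stamp>.xml", where the stamp is 10, 13, or 14 digits.
FILENAME_RE = re.compile(r"^(?P<host>.+)-(?P<stamp>\d{10}|\d{13}|\d{14})\.xml$")


def parse_filename_timestamp(filename):
    """Return a UTC datetime parsed from the filename, or None if nothing matches."""
    m = FILENAME_RE.match(filename)
    if m is None:
        return None
    stamp = m.group("stamp")
    if len(stamp) == 10:
        # Straight epoch seconds.
        return datetime.fromtimestamp(int(stamp), tz=timezone.utc)
    if len(stamp) == 13:
        # Epoch plus milliseconds; split with integer math to avoid float rounding.
        seconds, millis = divmod(int(stamp), 1000)
        base = datetime.fromtimestamp(seconds, tz=timezone.utc)
        return base.replace(microsecond=millis * 1000)
    # 14 digits: YYYYMMDDHHMMSS, assumed here to be UTC.
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    for name in ("web01-1496319000.xml",
                 "web01-1496319000123.xml",
                 "web01-20170601121000.xml"):
        print(name, "->", parse_filename_timestamp(name))
```

If all three names resolve to the instant you expect, the filenames themselves are well-formed and the problem lies in how they are being matched at index time.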

Despite my modifying the props.conf entry so that Splunk cannot possibly locate a datetime in the event/file itself (at which point it should be trying to pull it from the "source" path, including the filename, according to the specs in datetime.xml and the time extraction docs), Splunk has never timestamped any of these events from the filename. Instead, they are always timestamped with the ingest time.

Looking through datetime.xml, the epoch plus millisecond format is specifically addressed even in the default datetime.xml file. Why this wasn't captured, I cannot understand. I've tried implementing a custom datetime.xml for both 10-digit epoch and YYYYMMDDHHMMSS, following the examples documented in MANY answers.splunk.com posts about custom datetime.xml setups. No luck.

A DEBUG run of Splunk on both the Universal Forwarder that's sending these logs and on the four Indexers it's being sent to shows that it's parsing the custom datetime.xml file fine, but it always comes back with the Aggregator stating that no match can be found on the source, which makes no sense.

I'll be happy to post the exact config contents as they stand on my last run/attempts when I get back into the office tomorrow, but if anyone has any suggestions beforehand, I am ALL EARS.

Thanks in advance.

Apologies for the formatting in the paragraphs above. Splunk Answers hates SOMETHING in that text and was refusing to let me post it. After about 30 minutes of trying, this was the only way I could get it submitted.

Engager

Hello,

In my case, the files are named Incidentes.YYYYMMDD.csv, and I use them to study backlogs with daily granularity.

I solved it by pointing Splunk at a field that contains both a date and a time, and extracting only the time from it. TIME_PREFIX skips over the date and TIME_FORMAT matches the time, so Splunk moves on to step 5 of http://docs.splunk.com/Documentation/SplunkCloud/6.6.1/Data/HowSplunkextractstimestamps

[csvbacklog]
DATETIME_CONFIG =
FIELD_DELIMITER = tab
INDEXED_EXTRACTIONS = csv
KV_MODE = none
MAX_DAYS_HENCE =
MAX_DIFF_SECS_AGO = 86400
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = your field with date and hour
TIME_FORMAT = %H:%M:%S
TIME_PREFIX = \d{1,2}/\d{1,2}/\d{2,4}

category = Custom
description = Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled = false
pulldown_type = true
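
The idea behind the stanza above (time of day from a field in the event, date from the file name) can be sketched in Python as follows. This only illustrates the combination and is not Splunk's internal logic; the function names and the sample field value are hypothetical.

```python
import re
from datetime import datetime

# Same regex as the TIME_PREFIX above: it skips over a leading d/m/y style date.
DATE_PREFIX = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")
# Hypothetical pattern for names like Incidentes.YYYYMMDD.csv.
FILE_DATE = re.compile(r"\.(\d{8})\.csv$")


def time_after_prefix(field_value):
    """Drop the leading date from the field and parse what remains as %H:%M:%S."""
    m = DATE_PREFIX.match(field_value)
    rest = field_value[m.end():] if m else field_value
    return datetime.strptime(rest.strip(), "%H:%M:%S").time()


def combine(filename, field_value):
    """Take the date from the file name and the time of day from the event field."""
    m = FILE_DATE.search(filename)
    if m is None:
        return None
    day = datetime.strptime(m.group(1), "%Y%m%d").date()
    return datetime.combine(day, time_after_prefix(field_value))


if __name__ == "__main__":
    print(combine("Incidentes.20170601.csv", "1/6/2017 08:15:30"))
```

Note that the event still has to supply the time of day in this approach; only the date comes from the file name.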

I hope it can help you.

Esteemed Legend

You don't need TIME_FORMAT = %Y%m%d%H%M%S, but other than that your configurations look just fine. Have you deployed props.conf and datetime.xml to your Indexers and restarted all Splunk instances there? If so, then perhaps you are expecting previously indexed events to be affected, but this is not the case; only events that are indexed after the Indexers restart will be modified.

Contributor

Yes, as I stated, the Indexers were all restarted each time I made any config change to either props.conf or datetime.xml (that's where the openscap app resides, on the indexers).

The Forwarder shouldn't need anything other than the inputs.conf, correct? That's the case with every other input being sent from it. My understanding is that all special index-time handling of the events is done on the Indexers, so the props.conf and datetime.xml are unnecessary on the forwarder, right?

I'll try removing TIME_FORMAT, but if that shouldn't be affecting it either way, then I'm at a loss. It certainly is NOT working. I'm not expecting anything to be applied to events already in Splunk, no. It's the case that all new files we're ingesting on the forwarder still are not being timestamped properly. They are always coming in with the timestamp equal to when they were ingested... so the datetime field in the filename is clearly being ignored.

Any other ideas, or should I just open a support ticket?

Path Finder

Hi tmeader - did you get an answer to this? Did you go through support?

I am having a very similar problem (https://answers.splunk.com/answers/320978/how-to-extract-the-timestamp-from-a-filename-at-in.html) and am wondering if you found a solution.

Thanks,
Ash

Esteemed Legend

Everything that you said is correct unless you are using Heavy Forwarders. A support case is a good idea at this point.

Esteemed Legend

You need to post your props.conf and datetime.xml files (inputs.conf might also help). We also need to know where you have your datetime.xml file (as in the full path), where you have deployed these files, and whether you have restarted the Splunk instances on those servers.

Contributor

Okay, stepping through:

The inputs.conf on the Universal Forwarder contains the entry:

[monitor:///log/openscap/*.xml]
disabled = false
host_regex = openscap\/([^\/]*)-\d+\.xml
ignoreOlderThan = 2d
sourcetype = openscap
crcSalt = <SOURCE>
index = openscap
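
As a quick check of the host_regex above, the same expression can be run in Python against a sample path. This assumes the regex is applied to the full source path and that the first capture group becomes the host; the hostnames and file names below are made up for illustration.

```python
import re

# Same expression as host_regex in the inputs.conf stanza above.
HOST_REGEX = re.compile(r"openscap\/([^\/]*)-\d+\.xml")


def extract_host(source_path):
    """Return the first capture group (the host part), or None when nothing matches."""
    m = HOST_REGEX.search(source_path)
    return m.group(1) if m else None


if __name__ == "__main__":
    for path in ("/log/openscap/web01-20170601121000.xml",
                 "/log/openscap/db-east-02-1496319000.xml"):
        print(path, "->", extract_host(path))
```

Because `[^\/]*` is greedy, a hostname that itself contains hyphens is still captured whole, up to the last hyphen before the digits.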

The props.conf (which is in an app called "openscap", under SPLUNK_HOME/etc/apps/openscap) on each of the Indexers contains:

[openscap]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^<?xml version="1.0" encoding="UTF-8"?>\n^<Benchmark
MAX_EVENTS = 100000
DATETIME_CONFIG = /etc/apps/openscap/default/datetime.xml
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %Y%m%d%H%M%S
TZ = UTC

The datetime.xml file is in the location cited above (which is also where the props.conf resides). Here are its contents:

<!--   Version 4.0 -->

<!-- datetime.xml -->
<!-- This file contains the general formulas for parsing date/time formats. -->

<datetime>

<define name="_openscapdate" extract="year, month, day, hour, minute, second">
        <text><![CDATA[source::.*?(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})]]></text>
</define>

<timePatterns>
      <use name="_openscapdate"/>
</timePatterns>
<datePatterns>
      <use name="_openscapdate"/>
</datePatterns>

</datetime>

Note - I've also tried variations on that CDATA line of the following:

<text><![CDATA[source::.*?(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\.xml]]></text>
<text><![CDATA[source::.*?\-(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\.xml]]></text>
<text><![CDATA[source::.*?\-(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})]]></text>

To no avail. Likewise, when I was attempting to use a 10 digit epoch, datetime.xml looked like this:

<!--   Version 4.0 -->

<!-- datetime.xml -->
<!-- This file contains the general formulas for parsing date/time formats. -->

<datetime>

<define name="_openscapdate" extract="utcepoch">
        <text><![CDATA[source::.*?\-(\d{10})\.xml]]></text>
</define>

<timePatterns>
      <use name="_openscapdate"/>
</timePatterns>
<datePatterns>
      <use name="_openscapdate"/>
</datePatterns>

</datetime>

Again, multiple matching variations for the regex were tried, as in the other version, but I won't list them out again.

As for what was restarted, the forwarder was restarted of course, and all the indexers were restarted after each of these changes, yes.
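
To rule out the regexes themselves, the patterns from both datetime.xml versions above can be exercised in Python. The big assumption here is the string they are matched against: this sketch simply prepends "source::" to a sample path, which may not be what Splunk actually presents to the pattern, and Python's regex engine is not identical to the PCRE engine Splunk uses. The sample file names are hypothetical.

```python
import re

# The four 14-digit variants quoted in the post.
PATTERNS_14 = {
    "base": r"source::.*?(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})",
    "with_ext": r"source::.*?(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\.xml",
    "dash_and_ext": r"source::.*?\-(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\.xml",
    "dash_only": r"source::.*?\-(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})",
}
# The 10-digit epoch variant.
EPOCH_PATTERN = r"source::.*?\-(\d{10})\.xml"

# Assumed candidate strings: "source::" followed by the monitored path.
SAMPLE_14 = "source::/log/openscap/web01-20170601121000.xml"
SAMPLE_EPOCH = "source::/log/openscap/web01-1496319000.xml"


def match_fields(pattern, candidate):
    """Return the captured groups as a tuple, or None if the pattern does not match."""
    m = re.search(pattern, candidate)
    return m.groups() if m else None


if __name__ == "__main__":
    for name, pattern in PATTERNS_14.items():
        print(name, match_fields(pattern, SAMPLE_14))
    print("epoch", match_fields(EPOCH_PATTERN, SAMPLE_EPOCH))
```

Under that assumption every variant matches and captures the expected fields, which would suggest the failure is in what gets handed to the pattern rather than in the pattern itself.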
