Getting Data In

How do I combine the year, month, and day from a filename with the exact time of day (in nanoseconds) contained in the event?

aholzer
Motivator

I need to ingest a file whose filename contains the year, month, and day, while the exact time of the event within that day (nanoseconds since midnight) is stored inside the file, in a field called nanos. I'm trying to figure out the best way to combine these two pieces of information into the correct _time field.

Sample file:
Filename: test_csv_parsing_20161011.txt

sequenceNumber,nanos,msgType,poolId,accountId,fixMsgType,fixData
29585650,62733712723932,'*',zzzz,zzzz,54,''

Given the following:

nanos => seconds => hours:mins:secs.nanos
62733712723932 => 62733.712723932 => 17:25:33.712723932

Expected results:

_time = 2016/10/11 17:25:33.712723932
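
As a quick sanity check of that arithmetic in SPL, using makeresults (the field names here are mine):

| makeresults | eval nanos=62733712723932 | eval secs=floor(nanos/1000000000) | eval hms=tostring(secs, "duration")

secs comes out as 62733 and hms as "17:25:33"; the remaining 712723932 is the nanosecond part.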

I have already tried the following:

TIME_FORMAT=%s%9N
TIMESTAMP_FIELDS=nanos

Based on the "Configure timestamps" page in the Splunk docs, it should take the date from the filename if it can only find the time inside the event: "4. If no events in a source have a date, Splunk software tries to find a date in the source name or file name. Time of day is not identified in filenames. (This requires that the events have a time, even though they don't have a date.)"

But I get the following warnings in the "Add Data / Data Preview" dashboard: "The TIME_FORMAT specified is matching timestamps (Mon Oct 17 07:07:52 2168) outside of the acceptable time window. If this timestamp is correct, consider adjusting MAX_DAYS_AGO and MAX_DAYS_HENCE." and "Failed to parse timestamp. Defaulting to file modtime."

It's probably because %s is given priority over %9N in the TIME_FORMAT. If Splunk first captured the last 9 digits as nanoseconds, and then used the remaining digits as seconds, this would work.
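
To illustrate the split I have in mind (just a hypothetical SPL sketch, since the timestamp processor doesn't actually work this way):

| makeresults | eval nanos="62733712723932" | eval secs=substr(nanos, 1, len(nanos)-9) | eval frac=substr(nanos, len(nanos)-8)

This yields secs="62733" and frac="712723932", which is exactly the breakdown shown above.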

Can anybody provide guidance?

1 Solution

jrballesteros05
Communicator

I did something similar but it was a little bit tricky.

In my case the source field is "/root/test_csv_parsing_20161011.txt", so I extracted a new field called newdate from the "source" metadata field like this in props.conf:

EXTRACT-newdate=\/.*?\/.*?\_(?P<newdate>\d+)\..* in source

Then you can concatenate the fields and send the events to another index with the "collect" command, using the new _time field:

index=proof | eval seconds=floor(nanos/1000000000) | eval newseconds=strftime(seconds,"%H%M%S") | eval newtimestamp=newdate.newseconds | eval c_time=strptime(newtimestamp,"%Y%m%d%H%M%S") | eval _time=c_time | eval _raw=_time + ": " + _raw | collect index=mynewindex

Maybe this is not the cleanest way to do it but I'm sure it can help you.
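
For example, a more compact variant of the same idea that avoids the strftime timezone wrinkle and keeps the sub-second part (a sketch only; note that _time cannot hold full nanosecond precision, so the tail is truncated):

index=proof | rex field=source "_(?<newdate>\d{8})\." | eval _time=strptime(newdate, "%Y%m%d") + nanos/1000000000 | collect index=mynewindex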


tchen_splunk
Splunk Employee

You'll need to customize datetime.xml to parse out the date from the filename:
https://answers.splunk.com/answers/172535/extract-date-time-from-source-path-with-custom-dat.html
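
For context, a custom datetime.xml is wired in through the sourcetype's DATETIME_CONFIG setting in props.conf, and the XML defines regex-based date and time patterns. The skeleton below is only an illustrative sketch (the app path, pattern name, and regex are made up; see the linked answer for a tested version):

props.conf:

[csv]
DATETIME_CONFIG = /etc/apps/myapp/default/datetime.xml

datetime.xml:

<datetime>
<define name="filename_date" extract="year, month, day">
<text><![CDATA[(20\d\d)(\d\d)(\d\d)]]></text>
</define>
<datePatterns>
<use name="filename_date"/>
</datePatterns>
</datetime>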

aholzer
Motivator

Thanks @tchen_splunk - I'll give this solution a try and mark it correct if it works.

0 Karma

aholzer
Motivator

Unfortunately this doesn't seem to work.

A bug report has been submitted to follow up.

0 Karma

moaf13
Path Finder

I tried all the answers online, plus my own ways of fixing things, and nothing worked. I ended up using eval to extract the file date from source.

FYI

I don't know how this line worked for him:

EXTRACT-newdate=\/.*?\/.*?\_(?P<newdate>\d+)\..* in source

but I had to go to transforms.conf to get it working.

props.conf
REPORT-filedate= tranName

transforms.conf
[tranName]
blah blah
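
Filled in, that pair might look something like this (a minimal sketch; tranName is the placeholder stanza name from above, and the regex assumes the same filename pattern as the sample file):

props.conf:

[csv]
REPORT-filedate = tranName

transforms.conf:

[tranName]
SOURCE_KEY = source
REGEX = _(\d{8})\.
FORMAT = filedate::$1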

0 Karma


aholzer
Motivator

@jrballesteros05

Thanks for your reply, but I was hoping to get the _time field properly extracted from the very start. I will attempt @tchen_splunk's solution, and if that doesn't work I'll give your search-time solution a try.

Thanks again.

0 Karma

maciep
Champion

What does your entire props stanza look like for this sourcetype? Are you using INDEXED_EXTRACTIONS?

0 Karma

aholzer
Motivator

Yes, I am:

[csv]
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
SHOULD_LINEMERGE=false
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
TIME_FORMAT=%s%9N
FIELD_NAMES=sequenceNumber,nanos,msgType,poolId,accountId,fixMsgType,fixData
TIMESTAMP_FIELDS=nanos
0 Karma

maciep
Champion

I think the problem is that %s represents Epoch time, not seconds since midnight.

 %s The Unix Epoch Time timestamp, or the number of seconds since the Epoch: 1970-01-01 00:00:00 +0000 (UTC). (1352395800 is Thu Nov 8 09:30:00 2012)

So Splunk is grabbing 10 digits for the epoch, i.e. 6273371272, and that resolves to GMT: Mon, 17 Oct 2168 11:07:52.

I don't think there is a common time format variable that represents seconds since midnight.
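
A quick way to reproduce that number (a sketch; the output depends on the search head's timezone):

| makeresults | eval epoch=6273371272 | eval when=strftime(epoch, "%a, %d %b %Y %H:%M:%S")

In GMT this lands on Mon, 17 Oct 2168, which matches the out-of-window timestamp in the data preview warning.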

0 Karma

maciep
Champion

Oh, and is that configured on your forwarder, on your indexer, or elsewhere?

0 Karma