Getting Data In
Highlighted

How do I keep date fields from doubling at ingestion?

Explorer

So I'm working on a project where i'm ingesting csv files.

These file's time stamp can't be read until I pass the:

Timestamp format = %Y-%m-%dT%H:%M:%S.%z
Timestamp fields = time (which isn't in the csv, but shows up in the "Set Source Type" page of "Add Data" )

Without it I get the errors:
- Could not use strptime to parse timestamp from "2020 february tuesday 11 3 2 local 18 2020-02-18T11:03:02.000-0600".
- Could not use regex to parse timestamp from "11 3 2 local 18 2020-02-18T1".

my problem is that after ingestion i get double fields of every time related field.
i.e.
dateyear = 2020
date
year = 2020
datemonth = february
date
month = february
datewday = tuesday
date
wday = tuesday
datemday = 18
date
mday = 18
datehour = 11
date
hour = 11
dateminute = 3
date
minute = 3
datesecond = 6
date
second = 6
datezone = local
date
zone = local

It still shows up in the fields sidebar as just single fields, but when I add them as an interesting field they double in the events page or when I list them using a table command.

I've tried also setting Extraction to current time in "Set Source Type" page of "Add Data", but I still get double.

*sample of when I table the date fields
alt text

*sample of what the field dates look like in events
alt text

0 Karma
Highlighted

Re: How do I keep date fields from doubling at ingestion?

Esteemed Legend

I don't know what is causing this HOWEVER I do know that these fields are FAR less "interesting" than you think that they are. These fields are for DEBUGGING purposes; not for general use. They are leftovers of the timesamping and timezoning process and are PRE-NORMALIZED meaning that they are NOT for general use. If you do not believe me then run this search and see for yourself:

index=* OR index=_* date_hour=*
| eval real = strftime(_time, "%A %B %d %T %Y %z")
| rex field=real "^(?<real_wday>\S+)\s+(?<real_month>\S+)\s+(?<real_mday>\S+)\s+(?<real_hour>\d+):(?<real_minute>\d+):(?<real_second>\d+)\s+(?<real_year>\S+)\s+(?<real_zone>\S+)"
| table real* date*
| foreach  real* date* [ eval <<FIELD>> = lower(<<FIELD>>)]
| stats count count(eval(real_wday!=date_wday)) AS diff_wday count(eval(real_month!=date_month)) AS diff_month count(eval(real_mday!=date_mday)) AS diff_mday count(eval(real_hour!=date_hour)) AS diff_hour count(eval(real_minute!=date_minute)) AS diff_minute count(eval(real_second!=date_second)) AS diff_second count(eval(real_year!=date_year)) AS diff_year count(eval(real_zone!=date_zone)) AS diff_zone

Just think about it for a second. If you have a different personal Time zone setting than somebody else and you both run the same search for "yesterday", you are GUARANTEED to get different answers because the events brought back will be different AND because you are using date_* fields that are pre-normalized. NEVER use the date_* fields; always calculate your own.

Re: How do I keep date fields from doubling at ingestion?

Explorer

Figured it out.

Seems I wasn't given a raw log file, but an extracted table of a sourcetype from and index. So it included the fields that Splunk generates on ingestion. Once I went through the log and deleted those fields it solved my problem.

Thanks @woodcock for your take regarding this, it tipped me in the right direction.

View solution in original post

0 Karma