Getting Data In

Tab Delimited Date Time Not Getting Correct Date/Time

aelliott
Motivator

I have a log file that is tab delimited. It has a field called "date" and a field called "time" next to each other.
This is the format of the fields:

2013-10-24  11:40:23

The issue I'm having is that sometimes another field such as "8.1.5"(just some random value) that does not match the TIME_FORMAT shows up in the user agent string and somehow takes control over the date and time fields, thus producing wrong date/time on those records.(in this case August 1st 2005)

I have tried TIME_PREFIX=[\t] and leaving off the TIME_PREFIX, no change.

I have tried:

TIME_FORMAT=%Y-%m-%d%n%H:%M:%S
TIME_FORMAT=%Y-%m-%d%t%H:%M:%S
TIME_FORMAT=%Y-%m-%d<tab>%H:%M:%S

Here is my props.conf:

SHOULD_LINEMERGE=false
TZ=GMT
pulldown_type=true
NO_BINARY_CHECK=1
TIME_PREFIX=[\t]
MAX_TIMESTAMP_LOOKAHEAD=500
TIME_FORMAT=%Y-%m-%d%n%H:%M:%S
REPORT-isawebw3c=isawebw3c

transforms.conf:

[isawebw3c]
DELIMS = "\t"
FIELDS="c-ip","cs-username","c-agent","sc-authenticated","date","time","s-svcname","s-computername","cs-referred","r-host","r-ip","r-port","time-taken","cs-bytes","sc-bytes","cs-protocol","cs-transport","s-operation","cs-uri","cs-mime-type","s-object-source","sc-status","s-cache-info","rule","FilterInfo","cs-Network","sc-Network","error-info","action","GMT-Time","AuthenticationServer"
Tags (2)
0 Karma
1 Solution

richgalloway
SplunkTrust
SplunkTrust

Have you considered using REGEX in your transforms.conf? Something like:

REGEX=(?<c-ip>[^\t]?)\t(?<cs-username>[^\t]?)\t...\t(?<date>[^\t]>)...
---
If this reply helps you, Karma would be appreciated.

View solution in original post

richgalloway
SplunkTrust
SplunkTrust

Have you considered using REGEX in your transforms.conf? Something like:

REGEX=(?<c-ip>[^\t]?)\t(?<cs-username>[^\t]?)\t...\t(?<date>[^\t]>)...
---
If this reply helps you, Karma would be appreciated.

aelliott
Motivator

I just came to that answer as well 🙂 thanks for your help
TIME_PREFIX=^[^\t]+\t[^\t]+\t[^\t]+\t[^\t]+\t

It worked perfectly

0 Karma

richgalloway
SplunkTrust
SplunkTrust

What you describe should be covered by [^\t]+\t[^\t]+\t[^\t]+\t[^\t]+\t. Can you share some sample data so I can experiment with regex strings?

---
If this reply helps you, Karma would be appreciated.

aelliott
Motivator

This is very close
^\S+\t\S+\t\S+\t\S+\t
However, there are spaces within the user agent strings that is the issue. Really it needs to be ^ non-tab, tab, non-tab, tab, non-tab, tab, non-tab, tab

0 Karma

richgalloway
SplunkTrust
SplunkTrust

Have you tried [\S\t\S\t\S\t\S\t]? I think \w is failing because of non-word characters (like '.') in one or more of your fields.

---
If this reply helps you, Karma would be appreciated.

aelliott
Motivator

Unfortunately this did not end up working. It still picked up the .. 10.8.5 within the c-agent field which is a mac computer operating system. It seemed to work on import as it was only highlighting the date fields once I put in that regex.

0 Karma

aelliott
Motivator

[^\w*\t\w*\t\w*\t\w*\t]
unless anyone can tell me different seems to have done the trick. Thanks for your help in getting me to the right answer!

0 Karma

richgalloway
SplunkTrust
SplunkTrust

I think I get it now. What you need is a MIN_TIMESTAMP_LOOKAHEAD-type of feature where Splunk will look for times x characters into the event. Until we have one of those, can you make your TIME_PREFIX string more specific? Maybe "[^\t]\t[^\t]\t[^\t]\t[^\t]\t"?

---
If this reply helps you, Karma would be appreciated.
0 Karma

aelliott
Motivator

Oh, the date and time fields parse just fine into the "date" and "time" fields.. the issue is that other fields show up in the _time field.. if they appear before the date/time fields. c-agent sometimes gets random IP's or random dates or random version numbers in it from a user's user agent string. Instead of the Timestamp using the date and time fields with the format I gave it, it grabs those random numbers and converts them into a date.

0 Karma

richgalloway
SplunkTrust
SplunkTrust

I suspect the TIME_FORMAT config doesn't understand you have a tab character within your time string. You also said that other fields sometimes appears in your date and time fields so maybe your transform isn't always working. Maybe a REGEX will work better or maybe it's Monday and my brain isn't at full steam yet. 🙂

---
If this reply helps you, Karma would be appreciated.
0 Karma

aelliott
Motivator

ok so how would that get me the right timestamp? My transform is working just fine.

0 Karma
Get Updates on the Splunk Community!

Demo Day: Strengthen Your SOC with Splunk Enterprise Security 8.1

Today’s threat landscape is more complex than ever. Security operation centers (SOCs) are overwhelmed with ...

Dashboards: Hiding charts while search is being executed and other uses for tokens

There are a couple of features of SimpleXML / Classic dashboards that can be used to enhance the user ...

Splunk Observability Cloud's AI Assistant in Action Series: Explaining Metrics and ...

This is the fourth post in the Splunk Observability Cloud’s AI Assistant in Action series that digs into how ...