Getting Data In

vix.input.1.et.regex - Log directory only contains year, month, and day. How are searches affected if there is no hour?

Explorer

When I run the query "index=api" with a date range of, for example, 07/07/2015 - 07/07/2015, I only get results from the first seconds after midnight on 07/07/2015.

But if I run the same query with the date range 07/07/2015 - 07/08/2015, I get more results for 07/07/2015, not just the ones at midnight.

Does the lack of hours and minutes in the directory path affect the search? In Splunk, using the range 07/07/2015 - 07/07/2015 gets me all results for that day.

My HDFS logs are partitioned by year/month/day.

virtual index
[api]
vix.provider = testprovider
vix.input.1.path = /projects/test/testlogs/api/...
vix.input.1.accept = .
vix.input.1.et.regex = /projects/test/test_logs/.+/(\d\d\d\d)/(\d\d)/(\d\d)/.+
vix.input.1.et.format = yyyyMMdd
vix.input.1.et.offset = 0
vix.input.1.lt.regex = /projects/test/test_logs/.+/(\d\d\d\d)/(\d\d)/(\d\d)/.+
vix.input.1.lt.format = yyyyMMdd
vix.input.1.lt.offset = 86400
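
To make the effect of these settings concrete, here is a minimal Python sketch of how a directory's time range might be derived from them: the regex capture groups are concatenated, parsed with the date format, and the offset (in seconds) is added. This is an illustration, not Splunk's actual implementation. The function name `path_time_range` is hypothetical, the Java-style format `yyyyMMdd` is rewritten as the Python equivalent `%Y%m%d`, and the sample path assumes the directory is spelled `test_logs`, as in the regex.

```python
import re
from datetime import datetime, timedelta

# Values copied from the stanza above; "yyyyMMdd" becomes "%Y%m%d" in Python.
PATH_REGEX = r"/projects/test/test_logs/.+/(\d\d\d\d)/(\d\d)/(\d\d)/.+"
DATE_FORMAT = "%Y%m%d"
ET_OFFSET = 0       # vix.input.1.et.offset
LT_OFFSET = 86400   # vix.input.1.lt.offset

def path_time_range(path):
    """Return (earliest, latest) for a file path, or None if the regex does not match."""
    match = re.match(PATH_REGEX, path)
    if match is None:
        return None
    # Concatenate the capture groups, e.g. ("2015", "07", "07") -> "20150707".
    stamp = "".join(match.groups())
    base = datetime.strptime(stamp, DATE_FORMAT)
    return base + timedelta(seconds=ET_OFFSET), base + timedelta(seconds=LT_OFFSET)

# Hypothetical path, assumed to match the regex.
sample = "/projects/test/test_logs/api/2015/07/07/apiserver.log.2015-07-07.gz"
earliest, latest = path_time_range(sample)
print(earliest, latest)
```

Under these assumptions the directory covers the whole day, from midnight on 07/07 up to midnight on 07/08. A path that does not match the regex yields no range at all, so it is worth checking that the regex and the real directory name agree.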


Builder

Typically the events themselves would have timestamps. Did you configure timestamp recognition?

Configuring timestamp recognition in splunk

For example, event contains:

2015-07-10T13:40:51Z syslog.tcp {"message":"<166>2015-07-10T13:40:51.076Z somehost.somedomain Vpxa: [FFEDEB90 verbose 'VpxaHalCnxHostagent' opID=WFU-c8713dc4] [WaitForUpdatesDone] Received callback","client_host":"10.6.50.104"}

props.conf:

[source::/my/source/...]
sourcetype = hadoop
priority = 100
ANNOTATE_PUNCT = false
SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ
TZ = UTC
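
As a rough illustration of what these timestamp settings ask for, the Python sketch below looks at the first 30 characters of an event, expects the timestamp at the very start of the line, parses it with the same strptime pattern, and treats it as UTC. It only mimics the idea and is not how Splunk implements timestamp extraction. The helper name `extract_event_time` and the shortened sample event are assumptions made for this example.

```python
import re
from datetime import datetime, timezone

MAX_TIMESTAMP_LOOKAHEAD = 30           # characters inspected after the prefix
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"     # same pattern as in props.conf
STAMP_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")

def extract_event_time(raw_event):
    """Parse a timestamp at the start of the event (prefix '^'), or return None."""
    window = raw_event[:MAX_TIMESTAMP_LOOKAHEAD]
    match = STAMP_SHAPE.match(window)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(0), TIME_FORMAT)
    # The props.conf stanza sets the timezone to UTC, so attach UTC here.
    return parsed.replace(tzinfo=timezone.utc)

# Shortened version of the example event above.
event = '2015-07-10T13:40:51Z syslog.tcp {"client_host":"10.6.50.104"}'
event_time = extract_event_time(event)
print(event_time.isoformat())
```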


Explorer

I have not configured the timestamp recognition. Will try that and the timezone and see if it works.

Splunk Employee

Also, make sure to set the matching timezone in indexes.conf with vix.input.1.et.timezone and vix.input.1.lt.timezone. In many cases the data is in GMT while the search is run in a user-specified timezone, e.g. PST.
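
A small Python sketch can show why the timezone matters: if the directory for a day is interpreted in GMT while the search window for the "same" day is in PST, the two windows only partly overlap. The fixed UTC-8 offset for PST and the helper names `day_window` and `overlap_hours` are assumptions made for this illustration.

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc
PST = timezone(timedelta(hours=-8))  # assumed fixed offset, ignoring daylight saving

def day_window(year, month, day, tz):
    """Return the (start, end) of one calendar day in the given timezone."""
    start = datetime(year, month, day, tzinfo=tz)
    return start, start + timedelta(days=1)

def overlap_hours(first, second):
    """Hours shared by two (start, end) windows; 0.0 if they do not overlap."""
    start = max(first[0], second[0])
    end = min(first[1], second[1])
    return max((end - start).total_seconds(), 0.0) / 3600

directory_day = day_window(2015, 7, 7, GMT)   # directory 2015/07/07 read as GMT
search_day = day_window(2015, 7, 7, PST)      # user searches 07/07/2015 in PST
shared = overlap_hours(directory_day, search_day)
print(shared)
```

With these assumed offsets the two windows share only 16 of the 24 hours, so part of the day's data would fall outside a single-day search.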

Champion

"Does not having hours and minutes in the directory affect the search" - which directory is that? If Splunk determines the timestamp for your events from the directory structure, then of course those are needed (and Splunk will give events a midnight timestamp if only the day is available).
Could you clarify how you ran the three searches you mention above, with 1) 07/07/2015 - 07/07/2015, 2) 07/07/2015 - 07/08/2015 and 3) "In Splunk", and post some example timestamps from those results?

Explorer

The directory and file look like the following in HDFS:
/projects/test/testlogs/api/2015/07/07/apiserver.log.2015-07-07.gz

A line in the log looks like:
[Thu Jul 09 02:03:02 2015] [error] [client 127.0.0.1] log={"messages": "test"}
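
For reference, the timestamp in a line like this can be described with the strptime pattern `%a %b %d %H:%M:%S %Y`, which is an assumption based on this one sample line. The Python sketch below pulls out the first bracketed field and parses it. The helper name `parse_bracketed_time` is hypothetical, and the sketch does not show what Splunk does internally.

```python
import re
from datetime import datetime

# Assumed pattern for timestamps such as "Thu Jul 09 02:03:02 2015".
LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"
LEADING_BRACKET = re.compile(r"^\[([^\]]+)\]")

def parse_bracketed_time(line):
    """Parse the first [...] field of a log line as a timestamp, or return None."""
    match = LEADING_BRACKET.match(line)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), LOG_TIME_FORMAT)
    except ValueError:
        # The first bracketed field was not a timestamp in the assumed format.
        return None

line = '[Thu Jul 09 02:03:02 2015] [error] [client 127.0.0.1] log={"messages": "test"}'
parsed_time = parse_bracketed_time(line)
print(parsed_time)
```

If events carry a timestamp like this and it is recognized, each event gets its own time of day instead of inheriting midnight from the directory.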

Ran the search index="api"
Smart Mode

Used the "Date Range" option with Between "07/07/2015" and "07/07/2015".
Time Column Results

7/7/15
12:00:15.000 AM

Event Column Results
[Tue Jul 07 00:00:15 2015] [error] log={"messages": "test"}
29,000 events
In the results listings, I don't see anything beyond 00:00

Used the "Date Range" option with Between "07/07/2015" and "07/08/2015".
Time Column Results
7/7/15
12:00:15.000 AM

Event Column Results
[Tue Jul 07 00:00:15 2015] [error] log={"messages": "test"}
76,000,000 events
In the results listings, I don't see anything beyond 00:00

Question: It says 76 million events matched, but the results list only shows hour 0.

Champion

That looks like a problem with your timestamp recognition.
