When I run the query "index=api" with a date range of, for example, 07/07/2015 - 07/07/2015, I only get results within the first second of midnight on 07/07/2015.
But if I run the same query with the date range 07/07/2015 - 07/08/2015, I get more results from 07/07/2015, including events later than midnight.
Does the absence of hours and minutes in the directory path affect the search? In Splunk, the range 07/07/2015 - 07/07/2015 returns all results for that day.
My HDFS logs are partitioned by year/month/day.
Virtual index:
[api]
vix.provider = testprovider
vix.input.1.path = /projects/test/test_logs/api/...
vix.input.1.accept = .
vix.input.1.et.regex = /projects/test/test_logs/.+/(\d\d\d\d)/(\d\d)/(\d\d)/.+
vix.input.1.et.format = yyyyMMdd
vix.input.1.et.offset = 0
vix.input.1.lt.regex = /projects/test/test_logs/.+/(\d\d\d\d)/(\d\d)/(\d\d)/.+
vix.input.1.lt.format = yyyyMMdd
vix.input.1.lt.offset = 86400
 
Typically the events themselves have timestamps. Did you configure timestamp recognition?
See: Configuring timestamp recognition in Splunk
For example, if an event contains:
2015-07-10T13:40:51Z syslog.tcp {"message":"<166>2015-07-10T13:40:51.076Z somehost.somedomain Vpxa: [FFEDEB90 verbose 'VpxaHalCnxHostagent' opID=WFU-c8713dc4] [WaitForUpdatesDone] Received callback","client_host":"10.6.50.104"}
props.conf:
[source::/my/source/...]
sourcetype = hadoop
priority = 100
ANNOTATE_PUNCT = false
SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ
TZ = UTC
 
I have not configured the timestamp recognition. Will try that and the timezone and see if it works.
 
Also, make sure to set the same timezone in indexes.conf for vix.input.1.et/lt.timezone. In many cases the timezone of the data is GMT while the search is run in a user-specified timezone, e.g. PST.
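As a rough sketch of that suggestion, the two timezone settings could be added to the virtual index stanza in indexes.conf as shown here. The value GMT is only an assumption for illustration; use whichever timezone the year/month/day directories were actually written in.

```ini
[api]
# Assumption: the dates in the HDFS directory names are in GMT.
# Replace GMT with the timezone the paths were really written in.
vix.input.1.et.timezone = GMT
vix.input.1.lt.timezone = GMT
```

Without these, the earliest and latest times derived from the path are presumably interpreted in a different timezone than the data was written in, which can shift the day boundaries relative to the search time range.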
 
"Does not having hours and minutes in the directory affect the search" - which directory is that? If Splunk determines the timestamp for your events from the directory structure, then of course those are needed (and Splunk will give events a midnight timestamp if only the day is available).
Could you clarify how you run the three searches you mention above, with 1) from 07/07/2015 - 07/07/2015, 2) from 07/07/2015 - 07/08/2015 and 3) "In Splunk", and some example timestamps from those results?
The directory and file look like the following in HDFS.
  /projects/test/test_logs/api/2015/07/07/api_server.log.2015-07-07.gz
A line in the log looks like
  [Thu Jul 09 02:03:02 2015] [error] [client 127.0.0.1] log={"messages": "test"}
I ran the search index="api" in Smart Mode.
First, I used the "Date Range" option with Between "07/07/2015" and "07/07/2015".
Time column: 7/7/15 12:00:15.000 AM
Event column: [Tue Jul 07 00:00:15 2015] [error] log={"messages": "test"}
Result: 29,000 events. In the results listing, I don't see anything beyond 00:00.
Then I used the "Date Range" option with Between "07/07/2015" and "07/08/2015".
Time column: 7/7/15 12:00:15.000 AM
Event column: [Tue Jul 07 00:00:15 2015] [error] log={"messages": "test"}
Result: 76,000,000 events. In the results listing, I don't see anything beyond 00:00.
Question: It reports 76 million matching events, but the results list only shows hour 0. Why?
 
That looks like a problem with your timestamp recognition.
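A minimal props.conf sketch for the log format posted earlier in the thread (lines beginning with a timestamp such as [Thu Jul 09 02:03:02 2015]) might look like the following. The source path is taken from the virtual index definition; the TZ value is an assumption, since the sample lines carry no timezone, and the lookahead of 30 is simply carried over from the earlier example.

```ini
[source::/projects/test/test_logs/api/...]
SHOULD_LINEMERGE = false
# The timestamp follows the opening bracket at the start of each line.
TIME_PREFIX = ^\[
# Matches e.g. "Thu Jul 09 02:03:02 2015"
TIME_FORMAT = %a %b %d %H:%M:%S %Y
MAX_TIMESTAMP_LOOKAHEAD = 30
# Assumption: the servers log in UTC; change this to the real timezone.
TZ = UTC
```

With the event timestamps parsed from the lines themselves, events should no longer depend on the day-level time taken from the directory path.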
