Splunk Search

Why is Hunk 6.1.1 ignoring 'earliest' and 'latest' in search?

shaskell_splunk
Splunk Employee

I have a vix defined with the following parameters:

[mydata]
vix.provider = myprovider
vix.input.1.path = /user/hunk/data/${country}/${season}/...
vix.input.1.et.regex = /user/hunk/data/\w+/(\d+)_\d+/
vix.input.1.et.format = yyyy
vix.input.1.et.offset = 0
vix.input.1.lt.regex = /user/hunk/data/\w+/\d+_(\d+)/
vix.input.1.lt.format = yyyy
vix.input.1.lt.offset = 31556926
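
To spell out how I expect those extraction rules to resolve (assuming a season directory such as 2013_2014 under ${country}/${season}):

# Hypothetical path: /user/hunk/data/England/2013_2014/...
# et: regex captures "2013", format yyyy -> earliest = 2013-01-01 00:00:00
# lt: regex captures "2014", format yyyy -> 2014-01-01 00:00:00,
#     then lt.offset adds 31556926 seconds (about one year) -> latest around 2015-01-01
# i.e. Hunk should only open this directory when the search time range overlaps that window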

Here is what props.conf looks like:

[source::/user/hunk/data/England/...]
priority = 100
sourcetype = england

[england]
ANNOTATE_PUNCT = false
SHOULD_LINEMERGE = false
KV_MODE = json
EVAL-_time = strptime('Date', "%d/%m/%y")
MAX_DAYS_AGO = 10000

The data is in CSV format and SimpleCSVRecordReader is being used to process the results.

When I run the following search with the time picker set to 'All time', earliest and latest are ignored in the search:

index=mydata earliest=08/01/2013:0:0:0 latest=06/01/2014:0:0:0

The job inspector shows earliest time as:

1969-12-31T16:00:00.00-08:00

The Hadoop job fires off with 90 maps and scans the entire dataset. I get 34,057 events returned (all the events).

When I run the same search with the time picker set to a date range of 'Between 01/01/2012 and 08/14/2014', it still ignores the earliest and latest in the search string but properly prunes on the picker's time range.

The Hadoop job fires off with 40 maps and returns only 7,266 events within the specified time range.

There is no notification in the job inspector saying that my time range was substituted based on the search string.

This is Hunk 6.1.1 build 209731.

1 Solution

shaskell_splunk
Splunk Employee

I've narrowed the issue down to the following setting in props.conf

EVAL-_time = strptime('Date', "%d/%m/%y")

This setting causes some unexpected consequences, such as ignoring earliest and latest in the search. A bug has been filed for this and it is being looked into.

The CSV I'm parsing has a field called 'Date'. The fix is to update props.conf with:

TIME_PREFIX = "Date":"
TIME_FORMAT = %d/%m/%y

At the provider level add the following setting:

vix.splunk.search.column.filter = false
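
Putting that together, the relevant stanzas end up roughly like the sketch below (the [england] stanza lives in props.conf and the provider stanza in indexes.conf; the [provider:myprovider] stanza name is an assumption based on vix.provider = myprovider above):

[england]
ANNOTATE_PUNCT = false
SHOULD_LINEMERGE = false
KV_MODE = json
MAX_DAYS_AGO = 10000
# EVAL-_time removed; the timestamp now comes from TIME_PREFIX/TIME_FORMAT
TIME_PREFIX = "Date":"
TIME_FORMAT = %d/%m/%y

[provider:myprovider]
# existing provider settings stay as-is
vix.splunk.search.column.filter = false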


Claw
Splunk Employee

When you are working with Hadoop using Hunk, or when you are working with Splunk and the time field you want is not _time, you may want the time picker in a dashboard to operate on the correct time field, or you may want to use a time-series command or any other time-based Splunk command on that specific field.

Here is a solution you might use to make time selections work in every case, including in panels.

| inputlookup SampleData.csv 
| eval _time=strptime(claim_filing_date,"%Y-%m-%d")
| sort _time
| addinfo

Let's break this down into its parts.

| inputlookup SampleData.csv

This is a way to pull data in directly from a CSV file so that it behaves just like it would from one of your searches against a Hadoop file that has no _time value.
In your search, you would supply something like [ index=SampleData state="FL" ].
Please remember to add enough filters to the search so that you aren't working with the entire data set. In Hadoop this could be a serious situation, leading to copying literally all of your data into a sort. Remember: filter first, munge later.

| eval _time=strptime(claim_filing_date,"%Y-%m-%d")

This converts the date in "claim_filing_date" into epoch time and stores it in "_time".

| sort _time

This sorts all of the records by time since they weren’t in that order before.

| addinfo

This adds the info_min_time and info_max_time fields, which carry the earliest and latest boundaries of the search's time range. This is needed for the time control in reports and panels to work properly. It is not needed to execute Splunk commands that are time-oriented, but it is the magic that makes this work properly with the time drop-down in your panels.
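
If you also want the events themselves limited to whatever the panel's time picker selects, a common follow-on (a sketch, not something the search above does by itself) is to compare the recomputed _time against the addinfo boundaries:

| inputlookup SampleData.csv
| eval _time=strptime(claim_filing_date,"%Y-%m-%d")
| addinfo
| where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")
| sort 0 _time

The OR on "+Infinity" covers the 'All time' case, where addinfo reports no upper bound.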


DalJeanis
SplunkTrust

Make that ...

 | sort 0 _time

... because if you're using Hadoop, you have to assume you will get enough records to hit sort's 10K default limit.


Ledion_Bitincka
Splunk Employee

If you're using 6.2, you can force Hunk to always return a required field (from which you're then extracting _time using traditional index-time processing, i.e. TIME_FORMAT/TIME_PREFIX) for all structured data types (CSV, Avro, etc.). See the following example config for how to tell Hunk to always return a field:

[vix]
....
vix.input.1.required.fields = Date
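
Applied to the virtual index from the question, that would look roughly like this (a sketch; [mydata] and the Date field come from the original config):

[mydata]
vix.provider = myprovider
vix.input.1.path = /user/hunk/data/${country}/${season}/...
vix.input.1.required.fields = Date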

Ledion_Bitincka
Splunk Employee

The root cause of the problem here, as pointed out, is that _time is a calculated field. When the search process expands the search, it notices that _time can be changed/set at search time by the calculated field, and it expands the search to (assuming earliest=1234567890):

   ... (_time>1234567890 OR sourcetype=england ) ....

This means the search no longer has a fixed earliest time (notice the OR), hence the observed problem.

In Hunk, a calculated _time is needed because, for structured data (CSV, Avro, ...), the required-field optimization eliminates 'Date' when someone doesn't reference it in the search, i.e. runs a search like "index=vix | stats count by foo". Adding a calculated field helps with mapping _time to Date; however, it breaks partition pruning because it expands searches to all time.

We are planning to introduce a fix for this in Hunk 6.2.
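
For the search in the question, the expansion looks roughly like this, where E and L stand in for whatever epoch values earliest=08/01/2013:0:0:0 and latest=06/01/2014:0:0:0 resolve to (an approximation of the expanded predicate, not the exact string the search process generates):

   index=mydata ( (_time>=E AND _time<=L) OR sourcetype=england )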


grittonc
Contributor

@shaskell, I know this is an old question but I'm seeing the same issue with Splunk Cloud. Is there a way I can see more information about this bug?
