More info on this: all the data returned from a search has CURRENT_TIME as its timestamp. Hence all the data from the entire day in Hive gets returned if I search for data from the current day alone. Any other search returns no data.
Ideally, I would like the timestamp to be extracted from my Hive field called start_time, which holds the actual timestamp of the log. But if I set it up this way, any Splunk search with a date range would have to go through every single partition in Hive, which is incredibly slow. This is because, in our case, a partition such as /.../../buildlogs/time_slot=201701070930/...... is not guaranteed to contain only logs from those 30 minutes.
Practically, what I would like is for the timestamp of each log to be taken from the folder it is in.
For example, all logs under 201701070930 would have this as their timestamp. The timestamp of the log and the folder timestamp are usually within an hour of each other, so the difference doesn't matter much.
The start_time field from hive will be processed and visible in the search results anyway.
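To make the folder-based timestamp concrete, here is a rough Python sketch of the extraction I have in mind. This is not Splunk configuration, only an illustration of the regex. The helper name folder_timestamp and the sample path are made up, and I am assuming the time_slot value is always 12 digits in YYYYMMDDHHMM form.

```python
import re
from datetime import datetime

# Assumption: the partition folder always looks like time_slot=YYYYMMDDHHMM.
TIME_SLOT_RE = re.compile(r"time_slot=(\d{12})(?:/|$)")

def folder_timestamp(path):
    """Return the partition folder's timestamp, or None if the path has no time_slot."""
    match = TIME_SLOT_RE.search(path)
    if match is None:
        return None
    # Parse the 12 captured digits as year, month, day, hour, minute.
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")

# Hypothetical path, for illustration only.
sample = "/data/buildlogs/time_slot=201701070930/part-00000"
print(folder_timestamp(sample))
```

Every log under that folder would get the same timestamp, regardless of its start_time value.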
The last answer on this page talks about it a bit:
https://answers.splunk.com/answers/235458/how-to-set-time-from-hive-field.html
But it extracts 'yourTimeField' from Hive, and I would like the timestamp to be extracted by a regex from the folder name, as explained above.
Can anyone help me with this?