
How to do a time-based lookup using Python?

Jeremiah
Motivator

Is anyone out there doing time-based lookups with an external Python script? How do you handle the time portion of the lookup configuration? Same as you would for a CSV lookup?

1 Solution

araitz
Splunk Employee

Per Steve Zhang, the Director of Search, there are two ways to do a time-based external lookup.

  • Have Splunk’s lookup mechanism handle the temporal aspect

    • In this case, the external lookup returns all relevant matches over all time, and Splunk constrains the matches based on the time_field, time_format, etc. specified in transforms.conf. This is analogous to how time-based CSV lookups work in Splunk.

    • Splunk's time-based configuration performs the comparison on the time values and returns only the relevant results to the search.

  • Let the external script handle the temporal aspect implicitly by adding _time as a field to match on

    • In this case, the external script needs to know that the value of _time is not an exact match against the lookup's “time” column, but rather “closest but no later than”.
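For the second approach, the “closest but no later than” match could be sketched roughly as below. This is only an illustration, assuming the script holds the rows for each key sorted by time; the HISTORY data, the key and value names, and the lookup_at function are all hypothetical and not part of any Splunk API.

```python
import bisect

# Hypothetical history: for each key, (epoch_time, value) pairs sorted by time.
HISTORY = {
    "web01": [(1000, "alice"), (2000, "bob"), (3000, "carol")],
}

def lookup_at(key, event_time):
    """Return the value whose time is closest to, but no later than, event_time."""
    rows = HISTORY.get(key, [])
    times = [t for t, _ in rows]
    # bisect_right returns the index of the first row strictly later than event_time.
    idx = bisect.bisect_right(times, event_time)
    if idx == 0:
        return None  # no row at or before event_time
    return rows[idx - 1][1]
```

Here event_time stands for the _time value that the script receives for each event. An event that falls before the first row for its key gets no match.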


Which approach to implement depends on how often the matching field changes – in other words, over all time, how many different rows are there that contain a given value to be matched upon?

If the number of rows/changes is small, then either option above should be fine. If the number is large, then letting the script handle time matching is likely better.

In either case, it would be wise to implement caching to reduce calls to the DB.
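One simple way to cache inside the script is to memoize the per-key query, for example with functools.lru_cache. The sketch below assumes the DB is queried once per key; query_db, its fake table, and the STATS counter are hypothetical stand-ins for the real query, and the cache only lives as long as the script process does.

```python
from functools import lru_cache

STATS = {"db_calls": 0}  # counts calls to the stand-in backend, for illustration only

def query_db(key):
    """Hypothetical stand-in for the real DB query; returns sorted (time, value) rows."""
    STATS["db_calls"] += 1
    fake_table = {"web01": ((1000, "alice"), (2000, "bob"))}
    return fake_table.get(key, ())

@lru_cache(maxsize=1024)
def history_for(key):
    # Cached per key, so repeated events for the same key cost one DB call.
    return query_db(key)
```

With this in place, many events that share the same key trigger a single DB call, and the time matching can then run against the cached rows.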

