Hi,
I have a KV time-based lookup generated from DHCP logs with content like this:
time,ip,hostname,mac
1709093697,10.223.5.43,host-43,aa:bb:cc:dd:ee:ff
and transforms.conf for it:
[dhcp_timebased_lookup]
collection = dhcp_timebased_collection
external_type = kvstore
fields_list = _key,time,ip,hostname,mac
max_offset_secs = 691200
min_offset_secs = 0
time_field = time
time_format = %s
Lookup works well when I run search which pulls events from index:
index=test source=timebased
| lookup dhcp_timebased_lookup ip AS dest_ip OUTPUT hostname
| table _time dest_ip hostname
Hostname is there:
_time | dest_ip | hostname |
1709093697 | 10.223.5.43 | host-43 |
But when I use this lookup after non-event-generating commands it does not work:
index=test source=timebased
| table _time dest_ip
| lookup dhcp_timebased_lookup ip AS dest_ip OUTPUT hostname
OR
index=test source=timebased
| stats count BY _time dest_ip
| lookup dhcp_timebased_lookup ip AS dest_ip OUTPUT hostname
OR
| makeresults
| eval dest_ip = "10.223.5.43", _time = 1709093697
| lookup dhcp_timebased_lookup ip AS dest_ip OUTPUT hostname
OR
| tstats from datamodel=SomeDM count BY _time SomeDM.dest_ip span=1s
| lookup dhcp_timebased_lookup ip AS "SomeDM.dest_ip" OUTPUT hostname
Hostname doesn't show up.
If I turn time-based setting for this lookup off it outputs hostnames for all searches above.
It makes me think there is some difference between _time field in events' metadata and _time field in statistics. Is it so? And is there solution besides "join with inputlookup and addinfo" workaround?
Hi @YuriSpirin,
When we define our lookup with time_format = %s, the time field in our collection should have type number:
# collections.conf
[dhcp_timebased_lookup]
enforceTypes = true
field.time = number
field.ip = string
field.hostname = string
field.mac = string
# transforms.conf
[dhcp_timebased_lookup]
collection = dhcp_timebased_lookup
external_type = kvstore
fields_list = time,ip,hostname,mac
max_offset_secs = 691200
min_offset_secs = 0
time_field = time
time_format = %s
We can populate the lookup with test data:
| makeresults format=csv data="time,ip,hostname,mac
1709251200,1.2.3.4,host-43,aa:bb:cc:dd:ee:ff
1709164800,1.2.3.4,host-42,aa:bb:cc:dd:ee:fe
1709078400,1.2.3.4,host-41,aa:bb:cc:dd:ee:fd"
| outputlookup dhcp_timebased_lookup
and validate it with an additional test:
| makeresults format=csv data="_time,dest_ip
1709208000,1.2.3.4"
| lookup dhcp_timebased_lookup ip as dest_ip output hostname
_time dest_ip hostname
2024-02-29 07:00:00 1.2.3.4 host-42
We can also experiment with accelerated fields to improve performance, although we may not see the performance returns we expect:
[dhcp_timebased_lookup]
enforceTypes = true
field.time = number
field.ip = string
field.hostname = string
field.mac = string
accelerated_fields.ip = {"time": -1, "ip": 1}
Compare with a similar file-based lookup with a size less than or equal to the configured max_memtable_bytes limit:
# limits.conf
[lookup]
# default 25 MB; increase max_memtable_bytes to a value greater than our
# largest lookup file, assuming we have adequate phyiscal memory available
max_memtable_bytes = 26214400
I cannot seem to reproduce your results. I put your sample CSV into my laptop. The setting is
[dhcp_timebased_lookup]
batch_index_query = 0
case_sensitive_match = 1
filename = dhcp_timebased_lookup.csv
max_offset_secs = 691200
time_field = time
Then, I run this search
| makeresults
| eval dest_ip = "10.223.5.43"
| stats values(dest_ip) as dest_ip by _time
| lookup dhcp_timebased_lookup ip AS dest_ip output time hostname
| eval lag = _time - time
| eval time = strftime(time, "%F %T")
It gives
_time | dest_ip | hostname | lag | time |
2024-02-28 14:16:08 | 10.223.5.43 | host-43 | 64871 | 2024-02-27 20:14:57 |
I had a suspicion about that time_format setting. (Splunk defaults to using second.) But even after I add this, it still returns result.
Could you try this on a spare instance?
Hi, @yuanliu!
Thanks for your reply and clue.
I have no spare instance right now but I've exported contents of this KV Store time-based lookup into CSV file, reconfigured lookup definition to use that CSV - and now it works. Looks like the problem is related only to KV Store time-based lookups.
So this particular problem is solved but I'd like to know if such behavior is expected with KV Store lookups or it is some bug or my misconfiguration. My deployment is Splunk Enterprise 7.2.6 with MongoDB.
Glad you can work around this issue! Anyway, I run a spare instance on my laptop always for simple matters that can be emulated. You may consider the same. (You can also observe how versions may affect these.)
I wish I learned how to set up KV store on my laptop so I could help more. But KV Store vs. CSV files has no indication that time-based lookup should function differently. So you would have a support case except 7 might be out of support.