Splunk Search

Dedup vs. Lookup performance

responsys_cm
Builder

I have some very high volume firewall records. I want to check the destination IP address against a lookup table that contains known malware C&C IPs.

Is it more efficient to dedup the records and then do the lookup or is it faster to do a lookup on each one?

Thx.

Craig

1 Solution

dwaddle
SplunkTrust

Part of it will depend on the size of the lookup table. Neither choice is free. Dedup'ing can be expensive for a large number of events, but comparing against a large lookup table might also be expensive.

I would recommend that you model out both scenarios and use the search job inspector to compare and contrast the amount of time spent in each.
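
One way to set up that comparison is sketched below. It assumes a lookup named cc_ipslist with an ip column and firewall events that carry the destination address in destip; cc_match is a made-up output field name. Adjust all of these to your data. First, the lookup on every event:

```spl
sourcetype=myfirewall
| lookup cc_ipslist ip AS destip OUTPUT ip AS cc_match
| where isnotnull(cc_match)
```

Then the same thing with the dedup in front:

```spl
sourcetype=myfirewall
| dedup destip
| lookup cc_ipslist ip AS destip OUTPUT ip AS cc_match
| where isnotnull(cc_match)
```

Run both over the same time range and compare the command.dedup and command.lookup entries under execution costs in the job inspector.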

Also, you may want to consider summary indexing as a 3rd alternative. A summary that gets updated every few minutes on something as simple as | sistats count by destip could give you a workably fast solution.
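
A rough sketch of the summary approach, assuming a scheduled search saved under the hypothetical name fw_destip_summary, with summary indexing enabled and writing to the default summary index. The scheduled search would be something like:

```spl
sourcetype=myfirewall
| sistats count by destip
```

The reporting search then reads the summary instead of the raw events. The lookup name cc_ipslist, its ip column, and the cc_match output field are assumptions here:

```spl
index=summary search_name="fw_destip_summary"
| stats count by destip
| lookup cc_ipslist ip AS destip OUTPUT ip AS cc_match
| where isnotnull(cc_match)
```

This way the lookup only runs once per distinct destip in the summary, not once per firewall event.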

ziegfried
Influencer

Additionally, if the lookup table is small enough, you could use inputlookup in a subsearch to query only addresses from the lookup list:

sourcetype=myfirewall [ | inputlookup cc_ipslist | return 10000 ip ]

will expand to something like

sourcetype=myfirewall ( ip=1.1.1.1 OR ip=2.2.2.2 OR ip=3.3.3.3 ....)
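
The field name in the expanded search comes from the lookup column, so this only matches if the firewall events also call the field ip. If the event field is named differently, say destip, return can rename it on the way out. A sketch, assuming those field names:

```spl
sourcetype=myfirewall [ | inputlookup cc_ipslist | return 10000 destip=ip ]
```

The explicit count matters because return only hands back the first row by default. Subsearch output is also capped (10000 results by default, as far as I know), so this approach only suits lists below that size.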
