Splunk Search

How to find matching data from two indexes

BigJohnQ
New Member

Hi all, thanks in advance for your time!

I'm having trouble writing a working query for the following case:

I need to take data from index=email1 and find the matching data in index=email2. My approach: take the src_user and recipient fields from index=email1 and use them in a search against the email2 index.

Query examples that I used:

index=email1 sourcetype=my_sourcetype source_user=*
[ search index=email2 sourcetype=my_sourcetype source_user=* | fields source_user ]



OR

index=email1 sourcetype=my_sourcetype
| join src_user, recipient [search index=email2 *filters*]



Everything looked OK in a control sample: in a 10-minute window (e.g. 06:00-06:10) I found events that, at first glance, matched. But when I extended the search range, e.g. to 24 hours, the search returned no events at all, not even the ones that had matched in the short window (although they fall within those 24 hours).

Thank you for any ideas or solutions for this case.


PickleRick
SplunkTrust

You already have some suggestions, and they are OK, but the question is what your constraints are for this search. How many events do you expect from each of those data sets, and how long is the search allowed to take? The answers can warrant a different approach to the problem.

For example, since you're dealing with email data, it's fair to ask why you aren't using the CIM data model (and having it accelerated).
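
As a rough sketch of what that could look like: assuming both indexes are mapped to the CIM Email data model (dataset All_Email, with its src_user and recipient fields) and the model is accelerated, a tstats search can find sender/recipient pairs that appear in both indexes. The index names come from the question; whether the data is actually CIM-compliant is an assumption.

```spl
| tstats count FROM datamodel=Email.All_Email WHERE (index=email1 OR index=email2) BY index All_Email.src_user All_Email.recipient
| rename All_Email.* AS *
| stats dc(index) AS index_count sum(count) AS events BY src_user recipient
| where index_count > 1
```

Adding summariesonly=true to tstats would restrict the search to accelerated summaries, at the cost of skipping data that has not been summarized yet.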


gcusello
SplunkTrust

Hi @BigJohnQ ,

Your first solution, or the one from @ITWhisperer, is the most efficient if the subsearch returns fewer than 50,000 results.

If the subsearch could return more than 50,000 results, you should try another solution:

index IN (email1,email2) sourcetype=my_sourcetype source_user=*
| stats dc(index) AS index_count values(*) AS * BY source_user
| where index_count>1

You can replace values(*) AS * with the list of fields you need in the results.
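
For instance, a variant of the search above that matches on the sender/recipient pair and keeps only named fields might look like the sketch below. It assumes the fields are called src_user and recipient, as in the question's description; subject is a hypothetical field used only for illustration.

```spl
index IN (email1, email2) sourcetype=my_sourcetype src_user=* recipient=*
| stats dc(index) AS index_count values(index) AS indexes values(subject) AS subject BY src_user, recipient
| where index_count > 1
```

Because this is a single search with no subsearch, it is not subject to the subsearch result limits.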

Avoid your second solution (the one using join) because it's very slow!

Ciao.

Giuseppe


PickleRick
SplunkTrust

That's 10k results, not 50k. The 50k results limit applies to the join command. A "normal" subsearch has a default limit of 10k results.

(Yes, all those limits can be confusing and are easy to mix up.)
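
One way to check whether a subsearch would run into that limit over the longer window is to run it on its own and count what it returns. A minimal sketch, reusing the index, sourcetype and field names from the question and assuming a 24-hour range:

```spl
index=email2 sourcetype=my_sourcetype source_user=* earliest=-24h
| stats count AS events dc(source_user) AS distinct_source_users
```

If events is above 10,000, a subsearch that ends in just | fields source_user gets truncated. If distinct_source_users is still below the limit, adding | dedup source_user inside the subsearch may be enough to stay under it.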


ITWhisperer
SplunkTrust

Try something like this

index=email2 sourcetype=my_sourcetype source_user=*
    [ search index=email1 sourcetype=my_sourcetype source_user=*
    | eval recipient = source_user
    | fields recipient
    | dedup recipient
    | format ]
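
To illustrate what the subsearch does here: format turns its results into a search expression that is substituted into the outer search. With two made-up sender addresses in email1 (the values are hypothetical), the outer search would effectively become something like:

```spl
index=email2 sourcetype=my_sourcetype source_user=* ( ( recipient="alice@example.com" ) OR ( recipient="bob@example.com" ) )
```

So this returns email2 events whose recipient was seen as a source_user in email1.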