Splunk Search

How to create a table with fields from two indexes based on one common field?

markwymer
Path Finder

Hi all,

I've found many answers to questions similar to mine, but not quite the same. Still, my apologies if this has been answered before.

We have live events from a web-based application being indexed into, e.g., indexA.
We also have a daily CSV file (generated from a SQL query) being ingested into indexB.

The events look something like this (they don't really, but I hope the example gets the idea across 🙂):
indexA - common_field, fieldB, fieldC, fieldD, fieldE
indexB - fieldM, fieldN, common_field, fieldO, fieldP

My objective is to produce a table that has...
common_field,fieldC,fieldE,fieldN,fieldO

I've tried playing around with subsearches, but can't seem to get all the fields that I need. I also toyed with join, but got lost on that too!

Thank you for any hints, tips, advice
Mark.


somesoni2
Revered Legend

Try something like this

index=A OR index=B | table index, common_field, fieldC, fieldE, fieldN, fieldO | stats values(*) as * by common_field | where mvcount(index)=2 | fields - index

This will take all events from both index=A and index=B, group them by common_field, and show only the rows that are present in both indexes (mvcount(index)=2).
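Spelled out with the field names from your example (assuming the real index names are indexA and indexB; adjust as needed), a multi-line version of the same idea would be:

index=indexA OR index=indexB
| fields index, common_field, fieldC, fieldE, fieldN, fieldO
| stats values(*) as * by common_field
| where mvcount(index)=2
| fields - index
| table common_field, fieldC, fieldE, fieldN, fieldO

The stats values(*) as * by common_field is what stitches the two sides together: every field seen for a given common_field value, from either index, ends up on a single result row.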

Ricapar
Communicator

How big is that CSV you load up daily? It may make more sense to use it as a lookup table. Since you're already indexing it, you can use outputlookup to create one from it:

index=index-with-csv | table common_field fieldA fieldB | outputlookup my-csv-lookup

And then use that lookup against your app logs:

index=index-with-applog | lookup my-csv-lookup common_field | table common_field fieldA fieldB ... fieldY fieldZ

This method will be a lot more efficient than a join or a transaction. Since you're ingesting the CSV daily (likely via a scheduled job somewhere?), you can schedule a search that runs the | outputlookup query above and regenerates the lookup table within Splunk shortly after the CSV gets dropped off for Splunk to index.
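For example (field names taken from the question, with my_csv_lookup.csv as a placeholder lookup file name), the scheduled search could be:

index=indexB | table common_field fieldN fieldO | outputlookup my_csv_lookup.csv

and the search against the app logs would then become:

index=indexA | lookup my_csv_lookup.csv common_field OUTPUT fieldN fieldO | table common_field fieldC fieldE fieldN fieldO

By default outputlookup overwrites the file, so each scheduled run leaves you with a snapshot of the latest CSV drop.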

Alternatively, if you happen to be using DB Connect in your environment, you can use that to run your SQL query directly against your database and generate the lookup table automatically, or even use an on-the-fly lookup. However, the above should work with your existing setup.
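If DB Connect does become an option, a rough sketch of that version (the connection name my_dw_connection and table daily_extract are placeholders for whatever your DBAs set up) would be:

| dbxquery connection="my_dw_connection" query="SELECT common_field, fieldN, fieldO FROM daily_extract" | outputlookup my_csv_lookup.csv

which skips the CSV hand-off entirely.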


markwymer
Path Finder

Thanks for the response, Ricapar. The CSV is a daily update which accumulates in Splunk, so a single common_field may have multiple entries tracking its usage.

You are quite correct: the most appropriate approach would be to use DB Connect to query the fields directly from our Data Warehouse. Unfortunately, our DBAs are quite reluctant to give us, effectively, API access to the data. It is being worked on, and so are they 😉


richgalloway
SplunkTrust

Have you tried something like this?

index=indexA | join common_field [search index=indexB] | table common_field, fieldC, fieldE, fieldN, fieldO
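One caveat: the subsearch feeding join is capped (50,000 results by default), so it's worth trimming it down to just the fields you need, for example:

index=indexA | join type=inner common_field [search index=indexB | fields common_field, fieldN, fieldO] | table common_field, fieldC, fieldE, fieldN, fieldO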
---
If this reply helps you, Karma would be appreciated.