Splunk Search

How to take sample events from an index based on a CSV file?

phamxuantung
Communicator

Hello,

I have a CSV file that has some summary stats from an index, but the requirement is to show a sample event with all the info from that index.

The CSV file has a hash (from an account number) and some calculated stats. For example:

PAN total_trans total_amount HASHPAN
1234******5678 15 15000 ABC123

 

The index has all the transactions and details. We need to take a sample from the index for each row in the CSV and output a new CSV with the sample details. Something like this:

PAN total_trans total_amount HASHPAN _time TRACE TRANSACTIONID
1234******5678 15 15000 ABC123 xxxxxxx xxxxxx xxxxxxxxx

 

I don't really want to use join, because the index has a lot of events (around 32 million). Is there a more elegant way to get the data?


Manasa_401
Communicator

Hello @phamxuantung 

If I understand correctly, your requirement is to combine data from an index and a CSV lookup without using the join command. If so, try the query below:

index=test_index
| lookup transactions.csv common_field OUTPUT total_amount total_trans

In the OUTPUT clause you can specify all the required fields from the lookup.

Thanks,
Manasa


gcusello
SplunkTrust

Hi @phamxuantung,

if you have the csv in a lookup, you can use the lookup command, something like this:

index=your_index
| lookup your_lookup.csv PAN
| table PAN total_trans total_amount HASHPAN _time TRACE TRANSACTIONID

if instead the CSV data is in another index, in general you have to find a unique correlation key (e.g. PAN or HASHPAN) and use it to correlate the data with the stats command, something like this:

index=index1 OR index=index2
| stats 
   values(total_trans) AS total_trans
   values(total_amount) AS total_amount
   values(HASHPAN) AS HASHPAN
   earliest(_time) AS _time
   values(TRACE) AS TRACE
   values(TRANSACTIONID) AS TRANSACTIONID
   BY PAN

Ciao.

Giuseppe


phamxuantung
Communicator

According to my understanding, if you specify the index first and | lookup the CSV, the additional information will populate on all the events in the index (in my case, all the events that have the same HASHPAN), won't it?

But I want the opposite. The CSV holds, for example, 50 unique HASHPAN values, and I want to take sample events for those HASHPANs from the index (the first 50 or last 50 events that have those HASHPANs, based on whichever is faster).

My current code for this is:

| inputlookup mycsv.csv
| join HASHPAN
    [ search index=myindex
    | dedup HASHPAN ]
| table <detail fields that I need>

Looking at this, I think you'll understand what I mean, but I want a more "efficient" way of doing this.


gcusello
SplunkTrust

Hi @phamxuantung,

in other words, you want to filter (find) events in the index that match the values in the CSV. If this is your need, please try this:

index=myindex [ | inputlookup mycsv.csv | fields HASHPAN ]
| lookup your_lookup.csv HASHPAN 
| table PAN total_trans total_amount HASHPAN _time TRACE TRANSACTIONID
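
If you only need one sample event per HASHPAN, you could also add the dedup from your original search after the subsearch filter, something like this (a sketch, assuming HASHPAN is an extracted field on the indexed events):

index=myindex [ | inputlookup mycsv.csv | fields HASHPAN ]
| dedup HASHPAN
| lookup mycsv.csv HASHPAN OUTPUT PAN total_trans total_amount
| table PAN total_trans total_amount HASHPAN _time TRACE TRANSACTIONID

The subsearch turns the 50 HASHPAN values into index-time filters, so only matching events are retrieved, and dedup keeps the most recent event for each HASHPAN; this avoids join entirely.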

otherwise, if you're sure that a field of the lookup (e.g. PAN) always has a value, you could also use a simpler solution:

index=your_index
| lookup your_lookup.csv HASHPAN OUTPUTNEW PAN total_trans total_amount
| where isnotnull(PAN)
| table PAN total_trans total_amount HASHPAN _time TRACE TRANSACTIONID

Ciao.

Giuseppe
