Hello,
I have a csv file that have some summary stats from an index, but the requirement is to show an sample event with all the info in that index.
The CSV file have a hash number (from a account number), some calculated status. For example:
PAN | total_trans | total_amount | HASHPAN |
1234******5678 | 15 | 15000 | ABC123 |
The index have all the transaction and detail. We need to take an sample in the index from the csv and output to a new csv that have sample detail. Something like this:
PAN | total_trans | total_amount | HASHPAN | _time | TRACE | TRANSACTIONID |
1234******5678 | 15 | 15000 | ABC123 | xxxxxxx | xxxxxx | xxxxxxxxx |
I don't really like to use join, because the index have a lot of events (around 32 mils events). Are there any elegants way to get the data?
Hello @phamxuantung
If I understand well, your requirement is to join data from index and csv lookup without using Join command. If yes then try the below query
index=test_index
| lookup transactions.csv common_field OUTPUT total_amount total_trans
In the output you can specify all the required fields from the lookup.
Thanks,
Manasa
Hi @phamxuantung,
if you have the csv in a lookup, you can use the lookup command, something like this:
index=your_index
| lookup your_lookup.csv PAN
| table PAN total_trans total_amount HASHPAN _time TRACE TRANSACTIONID
if instead the csv is in another index, in general you have to find a unique Correlation Key (e.g.: PAN or HASHPAN) and use it to correlate data using the stats command, something like this:
index=index1 OR index=index2
| stats
values(total_trans) AS total_trans
values(total_amount) AS total_amount
values(HASHPAN) AS HASHPAN
earliest(_time) AS _time
values(TRACE) AS TRACE
values(TRANSACTIONID) AS TRANSACTIONID
BY PAN
Ciao.
Giuseppe
Accroding to my understanding, if you specify the index first and |lookup the csv, the addition information will populate on all the events in the index (in my case, all the events that have the same HASHPAN), is it not?
But I want the opposite. The csv hold, for example, 50 uniques HASHPAN, and I want to take sample events of that HASHPAN in the index (first 50 or last 50 events that have that HASHPAN, base on which is faster).
My code for my intergration is
|inputlookup mycsv.csv
|join HASHPAN
[search index=myindex
|dedup HASHPAN
]
|table <detail fields that I need>
looking at this, I think you'll understand what I mean, but I want a more "efficient" waty to doing this.
Hi @phamxuantung,
in other words, you want to filter (find) events in the index that match the values of the csv, if this is your need, please try this:
index=myindex [ | inputlookup mycsv.csv | fields HASHPAN ]
| lookup your_lookup.csv HASHPAN
| table PAN total_trans total_amount HASHPAN _time TRACE TRANSACTIONID
otherwise, if you're sure that a field of the lookup (e.g. PAN) has always values, you could also use a simpler solution:
index=your_index
| lookup your_lookup.csv HASHPAN OUTPUTNEW PAN total_trans total_amount
| where PAN=*
| table PAN total_trans total_amount HASHPAN _time TRACE TRANSACTIONID
Ciao.
Giuseppe