New to splunk - trying to use lookup to find splun...

stephenreece · ‎12-19-2018

Thanks in advance all.

I have a lookup file called ga.csv. This has a single column of Google Client IDs under a column title of clientId.

In splunk, any interaction with google is recorded in an event under the element detail.gaRef (for example stats count by detail.gaRef will list all refs and event counts).

In the same audit Source/Type we also record an X-Session-ID.

I am looking to get the splunk X-Session-ID for all of the users in this lookup file but i can't figure out how to do it.

Where I am at:
- I have uploaded the csv successfully and can see the contents using | inputlookup ga.csv
- I have tested the search manually by entering multiple Client IDs with OR statements between the id's and that works fine as it shows any sessionID where the client id also appears...i.e.

MySearchInAuditSourcetype AND (clientId1 OR clientId2 OR clientId3)
| table request.tags.X-Session-ID
| outputlookup CIDSessionID.csv

I have 250k client IDs so i need a csv solution to this search.... any ideas would be much appreciated.
My current search

MySearchInAuditSourcetype
[| inputlookup "ga.csv" 
 | table clientId
 | rename request.tags.X-Session-ID as search ]
| dedup request.tags.X-Session-ID 
| table request.tags.X-Session-ID 
| outputlookup p800SessionID.csv

any ideas?

DalJeanis · ‎12-31-2018

So, first thing to notice

 [| inputlookup "ga.csv" | table clientId | rename request.tags.X-Session-ID as search]

That rename is doing nothing, since that field does not exist after the table command.

Now, depending on the frequency of various things, there are lots of different potential solutions. We'll discuss those one by one...

METHOD ONE

Use the clientIds from the csv in a subsearch to select only those records from the larger mass.

Pseudocode for this option:

 index=foo sourcetype=bar other words for the events you want 
 [| inputcsv mycsv | table JustTheFieldIWant | rename justTheFieldIWant as WhatItsCalledInTheEvents]

Discussion - this works great for csvs with a few values. Not recommended for large csv (over 1k records).

The subsearch resolves to a string that looks like

( ( WhatItsCalledInTheEvents="firstvalue" ) OR ( WhatItsCalledInTheEvents-"secondvalue" ) OR ... )

If you'd like to see exactly what yours results in, then use the explicit format command like this...

  | inputcsv mycsv 
  | head 10 
  | table JustTheFieldIWant 
  | rename justTheFieldIWant as WhatItsCalledInTheEvents 
  | format

You can see that the search string can get VERY long, and that all those tests are O(N^2) so they'd get inefficient quickly.

As such, this method is completely out of bounds for your use case.

METHOD TWO

Set up the csv as a lookup, use it as a lookup, and throw away any events where the lookup had not result.

 index=foo sourcetype=bar other words for the events you want 
| fields ... just the fields that you want...
| lookup mycsv  justTheFieldIWant as WhatItsCalledInTheEvents OUTPUT justTheFieldIWant as foundme
| where isnotnull(foundme)

Discussion: Highly recommended for medium-large csvs. This is pretty easy and useful, if you have admin access to set up lookup tables. If not, then you have to use the next one.

METHOD THREE

Search the records, append the csv data, use eventstats to roll a marker from the csv to the events, then throw away the csv records and any events that don't have a csv value.

 index=foo sourcetype=bar other words for the events you want 
| fields ... just the fields that you want...
| append [ 
    | inputcsv mycsv  
    | table justTheFieldIWant  
    | rename justTheFieldIWant as WhatItsCalledInTheEvents 
    | eval killme="killme", foundme="foundme"
    ]
| eventstats max(foundme) as foundme by WhatItsCalledInTheEvents
| where isnotnull(foundme) and isnull(killme)

Discussion: Slightly less efficient than an actual lookup, but overall workable.

Now, depending on your actual data, and the density of the lookup compared to your actual data, it may be more efficient to do calculations and aggregations before, or after the lookup.

My rule of thumb is, if it's not intellectually obvious that there's a 2-1 or 3-1 advantage of using one method over the other, then try both methods and see what the results are.

When theory and the results conflict, then always believe the results.

woodcock · ‎12-31-2018

You need to spend as much time reformatting the question as answering them. By the time I get done reformatting, you have already answered many of them!!!!

DalJeanis · ‎12-31-2018

@woodcock - sometimes I do. This time, not so much.

p_gurav · ‎12-26-2018

Hi,

Is clientId is common in your lookup csv and logs? If yes, try to use lookup command, and then get list of session id.
For eg.

[search auditsource/type] | lookup clientId OUTPUT request.tags.X-Session-ID as sessionId |  stats values(sessionId) as sessionId by clientId

If its not common rename the user field in your logs as clientId before lookup command, rest of the search is same.

New to splunk - trying to use lookup to find splunk events and return X-session-ID

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics!

New in Observability Cloud - Explicit Bucket Histograms