Thanks in advance all.
I have a lookup file called ga.csv
. This has a single column of Google Client ID
s under a column title of clientId
.
In splunk, any interaction with google is recorded in an event under the element detail.gaRef
(for example stats count by detail.gaRef
will list all refs
and event count
s).
In the same audit
Source/Type we also record an X-Session-ID
.
I am looking to get the splunk X-Session-ID
for all of the users
in this lookup file but i can't figure out how to do it.
Where I am at:
- I have uploaded the csv successfully and can see the contents using | inputlookup ga.csv
- I have tested the search manually by entering multiple Client ID
s with OR
statements between the id's and that works fine as it shows any sessionID
where the client id
also appears...i.e.
MySearchInAuditSourcetype AND (clientId1 OR clientId2 OR clientId3)
| table request.tags.X-Session-ID
| outputlookup CIDSessionID.csv
I have 250k client ID
s so i need a csv solution to this search.... any ideas would be much appreciated.
My current search
MySearchInAuditSourcetype
[| inputlookup "ga.csv"
| table clientId
| rename request.tags.X-Session-ID as search ]
| dedup request.tags.X-Session-ID
| table request.tags.X-Session-ID
| outputlookup p800SessionID.csv
any ideas?
So, first thing to notice
[| inputlookup "ga.csv" | table clientId | rename request.tags.X-Session-ID as search]
That rename
is doing nothing, since that field does not exist after the table
command.
Now, depending on the frequency of various things, there are lots of different potential solutions. We'll discuss those one by one...
METHOD ONE
Use the clientIds from the csv in a subsearch to select only those records from the larger mass.
Pseudocode for this option:
index=foo sourcetype=bar other words for the events you want
[| inputcsv mycsv | table JustTheFieldIWant | rename justTheFieldIWant as WhatItsCalledInTheEvents]
Discussion - this works great for csvs with a few values. Not recommended for large csv (over 1k records).
The subsearch resolves to a string that looks like
( ( WhatItsCalledInTheEvents="firstvalue" ) OR ( WhatItsCalledInTheEvents-"secondvalue" ) OR ... )
If you'd like to see exactly what yours results in, then use the explicit format command like this...
| inputcsv mycsv
| head 10
| table JustTheFieldIWant
| rename justTheFieldIWant as WhatItsCalledInTheEvents
| format
You can see that the search string can get VERY long, and that all those tests are O(N^2) so they'd get inefficient quickly.
As such, this method is completely out of bounds for your use case.
METHOD TWO
Set up the csv as a lookup, use it as a lookup, and throw away any events where the lookup had not result.
index=foo sourcetype=bar other words for the events you want
| fields ... just the fields that you want...
| lookup mycsv justTheFieldIWant as WhatItsCalledInTheEvents OUTPUT justTheFieldIWant as foundme
| where isnotnull(foundme)
Discussion: Highly recommended for medium-large csvs. This is pretty easy and useful, if you have admin access to set up lookup tables. If not, then you have to use the next one.
METHOD THREE
Search the records, append the csv data, use eventstats to roll a marker from the csv to the events, then throw away the csv records and any events that don't have a csv value.
index=foo sourcetype=bar other words for the events you want
| fields ... just the fields that you want...
| append [
| inputcsv mycsv
| table justTheFieldIWant
| rename justTheFieldIWant as WhatItsCalledInTheEvents
| eval killme="killme", foundme="foundme"
]
| eventstats max(foundme) as foundme by WhatItsCalledInTheEvents
| where isnotnull(foundme) and isnull(killme)
Discussion: Slightly less efficient than an actual lookup, but overall workable.
Now, depending on your actual data, and the density of the lookup compared to your actual data, it may be more efficient to do calculations and aggregations before, or after the lookup.
My rule of thumb is, if it's not intellectually obvious that there's a 2-1 or 3-1 advantage of using one method over the other, then try both methods and see what the results are.
When theory and the results conflict, then always believe the results.
You need to spend as much time reformatting the question as answering them. By the time I get done reformatting, you have already answered many of them!!!!
@woodcock - sometimes I do. This time, not so much.
Hi,
Is clientId is common in your lookup csv and logs? If yes, try to use lookup command, and then get list of session id.
For eg.
[search auditsource/type] | lookup clientId OUTPUT request.tags.X-Session-ID as sessionId | stats values(sessionId) as sessionId by clientId
If its not common rename the user field in your logs as clientId before lookup command, rest of the search is same.