Scenario:
I am searching email event logs. I can find some of the needed fields by a unique id (UID) and I find some fields by diffferent unique id (X-UID). Some events contain both UID and X-UID but not all the fields I need.
Here is a sample of the code:
[search index=mail sourcetype=xemail subject = "Blah" |stats count by UID| fields UID]
|stats list(subject) as subj list(sender) as sender list(recipient) as recp list(vendor_action) as status by UID
[search index=mail sourcetype=xemail sender = "sender@domain.com" |stats count by XUID| fields XUID]
|stats list(dest) as dest_ip list(sender) by XUID
Ultimately I would like results to show
subj sender recp status dest_ip
Thank you
Re your long comment on the question: That's exactly what transaction
does, even spanning multiple chained ID fields.
Here's a working example:
| stats count as raw | eval raw = "subject=foo uid=123
subject=foo2 uid=321
sender=bar2 uid=321 xuid=cba
sender=bar uid=123 xuid=abc
recipient=baz xuid=abc
recipient=baz2 xuid=cba" | makemv delim="
" raw | mvexpand raw | rename raw as _raw | extract
| eval _time = time()-(random()%1000) | sort - _time
| transaction uid xuid | table _time duration eventcount subject sender recipient uid xuid
Make sure you keep the line breaks as they are here, that's important for this dirty kind of dummy data generation from within the search bar.
First I set up six events, three events per email, each event containing only one "email-y" field.
The events for subject and sender are tied together with uid
, the events for sender and recipient are tied together with xuid
, and the event for sender ties together the uid
and the xuid
giving you a nice transitive transaction.
If you want to search for subject, sender, etc before building the transaction you can either do that manually:
The good case: You have an event with the field to filter by (say, sender)
and both ID fields.
index=mail sourcetype=xemail
[ search index=mail sourcetype=xemail sender=foo | fields UID XUID | dedup UID XUID | format "(" "(" "OR" ")" "OR" ")" ]
| transaction UID XUID
That will search for events matching your sender and use the UID and XUID field to search all potential matches beyond the "sender-event", then build the transaction from there.
The bad case, #1: You have an event with the field to filter by, but only the UID field.
index=mail sourcetype=xemail
[ search index=mail sourcetype=xemail
[ search index=mail sourcetype=xemail sender=foo | fields UID | dedup UID ]
| fields UID XUID | dedup UID XUID | format "(" "(" "OR" ")" "OR" ")" ]
| transaction UID XUID
The innermost subsearch will go in, search for your sender, and come back with a list of UIDs. Those are inserted into the outer subsearch, that will go and retrieve all events with that UID - some of those will have the missing XUID! From there it proceeds like the good case above, using UID or XUID to collect together all relevant events and run the transaction.
The bad case, #2: You have an event with the field to filter by, but only the XUID field.
This works like the bad case #1, but you need to add two Xs to the innermost subsearch... so when filtering, you need to know which of the two bad cases to run 😞
Even though the two bad cases mean you have to go through your index thrice, each run should be a fairly rare search. This will be slower if you search for a sender that sent 90% of all emails, mind.
Instead of building those subsearch-monsters manually, there is a much-forgotten search command searchtxn
to do just that for you (I think, don't have data handy to actually test).
To use that, you first have to set up a transaction type in transactiontypes.conf like this:
[xemail]
fields = UID, XUID
search = index=mail sourcetype=xemail
To confirm that this type works, run a regular non-filtering search with | transaction name=xemail
and see that it returns the same things as manually specifying the fields. Once that's done, run this with nothing else in the search bar:
| searchtxn xemail sender=foo
That should collect together all the required IDs and neatly return only matching transactions without scanning everything.
http://docs.splunk.com/Documentation/Splunk/6.3.3/SearchReference/searchtxn
One caveat about searchtxn
, it's not going to honour your time range picker.
I may regret this, but thought I'd take a stab at it too. But first things first, I saw that martin asked for sample data almost immediately, but I didn't see any except what he later came up with. Can you share some of these logs with us? Do you have time obfuscate/mask private info? Or just come up with fake data in that log format for us? When it comes to these more complex searches, you will find (are finding?) there's a lot of "try this"..."nope that didn't work" back-and-forth, because we can't really test these searches to see if they give you what you think you need. Sample data would be helpful is all.
After reading everything (I think?), is it safe to assume the following:
If that's the case, i'd be leaning toward a couple joins. Something like
index=email subject="important subject" | join uid [search index=email xuid=*] | join xuid sender dest [index=email xuid=* NOT sender=*] | format all everything nice here
Not sure if that would work for you, just another idea. And of course, if I got those assumptions wrong, then you can disregard this answer, because then I still don't understand the rules/scenario of the problem.
Thank you, I have considered join but could not get it to work properly. I will give a try too and let you know.
Re your long comment on the question: That's exactly what transaction
does, even spanning multiple chained ID fields.
Here's a working example:
| stats count as raw | eval raw = "subject=foo uid=123
subject=foo2 uid=321
sender=bar2 uid=321 xuid=cba
sender=bar uid=123 xuid=abc
recipient=baz xuid=abc
recipient=baz2 xuid=cba" | makemv delim="
" raw | mvexpand raw | rename raw as _raw | extract
| eval _time = time()-(random()%1000) | sort - _time
| transaction uid xuid | table _time duration eventcount subject sender recipient uid xuid
Make sure you keep the line breaks as they are here, that's important for this dirty kind of dummy data generation from within the search bar.
First I set up six events, three events per email, each event containing only one "email-y" field.
The events for subject and sender are tied together with uid
, the events for sender and recipient are tied together with xuid
, and the event for sender ties together the uid
and the xuid
giving you a nice transitive transaction.
If you want to search for subject, sender, etc before building the transaction you can either do that manually:
The good case: You have an event with the field to filter by (say, sender)
and both ID fields.
index=mail sourcetype=xemail
[ search index=mail sourcetype=xemail sender=foo | fields UID XUID | dedup UID XUID | format "(" "(" "OR" ")" "OR" ")" ]
| transaction UID XUID
That will search for events matching your sender and use the UID and XUID field to search all potential matches beyond the "sender-event", then build the transaction from there.
The bad case, #1: You have an event with the field to filter by, but only the UID field.
index=mail sourcetype=xemail
[ search index=mail sourcetype=xemail
[ search index=mail sourcetype=xemail sender=foo | fields UID | dedup UID ]
| fields UID XUID | dedup UID XUID | format "(" "(" "OR" ")" "OR" ")" ]
| transaction UID XUID
The innermost subsearch will go in, search for your sender, and come back with a list of UIDs. Those are inserted into the outer subsearch, that will go and retrieve all events with that UID - some of those will have the missing XUID! From there it proceeds like the good case above, using UID or XUID to collect together all relevant events and run the transaction.
The bad case, #2: You have an event with the field to filter by, but only the XUID field.
This works like the bad case #1, but you need to add two Xs to the innermost subsearch... so when filtering, you need to know which of the two bad cases to run 😞
Even though the two bad cases mean you have to go through your index thrice, each run should be a fairly rare search. This will be slower if you search for a sender that sent 90% of all emails, mind.
Instead of building those subsearch-monsters manually, there is a much-forgotten search command searchtxn
to do just that for you (I think, don't have data handy to actually test).
To use that, you first have to set up a transaction type in transactiontypes.conf like this:
[xemail]
fields = UID, XUID
search = index=mail sourcetype=xemail
To confirm that this type works, run a regular non-filtering search with | transaction name=xemail
and see that it returns the same things as manually specifying the fields. Once that's done, run this with nothing else in the search bar:
| searchtxn xemail sender=foo
That should collect together all the required IDs and neatly return only matching transactions without scanning everything.
http://docs.splunk.com/Documentation/Splunk/6.3.3/SearchReference/searchtxn
One caveat about searchtxn
, it's not going to honour your time range picker.
I have significantly extended the actual answer, please give the searchtxn
approach a shot and let us know if it works. H/T to @sideview!
Thank you Martin. I will review and apply your suggestions and definitely let you know if it works.
Thank you Martin, your first option works. The searchtxn I have not confirmed yet, but I will keep working on it. Thank you for all the help!!!
Thank you Martin for enduring my long response and providing the code. I will try it and let you know.
Thank you Martin, I see your strategy, however my objective was to search by one field value, like subject = blah, and automate the subsequent search that would produce the other needed fields. With your suggestion I would need to lookup the uid and xuid with multiple searches. I am trying to avoid the tedious/manual multiple searches.
Do you know how to search for a field value and use the result to automatically launch a subsequent search based on the returned value? That is where I am stuck.
Thank you
I see, you want to filter by either sender
, subject
, etc. without loading all the transactions. That's possible, but a bit more tricky.
The good case: You have an event with the field to filter by (say, sender)
and both ID fields.
index=mail sourcetype=xemail
[ search index=mail sourcetype=xemail sender=foo | fields UID XUID | dedup UID XUID | format "(" "(" "OR" ")" "OR" ")" ]
| transaction UID XUID
That will search for events matching your sender and use the UID and XUID field to search all potential matches beyond the "sender-event", then build the transaction from there.
The bad case, #1: You have an event with the field to filter by, but only the UID field.
index=mail sourcetype=xemail
[ search index=mail sourcetype=xemail
[ search index=mail sourcetype=xemail sender=foo | fields UID | dedup UID ]
| fields UID XUID | dedup UID XUID | format "(" "(" "OR" ")" "OR" ")" ]
| transaction UID XUID
The innermost subsearch will go in, search for your sender, and come back with a list of UIDs. Those are inserted into the outer subsearch, that will go and retrieve all events with that UID - some of those will have the missing XUID! From there it proceeds like the good case above, using UID or XUID to collect together all relevant events and run the transaction.
The bad case, #2: You have an event with the field to filter by, but only the XUID field.
This works like the bad case #1, but you need to add two Xs to the innermost subsearch... so when filtering, you need to know which of the two bad cases to run 😞
Even though the two bad cases mean you have to go through your index thrice, each run should be a fairly rare search. This will be slower if you search for a sender that sent 90% of all emails, mind.
All I think you need to do is to add sender to the list of fields for transaction:
| transaction sender uid xuid | table _time duration eventcount subject sender recipient uid xuid
Or perhaps, assume that you have used a form to collect the value of the subject field and have stored in a token named $subject$
Your search could be
index=mail sourcetype=xemail UID=* OR XUID=* sender=* subject="$subject$*"
| transaction sender uid xuid
| table _time duration eventcount subject sender recipient uid xuid
You could use any field, not just subject. However, the trick to this search is that first you retrieve only the events that have the field of interest. Then use transaction to group them based on the sender and the uids. Note that sender
is actually the only field that you named as being part of both events with UID and events with XUID. So sender
needs to be part of the transaction command.
So as you may have gathered, I am new to splunk...
Not sure what you mean by "using a form" is that like creating a lookup file? Say using the search to store results in a lookup and then use the lookup to find the other field values?
From a high level, is there a way to search and use the results to kick off another search with those results... say eval and searchmatch....? Basically chaining a search... other than transaction, b/c that is too slow...
Thank you
Use a short time range for testing.
Thank you for the advice, using the short time range, however I was aware of that. The problem is that "transaction " is just too expensive.
But perhaps I should rephrase my question for you again.
So I am working with email logs. The logs are such that subject, sender, recipient, attachment, etc are all in separate events that share a unique ID (uid). If I use:
[search index=mail sourcetype=xemail subject = "Blah" |stats count by UID| fields UID]
|stats list(subject) as subj list(sender) as sender list(recipient) as recp list(vendor_action) as status by UID
I will get some of the fields that I want (e.g. subject, sender, recipient, status, etc.).
However there are other fields I want that are not associated with UID but rather XUID. If I use:
[search index=mail sourcetype=xemail sender = "Blah" |stats count by XUID| fields XUID]
|stats list(dest) as dest_ip list(recipient) as recp by XUID
I will get some other fields I want (e.g. dest_ip).
There are events that contain both UID and XUID, but I don't know how to create a secondary search to pull up all the events containing the XUID found by the primary search of the UID.
Bottom line I am looking for a way to search a subject and get all the fields associated with the UID, which includes the XUID. Then use the XUID results to find all fields associated with the XUID number to combine all fields by UID and XUID for that email session.
If I am going about this the hard way and you have a better solution please let me know.
Thank you
For multiple-chained-keys transactions the transaction
command should be ideal. What happens if you run this?
index=mail sourcetype=xemail | transaction UID XUID | table _time duration subject sender recipient vendor_action dest
Thank you, I will give it a try, however there are a large number of events so I will have to [subsearch] first and then use transaction otherwise it will take too long.
index=mail sourcetype=xemail UID=* OR XUID=* sender=*
| stats list(recipient) as recipient list(subject) as subject list(vendor_action) as status
list(dest_ip) as dest_ip list(UID) as UID list(XUID) as XUID by sender
I would avoid the use of the transaction
command if there is a large number of events.
This search is very fast and simple. Use "values" instead of "list" in the stats command if you want remove duplicates from the results. Add "_time" at the end of the stats command if you want to show the time that each event occurred.
Thank you I will give it a try.
How about using the transaction command on sender?
yoursearch | transaction mvlist=1 sender | table _time subject sender UID X_UID
You may want to through a fillnull command in there. You could also add another case/coalesce based field if you need to conditionally include based on UID X_UID.
So, if they share the sender
field, how do you distinguish two conversations by the same sender?