I have a set of results from using set diff which is all good.
I am now wanting to output another field from those results and need some help.
This is the set diff query:
| set diff [search index=sit_ces "Processing user with email address" | rex "(?i)Processing user with email address= (?P<processuser>[^,]+)" | dedup processuser | table processuser] [search index=sit_ces "Activating user with the email address" | rex "(?i)Activating user with the email address (?P<activateuser>[^ ]+)" | dedup activateuser | table activateuser]
It outputs a list of email addresses which are the field processuser minus the activateuser results
Here's what I'm having trouble with:
There is a field in the first result set called companyName, I would like to table that field with the results from set diff, so I end up with
processuser companyName email company email company email company etc etc
The result of this would be a table with users that haven't finished an activation process.
stats and avoid all the
subsearch based commands; try this:
index=sit_ces "Processing user with email address" OR "Activating user with the email address" | rex "(?i)ing user with(?: the)? email address=? (?<user>[^,]+)" | stats values(*) AS * count(eval(searchmatch("Processing user with email address"))) AS countP count(eval(searchmatch("Activating user with the email address"))) AS countA BY user | search countA=0 | table user companyName
@proylea be sure to come back here, try all the answers,
UpVote any answers/comments that were helpful and then pick the best one and click
Answer to close the question and help others.
The issue was that the rex in line 2 will only find the processing line. Something more like this would be needed to make it work for both.
| rex "(?i)ing user with (the )?email address[= ]+(?<user>[^, ]+)"
You can always use a left join instead of set diff.
index=sit_ces "Processing user with email address" | rex "(?i)Processing user with email address= (?P<processuser>[^,]+)" | dedup processuser | table processuser companyName | join type=left processuser [ search index=sit_ces "Activating user with the email address" | rex "(?i)Activating user with the email address (?P<activateuser>[^ ]+)" | dedup activateuser | table activateuser | rename activateuser as processuser | eval foundflag="ignoreme"] |where isnull(foundflag)
Sure. There's LOTS of ways to join two files in splunk (join, lookup, set, append, and "stew"...dumping them into a single search as below) so choosing the one that's the easiest or most readable is often the way to go for moderate amounts of data.
If you were comparing HUGE files, then you'd run into the limits on subsearches, so you'd do it by dropping all the records into one search and then using stats to clean it all up, like so
index=sit_ces ("Processing user with email address" OR "Activating user with the email address" ) | rex "(?i)Processing user with email address= (?P<processuser>[^,]+)" | rex "(?i)Activating user with the email address (?P<activateuser>[^ ]+)" | eval user=coalesce(processuser,activateuser) | stats values(processuser) as processuser, values(activateuser) as activateuser, values(companyName) as companyName by user |where isnull(activateuser)
The key here is to make sure there is a common field in the two types of records so that you can get them connected. In this case, that was trivial, so assuming the above gets you what you want, then I'd use it in preference to the join.
Based on the changes in Activating and Processing events, I have changed the rex command. However, you should definitely use Field Extractions for the two events.
Since companyName is Key Value pair, I expect the same to be available during Search Time as Interesting Field.
I have added companyName in the stats command.
If your goal is to find the users who have Processing action but no Activation action event you can try the following:
index=sit_ces "Processing user with email address" OR "Activating user with the email address" | rex "(?i)(?<action>\w+) user with[\w|\s]+email address[\s|\=]*(?<email>[\w|\d|\.|\@]+)" | stats count as actionevents values(action) as action min(_time) as EarliestAction max(_time) as LatestAction values(companyName) as companyName by email | search actionevents>1 action="Processing" and action="Activating" | eval duration=EarliestAction=LatestAction | fieldformat EarliestAction =strftime(EarliestAction,"%c") | fieldformat LatestAction =strftime(LatestAction,"%c")
PS: I have used simple regex for the example, however, you should try Extract new fields from search results to make use of
Splunk's Interactive Field Extraction to come up with regular expression as Field Extraction knowledge object. That way above fields will be available at search time and you do not have to explicitly perform rex command (easy maintenance of code).
Thanks for responding.
But I already have the list of users that have not activated I just want to include the comanyName field in the output.
Here is a sample of the 2 types of events
17/02/2017 01:51:38,681 INFO [UserAccountController:228] Activating user with the email address firstname.lastname@example.org and user ID 00u9mbewggjBpq4Pu0h7 17/02/2017 01:40:00,629 INFO [UserProfileUtil:704] Processing user with email address= email@example.com, User Type= C, operation type= N, customerId= 10000001, companyName="Test Company 1", mobileNo=
I just want to display a list of all the email addresses with companyName of emails that appear in the first event type but do not appear in the second event type.
@proylea, I have updated my query to include companyName. Also changed rex regular expression to pull email for both the events. However, as stated earlier, please create Field Extractions for both Activating and Processing events which are not Key Value pairs like user ID in Activating event.
@proylea... Please go ahead and accept his answer so that this question gets marked as Answered. Only difference with coming up with proper field extraction would have been that you would have got a chance to avoid join and also would have saved field extraction for easier use in search query as a knowledge object. I am glad you have a fix!!!