Hi Splunkers
I have a set of results from using set diff which is all good.
I am now wanting to output another field from those results and need some help.
This is the set diff query:
| set diff [search index=sit_ces "Processing user with email address" | rex "(?i)Processing user with email address= (?P<processuser>[^,]+)" | dedup processuser | table processuser] [search index=sit_ces "Activating user with the email address" | rex "(?i)Activating user with the email address (?P<activateuser>[^ ]+)" | dedup activateuser | table activateuser]
It outputs a list of email addresses which are the field processuser minus the activateuser results
Here's what I'm having trouble with:
There is a field in the first result set called companyName, I would like to table that field with the results from set diff, so I end up with
processuser companyName
email company
email company
email company
etc etc
The result of this would be a table with users that haven't finished an activation process.
Thanks everyone for all the responses you guys are great, the Splunk community is such a fantastic resource.
I fixed my answer thanks to @DalJeanis. You should upvote any working answers and click Accept
on the best one.
Stick to stats
and avoid all the subsearch
based commands; try this:
index=sit_ces "Processing user with email address" OR "Activating user with the email address"
| rex "(?i)ing user with(?: the)? email address=? (?<user>[^,]+)"
| stats values(*) AS * count(eval(searchmatch("Processing user with email address"))) AS countP count(eval(searchmatch("Activating user with the email address"))) AS countA BY user
| search countA=0
| table user companyName
Thanks Woody, unfortunately the search still returns all 28 emails with companyName, there are only 25 that have not been activated.
DalJeanis has provided the answer
Try it again; it had a bug and I fixed the original answer.
The search still returns all 28 email addresses.
There are 3 that appear in both events types so the number of rows returned should be 25
I had another mistake. I updated it to fix that. Try one more time.
@proylea be sure to come back here, try all the answers, UpVote
any answers/comments that were helpful and then pick the best one and click Answer
to close the question and help others.
The issue was that the rex in line 2 will only find the processing line. Something more like this would be needed to make it work for both.
| rex "(?i)ing user with (the )?email address[= ]+(?<user>[^, ]+)"
ACK! I did not notice the the
! You are correct. I have updated my answer and then it should work AND not have any transaction
nor subsearch
limits. What a silly log variation.
the = is also different between the two.
ARGH! Updated again. Hopefully it works now!
You can always use a left join instead of set diff.
index=sit_ces "Processing user with email address"
| rex "(?i)Processing user with email address= (?P<processuser>[^,]+)"
| dedup processuser
| table processuser companyName
| join type=left processuser
[ search index=sit_ces "Activating user with the email address"
| rex "(?i)Activating user with the email address (?P<activateuser>[^ ]+)"
| dedup activateuser
| table activateuser
| rename activateuser as processuser
| eval foundflag="ignoreme"]
|where isnull(foundflag)
Thanks Dal
The search is spot on, returns all emails with companyName that have not completed activation.
Cheers
Sure. There's LOTS of ways to join two files in splunk (join, lookup, set, append, and "stew"...dumping them into a single search as below) so choosing the one that's the easiest or most readable is often the way to go for moderate amounts of data.
If you were comparing HUGE files, then you'd run into the limits on subsearches, so you'd do it by dropping all the records into one search and then using stats to clean it all up, like so
index=sit_ces ("Processing user with email address" OR "Activating user with the email address" )
| rex "(?i)Processing user with email address= (?P<processuser>[^,]+)"
| rex "(?i)Activating user with the email address (?P<activateuser>[^ ]+)"
| eval user=coalesce(processuser,activateuser)
| stats values(processuser) as processuser, values(activateuser) as activateuser, values(companyName) as companyName by user
|where isnull(activateuser)
The key here is to make sure there is a common field in the two types of records so that you can get them connected. In this case, that was trivial, so assuming the above gets you what you want, then I'd use it in preference to the join.
[Updated]
Based on the changes in Activating and Processing events, I have changed the rex command. However, you should definitely use Field Extractions for the two events.
Since companyName is Key Value pair, I expect the same to be available during Search Time as Interesting Field.
I have added companyName in the stats command.
If your goal is to find the users who have Processing action but no Activation action event you can try the following:
index=sit_ces "Processing user with email address" OR "Activating user with the email address"
| rex "(?i)(?<action>\w+) user with[\w|\s]+email address[\s|\=]*(?<email>[\w|\d|\.|\@]+)"
| stats count as actionevents values(action) as action min(_time) as EarliestAction max(_time) as LatestAction values(companyName) as companyName by email
| search actionevents>1 action="Processing" and action="Activating"
| eval duration=EarliestAction=LatestAction
| fieldformat EarliestAction =strftime(EarliestAction,"%c")
| fieldformat LatestAction =strftime(LatestAction,"%c")
PS: I have used simple regex for the example, however, you should try Extract new fields from search results to make use of
Splunk's Interactive Field Extraction to come up with regular expression as Field Extraction knowledge object. That way above fields will be available at search time and you do not have to explicitly perform rex command (easy maintenance of code).
Thanks for responding.
But I already have the list of users that have not activated I just want to include the comanyName field in the output.
Here is a sample of the 2 types of events
17/02/2017 01:51:38,681 INFO [UserAccountController:228] Activating user with the email address mtact20@mailinator.com and user ID 00u9mbewggjBpq4Pu0h7
17/02/2017 01:40:00,629 INFO [UserProfileUtil:704] Processing user with email address= mtact20@mailinator.com, User Type= C, operation type= N, customerId= 10000001, companyName="Test Company 1", mobileNo=
I just want to display a list of all the email addresses with companyName of emails that appear in the first event type but do not appear in the second event type.
@proylea, I have updated my query to include companyName. Also changed rex regular expression to pull email for both the events. However, as stated earlier, please create Field Extractions for both Activating and Processing events which are not Key Value pairs like user ID in Activating event.
Thanks again but the search returns no results.
DalJeanis has provided the answer.
@proylea... Please go ahead and accept his answer so that this question gets marked as Answered. Only difference with coming up with proper field extraction would have been that you would have got a chance to avoid join and also would have saved field extraction for easier use in search query as a knowledge object. I am glad you have a fix!!!