adding another field after using set diff

proylea · ‎03-14-2017

Hi Splunkers

I have a set of results from using set diff which is all good.
I am now wanting to output another field from those results and need some help.

This is the set diff query:

| set diff [search index=sit_ces "Processing user with email address" | rex "(?i)Processing user with email address= (?P<processuser>[^,]+)" | dedup processuser | table processuser] [search index=sit_ces "Activating user with the email address" | rex "(?i)Activating user with the email address (?P<activateuser>[^ ]+)" | dedup activateuser | table activateuser]

It outputs a list of email addresses which are the field processuser minus the activateuser results

Here's what I'm having trouble with:
There is a field in the first result set called companyName, I would like to table that field with the results from set diff, so I end up with

processuser      companyName
email            company
email            company
email            company
etc               etc

The result of this would be a table with users that haven't finished an activation process.

proylea · ‎03-16-2017

Thanks everyone for all the responses you guys are great, the Splunk community is such a fantastic resource.

woodcock · ‎03-16-2017

I fixed my answer thanks to @DalJeanis. You should upvote any working answers and click Accept on the best one.

woodcock · ‎03-15-2017

Stick to stats and avoid all the subsearch based commands; try this:

index=sit_ces "Processing user with email address" OR "Activating user with the email address"
| rex "(?i)ing user with(?: the)? email address=? (?<user>[^,]+)"
| stats values(*) AS * count(eval(searchmatch("Processing user with email address"))) AS countP count(eval(searchmatch("Activating user with the email address"))) AS countA BY user
| search countA=0
| table user companyName

proylea · ‎03-15-2017

Thanks Woody, unfortunately the search still returns all 28 emails with companyName, there are only 25 that have not been activated.
DalJeanis has provided the answer

woodcock · ‎03-16-2017

Try it again; it had a bug and I fixed the original answer.

proylea · ‎03-16-2017

The search still returns all 28 email addresses.
There are 3 that appear in both events types so the number of rows returned should be 25

woodcock · ‎03-16-2017

I had another mistake. I updated it to fix that. Try one more time.

woodcock · ‎04-13-2019

@proylea be sure to come back here, try all the answers, UpVote any answers/comments that were helpful and then pick the best one and click Answer to close the question and help others.

DalJeanis · ‎03-16-2017

The issue was that the rex in line 2 will only find the processing line. Something more like this would be needed to make it work for both.

| rex "(?i)ing user with (the )?email address[= ]+(?<user>[^, ]+)"

woodcock · ‎03-16-2017

ACK! I did not notice the the! You are correct. I have updated my answer and then it should work AND not have any transaction nor subsearch limits. What a silly log variation.

DalJeanis · ‎03-16-2017

the = is also different between the two.

woodcock · ‎03-16-2017

ARGH! Updated again. Hopefully it works now!

DalJeanis · ‎03-15-2017

You can always use a left join instead of set diff.

index=sit_ces "Processing user with email address" 
| rex "(?i)Processing user with email address= (?P<processuser>[^,]+)" 
| dedup processuser 
| table processuser  companyName
| join type=left processuser  
    [ search index=sit_ces "Activating user with the email address" 
    | rex "(?i)Activating user with the email address (?P<activateuser>[^ ]+)" 
    | dedup  activateuser 
    | table  activateuser 
    | rename activateuser as processuser 
    | eval foundflag="ignoreme"]
|where isnull(foundflag)

proylea · ‎03-15-2017

Thanks Dal
The search is spot on, returns all emails with companyName that have not completed activation.

Cheers

DalJeanis · ‎03-16-2017

Sure. There's LOTS of ways to join two files in splunk (join, lookup, set, append, and "stew"...dumping them into a single search as below) so choosing the one that's the easiest or most readable is often the way to go for moderate amounts of data.

If you were comparing HUGE files, then you'd run into the limits on subsearches, so you'd do it by dropping all the records into one search and then using stats to clean it all up, like so

 index=sit_ces ("Processing user with email address"  OR "Activating user with the email address" )
 | rex "(?i)Processing user with email address= (?P<processuser>[^,]+)" 
 | rex "(?i)Activating user with the email address (?P<activateuser>[^ ]+)" 
 | eval user=coalesce(processuser,activateuser)
 | stats values(processuser) as processuser, values(activateuser) as activateuser, values(companyName) as companyName by user
 |where isnull(activateuser)

The key here is to make sure there is a common field in the two types of records so that you can get them connected. In this case, that was trivial, so assuming the above gets you what you want, then I'd use it in preference to the join.

niketn · ‎03-15-2017

[Updated]

Based on the changes in Activating and Processing events, I have changed the rex command. However, you should definitely use Field Extractions for the two events.

Since companyName is Key Value pair, I expect the same to be available during Search Time as Interesting Field.
I have added companyName in the stats command.

If your goal is to find the users who have Processing action but no Activation action event you can try the following:

index=sit_ces "Processing user with email address" OR "Activating user with the email address"
|  rex "(?i)(?<action>\w+) user with[\w|\s]+email address[\s|\=]*(?<email>[\w|\d|\.|\@]+)"
| stats count as actionevents values(action) as action min(_time) as EarliestAction max(_time) as LatestAction values(companyName) as companyName by email
| search actionevents>1 action="Processing" and action="Activating"
| eval duration=EarliestAction=LatestAction
| fieldformat EarliestAction =strftime(EarliestAction,"%c")
| fieldformat  LatestAction =strftime(LatestAction,"%c")

PS: I have used simple regex for the example, however, you should try Extract new fields from search results to make use of
Splunk's Interactive Field Extraction to come up with regular expression as Field Extraction knowledge object. That way above fields will be available at search time and you do not have to explicitly perform rex command (easy maintenance of code).

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

proylea · ‎03-15-2017

Thanks for responding.
But I already have the list of users that have not activated I just want to include the comanyName field in the output.

Here is a sample of the 2 types of events

17/02/2017 01:51:38,681 INFO  [UserAccountController:228] Activating user with the email address mtact20@mailinator.com and user ID 00u9mbewggjBpq4Pu0h7

17/02/2017 01:40:00,629 INFO  [UserProfileUtil:704] Processing user with email address= mtact20@mailinator.com, User Type= C, operation type= N, customerId= 10000001, companyName="Test Company 1", mobileNo=

I just want to display a list of all the email addresses with companyName of emails that appear in the first event type but do not appear in the second event type.

niketn · ‎03-15-2017

@proylea, I have updated my query to include companyName. Also changed rex regular expression to pull email for both the events. However, as stated earlier, please create Field Extractions for both Activating and Processing events which are not Key Value pairs like user ID in Activating event.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

proylea · ‎03-15-2017

Thanks again but the search returns no results.
DalJeanis has provided the answer.

niketn · ‎03-16-2017

@proylea... Please go ahead and accept his answer so that this question gets marked as Answered. Only difference with coming up with proper field extraction would have been that you would have got a chance to avoid join and also would have saved field extraction for easier use in search query as a knowledge object. I am glad you have a fix!!!

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

adding another field after using set diff

Security Professional: Sharpen Your Defenses with These .conf25 Sessions

First Steps with Splunk SOAR

How To Build a Self-Service Observability Practice with Splunk Observability Cloud

Are you a member of the Splunk Community?

adding another field after using set diff

Security Professional: Sharpen Your Defenses with These .conf25 Sessions

First Steps with Splunk SOAR

How To Build a Self-Service Observability Practice with Splunk Observability Cloud