Splunk Search

adding another field after using set diff

proylea
Contributor

Hi Splunkers

I have a set of results from using set diff which is all good.
I am now wanting to output another field from those results and need some help.

This is the set diff query:

| set diff [search index=sit_ces "Processing user with email address" | rex "(?i)Processing user with email address= (?P<processuser>[^,]+)" | dedup processuser | table processuser] [search index=sit_ces "Activating user with the email address" | rex "(?i)Activating user with the email address (?P<activateuser>[^ ]+)" | dedup activateuser | table activateuser]

It outputs a list of email addresses which are the field processuser minus the activateuser results

Here's what I'm having trouble with:
There is a field in the first result set called companyName, I would like to table that field with the results from set diff, so I end up with

processuser      companyName
email            company
email            company
email            company
etc               etc

The result of this would be a table with users that haven't finished an activation process.

Tags (1)
0 Karma

proylea
Contributor

Thanks everyone for all the responses you guys are great, the Splunk community is such a fantastic resource.

0 Karma

woodcock
Esteemed Legend

I fixed my answer thanks to @DalJeanis. You should upvote any working answers and click Accept on the best one.

0 Karma

woodcock
Esteemed Legend

Stick to stats and avoid all the subsearch based commands; try this:

index=sit_ces "Processing user with email address" OR "Activating user with the email address"
| rex "(?i)ing user with(?: the)? email address=? (?<user>[^,]+)"
| stats values(*) AS * count(eval(searchmatch("Processing user with email address"))) AS countP count(eval(searchmatch("Activating user with the email address"))) AS countA BY user
| search countA=0
| table user companyName
0 Karma

proylea
Contributor

Thanks Woody, unfortunately the search still returns all 28 emails with companyName, there are only 25 that have not been activated.
DalJeanis has provided the answer

0 Karma

woodcock
Esteemed Legend

Try it again; it had a bug and I fixed the original answer.

0 Karma

proylea
Contributor

The search still returns all 28 email addresses.
There are 3 that appear in both events types so the number of rows returned should be 25

0 Karma

woodcock
Esteemed Legend

I had another mistake. I updated it to fix that. Try one more time.

0 Karma

woodcock
Esteemed Legend

@proylea be sure to come back here, try all the answers, UpVote any answers/comments that were helpful and then pick the best one and click Answer to close the question and help others.

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

The issue was that the rex in line 2 will only find the processing line. Something more like this would be needed to make it work for both.

| rex "(?i)ing user with (the )?email address[= ]+(?<user>[^, ]+)"

woodcock
Esteemed Legend

ACK! I did not notice the the! You are correct. I have updated my answer and then it should work AND not have any transaction nor subsearch limits. What a silly log variation.

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

the = is also different between the two.

0 Karma

woodcock
Esteemed Legend

ARGH! Updated again. Hopefully it works now!

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

You can always use a left join instead of set diff.

index=sit_ces "Processing user with email address" 
| rex "(?i)Processing user with email address= (?P<processuser>[^,]+)" 
| dedup processuser 
| table processuser  companyName
| join type=left processuser  
    [ search index=sit_ces "Activating user with the email address" 
    | rex "(?i)Activating user with the email address (?P<activateuser>[^ ]+)" 
    | dedup  activateuser 
    | table  activateuser 
    | rename activateuser as processuser 
    | eval foundflag="ignoreme"]
|where isnull(foundflag)

proylea
Contributor

Thanks Dal
The search is spot on, returns all emails with companyName that have not completed activation.

Cheers

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

Sure. There's LOTS of ways to join two files in splunk (join, lookup, set, append, and "stew"...dumping them into a single search as below) so choosing the one that's the easiest or most readable is often the way to go for moderate amounts of data.

If you were comparing HUGE files, then you'd run into the limits on subsearches, so you'd do it by dropping all the records into one search and then using stats to clean it all up, like so

 index=sit_ces ("Processing user with email address"  OR "Activating user with the email address" )
 | rex "(?i)Processing user with email address= (?P<processuser>[^,]+)" 
 | rex "(?i)Activating user with the email address (?P<activateuser>[^ ]+)" 
 | eval user=coalesce(processuser,activateuser)
 | stats values(processuser) as processuser, values(activateuser) as activateuser, values(companyName) as companyName by user
 |where isnull(activateuser)

The key here is to make sure there is a common field in the two types of records so that you can get them connected. In this case, that was trivial, so assuming the above gets you what you want, then I'd use it in preference to the join.

0 Karma

niketn
Legend

[Updated]

Based on the changes in Activating and Processing events, I have changed the rex command. However, you should definitely use Field Extractions for the two events.

Since companyName is Key Value pair, I expect the same to be available during Search Time as Interesting Field.
I have added companyName in the stats command.


If your goal is to find the users who have Processing action but no Activation action event you can try the following:

index=sit_ces "Processing user with email address" OR "Activating user with the email address"
|  rex "(?i)(?<action>\w+) user with[\w|\s]+email address[\s|\=]*(?<email>[\w|\d|\.|\@]+)"
| stats count as actionevents values(action) as action min(_time) as EarliestAction max(_time) as LatestAction values(companyName) as companyName by email
| search actionevents>1 action="Processing" and action="Activating"
| eval duration=EarliestAction=LatestAction
| fieldformat EarliestAction =strftime(EarliestAction,"%c")
| fieldformat  LatestAction =strftime(LatestAction,"%c") 

PS: I have used simple regex for the example, however, you should try Extract new fields from search results to make use of
Splunk's Interactive Field Extraction to come up with regular expression as Field Extraction knowledge object. That way above fields will be available at search time and you do not have to explicitly perform rex command (easy maintenance of code).

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

proylea
Contributor

Thanks for responding.
But I already have the list of users that have not activated I just want to include the comanyName field in the output.

Here is a sample of the 2 types of events

17/02/2017 01:51:38,681 INFO  [UserAccountController:228] Activating user with the email address mtact20@mailinator.com and user ID 00u9mbewggjBpq4Pu0h7

17/02/2017 01:40:00,629 INFO  [UserProfileUtil:704] Processing user with email address= mtact20@mailinator.com, User Type= C, operation type= N, customerId= 10000001, companyName="Test Company 1", mobileNo= 

I just want to display a list of all the email addresses with companyName of emails that appear in the first event type but do not appear in the second event type.

0 Karma

niketn
Legend

@proylea, I have updated my query to include companyName. Also changed rex regular expression to pull email for both the events. However, as stated earlier, please create Field Extractions for both Activating and Processing events which are not Key Value pairs like user ID in Activating event.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

proylea
Contributor

Thanks again but the search returns no results.
DalJeanis has provided the answer.

0 Karma

niketn
Legend

@proylea... Please go ahead and accept his answer so that this question gets marked as Answered. Only difference with coming up with proper field extraction would have been that you would have got a chance to avoid join and also would have saved field extraction for easier use in search query as a knowledge object. I am glad you have a fix!!!

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma
Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...