Splunk Search

How to search by user ID?

BenWilliamson1
New Member

My data consists of individual messages, tagged with the userID of the user who sent them. I want to count the number of users who say "burger" and "fries", but not necessarily in the same message.

In the example

UserID Message
1 "I'd like to order a burger"
2 "The weather is nice"
1 "I'd also like some fries"
2 "I'd only like a burger"

 

User 1 should be counted by user 2 shouldn't.

I believe a way to do this would be inner joining by the userID on two separate searches

 

 

 

index=idx_chatbot_gb_p  component=chatbot-ms
| spath "userID"
| spath input=payload output=Message path=messages.message{}.plaintext
| search (Message=* burger *)
| join type=inner userID [ seach (Message=* fries *) ]

 

 

 

 

I get zero results when I try this, even though I get results on the individual searches and many users order burgers and fries.

Does anyone know a better way to do this or can spot what I've done wrong? Thanks

Labels (2)
Tags (1)
0 Karma

PickleRick
SplunkTrust
SplunkTrust

Firstly, your syntax is off.

 

| search (Message=* burger *)

 

This says (filling all implicit operators) the same as

| search Message=* AND burger AND *

Which means that it looks for an event that fulfills all the conditions at the same time

  • The Message field of the event contains anything
  • The event contains the word "burger"
  • The event contains anything at all

It's still done after the spath operation which is quite costly because it is not a simple delimiter-based extraction but has some heavier logic applied to it so it's best to filter the data as early as possible so if you're looking for the words "fries" and "burger" it's best to filter all other events out at the very beginning of your pipeline.

Furthermore, you're trying to do some join which is neither needed here nor properly specified since I assume you would want to operate on the Message field (the above remarks about your search apply also to the fries search), but the subsearch used in join command is a completely separate data procesing path so it doesn't know about what you parsed out in the main path (in short - the subsearch doesn't know what you did with spath).

I assume you have some background with relational DB's because you're eager to use the join command which in splunk should be used very sparingly and in very specific cases. Usually you can do without it and be just fine.

In your case I'd do something like that:

1. Filter the events so you only get the events containing words "burger" or "fries" (due to splunk's indexing it's a relatively fast operation reliably limiting the amount of data you're dealing with later)

index=idx_chatbot_gb_p component=chatbot-ms ("burger" OR "fries")

 2. Parse the events with spath as you did before:

 

| spath "userID"
| spath input=payload output=Message path=messages.message{}.plaintext

 

3. Now you can focus on clasifying your data regarding fries and burgers. For example like this

| eval hasFries=if(like(Message,"%fries%"),1,0)
| eval hasBurger=if(like(Message,"%burger%"),1,0)

This approach is relatively fast but has the drawback of accepting thing like "wordwithfries", due to wildcard pattern matching. If you want to be more strict about mathing the strings you could use regex-based commands but they are more CPU-intensive.

For now let's assume that this approach is sufficient. You have each event classified as having fries or burgers in it (or none at all because the words might have appeared in a completely different part of the event)

So we need to check which users used which word.

4. Let's summarize things up a little

| stats values(hasFries) as hasFries values(hasBurger) as hasBurger by userID

After this you'll have a table of users and fields hasFries and hasBurger will contain zeroes if there was a message from this userID where there was no fries or burgers respectively and one if there was such message.

5. Now we only need to search for the users which uttered both of those terrible words

| search hasFries=1 hasBurger=1

So effectively your search will look like this:

"burger" OR "fries"
| spath "userID"
| spath input=payload output=Message path=messages.message{}.plaintext
| eval hasFries=if(like(Message,"%fries%"),1,0)
| eval hasBurger=if(like(Message,"%burger%"),1,0)
| stats values(hasFries) as hasFries values(hasBurger) as hasBurger by userID
| search hasFries=1 hasBurger=1

 

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @BenWilliamson1,

please try something like this (I used your extraction supposing that are correct):

index=idx_chatbot_gb_p  component=chatbot-ms (burger OR fries)
| spath "userID"
| spath input=payload output=Message path=messages.message{}.plaintext
| eval type=case(searchmatch="burger" AND searchmatch="fries","Both", searchmatch="burger" AND searchmatch!="fries","burger",searchmatch!="burger" AND searchmatch="fries", "fries")
| stats values(type) AS type BY userID

Ciao.

Giuseppe

0 Karma
Get Updates on the Splunk Community!

.conf25 Community Recap

Hello Splunkers, And just like that, .conf25 is in the books! What an incredible few days — full of learning, ...

Splunk App Developers | .conf25 Recap & What’s Next

If you stopped by the Builder Bar at .conf25 this year, thank you! The retro tech beer garden vibes were ...

Congratulations to the 2025-2026 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...