Compare two query results and run third query

maramk · ‎11-04-2021

Hi,

I have a log file that looks like the one below. I need to extract the x value from the first block of logs (value1) and the x value from the second block (value2). If both values match, I need to run another query to get the proj_id.

Logs:

Code=Info
words=check
text=Checking for messages... received \x1B[0;m job\x1B[0;x=value1 proj_id=edcbidh

Code=Info
words=check
text=\x1B[0;33mwarning: failed \x1B[0;m \x1B[0;33mduration\x1B[0;m=00.0006ms \x1B[0;33mjob\x1B[0;x=value2 \x1B[0;

Any help is appreciated.

Thanks.

maramk · ‎11-05-2021

@yuanliu ,

Can you please suggest separate rex commands to extract x1 and x2 values individually from the field 'text' in the same event?

Thanks.

yuanliu · ‎11-05-2021

Yes, I'll make some assumptions about your log format.

| rex "text=Checking.+x=(?<x1>\w+) proj_id=(?<proj_id>\w+)"
| rex "text=.*warning:.+x=(?<x2>\w+)"
| table x1 x2 proj_id _time
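
To check the two extractions before running them on real events, a minimal sketch along these lines may help. The sample event is made up (escape sequences removed, and both x values set to abc123 so that they match), and the final where is only one possible way to keep proj_id when x1 and x2 agree:

```spl
| makeresults
| eval _raw = "text=Checking for messages... received job x=abc123 proj_id=edcbidh
text=warning: failed duration=00.0006ms job x=abc123"
| rex "text=Checking.+x=(?<x1>\w+) proj_id=(?<proj_id>\w+)"
| rex "text=.*warning:.+x=(?<x2>\w+)"
| where x1 == x2 ```keep the event only if both x values agree```
| table proj_id x1 x2
```

With matching values the event is kept and proj_id is available; if the second x differed, the where would drop the event.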

maramk · ‎11-04-2021

@richgalloway

Thanks for replying to my question. The reason I need to run a third query is that project_id should be extracted only when the x values (value1, value2) from the two log blocks match. Both values must be the same to extract the project_id.

yuanliu · ‎11-04-2021

project_id should be extracted only when the x values (value1, value2) from the two log blocks match. Both values must be the same to extract the project_id.

Could you clarify if each of these "log blocks" is ingested as its own event or do these blocks exist in the same event? From your example, they are likely in different events.

If this is the case, @richgalloway's solution will work; alternatively you can use eventstats for the same purpose if you want to preserve events.

| eventstats values(proj_id) as proj_id count by x
| where count > 1

If they are unfortunately mingled in one event, you can use mvcount() to determine if x has more than one value, like

| eventstats values(proj_id) as proj_id by x
| where mvcount(x) > 1
| eventstats dc(x) as x_values by proj_id _time
| where x_values = 1

After this, all logs that only have one "log block" containing x will be removed; any single event that contains non-matching x will also be removed.

Caveat: I included _time in the last eventstats because your example doesn't give any other identifying feature to single out an event. _time is OK only if your logs do not have complex timestamp overlaps. If, say, several events have the same _time, but some of them have matching x and some non-matching, the search will not be very accurate.

In short, if those "log blocks" are currently in the same event, I strongly suggest that your admin adjust indexing to make sure they are separate. If that is not feasible, your developer should add some unique event identifier so we don't rely on _time.

maramk · ‎11-04-2021

@yuanliu Both the log blocks are from the same event.

richgalloway · ‎11-04-2021

This is good information to include in the OP.

Given this new information, I suggest this query.

index=foo Code=Info words=check text=*
```Extract x and proj_id.  Delete the next two lines if the fields are extracted already.```
| rex field=text max_match=0 "x=(?<x>\S+)"
| rex field=text "proj_id=(?<proj_id>\S+)"
```Discard results that don't have 2 x values```
| where mvcount(x) = 2
```Keep results where all values of x are the same```
| where mvcount(mvdedup(x)) = 1
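
To see what the two where clauses are meant to test, a small sketch on made-up data can expose the intermediate counts. The text value and the field names x_count and x_distinct below are illustrative, not taken from real events:

```spl
| makeresults
| eval text = "received job x=abc123 proj_id=edcbidh warning: failed job x=abc123"
| rex field=text max_match=0 "x=(?<x>\S+)"
| rex field=text "proj_id=(?<proj_id>\S+)"
| eval x_count = mvcount(x), x_distinct = mvcount(mvdedup(x))
| table x x_count x_distinct proj_id
```

The intent of the filters is to keep an event only when it carries two x values (x_count) that collapse to a single distinct value (x_distinct = 1).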

---
If this reply helps you, Karma would be appreciated.

richgalloway · ‎11-04-2021

Have you tried the two solutions provided so far? If so, how do they not meet your expectations?

---
If this reply helps you, Karma would be appreciated.

yuanliu · ‎11-04-2021

By default, Splunk extracts the string after an equal sign (=) into a field named after the key preceding the sign. So, in the first event, you already have field x and field proj_id; in the second, you already have x. I interpret your question as a need to make proj_id available to all events whose x value matches that of the first event, i.e., value1.

My favourite method is to use stats in order to speed up subsequent operations. Not knowing your application context, I can suggest using eventstats, like

| eventstats values(proj_id) as proj_id by x

richgalloway · ‎11-04-2021

Why run a third query when you already have proj_id?

index=foo Code=Info words=check text=*
```Extract x and proj_id.  Delete the next two lines if the fields are extracted already.```
| rex field=text "x=(?<x>\S+)"
| rex field=text "proj_id=(?<proj_id>\S+)"
```Group the results by x```
| stats values(proj_id) as proj_id count by x
```Discard results that didn't come from 2 records```
| where count > 1

---
If this reply helps you, Karma would be appreciated.

maramk · ‎11-04-2021

Just FYI,

Both the log blocks are from the same event. Also, in the command you suggested, where does the Splunk query compare the values of x (value1 and value2)?

yuanliu · ‎11-04-2021

where does the Splunk query compare the values of x (value1 and value2)?

The approach we suggested is different from string-to-string comparison. The by clause automatically groups (and sorts) results by the group-by fields, in this case x. So, if x has two different values, value1 and value2, they end up in different rows; the where command then removes the rows that do not meet your criteria.
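
As a rough illustration of the grouping, the sketch below builds three made-up events (two with x=value1, one with x=value2; the field n is only a helper) and applies the same kind of stats and where:

```spl
| makeresults count=3
| streamstats count as n
| eval x = if(n <= 2, "value1", "value2") ```two events share value1```
| eval proj_id = if(n == 1, "edcbidh", null())
| stats values(proj_id) as proj_id count by x
| where count > 1
```

Only the value1 row should survive, carrying proj_id, because value2 was seen in a single event.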

You can use some (quirky) forced regex to do direct comparison, like

| rex max_match=0 "text=.+x=(?<x1>\w+).*proj_id=(?<proj_id>\w+).*\n\s.*Code=(?<code>\w+).*\swords=(?<words>\w+).*\stext=.+x=(?<x2>\w+)"
| where isnotnull(x2) AND x1 == x2

This example depends on the very strict structure of your example. A more flexible regex might be

| rex max_match=0 "text=.+x=(?<x1>\w+).*proj_id=(?<proj_id>\w+).*\n(\s.*)*\stext=.+x=(?<x2>\w+)"
| where isnotnull(x2) and x1 == x2

As I noted, Splunk should have extracted proj_id already, so the above can be further simplified to

| rex max_match=0 "text=.+x=(?<x1>\w+).*\n(\s.*)*\stext=.+x=(?<x2>\w+)"
| where isnotnull(x2) and x1 == x2

Just note that such calculations are more expensive than using stats groupby.

maramk · ‎11-04-2021

Hi @yuanliu ,

I tried to run the third command as you suggested, but Splunk is not returning any values for x1 and x2; both come back as null. Can you please check whether there is an error in the query, based on the logs I provided?

Thanks.

yuanliu · ‎11-04-2021

The search was tested using makeresults; you can try this too:

| makeresults 
| eval _raw = "Code=Info
words=check
text=Checking for messages... received \x1B[0;m job\x1B[0;x=value1 proj_id=edcbidh

Code=Info
words=check
text=\x1B[0;33mwarning:  failed \x1B[0;m \x1B[0;33mduration\x1B[0;m=00.0006ms \x1B[0;33mjob\x1B[0;x=value2 \x1B[0;"

The above simulates the input you posted. If you follow that with

| rex max_match=0 "text=.+x=(?<x1>\w+).*\n(\s.*)*\stext=.+x=(?<x2>\w+)"

the output is

_raw:
Code=Info
words=check
text=Checking for messages... received \x1B[0;m job\x1B[0;x=value1 proj_id=edcbidh

Code=Info
words=check
text=\x1B[0;33mwarning:  failed \x1B[0;m \x1B[0;33mduration\x1B[0;m=00.0006ms \x1B[0;33mjob\x1B[0;x=value2 \x1B[0;

_time: 2021-11-04 14:25:42
x1: value1
x2: value2

If the search doesn't work with your actual search output, it is likely that some unprintable characters are in the way. You will need to tweak the regex a bit more; for example, first drop the second extraction as a test. (I.e., run just | rex max_match=0 "x=(?<x1>\w+)" to see if it extracts the two values.)

Some additional pointers about debugging:

max_match=0 is unnecessary for this strategy; it was a vestige from my initial testing. You can take advantage of it during debugging.
\s following \n in my example is used to signal the start of a new line; subsequent line starts are signaled with \s alone. \n(\s.*)*, therefore, matches any number of lines of text between the two x= expressions. I found this trick in this forum, as I don't recognise it as PCRE, but it works with the generated data.
You can also drop text= to further loosen the search. Just beware that the looser the search, the more possible false positives.
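
One way to act on these pointers is to start from the loosest extraction and inspect what comes back. This is only a sketch: the base search is the placeholder used earlier in this thread, and x_all, x_found, x1 and x2 are names chosen here for illustration:

```spl
index=foo Code=Info words=check text=*
| rex max_match=0 "x=(?<x_all>\w+)"
| eval x_found = mvcount(x_all) ```stays empty if nothing matched```
| eval x1 = mvindex(x_all, 0), x2 = mvindex(x_all, 1)
| table _time x_found x1 x2
```

If x_found is 2, the values are extractable and the stricter multi-line regex is the part to adjust; if it is empty, something in the raw text (for example unprintable characters) is probably interfering with even the simple pattern.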

maramk · ‎11-04-2021

hi @yuanliu ,

I tried this command as you suggested, but it's not returning any values for x1 and x2. In your command, you posted some data to Splunk and retrieved it as "_raw" data. But for me, the query needs to extract the values from event data. Maybe that's the reason the command is not working for me.

yuanliu · ‎11-05-2021

@maramk _raw field always includes complete event data, hence the emulation.

It is generally very difficult to cut hair over the phone line, as they say. So bear with me if you want to debug step by step. For your events, try this first:

| rex max_match=0 "text=.+x=(?<x1>\w+)"
| table x x1 proj_id _time Code words m

This tests two things:

Does Splunk automatically extract fields based on the key=value pattern?
Is there anything preventing the simplest rex extraction from happening?
