Splunk Search

How do I write a search that returns unique field values for a certain time range that did not exist before that period?

Bags
Explorer

Hello,

I am trying to write a search that returns the messages from one set of logs, but not from the other. Unfortunately, I only want the results unique to one set, not the results unique to both of them. So I think that is akin to set A − set B in set theory.

I tried:

| set diff [search tag=$env$ level=error earliest=-2d | dedup message | table message] [search tag=$env$ level=error | dedup message | table message]

Which works, but gives me the elements unique to either set, combined, which isn't what I want.
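In set terms, SPL's `set diff` returns the symmetric difference (elements unique to either set), while the goal here is the one-sided difference A − B. A quick Python illustration of the distinction, with hypothetical message sets:

```python
# Illustration: symmetric difference (what `set diff` returns)
# vs. one-sided difference A - B (what is actually wanted).
a = {"Error 1", "Error 2", "Error 3"}   # messages in the recent window
b = {"Error 1", "Error 2", "Error 4"}   # messages seen before the window

print(sorted(a ^ b))  # symmetric difference -> ['Error 3', 'Error 4']
print(sorted(a - b))  # one-sided difference -> ['Error 3']
```

Note also that in the search above the second subsearch has no `earliest`, so it covers all time including the recent window; a strict A − B against it would be empty, which is another reason the one-sided difference needs a "before the window" set.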

So I tried to take the difference and intersect it with the "today" search. A roundabout way that requires an extra search, but I wanted to get it working.

This is my attempt at diff intersect original search:

| set intersect [search tag=$env$ level=error earliest=4/9/2016:00:00:00 latest=now | dedup message | table message] [set diff [search tag=$env$ level=error earliest=4/9/2016:00:00:00 latest=now | dedup message | table message] | search NOT  [search tag=$env$ level=error | dedup message | table message ]] | return message

My syntax is incorrect, so I get

Error in 'set' command: Three arguments are required: [] [+(union) | -(diff) | ^(intersect)] [].

from Splunk. I tried a bunch of tweaks to the syntax, so I'm probably way off at this point. I'm not entirely sure what the syntax would be to pull this off, if it's possible at all.

Finally, I tried to follow this:
https://answers.splunk.com/answers/151315/how-to-find-differences-between-two-searches-with-set-diff...

My search was:

[search tag=$env$ level=error earliest=4/9/2016:00:00:00 latest=now | dedup message | table message] NOT [search tag=$env$ level=error | dedup message | table message]

It resulted in an error of:

Regex: regular expression is too large

Not sure what that means.

Anyway, I don't know if any of my searches are salvageable. To repeat: I basically want to pull all of the unique messages (or whichever field I choose) out of a given time period that did not exist before that time period. For example:

Past:
message: Error 1
timestamp: 1/1/2016
message: Error 2
timestamp: 1/2/2016

Today (or whatever time period is between earliest and latest):
message: Error 1
timestamp: 4/2/2016
message: Error 2
timestamp: 4/3/2016
message: Error 3
timestamp: 4/3/2016

Final output of new unique messages:
message: Error 3
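The desired computation on the example above, sketched in Python as the one-sided set difference of the two periods' message sets:

```python
# Messages from the example above, keyed only by message text
# (timestamps dropped, since only uniqueness of the message matters).
past = {"Error 1", "Error 2"}              # seen before the window
today = {"Error 1", "Error 2", "Error 3"}  # seen inside the window

new_unique = today - past  # messages in the window that never appeared before
print(new_unique)  # {'Error 3'}
```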

Thanks.

0 Karma

somesoni2
Revered Legend

Try this

tag=$env$ level=error earliest=4/9/2016:00:00:00 latest=now  NOT [search tag=$env$ level=error | dedup message | table message]
| dedup message | table message

OR
Hardcoded date

tag=$env$ level=error | stats min(_time) as firstSeen by message | where firstSeen>=strftime("4/9/2016","%m/%d/%Y")

Relative Date

tag=$env$ level=error | stats min(_time) as firstSeen by message | where firstSeen>=relative_time(now(),"-2d@d")
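Both queries rely on the same idea: compute each message's first-seen time over all data, then keep only the messages whose first occurrence falls inside the window. (In SPL, `strptime` parses a date string into epoch seconds while `strftime` formats epoch seconds into a string, so the hardcoded-date comparison likely wants `strptime("4/9/2016","%m/%d/%Y")`.) The first-seen logic, sketched in Python over illustrative (timestamp, message) pairs:

```python
from datetime import datetime

# Hypothetical (timestamp, message) events spanning both periods.
events = [
    (datetime(2016, 1, 1), "Error 1"),
    (datetime(2016, 1, 2), "Error 2"),
    (datetime(2016, 4, 2), "Error 1"),
    (datetime(2016, 4, 3), "Error 2"),
    (datetime(2016, 4, 3), "Error 3"),
]

# Equivalent of: stats min(_time) as firstSeen by message
first_seen = {}
for ts, msg in events:
    if msg not in first_seen or ts < first_seen[msg]:
        first_seen[msg] = ts

# Equivalent of: where firstSeen >= <window start>
window_start = datetime(2016, 4, 1)  # illustrative cutoff for this data
new_messages = [m for m, ts in first_seen.items() if ts >= window_start]
print(new_messages)  # ['Error 3']
```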

Bags
Explorer

The first one results in a "Regex: regular expression is too large" error.

I ran the second query in verbose mode, but was getting tons of messages that also occurred before 4/9/2016. I got over 85,000 events matched before I stopped the query. If it returned as expected, there should probably be under 100 unique new messages since 4/9/2016.

Thanks for the attempt.

0 Karma

somesoni2
Revered Legend

Query 1: The subsearch expands into a nested OR list (message=A OR message=B ...), and since it runs over all time it returns a lot of messages, hence the error.

Query 2: When you say you got 85,000 events, did you see that number in the Events tab or in the Statistics tab?

Bags
Explorer

Would it be possible to fix the first search by breaking the past queries up into a bunch of individual, smaller searches? I.e., smaller search NOT 1 day ago NOT 2 days ago NOT 3 days ago... etc.?

The Statistics tab was empty; everything loaded in the Events tab once I switched to verbose search.

Is there any way to do the set diff and then intersect that result set with the narrower search? It's kind of messy and does one more search than needed, but I wonder if that's what it would take to get this to work. Either way, I can't figure out the syntax to get nested set operations to work; I had posted my attempt at this in my original question.

Also, I want to make sure that it returns only unique messages. That is, if the same message appears in two events with two different timestamps, only one of them is returned. I noticed that set diff treats entries that are identical except for the timestamp as two different entries, which is why I piped the subsearches through | table message first, so it would only compare messages.

Thanks.

0 Karma

somesoni2
Revered Legend

Is this report a one-time effort, or would you need to run it frequently (weekly/monthly, etc.)?

What I would suggest is to run one search that captures all the messages that have occurred so far and stores them in a lookup table with a firstSeen date. Once this query has run to completion, we can use the result (stored in the lookup table) to find the unique messages that occurred after a certain date.

Lookup generation query

tag=$env$ level=error | stats min(_time) as firstSeen by message | outputlookup message_lookup.csv

Once this is run and completed, use this to find the unique messages after 4/9/16.

 | inputlookup message_lookup.csv | append [search tag=$env$ level=error earliest=4/9/2016:00:00:00 latest=now | stats min(_time) as firstSeen by message ] | stats min(firstSeen) as firstSeen by message  | where firstSeen>=strftime("4/9/2016","%m/%d/%Y")
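The merge logic of that query, sketched in Python: the stored baseline of first-seen times is appended to the window's first-seen times, the minimum per message is taken, and only messages first seen inside the window survive (all names and dates here are illustrative):

```python
from datetime import datetime

# Baseline from the lookup: message -> firstSeen over all history.
baseline = {"Error 1": datetime(2016, 1, 1),
            "Error 2": datetime(2016, 1, 2)}

# First-seen times from the windowed search.
window = {"Error 1": datetime(2016, 4, 2),
          "Error 2": datetime(2016, 4, 3),
          "Error 3": datetime(2016, 4, 3)}

# Equivalent of: append + stats min(firstSeen) by message
merged = dict(window)
for msg, ts in baseline.items():
    if msg not in merged or ts < merged[msg]:
        merged[msg] = ts

# Equivalent of: where firstSeen >= <window start>
window_start = datetime(2016, 4, 1)
print(sorted(m for m, ts in merged.items() if ts >= window_start))  # ['Error 3']
```

Only messages absent from the baseline keep a firstSeen inside the window; anything already in the lookup is pulled back to its historical first occurrence and filtered out.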

Bags
Explorer

I deleted the comment you replied to, sorry about that 🙂

So I tried

tag=actualtag level=error | dedup message | eval Period=if(_time>=strftime("4/9/2016","%m/%d/%Y"),"New","Old")   | stats values(Period) as Period by message | where mvcount(Period)=1 AND Period="New" | table _time, message

And I ended up with 7,132 events spanning from April 1st to April 14th (my time range was month to date) and nothing in statistics.

I also tried

tag=tagtagtag level=error | dedup message | eval Period=if(_time>=strftime("4/9/2016","%m/%d/%Y"),"New","Old")   | stats values(Period) as Period by message | where mvcount(Period)=1 AND Period="New"

which didn't have my modifications and got 5,000 events from April 5th to 11th. So again, not what I need 😞

0 Karma

somesoni2
Revered Legend

Lose the dedup command from above.

Bags
Explorer

(Sorry if I spam your inbox, but my new comments aren't appearing...)

I almost got it with this...

|set diff [search tag=$env$ level=error earliest=4/9/2016:12:00:00 latest=4/9/2016:13:00:00 | dedup message] [search tag=icp_prod level=error | dedup message ] | eval newTime=strftime(_time, "%m-%d-%y %H:%M:%S") | table _time, newTime, message | where newTime>=strftime("04-09-16 12:00:00", "%m-%d-%y %H:%M:%S") | table message | dedup message

newTime comes out like this: 04-09-16 12:59:58

However, when I add the "where newTime>..." clause, I go from 200 results to 0 results, which means I messed that up. newTime isn't actually necessary; I just couldn't get the string format to match the original _time format of 2016-04-09 12:59:58.114. At any rate, I want to filter the table at that step to show only rows with a time past the given MM/DD/YY HH:MM. I don't really need seconds, but they can be included if it's easier that way.

I would like earliest=4/9/2016:12:00:00 and the newTime>=strftime("04-09-16 12:00:00", "%m-%d-%y %H:%M:%S") comparison to be parameterizable, if possible. I hope to attach a drop-down value to it so the time can be picked in the dashboard. So it'd be something like earliest=$releaseTime$ and newTime>=strftime($releaseTime$, "%m-%d-%y %H:%M:%S"), essentially.

At any rate, my first goal is to get the proof of concept working; then I can worry about making it take parameters instead of hardcoded times.

0 Karma

Bags
Explorer
tag=tagtagtag level=error | eval Period=if(_time>=strftime("4/9/2016","%m/%d/%Y"),"New","Old")   | stats values(Period) as Period by message | where mvcount(Period)=1 AND Period="New"

Gave me a bunch of hits before and after 4/9: 114k between 4/7 and 4/11 (the big date range I chose). Nothing in stats.

0 Karma

somesoni2
Revered Legend

And you're looking at the Statistics tab, right?

0 Karma

Bags
Explorer

What about these two?

|set diff [search tag=tagtagtag  level=error earliest=4/9/2016:10:00:00 latest=4/9/2016:12:00:00 | eval messageSubstring = substr(message,1, 100) | dedup messageSubstring | table  messageSubstring ] [search tag=tagtagtag level=error| eval messageSubstring = substr(message,1, 100) | dedup messageSubstring | table messageSubstring ] | table messageSubstring | outputlookup message_lookupNEW.csv

[search tag=tagtagtag  level=error earliest=4/9/2016:10:00:00 latest=4/9/2016:12:00:00 | eval messageSubstring = substr(message,1, 100) | dedup messageSubstring | table messageSubstring ] | join type=inner messageSubstring  [|inputlookup message_lookupNEW.csv]

The first one creates a lookup of all of the unique pre-release and all of the unique post-release messages.

The second one performs an inner join between the post-release messages and the lookup above, which returns all of the events that match. Since for an inner join "The results of an inner join do not include events from the main search that have no matches in the subsearch", none of the pre-release-only messages are returned.
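The inner-join filtering described above, sketched in Python with hypothetical data: only main-search rows with a matching key in the lookup survive the join.

```python
# Hypothetical post-release messages (main search) and the lookup
# produced by the set diff (subsearch).
post_release = ["Error 1", "Error 2", "Error 3"]
diff_lookup = {"Error 3"}  # messages unique to either period

# Equivalent of: join type=inner messageSubstring [| inputlookup ...]
joined = [m for m in post_release if m in diff_lookup]
print(joined)  # ['Error 3']
```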

0 Karma

Bags
Explorer

This is something that will be run once a month after a release to check for new errors introduced. At the moment we basically do it manually, so it takes a lot of time to identify new errors by hand.

I ran

tag=$env$ level=error | dedup message | stats min(_time) as firstSeen by message | outputlookup message_lookup.csv

And then

| inputlookup message_lookup.csv | append [search tag=$env$ level=error earliest=4/9/2016:00:00:00 latest=now | stats min(_time) as firstSeen by message ] | stats min(firstSeen) as firstSeen by message  | where firstSeen>=strftime("4/9/2016","%m/%d/%Y")

and got 0 event matches with the second query.

Since set diff gives the elements unique to each search, combined, would it be possible to do a set diff and then keep only the results that occurred after a given time? At the moment I can only get set diff to work properly by keeping just the message field and throwing away everything else, like the timestamps. Without the timestamps, obviously I can't filter on a cutoff time. Is there a way to get diff to compare only the message, but keep the timestamps?

Thanks.

0 Karma