I need to query the logs to find records with names that only exists in the current month but not in the previous month (or ever).
The SQL equivalent would be something on these lines
select * FROM visitors where datevisit >= '10/1/2014' and datevisit <= '10/31/2014' AS Oct_Visits
Outer join visitors where datevisit >= '9/1/2014' and datevisit <= '9/30/2014' as Sep_Visits
ON Oct_Visits.name = Sep_Visits.name AND Sep_Visits.name = null
Appreciate your help!
Thanks,
Vijay
Give this a try
source=visitors earliest=-4mon@mon latest=@mon | eval Month=strftime(_time,"%Y-%m") | stats values(Month) as Month by name | where mvcount(Month)=1 AND Month=[|gentimes start=-1 | addinfo | eval search="\"".strftime(relative_time(info_max_time,"-1mon"),"%Y-%m")."\"" | return $search]
This is a little crazy, but it does most of the job. I think some tweaking might be in order, though:
info_max_time
with info_search_time
will make it dependent on when you actually execute the search, not the latest time in the time picker window. Otherwise, if your time picker has a range in it that doesn't extend to a day in the current month, your results will be off.stats values(Month)
with stats count values(Month)
will add the event count information you seem to be looking for.| fields - Month
will remove the Month
column from the result table, so you'll just have event count by name
. Alternatively, if you just want a total count of events rather than a count by name
, you can add | stats sum(count) as count
instead.Something like this:
sourcetype=visitors earliest=@mon NOT [search sourcetype=visitors earliest=-1mon@mon latest=@mon | table name | dedup name]
Add the following to your main search:
earliest=@mon
This tells your search to "snap" the earliest
time in your search to the beginning of the current month.
See http://docs.splunk.com/Documentation/Splunk/6.1.4/Search/Specifytimemodifiersinyoursearch for more details on this subject.
I tried:
source=visitors earliest=-1mon@mon latest=@mon NOT [search source=visitors earliest=-4mon@mon latest=-1mon@mon | table name | dedup name] | stats count
The result is same with or without the NOT clause. It appears that when the query runs, the time frame is set to last month and hence the inner search does not kick in.
Shouldn't be. The inner search is actually run first. As a subsearch, this creates something like ((name="a") OR (name="b") OR (name="c"))
. By putting a NOT
in front of this, you should be negating the contents of that, so that you only see things that don't match it.
What results do you get when you just do the following:
source=visitors earliest=-4mon@mon latest=-1mon@mon | table name | dedup name | format
By adding the format
command to the end, you should get the string that is generated as the subsearch for your main search above. Is it what you think it is supposed to be?
(Also note, this may not work with a real-time search. Choose any other kind of search with the time picker to see the results properly.)
When I run this individually, the result list looks okay. I haven't tried the format that you suggested but I see results come back when I run just "source=visitors earliest=-4mon@mon latest=-1mon@mon | table name | dedup name" with "All time" option.
It is only when I execute "source=visitors earliest=-1mon@mon latest=@mon NOT [search source=visitors earliest=-4mon@mon latest=-1mon@mon | table name | dedup name] | stats count" that the result doesn't look accurate.
Also, I see the following lines above the results window that tell me the timeframe is not being picked up correctly.
[subsearch]: Your timerange was substituted based on your search string
[subsearch]: No matching fields exist
≥ 1 result during September 2014
If I run the sub-query on its own, the timeframe indicated is correct "June through August"
Right, because the main search is during September. That is what you would expect to see.
What makes you believe the result doesn't "look accurate?" By adding a stats count
to the end of it, you're not inspecting the data but guessing based on a statistic. You need to inspect the data in order to see whether it's actually doing what you want or not.
Instead of doing a stats count
at the end, why not replace it with a table name | dedup name
there, too? That should at least get you some data to inspect: it should be a list of names that logged in during September that did not log in between June and August. Once you have that list, do some searches between June and August on those names, and they shouldn't show up there.
These two queries return the same result as a list and count.
source=visitors earliest=-1mon@mon latest=@mon NOT [search source=visitors earliest=-4mon@mon latest=-1mon@mon | table name | dedup name]
source=visitors earliest=-1mon@mon latest=@mon
I did a spot check for some of the results that show up. They exist in the June-aug range as well.
You still haven't run the format
version of the subsearch, as listed above. What does it return?
(Also, is the field name you're looking for really "name"? If not, what is it really?)
Sorry, I wasn't explicit. I ran the query with format turned on as well. If I run the inner query on its own, then it returns accurate data.
However, in following syntax, it seems to have no bearing on the results. The following two queries return identical result (count, format, table, ...)
source=visitors earliest=-1mon@mon latest=@mon NOT [search source=visitors earliest=-4mon@mon latest=-1mon@mon | table name | dedup name]
source=visitors earliest=-1mon@mon latest=@mon
The actual field is "Email Address".
source=emailusers earliest=-1mon@mon latest=@mon NOT [search source=visitors earliest=-4mon@mon latest=-1mon@mon | table "Email Address" | dedup "Email Address"]
I hate to ask the "is it plugged in" question, but in that last search in your response with the "Email Address" field, you're not really passing source=visitors
in the subsearch, are you? If your real source is emailusers
then that needs to be emailusers
also.
Assuming that's the case, it is possible that you could get the same response from your first and second searches if, in fact, every one of your visitors from September did not visit you between June and August. But you are asserting that there are visitors from September that show up between June and August. I'm wondering if you aren't running into a subsearch limit problem...
Given that I can't seem to come up with a way to make this work for you, you should try @somesoni2's answer below (with my suggested tweaks).
Thanks but how should I go about joining the results from two different time ranges.