We have a lookup file with a list of series stored in a field, TS_SERIES_ID. We want to find the count of series that do not exist in the logs, and we used the query below to do so.
| inputlookup tss_usage_csv | table TS_SERIES_ID
| search NOT [search index=web_timeseries | mvexpand SeriesUT.series{} | fields SeriesUT.series{} | rename SeriesUT.series{} as TS_SERIES_ID] | stats distinct_count(TS_SERIES_ID)
Issue: The results are getting truncated because we cannot have more than 10K results from the subsearch (we need this value to be around 300K, and the maximum can be set to 10500 in limits.conf).
Can you please let us know if there is any other way to achieve this?
Thanks
index=web_timeseries | mvexpand SeriesUT.series{} | fields SeriesUT.series{} | rename SeriesUT.series{} as TS_SERIES_ID
| lookup tss_usage_csv TS_SERIES_ID OUTPUT TS_SERIES_ID AS matched
| where isnull(matched) OR matched==""
| stats distinct_count(TS_SERIES_ID)
Don't use a subsearch with inputlookup.
Like this:
index=web_timeseries
| stats first(_time) BY SeriesUT.series{}
| rename SeriesUT.series{} AS TS_SERIES_ID
| eval which="search"
| apendpipe [
| inputlookup tss_usage_csv
| table TS_SERIES_ID
| eval which="lookup" ]
| stats dc(which) AS whichCount values(which) AS whiches BY TS_SERIES_ID
| where whichCount==1 AND whiches=="lookup"
This will be faster and scale better than any of the other solutions that might work.
How can we display the count of distinct series IDs instead of displaying all 3 columns?
Nice one. Just note the typo in apendpipe, you're missing a p there 🙂
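For the distinct-count question: after the final where, every surviving row is a single TS_SERIES_ID, so one more stats at the end should collapse the output to just the number. A sketch based on the answer above (the field name missing_series_count is only illustrative):
index=web_timeseries
| stats first(_time) BY SeriesUT.series{}
| rename SeriesUT.series{} AS TS_SERIES_ID
| eval which="search"
| appendpipe [
    | inputlookup tss_usage_csv
    | table TS_SERIES_ID
    | eval which="lookup" ]
| stats dc(which) AS whichCount values(which) AS whiches BY TS_SERIES_ID
| where whichCount==1 AND whiches=="lookup"
| stats count AS missing_series_count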
Why not use the lookup command in the natural way? If you add a control value to the lookup for matching, you can do it easily.
| inputlookup mylook
| eval flag="y"
| outputlookup mylook
Then in the search:
| amazing search...
| lookup mylook ... OUTPUT flag
| search NOT flag="*"
The search will still run on the indexers because the lookup is applied late, and you will end up with the values that are not in the lookup.
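Filled in with this thread's names, the idea looks roughly like this (a sketch; overwriting tss_usage_csv simply adds a constant flag column to every row):
| inputlookup tss_usage_csv
| eval flag="y"
| outputlookup tss_usage_csv
Then the search side:
index=web_timeseries
| mvexpand SeriesUT.series{}
| rename SeriesUT.series{} as TS_SERIES_ID
| lookup tss_usage_csv TS_SERIES_ID OUTPUT flag
| search NOT flag="*"
Note that this direction gives the series seen in the logs that are missing from the lookup; for lookup entries that never appear in the logs, the appendpipe approach above is the closer fit.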
Try the search below. It simply appends the 2 lists of series together (so no expensive joins or restricted subsearches) and then counts the total occurrences of each TS_SERIES_ID, filtering for those series IDs that came from the lookup and only occurred once in the combined list (i.e. they did not occur in the list from the logs).
| inputlookup tss_usage_csv
| table TS_SERIES_ID
| eval source="lookup"
| append [
search index=web_timeseries | mvexpand SeriesUT.series{} | fields SeriesUT.series{} | rename SeriesUT.series{} as TS_SERIES_ID | eval source="logs" ]
| eventstats count by TS_SERIES_ID
| where source="lookup" AND count=1
Not working. Getting the below error:
Subsearch produced 50000 results, truncating to maxout 50000
Ok, my bad, apparently that limit also applies when doing append.
Is the lookup also that big? Otherwise, try swapping the log search and the inputlookup:
index=web_timeseries | mvexpand SeriesUT.series{} | fields SeriesUT.series{} | rename SeriesUT.series{} as TS_SERIES_ID | eval source="logs"
| append [
| inputlookup tss_usage_csv
| table TS_SERIES_ID
| eval source="lookup" ]
| eventstats count by TS_SERIES_ID
| where source="lookup" AND count=1
Does that log search return only unique series, or is it full of duplicates? Then you could also throw in a dedup (as you're only interested in the unique series anyway).
The lookup has unique values.
Can we increase the limit for subsearches from the existing 50K to 500K? Will that impact performance or affect other searches?
Lookup has 571007 rows 😕
Any other way to achieve this?
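If raising the limit is an option: the append command itself takes a maxout argument, and the 50000 default comes from the maxresultrows setting under [searchresults] in limits.conf, which may still cap whatever maxout you pass. A sketch, assuming your admin is willing to allow a 600K-row append (this costs memory and slows the search, which is why the non-subsearch approaches above are generally preferable):
index=web_timeseries
| mvexpand SeriesUT.series{}
| rename SeriesUT.series{} as TS_SERIES_ID
| eval source="logs"
| append maxout=600000 [
    | inputlookup tss_usage_csv
    | table TS_SERIES_ID
    | eval source="lookup" ]
| eventstats count by TS_SERIES_ID
| where source="lookup" AND count=1
| stats count AS missing_series_count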
Hi @AnmolKohli
Can you please try this?
| inputlookup tss_usage_csv | table TS_SERIES_ID
| join type=left TS_SERIES_ID [
    search index=web_timeseries
    | mvexpand SeriesUT.series{}
    | fields SeriesUT.series{}
    | rename SeriesUT.series{} as TS_SERIES_ID
    | eval temp=1 ]
| where isnull(temp)
| stats distinct_count(TS_SERIES_ID)
Not working. Getting the below error:
Subsearch produced 50000 results, truncating to maxout 50000
Hi @AnmolKohli
Can you please try the below searches?
| inputlookup tss_usage_csv
| table TS_SERIES_ID
| eval temp1=1
| append
[ search index=web_timeseries
| mvexpand SeriesUT.series{}
| fields SeriesUT.series{}
| rename SeriesUT.series{} as TS_SERIES_ID
| eval temp2=1 ]
| stats values(temp1) as temp1 values(temp2) as temp2 by TS_SERIES_ID
| where isnull(temp2)
| stats count(TS_SERIES_ID) as count
AND
index=web_timeseries
| mvexpand SeriesUT.series{}
| fields SeriesUT.series{}
| rename SeriesUT.series{} as TS_SERIES_ID
| eval temp2=1
| append
[ | inputlookup tss_usage_csv
| table TS_SERIES_ID
| eval temp1=1 ]
| stats values(temp1) as temp1 values(temp2) as temp2 by TS_SERIES_ID
| where isnull(temp1)
| stats count(TS_SERIES_ID) as count
Same error 😞
Can you please provide the output from the below searches?
index=web_timeseries
| mvexpand SeriesUT.series{}
| fields SeriesUT.series{}
| rename SeriesUT.series{} as TS_SERIES_ID
| stats distinct_count(TS_SERIES_ID) as count
| inputlookup tss_usage_csv
| table TS_SERIES_ID
| stats distinct_count(TS_SERIES_ID) as count
Running the queries for the last 7 days:
Output of #1 - 215830
Output of #2 - 571007
Both values are correct. We just need to subtract them now (not sure how).
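If every series that appears in the logs also exists in the lookup, the subtraction can be done in one search: appendcols only brings back a single row here (the dc result), so the 50K subsearch cap is not hit. A sketch (missing_count is an illustrative name, and the number will be off if the logs contain series that are not in the lookup):
| inputlookup tss_usage_csv
| stats dc(TS_SERIES_ID) AS lookup_count
| appendcols [
    search index=web_timeseries
    | mvexpand SeriesUT.series{}
    | rename SeriesUT.series{} as TS_SERIES_ID
    | stats dc(TS_SERIES_ID) AS log_count ]
| eval missing_count = lookup_count - log_count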
Hi @AnmolKohli
Can you please try the below search?
index=web_timeseries
| mvexpand SeriesUT.series{}
| fields SeriesUT.series{}
| rename SeriesUT.series{} as TS_SERIES_ID
| eval temp2=1
| append
[
| inputlookup tss_usage_csv
| table TS_SERIES_ID
| eval a=1
| accum a
| eval subset=a%50000
| stats values(TS_SERIES_ID) as TS_SERIES_ID by subset
| eval temp1=1
]
| mvexpand TS_SERIES_ID
| stats values(temp1) as temp1 values(temp2) as temp2 by TS_SERIES_ID
| where isnull(temp1)
| stats count(TS_SERIES_ID) as count
Output is not correct.
Any other way to achieve this?
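One more option worth trying (a sketch, not something verified in this thread): sidestep the subsearch/append row limits entirely by first writing the series seen in the logs to their own lookup file, then comparing lookup-to-lookup with the lookup command, which has no 50K cap. The file name series_seen_in_logs.csv and the field seen are illustrative, and depending on your version and permissions you may need a lookup definition for the new file.
Step 1: save the distinct series from the logs (could also be a scheduled search):
index=web_timeseries
| stats count BY SeriesUT.series{}
| rename SeriesUT.series{} AS TS_SERIES_ID
| fields TS_SERIES_ID
| outputlookup series_seen_in_logs.csv
Step 2: count the lookup entries that were never seen in the logs:
| inputlookup tss_usage_csv
| table TS_SERIES_ID
| lookup series_seen_in_logs.csv TS_SERIES_ID OUTPUT TS_SERIES_ID AS seen
| where isnull(seen)
| stats count AS missing_series_count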