About w564432

w564432 · ‎05-12-2025

Reviving a dead thread, but I believe your solution is precisely the same as @rocketboots_ser's, except that it uses _time rather than a fixed time. In either case the change of time zones doesn't affect the outcome. A more compact rewrite is this:

    | eval time_UTC = strftime(2 * _time - strptime(strftime(_time, "%F %TZ"), "%F %T%Z"), "%F %T")

This relies on the same trick: the %Z variable makes strptime() think the output of strftime() is in UTC. It doesn't matter whether you use _time or 2000-01-01 as long as you're consistent.
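To see why the arithmetic works, here is a minimal Python sketch of the same idea, not Splunk code. The function name and the fixed-offset time zone are assumptions for illustration, and DST transitions are ignored. Reading the local wall clock as if it were UTC yields _time plus the local offset, so 2 * _time minus that value is _time minus the offset, which then renders in local time as the UTC wall clock.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"  # Python spelling of "%F %T"


def utc_wall_clock_via_trick(epoch, local_tz):
    """Mimic the eval above for a fixed-offset zone (hypothetical helper)."""
    # Step 1: format the event time in the local zone (inner strftime).
    local_text = datetime.fromtimestamp(epoch, tz=local_tz).strftime(FMT)
    # Step 2: re-read the same digits as if they were UTC (strptime with %Z).
    misread = datetime.strptime(local_text, FMT).replace(tzinfo=timezone.utc).timestamp()
    # Step 3: 2*t - misread shifts the epoch by the negated local offset.
    shifted = 2 * epoch - misread
    # Step 4: format the shifted epoch in the local zone again (outer strftime).
    return datetime.fromtimestamp(shifted, tz=local_tz).strftime(FMT)


if __name__ == "__main__":
    zone = timezone(timedelta(hours=-5))
    print(utc_wall_clock_via_trick(1700000000, zone))
```

The printed string should match the UTC rendering of the same epoch regardless of which fixed offset is passed in.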

w564432 · ‎04-03-2025

Sorry, I kind of fell off there, but I just wanted to update in case others see this. Basically the problem is with the "fully populated" case.

For fully populated data, why not use this?

    index=example | stats avg(field1) perc95(field2) by x,y,z a,b,c

I may not have been very clear here, but this would not work because what I'm looking for is:

    avg(field1)  perc95(field2)  x   y   z   a  b  c
    f1g1                         10  20  30
                 f2g2                        1  2  3
    f1g3                         40  50  60
                 f2g4                        4  5  6

Here we have aggregate stats for four groups, g1 to g4. For example, g1 represents the stats for the grouping x=10, y=20, z=30, a=*, b=*, c=*, and g4 represents the stats for the group of transactions with x=*, y=*, z=*, a=4, b=5, c=6. A single stats doesn't help us here because of overlap: for instance, g1 and g2 share events (g1 contains events with a=1,b=2,c=3 and g2 contains events with x=10,y=20,z=30).

w564432 · ‎10-24-2023

Yes, field1, field2, x,y,z,a,b,c are all from the same set of events and are all non-null. In general, we might have other groupbys besides xyz and abc -- in one of my frequent use cases I have three: x, xy, and xyz, for instance (say, when I want to calculate statistics at different levels of granularity -- e.g. percentile response times by hour, or hour-IP, or hour-IP-server).

I guess the question is more of a data-engineering problem than an analytics one: regardless of whether we want two tables or one, how do we generate the data in a fast way? As it happens, doing two or more separate searches is significantly slower than, say, running one and doing some fancy stats magic on it, even if it's more complicated.

Also, just out of curiosity, what do we mean by normalized tables here?

w564432 · ‎10-24-2023

I often run into a case where I need to take the same dataset and compute aggregate statistics on different group-by sets, for instance if you want the output of this:

    index=example
    | stats avg(field1) by x,y,z
    | append [ search index=example | stats perc95(field2) by a,b,c ]

I am using the case of n=2 groupbys for convenience. In the general case there are N groupbys and arbitrary stats functions. What is the best way to optimize this kind of query without using append (which runs into subsearch limits)? Some of the patterns I can think of are below.

One way is to use appendpipe.

    index=example
    | appendpipe [ | stats avg(field1) by x,y,z ]
    | appendpipe [ | stats perc95(field2) by a,b,c ]

Unfortunately this seems kind of slow, especially once you start adding more subsearches and have to preserve and pass a large number of non-transformed events through the search.

Another way is to use eventstats to preserve the event data, finishing it off with a final stats.

    index=example
    | eventstats avg(field1) as avg_field1 by x,y,z
    | stats first(avg_field1) as avg_field1, perc95(field2) by a,b,c

Unfortunately this is not much faster. I think there is another way using streamstats in place of eventstats, but I still haven't figured out how to retrieve the last event without just invoking eventstats last() or relying on an expensive sort.

Another way I've tried is intentionally duplicating the data using mvexpand, which has the best performance by far.

    index=example
    ```Duplicate all the data```
    | eval key="1,2"
    | makemv delim="," key
    | mvexpand key
    ```Set groupby = concatenation of groupby field values```
    | eval groupby=case(key=1, x.",".y.",".z, key=2, a.",".b.",".c, true(), null())
    | stats avg(field1), perc95(field2) by groupby

Are there any other patterns that are easier/faster?

I'm also curious how Splunk processes things under the hood. I know something called "map-reduce" is part of it, but I would like to know if anyone knows how to optimize this computation and why it's optimal in a theoretical sense.
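The duplication pattern with mvexpand can be modeled outside Splunk to make the overlap between grouping sets concrete. The following is a minimal Python sketch, not a description of how Splunk executes the search. The function names are hypothetical, the percentile uses a simple nearest-rank rule that may differ from Splunk's perc95, and the group key is prefixed with the grouping-set id (unlike the SPL above) so that an x,y,z value can never collide with an identical a,b,c value.

```python
import math
from collections import defaultdict
from statistics import mean

# Assumed grouping sets, mirroring key=1 -> x,y,z and key=2 -> a,b,c.
GROUPING_SETS = {"1": ("x", "y", "z"), "2": ("a", "b", "c")}


def nearest_rank_p95(values):
    """Nearest-rank 95th percentile (illustrative, not Splunk's algorithm)."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def aggregate_by_grouping_sets(events):
    """Emit each event once per grouping set, then aggregate in one pass."""
    buckets = defaultdict(list)
    for event in events:
        for key, fields in GROUPING_SETS.items():  # the "mvexpand key" step
            groupby = key + ":" + ",".join(str(event[f]) for f in fields)
            buckets[groupby].append(event)
    return {
        groupby: {
            "avg_field1": mean([e["field1"] for e in rows]),
            "perc95_field2": nearest_rank_p95([e["field2"] for e in rows]),
        }
        for groupby, rows in buckets.items()
    }


if __name__ == "__main__":
    sample = [
        {"x": 10, "y": 20, "z": 30, "a": 1, "b": 2, "c": 3, "field1": 100, "field2": 5},
        {"x": 10, "y": 20, "z": 30, "a": 4, "b": 5, "c": 6, "field1": 200, "field2": 7},
        {"x": 40, "y": 50, "z": 60, "a": 1, "b": 2, "c": 3, "field1": 300, "field2": 9},
    ]
    for group, stats in sorted(aggregate_by_grouping_sets(sample).items()):
        print(group, stats)
```

Every event lands in exactly one bucket per grouping set, which is why the event volume grows by a factor of N for N grouping sets.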

w564432 · ‎06-13-2023

Just wanted to come back to this in case anyone else reads this. Although this solution works, it doesn't appear to be much faster than searching the entire timeframe between the absolute earliest and absolute latest. For instance, if you were searching a year but only needed to sample a few sparse holidays, I believe the performance is closer to searching the whole year than to running separate searches for the holidays only.

w564432 · ‎03-20-2023

Not sure, actually. I am not the admin of our cluster, but I know both of those things (reassigning the captain, a rolling restart) have happened between then and now, either due to other issues or updates. Either way, it seems to have cleared up some time ago now...

w564432 · ‎11-18-2021

Hello, we have an application that periodically pulls search results from a scheduled search using the Splunk API, but we are encountering an issue where there is an excess of expired jobs (5000+) which are being kept for 1 month+ for some reason. Because the application has to look through each of these jobs, it's taking too long and timing out. We tried deleting the expired jobs through the UI, but they keep popping back up or not going away. Some of these now say "Invalid SID" when I try to inspect them. Is there any way we can clear these in bulk, preferably without resorting to the UI (which only shows 50 at a time)?

w564432 · ‎10-01-2021

I meant to accept your post I replied to as solution, but accidentally hit my reply instead. Can I fix this?

w564432 · ‎09-28-2021

That's wild. I always assumed earliest and latest were static parameters, and nothing in the documentation seems to suggest you can do something like (earliest=X AND latest=Y) OR (earliest=A AND latest=B). Is there some general concept I am missing on how Splunk parses parameters which should make this obvious? Or is it just specific to this one thing? Like you mentioned you lose the series labeling you get for free with multisearch, but you could patch this up with an eval although I'd imagine the performance suffers. But still this is a great, relatively simple solution!
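For a list of intervals produced by some earlier step, the OR-of-time-ranges pattern plus the eval that restores the series label could be generated programmatically. This is only a sketch under assumptions: the function name is hypothetical, intervals are given as epoch seconds, latest is treated as exclusive, and the output is a search string to paste or submit, not something verified against a particular Splunk version.

```python
def build_multi_interval_search(base, intervals):
    """Build one search covering several (earliest, latest) epoch intervals.

    base is the leading search terms, e.g. "index=potato" (from the thread).
    """
    if not intervals:
        raise ValueError("at least one interval is required")
    # One parenthesized time-range clause per interval, joined with OR.
    clauses = " OR ".join(
        f"(earliest={et} AND latest={lt})" for et, lt in intervals
    )
    # Recreate the per-interval series label that multisearch gave for free.
    cases = ", ".join(
        f"_time>={et} AND _time<{lt}, {i}"
        for i, (et, lt) in enumerate(intervals, start=1)
    )
    return (
        f"{base} ({clauses}) "
        f"| eval series=case({cases}) "
        f"| timechart count by series"
    )


if __name__ == "__main__":
    print(build_multi_interval_search("index=potato", [(100, 200), (300, 400)]))
```

Because the string is built from a list, the number of intervals no longer has to be known when the search is written.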

w564432 · ‎09-28-2021

This is mostly just a curiosity, motivated by this post on how to compare a particular time interval across multiple larger time periods. Effectively, the solution there seems to be to generate a list of time intervals and run map subsearches on each entry.

When I have multiple time periods that I'd like to run stats on, I typically use a multisearch command followed by a chart, as follows:

    | multisearch
        [ search index=potato earliest=<et1> latest=<lt1> | eval series=1 ]
        [ search index=potato earliest=<et2> latest=<lt2> | eval series=2 ]
        ...
        [ search index=potato earliest=<etn> latest=<ltn> | eval series=n ]
    | timechart count by series

I suppose you could make it work by substituting the et's and lt's via subsearch, but it won't work if the number of time intervals, n, is also dynamically generated by some prior search.

I know you can use a number of different techniques, but they all have different drawbacks. You could use map, which offers pretty much all the flexibility/dynamic-ness you need (I've abused it plenty of times doing things like map search=`searchstring($searchstring$)`), but there are performance issues with this: subsearches can time out, and it doesn't offer the same optimization as multisearch does when you just need to string multiple streams together. You can also search the entire time range and use some eval logic to filter out the time intervals you need, but isn't this suboptimal, since you're searching more events than you need? Multisearch seems to be great at streaming multiple different time intervals together and I'd love to have that optimization without

At this point, would you just have to resort to REST to schedule searches? How would we tie the data together? I'm not very familiar with what is possible with REST, as all of my experience is with plain SPL.

In a word, how do we stream events across multiple, dynamically generated time intervals without running into subsearch limitations?

w564432 · ‎10-28-2019

I am running a map command off of an initial search. The map ends with a sendemail command which sends a table of results. I would like to send a message that computes totals and other stats on this table -- however, I would not like to include this data as a totals row in the table/search results, only in the message. In other words, the whole email would look something like:

    Subject: Alert condition triggered

    Sum(Field 1) of type X results: 524

    Table of results
    -----------------------------------------------
    | Field 1 | Field 2 | ...
    ...

I know this can be done by running yet another subsearch for the "message" parameter in Splunk. However, this means I'm effectively running the same search twice, when performance-wise it would be better to just run the stats off of the table after it is generated. I know how to implement this in a dashboard with base searches, but I would like to know how to do this in one search.

I think the problem is that there is no "scope" outside of the search results to which I can write a variable. I can think of a clunky solution using lookup/outputlookup. Is there some way to pipe the table into a separate subsearch that generates a variable/token but does not actually append to the main search?

w564432 · ‎10-24-2019

Sorry, I was unclear. Yes, I am charting by 5m over day, which results in a multi-series timechart: a standard 5m timechart with several lines, each representing one particular day. It's not that I want to bin by hours, but rather to have the x-axis show just the hourly tick marks (or anything that would fit on the screen, really... better than being absolutely blank).

What happens is that the x-axis will try to display all 288 bins. Visualization using "timechart" handles this by displaying less frequent time intervals (obviously it isn't always hours, but in my case it likely will be). Visualization using "chart" just straight up refuses to display the x-axis bins altogether since there's no room (unless you zoom).

I am wondering if I can do this without using the timechart command, because with the hacky approach I'm using, every point in the time series needs to conform to one day (1971-1-1), which is not so pretty.

w564432 · ‎10-24-2019

I tried that as well. It seems to break the visualization though...

w564432 · ‎10-24-2019

Hi guys, I am trying to chart multiple days on the same line chart, kind of like in this example (https://docs.splunk.com/Documentation/Splunk/7.3.2/Search/Comparehourlysumsmultipledays). However, I am plotting 5m intervals and not hours like in the example above:

    ...
    | bin span=5m _time
    | eval clock=strftime(_time,"%H:%M:%S")
    | chart avg(Total_ct) by clock,day

However, the x-axis labels aren't showing up because there are too many (288 of them). What I want is to have 5m data (and thus 288 intervals over the whole day) but still be able to see some labels on the x-axis. This is what Splunk is trying to show and failing, because of the pixel limit:

    [ 00:00 00:05 00:10 . . . 23:55 ]

I would be okay if the x-axis looked like the following:

    [ 0 1 . . . 23 ]

Note: it doesn't have to be an hourly interval, I just want some kind of labels to show up on the x-axis. As it is now, there are no labels and it's really hard to tell at a glance what the times are unless I mouse over.

I did find a hacky way to leverage the smart time-scaling display using timechart while also keeping the "chart by time over day" effect. This actually gives me an hourly scale with all 5m time interval data, but when I mouse over each point I see "1971-01-01" on each time (I basically converted all the days into one ridiculous day so I could overlay them).

    ...
    | eval clock=strftime(_time,"%H:%M:%S")
    | eval reclock="1971-01-01"." ".'clock'
    | eval day=strftime(_time,"%D")
    | eval _time=strptime(reclock, "%Y-%m-%d %H:%M:%S")
    | timechart span=5m cont=f avg(Total_ct) by day limit=0

Is there some way I can have my cake and eat it too?
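The overlay trick in the second search boils down to two derived values per event: a series label for the real day and a timestamp moved onto one shared reference day. Here is a minimal Python sketch of that transformation, assuming UTC timestamps and a hypothetical helper name; it only illustrates the mapping and is not Splunk code.

```python
from datetime import datetime, timezone


def overlay_key(epoch, span_seconds=300):
    """Return (day label, timestamp on the shared reference day)."""
    # Snap to the start of the 5-minute bucket, like "bin span=5m _time".
    binned = epoch - epoch % span_seconds
    ts = datetime.fromtimestamp(binned, tz=timezone.utc)
    # Series label for the real calendar day, like strftime(_time, "%D").
    day = ts.strftime("%m/%d/%y")
    # Keep the clock time but move the date to 1971-01-01 for the overlay.
    overlaid = ts.replace(year=1971, month=1, day=1)
    return day, overlaid.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(overlay_key(1700000000))
```

Events from different days that fall in the same 5-minute slot get the same second value, which is what lets the chart draw them at the same x position.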

w564432 · ‎08-06-2019

I have a dropdown that reads from a lookup but would like to allow the user to enter in a value that doesn't exist in the dropdown. Is this possible?

Posts	15
Solutions	1
Karma Given	4
Karma Received	0
Member Since	‎08-06-2019

Online Status	Offline
Date Last Visited	‎08-18-2025 06:46 AM

Re: Force displayed timezone in results to be UTC ...

Re: What are your best patterns for handling stats...

Re: What are your best patterns for handling stats...

What are your best patterns for handling stats by ...

Re: Building multisearch dynamically

Re: Bulk-deleting expired search jobs

Bulk-deleting expired search jobs?

Re: Building multisearch dynamically

Re: Building multisearch dynamically

How to build multisearch dynamically?

How to use a value without including it in search ...

Re: Display x-axis scale for a field that isn't _t...

Re: Display x-axis scale for a field that isn't _t...

Display x-axis scale for a field that isn't _time

How to allow user to enter a value that doesn't ex...
