All Posts

As always - join is very rarely the way to go. The typical approach is stats, of course:

(one set of conditions) OR (another set of conditions)
| conditional eval/rename to get common field(s)
| optional renaming so that field names don't clash
| optional filtering
| stats values(*) as * by common field(s)

That's the typical setup. Sometimes it's easier to generate the events with multisearch instead of producing a single stream of events and selectively modifying it. The problem here is that the common field(s) in both sets must match exactly, while you want your _time to be within some range, not exactly equal between the two events. I don't have an exact answer, but I'd try saving the original times in temporary fields, offsetting _time by an hour, and then using the bin command to check whether both events fall within an acceptable "time bucket". Then filter with a direct comparison of the left values against the right values. The only issue I see here (for which I have no solution at the moment) is that you might get multiple values matching this way - that would probably need some mvexpanding. A rough sketch of this idea follows.
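A rough sketch of that shape (the sourcetypes, the keyid field, the a/d payload fields, and the 3600-second offset are all hypothetical stand-ins for your actual data):

(index=myindex sourcetype=left_events) OR (index=myindex sourcetype=right_events)
| eval orig_time=_time, side=if(sourcetype="left_events","l","r")
``` shift the "right" events forward so a matching pair lands in roughly the same bucket ```
| eval _time=if(side="r", _time+3600, _time)
| bin _time span=1h
| stats values(eval(if(side="l",orig_time,null()))) as l_time values(eval(if(side="r",orig_time,null()))) as r_time values(a) as a values(d) as d by keyid _time
``` direct comparison of the saved original times; assumes a single event per side per key/bucket ```
| where l_time-r_time>=0 AND l_time-r_time<=3600

As noted above, if a key matches more than one event on a side, l_time and r_time become multivalued and would need mvexpand before that final comparison, and pairs that straddle a bucket boundary can still be missed - this is only the rough idea, not a finished search.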
OK. We might need to be a bit more tricky for those multivalued fields.

| tstats prestats=t `summariesonly` count from datamodel="Web" where sourcetype="f5:bigip:ltm:http:irule" by _time Web.site span=10m
| timechart span=10m count as event_count by Web.site
| timewrap span=1w series=short
| foreach *_s0 [ eval <<MATCHSTR>>_combined=<<MATCHSTR>>_s0."|".<<MATCHSTR>>_s1 ]
| fields _time *_combined
| untable _time Web.series values
| eval values=split(values,"|")
| eval old=mvindex(values,0), new=mvindex(values,1)
| fields - values
| where (old-new)/old>0.3
Is it possible to password protect emailed reports?
Well, that's how timechart works - if you split by a field, you get several separate time series (and timewrap multiplies that, of course, by cutting the time range into smaller chunks). Actually, timewrap might still be what you want; just post-process the results:

| tstats prestats=t `summariesonly` count from datamodel="Web" where sourcetype="f5:bigip:ltm:http:irule" by _time Web.site span=10m
| timechart span=10m count as event_count by Web.site
| foreach *_s0 [ eval <<MATCHSTR>>_combined=mvappend(<<MATCHSTR>>_s0,<<MATCHSTR>>_s1) ]
| fields _time *_combined
| untable _time Web.series values
| eval old=mvindex(values,0), new=mvindex(values,1)
| fields - values
| where (old-new)/old>0.3

Something like that. There might be a better way to do it, but this should work. And remember that with timechart you might want to tweak the limit and useother parameters. EDIT: Hmm... there is something fishy about untable and multivalued fields. I'll have to investigate further.
@richgalloway This is great, this is getting me somewhere! None of those fixes worked because one of the root causes of my problem is that the string for the key is so long it gets truncated off the screen, so I can't hover my mouse over the whole highlight. I do have a short-term fix of quickly zooming out with the mouse to grab a long field, but it's like playing a very difficult video game. Thanks for your help!
In my SPL JOIN query, I want to get the events for, let's say, between T1 and T2; however, the relevant events on the right side of the query happened between T1-60m and T2. I can't figure out how to do it in the dashboard or just a report. Using relative_time won't work for some reason. I appreciate any help.

index=myindex
| fields a, b, c
| join type=inner left=l right=r where l.keyid=r.keyid
    [search index=myindex ```<- how to change the earliest to earliest-60m?```
    | fields d, f ]
| table l.a, l.b, l.c, r.d, r.f
Hi, I've been struggling for some time with the way baselines seem to work - to the extent that I feel I can't trust them to alert us to degraded performance in our systems. I thought I would describe the issue and get the thoughts of the community. I'm looking for input from folks who are happy with baselines and how they are mitigating the issue I'm experiencing, or confirmation that my thinking on this is correct. I have proposed what I think could be a fix towards the end. Apologies if this ends up being a bit of a long read, but it feels to me like this is an important issue: baselines are fundamental to AppD alerting and currently I don't see how they can reliably be used.

To summarise the issue before I go into more detail: it looks to me like AppD baselines, and the moving average used for transaction thresholds, ingest bad data when there is performance degradation, which renders baselines unfit for their purpose of representing 'normal' performance. This obviously then impacts any health rules or alerting that make use of these baselines.

Let me provide an example which will hopefully make the issue clear. A short time ago we had a network outage which resulted in a Major Incident (MI) and significantly increased average response time (ART) for many of our BTs. Because the ART metric baseline uses these abnormal ART values to generate the ongoing baseline, the baseline itself rapidly increased. The outage should have exceeded the expected 'normal' baseline by multiple SDs, but because the bad data from the outage inflated the baseline, other than the very brief spike right at the start, the increase in ART barely reached 1 SD above baseline.

Furthermore, the nature of the Weekly Trend – Last 3 Months baseline means that this 'bad' baseline will propagate forward. Looking at the first screenshot above we can clearly see that the baseline now expects 'normal' ART to be significantly elevated every Tuesday morning. Presumably this will continue until the original outage spike moves out of the baseline rolling window in 3 months. This is more clearly shown if we look more closely at the current week so that the chart re-scales without the original ART spike present.

As far as the baseline is concerned, a large spike in ART every Tuesday morning is now normal. This means that less extreme (but still valid) ART degradation will not trigger any health rules that use this baseline. In fact, this could also generate spurious alerts on healthy performance if we were using an alert based on < baseline SD, as the healthy ART now looks to be massively below the 'normal' baseline.

To my mind this simply can't be correct behaviour by the baseline. It clearly no longer represents normal performance, which by my understanding is the very purpose of the baselines. The same problem is demonstrated with other baselines, but I'll not include those findings here for the sake of this already long post not becoming a saga.

This issue of ingesting bad data also impacts the Slow/Very Slow/Stalled thresholds and the Transaction Score chart: as can be seen, we had a major network outage which caused an increase in ART for an extended period.
This increase was correctly reflected in the Transaction Score chart for a short period, but as the bad data was ingested and increased the value of the moving average used for thresholds, we can see that even though the outage continued and ART stayed at an abnormal level, the health of the transactions stopped being orange Very Slow and moved through yellow Slow back to green Normal. And yet the outage was ongoing, the Major Incident was ongoing, and the ART had not improved from its abnormally high, service-impacting value. These later transactions are most certainly not Normal by a very long way, and yet AppD believes them to be normal because the moving average has been polluted by ingesting the outage ART data. So after a short period of time the moving average used to define a Slow/Very Slow transaction no longer represents normal ART; instead it has decided that the elevated ART caused by the outage is the new normal. I'd like to think I'm not the only one who finds this undesirable. Any alerting based on slow transaction metrics would stop alerting and would report normal performance even though the outage was ongoing with service still being impacted.

Now it's not my way to raise a problem without at least trying to provide a potential solution, and in this case I have two initial thoughts:

1. AppD adds the ability to lock the baseline in much the same way as we lock BTs. A BT is allowed to build up a baseline until it matches 'normal' behaviour as closely as we're likely to get. At this point the baseline is locked and no further data is added to it. If a service changes and we believe we have a new normal performance, the baseline can be unlocked to ingest the new metrics and update to the new normal, at which point it can be locked again.

2. Instead of locking baselines, AppD could perhaps implement a system whereby bad data is not ingested into the baseline. Perhaps something like: any data point which triggers a health rule (or transaction threshold) is taken as evidence of abnormal performance and is not used to generate the baseline; maybe the last known non-triggering data point is used for the baseline instead. The baseline probably would still increase during an outage (working on the assumption that a service degrades before failing, so the points immediately prior to the triggering of an alert might still be elevated above normal), but the baseline change would not be as fast or as catastrophic as with the current method of calculating the rolling baseline/moving average.

Well, that pretty much wraps it up, I think. If you've made it this far then thanks for your time, and I'd really appreciate knowing whether other folks are having a similar issue with baselines or have found ways to work around it.
I made those changes, and when I go to the webpage it prompts me for a PIN; then I get the following error after entering my CAC PIN:

This XML file does not appear to have any style information associated with it. The document tree is shown below.

<response>
  <messages>
    <msg type="ERROR">Unauthorized</msg>
  </messages>
</response>
Good evening everyone, we have a problem in a Splunk cluster composed of 3 indexers, 1 CM, 1 SH, 1 Deployer, 3 HF, 3 UF. The UFs receive logs from different Fortinet sources via syslog and write them to a specific path via rsyslog. Splunk_TA_fortinet_fortigate is installed on the forwarders. These logs must be saved to a specific index in Splunk, and a copy must be sent to two distinct destinations (third-party devices) in two different formats (customer requirement). Since the formats are different (one of the two contains TIMESTAMP and HOSTNAME, the other does not), rsyslog saves them to two distinct paths applying two different templates. So far so good.

The issues we have encountered are:
- Some events are indexed twice in Splunk
- Events sent to the customer do not always have a format that complies with the required ones

For example, in one of the two cases the required format is the following:

<PRI> date=2024-09-12 time=14:15:34 devname="device_name" ...

But looking at the sent packets via tcpdump, some are correct, others are in the format

<PRI> <IP_address> date=2024-09-12 time=14:15:34 devname="device_name" ...

and others in the format

<PRI> <timestamp> <IP_address> date=2024-09-12 time=14:15:34 devname="device_name" ...

The outputs.conf file is as follows:

[tcpout]
defaultGroup = default-autolb-group

[tcpout-server://indexer_1:9997]
[tcpout-server://indexer_2:9997]
[tcpout-server://indexer_3:9997]

[tcpout:default-autolb-group]
server = indexer_1:9997,indexer_2:9997,indexer_3:9997
disabled = false

[syslog]

[syslog:syslogGroup1]
disabled = false
server = destination_IP_1:514
type = udp
syslogSourceType = fortigate

[syslog:syslogGroup2]
disabled = false
server = destination_IP_2:514
type = udp
syslogSourceType = fortigate
priority = NO_PRI

This is props.conf:

[fgt_log]
TRANSFORMS-routing = syslogRouting

[fortigate_traffic]
TRANSFORMS-routing = syslogRouting

[fortigate_event]
TRANSFORMS-routing = syslogRouting

and this is transforms.conf:

[syslogRouting]
REGEX = .
DEST_KEY = _SYSLOG_ROUTING
FORMAT = syslogGroup1,syslogGroup2

Any ideas? Thank you, Andrea
Sometimes you can right-drag and then opt to copy without formatting.  If not, copy the field name and use Ctrl-Shift-V to paste it without the HTML.
Sorry, I did not explain what I mean by "partial search". I share your concern regarding subsearches; maybe appendpipe could help... This is a challenge for me: I should not update the lookup when the search is seeing partial results. It happens very rarely, so maybe doing it differently could help. Sorry, I had hoped there was something in SPL that would tell me that the search results are somehow limited. require is a very good hint!
Good day, I often run up against the issue of wanting to drag the text of a field name from the browser into a separate text editor. Whenever I drag it, it works, but it brings all the HTML metadata with it. Sometimes these field names are very long and so truncated on the screen that it's very tough without copying and pasting. Has anyone found a good workaround for this? Right now a field name, when dragged from the web browser into a text editor, comes through like this:

https://fakebus.splunkcloud.com/en-US/app/search/search?q=search%20%60blocks%60&sid=1726153610.129675&display.page.search.mode=verbose&dispatch.sample_ratio=1&workload_pool=&earliest=-30m%40m&latest=now#https://fakebus.splunkcloud.com/en-US/app/search/search?q=search%20%60blocks%60&sid=1726153610.129675&display.page.search.mode=verbose&dispatch.sample_ratio=1&workload_pool=&earliest=-30m%40m&latest=now#

Ironically, dragging a text field from Splunk into this web dialog box works fine.
It's not clear what is meant by "partial search" and how Splunk is to know a search returned partial results or just fewer results. The subsearch idea likely won't work because subsearches execute before the main search and so would be unable to detect errors in the main search. There is the require command that will abort a query if there are zero results.  That may not meet your requirements, however.
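For the lookup-update case specifically, here is a minimal sketch of how require could act as a guard (the field names and the lookup file are hypothetical); note it only catches the zero-results case, not a genuinely partial result set:

index=myindex your_conditions
| stats latest(status) as status by ticket_id
| require
| outputlookup ticket_status.csv

If nothing reaches require, the whole search fails, so the outputlookup that follows never overwrites the existing lookup.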
Hello, I'm not sure how to troubleshoot this at all. I've created a new Python-based app through the Add-on Builder with a collection interval of 60 sec. The app input is set to 60 sec as well. When I test the script, which makes chained API calls that create events based on the last API call, it returns within 20 sec. The app should create about 50 events for each interval, so when searching I would expect to see about 50 events every minute, but I'm seeing 6 or 7 per minute.

I ran the following query, and it shows that the event time and index time are within milliseconds of each other:

source=netscaler
| eval indexed_time=strftime(_indextime, "%Y-%m-%d %H:%M:%S")
| eval event_time=strftime(_time, "%Y-%m-%d %H:%M:%S")
| table _raw event_time indexed_time

When looking at the app log, I see it's only making the final API calls every 20 sec, instead of making all 50 of the final API calls within milliseconds. Does anyone have any idea why this would occur and how I could resolve this lag? Thanks for your help, Tom
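One way to narrow this down (a sketch against the same source) is to chart events per minute alongside the gap between index time and event time; if the per-minute count is low but the lag stays near zero, the add-on itself is emitting events slowly rather than Splunk indexing them late:

source=netscaler
| eval lag=_indextime-_time
| timechart span=1m count as events_per_minute avg(lag) as avg_lag_seconds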
Run it like a...

| makeresults
| eval base_search="| makeresults | appendcols [| inputlookup ticket_templates where _key=5d433a4e10a7872f3a197e81 | stats max(*) as *]"
| map search="| makeresults | map search=\"$base_search$\""

This would definitely work!!
Currently the above fix is only for Microsoft ADFS, but it is possible with Okta and F5 using the SAML configuration, with the prompt on the IdP side. What is your IdP?
Since Splunk 6.x we have been using a proxy server (Apache) with Splunk to pass the user's CAC credentials to Splunk. Is it true that with 9.2.2 a proxy is no longer needed? I'm also trying to implement CAC authentication following Configure Splunk Enterprise to use a common access card for authentication - Splunk Documentation and Configuring Splunk for Common Access Card (CAC) authentication - Splunk Lantern, but I'm now getting the following error message: "This site can't be reached"
So just to be clear, this would not be a candidate for KV Store?
Assuming you have co-located the syslog-ng install on the same server as your Splunk HF, there are a few options available to you. You can continue to build the stand-alone app to gain some experience and learning opportunities. Or...

Most people running syslog-ng will set the destination of syslog events to a file, separated by some sort of host detail. Since the files are local to disk, you can use the HF web interface to set up a Data Input monitor-file option, and it will guide you through the common event-breaking, line-breaking, time-extraction, and some field-extraction options. Essentially all the things a stand-alone app would do, but easier to manage down the road by continuing to use the web interface. A rough sketch of the equivalent inputs.conf monitor stanza is below.

In a previous life I set the syslog-ng destination to a HEC receiver, which in your situation could be the local HF or the IDX cluster. But that takes a bit of work, and you already have a lot of development invested in writing to file, so maybe not the right idea for you.
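For reference, a minimal sketch of the inputs.conf monitor stanza that the web interface writes behind the scenes (the path, index, and sourcetype here are placeholders for whatever your syslog-ng templates actually produce):

[monitor:///var/log/syslog-ng/*/messages.log]
# host_segment = 4 picks the hostname directory out of the monitored path
host_segment = 4
sourcetype = syslog
index = network
disabled = false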
From Splunk's perspective, no action is needed to stop using the software.  Your company, however, may have its own requirements, such as archiving the data before decommissioning the server.