All Posts
(how do I give negative Karma?)  
Hello community, we are currently a bit desperate because of a Splunk memory leak under Windows that most probably affects all of you, even if you have not noticed it yet. Here is the history and analysis:

The first time we observed a heavy memory leak on a Windows Server 2019 instance was after updating to Splunk Enterprise 9.1.3 (from 9.0.7). The affected Windows server has several Splunk apps installed (Symantec, ServiceNow, MS O365, DB Connect, SolarWinds), which start a lot of Python scripts at very short intervals. After the update the server crashed every few hours due to low memory. We opened Splunk case #3416998 on Feb 9th.

With the Microsoft Sysinternals tool rammap.exe we found a lot of "zombie" processes (PIDs no longer listed in Task Manager) which are still using a few KB of memory (~20-32 KB). The process names are btool.exe, python3.exe, splunk-optimiz and splunkd.exe. It seems that every time a process of one of these programs ends, it leaves such a memory allocation behind. The Splunk apps on our Windows server spawn these processes very often and very fast, which results in thousands of zombie processes.

After this insight we downgraded Splunk on the server to 9.0.7 and the problem disappeared. Then, on a test server, we installed Splunk Enterprise 9.1.3 and 9.0.9; both versions show the same issue. New Splunk case #3428922. On March 28th we got this information from Splunk: "... got an update from our internal dev team on this 'In Windows, after upgrading Splunk Enterprise to 9.1.3 or 9.2.0 consumes more memory usage. (memory and processes are not released)' internal ticket. They investigated the diag files and it seems system memory usage is high, but only Splunk running. This issue comes from mimalloc (the memory allocator). This memory issue will be fixed in the 9.1.5 and 9.2.2 ..."

9.2.2 arrived on July 1st: unfortunately, still the same issue, the memory leak persists. Third Splunk case #3518811 (which is still open).
Also not fixed in version 9.3.0. Even after an online session showing them the rammap.exe screen, they wanted us to provide diags again and again from our (test) servers - but they should actually be able to reproduce it in their own lab. The huge problem is: because of existing vulnerabilities in the installed (affected) versions we need to update Splunk (Heavy Forwarders) on our Windows servers, but cannot due to the memory leak issue.

How to reproduce:
- OS tested: Windows Server 2016, 2019, 2022, Windows 10 22H2
- Splunk Enterprise versions tested: 9.0.9, 9.1.3, 9.2.2 (Universal Forwarder not tested)
- Let the default installation run for some hours (splunk service running).
- Download rammap.exe from https://learn.microsoft.com/en-us/sysinternals/downloads/rammap and start it.
- Go to the Processes tab and sort by the Process column.
- Look for btool.exe, python3.exe and splunkd.exe entries with a small total memory usage of about ~20-32 KB. The PIDs of these processes do not exist in the task list (see Task Manager or tasklist.exe).
- With the Splunk default installation (without any other apps) the memory usage increases only slowly, because the default apps run their scripts at fairly long intervals.
- Stopping the Splunk service releases the memory (and the zombie processes disappear in rammap.exe).
- For faster results you can add an app for excessive testing with python3.exe, starting it at short (0 second) intervals. The test.py does not need to exist - Splunk starts python3.exe anyway. Only the inputs.conf file is needed:

...\etc\apps\pythonDummy\local\inputs.conf

[script://$SPLUNK_HOME/etc/apps/pythonDummy/bin/test.py 0000]
python.version = python3
interval = 0

[script://$SPLUNK_HOME/etc/apps/pythonDummy/bin/test.py 1111]
python.version = python3
interval = 0

(if you want, add some more stanzas: 2222, 3333 and so on)

- The more Python script stanzas there are, the more zombie processes appear in rammap.exe, and the faster they appear.

Please share your experiences, and open tickets with Splunk support if you also see the problem. We hope Splunk finally reacts.
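To cross-check rammap.exe against the live task list, here is a minimal sketch (names and the sample output are illustrative) that parses `tasklist /fo csv` output and collects the live PIDs of the process names mentioned above; any PID that rammap.exe reports but this set lacks is one of the "zombie" entries described in the post:

```python
import csv
import io

def live_pids(tasklist_csv, names=("python3.exe", "splunkd.exe", "btool.exe")):
    """Parse `tasklist /fo csv` output and return {name: set(pids)} for the
    process names Splunk spawns. PIDs shown by rammap.exe but absent here
    are the zombie entries."""
    pids = {}
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) >= 2 and row[0] in names:
            pids.setdefault(row[0], set()).add(int(row[1]))
    return pids

# Canned sample; in real use capture the output of
# subprocess.run(["tasklist", "/fo", "csv"], capture_output=True, text=True)
sample = ('"Image Name","PID","Session Name","Session#","Mem Usage"\n'
          '"python3.exe","4711","Services","0","12,345 K"\n'
          '"splunkd.exe","1234","Services","0","98,765 K"\n')
print(live_pids(sample))
```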
@ITWhisperer: It works, but whenever any panel in the dashboard is refreshed, the colour of all the panels changes from Red/Green to white. In my case there are multiple panels, so when any one panel is refreshed, it changes the colour of all 6 panels from Green/Red to white. Is it possible to keep the colour always Red or Green?

Current code:

<row>
  <panel depends="$alwaysHide$">
    <html>
      <style>
        #single1 text { fill: $colour$ !important; }
      </style>
    </html>
  </panel>
</row>
<row>
  <panel>
    <title>EVIS DASHBOARD</title>
    <single id="single1">
      <search>
        <query>`macro_events_all_win_ops_esa` sourcetype=WinHostMon host=P9TWAEVV01STD (TERM(Esa_Invoice_Processor) OR TERM(Esa_Final_Demand_Processor) OR TERM(Esa_Initial_Listener_Service) OR TERM(Esa_MT535_Parser) OR TERM(Esa_MT540_Parser) OR TERM(Esa_MT542_Withdrawal_Request) OR TERM(Esa_MT544_Parser) OR TERM(Esa_MT546_Parser) OR TERM(Esa_MT548_Parser) OR TERM(Esa_SCM Batch_Execution) OR TERM(Euroclear_EVIS_Border_Internal) OR TERM(EVISExternalInterface))
| stats latest(State) as Current_Status by service
| where Current_Status != "Running"
| stats count as count_of_stopped_services
| eval status = if(count_of_stopped_services = 0 , "OK" , "NOK" )
| fields status
| append [ search `macro_events_all_win_ops_esa` host="P9TWAEVV01STD" sourcetype=WinEventLog "Batch *Failed" System_Exception="*" | stats count as count_of_failed_batches | eval status = if(count_of_failed_batches = 0 , "OK" , "NOK" ) | fields status ]
| stats values(status) as status_list
| eval final_status = if(mvcount(mvfilter(status_list=="NOK")) &gt; 0, "NOK", "OK")
| eval _colour=if(final_status ="OK","Green","Red")
| fields final_status</query>
        <earliest>-15m</earliest>
        <latest>now</latest>
        <done>
          <set token="colour">$result._colour$</set>
        </done>
        <sampleRatio>1</sampleRatio>
        <refresh>1m</refresh>
        <refreshType>delay</refreshType>
      </search>
      <option name="drilldown">all</option>
      <option name="refresh.display">progressbar</option>
    </single>
  </panel>
</row>
<row>
  <panel depends="$alwaysHide$">
    <html>
      <style>
        #single2 text { fill: $colour$ !important; }
      </style>
    </html>
    <html>
      <style>
        #single3 text { fill: $colour$ !important; }
      </style>
    </html>
  </panel>
</row>
<row>
  <panel>
    <title>SEMT FAILURES DASHBOARD</title>
    <single id="single2">
      <search>
        <query>(index="events_prod_gmh_gateway_esa") sourcetype="mq_PROD_GMH" Cr=S* (ID_FCT=SEMT_002 OR ID_FCT=SEMT_017 OR ID_FCT=SEMT_018 ) ID_FAMILLE!=T2S_ALLEGEMENT
| eval ERROR_DESC= case(Cr == "S267", "T2S - Routing Code not related to the System Subscription." , Cr == "S254", "T2S - Transcodification of parties is incorrect." , Cr == "S255", "T2S - Transcodification of accounts are impossible.", Cr == "S288", "T2S - The Instructing party should be a payment bank.", Cr == "S299", "Structure du message incorrecte.",1=1,"NA")
| stats count as COUNT_MSG
| eval status = if(COUNT_MSG = 0 , "OK" , "NOK" )
| eval _colour=if(status ="OK","Green","Red")
| table status</query>
        <earliest>@d</earliest>
        <latest>now</latest>
        <done>
          <set token="colour">$result._colour$</set>
        </done>
        <sampleRatio>1</sampleRatio>
        <refresh>1m</refresh>
        <refreshType>delay</refreshType>
      </search>
      <option name="colorBy">value</option>
      <option name="drilldown">all</option>
      <option name="rangeColors">["0x53a051","0x0877a6","0xf8be34","0xf1813f","0xdc4e41"]</option>
      <option name="refresh.display">progressbar</option>
      <option name="trellis.enabled">0</option>
      <option name="useColors">1</option>
    </single>
  </panel>
Hello, I have a query used in Splunk Enterprise web (search): index="__eit_ecio*" | ... | bin _time span=12h | ... | table ... I am trying to put it into Python API code using the Job class, like this: searchquery_oneshot = "<my above query>". I am getting the error "SyntaxError: invalid decimal literal" pointing at the 12h in the main query. How can I fix this? 2) Can I direct "collect" results (a summary index) via this API into JSON format? Thanks
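A likely cause of that error is string quoting rather than the search itself: the query contains double quotes (index="__eit_ecio*"), so wrapping it in another pair of double quotes ends the Python string literal early and leaves 12h as a bare token for the parser. A minimal sketch (the splunklib connection details in the comments are assumptions, not from the original post):

```python
import json

# Use single (or triple) quotes for the outer literal so the inner
# double quotes survive; otherwise Python sees ...span=12h as stray
# tokens and raises "SyntaxError: invalid decimal literal".
searchquery_oneshot = (
    'search index="__eit_ecio*" '
    '| bin _time span=12h '
    '| table _time'
)

# With the Splunk SDK for Python (hypothetical host/credentials) you could run:
#   from splunklib import client
#   service = client.connect(host="localhost", port=8089,
#                            username="admin", password="changeme")
#   rr = service.jobs.oneshot(searchquery_oneshot, output_mode="json")
#   rows = json.loads(rr.read())["results"]   # results as a list of dicts
print(searchquery_oneshot)
```

Requesting output_mode="json" on the oneshot call is also the usual way to get the results (including those of a search ending in | collect) as JSON.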
So what you are saying is: just configure indexer clusters in each cloud environment and then use a SHC from any of the clouds to search the indexer clusters in ALL cloud environments? Are you sure it won't cause latency at SH aggregation time? A diagram would be really appreciated.
Try this query (note the named capture group on the timestamp, so that min/max have a field to work on, and that the table uses the extracted ResourcePath field):

"My base query" ("Starting execution for request" OR "Successfully completed execution")
| rex "status:\s+(?<Status>.*)\"}"
| rex field=_raw "\((?<Message_Id>[^\)]*)"
| rex "Path\:\s+(?<ResourcePath>.*)\""
| rex "timestamp\\\":(?<timestamp>\d+)"
| stats min(timestamp) as startTime, max(timestamp) as endTime, values(*) as * by Message_Id
| eval duration = endTime - startTime
| eval end_timestamp_s = endTime/1000, start_timestamp_s = startTime/1000
| eval human_readable_etime = strftime(end_timestamp_s, "%Y-%m-%d %H:%M:%S"), human_readable_stime = strftime(start_timestamp_s, "%Y-%m-%d %H:%M:%S"), duration = tostring(duration, "duration")
| table Message_Id human_readable_stime human_readable_etime duration Status ResourcePath
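For anyone who wants to sanity-check the extractions, the rex patterns above translate roughly to the following Python regexes (the sample event is hypothetical, shaped only to match the patterns; epoch millis are divided by 1000 and formatted just like the eval/strftime steps):

```python
import re
from datetime import datetime, timezone

# Hypothetical raw event matching the rex patterns in the query above
event = r'2024-05-02 (abc-123) Path: /api/v1/run" timestamp\":1714650000000 status: Success"}'

message_id = re.search(r'\((?P<Message_Id>[^\)]*)', event).group('Message_Id')
path       = re.search(r'Path:\s+(?P<ResourcePath>.*?)"', event).group('ResourcePath')
status     = re.search(r'status:\s+(?P<Status>.*)"}', event).group('Status')
ts_ms      = int(re.search(r'timestamp\\":(\d+)', event).group(1))

# Same epoch-millis -> human-readable conversion as the eval/strftime steps
human = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
print(message_id, path, status, human)
```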
Thank you
If your problem is resolved, then please click the "Accept as Solution" button to help future readers.
Short answer: Yes.

Longer answer: You can do it with CSS.

<panel depends="$alwaysHide$">
  <html>
    <style>
      #single text { fill: $colour$ !important; }
    </style>
  </html>
</panel>
<panel>
  <single id="single">
    <search>
      <query>| makeresults
| fields - _time
| eval OnTarget=mvindex(split("Yes,No",","),random()%2)
| eval _colour=if(OnTarget="Yes","Green","Red")</query>
      <earliest>-24h@h</earliest>
      <latest>now</latest>
      <done>
        <set token="colour">$result._colour$</set>
      </done>
    </search>
    <option name="drilldown">none</option>
    <option name="refresh.display">progressbar</option>
  </single>
</panel>
| rex field=account_id "\b(0?)(?<field_to_look_up>\d+)\b" | lookup accounts.csv account_id AS field_to_look_up [...]  
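One subtlety worth knowing about the rex above: the optional group 0? consumes at most ONE leading zero before the lookup key is captured. A small Python sketch of the same pattern (sample IDs are made up) makes the behavior visible:

```python
import re

def strip_leading_zero(account_id: str) -> str:
    """Mirror of the rex pattern \b(0?)(?<field_to_look_up>\d+)\b:
    drops at most one leading zero before the lookup. To drop them all,
    something like account_id.lstrip('0') would be needed instead
    (guarding the all-zeros case)."""
    m = re.search(r'\b0?(?P<field_to_look_up>\d+)\b', account_id)
    return m.group('field_to_look_up') if m else account_id

print(strip_leading_zero("0123456"))   # single leading zero removed
print(strip_leading_zero("00123456"))  # only the first zero removed
```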
At first glance it seems like a use case for federated search. Having said that - I've never used federated search myself and can't tell you what limitations it has.
What you have should work, but you could try this instead index=db_it_network sourcetype=pan* rule=g_artificial-intelligence-access | stats count by user app date_month | chart count by app date_month
There is no single good answer to such a question. It all depends on your data input. It is obviously overkill to have a 10G uplink for a single UF, or for just a handful of UFs on fairly idle servers. But on the other hand, if you have a site with a plethora of fairly active nodes, even that 10G pipe might not be sufficient (but then a single IF will most surely not be enough either).
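As a back-of-the-envelope illustration of why the answer depends on ingest volume (the 2x headroom factor is an illustrative assumption, not a Splunk recommendation):

```python
def required_mbps(gb_per_day: float, headroom: float = 2.0) -> float:
    """Average uplink bandwidth in Mbit/s for a given daily ingest
    volume, multiplied by a burst-headroom factor."""
    # 1 GB = 8000 Mbit (decimal units), 86400 seconds per day
    return gb_per_day * 8000 / 86400 * headroom

# 100 GB/day averages only ~9.3 Mbit/s; with 2x burst headroom ~18.5 Mbit/s
print(round(required_mbps(100, 1.0), 1), round(required_mbps(100), 1))
```

So the steady-state average is usually tiny compared to the link; it is the burstiness and the number of concurrent senders that size the pipe.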
Hi Team,

Can you please help me find a way to change the colour of the output value in a single value visualization?

If COUNT_MSG is OK, then display OK in Green.
If COUNT_MSG is NOK, then display NOK in Red.

Current code:

<panel>
  <title>SEMT FAILURES DASHBOARD</title>
  <single>
    <search>
      <query>(index="events_prod_gmh_gateway_esa") sourcetype="mq_PROD_GMH" Cr=S* (ID_FCT=SEMT_002 OR ID_FCT=SEMT_017 OR ID_FCT=SEMT_018 ) ID_FAMILLE!=T2S_ALLEGEMENT
| eval ERROR_DESC= case(Cr == "S267", "T2S - Routing Code not related to the System Subscription." , Cr == "S254", "T2S - Transcodification of parties is incorrect." , Cr == "S255", "T2S - Transcodification of accounts are impossible.", Cr == "S288", "T2S - The Instructing party should be a payment bank.", Cr == "S299", "Structure du message incorrecte.",1=1,"NA")
| stats count as COUNT_MSG
| eval status = if(COUNT_MSG = 0 , "OK" , "NOK" )
| table status</query>
      <earliest>@d</earliest>
      <latest>now</latest>
      <sampleRatio>1</sampleRatio>
      <refresh>1m</refresh>
      <refreshType>delay</refreshType>
    </search>
    <option name="drilldown">all</option>
    <option name="rangeColors">["0x53a051","0x0877a6","0xf8be34","0xf1813f","0xdc4e41"]</option>
    <option name="refresh.display">progressbar</option>
    <option name="trellis.enabled">0</option>
    <option name="useColors">1</option>
  </single>
</panel>
I still don't understand what you mean by those apps. An app, in Splunk terminology, is just a collection of files. The same set of settings can usually be provided equally well by a single app or by multiple ones (with the possible difference in access to config elements provided by different apps, if you differentiate permissions on a role/app basis). The performance improvement from summary indexing happens not just because you use the collect command, but because summary indexing assumes exactly that: doing some _summary_ of your data before indexing it. So, for example, you calculate some aggregated sums for every 15 minutes and store those values in an index using the collect command, so that later you don't have to summarize your raw data each time but can just use the already calculated sums. That's what summary indexing is. Simply copying events from index to index is not summary indexing.
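As an illustration of the difference (a sketch only; the index names, field names and the marker value are made up), a scheduled search like this writes a 15-minute aggregate into a summary index:

index=web earliest=-15m@m latest=@m
| stats sum(bytes) as total_bytes count as events by host
| collect index=summary_web marker="report=web_volume"

and a later report reads the small pre-aggregated rows instead of re-scanning the raw events:

index=summary_web report=web_volume | timechart span=1d sum(total_bytes)

That pre-aggregation step is where the speed-up comes from, not the collect command itself.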
Hi @bowesmana,

Thanks for the response. I don't know JS, so I am checking with the community whether someone else has had the same issue and might have a generic solution that could help me fix the XML. I tried your solution and it worked. Could you please explain what it actually does, so that I get the idea?

Thanks, Pravin
Hi @hazem, as I said, the answer is related to the bandwidth you have: there isn't a recommended value, just the highest you can get! As I said, between the intermediate UFs and the indexers you must have a large bandwidth to avoid queues. You can check for queues with my search. Ciao. Giuseppe
Thank you @gcusello for your reply. I need to clarify that we are in the architecture-design phase, and our customer asked me the questions below; I need to reply with a specific recommended bandwidth:

- What is the recommended bandwidth between intermediate forwarders, heavy forwarders, and indexers?
- What is the recommended bandwidth between the UF agents and indexers?
Hi @hazem, it depends on how many logs you have to transmit: e.g. a Domain Controller has to transmit more logs than an ordinary server, and if you have application logs you must consider them too. Anyway, between intermediate UFs and indexers, my hint is to avoid limits. You can configure the max throughput using the maxKBps parameter on the UFs. My hint is to leave the default values, changing only maxKBps on the intermediate UFs, and then analyze both whether you have network congestion and whether your indexers can index all logs with an acceptable delay.

Another analysis to perform is for the presence of queues, using this search:

index=_internal source=*metrics.log sourcetype=splunkd group=queue
| eval name=case(name=="aggqueue","2 - Aggregation Queue", name=="indexqueue", "4 - Indexing Queue", name=="parsingqueue", "1 - Parsing Queue", name=="typingqueue", "3 - Typing Queue", name=="splunktcpin", "0 - TCP In Queue", name=="tcpin_cooked_pqueue", "0 - TCP In Queue")
| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size)
| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size)
| eval fill_perc=round((curr/max)*100,2)
| bin _time span=1m
| stats Median(fill_perc) AS "fill_percentage" perc90(fill_perc) AS "90_perc" max(max) AS max max(curr) AS curr by host, _time, name
| where fill_percentage>70
| sort -_time

If you have queues, you can modify the maxSize parameter for the queues and the maxKBps. Ciao. Giuseppe
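For reference, the maxKBps setting mentioned above lives in limits.conf on the forwarder. A minimal sketch (the file location shown is the usual local override path; 0 means unlimited, and the Universal Forwarder default is 256 KBps):

# $SPLUNK_HOME/etc/system/local/limits.conf on the intermediate UF
[thruput]
# 0 = unlimited; the UF default of 256 KBps easily bottlenecks an
# intermediate forwarder that aggregates traffic from many endpoints
maxKBps = 0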
Our deployment has indexers located in the main data center and multiple branches. We plan to deploy intermediate forwarders and Universal Forwarder (UF) agents in our remote branches to collect logs from security devices like firewalls and load balancers.

- What is the recommended bandwidth between the intermediate forwarders and the indexers?
- What is the recommended bandwidth between the UF agents and the indexers?