Thank you so much for the information. I get all the data from my lookup table, but everything shows as missing probably due to the way the index and my lookup table are working together. I think I'm going to have to do a lot more to get this to work.
For now, I'm just creating 2 panels. One panel uses metadata to show the systems that have not reported in over 24 hours (but have sent logs within the past 30 days) - this only gives me IP addresses, but it isn't a lot of systems. Then I'm using another panel next to it that basically goes through the normal events and shows the number of unique hosts seen in events coming across the network.
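For reference, the first panel described above can be sketched roughly like this (the index name is a placeholder, and `recentTime` is the metadata field recording when the most recent event was received):

```
| metadata type=hosts index=your_index
| where recentTime < relative_time(now(), "-24h")
    AND recentTime >= relative_time(now(), "-30d")
| eval "Last Seen"=strftime(recentTime, "%Y-%m-%d %H:%M:%S")
| table host "Last Seen"
```

The second panel can be a one-liner along the lines of `| tstats dc(host) as "Unique Hosts" where index=your_index`, with the 24-hour window set by the panel's time picker.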
Should work for now...because my brain is about to explode trying to figure out the metadata/lookup table combination.
So I'm trying to do something that may or may not be possible.
I want to first create a lookup table that maps IP addresses to host names. I then want to use metadata or tstats to pull a list of systems that haven't logged anything within a certain timeframe, and then convert those IP addresses to the corresponding hostnames in the lookup table. This will prove useful for personnel who need to look at a hostname and immediately know which system it is, without needing to memorize the IP address of each host on the network.
I believe I have the right metadata and tstats commands, but I am not sure how to then run those results against the lookup table for the IP address to hostname field conversion. This is ultimately going to be dumped into a table as a dashboard widget, and I'm not even sure if I can do all those things.
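One way this can be wired together is a tstats search piped into the `lookup` command. This is only a sketch under assumptions: the lookup is named `host_lookup` with fields `ip` and `hostname` (both names hypothetical), and events are keyed by IP in the `host` field:

```
| tstats latest(_time) as lastSeen where index=your_index by host
| where lastSeen < relative_time(now(), "-24h")
| lookup host_lookup ip AS host OUTPUTNEW hostname
| eval hostname=coalesce(hostname, host)
| convert ctime(lastSeen)
| table hostname lastSeen
```

The `coalesce` keeps the raw IP visible for any host missing from the lookup, so systems don't silently show as blank in the dashboard table.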
Thanks! I actually started from scratch yesterday and then was adding things back and it didn't seem to help much. The only real way to get it quicker is to stop searching the last 24 hours. And if we were using this as a real-time dashboard, that would be easy...because we could just do 1 hour or something like that. But it's not a real-time dashboard...or at least it's not really necessary for it to be.
So I started from scratch. Removed all the "extra" fluff and just let it spit the data out however it wanted for that event ID, and got the time down to about 130 seconds for a 24 hour search. I think that's about the best I'm going to get for that time-frame...and honestly at this point, we're probably going to end up porting this to a daily report instead of an active dashboard so it just runs once early in the morning and gives us the report for the previous 24 hours.
Unfortunately we don't get to decide the resources we are allocated or the features we are allowed to use, so we have to make do with what we're provided...and there are a lot of other people who also need their dashboards and searches to work, so I'll just take the back burner on this one.
I appreciate all the help! I'll mark your response as the answer.
Thank you so much for your answer. I was able to identify a few optimization issues using the job inspector, but they didn't seem to change much. I have watched the video Splunk has on using the job inspector and EPS to determine how well your search is performing...and it seems that mine is under-performing. For my search, in this example...it returned 56 results by scanning 1,467,895 events in 206.326 seconds. This puts my EPS at 7,114...which is much lower than the 10,000 minimum Splunk says it should be running at.
Also, you've stated that I shouldn't use index=*, but the reason I do that is because we only use one index for everything that isn't Splunk's own internal data. I put the actual index in there explicitly and it did not change the job inspector numbers for either completion time or resources used. As for the sorting, I rarely have more than 75 events per search over a 24 hour period on these particular searches that are populating my dashboard, so I'm not too worried about that. But I'll put it in there just in case. Thanks!
Any other ideas?
Hello all, thanks for taking the time to read this post. I am writing today about an issue we seem to be having with one of our Splunk dashboards. It's really just 1 particular query within the dashboard...and it seems like it's due to the way in which the query is written. The query is taking on average 2.5-3 minutes to load, and utilizing between 150 and 200 MB of memory on the search head instance.
index=* sourcetype="WinEventLog:Security" EventCode IN (4625, 4626, 4627, 4628, 4630, 4631, 4632, 4633) AND Account_Name IN (admin account prefixes on network) | fillnull value=NULL | eval Account_Name=mvindex(Account_Name,1) | eval Security_ID=mvindex(Security_ID,1) | eval LoginType=case(Logon_Type=2, "Regular Logon", Logon_Type=3, "RPC (not RDP)", Logon_Type=4, "Batch", Logon_Type=5, "Service", Logon_Type=7, "Screen Unlock/Session Resume", Logon_Type=10, "Remote Desktop", Logon_Type=11, "Cached", Logon_Type=9, "New Credentials") | stats count(Security_ID) as "Login Events" by Account_Name, LoginType, host, _time | rename Account_Name as "User" | sort - _time
Here is another query we are using for standard accounts in the same dashboard...and it loads in less than 15 seconds, utilizing much less resources.
index=* sourcetype="WinEventLog:Security" NOT Account_Name IN (admin account prefixes on network) NOT Caller_Process_Name="*process we want suppressed" EventCode IN (4625, 4626, 4627, 4628, 4630, 4631, 4632, 4633) | fillnull value=NULL | eval Account_Name=mvindex(Account_Name,1) | eval Security_ID=mvindex(Security_ID,1) | eval LoginType=case(Logon_Type=2, "Regular Logon", Logon_Type=3, "RPC (not RDP)", Logon_Type=4, "Batch", Logon_Type=5, "Service", Logon_Type=7, "Screen Unlock/Session Resume", Logon_Type=10, "Remote Desktop", Logon_Type=11, "Cached", Logon_Type=9, "New Credentials") | stats count(Security_ID) as "Login Events" by Account_Name, LoginType, host, _time | rename Account_Name as "User" | sort - _time
Again, the top query takes minutes to load and uses excessive resources, while the bottom query takes seconds to load and doesn't use nearly as many resources. I guess I'm just curious if this is due to the nature of "NOT" statements in a query versus "AND"...or if my query isn't optimized. Maybe both? The queries are searching the past 24 hours, and are set to 30-minute refresh intervals.
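For comparison, here is a generic restructuring of the first query that sometimes helps in cases like this: an explicit index, a `fields` command to trim the event data before the eval/stats work, and the rename pushed after stats so the by-fields still exist when stats runs. The index name and the OR'd account prefixes are placeholders, not the real values from the environment:

```
index=your_index sourcetype="WinEventLog:Security"
    EventCode IN (4625, 4626, 4627, 4628, 4630, 4631, 4632, 4633)
    (Account_Name="adm*" OR Account_Name="svc*")
| fields _time host Account_Name Security_ID Logon_Type
| eval Account_Name=mvindex(Account_Name,1), Security_ID=mvindex(Security_ID,1)
| eval LoginType=case(Logon_Type=2, "Regular Logon", Logon_Type=3, "RPC (not RDP)",
    Logon_Type=4, "Batch", Logon_Type=5, "Service", Logon_Type=7, "Screen Unlock/Session Resume",
    Logon_Type=10, "Remote Desktop", Logon_Type=11, "Cached", Logon_Type=9, "New Credentials")
| stats count(Security_ID) as "Login Events" by Account_Name, LoginType, host, _time
| rename Account_Name as "User"
| sort - _time
```

The `fields` command matters because it limits what the indexers ship back to the search head; whether it helps here depends on how wide these Security events are.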