I am trying to summarize network traffic logged by our firewall to determine the factors that have made our index usage exceed estimates by about 50%.
The queries that I am running take several days, so I am hoping that there are some optimizations that might help speed them up.
There is a lot of data (hundreds of millions of log records per day), so I am asking a lot of Splunk to summarize it.
The indexing logs do not go back far enough to let me do this analysis on them, and the datamodel I have does not go back far enough either, so I cannot take advantage of the faster access those would provide.
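What I can do is start summarizing going forward so that future comparisons are cheap. A minimal sketch of a scheduled summary search I am considering (the summary index name pan_volume_summary is a placeholder; I have not created or scheduled it yet):

index=pan_logs earliest=-1h@h latest=@h
| bin _time span=1h
| stats count AS event_count BY _time, host
| collect index=pan_volume_summary

That would not help with the historical question, but it would let me track volume per host cheaply from now on.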
I have already run queries over three points in time (a week each) that establish an increase in log record volume.
Here is what I am doing:
index=pan_logs
| eval loc=case(match(host,"fw_loc1.*"), "Loc1", match(host,"fw_loc2.*"), "Loc1", match(host,"fw_loc3.*"), "Loc2", match(host,"fw_loc4.*"), "Loc3", match(host,"fw_loc5.*"), "Loc4",
    match(host,"fw_loc6.*"), "Loc5", match(host,"fw_loc7.*"), "Loc5", match(host,"fw_loc8.*"), "Loc5", match(host,"fw_loc9.*"), "test",
    match(host,"fw_locA.*"), "external", match(host,"fw_locB.*"), "external", 1=1, host)
| eval istraffic=if(match(eventtype,"pan_traffic"), "Y", null()), isurl=if(match(eventtype,"pan_url"), "Y", null()),
    isvuln=if(match(log_subtype,"vulnerability"), "Y", null()), trafficDir=src_zone."-".dest_zone
| stats count dc(host) values(host) count(istraffic) count(isurl) count(isvuln) dc(trafficDir) by loc
The idea is to determine whether a particular group of sources is adding to the volume and then understand why. The trafficDir field is a quick measure of whether the increase might be related to increased firewall complexity.
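One variation I have been wondering about, but have not tested, is moving the host-to-location mapping into a lookup (host_to_loc here is a lookup I would still have to build), folding the eventtype tests into stats with eval, and limiting the fields carried through the pipeline, something like:

index=pan_logs
| fields host, eventtype, log_subtype, src_zone, dest_zone
| lookup host_to_loc host OUTPUT loc
| eval trafficDir=src_zone."-".dest_zone
| stats count, dc(host), values(host),
    count(eval(match(eventtype,"pan_traffic"))) AS traffic_count,
    count(eval(match(eventtype,"pan_url"))) AS url_count,
    count(eval(match(log_subtype,"vulnerability"))) AS vuln_count,
    dc(trafficDir) AS dir_count
    BY loc

I do not know how much of the runtime is actually in the eval/regex work versus simply reading hundreds of millions of events, though.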
Does anything here look like I could make it more efficient?