We receive 45-50 million events daily from various log sources (servers, network devices, proxy, cloud). We need a report of the top source and destination IP addresses on a monthly basis.
I have created a report, but it takes more than two days to complete, and it sometimes fails to run due to the huge volume of events.
I am sure Splunk provides some way to generate these kinds of reports more quickly. I am also working on creating a data model, but because there are many log sources, I am running into some issues with that.
In the meantime, can somebody suggest how I can generate these reports over such a huge amount of data?
The right answer is to build a data model, accelerate it, and use tstats to pull back the details:
http://docs.splunk.com/Documentation/Splunk/6.5.2/Knowledge/Acceleratedatamodels
https://helgeklein.com/blog/2015/10/splunk-accelerated-data-models-part-1/
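As a rough sketch of what the tstats report could look like (this assumes you have accelerated a CIM-style Network_Traffic data model whose root dataset is All_Traffic; substitute your own data model and field names), a monthly top-ten search run over the desired month via the time range picker might be:

```
| tstats summariesonly=true count from datamodel=Network_Traffic where nodename=All_Traffic by All_Traffic.src
| rename All_Traffic.src AS src_ip
| sort - count
| head 10
```

Repeat with All_Traffic.dest for the top destination IPs. Because tstats reads the accelerated summaries rather than the raw events, this should complete in seconds rather than days.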
There is a SUBSTANTIAL disk-space impact to this, so be sure to size your indexers accordingly.
If you cannot handle the disk-space impact, then summary indexing is your next best option, especially if you are looking for a gross aggregate and a tiny bit of data loss is unlikely to affect your "answer". SI is "fragile" and prone to small outages of data, unlike DMA, which is bulletproof and needs no hand-holding or maintenance:
http://docs.splunk.com/Documentation/Splunk/6.5.2/Knowledge/Usesummaryindexing
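For example (a sketch only; the index name summary_top_ips, the base search, and the field names src_ip/dest_ip are placeholders for your environment), you could schedule a daily search like:

```
index=firewall OR index=proxy
| stats count by src_ip, dest_ip
| collect index=summary_top_ips
```

Schedule this to run once per day over the previous day. The monthly report then reads the small summary index and sums the daily counts, instead of scanning the raw events again.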
Thank you, I will try this out. Since your answer covers both the data model and summary index methods, I will accept it.
@Everyone who has commented here, many thanks for your input.
Creating a summary is probably the best way to do it.
That said, there is a quick and dirty way to do it. Given that you are looking for the top IPs appearing in your logs, you can set Event Sampling in your search(es) to either 1:10,000 or 1:100,000. Since you are looking for the IPs that are most numerous in your logs, they will still appear. The total counts will be lower, but because you are sampling all events evenly, the counts will still be comparable to each other. You will have to experiment a little to see whether 10,000 or 100,000 (or possibly a higher custom ratio) is the way to go. You will also have to sanity-check the results to make sure you get the 10, 25, 50, etc. values you need for your report; I am not sure how many you are looking for.
I don't know whether you have tried this yet, but it might help until you get the data model up and running.
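To illustrate (the index and field names here are hypothetical), with Event Sampling set to 1:10,000 in the sampling dropdown under the search bar, the search itself stays simple:

```
index=netlogs
| top limit=50 src_ip
| eval estimated_count = count * 10000
```

top produces count and percent fields over the sampled events; multiplying count by the sampling ratio gives a rough estimate of the true total, which is usually good enough for a "top N" ranking.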
What you need is summary indexing. You can schedule a search to run at a regular interval and save its results for use at the end of the month.
Please go through the link in detail and you'll have your answer.
http://docs.splunk.com/Documentation/Splunk/6.5.2/Knowledge/Usesummaryindexing
Hi swapsplunk,
you should try scheduling your report daily (or at a shorter interval) and putting the results in a summary index; then, each month, you can run your report against the summary index.
See http://docs.splunk.com/Documentation/Splunk/6.5.2/Knowledge/Usesummaryindexing and http://docs.splunk.com/Documentation/Splunk/6.5.2/Knowledge/Configuresummaryindexes
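As a sketch of the monthly report side (assuming a daily scheduled search that writes stats count by src_ip results into an index called summary, with the events carrying the saved search name in their source field, which is the default for summary-indexed results; all names here are placeholders):

```
index=summary source="daily_top_ips" earliest=-1mon@mon latest=@mon
| stats sum(count) AS count by src_ip
| sort - count
| head 10
```

Because the summary index holds only one small aggregated record set per day, this monthly rollup touches a few thousand events instead of more than a billion raw ones.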
Bye.
Giuseppe