When running license usage reports by host we are hitting the squash_threshold in server.conf.
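For context, the report is essentially the standard search over license_usage.log, something along these lines (simplified sketch):
index=_internal source=*license_usage.log* type=Usage
| stats sum(b) AS bytes by h
| eval GB=round(bytes/1024/1024/1024,2)
Once squashing kicks in, the h (and s) fields stop being populated, so the per-host breakdown falls apart.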
I've researched this and the only solution I can see is to increase the squash_threshold beyond the number of combinations of index, host, source and sourcetype, which I calculate by running this search:
| tstats count AS tuples where index IN (uk*, us*) by index host source sourcetype
The docs say there will be an impact on memory, though there's no indication on what that might look like.
Do you have any real world experience of the impact of doing this?
Is the license usage log the only method of calculating host-based usage? From what I've found in my reading so far, it seems to be.
Thanks!
You can either create an indexed field holding the raw event length so you can quickly do tstats.
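As a rough sketch of the indexed-field route (the names here are made up, and it only covers data indexed after you roll it out), something along these lines on the indexing tier:
# transforms.conf
[add_event_length]
INGEST_EVAL = event_length=len(_raw)
# props.conf
[your_sourcetype]
TRANSFORMS-eventlength = add_event_length
# fields.conf
[event_length]
INDEXED = true
The report itself then becomes a quick tstats, for example:
| tstats sum(event_length) AS bytes where index IN (uk*, us*) by host
| eval GB=round(bytes/1024/1024/1024,2)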
Or - even better - create a simple datamodel holding length of your events as a calculated field. And accelerate it.
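For the data model route, a minimal sketch (the data model and field names are made up): build a root event dataset constrained to the indexes you report on, add an eval-based field such as event_bytes=len(_raw), turn on acceleration, and then report with something like:
| tstats summariesonly=true sum(LicenseSize.event_bytes) AS bytes from datamodel=LicenseSize by LicenseSize.host
| eval GB=round(bytes/1024/1024/1024,2)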
It will have _some_ impact on your environment but still better than plowing through raw data every time you need a report on your license usage.
You can also configure the Chargeback App for Splunk, which was created for exactly this purpose. The initial configuration is quite involved, but once you set up the lookup files it needs, you'll get a clear view of how much of the funding is consumed by which part of the project/team/business, etc.
If you or anyone else has experience of using the Chargeback app, please do share it. I'm sure others will find it helpful too.
Hey @DataWrangler,
I have helped customers set up the Splunk App for Chargeback, and they've mentioned they were able to distribute the costs across the different business units/orgs as per their requirements. You need to make sure you enter the right proportions in the lookup while configuring the business units. I do agree that the one-time setup is a complicated process, but it does eventually help with the task.
Thanks,
Tejas.
Two good options there (the indexed field and the accelerated data model). So, as is often the case with Splunk, we have different ways to tackle this issue.
I will go with the quick option 2 and constrain this to the indexes / hosts I need to report on. I can work through the others when I get some quiet time 😁
Thanks everyone for your responses which helped me think this through and find a practical solution.
The third option is actually kinda like "option 2 on steroids". For a one-off thing, a simple search over raw data will probably suffice. If you're planning on doing this often and especially if your data set is big, you will want to accelerate that somehow.
You are right. For now this will be an infrequent ask, perhaps monthly or quarterly, so I will run it ad hoc or schedule it to run as a report overnight.
Definitely worth looking at making this more efficient.
What is the real-world issue you are trying to solve? And is this a one-shot question, or something you'll need answered on an ongoing basis?
We run Splunk in a project and the costs are billed to each part of the project as they are funded separately through separate changes.
So when a new set of servers is built, I'd like to be able to report with reasonable accuracy that, say, these 10 new web servers are using 5 GB of license per day on average, for billing.
This will also help us predict future usage when we add another 5 servers of the same type.
It will be an ongoing requirement to accurately report costs and bill correctly.
I've started investigating the Splunk App for Chargeback which seems useful but overly complex for this requirement.
Hey @DataWrangler,
I do agree with the documentation that it's best not to increase the squash_threshold. However, if you want host-based utilization, you can check metrics.log with group=per_host_thruput, sum up the kb values, and group them by series. The series field contains the host value when group=per_host_thruput.
index=_internal source=*var/log/splunk/metrics.log* group=per_host_thruput
| eval gb = round(kb/1024/1024,2)
| timechart sum(gb) as total_ingestion by series
I haven't experimented with the value of squash_threshold myself.
Thanks,
Tejas.