Getting Data In

Does Splunk count the incoming uncompressed data, or the compressed raw data against your license?

ankithreddy777
Contributor

I learned that Splunk compresses incoming data and creates index files that point to the compressed raw data. How does the Splunk license account for this? Is license usage measured against the uncompressed incoming data or against the compressed raw data?

beatus
Communicator

Splunk will calculate license volume based on the uncompressed size of the data, regardless of how that data makes it to Splunk. Compression features are there for cost savings at a storage level and network link level.

See: http://blogs.splunk.com/2016/05/06/what-size-should-my-splunk-license-be/

somesoni2
Revered Legend

Splunk license is calculated as the amount of data indexed (which includes creating compressed raw data and index files).

Update

This actually is not true. See the comment below from @mmodestino_splunk.

All that matters is how much raw data makes it into the indexing pipeline (after any filtering).


Licensing is not about how much data is in your index; it is about how much data the indexer had to index.

ankithreddy777
Contributor

Suppose there is 100 GB of incoming uncompressed data. Splunk compresses it to 15 GB and creates 35 GB of index files, i.e. 50 GB stored on disk. Will the license be charged for 100 GB or for 50 GB?

mattymo
Splunk Employee

It will be 100 GB. There is no need to confuse things with how Splunk compresses and stores the data.

If you send 100GB to the indexing pipeline, you are using 100GB of license.

- MattyMo
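
As a rough illustration of the distinction, here is a minimal Python sketch using the numbers from the question above (100 GB raw, 15 GB compressed raw data, 35 GB index files). The class and method names are hypothetical and not part of any Splunk API; the sketch only encodes the rule stated in this thread, that license usage follows the uncompressed volume reaching the indexing pipeline, while disk usage is a separate figure.

```python
from dataclasses import dataclass


@dataclass
class IngestDay:
    """One day of ingestion, all sizes in GB (hypothetical model, not a Splunk API)."""
    raw_uncompressed_gb: float    # data that reached the indexing pipeline, after filtering
    compressed_rawdata_gb: float  # compressed raw data written to disk
    index_files_gb: float         # index files written alongside the raw data

    def license_usage_gb(self) -> float:
        # Per the thread: license counts the uncompressed data that was indexed.
        return self.raw_uncompressed_gb

    def disk_usage_gb(self) -> float:
        # Storage footprint is compressed raw data plus index files.
        return self.compressed_rawdata_gb + self.index_files_gb


day = IngestDay(raw_uncompressed_gb=100, compressed_rawdata_gb=15, index_files_gb=35)
print("license GB:", day.license_usage_gb())
print("disk GB:", day.disk_usage_gb())
```

With these inputs the license figure is 100 and the disk figure is 50; the two numbers answer different questions.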

ankithreddy777
Contributor

Thank you. I have a file that is being ingested into a Splunk index. Why am I getting different values from the two queries below when I try to find the daily ingestion rate?

1)

index=_internal source=*license_usage.log type="Usage" | eval h=if(len(h)=0 OR isnull(h),"(SQUASHED)",h) | eval s=if(len(s)=0 OR isnull(s),"(SQUASHED)",s) | eval idx=if(len(idx)=0 OR isnull(idx),"(UNKNOWN)",idx) | stats sum(b) as b by idx| eval GB=(((b/1024)/1024))/1000

2)

index=_internal source="/opt/splunk/var/log/splunk/metrics.log" series=index_name  | eval MB=kb/1024 | search group="per_index_thruput" | timechart span=1d sum(MB) by series

beatus
Communicator

See here: http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume

Metrics.log is only a sampling. license_usage.log is the ground truth.
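
Separately from the sampling issue, the two queries as posted do not use the same unit arithmetic: the first divides bytes by 1024 twice and then by 1000, while the second divides the kb field by 1024. The first also sums over the whole search time range, whereas the second buckets by day. A minimal Python sketch of just the unit arithmetic follows; the function names are hypothetical and the sample value is made up for illustration, not taken from any log.

```python
def query1_gb(b: float) -> float:
    """Mirror of the first query's eval: ((b/1024)/1024)/1000."""
    return b / 1024 / 1024 / 1000


def bytes_to_gib(b: float) -> float:
    """Consistent binary conversion: bytes / 1024^3."""
    return b / 1024 ** 3


def query2_mb(kb: float) -> float:
    """Mirror of the second query's eval: kb/1024."""
    return kb / 1024


# Illustrative input: exactly 100 * 1024^3 bytes.
sample_bytes = 100 * 1024 ** 3
print(query1_gb(sample_bytes))     # mixed 1024/1000 divisor
print(bytes_to_gib(sample_bytes))  # consistent 1024 divisor
```

For this sample the mixed divisor reports about 2.4% more than the consistent one, so even identical byte counts would not produce matching numbers.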

somesoni2
Revered Legend

It'll be 50 GB, which is basically what Splunk saves under its indexes. Besides, there are situations where you filter the data before indexing (to save license volume by removing junk data) or mask some data, so the incoming data size can't be what counts toward license volume.

mattymo
Splunk Employee

I think we are confusing things a bit here...

All that matters is how much raw data makes it into the indexing pipeline (after any filtering).

Licensing is not about how much data is in your index; it is about how much data the indexer had to index.

- MattyMo

somesoni2
Revered Legend

You're right. I confused index size with license size. The uncompressed data going into the indexing pipeline is indeed what counts as license usage.
