Solved: Re: group by index and count

seregaserega · ‎05-06-2015

Hi, I have several collections:
coll_2015_01_01, coll_2015_01_02, coll_2015_01_03, coll_2015_01_04 ...
I want to write a query:

index=coll_2015_01_01 | group by $indexname count
and get:
coll_2015_01_01 123
coll_2015_01_02 234
coll_2015_01_03 333
coll_2015_01_04 555
coll_2015_01_05 444

That is, the count of entries in each collection. Can I do this with Splunk?
...

rsennett_splunk · ‎05-06-2015

You have a couple of choices.
You can use the |metadata command (quickest), which is covered in the Splunk search reference,
if either the sourcetypes or the sources are unique. It takes type=hosts, type=sources, or type=sourcetypes and returns one row per host, source, or sourcetype.
Given what the "collections" look like, this would be fine as long as you have one source per collection.
Otherwise you'll need to pipe to a stats command and perhaps use an eval to combine them.

|metadata type=sources index=coll*

Since cleaning that up might be more complex than your current Splunk knowledge allows... you can do this:

index=coll* |stats count by index|sort -count

This will take longer to return (depending on the timeframe, i.e. how many collections you're covering), but it will give you what you want. If you want to sort by something else, change the field in the |sort -{field} section. Remove the - or switch it out for a + if you want the count to sort ascending.
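
As a sketch of the sort variations just described, using only the count and index fields that the search above already produces: the first search orders the collections by ascending count, and the second orders them by index name instead. Neither line is from the original post.

```
index=coll* | stats count by index | sort +count

index=coll* | stats count by index | sort +index
```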

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!


martin_mueller · ‎05-06-2015

For this type of search you're better off using tstats:

| tstats count where index=coll* by index

Should be about two orders of magnitude faster if my home Splunk is a good indicator.
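
A possible variant, assuming indexes that tstats can read: its output can be piped through the same sort used in the stats-based search, so the result keeps the descending-count ordering. This is a sketch and has not been tested against the indexes in this thread.

```
| tstats count where index=coll* by index | sort -count
```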

martin_mueller · ‎05-07-2015

This is intended for traditional Splunk indexes with .tsidx files. I don't know for sure how other virtual indexes behave here.

Protip: Tag your questions with Hunk et al. so people know what you're dealing with.

Another thought, if your data is bunched together by day - wouldn't it be nice to stick them into one index and specify the timestamp properly for Splunk's _time field?

rsennett_splunk · ‎05-07-2015

|tstats might not work, but a virtual index is an index, meaning you refer to it as index=virtual_index_name. So the last search in my answer should work: index=coll* |stats count by index|sort -count. That is, unless you are a) not talking about virtual indexes or b) have not kept to the naming convention.


rsennett_splunk · ‎05-06-2015

Good point, Martin... Make that an answer so seregaserega can accept the best one. I always forget |tstats


seregaserega · ‎05-07-2015

All answers work; I've accepted the longest one 🙂

seregaserega · ‎05-07-2015

It works for indexes that use the hadoop provider. It doesn't work for indexes based on the mongo provider.

rsennett_splunk · ‎05-07-2015

@seregaserega
In Splunk, an index is an index. So you want to double-check that there isn't something slightly different about the names of the indexes holding 'hadoop-provider' and 'mongo-provider' data. If a name does not start with coll, the coll* wildcard won't match it.

@Martin_Mueller
All answers were correct, but yours is the most efficient. So I'm gonna go upvote the heck out of your stuff for the equiv Karma points. 🙂



seregaserega · ‎05-07-2015

Great, thank you!
It works. Thanks for the detailed explanation, very useful.

seregaserega · ‎05-07-2015

|metadata type=sources index=coll*

returns nothing, unfortunately.

|metadata type=sources index=*

returns only the hadoop index, which is based on the hadoop provider. My coll* virtual indexes are based on mongo. I use the mongo provider (https://splunkbase.splunk.com/app/1810/#/documentation) to access mongo data through Hunk.

woodcock · ‎05-06-2015

Try this:

 index=coll* | stats count by index
