Splunk Search

group by index and count

seregaserega
Explorer

Hi, I have several collections:
coll_2015_01_01, coll_2015_01_02, coll_2015_01_03, coll_2015_01_04 ...
I want to write a query:

index=coll_2015_01_01 | group by $indexname count
and get:
coll_2015_01_01 123
coll_2015_01_02 234
coll_2015_01_03 333
coll_2015_01_04 555
coll_2015_01_05 444

count of entries in each collection. Can I do it using splunk?
...

Tags (2)
0 Karma
1 Solution

rsennett_splunk
Splunk Employee
Splunk Employee

You have a couple of choices.
You can use the |metadata command (quickest) detailed here
if either the sourcetype or sources are unique. It returns type=host or type=sources or type=sourcetype.
given what the "collections" look like, this would be fine as long as you have one source per collection.
Otherwise you'll need to pipe to a stats command and perhaps use an eval to combine them.

|metadata type=sources index=coll*

Since cleaning that up might be more complex than your current Splunk knowledge allows... you can do this:

index=coll* |stats count by index|sort -count

Which will take longer to return (depending on the timeframe, i.e. how many collections you're covering) but it will give you what you want. If you want to sort by something else... change the field in the |sort -{field} section. remove the - or switch it out for a + if you want the count to sort ascending...

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

View solution in original post

martin_mueller
SplunkTrust
SplunkTrust

For this type of search you're better off using tstats:

| tstats count where index=coll* by index

Should be about two orders of magnitude faster if my home Splunk is a good indicator.

martin_mueller
SplunkTrust
SplunkTrust

This is intended for traditional Splunk indexes with .tsidx files. I don't know for sure how other virtual indexes behave here.

Protip: Tag your questions with Hunk et.al. so people know what you're dealing with.

Another thought, if your data is bunched together by day - wouldn't it be nice to stick them into one index and specify the timestamp properly for Splunk's _time field?

0 Karma

rsennett_splunk
Splunk Employee
Splunk Employee

|tstats might not work... but a virtual index is an index... meaning you refer to it as index=virtual_index_name. so the last one in my answer should work. index=coll* |stats count by index|sort -count unless you are a) not talking about virtual indexes or b) have not kept to the naming convention...

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!
0 Karma

rsennett_splunk
Splunk Employee
Splunk Employee

Good point, Martin... Make that an answer so seregaserega can accept the best one. I always forget |tstats

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

seregaserega
Explorer

All answers works, I've accepted the longest 🙂

0 Karma

seregaserega
Explorer

It works for indexes using hadoop-provider. It doesn't work for indexes based on mongo-provider

0 Karma

rsennett_splunk
Splunk Employee
Splunk Employee

@ seregaserega
In Splunk, an index is an index. So, you want to double-check that there isn't something slightly different about the names of the indexes holding 'hadoop-provider' and 'mongo-provider' data. if the names are not collSOMETHINGELSE it won't match.

@Martin_Mueller
All answers were correct, but yours is the most efficient. So I'm gonna go upvote the heck out of your stuff for the equiv Karma points. 🙂

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!
0 Karma

rsennett_splunk
Splunk Employee
Splunk Employee

You have a couple of choices.
You can use the |metadata command (quickest) detailed here
if either the sourcetype or sources are unique. It returns type=host or type=sources or type=sourcetype.
given what the "collections" look like, this would be fine as long as you have one source per collection.
Otherwise you'll need to pipe to a stats command and perhaps use an eval to combine them.

|metadata type=sources index=coll*

Since cleaning that up might be more complex than your current Splunk knowledge allows... you can do this:

index=coll* |stats count by index|sort -count

Which will take longer to return (depending on the timeframe, i.e. how many collections you're covering) but it will give you what you want. If you want to sort by something else... change the field in the |sort -{field} section. remove the - or switch it out for a + if you want the count to sort ascending...

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

seregaserega
Explorer

Great thank you!
It works, thanks for the detailed explanation, useful.

0 Karma

seregaserega
Explorer
|metadata type=sources index=coll*

returns nothing unfortunately

|metadata type=sources index=*

returns hadoop index based on hadoop provider. My coll* virtual indexes are based on mongo. I use mongo provider https://splunkbase.splunk.com/app/1810/#/documentation
to get access to mongo data using Hunk

0 Karma

woodcock
Esteemed Legend

Try this:

 index=coll* | stats count by index
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...