Solved: How to count number of values case-insensitive, bu...

eugenek · ‎04-14-2015

I would like to count ignoring case, which can be down with eval lower. However, when displaying the results, I would like to show the "most popular" version of the capitalization.

Example:
q=Apple
q=apple
q=Apple
q=PC

The count for apple would be 3 when ignoring case, but is there a way to use the most popular variant of the capitalizes (in this case "Apple") associated with the total count of 3?

Best I can come up with right now would be a rather manual approach:
1. Create a new field that’s all the same case using lower()
2. Calculate the counts for that field
3. Calculate the counts for the original, mixed-case field
4. Create a new field which takes the capitalization from the most popular version, based on step 3
5. As the result, use label values from step 4 and counts from step 2

If there is no built-in way, I'll mark implementation of the steps above as an accepted answer.

somesoni2 · ‎04-14-2015

Try something like this

Your base search giving field q | stats count by q | sort -count | eval qtemp=lower(q) | stats sum(count) as count list(q) as q list(count) as countq by qtemp | eval q=mvindex(q,0) | table q, count

A sample runanywhere query :

|gentimes start=-1 | eval q="Apple apple Apple PC apple Apple pc pc aPPLE aPPLE aPPLE aPPLE" | table q | makemv q | mvexpand q | stats count by q | sort -count | eval qtemp=lower(q) | stats sum(count) as count list(q) as q list(count) as countq by qtemp | eval q=mvindex(q,0) | table q, count

View solution in original post

somesoni2 · ‎04-14-2015

Try something like this

Your base search giving field q | stats count by q | sort -count | eval qtemp=lower(q) | stats sum(count) as count list(q) as q list(count) as countq by qtemp | eval q=mvindex(q,0) | table q, count

A sample runanywhere query :

|gentimes start=-1 | eval q="Apple apple Apple PC apple Apple pc pc aPPLE aPPLE aPPLE aPPLE" | table q | makemv q | mvexpand q | stats count by q | sort -count | eval qtemp=lower(q) | stats sum(count) as count list(q) as q list(count) as countq by qtemp | eval q=mvindex(q,0) | table q, count

eugenek · ‎04-15-2015

If I could give bonus points for "sample runanywhere query", I would. Well done!
... I guess I can give bonus points.

stephane_cyrill · ‎04-15-2015

That is really good. for your information you can give bonus to someone by clicking on award points

stephane_cyrill · ‎04-14-2015

Hi, eugenek

Assuming that the field contening how value is q , with the following you can have what you are describing easily.

1- Count of value of field q that start with lowercase (apple):

.... q=* | where like(q, "apple%")|stats count(q) AS Count_apple

2- Count of value of field q that start with uppercase (Apple):

.... q=* | where like(q, "Apple%")|stats count(q) AS Count_Apple

3- count of the original mixed field( only for values of apple & Apple) :

.... q=* | where like(q, "Apple%") OR where like(q, "apple%") |stats count(q) AS mixed_count

eugenek · ‎04-15-2015

That's a very specific case. See accepted answer for a generic solution

How to count number of values case-insensitive, but show the most popular case version in the results (ex: 2 "Apple" + 1 "apple" = 3 "Apple")?

AppDynamics Summer Webinars

SOCin’ it to you at Splunk University

Credit Card Data Protection & PCI Compliance with Splunk Edge Processor

Are you a member of the Splunk Community?

How to count number of values case-insensitive, but show the most popular case version in the results (ex: 2 "Apple" + 1 "apple" = 3 "Apple")?

AppDynamics Summer Webinars

SOCin’ it to you at Splunk University

Credit Card Data Protection & PCI Compliance with Splunk Edge Processor