Splunk Search

How to count the number of unique parts of a string in a log?

jkauling
Engager

In my logfile I need to count the unique values of one part of a string. This string appears many times in the logfile.
The unique parts are marked as NNNN and CCC in this example string. NNNN is a number with an unknown number of digits, and CCC is optionally filled with digits or characters, also of unknown length. The parts are always separated by underscores:
"ID": "20201218_HD_111111_20201218_HD_111111_1111_1000AB_NNNN_CCC_BE"

Your help is much appreciated with this query.

VatsalJagani
SplunkTrust

@jkauling - You can use the `rex` command.

Add the commands below to the end of your query to see how many events contain a string in this format.

| rex "_1000AB_(?<extracted_number>\d+)_\w+_"
| search extracted_number=*
| stats count

- Remove `| stats count` to see the extracted_number field as well. To count distinct values instead of matching events, use `| stats dc(extracted_number)` in its place.

- If the `ID` field is already extracted, replace the rex line with the one below. It runs the regex against that field only instead of the whole raw event, which improves performance.

| rex field=ID "_1000AB_(?<extracted_number>\d+)_\w+_"

- If a single event can contain this string more than once and you want to count each occurrence separately, replace the rex line with the lines below:

| rex "_1000AB_(?<extracted_number>\d+)_\w+_" max_match=0
| mvexpand extracted_number

kamesjaci
New Member

With grep, filter out just the numbers:

grep -Eo '[0-9]+-' file | sort -u | wc -l
  • [0-9] Matches any character between 0 and 9 (any digit).
  • + in extended regular expressions means one or more of the preceding element (that's why the -E option is used with grep). So [0-9]+- matches one or more digits, followed by -.
  • -o only prints the part that matched your pattern, so given input abcd23-gf56, grep will only print 23-.
  • sort -u sorts and filters unique entries (due to -u), and wc -l counts the number of lines in its input (hence, the number of unique entries).
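
Note that the pattern above expects a hyphen after the digits, while the ID in the question is separated by underscores. A minimal sketch of the same grep approach adapted to that format could look like the following. The sample IDs and the temporary file are made up for illustration, and the pattern assumes the `_1000AB_` marker is fixed and that the CCC segment is present and alphanumeric:

```shell
# Hypothetical sample log: three lines, two distinct NNNN_CCC combinations.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
"ID": "20201218_HD_111111_20201218_HD_111111_1111_1000AB_4711_X1_BE"
"ID": "20201218_HD_111111_20201218_HD_111111_1111_1000AB_4711_X1_BE"
"ID": "20201218_HD_111111_20201218_HD_111111_1111_1000AB_52_ZZ9_BE"
EOF

# -E enables extended regex, -o prints only the matched part.
# The pattern matches the fixed marker, the digits (NNNN) and one
# following alphanumeric segment (CCC), each closed by an underscore.
unique_count=$(grep -Eo '_1000AB_[0-9]+_[A-Za-z0-9]+_' "$logfile" | sort -u | wc -l)

# Some wc implementations pad the number with spaces; arithmetic expansion strips that.
unique_count=$((unique_count))

echo "$unique_count"
rm -f "$logfile"
```

With this sample input the pipeline prints 2, because the first two lines carry the same NNNN_CCC part. If CCC can be missing entirely, the pattern would need to be loosened accordingly.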

jkauling
Engager

Thanks, this was helpful.

VatsalJagani
SplunkTrust

Kindly accept the answer, so it will be useful for other visitors.
