Splunk Search

Add sample message to stats aggregation

BorrajaX
Explorer

Hello everyone!

I would like to know if there's a way of adding a sample with the full contents of an event (the _raw would suffice) when creating a table with stats.

Let me elaborate: Let's say my servers log messages that contain, among other things, the hostname (to identify the machine the message came from) plus the Python module and the line within the module where the log message was generated. Something like:

15:57:31 <155> server01 py_module=foo_module:30 This is a foo message
15:57:32 <187> server02 py_module=bar_module:65 Something happened here with wat
15:57:32 <187> server01 py_module=bar_module:65 Something happened here with blugh
15:57:33 <187> server01 py_module=bar_module:65 Something happened here with who knows?
15:57:34 <155> server02 py_module=foo_module:30 This is a bar message
15:57:35 <155> server02 py_module=foo_module:30 This is a baz message
15:57:33 <187> server01 py_module=bar_module:65 Something happened here with wooot???

So... I have no problem grouping by hostname and outputting a table with the most verbose loggers. Something like:

earliest=-24h 
    | rex field=host "(?<host_name>[a-zA-Z]+)\d*" 
    | stats count by host_name, py_module
    | sort -count 
    | head 10

Which produces a neat table like:

+----------------+---------------+-----------+
|    host_name   |   py_module   |   count   |
+----------------+---------------+-----------+
|      server    | foo_module:30 |     3     |
|      server    | bar_module:65 |     4     |
+----------------+---------------+-----------+

Now, what I'd like is to show one extra column with one of the captured messages. I don't really care which one... Just one (could be the first matched, the last matched... whatever) so the table looks like:

Desired result:

+----------------+---------------+-----------+-------------------------------------+
|    host_name   |   py_module   |   count   |                 sample              |
+----------------+---------------+-----------+-------------------------------------+
|      server    | foo_module:30 |     3     |   This is a foo message             |
|      server    | bar_module:65 |     4     |   Something happened here with wat  |
+----------------+---------------+-----------+-------------------------------------+

As mentioned before, at this point, I'd be happy if I could get one of the _raws, showing the full event (no need to show only the contents of the message itself, is what I mean)

Is this even possible? I'd like to add some of the tries I've done (all of them with subsearches) but I always get a No results found warning so I'd say it's kind of pointless...

Thank you in advance!


acharlieh
Influencer

In the ideal case, you would extract the message component into its own field, but we can use _raw for now. With stats functions, capturing the first value is quite easy. We could just add an additional calculation to your stats statement (using the first function) like so:

| stats count first(_raw) as sample by host_name, py_module
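
Folded into the search from the question, this might look like the sketch below. It assumes py_module is already available as an extracted field (as the original search implies) and reuses the same rex for host_name. Note that first() returns the first value seen in the order events reach stats, which is generally the most recent event; latest(_raw) or earliest(_raw) could be swapped in if a time-based pick is preferred.

```
earliest=-24h
    | rex field=host "(?<host_name>[a-zA-Z]+)\d*"
    | stats count first(_raw) as sample by host_name, py_module
    | sort -count
    | head 10
```

Because sample is an aggregation rather than part of the by clause, the rows should stay grouped by host_name and py_module only, with one raw event shown per row.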

MuS
SplunkTrust

Hi BorrajaX,

try something like this:

earliest=-24h 
 | rex field=host "(?<host_name>[a-zA-Z]+)\d*" 
 | rex field=_raw ":\d{2}\s(?<sample>[\w\s]+)$"
 | stats count by host_name, py_module, sample
 | sort -count 
 | head 10

This will create a new field called sample containing any alphanumeric and white space characters after a : and two digits and a white space, until the end of the line.

Hope that helps ...

cheers, MuS

BorrajaX
Explorer

It does help and that's why I upvoted it, but it looks like it's creating different entries when the sample is different. What I wanted is to group by hostname and py_module:line# and then show one sample line.
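
One way to get that grouping, sketched here as a combination of the two replies rather than a tested search: keep the rex that extracts sample, but move sample out of the by clause and into a first() aggregation, so that only host_name and py_module define the rows.

```
earliest=-24h
 | rex field=host "(?<host_name>[a-zA-Z]+)\d*"
 | rex field=_raw ":\d{2}\s(?<sample>[\w\s]+)$"
 | stats count first(sample) as sample by host_name, py_module
 | sort -count
 | head 10
```

One caveat, stated as an assumption about the data: messages containing characters outside [\w\s] (for example the "?" in some of the sample log lines) would not match this rex, so those events should still be counted but would not be eligible as the displayed sample.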
