Splunk Search

Add sample message to stats aggregation

BorrajaX
Explorer

Hello everyone!

I would like to know if there's a way of adding a sample with the full contents of an event (the _raw would suffice) when creating a table with stats.

Let me elaborate: let's say my servers log messages that contain, among other things, the hostname (to identify the machine the message came from) plus the Python module and the line within the module where the log message was generated. Something like:

15:57:31 <155> server01 py_module=foo_module:30 This is a foo message
15:57:32 <187> server02 py_module=bar_module:65 Something happened here with wat
15:57:32 <187> server01 py_module=bar_module:65 Something happened here with blugh
15:57:33 <187> server01 py_module=bar_module:65 Something happened here with who knows?
15:57:34 <155> server02 py_module=foo_module:30 This is a bar message
15:57:35 <155> server02 py_module=foo_module:30 This is a baz message
15:57:33 <187> server01 py_module=bar_module:65 Something happened here with wooot???

So... I have no problem grouping by hostname and outputting a table with the most verbose loggers. Something like:

earliest=-24h 
    | rex field=host "(?<host_name>[a-zA-Z]+)\d*" 
    | stats count by host_name, py_module
    | sort -count 
    | head 10

Which produces a neat table like:

+----------------+---------------+-----------+
|    host_name   |   py_module   |   count   |
+----------------+---------------+-----------+
|      server    | bar_module:65 |     4     |
|      server    | foo_module:30 |     3     |
+----------------+---------------+-----------+

Now, what I'd like is to show one extra column with one of the captured messages. I don't really care which one... Just one (could be the first matched, the last matched... whatever) so the table looks like:

Desired result:

+----------------+---------------+-----------+-------------------------------------+
|    host_name   |   py_module   |   count   |                 sample              |
+----------------+---------------+-----------+-------------------------------------+
|      server    | bar_module:65 |     4     |   Something happened here with wat  |
|      server    | foo_module:30 |     3     |   This is a foo message             |
+----------------+---------------+-----------+-------------------------------------+

As mentioned before, at this point I'd be happy if I could get one of the _raws, showing the full event (what I mean is, there's no need to show only the contents of the message itself).

Is this even possible? I would add some of the attempts I've made (all of them with subsearches), but I always get a "No results found" warning, so I'd say it's kind of pointless...

Thank you in advance!


acharlieh
Influencer

In the ideal case, you would extract the message component into its own field, but we can use _raw for now. With stats functions, capturing the first value is quite easy. We can just add an additional calculation to your stats statement (using the first function) like so:

| stats count first(_raw) as sample by host_name, py_module
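
Put together with the search from the question, the whole pipeline might look like the sketch below. It only swaps the original stats line for the one above; everything else is copied from the question, so it assumes host and py_module are already extracted as fields there.

```spl
earliest=-24h
    | rex field=host "(?<host_name>[a-zA-Z]+)\d*"
    | stats count first(_raw) as sample by host_name, py_module
    | sort -count
    | head 10
```

Note that first() returns the first value seen in the order events reach stats, which for a plain search is usually the most recent event. If the message were extracted into its own field, for example with something like rex field=_raw "py_module=\S+\s+(?<message>.+)$" (the field name message and this regex are assumptions about the log layout, not something confirmed in the thread), first(message) could be used in place of first(_raw).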

MuS
SplunkTrust

Hi BorrajaX,

try something like this:

earliest=-24h 
 | rex field=host "(?<host_name>[a-zA-Z]+)\d*" 
 | rex field=_raw ":\d{2}\s(?<sample>[\w\s]+)$"
 | stats count by host_name, py_module, sample
 | sort -count 
 | head 10

This will create a new field called sample containing any alphanumeric and whitespace characters that follow a colon, two digits, and a whitespace character, up to the end of the line.

Hope that helps ...

cheers, MuS

BorrajaX
Explorer

It does help, and that's why I upvoted it, but it looks like it's creating a separate entry for each different sample. What I wanted is to group by hostname and py_module:line# and then show one sample line.
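
One possible way to get that behaviour while keeping the extracted sample field, sketched here as a variation on the search above: take sample out of the by clause and aggregate it with first() instead. The rex lines are unchanged from the earlier search; only the stats line differs.

```spl
earliest=-24h
 | rex field=host "(?<host_name>[a-zA-Z]+)\d*"
 | rex field=_raw ":\d{2}\s(?<sample>[\w\s]+)$"
 | stats count first(sample) as sample by host_name, py_module
 | sort -count
 | head 10
```

With sample no longer in the by clause, events where the rex does not match (for instance messages ending in "?", which [\w\s] does not cover) should still be counted in their host_name and py_module group; the sample column is then filled from an event where the field was extracted.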
