Hello everyone!
I would like to know if there's a way of adding a sample with the full contents of an event (the _raw would suffice) when creating a table with stats.
Let me elaborate: let's say my servers log messages that contain, among other things, the key hostname (to identify the machine the message came from) plus the Python module and line within the module where the log message was generated. Something like:
15:57:31 <155> server01 py_module=foo_module:30 This is a foo message
15:57:32 <187> server02 py_module=bar_module:65 Something happened here with wat
15:57:32 <187> server01 py_module=bar_module:65 Something happened here with blugh
15:57:33 <187> server01 py_module=bar_module:65 Something happened here with who knows?
15:57:34 <155> server02 py_module=foo_module:30 This is a bar message
15:57:35 <155> server02 py_module=foo_module:30 This is a baz message
15:57:33 <187> server01 py_module=bar_module:65 Something happened here with wooot???
So... I have no problem grouping by hostname and outputting a table with the most verbose loggers. Something like:
earliest=-24h
| rex field=host "(?<host_name>[a-zA-Z]+)\d*"
| stats count by host_name, py_module
| sort -count
| head 10
Which produces a neat table like:
+----------------+---------------+-----------+
| host_name | py_module | count |
+----------------+---------------+-----------+
| server | foo_module:30 | 3 |
| server | bar_module:65 | 4 |
+----------------+---------------+-----------+
Now, what I'd like is to show one extra column with one of the captured messages. I don't really care which one, just one (it could be the first matched, the last matched, whatever), so the table looks like:
Desired result:
+----------------+---------------+-----------+-------------------------------------+
| host_name | py_module | count | sample |
+----------------+---------------+-----------+-------------------------------------+
| server | foo_module:30 | 3 | This is a foo message |
| server | bar_module:65 | 4 | Something happened here with wat |
+----------------+---------------+-----------+-------------------------------------+
As mentioned before, at this point I'd be happy if I could get one of the _raws, showing the full event (there's no need to show only the contents of the message itself, is what I mean).
Is this even possible? I'd like to add some of the attempts I've made (all of them with subsearches), but I always get a No results found warning, so I'd say it's kind of pointless...
Thank you in advance!
In the ideal case, you would extract the message component into its own field, but we can use _raw for now. With stats functions, capturing the first value is quite easy. We could just add an additional calculation to your stats statement (using the first function) like so:
| stats count first(_raw) as sample by host_name, py_module
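Dropped into the search from the question, that stats line would give something like the sketch below. It assumes py_module is already extracted as a field (as it appears to be in the original search). Note that first returns the first value stats sees in search order, which is usually the most recent event; last, earliest or latest could be swapped in if a specific one is preferred.

```
earliest=-24h
| rex field=host "(?<host_name>[a-zA-Z]+)\d*"
| stats count first(_raw) as sample by host_name, py_module
| sort -count
| head 10
```

Because sample is an aggregate and not part of the by clause, the rows are still grouped only by host_name and py_module.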
Hi BorrajaX,
try something like this:
earliest=-24h
| rex field=host "(?<host_name>[a-zA-Z]+)\d*"
| rex field=_raw ":\d{2}\s(?<sample>[\w\s]+)$"
| stats count by host_name, py_module, sample
| sort -count
| head 10
This will create a new field called sample containing any alphanumeric and whitespace characters that come after a :, two digits and a whitespace, up to the end of the line.
Hope that helps ...
cheers, MuS
It does help and that's why I upvoted it, but it looks like it's creating different entries when the sample is different. What I wanted is to group by hostname and py_module:line# and then show one sample line.
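One possible way to get that, as a rough sketch: keep the extracted text out of the by clause and pick a single value with first() instead. The field name message and the regex are assumptions based only on the sample lines in the question (everything after the py_module=... token is treated as the message), so they would need adjusting for the real log format.

```
earliest=-24h
| rex field=host "(?<host_name>[a-zA-Z]+)\d*"
| rex field=_raw "py_module=\S+\s+(?<message>.+)$"
| stats count first(message) as sample by host_name, py_module
| sort -count
| head 10
```

This should yield one row per host_name and py_module:line# pair, with count covering all matching events and sample holding just one of the messages. Using .+ rather than [\w\s]+ also lets messages that end in punctuation, such as "who knows?", be captured.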