I have sources such as: /log/web/output/sat1svmdb1210_0511_kernel.log
/log/web/output/sat2svmdb0100_7689_kernel.log
I want to capture the hostname (i.e. sat1svmdb1210 and sat2svmdb0100) in a field and display all the hostnames. How can I do it?
This should work:
your base search
| rex field=_raw "\/(?<hostname>[^_\/]+)[\w\.]+$"
| stats count by hostname
If strings like /log/web/output/sat1svmdb1210_0511_kernel.log are already being extracted into a field like path, then you could make the search more efficient by specifying that field:
your base search
| rex field=path "\/(?<hostname>[^_\/]+)[\w\.]+$"
| stats count by hostname
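If you want to sanity-check the regex outside Splunk first, here is a minimal Python sketch run against the two sample sources from the question. It assumes Python's re module behaves like Splunk's regex engine for this pattern, and note that Python spells the named group as (?P<hostname>...) rather than (?<hostname>...). The extract_hostname helper is just a name made up for this illustration.

```python
import re

# Python writes named groups as (?P<name>...); Splunk's rex accepts (?<name>...).
PATTERN = re.compile(r"\/(?P<hostname>[^_\/]+)[\w\.]+$")

# The two sample sources from the question.
sources = [
    "/log/web/output/sat1svmdb1210_0511_kernel.log",
    "/log/web/output/sat2svmdb0100_7689_kernel.log",
]


def extract_hostname(text):
    """Return the captured hostname, or None when the pattern does not match."""
    match = PATTERN.search(text)
    return match.group("hostname") if match else None


hostnames = [extract_hostname(s) for s in sources]
print(hostnames)
```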
@abhi04, if it is the default field host that you need to extract from the name of the log file being monitored, you can Set Default Host for a File or Directory input using a Regular Expression (either from the Web UI or in the inputs.conf configuration file):
[monitor:///log/web/output/*.log]
host_regex = ^.*\/([^_]+)\_[^_]+_kernel.log$
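Before putting this in inputs.conf, you can check the expression against the sample file paths. This is only a rough Python sketch: it assumes Splunk uses the first capturing group of host_regex as the host value and that Python's re module treats this pattern the same way Splunk does. host_from_path is a hypothetical helper name used only here.

```python
import re

# The expression from the host_regex line, copied as-is.
HOST_REGEX = re.compile(r"^.*\/([^_]+)\_[^_]+_kernel.log$")


def host_from_path(path):
    """Return the first capture group (the would-be host), or None if no match."""
    match = HOST_REGEX.match(path)
    return match.group(1) if match else None


hosts = [
    host_from_path("/log/web/output/sat1svmdb1210_0511_kernel.log"),
    host_from_path("/log/web/output/sat2svmdb0100_7689_kernel.log"),
]
print(hosts)
```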
This means the host name will show up in the default field host and will not need to be extracted at search time. Of course, if hostname is different from host, you would need to rely on Search Time Field Extraction (using the rex command, which can be saved as a regular-expression-based Field Extraction using Interactive Field Extraction or props.conf).
Use regex101.com to learn and test regular expressions with sample data. (It provides a step-by-step explanation of the extraction.)
Can you please explain the logic if that's possible?
Absolutely. The rex command is looking at either the full event data (in the first example, where it looks at field=_raw) or at a particular field (in the second example, where it looks at field=path). Within that, it extracts a field called hostname by matching the regular expression "\/(?<hostname>[^_\/]+)[\w\.]+$". Probably the best way to explain the regex would be to use regex101:
https://regex101.com/r/pGOUEK/1
But in summary, it's looking for a / character, then collecting all subsequent characters that are neither _ nor /, followed by one or more characters that are either "word characters" (alphanumeric OR underscores) or periods - and anchoring all of this to the end of the field by using $. Sorry, I'm not very good at putting regexes into plain English!
So [^_/] will match characters until _ or / is found?
If so, why are we negating / as well? Only _ should need to be negated. Please explain.
@abhi04, @elliotproebstel has provided you with a regex101 link, i.e. https://regex101.com/r/pGOUEK/2
If you open the link, the EXPLANATION section on the right side gives step-by-step details of how each individual character in the regular expression is matched.
Even if you are not familiar with Regular Expressions, you will notice a QUICK REFERENCE in the bottom right with a Search Reference text bar, where you can type in any character from the regular expression to see what it means. For example, [^_\/] means a single character not present in the list _\/. The plus sign + that follows it means the match repeats until a character in the list is found.
Also remember to use the code button (i.e. 101010) or the shortcut key Ctrl+K before posting code/data on Splunk Answers so that special characters are not lost.
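On the question of why / is negated along with _: a quick way to see it is to run the pattern with and without / in the negated list against one of the sample paths. This is only a Python sketch, assuming Python's re module behaves like Splunk's engine for these patterns (Python needs (?P<hostname>...) for the named group). Without / in the list, the match can start at the first / in the path and swallow the directory names too.

```python
import re

path = "/log/web/output/sat1svmdb1210_0511_kernel.log"

# Negating both _ and /: the capture cannot cross a directory separator,
# so only the last path segment can satisfy the end-of-string anchor.
with_slash = re.search(r"\/(?P<hostname>[^_\/]+)[\w\.]+$", path)

# Negating only _: the capture starts after the first / and runs
# through the directories until the first underscore.
without_slash = re.search(r"\/(?P<hostname>[^_]+)[\w\.]+$", path)

hostname_with = with_slash.group("hostname")
hostname_without = without_slash.group("hostname")
print(hostname_with)
print(hostname_without)
```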
Thanks for the quick help.
Anytime, do upvote the comments that helped 🙂
As I typed all this out, I realized the first option might not work for you, as the path you're parsing might not be at the end of the event. Here's a fixed regex:
https://regex101.com/r/pGOUEK/2
In Splunk that would be:
your base search
| rex field=_raw "\/(?<hostname>[^_\/]+)[\w\.]+($|\s)"
| stats count by hostname
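To illustrate the difference, here is a small Python sketch comparing the end-anchored regex with the ($|\s) version on an event where the path sits in the middle of the text. The event string is made up for this example, and the sketch assumes Python's re module matches these patterns the same way Splunk does (with (?P<hostname>...) as Python's named-group spelling).

```python
import re

# Hypothetical event with the path in the middle rather than at the end.
event = "May 11 kernel started /log/web/output/sat1svmdb1210_0511_kernel.log status=ok"

# Original pattern: the path must run to the end of the text.
anchored = re.compile(r"\/(?P<hostname>[^_\/]+)[\w\.]+$")

# Fixed pattern: the path may end at the end of the text or at whitespace.
relaxed = re.compile(r"\/(?P<hostname>[^_\/]+)[\w\.]+($|\s)")

anchored_match = anchored.search(event)
relaxed_match = relaxed.search(event)

print(anchored_match)
print(relaxed_match.group("hostname"))
```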