Hi everyone,
I'm a new Splunk user and I need help with field extractions.
My Splunk instance receives data from a syslog server, and at index time it indexes the data as follows (I've hidden some fields with 'xx'):
2017-06-05T10:59:14-04:00 ins-web01 sshd[32401]: pam_unix(sshd:session): session closed for user xxxxxxx
host =splunk01.infra.xx source =/var/log/remote/auth/ins-web01/sshd.log sourcetype =sshd-too_small
I have two problems here:
1) the host field is incorrect
2) the sourcetype field is incorrect
I've tried to extract the correct host and sourcetype fields, but it does not work.
My regex is ^[^ \n] (?P[^ ]+).*
Could anyone help me?
Tks.
You are doing syslog wrong. You should be sending each sourcetype to a different port and then explicitly setting the sourcetype for each port. DO NOT EVER let Splunk automatically set the sourcetype.
Read these and start over.
http://www.georgestarcher.com/splunk-success-with-syslog/
http://docs.splunk.com/Documentation/Splunk/latest/Data/Listofpretrainedsourcetypes
I have an rsyslog server that sends its data to the indexers through the HTTP Event Collector (HEC), so my data is similar to yours: it appears to come from the syslog server as the host instead of the originating host. In my case, though, the data is classified with the right sourcetype. The same approach should also work for the sourcetype.
I have a transforms.conf file with the following configuration:
[hostextract]
REGEX = ^\w\w\w \d+ \d\d:\d\d:\d\d (([a-zA-Z]|\d+\.)[^ ]+)
SOURCE_KEY = _raw
DEST_KEY = MetaData:Host
FORMAT = host::$1
That will extract the hostname from the data and set it at index time. It is tied to the data with the following props.conf file configuration:
[cisco:asa]
TRANSFORMS-hostextract = hostextract
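Before deploying, the [hostextract] pattern above can be sanity-checked outside Splunk. The following is only a rough sketch using Python's re module, which behaves like Splunk's PCRE for a pattern this simple. The sample events and the helper name extract_host are made up for illustration and are not from real logs.

```python
import re

# Same pattern as the REGEX line in the [hostextract] stanza above.
HOSTEXTRACT_RE = re.compile(
    r"^\w\w\w \d+ \d\d:\d\d:\d\d (([a-zA-Z]|\d+\.)[^ ]+)"
)

def extract_host(event):
    """Return capture group 1 (what FORMAT = host::$1 would use), or None."""
    match = HOSTEXTRACT_RE.match(event)
    return match.group(1) if match else None

# Hypothetical events, invented for this example.
named_host = "Jun 15 10:59:14 fw01 %ASA-6-302013: Built outbound TCP connection"
ip_host = "Jun 15 10:59:14 10.1.2.3 %ASA-6-302013: Built outbound TCP connection"
padded_day = "Jun  5 10:59:14 fw01 %ASA-6-302013: Built outbound TCP connection"

print(extract_host(named_host))
print(extract_host(ip_host))
print(extract_host(padded_day))
```

One thing this shows: the pattern expects exactly one space between the month and the day, so an event whose single-digit day is padded with two spaces (as the third sample is) would not match and the host would be left unchanged.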
You should be able to do the same sort of thing by setting the props.conf stanza to use the hostname instead of the sourcetype, like this:
[host::splunk01.infra.xx]
TRANSFORMS-hostextract = hostextract
Then you should be able to extract the host as above, using a regex that fits your data, like the following:
REGEX = ^\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d-\d\d:\d\d ([^ ]+)
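A quick way to check a host regex like this against the event from the question is a few lines of Python. This is a minimal sketch, not a Splunk tool: it assumes the pattern is written with a single backslash before each d, and extract_host is a hypothetical helper name.

```python
import re

# Event copied from the original question (the user name was masked there).
SAMPLE = (
    "2017-06-05T10:59:14-04:00 ins-web01 sshd[32401]: "
    "pam_unix(sshd:session): session closed for user xxxxxxx"
)

# ISO 8601 timestamp with a negative UTC offset, then the host in group 1.
HOST_RE = re.compile(r"^\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d-\d\d:\d\d ([^ ]+)")

def extract_host(event):
    """Return capture group 1 (what FORMAT = host::$1 would use), or None."""
    match = HOST_RE.match(event)
    return match.group(1) if match else None

print(extract_host(SAMPLE))  # prints: ins-web01
```

Note that the pattern hard-codes a "-" before the UTC offset, so events stamped with a "+" offset would not match it.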
Do another for your sourcetype. I'm going to assume that you need the same sourcetype for all the entries coming in; if not, this would have to change (of course). The props.conf:
[host::splunk01.infra.xx]
TRANSFORMS-hostextract = hostextract,sourcetypeextract
And the entry in transforms.conf for the sourcetype might be something like:
[sourcetypeextract]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::yoursourcetype
Now, I haven't tested this, but I think that the tech is close.
Thanks for your support cpetterborg.
I'll try this and report back if it works.
What kind of syslog server (rsyslog or syslog-ng for example) are you using? I know you say in your question that it is a syslog server, but it can make a difference which one.
How are you getting that data from the syslog server to the indexers? Writing it to files and using a heavy forwarder (HF) or universal forwarder (UF) to send it? Forwarding the data directly? HTTP Event Collector on the indexers?
Where do you have your regex for extracting the host and sourcetype?
Hi cpetterborg,
I'm using rsyslog and forwarding data directly to Splunk.
To extract the fields, I usually use the extract field menu, select the incorrect field and reclassify it to the correct name.
Is this clear?
Thanks.
A sourcetype of "xx-too_small" means Splunk does not have enough data to guess the correct sourcetype to apply. Either you have not specified a sourcetype for that input or the sourcetype specification is in the wrong place.