Hello
I am receiving my VPN logs in syslog format on my single-instance Splunk deployment, and I am having trouble figuring out the proper way to extract the fields.
Aug 11 15:57:00 uf-log-ads-01.s.uw.edu 122.211.777.777/111.111.111.112 {"EVENT":"ACCESS_SESSION_OPEN","session.server.network.name":"dept-falconsonnet-ns.uf.edu","session.server.landinguri":"/dservers","session.logon.last.username":"carl","session.saml.last.attr.friendlyName.eduPersonAffiliation":"| member | staff | employee | alum | faculty |","session.client.platform":"Win10","session.client.cpu":"WOW64","session.user.clientip":"111.11.111.111","session.user.ipgeolocation.continent":"NA","session.user.ipgeolocation.country_code":"US","session.user.ipgeolocation.state":"Georgia","session.user.starttime":"1597186611","sessionid":"b5b42313cbb528a386beafff72cd5cef"}
Now I am trying to figure out the best way to extract the field names I care about. I had trouble because when I do a delimiter extraction and separate by comma, I end up with results like "session.saml.last.attr.friendlyName.eduPersonAffiliation":"| member | staff | employee |"
So I ruled out delimiter extraction. I moved on to regex and everything was going well until I noticed that the continent field, after extracting it, saving it, and then searching, was picked up by only some events; others were missing it even though the value was identical, in this case "NA". The same thing happened with the sessionid field. I compared the logs: they came from the same source, sourcetype, and index. The main differences were the sessions, session start times, IPs, and usernames.
The syslog data comes into the server and is written to a .log file on my Splunk server, and I get it indexed by monitoring that directory. But now I am stuck and not sure how I should approach this problem.
Hi @splunktrainingu,
Network inputs are the inputs for syslog over the TCP or UDP protocol, which you can find in [Settings -- Data Inputs -- TCP/UDP].
Anyway, I understood that you already have logs in files, so you don't need Network Data Inputs.
You don't have any extracted fields for your log format because you don't have a Technical Add-on (a specialized app that parses logs). First, check Splunkbase (apps.splunk.com) to see whether a TA already exists.
If not, you have to create your parser extracting the fields you need.
To do this you can use regexes and/or the spath command, e.g. by running this search:
your_search
| rex "[^\{]*(?<_raw>.*)"
| spath
| rex field="session.saml.last.attr.friendlyName.eduPersonAffiliation" "^\|\s+(?<member>[^\|]*)\|\s(?<staff>[^\|]*)\|\s(?<employee>[^\|]*)\|\s(?<alum>[^\|]*)\|\s(?<faculty>[^\|]*)\|"
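If it helps to see the logic outside Splunk, here is a minimal Python sketch of what those steps do (using the sample event from the question): strip everything before the first { so the remainder is valid JSON (the first rex), parse it into fields (spath), then split the pipe-delimited SAML attribute (the last rex). This only illustrates the parsing logic; it is not Splunk itself.

```python
import json
import re

# Sample event from the question (abbreviated to the relevant fields)
event = ('Aug 11 15:57:00 uf-log-ads-01.s.uw.edu 122.211.777.777/111.111.111.112 '
         '{"EVENT":"ACCESS_SESSION_OPEN",'
         '"session.saml.last.attr.friendlyName.eduPersonAffiliation":'
         '"| member | staff | employee | alum | faculty |",'
         '"session.user.ipgeolocation.continent":"NA",'
         '"sessionid":"b5b42313cbb528a386beafff72cd5cef"}')

# Equivalent of | rex "[^\{]*(?<_raw>.*)" : drop the syslog prefix
raw = re.sub(r'^[^{]*', '', event, count=1)

# Equivalent of | spath : parse the JSON payload into name -> value pairs
fields = json.loads(raw)

# Equivalent of the final rex: split the pipe-delimited affiliation string
affiliations = [a.strip() for a in
                fields["session.saml.last.attr.friendlyName.eduPersonAffiliation"].split("|")
                if a.strip()]

print(fields["session.user.ipgeolocation.continent"])  # NA
print(affiliations)  # ['member', 'staff', 'employee', 'alum', 'faculty']
```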
Ciao.
Giuseppe
props.conf
[your_vpn]
DATETIME_CONFIG =
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Custom
pulldown_type = true
disabled = false
TRANSFORMS-sed = ip_sed1, ip_sed2
transforms.conf
[ip_sed1]
REGEX = ^\w+\s\d+\s\S+\s(?P<hostname>\S+)\s(?<ip1>[^\/]+)\/(?<ip2>[^\/]+)\s(?<json>.*)
FORMAT = hostname::$1 ip1::$2 ip2::$3 json::$4
WRITE_META = true
[ip_sed2]
REGEX = \"(?<name>[^\"]+)\":\"(?<value>[^\"]+)\"
FORMAT = "$1"::"$2"
REPEAT_MATCH = true
WRITE_META = true
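As a sanity check, the ip_sed1 REGEX can be tested outside Splunk; Python's re module understands the same constructs, apart from the named-group syntax. This is only a sketch against a shortened version of the sample event from the question:

```python
import re

# Same pattern as the [ip_sed1] REGEX; Python requires the (?P<name>...) form,
# while Splunk's PCRE engine also accepts (?<name>...).
pattern = re.compile(
    r'^\w+\s\d+\s\S+\s(?P<hostname>\S+)\s(?P<ip1>[^/]+)/(?P<ip2>[^/]+)\s(?P<json>.*)'
)

event = ('Aug 11 15:57:00 uf-log-ads-01.s.uw.edu '
         '122.211.777.777/111.111.111.112 {"EVENT":"ACCESS_SESSION_OPEN"}')

m = pattern.match(event)
print(m.group('hostname'))  # uf-log-ads-01.s.uw.edu
print(m.group('ip1'))       # 122.211.777.777
print(m.group('ip2'))       # 111.111.111.112
print(m.group('json'))      # {"EVENT":"ACCESS_SESSION_OPEN"}
```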
@to4kawa What is ip_sed1 and ip_sed2?
stanza name
@to4kawa Do I need to make a stanza for each field I am extracting?
Hi @splunktrainingu,
sorry, but I don't understand your problem (or your question); let me summarize:
So, what do you need from your field extraction?
Ciao.
Giuseppe
1. Yes, the log comes in JSON format.
2. I do not know what you mean by network input. This is what I have done: Data Input > Files and Directory > /var/log/vpn.log > continuous monitoring. This means I can search the data in my index.
3. When I went to search the data, no extraction had been done on it, so I couldn't search those fields. The only fields I can search are host, source, index, and sourcetype, which I assume Splunk adds itself. So I pretty much have a raw JSON log with no extractions.
4. Delimiters do NOT work. With a comma, the extraction puts this into my "interesting fields": session.saml.last.attr.friendlyName.eduPersonAffiliation":"| student | member | staff | employee | alum |" This is not helpful at all because it lumps the field name and the data together as one extracted field. If I change the delimiter to a colon, it joins the value of the previous field with the next field name. If I use quotation marks as the delimiter, it creates 53 fields, some of which are not needed because a colon ends up as a field, and I cannot remove them.
5. All of the fields are always populated. The problem appears with the regex method, which I thought would be better because I can choose which fields to extract. If I extract the field "session.user.ipgeolocation.continent":"NA" and rename it to "continent", it only works on two of the logs even though the field exists in all of them. I want my extracted fields to work on all of the logs, not just those two.
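To make the delimiter problem in point 4 concrete, here is a small Python sketch (using an abbreviated version of the sample event) of why comma-splitting yields those glued key-and-value fragments, while parsing the payload as JSON gives clean field names:

```python
import json

raw = ('{"EVENT":"ACCESS_SESSION_OPEN",'
       '"session.logon.last.username":"carl",'
       '"session.saml.last.attr.friendlyName.eduPersonAffiliation":'
       '"| member | staff | employee | alum | faculty |"}')

# Comma "delimiter extraction": each piece is a key glued to its value,
# which is the unhelpful result described above
pieces = raw.strip('{}').split(',')
print(pieces[1])  # "session.logon.last.username":"carl"

# Parsing the payload as JSON instead yields clean name -> value pairs
fields = json.loads(raw)
print(fields['session.logon.last.username'])  # carl
```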