Getting Data In

Syslog and field extraction (regex)

splunktrainingu
Communicator

Hello

I am getting my VPN logs in syslog format on my single-instance Splunk deployment, and I am having trouble figuring out the proper way to extract the fields. Here is a sample event:

Aug 11 15:57:00 uf-log-ads-01.s.uw.edu 122.211.777.777/111.111.111.112 {"EVENT":"ACCESS_SESSION_OPEN","session.server.network.name":"dept-falconsonnet-ns.uf.edu","session.server.landinguri":"/dservers","session.logon.last.username":"carl","session.saml.last.attr.friendlyName.eduPersonAffiliation":"| member | staff | employee | alum | faculty |","session.client.platform":"Win10","session.client.cpu":"WOW64","session.user.clientip":"111.11.111.111","session.user.ipgeolocation.continent":"NA","session.user.ipgeolocation.country_code":"US","session.user.ipgeolocation.state":"Georgia","session.user.starttime":"1597186611","sessionid":"b5b42313cbb528a386beafff72cd5cef"}

 

 

 

Now I am trying to figure out the best way to extract the field names I care about. Delimiter extraction is a problem: when I split on the comma, I end up with extractions like "session.saml.last.attr.friendlyName.eduPersonAffiliation":"| member | staff | employee |".

So I ruled out delimiter extraction. I switched to regex and everything was going well until I noticed that the continent field, after extracting it, saving it, and then running a search, was picked up in only some events; others were missing it even though the value was the same ("NA" in this case). The same thing happened with the sessionid field. I compared the events and they came from the same source, sourcetype, and index; the main differences were the sessions, session start times, IPs, and usernames.

The syslog traffic comes into the server and is written to a .log file on my Splunk server, so that is how I got the data indexed: by monitoring that directory. But now I am stuck and not sure how I should approach this problem.
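
For reference, the monitoring input I set up looks roughly like this (the sourcetype name here is just a placeholder):

inputs.conf

# continuously monitor the file the syslog daemon writes to
[monitor:///var/log/vpn.log]
sourcetype = your_vpn
disabled = false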

 


to4kawa
Ultra Champion

props.conf

[your_vpn]
DATETIME_CONFIG =
# event breaking: each line is a separate event
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Custom
pulldown_type = true
disabled = false
# apply the two index-time transforms defined in transforms.conf
TRANSFORMS-sed = ip_sed1, ip_sed2

 

transforms.conf

[ip_sed1]
# index-time extraction of the syslog prefix: hostname, the two IPs, and the JSON payload
REGEX = ^\w+\s\d+\s\S+\s(?P<hostname>\S+)\s(?<ip1>[^\/]+)\/(?<ip2>[^\/]+)\s(?<json>.*)
FORMAT = hostname::$1 ip1::$2 ip2::$3 json::$4
WRITE_META = true

[ip_sed2]
# index-time extraction of every "key":"value" pair in the JSON payload
REGEX = \"(?<name>[^\"]+)\":\"(?<value>[^\"]+)\"
FORMAT = "$1"::"$2"
REPEAT_MATCH = true
WRITE_META = true
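
Note that index-time transforms like these only apply to data indexed after the configuration is loaded, so restart Splunk and let new events arrive. A quick sanity check on freshly indexed events could be something like:

sourcetype=your_vpn
| stats count by hostname, ip1, ip2, session.user.ipgeolocation.continent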


splunktrainingu
Communicator

@to4kawa What are ip_sed1 and ip_sed2?


to4kawa
Ultra Champion

They are the stanza names in transforms.conf; props.conf refers to them in the TRANSFORMS-sed setting.


splunktrainingu
Communicator

@to4kawa Do I need to make a stanza for each field I am extracting?


gcusello
SplunkTrust

Hi @splunktrainingu,

sorry, but I don't understand what your problem (and your question) is; let me summarize:

  • you have a log in JSON format,
  • you can receive it by syslog (network input) or by file monitoring,
  • you have to extract the fields, and you're able to split the "session.saml.last.attr.friendlyName.eduPersonAffiliation" field into its five values,
  • then you have some other fields (e.g. "session.user.ipgeolocation.continent") that sometimes have the value NA, but that is simply what your logs contain.

So what do you need from your field extraction?

Ciao.

Giuseppe


splunktrainingu
Communicator

@gcusello  

1. Yes, the log comes in JSON format.

2. I do not know what you mean by network input. This is what I have done: Data Inputs > Files and Directories > /var/log/vpn.log > continuous monitoring. This means I can search the data in my index.

3. When I went to search the data, no field extraction had been done, so I couldn't search those fields. The only fields I can search are host, source, index, and sourcetype, which I assume are added by Splunk. So I pretty much have a raw JSON log with no extractions.

4. Delimiters do NOT work. With a comma as the delimiter, the extraction puts the field name and the data together as a single extracted field, e.g. "session.saml.last.attr.friendlyName.eduPersonAffiliation":"| student | member | staff | employee | alum |", which is not helpful at all. If I change the delimiter to the colon, it pairs the value of the previous field with the next field name. If I use quotation marks as the delimiter, it creates 53 fields, some of which are not needed because the colons become fields, and I cannot remove them.

5. All of the fields are always populated. The problem I am experiencing is with the regex method, which I thought would be better because I can choose which fields to extract. If I extract "session.user.ipgeolocation.continent":"NA" and rename it to "continent", it only works on two of the events even though the field exists in all of them. I want all of my extracted fields to work on all of the events, not just those two (a targeted extraction like the sketch below is what I'm after).
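
For example, a per-field rex that depends only on the JSON key rather than on the surrounding text, just a sketch using keys from the sample event above:

your_search
| rex "\"session\.user\.ipgeolocation\.continent\":\"(?<continent>[^\"]+)\""
| rex "\"sessionid\":\"(?<sessionid>[^\"]+)\""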


gcusello
SplunkTrust

Hi @splunktrainingu,

network inputs are the inputs from syslog over the TCP or UDP protocol, which you can find in [Settings > Data Inputs > TCP / UDP].

Anyway, I understood that you already have the logs in files, so you don't need network data inputs.

You don't have any extracted fields because there is no Technology Add-on (a specialized app that parses the logs) for your log format. First check on Splunkbase (apps.splunk.com) whether a TA already exists.

If not, you have to create your own parser to extract the fields you need;
to do this you can use regexes and/or the spath command,

e.g. running this search:

your_search
| rex "[^\{]*(?<_raw>.*)"
| spath
| rex field="session.saml.last.attr.friendlyName.eduPersonAffiliation" "^\|\s+(?<member>[^\|]*)\|\s(?<staff>[^\|]*)\|\s(?<employee>[^\|]*)\|\s(?<alum>[^\|]*)\|\s(?<faculty>[^\|]*)\|"
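
Here the first rex overwrites _raw with everything from the opening brace onward, so that spath can auto-extract every JSON key, and the second rex splits the pipe-delimited affiliation list. Once the regexes do what you want, they could also be made permanent as search-time extractions in props.conf; a sketch, assuming your events use the your_vpn sourcetype from the earlier example:

props.conf

[your_vpn]
# search-time extractions for the keys you care about
EXTRACT-continent = \"session\.user\.ipgeolocation\.continent\":\"(?<continent>[^\"]+)\"
EXTRACT-sessionid = \"sessionid\":\"(?<sessionid>[^\"]+)\"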

Ciao.

Giuseppe
