Regex to move three values into sourcetype field with transforms.conf

pbugeja
New Member

Hi,

I am very new to regex and have been struggling with a simple task.

I need to change three values (Health, AuditTrail, Security) in a field called type into individual sourcetypes.

Any assistance would be greatly appreciated.

Thanks, Paul

0 Karma

cpetterborg
SplunkTrust

Just as a note here:

Best practice would be to use a syslog server, like rsyslog or syslog-ng. Then pass the data to the indexers either by using an HTTP Event Collector (HEC) or a Universal Forwarder (UF) or Heavy Forwarder (HF). It is harder to lose UDP data that way. Any restart of Splunk (or of the syslog service, too) will result in a loss of data until the service comes back up. The UF and HF will take many times longer to restore the reception of the data.

If the amount of data coming in is not significant, then perhaps that doesn't matter, but I have one syslog server getting about 800GB/day of syslog data and it is working great (rsyslog -> nginx for load balancing -> indexers with HEC). You can get almost that with a UF alone, but you can't do any kind of parsing of that data to help you out, like separating data to different indexes. If you use an HF, then you will get about a third of that volume. But again, when you restart your Splunk process, you will lose more data than with a syslog server. I use rsyslog, and it's down less than a second, but when we used a UF, it took more than a minute, all the while dropping those UDP packets into the bit bucket.

It is also possible to sourcetype the data at the syslog level, which puts less strain on your indexers.

Something to think about while you are implementing your solution.
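
As a rough illustration of setting the sourcetype before the data reaches the indexers, the sketch below builds an HTTP Event Collector payload whose sourcetype is taken from a JSON event's type field. It is written in Python rather than as rsyslog configuration, and the function name, the fallback sourcetype cato:logs, and the shortened sample event are assumptions for illustration only. It builds the payload and does not send anything.

```python
import json

# Hypothetical fallback used when an event has no usable "type" value.
DEFAULT_SOURCETYPE = "cato:logs"


def build_hec_payload(raw_event: str, index: str = "cato_dev") -> dict:
    """Wrap one raw JSON event in an HEC-style envelope.

    The envelope keys (event, sourcetype, index) follow the HEC JSON
    event format; everything else here is an illustrative assumption.
    """
    try:
        parsed = json.loads(raw_event)
    except json.JSONDecodeError:
        parsed = {}
    # Only a non-empty string "type" on a JSON object is accepted.
    event_type = parsed.get("type") if isinstance(parsed, dict) else None
    if isinstance(event_type, str) and event_type:
        sourcetype = event_type
    else:
        sourcetype = DEFAULT_SOURCETYPE
    return {"event": raw_event, "sourcetype": sourcetype, "index": index}


sample = '{"severity":"ALERT","type":"Health","popName":"Chicago"}'
payload = build_hec_payload(sample)
print(payload["sourcetype"])
```

Events that are not valid JSON, or that carry no type value, fall back to the default sourcetype instead of being dropped.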

0 Karma

pbugeja
New Member

Currently the following rex command is creating the new sourcetypes, but I still need assistance with the props or transforms conf files.

index="cato_dev" source=CATOLOG.TXT | rex field=type "(?<sourcetype>.*)"

adds the Health, AuditTrail and Security sourcetypes

props.conf
[cato:logs]
CHARSET = UTF-8
SHOULD_LINEMERGE = True
KV_MODE = json
TRANSFORMS-changesourcetype = cato:logs

[source::/opt/splunk/etc/apps/Cato_Input/CATOLOG.TXT]
TRANSFORMS-changesourcetype = cato:logs

transforms.conf
[cato:logs]
REGEX field=type "(?<sourcetype>.*)"
FORMAT = sourcetype::cato:logs
DEST_KEY = MetaData:Sourcetype

0 Karma

pbugeja
New Member

Currently I am able to create the new sourcetypes with the following rex command, but still have a problem with either the props or transforms conf file. I guess there is a syntax issue.

This works > index="cato_dev" | rex field=type "(?<sourcetype>.*)"

sourcetypes:Health;AuditTrail;Security

transforms.conf
[cato:logs]
REGEX field=type "(?<sourcetype>.*)"
FORMAT = sourcetype::cato:logs
DEST_KEY = MetaData:Sourcetype

props.conf
[cato:logs]
CHARSET = UTF-8
SHOULD_LINEMERGE = True
KV_MODE = json
TRANSFORMS-changesourcetype = cato:logs

[source::/opt/splunk/etc/apps/Cato_Input/CATOLOG.TXT]
TRANSFORMS-changesourcetype = cato:logs

0 Karma

cpetterborg
SplunkTrust

Try:

FORMAT = sourcetype::$1
0 Karma

pbugeja
New Member

Hi cpetterborg,
I unfortunately started with "FORMAT = sourcetype::$1" with the same effect.

0 Karma

cpetterborg
SplunkTrust

Okay, here we go. Let's try this:

props.conf:

CHARSET = UTF-8
SHOULD_LINEMERGE = True
# If your data is JSON, then the transforms.conf file will need a different REGEX statement than you were using
KV_MODE = json
TRANSFORMS-changesourcetype = cato:logs

transforms.conf:

[cato:logs]
# From what I could understand, you gave the explanation of the JSON data with the field that looked like:
#     "type"="value"
# so the regex might be your problem. The following will put the value in the capture group:
REGEX = \"type\"=\"(\w+)\"
# which would make it find the field value from the JSON string
#
# this FORMAT really should say $1, which is the capture group designation for what is matched in the parens:
FORMAT = sourcetype::$1
DEST_KEY = MetaData:Sourcetype

If I'm wrong about the actual format (you didn't give an actual example of the data), then please provide an example of the data. That will make it much easier to help you. I'm going just from the fact that you provided a rex command which looks like it might work. Hopefully I got it right, and didn't make any typos. I think I got it right, though.

0 Karma

pbugeja
New Member

This is my command minus the two asterisks before and after the search command

index="cato_dev" | rex field=type (?<sourcetype>.*)

0 Karma

pbugeja
New Member

sorry but my regex command is being edited.

0 Karma

pbugeja
New Member

I tried your regex and unfortunately it did not work.

My rex command did add the new sourcetypes in a search.
index="cato_dev" | rex field=type (?<sourcetype>.*)
with the above it added sourcetype: Health, AuditTrail, Security

0 Karma

cpetterborg
SplunkTrust

I'm going to refrain from making any more suggestions without seeing example data that is at least representative of your data. In other words, data that may be scrubbed of sensitive information, but it must be completely representative of your data. And more than just a single line. It's very hard to give help in a vacuum.

0 Karma

pbugeja
New Member

{"connection_destination_type":"CATO","severity":"ALERT","sourceCountry":"United States","creationTime":"Jun 12, 2017 9:55:11 PM","tunnel_client_version":1,"accountName":"OpenTunnel","tunnel_type_name":"DTLS","count":1,"start":1497304511000,"type":"Health","tunnel_creation_time":1497304510,"popName":"Chicago","health_rule_id":368,"sourceIp":"71.16.39.186","prettyType":"Changed Pop","tunnel_connection_reason":"Switched to pop Chicago from pop Guadalajara","network_interface_name":"","end":1497304511000,"subType":"Changed Pop","sourceName":"Biggy Small","events":[{"time":"Jun 12, 2017 9:55:10 PM","sourceType":"ACCESSLOGIN","sourceIP":"71.16.39.186","popName":"Chicago","interfaceName":"","status":"Connected","color":"#76B21C"}],"siteType":"ACCESSLOGIN"}

0 Karma

cpetterborg
SplunkTrust

This configuration worked on my system, so hopefully it will work for you. But you will have to modify the data for your actual setup. These are just taken from my system and I didn't make more than a couple of mods to the default that I got from doing a standard "Add Data" from the launcher. When I did it with my modifications, it set the sourcetype to Health, which is what you want to do. So it is all simplified as much as possible from the default of the data config.

I made a mistake on my REGEX previously, because I stupidly set a = instead of a : in the REGEX. Sorry for that dumb mistake. No excuses for me.

props.conf

[testdata]
DATETIME_CONFIG =
NO_BINARY_CHECK = true
category = Custom
pulldown_type = true
KV_MODE = JSON
TRANSFORMS-changesourcetype = catologs

transforms.conf:

[catologs]
REGEX = "type":"(\w+)"
FORMAT = sourcetype::$1
DEST_KEY = MetaData:Sourcetype
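
A quick way to see why the earlier "=" version failed is to run both patterns against a shortened copy of the sample event. This sketch uses Python's re module as a stand-in for Splunk's regex engine, which is an assumption; the two patterns are simple enough that they should behave the same way. The helper name sourcetype_for is made up for the example.

```python
import re
from typing import Optional

# Shortened copy of the sample event posted earlier in the thread.
raw = (
    '{"connection_destination_type":"CATO","severity":"ALERT",'
    '"tunnel_type_name":"DTLS","type":"Health","popName":"Chicago",'
    '"events":[{"sourceType":"ACCESSLOGIN"}],"siteType":"ACCESSLOGIN"}'
)

old_pattern = re.compile(r'"type"="(\w+)"')  # earlier attempt, with "="
new_pattern = re.compile(r'"type":"(\w+)"')  # corrected, with ":"


def sourcetype_for(event: str, pattern: re.Pattern) -> Optional[str]:
    """Return the first capture group (what $1 would hold), or None."""
    match = pattern.search(event)
    return match.group(1) if match else None


print(sourcetype_for(raw, old_pattern))  # None
print(sourcetype_for(raw, new_pattern))  # Health
```

Keys such as connection_destination_type and sourceType are not picked up, because the pattern requires a double quote immediately before a lowercase type.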
0 Karma

pbugeja
New Member

cpetterborg cheers to you Sir.

You have helped me and I am eternally grateful for your assistance.

0 Karma

cpetterborg
SplunkTrust

If this worked for you, please accept this answer above so that others can see that it has been a valid answer when they are searching.

Thanks!

0 Karma

pbugeja
New Member

I appreciate your recommendation, but I have been tasked with the segmenting of the different logs with individual sourcetypes as the data is combined from the cloud firewall.

0 Karma

cpetterborg
SplunkTrust

Note in the above answer:

It is also possible to sourcetype the data at the syslog level, which puts less strain on your indexers.

Which is what you said that you were tasked with doing.

0 Karma

somesoni2
Revered Legend

We'd need sample events and mock output to help you accurately. Also, I'm assuming this needs to be done at index time. Correct me if I'm wrong.

0 Karma

pbugeja
New Member

Appreciate you responding to my question.

I would like to query my data at search time.

field=type
value=Health,AuditTrail,Security

Need to create new sourcetypes with each value: sourcetype=Health;sourcetype=AuditTrail;sourcetype=Security

Thanks, Paul

0 Karma

DMohn
Motivator

I'm afraid this is not possible. The sourcetype of an event is an indexed field, and cannot be changed during search time.

Can you please elaborate on why you want to have different sourcetypes here? Maybe there is an easy solution for your problem.

0 Karma

pbugeja
New Member

The reason for the new sourcetypes is that each of the values represents a different log source, as I am parsing data from a firewall.

0 Karma