How to split data into separate sourcetypes with transforms

Builder

Hello

I have an input that monitors a file. The file contains data in multiple formats, including different timestamp formats. That's bad, but I was thinking I could use a transform to set the sourcetype, and then use that sourcetype in props to format the data.
So I did this in inputs.conf:

[monitor:///var/log/this_log/*.ec]
index = main
sourcetype=momlog

Then I created a transforms.conf:

[momlog_json_sourcetype]
DEST_KEY = MetaData:Sourcetype
REGEX = \{\"msys\"
FORMAT = sourcetype::momlog:json


[momlog_basic_sourcetype]
DEST_KEY = MetaData:Sourcetype
REGEX = .*
FORMAT = sourcetype::momlog:basic

I also have a props.conf that looks like this:

[momlog:basic]
TIME_FORMAT = %s
TIME_PREFIX = ^
LINE_BREAKER = ([\r\n]+)
TRANSFORMS-basic = momlog_basic_sourcetype

[momlog:json]
TIME_FORMAT = %s
TIME_PREFIX = "timestamp":"
INDEXED_EXTRACTIONS = JSON
TRANSFORMS-json = momlog_json_sourcetype

My question is this:
What would the regex be for the NON-JSON data? Do the inputs and props look correct? I'm testing locally, so I can break things all day long.

Thanks for the assistance.

Re: How to split data into separate sourcetypes with transforms

Esteemed Legend

How can we possibly know what REGEX will work if you do not post sample data? In any case, the PaloAlto TA does this, so you can download that app and see how it is done. It takes data from syslog that is supposed to come in as sourcetype=pan:log and splits it out into 5 or 6 different sourcetypes based on regex patterns, just like what you are doing.

Re: How to split data into separate sourcetypes with transforms

Builder

Well, really, all it has to do is match anything that isn't in JSON format, meaning anything that ISN'T matched by

TIME_PREFIX = "timestamp":"

which is why I didn't add the data samples. I can take a look at the app, but I don't think it should really be that difficult.

But JUST IN CASE
(this is actually data from several files):

1503626401@N@/tmp/12354@@user@1
1503664701@@@@M1
1503664761@@@@M1
1503664821@@@@M1
1503664881@@@@M1
1503664941@@@@M1
1503665001@@@@M1
1503665061@@@@M1
1503665121@@@@M1
1503665181@@@@M1
1503665241@@@@M1
1503665301@@@@M1
1503665361@@@@M1
1503665421@@@@M1
1503665481@@@@M1
{"msys":{"message_event":{"origination":"unauthorized_attempt","conn_name":"stuff","recv_method":"esmtp","remote_addr":"10.0.0.0:12345","raw_reason":"500 5.5.2 unrecognized command","node_name":"host@domain.com","scope_name":"scriptlet","pathway_group":"default","error_code":"500","msg_proc_state":"awaiting mailfrom","tenant_id":"__unauthorized__","reason":"500 5.5.2 unrecognized command","pathway":"default","local_addr":"10.0.0.0:12345","timestamp":"1503524959","customer_id":"0","event_id":"1234512354","type":"rejection"}}}
{"msys":{"message_event":{"timestamp":"1503527383","customer_id":"1","msg_proc_state":"awaiting mailfrom","pathway_group":"default","remote_addr":"10.0.0.0:12345","raw_reason":"500 5.5.2 unrecognized command","conn_name":"11/22-12345-1D10E111","event_id":"1234512345","reason":"500 5.5.2 unrecognized command","tenant_id":"__unauthorized__","type":"rejection","error_code":"500","local_addr":"10.0.0.0:12345","recv_method":"esmtp","node_name":"host.domain.com","origination":"unauthorized_attempt","pathway":"default","scope_name":"scriptlet"}}}
{"msys":{"track_event":{"rcpt_to":"user@domain.com","type":"open","rcpt_meta":{ "userMessageId": "123456789" },"campaign_id":"test_campaign","node_name":"host.domain.com","ip_address":"10.0.0.0:12345","customer_id":"1","template_id":"template_1234512345","transmission_id":"1234512345","event_id":"12345122345","user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.8 (KHTML, like Gecko)","message_id":"000074029e597f538c00","accept_language":"en-us","rcpt_tags":[ "testTag" ],"delv_method":"esmtp","template_version":"0","timestamp":"1503527606"}}}
1503676342@@@@M1
1503676402@@@@M1
1503676462@@@@M1
1503676522@@@@M1
1503676582@@@@M1
1503676642@@@@M1
1503676702@@@@M1
1503676402: Marker 1
1503676462: Marker 1
1503676522: Marker 1
1503676582: Marker 1
1503676642: Marker 1
1503676702: Marker 1
1503676762: Marker 1
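
A quick way to sanity-check candidate patterns against these samples is a small script outside Splunk. The sketch below uses Python's re module as a stand-in for Splunk's regex engine. The JSON pattern is the REGEX from the original post; the non-JSON pattern `^\d{10}[@:]` and the `classify` helper are assumptions made up for illustration, and the JSON sample is abbreviated.

```python
import re

# REGEX from the momlog_json_sourcetype transform in the original post.
JSON_RE = re.compile(r'\{\"msys\"')
# Hypothetical pattern for the non-JSON lines: a 10-digit epoch followed by
# "@" or ":". This is an assumption drawn from the samples, not a tested config.
BASIC_RE = re.compile(r'^\d{10}[@:]')

SAMPLES = [
    '1503626401@N@/tmp/12354@@user@1',
    '1503664701@@@@M1',
    # Abbreviated JSON event; the real events carry many more fields.
    '{"msys":{"message_event":{"timestamp":"1503524959","type":"rejection"}}}',
    '1503676402: Marker 1',
]

def classify(line):
    """Return the sourcetype a line would get under these two patterns."""
    if JSON_RE.search(line):
        return 'momlog:json'
    if BASIC_RE.search(line):
        return 'momlog:basic'
    return 'momlog'  # neither pattern matched; keep the input sourcetype

if __name__ == '__main__':
    for line in SAMPLES:
        print(classify(line), '<-', line[:40])
```

If a catch-all such as .* is used for the non-JSON case instead, the explicit pattern is unnecessary, but then the order in which the transforms run matters.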

Re: How to split data into separate sourcetypes with transforms

Builder

The timestamps are correct. Why would the time prefix need a closing quote? It's the prefix of the epoch timestamp.

I tried .* to match, but my config must still be incorrect in props or inputs, as I got only ONE of the JSON logs and none of the sourcetyping was correct.

I tried several different variations of inputs and props; it's just not quite right yet. Close, though.

I updated the original post to reflect all changes made.

Re: How to split data into separate sourcetypes with transforms

SplunkTrust

Clearly, something is wrong with the props TIME_PREFIX not having a closing quote.

I would expect that anything that doesn't match the JSON pattern is non-JSON by definition, so you would just use .*

Re: How to split data into separate sourcetypes with transforms

Esteemed Legend

I would escape all 3 double-quotes (can't hurt).
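
For what it's worth, a quick check outside Splunk suggests the escaped and unescaped forms of the prefix behave the same. This is only a sketch in Python's re module, assumed here to treat an escaped double quote the same way Splunk's regex engine does; the `epoch_after` helper is hypothetical and the event is an abbreviated copy of one sample from this thread.

```python
import re
from datetime import datetime, timezone

# Abbreviated version of one JSON event from the samples in this thread.
EVENT = ('{"msys":{"message_event":{"local_addr":"10.0.0.0:12345",'
         '"timestamp":"1503524959","customer_id":"0"}}}')

PLAIN = r'"timestamp":"'        # TIME_PREFIX as originally posted
ESCAPED = r'\"timestamp\":\"'   # same prefix with all three quotes escaped

def epoch_after(prefix, event):
    """Find the prefix, then read the run of digits right after it (like %s)."""
    m = re.search(prefix + r'(\d+)', event)
    return int(m.group(1)) if m else None

if __name__ == '__main__':
    for prefix in (PLAIN, ESCAPED):
        ts = epoch_after(prefix, EVENT)
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
        print(prefix, ts, when.isoformat())
```

Both prefixes end on the opening quote of the value, so the epoch digits start immediately after the match and no closing quote is involved.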

Re: How to split data into separate sourcetypes with transforms

Builder

OK, I updated the original post with new testing configs. Everything is working EXCEPT sourcetyping. It's not breaking out the sourcetypes; it just uses the one set in inputs.conf, BUT if I remove that, it uses the "too_small" sourcetype. What am I missing? It has to be something simple.

Thanks again!

Re: How to split data into separate sourcetypes with transforms

Motivator

Hi there @tkwaller

Try adding this to your props.conf:

[momlog]
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
TIME_PREFIX = \"timestamp\":\"
# Transforms run left to right: every event is first marked momlog:basic,
# then events matching the JSON regex are overwritten with momlog:json.
TRANSFORMS-sourcetype_routing = momlog_basic_sourcetype, momlog_json_sourcetype

[momlog:basic]
TIME_FORMAT = %s
TIME_PREFIX = ^
LINE_BREAKER = ([\r\n]+)

[momlog:json]
TIME_FORMAT = %s
TIME_PREFIX = \"timestamp\":\"
INDEXED_EXTRACTIONS = JSON

EDITED: Added a few settings to the main sourcetype and fixed the TIME_PREFIX regex for the momlog:json sourcetype.
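
To illustrate why the order of the two transforms matters, here is a toy model in Python. It assumes that transforms in one TRANSFORMS list are applied in the order listed and that a later match overwrites the sourcetype set by an earlier one; the `route` function and the dictionary layout are made up for this sketch and are not Splunk internals.

```python
import re

# Toy model of the two transforms from the original post: name -> (REGEX, new sourcetype).
TRANSFORMS = {
    'momlog_basic_sourcetype': (re.compile(r'.*'), 'momlog:basic'),
    'momlog_json_sourcetype': (re.compile(r'\{\"msys\"'), 'momlog:json'),
}

def route(event, order, default='momlog'):
    """Apply transforms in list order; a later match overwrites an earlier one."""
    sourcetype = default
    for name in order:
        regex, new_sourcetype = TRANSFORMS[name]
        if regex.search(event):
            sourcetype = new_sourcetype
    return sourcetype

BASIC_FIRST = ['momlog_basic_sourcetype', 'momlog_json_sourcetype']
JSON_FIRST = ['momlog_json_sourcetype', 'momlog_basic_sourcetype']

if __name__ == '__main__':
    events = ('1503664701@@@@M1', '{"msys":{"track_event":{"type":"open"}}}')
    for event in events:
        print(route(event, BASIC_FIRST), route(event, JSON_FIRST), event)
```

Under these assumptions, listing the catch-all .* transform first lets the JSON transform override it for JSON events, while the reverse order would leave every event as momlog:basic.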

Re: How to split data into separate sourcetypes with transforms

Builder

I added that, but when I did, it broke formatting: JSON isn't recognized and the sourcetype is still momlog.

Re: How to split data into separate sourcetypes with transforms

Motivator

Please try the above to see if it works now that I've added a few more things.
