Getting Data In

How to split data into separate sourcetypes with transforms

Builder

Hello

I have an input that monitors a file. This file contains data in multiple formats, including different timestamp formats. It's bad, but I was thinking I could use a transform to set the sourcetype, and then use props for each sourcetype to format the data.
So I did this in inputs.conf:

[monitor:///var/log/this_log/*.ec]
index = main
sourcetype=momlog

Then I created a transforms.conf:

[momlog_json_sourcetype]
DEST_KEY = MetaData:Sourcetype
REGEX = \{\"msys\"
FORMAT = sourcetype::momlog:json


[momlog_basic_sourcetype]
DEST_KEY = MetaData:Sourcetype
REGEX = .*
FORMAT = sourcetype::momlog:basic

I also have a props.conf that looks like this:

[momlog:basic]
TIME_FORMAT = %s
TIME_PREFIX = ^
LINE_BREAKER = ([\r\n]+)
TRANSFORMS-basic = momlog_basic_sourcetype

[momlog:json]
TIME_FORMAT = %s
TIME_PREFIX = "timestamp":"
INDEXED_EXTRACTIONS = JSON
TRANSFORMS-json = momlog_json_sourcetype

My question is this:
What would the regex be for the NON-JSON data? Do inputs and props look correct? I'm testing locally, so I can break things all day long.

Thanks for the assistance.

Motivator

Hi there @tkwaller

Try adding this to your props.conf

 [momlog]
 SHOULD_LINEMERGE=false
 NO_BINARY_CHECK=true
 TIME_PREFIX =\"timestamp\":\"
 TRANSFORMS-sourcetype_routing = momlog_basic_sourcetype, momlog_json_sourcetype

 [momlog:basic]
 TIME_FORMAT = %s
 TIME_PREFIX = ^
 LINE_BREAKER = ([\r\n]+)

 [momlog:json]
 TIME_FORMAT = %s
 TIME_PREFIX = \"timestamp\":\"
 INDEXED_EXTRACTIONS = JSON

EDITED: Added a few things on the main sourcetype and fixed TIME_PREFIX regex for momlog:json sourcetype.
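
A rough way to reason about the routing above is to simulate it. The Python sketch below assumes that Splunk applies the transforms listed in a TRANSFORMS- setting from left to right and that a later write to MetaData:Sourcetype replaces an earlier one. Under that assumption the catch-all ".*" transform labels every event momlog:basic, and the JSON transform then relabels only the {"msys" lines. It also reflects why the transforms now sit under [momlog]: that is the sourcetype inputs.conf assigns, so it is presumably the props stanza consulted when these events are parsed, whereas transforms attached to [momlog:basic] or [momlog:json] would only fire for events that already carry those sourcetypes. The dictionary, function name, and sample lines are illustrative only.

```python
import re

# Illustrative stand-ins for the two transforms.conf stanzas:
# stanza name -> (REGEX, sourcetype written via FORMAT)
TRANSFORMS = {
    "momlog_basic_sourcetype": (re.compile(r".*"), "momlog:basic"),
    "momlog_json_sourcetype": (re.compile(r'\{"msys"'), "momlog:json"),
}

# Order as listed in the TRANSFORMS- setting of the [momlog] stanza.
ROUTING_ORDER = ["momlog_basic_sourcetype", "momlog_json_sourcetype"]


def route_sourcetype(event: str, initial: str = "momlog") -> str:
    """Apply each transform in order; a later match overwrites the earlier sourcetype."""
    sourcetype = initial
    for name in ROUTING_ORDER:
        regex, new_sourcetype = TRANSFORMS[name]
        if regex.search(event):  # unanchored search, like a transforms REGEX
            sourcetype = new_sourcetype
    return sourcetype


samples = [
    "1503664701@@@@M1",
    '{"msys":{"message_event":{"timestamp":"1503524959","type":"rejection"}}}',
    "1503676402: Marker 1",
]

for line in samples:
    print(route_sourcetype(line), "<=", line[:30])
```

Swapping the two names in ROUTING_ORDER would send every line, JSON included, to momlog:basic, because the catch-all would then run last.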

Builder

I added that, but when I did, it broke the formatting: JSON isn't recognized and the sourcetype is still momlog.

Motivator

Please try the above to see if it works now that I've added a few more things.

Builder

Yes, that was exactly it. The sourcetype now splits properly and the formatting is correct as well. Thanks everyone for the help!

Motivator

Glad it worked out, happy splunking!

Motivator

Hi guys,
I used this and it worked, thanks.

One small question. The JSON I have has some characters in front of it, so I need to strip them off to get to pure JSON. I have done the following, but it takes in the whole line, not just the JSON. Is there a way to get it to take in only the JSON?

Example - 2018-01-10 15:52:03 [metrics-application-1-thread-1] INFO METRIC:41 - {"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z","h":"mx7654vm","pid" ....etc..

Transform
[AMBER_RAW_json_METRIC]
DEST_KEY = MetaData:Sourcetype
REGEX = {"v":"1.0\"
FORMAT = sourcetype::AMBER_RAW:METRIC

Props
[AMBER_RAW:METRIC]
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

So it takes in the full line, not just the JSON.

Thanks in Advance:)
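
Rewriting the sourcetype only changes metadata; it does not trim the event text, which is why the whole line is kept. Below is a minimal Python check of a substitution that drops everything before the first "{", the same expression you could try in a props.conf SEDCMD setting (for example SEDCMD-strip_prefix = s/^[^{]+//). The class name strip_prefix is made up, and whether SEDCMD runs early enough for INDEXED_EXTRACTIONS to see the trimmed event is not confirmed in this thread, so test it locally. The sample event is the one from the post, shortened to its first four fields so the JSON is complete.

```python
import json
import re

# Hypothetical, shortened copy of the example event: the real payload has more
# fields after "h", which are left out here so the JSON object is complete.
line = (
    "2018-01-10 15:52:03 [metrics-application-1-thread-1] INFO METRIC:41 - "
    '{"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z","h":"mx7654vm"}'
)

# Same substitution as the sed expression s/^[^{]+//: remove everything
# before the first "{" so only the JSON object is left.
trimmed = re.sub(r"^[^{]+", "", line)

payload = json.loads(trimmed)
print(payload["ts"])  # 2018-01-10T15:52:03.700Z
```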

SplunkTrust

Clearly, something is wrong with the TIME_PREFIX in props: it doesn't have a closing quote.

I would expect that anything that doesn't match the json would therefore be non-json, so you would just use .*
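
The .* suggestion works only because the JSON transform runs after it and overwrites the result. If you would rather have a non-JSON regex that stands on its own regardless of order, a negative lookahead is one option; PCRE, which Splunk uses, supports lookaheads, and Python's re module behaves the same way for this simple pattern. The pattern and sample lines below are an illustrative sketch, not a tested Splunk config.

```python
import re

# Catch-all: matches every line, so it relies on transform order.
MATCH_ALL = re.compile(r".*")
# Hypothetical order-independent alternative: succeeds only when the line
# does NOT begin with {"msys" (the ^ anchor stops it retrying later in the line).
NOT_JSON = re.compile(r'^(?!\{"msys")')

lines = [
    "1503626401@N@/tmp/12354@@user@1",
    "1503676762: Marker 1",
    '{"msys":{"track_event":{"type":"open","timestamp":"1503527606"}}}',
]

for line in lines:
    all_hit = MATCH_ALL.search(line) is not None
    non_json_hit = NOT_JSON.search(line) is not None
    print(f"{line[:32]:<34} match_all={all_hit} not_json={non_json_hit}")
```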

Esteemed Legend

I would escape all 3 double-quotes (can't hurt).
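
In PCRE, and in Python's re, a double quote is not a metacharacter, so escaping it is harmless but does not change the match. The characters that do matter in the posted REGEX are ".", which matches any character, and "{", which is safest escaped because some engines read it as the start of a quantifier. This Python sketch compares the posted pattern with a fully escaped one; the second test line is a made-up near miss.

```python
import re

# REGEX as posted in the transform: the unescaped "." in 1.0 matches any character.
posted = re.compile(r'{"v":"1.0\"')
# Same pattern with "{", all three double quotes, and "." escaped.
escaped = re.compile(r'\{\"v\":\"1\.0\"')

event = 'INFO METRIC:41 - {"v":"1.0","t":"MTR"}'
lookalike = 'INFO METRIC:41 - {"v":"170","t":"MTR"}'  # made-up near miss

for name, rx in (("posted", posted), ("escaped", escaped)):
    print(name, rx.search(event) is not None, rx.search(lookalike) is not None)
```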

Builder

OK, I updated the original post with new testing configs. Everything is working EXCEPT sourcetyping. It's not breaking out the sourcetypes; it just uses the one set in inputs.conf, BUT if I remove that, it uses the "too_small" sourcetype. What am I missing? It has to be something simple.

Thanks again!

Esteemed Legend

How can we possibly know what REGEX will work if you do not post sample data? In any case, the Palo Alto TA does this, so you can download that app and check it all out. It takes data from syslog that comes in as sourcetype=pan:log and then splits it out into 5 or 6 different sourcetypes based on regex patterns, just like what you are doing.

Builder

Well, really, all it has to do is match anything that isn't JSON format, meaning anything that ISN'T

TIME_PREFIX = "timestamp":"

which is why I didn't add the data samples. I can take a look at the app, but I don't think it should really be that difficult.

But just in case, here is some sample data
(this is actually data from several files):

1503626401@N@/tmp/12354@@user@1
1503664701@@@@M1
1503664761@@@@M1
1503664821@@@@M1
1503664881@@@@M1
1503664941@@@@M1
1503665001@@@@M1
1503665061@@@@M1
1503665121@@@@M1
1503665181@@@@M1
1503665241@@@@M1
1503665301@@@@M1
1503665361@@@@M1
1503665421@@@@M1
1503665481@@@@M1
{"msys":{"message_event":{"origination":"unauthorized_attempt","conn_name":"stuff","recv_method":"esmtp","remote_addr":"10.0.0.0:12345","raw_reason":"500 5.5.2 unrecognized command","node_name":"host@domain.com","scope_name":"scriptlet","pathway_group":"default","error_code":"500","msg_proc_state":"awaiting mailfrom","tenant_id":"__unauthorized__","reason":"500 5.5.2 unrecognized command","pathway":"default","local_addr":"10.0.0.0:12345","timestamp":"1503524959","customer_id":"0","event_id":"1234512354","type":"rejection"}}}
{"msys":{"message_event":{"timestamp":"1503527383","customer_id":"1","msg_proc_state":"awaiting mailfrom","pathway_group":"default","remote_addr":"10.0.0.0:12345","raw_reason":"500 5.5.2 unrecognized command","conn_name":"11/22-12345-1D10E111","event_id":"1234512345","reason":"500 5.5.2 unrecognized command","tenant_id":"__unauthorized__","type":"rejection","error_code":"500","local_addr":"10.0.0.0:12345","recv_method":"esmtp","node_name":"host.domain.com","origination":"unauthorized_attempt","pathway":"default","scope_name":"scriptlet"}}}
{"msys":{"track_event":{"rcpt_to":"user@domain.com","type":"open","rcpt_meta":{ "userMessageId": "123456789" },"campaign_id":"test_campaign","node_name":"host.domain.com","ip_address":"10.0.0.0:12345","customer_id":"1","template_id":"template_1234512345","transmission_id":"1234512345","event_id":"12345122345","user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.8 (KHTML, like Gecko)","message_id":"000074029e597f538c00","accept_language":"en-us","rcpt_tags":[ "testTag" ],"delv_method":"esmtp","template_version":"0","timestamp":"1503527606"}}}
1503676342@@@@M1
1503676402@@@@M1
1503676462@@@@M1
1503676522@@@@M1
1503676582@@@@M1
1503676642@@@@M1
1503676702@@@@M1
1503676402: Marker 1
1503676462: Marker 1
1503676522: Marker 1
1503676582: Marker 1
1503676642: Marker 1
1503676702: Marker 1
1503676762: Marker 1
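
The sample contains three layouts: epoch@... lines, "epoch: Marker" lines, and {"msys"...} JSON lines. With TIME_FORMAT = %s, each TIME_PREFIX only has to land immediately before the epoch digits: ^ for the two plain formats, and "timestamp":" for JSON, where the trailing quote is the one that opens the value rather than one that needs closing. The Python sketch below roughly imitates that lookup; the regexes, the event_time function, and the shortened JSON sample are illustrative, not Splunk's timestamp processor.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Rough imitation of TIME_PREFIX followed by TIME_FORMAT = %s (epoch seconds).
BASIC_TIME = re.compile(r"^(\d+)")              # TIME_PREFIX = ^
JSON_TIME = re.compile(r'"timestamp":"(\d+)')   # TIME_PREFIX = "timestamp":"


def event_time(line: str) -> Optional[datetime]:
    """Pick the prefix by line type and convert the epoch digits to UTC."""
    rx = JSON_TIME if line.startswith('{"msys"') else BASIC_TIME
    m = rx.search(line)
    if m is None:
        return None
    return datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)


samples = [
    "1503664701@@@@M1",
    "1503676402: Marker 1",
    '{"msys":{"message_event":{"conn_name":"stuff","timestamp":"1503524959"}}}',
]

for line in samples:
    print(event_time(line), "<=", line[:30])
```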

Builder

Timestamps are correct. Why would the time prefix need a closing quote? It's the prefix of the epoch timestamp.

I tried .* as the match, but my config must still be incorrect in props or inputs, as I got ONE of the JSON logs and none of the sourcetyping was correct.

I tried several different variations of inputs and props; it's just not quite right yet. Close, though.

I updated the original post to reflect all the changes made.
