Hello
I have an input that monitors a file. This file contains data in multiple formats, including different timestamps. It's messy, but I was thinking I could use a transform to set the sourcetype per event, and then use props for each sourcetype to format the data.
So I did this in inputs.conf:
[monitor:///var/log/this_log/*.ec]
index = main
sourcetype=momlog
Then I created a transforms.conf:
[momlog_json_sourcetype]
DEST_KEY = MetaData:Sourcetype
REGEX = \{\"msys\"
FORMAT = sourcetype::momlog:json
[momlog_basic_sourcetype]
DEST_KEY = MetaData:Sourcetype
REGEX = .*
FORMAT = sourcetype::momlog:basic
I also have a props.conf that looks like this:
[momlog:basic]
TIME_FORMAT = %s
TIME_PREFIX = ^
LINE_BREAKER = ([\r\n]+)
TRANSFORMS-basic = momlog_basic_sourcetype
[momlog:json]
TIME_FORMAT = %s
TIME_PREFIX = "timestamp":"
INDEXED_EXTRACTIONS = JSON
TRANSFORMS-json = momlog_json_sourcetype
My question is this:
What would the regex be for the non-JSON data? Do my inputs and props look correct? I'm testing locally, so I can break things all day long.
Thanks for the assistance.
Hi there @tkwaller
Try adding this to your props.conf
[momlog]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
TIME_PREFIX =\"timestamp\":\"
TRANSFORMS-sourcetype_routing = momlog_basic_sourcetype, momlog_json_sourcetype
[momlog:basic]
TIME_FORMAT = %s
TIME_PREFIX = ^
LINE_BREAKER = ([\r\n]+)
[momlog:json]
TIME_FORMAT = %s
TIME_PREFIX = \"timestamp\":\"
INDEXED_EXTRACTIONS = JSON
EDITED: Added a few things on the main sourcetype and fixed TIME_PREFIX regex for momlog:json sourcetype.
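Why this version routes and the original didn't can be sketched with a simplified model. It is not Splunk code; it assumes that only the TRANSFORMS listed under the props stanza of the event's incoming sourcetype (here momlog, from inputs.conf) run, that they run in the listed order, and that each matching transform overwrites the sourcetype, so the last match wins.

```python
import re

# Simplified, hypothetical model of index-time sourcetype routing (not Splunk
# code). Assumptions: only TRANSFORMS under the props stanza of the event's
# incoming sourcetype run, in listed order, and the last match wins.

# transforms.conf: name -> (REGEX, sourcetype written by FORMAT)
TRANSFORMS = {
    "momlog_json_sourcetype": (r'\{\"msys\"', "momlog:json"),
    "momlog_basic_sourcetype": (r".*", "momlog:basic"),
}

# props.conf from the question: TRANSFORMS sit under the target sourcetypes.
QUESTION_PROPS = {
    "momlog:basic": ["momlog_basic_sourcetype"],
    "momlog:json": ["momlog_json_sourcetype"],
}

# props.conf from the answer: TRANSFORMS sit under the incoming sourcetype.
ANSWER_PROPS = {
    "momlog": ["momlog_basic_sourcetype", "momlog_json_sourcetype"],
}


def route(event, sourcetype, props):
    """Return the sourcetype the event ends up with under this model."""
    result = sourcetype
    for name in props.get(sourcetype, []):
        regex, new_sourcetype = TRANSFORMS[name]
        if re.search(regex, event):
            result = new_sourcetype  # a later match overwrites an earlier one
    return result


for line in ["1503664701@@@@M1", '{"msys":{"track_event":{"type":"open"}}}']:
    print(route(line, "momlog", QUESTION_PROPS), route(line, "momlog", ANSWER_PROPS))
# momlog momlog:basic
# momlog momlog:json
```

Under the same assumptions, listing momlog_json_sourcetype before momlog_basic_sourcetype would send every event to momlog:basic, because .* matches last.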
I added that, but it broke the formatting: JSON isn't recognized and the sourcetype is still momlog.
Please try the above to see if it works now that I've added a few more things.
Yes, that was exactly it. The sourcetype now splits properly and the formatting is correct as well. Thanks everyone for the help!
Glad it worked out, happy splunking!
Hi guys,
I used this and it worked, thanks.
One small question. The JSON I have has characters before it, so I need to get rid of them before I get to the pure JSON. I have done the following, however it takes in the whole line, not just the JSON. Is there a way to get it to take in only the JSON?
Example: 2018-01-10 15:52:03 [metrics-application-1-thread-1] INFO METRIC:41 - {"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z","h":"mx7654vm","pid" ....etc..
Transform
[AMBER_RAW_json_METRIC]
DEST_KEY = MetaData:Sourcetype
REGEX = {"v":"1.0\"
FORMAT = sourcetype::AMBER_RAW:METRIC
Props
[AMBER_RAW:METRIC]
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON
So it takes in the full line, not just the JSON.
Thanks in advance :)
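One way to think about it: the JSON starts at the first "{", so a regex anchored at the start of the line that matches everything up to that brace isolates the plain-text prefix. The Python sketch below only tests that regex logic outside Splunk. It assumes the JSON always begins at the first "{" in the line, and the sample line is a hypothetical shortened copy of the one above (the real event has more fields that were cut off).

```python
import json
import re

# Shortened, hypothetical version of the sample event from the post; the real
# line has more fields after "h" that were cut off in the question.
LINE = ('2018-01-10 15:52:03 [metrics-application-1-thread-1] INFO METRIC:41 - '
        '{"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z","h":"mx7654vm"}')

# Everything before the first "{" is the plain-text log prefix.
PREFIX = re.compile(r"^[^{]*")


def json_part(line):
    """Strip the text before the first '{' and return the JSON remainder."""
    return PREFIX.sub("", line, count=1)


payload = json.loads(json_part(LINE))
print(payload["ts"])  # 2018-01-10T15:52:03.700Z
```

If a prefix could ever contain a "{" itself, this pattern would cut in the wrong place, so check it against real events first.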
Clearly, something is wrong with the props TIME_PREFIX not having a closed quote.
I would expect anything that doesn't match the JSON regex to be non-JSON, so you would just use .*
I would escape all 3 double-quotes (can't hurt).
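In PCRE, which Splunk uses, a backslash before a non-alphanumeric character such as " simply matches that character, so escaping the quotes changes nothing about what the pattern matches. Python's re behaves the same way for these escapes, so the sketch below checks it there. It also shows that the unescaped "." in 1.0 matches any character; NEAR_MISS is a hypothetical line added only to show that.

```python
import re

SAMPLE = '{"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z"}'
NEAR_MISS = '{"v":"1x0"}'  # hypothetical line, only to show what "." matches

# REGEX as posted in the transform (only the last quote escaped).
AS_POSTED = re.compile(r'{"v":"1.0\"')
# Same pattern with the other three double quotes escaped too.
QUOTES_ESCAPED = re.compile(r'{\"v\":\"1.0\"')
# Additionally escaping the dot so it only matches a literal ".".
DOT_ESCAPED = re.compile(r'\{\"v\":\"1\.0\"')

for name, pattern in [("as posted", AS_POSTED),
                      ("quotes escaped", QUOTES_ESCAPED),
                      ("dot escaped", DOT_ESCAPED)]:
    print(name, bool(pattern.search(SAMPLE)), bool(pattern.search(NEAR_MISS)))
# as posted True True
# quotes escaped True True
# dot escaped True False
```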
OK, I updated the original post with new testing configs. Everything is working EXCEPT sourcetyping. It's not breaking out the sourcetypes; it just uses the one set in inputs.conf, BUT if I remove that, it uses the "too_small" sourcetype. What am I missing? It has to be something simple.
Thanks again!
How can we possibly know what REGEX will work if you do not post sample data? In any case, the Palo Alto TA does this, so you can download that app and check it out. It gets data from syslog that is supposed to come in as sourcetype=pan:log and then splits it out into 5 or 6 different sourcetypes based on regex patterns, just like what you are doing.
Well, really, all it has to do is match anything that isn't in JSON format, meaning anything that doesn't match
TIME_PREFIX = "timestamp":"
which is why I didn't add data samples. I can take a look at the app, but I don't think it should be that difficult.
But, just in case:
(this is actually data from several files)
1503626401@N@/tmp/12354@@user@1
1503664701@@@@M1
1503664761@@@@M1
1503664821@@@@M1
1503664881@@@@M1
1503664941@@@@M1
1503665001@@@@M1
1503665061@@@@M1
1503665121@@@@M1
1503665181@@@@M1
1503665241@@@@M1
1503665301@@@@M1
1503665361@@@@M1
1503665421@@@@M1
1503665481@@@@M1
{"msys":{"message_event":{"origination":"unauthorized_attempt","conn_name":"stuff","recv_method":"esmtp","remote_addr":"10.0.0.0:12345","raw_reason":"500 5.5.2 unrecognized command","node_name":"host@domain.com","scope_name":"scriptlet","pathway_group":"default","error_code":"500","msg_proc_state":"awaiting mailfrom","tenant_id":"__unauthorized__","reason":"500 5.5.2 unrecognized command","pathway":"default","local_addr":"10.0.0.0:12345","timestamp":"1503524959","customer_id":"0","event_id":"1234512354","type":"rejection"}}}
{"msys":{"message_event":{"timestamp":"1503527383","customer_id":"1","msg_proc_state":"awaiting mailfrom","pathway_group":"default","remote_addr":"10.0.0.0:12345","raw_reason":"500 5.5.2 unrecognized command","conn_name":"11/22-12345-1D10E111","event_id":"1234512345","reason":"500 5.5.2 unrecognized command","tenant_id":"__unauthorized__","type":"rejection","error_code":"500","local_addr":"10.0.0.0:12345","recv_method":"esmtp","node_name":"host.domain.com","origination":"unauthorized_attempt","pathway":"default","scope_name":"scriptlet"}}}
{"msys":{"track_event":{"rcpt_to":"user@domain.com","type":"open","rcpt_meta":{ "userMessageId": "123456789" },"campaign_id":"test_campaign","node_name":"host.domain.com","ip_address":"10.0.0.0:12345","customer_id":"1","template_id":"template_1234512345","transmission_id":"1234512345","event_id":"12345122345","user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.8 (KHTML, like Gecko)","message_id":"000074029e597f538c00","accept_language":"en-us","rcpt_tags":[ "testTag" ],"delv_method":"esmtp","template_version":"0","timestamp":"1503527606"}}}
1503676342@@@@M1
1503676402@@@@M1
1503676462@@@@M1
1503676522@@@@M1
1503676582@@@@M1
1503676642@@@@M1
1503676702@@@@M1
1503676402: Marker 1
1503676462: Marker 1
1503676522: Marker 1
1503676582: Marker 1
1503676642: Marker 1
1503676702: Marker 1
1503676762: Marker 1
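If you'd rather not depend on transform order, the non-JSON REGEX could exclude JSON explicitly with a negative lookahead, which PCRE supports. The Python sketch below checks a pattern like ^(?!\{\"msys\") against a few of the sample lines above (the JSON one shortened). It assumes every JSON event starts with {"msys", which holds for these samples but may not for other files; confirm in Splunk that the transform fires as expected before relying on it.

```python
import re

# Hypothetical non-JSON REGEX: succeeds at the start of a line unless the
# line begins with {"msys" (the same text the JSON transform looks for).
NOT_JSON = re.compile(r'^(?!\{\"msys\")')
IS_JSON = re.compile(r'\{\"msys\"')

# A few lines from the samples in the post (the JSON line is shortened).
SAMPLES = [
    "1503626401@N@/tmp/12354@@user@1",
    "1503664701@@@@M1",
    "1503676402: Marker 1",
    '{"msys":{"track_event":{"type":"open","timestamp":"1503527606"}}}',
]

for line in SAMPLES:
    # For these samples exactly one of the two patterns matches each line.
    print(bool(NOT_JSON.search(line)), bool(IS_JSON.search(line)), line[:30])
```

With mutually exclusive patterns like these, the order of the two transforms no longer matters.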
The timestamps are correct. Why would the time prefix need a closing quote? It's the prefix of the epoch timestamp.
I tried .* to match, but my config must still be incorrect in props or inputs, as I got only ONE of the JSON logs and none of the sourcetyping was correct.
I've tried several different variations of inputs and props; it's not quite right yet, but close.
I updated the original post to reflect all the changes made.
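On the closing-quote question: TIME_PREFIX is a regex for the text that comes before the timestamp, and the timestamp is read from where that match ends, so the prefix does not need to be balanced. The Python sketch below is a rough, hypothetical emulation of TIME_PREFIX plus TIME_FORMAT = %s (epoch seconds), using the two prefixes from the props in this thread; it is not how Splunk implements timestamp extraction.

```python
import re
from datetime import datetime, timezone

# Rough, hypothetical emulation of TIME_PREFIX + TIME_FORMAT = %s:
# find where the prefix regex match ends, then read epoch seconds there.
JSON_PREFIX = re.compile(r'"timestamp":"')   # from [momlog:json]
BASIC_PREFIX = re.compile(r'^')              # from [momlog:basic]
EPOCH = re.compile(r'\d+')                   # %s: seconds since the epoch


def read_epoch(line, prefix):
    """Return the UTC datetime found right after the prefix match, or None."""
    m = prefix.search(line)
    if m is None:
        return None
    digits = EPOCH.match(line, m.end())  # must start exactly where prefix ends
    if digits is None:
        return None
    return datetime.fromtimestamp(int(digits.group()), tz=timezone.utc)


print(read_epoch('{"msys":{"message_event":{"timestamp":"1503527383","customer_id":"1"}}}', JSON_PREFIX))
print(read_epoch("1503664701@@@@M1", BASIC_PREFIX))
```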