I have an annoying log that I am trying to extract data from, and I am lost and don't know where to go from here. What I am trying to extract is as follows:
2020-10-02 17:01:32,360 INFO: externalUser: false
|
The first line is the current date (i.e. 2020-10-02 17:01:32,360 INFO: ) and this would be used for my indexed time. Between this user event and the next user event, the log is interspersed with the following garbage:
2020-10-02 16:59:36,409 ERROR:
at com.sun.mail.rcptTo(SMTPTransport.java:1862)
at com.sun.mail.smtp.SMTPTransport.rcptTo(SMTPTransport.java:1715)
2020-10-02 16:59:36,409 ERROR: |
I started by adding the data and then using the Advanced configuration to try to break it up, starting with BREAK_ONLY_BEFORE_DATE set to true. This starts to break the log but then (as expected) breaks at every date, so the log is split at every field that has a date (e.g. lastSignInDate, dateCreated, etc.). The problem is that the timestamp is then affected: the time is read from whichever date starts each fragment, so the indexed time for that break will be all over the place instead of being the first time (i.e. 2020-10-02 17:01:32).
What I would like to do is capture everything between "2020-10-02 17:01:32,360 INFO:" and "ipAddress: 10.1.1.1" (using the example above).
The log is a rolling log so it is constantly being written to. I would also like to get rid of the garbage but have not tried doing NULLs to remove events before ingest.
There is no recognised sourcetype, nor does the product have any TAs on Splunkbase, so I am effectively trying to create a new TA for this data source.
Thank you for any assistance.
Hi @willadams,
good that you solved your problem.
About the new problem, it would be better to open a new question, but anyway, let me understand your new question:
you want to keep only the events containing the word "INFO", is that correct?
Then I don't understand whether you want to add another filter to exclude some other events or to delete a part of the INFO events.
If you want to exclude other events (e.g. the ones containing "CleanupProcess"), you could add another rule to the props and transforms, something like this:
in props.conf, add
TRANSFORMS-set = setnull,kept_logs,add_filter
in transforms.conf, add
[add_filter]
REGEX = CleanupProcess
DEST_KEY = queue
FORMAT = nullQueue
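Putting the pieces together, here is a minimal sketch of how the three rules could sit side by side. It assumes the sourcetype is [silly_logs] as in the question, and that [kept_logs] is meant to route to the standard indexQueue (the stanza posted in the question shows FORMAT = silly_index, so adjust if that value is intentional). The "Purging range" alternative in [add_filter] is an illustrative addition and is case-sensitive.

```ini
# props.conf
[silly_logs]
# Rules run left to right; the last rule that matches decides the queue.
TRANSFORMS-set = setnull,kept_logs,add_filter

# transforms.conf
# 1) Send everything to the nullQueue by default.
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

# 2) Bring the INFO events back (assumption: standard indexQueue).
[kept_logs]
REGEX = ^.+INFO:
DEST_KEY = queue
FORMAT = indexQueue

# 3) Drop the INFO events that are not wanted after all.
[add_filter]
REGEX = CleanupProcess|Purging range
DEST_KEY = queue
FORMAT = nullQueue
```

The order in TRANSFORMS-set matters here: add_filter has to come after kept_logs, otherwise kept_logs would route the CleanupProcess events back to the index.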
If instead you want to delete a part of the INFO events, you have to use the SEDCMD option in props.conf: e.g. to delete the part of events containing "lastPasswordResetDate: 2019-08-20 5:06:00.856", "dateLastUpdated: 2020-07-20 16:49:30.409", "signupCompletedDate: 2019-07-03 14:24:52.389", "lastSignInDate: 2020-10-01 19:04:21.787", you could use in props.conf:
SEDCMD-mask_events = s/\"lastPasswordResetDate: 2019-08-20 5:06:00.856\", \"dateLastUpdated: 2020-07-20 16:49:30.409\", \"signupCompletedDate: 2019-07-03 14:24:52.389\", \"lastSignInDate: 2020-10-01 19:04:21.787\"//g
Obviously the regex in SEDCMD has to be verified.
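As a possible starting point for that verification, the literal values could be replaced by a pattern, so that the rule does not depend on one specific date. This is only a sketch: the class name strip_date_fields is hypothetical, and the regex assumes each field appears in the raw event as "name: YYYY-MM-DD H:MM:SS.mmm", as in the values quoted above.

```ini
# props.conf
[silly_logs]
# Remove the four date fields from _raw at index time.
# \d{1,2} for the hour because the sample shows "5:06:00.856".
SEDCMD-strip_date_fields = s/(lastPasswordResetDate|dateLastUpdated|signupCompletedDate|lastSignInDate): \d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}\.\d+//g
```

It is worth testing this on a sample file in a test index first, because whatever SEDCMD removes is gone from the indexed event.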
At the end you speak of extracting fields: remember that field extraction is done at search time, after filtering, so you cannot filter or delete parts of events after indexing.
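For the extraction itself, a search-time sketch could look like the following. The names kv and silly_kv are hypothetical, and the regex is an untested assumption about the event layout: it expects "name: value" pairs where the value is either a date-time or a single token without spaces or colons. With a FORMAT of $1::$2, the field name comes from the first capture group and the value from the second.

```ini
# props.conf (on the search head)
[silly_logs]
REPORT-kv = silly_kv

# transforms.conf
[silly_kv]
# Group 1 = field name, group 2 = value.
# First alternative keeps date-times such as "2020-10-01 19:04:21.787" whole.
# "INFO:" is skipped because the token that follows it ends in a colon.
REGEX = (\w+):\s+(\d{4}-\d{2}-\d{2} [\d:.]+|[^\s:|]+)(?=\s|$)
FORMAT = $1::$2
```

Messages such as "Helper.loadObjects(): Username does not exist. mystique" are not matched by this pattern and would need their own extraction.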
Ciao.
Giuseppe
I have been able to filter some of the events and at least it looks like I am going in the right direction. Adding to the question, this is what I have done so far:
PROPS
[silly_logs]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = true
NO_BINARY_CHECK = true
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20
TRANSFORMS-set = setnull,kept_logs
TRANSFORMS
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[kept_logs]
REGEX = ^.+INFO:
DEST_KEY = queue
FORMAT = silly_index
This has been able to get rid of all the stuff I don't want and keep just the INFO logs. The biggest problem I have now is to try and remove other INFO fields that are not useful and also do some DELIMS. I tried adding FIELD_DELIMITER=: to PROPS but this didn't seem to do anything. I also tried adding "REPORT-extract=myextract" to props together with the associated transforms stanza (i.e. [myextract] DELIMS=:).
This didn't work and I am stuck. My log now shows as follows
2020-10-02 17:01:32,360 INFO: externalUser: false |
As well as
2020-10-02 17:06:48,123 INFO: Helper.word(): Purging range: (123456, 123654) |
And
2020-10-02 17:09:48,123 INFO: Helper.loadObjects(): Username does not exist. mystique |
2020-10-02 18:01:48,546 INFO: CleanupProcess.executeHelper(): Running cleanup process for Silly 1.2.3.4000 ... |
I want to be able to adjust my PROPS to remove the items with "CleanupProcess" or "Purging range" but keep the valid data as well as the "Helper.loadObjects(): Username does not exist..." values. I also want to be able to extract the fields from the event based on ":" but, going back to the main log, ignore the other fields that contain dates (i.e. "lastPasswordResetDate: 2019-08-20 5:06:00.856", "dateLastUpdated: 2020-07-20 16:49:30.409", "signupCompletedDate: 2019-07-03 14:24:52.389", "lastSignInDate: 2020-10-01 19:04:21.787").
I suspect I would need to extract the date fields specifically (maybe using REX) and maybe strptime them to get around the ":" delim problem that this may cause (once the DELIM is sorted).
Any help appreciated
Hi @willadams,
good that you solved your problems.
ciao and happy splunking.
Giuseppe
P.S.: Karma Points are appreciated 😉
Thanks @gcusello. I just have to fiddle with separate nulls but I am almost there. If need be, I will log another community question.
Hi @gcusello
Thanks, it was a good puzzle to solve. The exclusion was as per the latter comments (exclude other events like the Cleanup process). It didn't occur to me to just re-use the null with the REGEX to remove that content. I will give it another crack and see how that goes. Thanks!
Hi @willadams,
let me know if you need other help.
Anyway, if the answer solves your initial need, please accept it for the benefit of the other people in the Community.
Ciao and good splunking.
Giuseppe
P.S.: Karma Points are appreciated 😉
Hi @willadams,
there are many dates in your log, so you cannot use BREAK_ONLY_BEFORE_DATE. Try instead to identify your timestamp by using this in your props.conf:
TIME_PREFIX = ^
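A fuller sketch of the same idea, in case it helps. It assumes every real event starts on a new line with "YYYY-MM-DD HH:MM:SS,mmm LEVEL:" as in your samples, while the dates inside fields (lastSignInDate etc.) never appear at the start of a line in that exact form. The regex and the values are my assumptions and should be checked against the real file.

```ini
# props.conf
[silly_logs]
# Break only where a newline is followed by the leading timestamp and a log level.
# The captured newline is discarded; the lookahead is not consumed.
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} [A-Z]+:)
# Read the timestamp from the start of the event only.
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
# "2020-10-02 17:01:32,360" is 23 characters long.
MAX_TIMESTAMP_LOOKAHEAD = 23
```

With SHOULD_LINEMERGE = false, the BREAK_ONLY_BEFORE_DATE setting no longer takes part in event breaking, so the other dates in the event cannot split it.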
Ciao.
Giuseppe