Getting Data In

Assistance with Extraction (PROPS / TRANSFORMS)

willadams
Contributor

I have an annoying log that I am trying to extract data from, and I am lost as to where to go from here.  What I am trying to extract is as follows:

 

2020-10-02 17:01:32,360 INFO:
User.val (value, value2, value3, value4): User not found. Parameters: userId: 1; requester: userVO: userId: 66666
status: V
username: joe.blogs@someplace.com
authenticationMethod: PASSWORD
emailAddress: joe.blogs@someplace.com
firstName: Joe
middleName:
lastName: Bloggs
displayName: Joe Blogs
createdBy: 123456
dateCreated: 2019-07-02 17:17:29.68
lastUpdatedBy: 66666
dateLastUpdated: 2020-07-20 16:49:30.409
signupCompletedDate: 2019-07-03 14:24:52.389
lastSignInDate: 2020-10-01 19:04:21.787
title: Person
company: Somewhere
addressLine1: 1 This Street
addressLine2:
city: Somewhere
state: ST1
zipCode: 1234
country: ThatCountry
workPhoneNumber:
homePhoneNumber: +001122334455
mobilePhoneNumber:
otherPhoneNumber:
faxNumber:
secretQuestions: []
signInLocked: false
signInFailureCount: 0
signInTotalFailureCount: 0
signInLastFailureDate: <null>
resetPasswordFailureCount: 0
resetPasswordTotalFailureCount: 0
resetPasswordLastFailureDate: <null>
recipientInclusionList:
recipientExclusionList:
allowSMTPInput: false
lastPasswordResetDate: 2019-08-20 15:06:00.856
passwordExpires: true
forcePasswordReset: false
externalUser: false
lastSignInUserName: joe.blogs@someplace.com
lastSignInDomain:
activationCode:
expiryDate: <null>
expiredOn: <null>
lastActivityDate: 2020-10-01 19:07:12.088
autoUnlockCount: 0
manualUnlockRequired: false
selfRegIPAddress: 192.168.0.1
senderRoleExpired: false

externalUser: false
channelType: Web
ipAddress: 10.1.1.1

 

 

 

The first line is the current date (i.e. 2020-10-02 17:01:32,360 INFO:) and this would be used for my indexed time.  Between this user event and the next user event, the log is interspersed with the following garbage:

2020-10-02 16:59:36,409 ERROR:
Mail.send(): (Task ID: x4) Error while sending message:
javax.mail.SendFailedException: Invalid Addresses;
nested exception is:
com.sun.mail.smtp.SMTPAddressFailedException: 501 5.1.3 Invalid address


javax.mail.SendFailedException: Invalid Addresses;
nested exception is:
com.sun.mail.smtp.SMTPAddressFailedException: 501 5.1.3 Invalid address

at com.sun.mail.rcptTo(SMTPTransport.java:1862)
at com.sun.mail.sendMessage(SMTPTransport.java:1118)
at com.neesh.util.Mail.send(Unknown Source)
at com.neesh.fds.util.EmailHelper.sendEmail(Unknown Source)
at com.neesh.fds.core.MailSenderProcess.sendEmail(Unknown Source)
at com.neesh.fds.core.MailSenderProcess.executeHelper(Unknown Source)
at com.neesh.fds.core.AbstractFDSProcess.execute(Unknown Source)
at com.neesh.fds.core.AbstractFDSProcess.startup(Unknown Source)
at com.neesh.fds.core.MailSenderProcess.startup(Unknown Source)
at com.neesh.fds.core.FDSProcessThread.run(Unknown Source)
Caused by: com.sun.mail.smtp.SMTPAddressFailedException: 501 5.1.3 Invalid address

at com.sun.mail.smtp.SMTPTransport.rcptTo(SMTPTransport.java:1715)
... 9 more
2020-10-02 16:59:36,409 WARN:
Mail.send(): (Task ID: x4) Exiting send() with error code: -2

2020-10-02 16:59:36,409 ERROR:
MailSenderProcess.executeHelper(): Invalid Addresses

 

I started by adding the data in and then using the Advanced configuration to try to break this up, starting with BREAK_ONLY_BEFORE_DATE set to true.  This starts to break the log, but then (as expected) it breaks at every date, so the log splits at every field that contains a date (e.g. lastSignInDate, dateCreated, etc.).  The problem here is that the timestamp then gets impacted: the event time for those breaks will be all over the place instead of the first timestamp (i.e. 2020-10-02 17:01:32).

What I would like to do is capture everything between "2020-10-02 17:01:32,360 INFO:" and "ipAddress: 10.1.1.1" (using the example above).  
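One way to sketch this kind of breaking (untested against the real data; the sourcetype name `silly_logs` and the regex are assumptions) is to break only where a new line starts with the full "date, time, log level" header, rather than relying on BREAK_ONLY_BEFORE_DATE:

```ini
# props.conf -- sketch only; sourcetype name and regex are assumptions.
# Breaks events only where a new line begins with the full
# "YYYY-MM-DD HH:MM:SS,mmm LEVEL:" header, not at every date in the body.
[silly_logs]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (?:INFO|WARN|ERROR):)
```

This only controls event breaking; dropping the "garbage" ERROR/WARN events would still be a separate nullQueue filtering step.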

 

The log is a rolling log, so it is constantly being written to.  I would also like to get rid of the garbage, but I have not yet tried sending events to nullQueue to remove them before ingest.

 

There is no recognised sourcetype, nor does the product have any TAs on Splunkbase, so I am effectively trying to create a new TA for this data source.

 

Thank you for any assistance.

0 Karma
1 Solution

gcusello
SplunkTrust

Hi @willadams,

Good that you solved your problem.

For the new problem it would be better to open a new case, but anyway, let me make sure I understand your new question:

you want to keep only the events containing the word "INFO", is that correct?

And I am not sure whether you want to add another filter to exclude some other events, or to delete a part of the INFO events.

If you want to exclude other events (e.g. the ones containing "CleanupProcess"), you could add another rule to the props and transforms, something like this:

in props.conf, add

TRANSFORMS-set = setnull,kept_logs,add_filter

in transforms.conf, add

[add_filter]
REGEX = CleanupProcess
DEST_KEY = queue
FORMAT = nullQueue

If instead you want to delete a part of the INFO events, you have to use the SEDCMD option in props.conf.  For example, to delete the parts of events containing "lastPasswordResetDate: 2019-08-20 15:06:00.856", "dateLastUpdated: 2020-07-20 16:49:30.409", "signupCompletedDate: 2019-07-03 14:24:52.389", and "lastSignInDate: 2020-10-01 19:04:21.787", you could use in props.conf:

SEDCMD-mask_events = s/lastPasswordResetDate: 2019-08-20 15:06:00\.856|dateLastUpdated: 2020-07-20 16:49:30\.409|signupCompletedDate: 2019-07-03 14:24:52\.389|lastSignInDate: 2020-10-01 19:04:21\.787//g

Obviously, the regex in SEDCMD has to be verified.
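The SEDCMD above is tied to those exact timestamps; a more general sketch (unverified; field names taken from the sample log) strips the values of those four date fields whatever the dates happen to be:

```ini
# props.conf -- sketch only; blanks the values of the four inner date
# fields in _raw at index time, keeping the "key:" labels themselves
SEDCMD-strip_inner_dates = s/(lastPasswordResetDate|dateLastUpdated|signupCompletedDate|lastSignInDate): \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+/\1:/g
```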

Finally, you mention field extraction: remember that field extraction is done after filtering (filtering happens at index time, extraction at search time), so you cannot filter or delete parts of events after indexing.

Ciao.

Giuseppe

 

View solution in original post

0 Karma

willadams
Contributor

I have been able to filter some of the events, and at least it looks like I am going in the right direction.  Adding it back for reference, this is what I have done so far:

 

PROPS

[silly_logs]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = true
NO_BINARY_CHECK = true
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20
TRANSFORMS-set = setnull,kept_logs

 

TRANSFORMS

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[kept_logs]
REGEX = ^.+INFO:
DEST_KEY = queue
FORMAT = indexQueue
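As a quick sanity check outside Splunk, the [kept_logs] pattern can be exercised in Python against made-up events modeled on the excerpts in this thread (the sample strings are assumptions, not real log data):

```python
import re

# Same pattern as the [kept_logs] transform above
keep = re.compile(r"^.+INFO:")

# Hypothetical multi-line events, modeled on the log excerpts above
events = [
    "2020-10-02 17:01:32,360 INFO:\nUser.val (value, value2): User not found.",
    "2020-10-02 16:59:36,409 ERROR:\nMail.send(): (Task ID: x4) Error while sending message:",
    "2020-10-02 16:59:36,409 WARN:\nMail.send(): (Task ID: x4) Exiting send() with error code: -2",
]

# Without re.MULTILINE, ^ anchors at the start of the whole event and
# . does not cross newlines, so only events whose FIRST line contains
# "INFO:" survive the filter
kept = [e for e in events if keep.search(e)]
print(len(kept))  # 1
```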

 

This has been able to get rid of all the stuff I don't want and keep just the INFO logs.  The biggest problem I have now is trying to remove other INFO events that are not useful, and also doing some DELIMs.  I tried adding FIELD_DELIMITER = : to props but this didn't seem to do anything.  I also tried adding "REPORT-extract = myextract" to props with the associated transforms stanza (i.e. [myextract] with DELIMS = ":").

 

This didn't work and I am stuck.  My log now shows as follows:

 

2020-10-02 17:01:32,360 INFO:
User.val (value, value2, value3, value4): User not found. Parameters: userId: 1; requester: userVO: userId: 66666
status: V
username: joe.blogs@someplace.com
authenticationMethod: PASSWORD
emailAddress: joe.blogs@someplace.com
firstName: Joe
middleName:
lastName: Bloggs
displayName: Joe Blogs
createdBy: 123456
dateCreated: 2019-07-02 17:17:29.68
lastUpdatedBy: 66666
dateLastUpdated: 2020-07-20 16:49:30.409
signupCompletedDate: 2019-07-03 14:24:52.389
lastSignInDate: 2020-10-01 19:04:21.787
title: Person
company: Somewhere
addressLine1: 1 This Street
addressLine2:
city: Somewhere
state: ST1
zipCode: 1234
country: ThatCountry
workPhoneNumber:
homePhoneNumber: +001122334455
mobilePhoneNumber:
otherPhoneNumber:
faxNumber:
secretQuestions: []
signInLocked: false
signInFailureCount: 0
signInTotalFailureCount: 0
signInLastFailureDate: <null>
resetPasswordFailureCount: 0
resetPasswordTotalFailureCount: 0
resetPasswordLastFailureDate: <null>
recipientInclusionList:
recipientExclusionList:
allowSMTPInput: false
lastPasswordResetDate: 2019-08-20 15:06:00.856
passwordExpires: true
forcePasswordReset: false
externalUser: false
lastSignInUserName: joe.blogs@someplace.com
lastSignInDomain:
activationCode:
expiryDate: <null>
expiredOn: <null>
lastActivityDate: 2020-10-01 19:07:12.088
autoUnlockCount: 0
manualUnlockRequired: false
selfRegIPAddress: 192.168.0.1
senderRoleExpired: false

externalUser: false
channelType: Web
ipAddress: 10.1.1.1

 

As well as

2020-10-02 17:06:48,123 INFO:

 Helper.word(): Purging range: (123456, 123654)

 

And

2020-10-02 17:09:48,123 INFO:

 Helper.loadObjects(): Username does not exist. mystique

 

2020-10-02 18:01:48,546 INFO:

CleanupProcess.executeHelper(): Running cleanup process for Silly 1.2.3.4000 ...

 

I want to be able to adjust my props to remove the items with "CleanupProcess" or "Purging range" but keep the valid data, as well as the "Helper.loadObjects(): Username does not exist..." values.  I also want to be able to extract the fields from the event based on ":", but also, going back to the main log, ignore the other fields that contain dates in them (i.e. "lastPasswordResetDate: 2019-08-20 15:06:00.856", "dateLastUpdated: 2020-07-20 16:49:30.409", "signupCompletedDate: 2019-07-03 14:24:52.389", "lastSignInDate: 2020-10-01 19:04:21.787").

 

I suspect I would need to extract the date fields specifically (maybe using rex) and maybe strptime them to get around the ":" delimiter problem this may cause (once the DELIMS is sorted).
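For the "key: value" lines, one search-time approach (a sketch only; the stanza names are assumptions and the regex would need verifying against real events) is a REPORT-based regex extraction instead of DELIMS, so a date value stays attached to its own key rather than being split on every ":":

```ini
# props.conf
[silly_logs]
REPORT-kv = silly_kv

# transforms.conf -- extract each "key: value" pair on its own line;
# the value runs to end of line, so dates survive their inner colons
[silly_kv]
REGEX = (?m)^(\w+):[ \t]*(.+?)[ \t]*$
FORMAT = $1::$2
```

Note that empty values (e.g. middleName:) are skipped by this regex; `(.*)` instead of `(.+?)` would capture them as empty strings.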


Any help appreciated

0 Karma


gcusello
SplunkTrust

Hi @willadams,

Good that you solved your problems.

Ciao and happy splunking.

Giuseppe

P.S.: Karma Points are appreciated 😉

0 Karma

willadams
Contributor

Thanks @gcusello.  I just have to fiddle with the separate nulls, but I'm almost there.  If need be I will log another community question.

0 Karma

willadams
Contributor

Hi @gcusello 

 

Thanks, it was a good puzzle to solve.  The exclusion was as per the latter comments (exclude other events like the cleanup process).  It didn't occur to me to just re-use the nullQueue approach with a REGEX to remove that content.  I will give it another crack and see how that goes.  Thanks!

0 Karma

gcusello
SplunkTrust

Hi @willadams,

let me know if you need other help.

Anyway, if the answer solves your initial need, please accept it for the other people in the Community.

Ciao and good splunking.

Giuseppe

P.S.: Karma Points are appreciated 😉

0 Karma

gcusello
SplunkTrust

Hi @willadams,

there are many dates in your log, so you cannot use BREAK_ONLY_BEFORE_DATE; instead, try to identify your timestamp by putting this in your props.conf:

TIME_PREFIX = ^
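TIME_PREFIX = ^ alone only tells Splunk where to start looking; pinning the format and lookahead as well should keep it from ever reading the later dates in the body (a sketch, with the format taken from the sample log's "2020-10-02 17:01:32,360" header):

```ini
# props.conf -- anchor timestamp recognition to the start of the event
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 23
```

MAX_TIMESTAMP_LOOKAHEAD = 23 covers exactly the 23 characters of "YYYY-MM-DD HH:MM:SS,mmm", so nothing past the header is considered.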

Ciao.

Giuseppe 

0 Karma