Splunk Search

How to fix field extraction issues with regex?

iamsplunker
Communicator

Hello Splunk Community, 

I'm trying to extract the following fields from CloudWatch events: 1) region 2) arn 3) startTime 4) endTime 5) eventTypeCode 6) latestDescription. The regex works fine in regex101, but it doesn't extract all the field values in Splunk.

For example: | rex field=_raw "region":\s(?P<_region>"\w+-\w+-\d)"

The rex above extracts only the us-east-1 region, even though the data contains multiple regions. Please help me extract the fields listed above.

sample event:

2020-02-10T17:42:41.088Z 775ab4c6-ccc3-600b-9c84-124320628f00 {"records": [{"value": {"successfulSetoflog": [{"awsAccountId": "123456789123", "event": {"arn": "arn:aws:health:us-east-........................................................


jotne
Builder

For this type of data, you can use the extract command. To make it work, first remove the part before the first {. (That prefix can be saved to a field if needed.)

| makeresults
| eval _raw="2020-02-10T17:42:41.088Z 775ab4c6-ccc3-600b-9c84-124320628f00 {\"records\": [{\"value\": {\"successfulSetoflog\": [{\"awsAccountId\": \"123456789123\", \"event\": {\"arn\": \"arn:aws:health:us-east-1::event/RDS/AWS_RDS_AURORA_SOFTWARE_BACKUP_SCHEDULED/AWS_RDS_AURORA_SOFTWARE_BACKUP_SCHEDULED_SOFTWARE_BACKUP_SCHEDULED\", \"eventTypeCategory\": \"scheduledChange\", \"region\": \"us-east-2\", \"startTime\": \"2020-01-20 04:33:00+00:00\", \"endTime\": \"2020-01-22 04:33:00+00:00\", \"lastUpdatedTime\": \"2020-02-22 02:05:17.689000+00:00\", \"statusCode\": \"current\", \"eventStatusCode\": \"NUMBER_SPECIFIC\"}, \"eventTypeCode\": \"AWS_DATABASE_SOFTWARE_UPDATE_AVAILABLE\", \"eventDescription\": {\"latestDescription\": \"We are contacting you to inform you that one or more of your Amazon authena instances listed in the 'Affected resources' tab are scheduled to receive maintenance on the mentioned hardware between 2020-03-10 04:33 UTC (thursday) and2020-03-10 07:33UTC (thursday). The exact time of the maintenance will be determined by the DB instance if you have any questions or concerns, contact AWS Premium Support. \n\nhttp://aws.amazon.com/support\"}}], \"failedSet\": [], \"ResponseMetatype\": {\"RequestId\": \"yz0c12d7-s44d-8b65-k883-f233rb4cb70c\", \"HTTPStatusCode\": 500, \"HTTPHeaders\": {\"x-amzn-requestid\": \"105ab4c6-ccc3-999b-9c84-999320628f00 \", \"context-type\": \"application/x-dvz-json-2.1\", \"content-length\": \"4000\", \"date\": \"Tue, 10 Jan 2020 11:11:11 GMT\"}, \"RetryAttempts\": 0}, \"detail-type\": \"AWS API Health Event\"}}]}"
| rex mode=sed "s/^[^{]+//"
| extract
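
If the leading timestamp and request ID are still needed, one way to keep them is to capture them before the sed step strips the prefix. This is only a minimal sketch: the field names log_time and request_id are hypothetical, the shortened _raw is made up for illustration, and it assumes the prefix is always two whitespace-separated tokens, as in the sample event.

```spl
| makeresults
| eval _raw = "2020-02-10T17:42:41.088Z 775ab4c6-ccc3-600b-9c84-124320628f00 {\"records\": []}"
| rex "^(?<log_time>\S+)\s+(?<request_id>\S+)"
| rex mode=sed "s/^[^{]+//"
| table log_time request_id _raw
```

The first rex copies the two prefix tokens into their own fields; the sed rex then trims _raw down to the JSON part, so later extraction steps see only the JSON.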

yuanliu
SplunkTrust

I always tell people: do not treat structured data as text. You'll regret it later. Use spath to unpack JSON, and use mvexpand to flatten a JSON array into separate events.

 

| rex "^[^{]+ (?<data>{.+})"
| spath input=data path=records{}
| mvexpand records{}
| spath input=records{}
| spath input=records{} path=value.successfulSetoflog{}
| mvexpand value.successfulSetoflog{}
| spath input=value.successfulSetoflog{}
| fields - data records{} value.successfulSetoflog{}.* value.successfulSetoflog{}

 

The sample data will give you

awsAccountId: 123456789123
event.arn: arn:aws:health:us-east-1::event/RDS/AWS_RDS_AURORA_SOFTWARE_BACKUP_SCHEDULED/AWS_RDS_AURORA_SOFTWARE_BACKUP_SCHEDULED_SOFTWARE_BACKUP_SCHEDULED
event.endTime: 2020-01-22 04:33:00+00:00
event.eventStatusCode: NUMBER_SPECIFIC
event.eventTypeCategory: scheduledChange
event.lastUpdatedTime: 2020-02-22 02:05:17.689000+00:00
event.region: us-east-2
event.startTime: 2020-01-20 04:33:00+00:00
event.statusCode: current
eventDescription.latestDescription: We are contacting you to inform you that one or more of your Amazon authena instances listed in the 'Affected resources' tab are scheduled to receive maintenance on the mentioned hardware between 2020-03-10 04:33 UTC (thursday) and2020-03-10 07:33UTC (thursday). The exact time of the maintenance will be determined by the DB instance if you have any questions or concerns, contact AWS Premium Support. http://aws.amazon.com/support
eventTypeCode: AWS_DATABASE_SOFTWARE_UPDATE_AVAILABLE
value.ResponseMetatype.HTTPHeaders.content-length: 4000
value.ResponseMetatype.HTTPHeaders.context-type: application/x-dvz-json-2.1
value.ResponseMetatype.HTTPHeaders.date: Tue, 10 Jan 2020 11:11:11 GMT
value.ResponseMetatype.HTTPHeaders.x-amzn-requestid: 105ab4c6-ccc3-999b-9c84-999320628f00
value.ResponseMetatype.HTTPStatusCode: 500
value.ResponseMetatype.RequestId: yz0c12d7-s44d-8b65-k883-f233rb4cb70c
value.ResponseMetatype.RetryAttempts: 0
value.detail-type: AWS API Health Event
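
To see the spath and mvexpand pattern in isolation, here is a minimal sketch on made-up data. The two-element records array and the field name record are assumptions for illustration only; they are not taken from the real event.

```spl
| makeresults
| eval _raw = "{\"records\": [{\"region\": \"us-east-1\"}, {\"region\": \"eu-west-2\"}]}"
| spath path=records{} output=record
| mvexpand record
| spath input=record
| table region
```

The first spath puts each array element into the multivalue field record, mvexpand turns each element into its own result row, and the second spath unpacks the keys of that element, so region ends up with one value per row.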

iamsplunker
Communicator

@yuanliu Thanks for your response. Is the query you provided just an example?
Would you mind sharing an example query that unpacks the fields I listed in my question?


yuanliu
SplunkTrust

In the output above, you will notice that "arn" is the subnode event.arn, "region" is the subnode event.region, and so on; "eventTypeCode" is just the node eventTypeCode, and "latestDescription" is the subnode eventDescription.latestDescription.

If you only want to see these fields, you can use the fields or table command to list them, e.g.,

 

| rex "^[^{]+ (?<data>{.+})"
| spath input=data path=records{}
| mvexpand records{}
| spath input=records{}
| spath input=records{} path=value.successfulSetoflog{}
| mvexpand value.successfulSetoflog{}
| spath input=value.successfulSetoflog{}
| fields - data records{} value.successfulSetoflog{}.* value.successfulSetoflog{} _time
| fields event.arn event.region event.startTime event.endTime eventTypeCode eventDescription.latestDescription

 

With your sample data, this gives the following listing:

event.arn: arn:aws:health:us-east-1::event/RDS/AWS_RDS_AURORA_SOFTWARE_BACKUP_SCHEDULED/AWS_RDS_AURORA_SOFTWARE_BACKUP_SCHEDULED_SOFTWARE_BACKUP_SCHEDULED
event.region: us-east-2
event.startTime: 2020-01-20 04:33:00+00:00
event.endTime: 2020-01-22 04:33:00+00:00
eventTypeCode: AWS_DATABASE_SOFTWARE_UPDATE_AVAILABLE
eventDescription.latestDescription: We are contacting you to inform you that one or more of your Amazon authena instances listed in the 'Affected resources' tab are scheduled to receive maintenance on the mentioned hardware between 2020-03-10 04:33 UTC (thursday) and2020-03-10 07:33UTC (thursday). The exact time of the maintenance will be determined by the DB instance if you have any questions or concerns, contact AWS Premium Support. http://aws.amazon.com/support
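
Since table is mentioned as an alternative to fields, here is a rough sketch of that variant. It is meant to replace the last fields line of the pipeline above, not to run on its own, and the shortened names on the right-hand side of rename are only a suggestion.

```spl
| table event.arn event.region event.startTime event.endTime eventTypeCode eventDescription.latestDescription
| rename event.arn AS arn event.region AS region event.startTime AS startTime event.endTime AS endTime eventDescription.latestDescription AS latestDescription
```

table fixes the column order and drops everything else, including internal fields; rename then gives the columns the short names used in the question.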

Here is an emulation of your sample event that you can play with and compare with real data:

 

| makeresults
| eval _raw = "2020-02-10T17:42:41.088Z 775ab4c6-ccc3-600b-9c84-124320628f00 {\"records\": [{\"value\": {\"successfulSetoflog\": [{\"awsAccountId\": \"123456789123\", \"event\": {\"arn\": \"arn:aws:health:us-east-1::event/RDS/AWS_RDS_AURORA_SOFTWARE_BACKUP_SCHEDULED/AWS_RDS_AURORA_SOFTWARE_BACKUP_SCHEDULED_SOFTWARE_BACKUP_SCHEDULED\", \"eventTypeCategory\": \"scheduledChange\", \"region\": \"us-east-2\", \"startTime\": \"2020-01-20 04:33:00+00:00\", \"endTime\": \"2020-01-22 04:33:00+00:00\", \"lastUpdatedTime\": \"2020-02-22 02:05:17.689000+00:00\", \"statusCode\": \"current\", \"eventStatusCode\": \"NUMBER_SPECIFIC\"}, \"eventTypeCode\": \"AWS_DATABASE_SOFTWARE_UPDATE_AVAILABLE\", \"eventDescription\": {\"latestDescription\": \"We are contacting you to inform you that one or more of your Amazon authena instances listed in the 'Affected resources' tab are scheduled to receive maintenance on the mentioned hardware between 2020-03-10 04:33 UTC (thursday) and2020-03-10 07:33UTC (thursday). The exact time of the maintenance will be determined by the DB instance if you have any questions or concerns, contact AWS Premium Support. \\n\\nhttp://aws.amazon.com/support\"}}], \"failedSet\": [], \"ResponseMetatype\": {\"RequestId\": \"yz0c12d7-s44d-8b65-k883-f233rb4cb70c\", \"HTTPStatusCode\": 500, \"HTTPHeaders\": {\"x-amzn-requestid\": \"105ab4c6-ccc3-999b-9c84-999320628f00 \", \"context-type\": \"application/x-dvz-json-2.1\", \"content-length\": \"4000\", \"date\": \"Tue, 10 Jan 2020 11:11:11 GMT\"}, \"RetryAttempts\": 0}, \"detail-type\": \"AWS API Health Event\"}}]}"
``` data emulation above ```
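
For completeness, if rex is still preferred for a quick check, a few details of the original attempt may be worth revisiting. As far as I know, rex keeps only the first match per event unless max_match is set, double quotes inside the quoted regex need a backslash, and field names that begin with an underscore (such as _region) are treated as internal fields and are hidden from the field list by default. A minimal sketch on made-up data follows; the two-region JSON is an assumption for illustration, not the real event.

```spl
| makeresults
| eval _raw = "{\"a\": {\"region\": \"us-east-1\"}, \"b\": {\"region\": \"eu-west-2\"}}"
| rex max_match=0 "\"region\":\s*\"(?<region>[a-z]+(?:-[a-z]+)+-\d+)\""
| table region
```

With max_match=0, region comes back as a multivalue field holding every match in the event. This does not keep each region paired with its own arn or startTime, which is why the spath approach above is the safer choice for nested JSON.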

 
