Sample data:
{ "active" : "Y“, “locationID" : 75942068, "existsFlag" : true, "manuallyUnarchived" : false, "pendingReminder" : false, "headerOperationType" : "TRN“, “headerCreationDateString" : "2019111307255700“, “headerCreationDateEpoch" : "1573651557“, “jobs" : [ { "jobNumber" : "RWERQ70“, “jobVendorNum" : "ACME“, “jobAcknowledgementDateString" : "2019-11-08:10:42“, “jobAcknowledgementDateEpoch" : "1573231320“, “jobPodDateString" : "2019-11-13:05:44“, “jobPodDateEpoch" : "1573645440“, “jobShipDateString" : "2019-11-08:11:20“, “jobShipDateEpoch" : "1573233600“, “jobStatusCode" : "DELIVERED“, “jobPartNumbers" : [ { "skuMfgNbr" : "AS3452“, “quantity" : 1 } ], "partShippedDescription" : "SHP142SVC" } ], "comments" : [ { "commentType" : "PRB“, “commentDateEpoch" : "1573192800000“, “arrivalWindowStart" : 1573477200000, "arrivalWindowEnd" : 1573858740000, "avsUsed" : "N“, “laborStatusCode" : "ETA Provided“, “partStatusCode" : "Delivered“, “owner" : { "businessUnit" : 0, "certifiedFlag" : false, "techId" : 0 }, "environment" : "None“, “subEnvironment" : "Other“, “shortComment" : "TechDirection : Other“, “dispatchCreationDateEpoch" : "1573230503“, “serviceAttributes" : { "ServiceType" : "FixerUpper“, “OutofHours" : "N“, “OutofWarranty" : "N“, “ServiceHours" : "10x1“, “ADOverrideRequest" : "N" } }, "address" : { "address" : "1 Main St“, “address1" : "1 Main St“, “city" : "Nowhere“, “country" : "US“, “postalCode" : "12345" }
I need a field containing all the text from "activity" all the way to } }, (the double curly brackets separated by a space and followed by a comma, located right before "address" field. I could do this with if a single terminator character ( } ), as in the example below, but that would only give me half of the data needed. I need a Rex that gives me all the data betwen "activity" and the } } (the two curly brackets). The two curly brackets indicate the end of the main field).
This works: | rex field=_raw "\"activity\"(?<ACTIVITY_FIELDS>[^\}]+)"
This is what I need, but it does not work: | rex field=_raw "\"activity\"(?<ACTIVITY_FIELDS>[^\}\s\}]+)"
Thanks for any assistance provided.
Thanks, everybody, for your support.
It turns out my data had line breaks in it, so the solution provided by gcusello worked perfectly on rex101, but failed on my actual search. Nevertheless, I got a hint to add "(?s)" to the beginning of the search. That returned results, but did not terminate at the two }} (I still don't understand why). So I extended the search to the next field name and that worked well. The final solution was:
| rex field=_raw "(?s)"\"activity\"\s+:\s+{\s+(?<ACTIVITY_FIELDS>.*)\"address\"
Hi @ryanksplunkster ,
Your problem will be solved by using regex provided by @gcusello .
This is just for better practice and suggestions.
If you can properly format your JSON and ingest the data, Splunk will automatically extract all the fields.
And by using spath
command you can group json elements and extract the required values without writing regex.
I have formatted your json sample data - https://jsonblob.com/d8185020-06e9-11ea-b8b2-d99444fde766
In this case, the activity
can be easily extracted by appending | spath output=ACTIVITY_FIELDS path=activity
to your query.
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath
Thanks, everybody, for your support.
It turns out my data had line breaks in it, so the solution provided by gcusello worked perfectly on rex101, but failed on my actual search. Nevertheless, I got a hint to add "(?s)" to the beginning of the search. That returned results, but did not terminate at the two }} (I still don't understand why). So I extended the search to the next field name and that worked well. The final solution was:
| rex field=_raw "(?s)"\"activity\"\s+:\s+{\s+(?<ACTIVITY_FIELDS>.*)\"address\"
Hi @ryanksplunkster,
I tried to analyze your sample but the word activity
isn't in it, so I cannot test it.
Anyway I see two problems in your regex:
}
is a special char and must be escaped.[^\}]*
means all the chars until }
, so you cannot insert a group \}\s\}
in square brackets.So, if you want to take text between serviceAttributes
and }}
you should try something like
serviceAttributes\"\s+:\s+\{\s+(?<my_field>[^\}]*)\}\s\}
You can test it at https://regex101.com/r/gDamxB/1
Ciao.
Giuseppe
Hi @ryanksplunkster,
try this:
activity\"\s+:\s+\{\s+(?<ACTIVITY_FIELDS>.*)\}\s+\}
that you can test at https://regex101.com/r/kfQ1wR/1
Ciao.
Giuseppe
Hi,
can you please check the given sample data, "activity" keyword is not there.
One more thing, your json data format is not correct, its having tow types of double quotes,
" U+0022 QUOTATION MARK
“ U+201C LEFT DOUBLE QUOTATION MARK
check the difference at first key value pair "Y“
Once your data is in correct format, you can ingest it as json format and use spath
command to get values of keys.
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath
Gaurav, thank you for your response. Here's a better data sample (now complete and properly formatted):
"parts" : [ { "Mfg" : "SLF347", "Description" : "OP SYS", "quantity" : 1, "target" : 0, "Name" : "ACME", "quantityAT" : 0, "Mode" : "Manual" }, { "sku" : "8-45612-65423-0", "pDesc" : "INFOSERV", "quantity" : 1, "Name" : "ACME", "Mode" : "Manual" } ], "region" : "US", "scheduled" : false, "LevelEpoch" : "1570750199", "account" : { "channel" : "USPA" }, "activity" : { "dNumber" : 718526123, "build" : 12, "TimeZone" : "GMT-08:00", "description" : "Windows 10", "groupName" : "WestGroup", "id" : "A-2KJLSJRHE", "priority" : "Low", "reasonCode" : "GH5HTH", "recordType" : "Fix", "status" : "DCNR", "statusDescription" : "Def Request", "solution" : "Re-install", "AgencyRev" : false, "statusDateEpoch" : "1573715886", "rCount" : 1, "arrivalTimezone" : "GMT-06:00", "Start" : 1569938400000, "End" : 1571097540000, "Used" : "Y", "laborCode" : "Ackgd", "partCode" : "Ced", "note" : "Issue Description : Re-Install", "owner" : { "BU" : 0, "certFlag" : false, "name" : "John Doe", "Id" : 0 }, "environment" : "None", "subEnvironment" : "Other", "serialNumber" : "ADN0970234529387509237", "UID" : "B-4FJFLL", "DateEpoch" : "1569701589", "repeat" : "N", "servAtt" : { "ServiceType" : "Fix", "ServiceSubType" : "WH", "Billable" : "N", "CompleteUnit" : "N", "OutHour" : "N", "OutWarranty" : "N", "ServHour" : "365x24x7", "Level" : "10", "ProviderId" : "LEK", "ServpProvider" : "LEK", "System" : "Notebook", "Deferred" : "No", "Cover" : "5-10x5", "ScheduleFlag" : "N", "Component" : "N" } }, "address" : { "address" : "1 Main St", "address1" : "1 Main St", "city" : "Nowhere", "country" : "US", "postalCode" : "12345", "region" : "US", "state" : "ZZ", "timeZone" : "GMT-08:00" }