Splunk Search

How to end a Rex search with mutiple characters or a string sub as } }?

ryanksplunkster
Explorer

Sample data:


{ "active" : "Y“, “locationID" : 75942068, "existsFlag" : true, "manuallyUnarchived" : false, "pendingReminder" : false, "headerOperationType" : "TRN“, “headerCreationDateString" : "2019111307255700“, “headerCreationDateEpoch" : "1573651557“, “jobs" : [ { "jobNumber" : "RWERQ70“, “jobVendorNum" : "ACME“, “jobAcknowledgementDateString" : "2019-11-08:10:42“, “jobAcknowledgementDateEpoch" : "1573231320“, “jobPodDateString" : "2019-11-13:05:44“, “jobPodDateEpoch" : "1573645440“, “jobShipDateString" : "2019-11-08:11:20“, “jobShipDateEpoch" : "1573233600“, “jobStatusCode" : "DELIVERED“, “jobPartNumbers" : [ { "skuMfgNbr" : "AS3452“, “quantity" : 1 } ], "partShippedDescription" : "SHP142SVC" } ], "comments" : [ { "commentType" : "PRB“, “commentDateEpoch" : "1573192800000“, “arrivalWindowStart" : 1573477200000, "arrivalWindowEnd" : 1573858740000, "avsUsed" : "N“, “laborStatusCode" : "ETA Provided“, “partStatusCode" : "Delivered“, “owner" : { "businessUnit" : 0, "certifiedFlag" : false, "techId" : 0 }, "environment" : "None“, “subEnvironment" : "Other“, “shortComment" : "TechDirection : Other“, “dispatchCreationDateEpoch" : "1573230503“, “serviceAttributes" : { "ServiceType" : "FixerUpper“, “OutofHours" : "N“, “OutofWarranty" : "N“, “ServiceHours" : "10x1“, “ADOverrideRequest" : "N" } }, "address" : { "address" : "1 Main St“, “address1" : "1 Main St“, “city" : "Nowhere“, “country" : "US“, “postalCode" : "12345" }

I need a field containing all the text from "activity" all the way to } }, (the double curly brackets separated by a space and followed by a comma, located right before "address" field. I could do this with if a single terminator character ( } ), as in the example below, but that would only give me half of the data needed. I need a Rex that gives me all the data betwen "activity" and the } } (the two curly brackets). The two curly brackets indicate the end of the main field).

This works: | rex field=_raw "\"activity\"(?&ltACTIVITY_FIELDS>[^\}]+)"

This is what I need, but it does not work: | rex field=_raw "\"activity\"(?&ltACTIVITY_FIELDS>[^\}\s\}]+)"

Thanks for any assistance provided.

Tags (1)
0 Karma
1 Solution

ryanksplunkster
Explorer

Thanks, everybody, for your support.

It turns out my data had line breaks in it, so the solution provided by gcusello worked perfectly on rex101, but failed on my actual search. Nevertheless, I got a hint to add "(?s)" to the beginning of the search. That returned results, but did not terminate at the two }} (I still don't understand why). So I extended the search to the next field name and that worked well. The final solution was:

| rex field=_raw "(?s)"\"activity\"\s+:\s+{\s+(?<ACTIVITY_FIELDS>.*)\"address\"

View solution in original post

0 Karma

gaurav_maniar
Builder

Hi @ryanksplunkster ,

Your problem will be solved by using regex provided by @gcusello .

This is just for better practice and suggestions.

If you can properly format your JSON and ingest the data, Splunk will automatically extract all the fields.
And by using spath command you can group json elements and extract the required values without writing regex.

I have formatted your json sample data - https://jsonblob.com/d8185020-06e9-11ea-b8b2-d99444fde766
In this case, the activity can be easily extracted by appending | spath output=ACTIVITY_FIELDS path=activity to your query.
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath

0 Karma

ryanksplunkster
Explorer

Thanks, everybody, for your support.

It turns out my data had line breaks in it, so the solution provided by gcusello worked perfectly on rex101, but failed on my actual search. Nevertheless, I got a hint to add "(?s)" to the beginning of the search. That returned results, but did not terminate at the two }} (I still don't understand why). So I extended the search to the next field name and that worked well. The final solution was:

| rex field=_raw "(?s)"\"activity\"\s+:\s+{\s+(?<ACTIVITY_FIELDS>.*)\"address\"

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @ryanksplunkster,
I tried to analyze your sample but the word activity isn't in it, so I cannot test it.

Anyway I see two problems in your regex:

  • } is a special char and must be escaped.
  • the group [^\}]* means all the chars until }, so you cannot insert a group \}\s\} in square brackets.

So, if you want to take text between serviceAttributes and }} you should try something like

serviceAttributes\"\s+:\s+\{\s+(?<my_field>[^\}]*)\}\s\}

You can test it at https://regex101.com/r/gDamxB/1

Ciao.
Giuseppe

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @ryanksplunkster,
try this:

activity\"\s+:\s+\{\s+(?<ACTIVITY_FIELDS>.*)\}\s+\}

that you can test at https://regex101.com/r/kfQ1wR/1

Ciao.
Giuseppe

0 Karma

gaurav_maniar
Builder

Hi,

can you please check the given sample data, "activity" keyword is not there.
One more thing, your json data format is not correct, its having tow types of double quotes,
" U+0022 QUOTATION MARK
“ U+201C LEFT DOUBLE QUOTATION MARK

check the difference at first key value pair "Y“

Once your data is in correct format, you can ingest it as json format and use spath command to get values of keys.
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath

0 Karma

ryanksplunkster
Explorer

Gaurav, thank you for your response. Here's a better data sample (now complete and properly formatted):

"parts" : [ { "Mfg" : "SLF347", "Description" : "OP SYS", "quantity" : 1, "target" : 0, "Name" : "ACME", "quantityAT" : 0, "Mode" : "Manual" }, { "sku" : "8-45612-65423-0", "pDesc" : "INFOSERV", "quantity" : 1, "Name" : "ACME", "Mode" : "Manual" } ], "region" : "US", "scheduled" : false, "LevelEpoch" : "1570750199", "account" : { "channel" : "USPA" }, "activity" : { "dNumber" : 718526123, "build" : 12, "TimeZone" : "GMT-08:00", "description" : "Windows 10", "groupName" : "WestGroup", "id" : "A-2KJLSJRHE", "priority" : "Low", "reasonCode" : "GH5HTH", "recordType" : "Fix", "status" : "DCNR", "statusDescription" : "Def Request", "solution" : "Re-install", "AgencyRev" : false, "statusDateEpoch" : "1573715886", "rCount" : 1, "arrivalTimezone" : "GMT-06:00", "Start" : 1569938400000, "End" : 1571097540000, "Used" : "Y", "laborCode" : "Ackgd", "partCode" : "Ced", "note" : "Issue Description : Re-Install", "owner" : { "BU" : 0, "certFlag" : false, "name" : "John Doe", "Id" : 0 }, "environment" : "None", "subEnvironment" : "Other", "serialNumber" : "ADN0970234529387509237", "UID" : "B-4FJFLL", "DateEpoch" : "1569701589", "repeat" : "N", "servAtt" : { "ServiceType" : "Fix", "ServiceSubType" : "WH", "Billable" : "N", "CompleteUnit" : "N", "OutHour" : "N", "OutWarranty" : "N", "ServHour" : "365x24x7", "Level" : "10", "ProviderId" : "LEK", "ServpProvider" : "LEK", "System" : "Notebook", "Deferred" : "No", "Cover" : "5-10x5", "ScheduleFlag" : "N", "Component" : "N" } }, "address" : { "address" : "1 Main St", "address1" : "1 Main St", "city" : "Nowhere", "country" : "US", "postalCode" : "12345", "region" : "US", "state" : "ZZ", "timeZone" : "GMT-08:00" }

0 Karma
Get Updates on the Splunk Community!

Index This | Divide 100 by half. What do you get?

November 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this ...

Stay Connected: Your Guide to December Tech Talks, Office Hours, and Webinars!

❄️ Celebrate the season with our December lineup of Community Office Hours, Tech Talks, and Webinars! ...

Splunk and Fraud

Watch Now!Watch an insightful webinar where we delve into the innovative approaches to solving fraud using the ...