Archive

How to end a Rex search with mutiple characters or a string sub as } }?

Sample data:


{ "active" : "Y“, “locationID" : 75942068, "existsFlag" : true, "manuallyUnarchived" : false, "pendingReminder" : false, "headerOperationType" : "TRN“, “headerCreationDateString" : "2019111307255700“, “headerCreationDateEpoch" : "1573651557“, “jobs" : [ { "jobNumber" : "RWERQ70“, “jobVendorNum" : "ACME“, “jobAcknowledgementDateString" : "2019-11-08:10:42“, “jobAcknowledgementDateEpoch" : "1573231320“, “jobPodDateString" : "2019-11-13:05:44“, “jobPodDateEpoch" : "1573645440“, “jobShipDateString" : "2019-11-08:11:20“, “jobShipDateEpoch" : "1573233600“, “jobStatusCode" : "DELIVERED“, “jobPartNumbers" : [ { "skuMfgNbr" : "AS3452“, “quantity" : 1 } ], "partShippedDescription" : "SHP142SVC" } ], "comments" : [ { "commentType" : "PRB“, “commentDateEpoch" : "1573192800000“, “arrivalWindowStart" : 1573477200000, "arrivalWindowEnd" : 1573858740000, "avsUsed" : "N“, “laborStatusCode" : "ETA Provided“, “partStatusCode" : "Delivered“, “owner" : { "businessUnit" : 0, "certifiedFlag" : false, "techId" : 0 }, "environment" : "None“, “subEnvironment" : "Other“, “shortComment" : "TechDirection : Other“, “dispatchCreationDateEpoch" : "1573230503“, “serviceAttributes" : { "ServiceType" : "FixerUpper“, “OutofHours" : "N“, “OutofWarranty" : "N“, “ServiceHours" : "10x1“, “ADOverrideRequest" : "N" } }, "address" : { "address" : "1 Main St“, “address1" : "1 Main St“, “city" : "Nowhere“, “country" : "US“, “postalCode" : "12345" }

I need a field containing all the text from "activity" all the way to } }, (the double curly brackets separated by a space and followed by a comma, located right before "address" field. I could do this with if a single terminator character ( } ), as in the example below, but that would only give me half of the data needed. I need a Rex that gives me all the data betwen "activity" and the } } (the two curly brackets). The two curly brackets indicate the end of the main field).

This works: | rex field=_raw "\"activity\"(?&ltACTIVITY_FIELDS>[^\}]+)"

This is what I need, but it does not work: | rex field=_raw "\"activity\"(?&ltACTIVITY_FIELDS>[^\}\s\}]+)"

Thanks for any assistance provided.

Tags (1)
0 Karma
1 Solution

Thanks, everybody, for your support.

It turns out my data had line breaks in it, so the solution provided by gcusello worked perfectly on rex101, but failed on my actual search. Nevertheless, I got a hint to add "(?s)" to the beginning of the search. That returned results, but did not terminate at the two }} (I still don't understand why). So I extended the search to the next field name and that worked well. The final solution was:

| rex field=_raw "(?s)"\"activity\"\s+:\s+{\s+(?<ACTIVITY_FIELDS>.*)\"address\"

View solution in original post

0 Karma

Hi @ryanksplunkster ,

Your problem will be solved by using regex provided by @gcusello .

This is just for better practice and suggestions.

If you can properly format your JSON and ingest the data, Splunk will automatically extract all the fields.
And by using spath command you can group json elements and extract the required values without writing regex.

I have formatted your json sample data - https://jsonblob.com/d8185020-06e9-11ea-b8b2-d99444fde766
In this case, the activity can be easily extracted by appending | spath output=ACTIVITY_FIELDS path=activity to your query.
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath

0 Karma

Thanks, everybody, for your support.

It turns out my data had line breaks in it, so the solution provided by gcusello worked perfectly on rex101, but failed on my actual search. Nevertheless, I got a hint to add "(?s)" to the beginning of the search. That returned results, but did not terminate at the two }} (I still don't understand why). So I extended the search to the next field name and that worked well. The final solution was:

| rex field=_raw "(?s)"\"activity\"\s+:\s+{\s+(?<ACTIVITY_FIELDS>.*)\"address\"

View solution in original post

0 Karma

SplunkTrust
SplunkTrust

Hi @ryanksplunkster,
I tried to analyze your sample but the word activity isn't in it, so I cannot test it.

Anyway I see two problems in your regex:

  • } is a special char and must be escaped.
  • the group [^\}]* means all the chars until }, so you cannot insert a group \}\s\} in square brackets.

So, if you want to take text between serviceAttributes and }} you should try something like

serviceAttributes\"\s+:\s+\{\s+(?<my_field>[^\}]*)\}\s\}

You can test it at https://regex101.com/r/gDamxB/1

Ciao.
Giuseppe

0 Karma

SplunkTrust
SplunkTrust

Hi @ryanksplunkster,
try this:

activity\"\s+:\s+\{\s+(?<ACTIVITY_FIELDS>.*)\}\s+\}

that you can test at https://regex101.com/r/kfQ1wR/1

Ciao.
Giuseppe

0 Karma

Hi,

can you please check the given sample data, "activity" keyword is not there.
One more thing, your json data format is not correct, its having tow types of double quotes,
" U+0022 QUOTATION MARK
“ U+201C LEFT DOUBLE QUOTATION MARK

check the difference at first key value pair "Y“

Once your data is in correct format, you can ingest it as json format and use spath command to get values of keys.
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath

0 Karma

Gaurav, thank you for your response. Here's a better data sample (now complete and properly formatted):

"parts" : [ { "Mfg" : "SLF347", "Description" : "OP SYS", "quantity" : 1, "target" : 0, "Name" : "ACME", "quantityAT" : 0, "Mode" : "Manual" }, { "sku" : "8-45612-65423-0", "pDesc" : "INFOSERV", "quantity" : 1, "Name" : "ACME", "Mode" : "Manual" } ], "region" : "US", "scheduled" : false, "LevelEpoch" : "1570750199", "account" : { "channel" : "USPA" }, "activity" : { "dNumber" : 718526123, "build" : 12, "TimeZone" : "GMT-08:00", "description" : "Windows 10", "groupName" : "WestGroup", "id" : "A-2KJLSJRHE", "priority" : "Low", "reasonCode" : "GH5HTH", "recordType" : "Fix", "status" : "DCNR", "statusDescription" : "Def Request", "solution" : "Re-install", "AgencyRev" : false, "statusDateEpoch" : "1573715886", "rCount" : 1, "arrivalTimezone" : "GMT-06:00", "Start" : 1569938400000, "End" : 1571097540000, "Used" : "Y", "laborCode" : "Ackgd", "partCode" : "Ced", "note" : "Issue Description : Re-Install", "owner" : { "BU" : 0, "certFlag" : false, "name" : "John Doe", "Id" : 0 }, "environment" : "None", "subEnvironment" : "Other", "serialNumber" : "ADN0970234529387509237", "UID" : "B-4FJFLL", "DateEpoch" : "1569701589", "repeat" : "N", "servAtt" : { "ServiceType" : "Fix", "ServiceSubType" : "WH", "Billable" : "N", "CompleteUnit" : "N", "OutHour" : "N", "OutWarranty" : "N", "ServHour" : "365x24x7", "Level" : "10", "ProviderId" : "LEK", "ServpProvider" : "LEK", "System" : "Notebook", "Deferred" : "No", "Cover" : "5-10x5", "ScheduleFlag" : "N", "Component" : "N" } }, "address" : { "address" : "1 Main St", "address1" : "1 Main St", "city" : "Nowhere", "country" : "US", "postalCode" : "12345", "region" : "US", "state" : "ZZ", "timeZone" : "GMT-08:00" }

0 Karma
State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!