Getting Data In

Is there a way to replace _raw at search time with EXTRACT or REPORT?

maciep
Champion

Hi all,

Currently on 6.5.2, but hopefully upgrading to 7.x in the next few months.

I have some data that is basically a

<timestamp> <component> <json payload>

I'm wondering if there is a way that I can replace the _raw with just the <json payload> at search time. I know I can do it with EVAL/replace in props, but I'm hoping to do it before that. The end goal is to have the entire event be json by the time auto kv runs, so that Splunk will parse out all of the json fields.

I have no problem extracting the json payload, but I'm struggling to find the process to actually replace _raw with EXTRACT or REPORT (both run before Auto KV). If i use EXTRACT with a named capture group of _raw, then splunk will rename it to just raw...so that doesn't help.

With REPORT, the DEST_KEY isn't available at search time, so I can't use that. I've tried using a named capture group there too with CLEAN_KEYS = false, but nothing gets extracted in that case. I tried capturing the json and then using FORMAT = _raw::$1 ... but that didn't seem to work either.

Am I missing something obvious? Is this possible?

Worst case, I'll try to extract that json fields with REPORT, but unless the json is pretty flat, the extracted fields aren't the easiest to work with (parent/child relationships get lost). Or maybe I'll just do it at index time and extract the component then too.

Maybe someday Splunk will be able to automatically parse json/xml/etc when that structured data is just in an event (not the entire event)?

EDIT
Adding a sample event.

Mar 26 13:44:57 myserver java {"userName":null,"enterpriseName":null,"type":"UPDATE","entityType":"letters","entityVersion":null,"assoicatedEvent":true,"eventReceivedTime":1522071897164,"entities":[{"children":null,"parentType":"letters","entityScope":"GLOBAL","lastUpdatedBy":"GUID","lastUpdatedDate":1522071896000,"creationDate":1518127649000,"alreadyMarkedForUnavailable":false,"unavailableTimestamp":0,"letters":[],"name":"server","location":null,"description":null,"productVersion":"version","status":"UP","lastStateChange":1521206768829,"disks":[{"children":null,"parentType":"letters","entityScope":"GLOBAL","name":"name:","unit":"GB","used":0.10546875,"available":0.90234375,"ID":"GUID","parentID":"GUID","externalID":null,"size":1.0087890625}],"owner":"GUID","ID":"GUID","parentID":"GUID","externalID":null,"address":IP","managementIP":"IP","BGPPeers":null,"messages":null,"peakMemoryUsage":74.54,"averageMemoryUsage":74.54,"currentMemoryUsage":74.54,"peakCPUUsage":37.79,"averageCPUUsage":25.45,"currentCPUUsage":25.57}],"diffMap":{"EntityKey ( id = GUID    entityClassType = letters     )":{"added":null,"deleted":null,"modified":{"currentCPUUsage":{"oldValue":"25.26","newValue":"25.57"},"peakCPUUsage":{"oldValue":"37.78","newValue":"37.79"}}}},"ignoreDiffInMediationEvents":false,"updateMechanism":"DEFAULT","requestID":null,"sourceEnterpriseID":null}

jpass
Contributor

I think I have the same issue. Basically the event is not recognized by splunk as valid json becuase of the string before your json object: "Mar 26 13:44:57 myserver java". You may also need to remove \" if these are what is in your _raw event text.

  • The solution I ended up with was to have the application team modify the log format so the string of fields that starts before the json object was included within the json object itself so as to deliver valid json to splunk.

  • Another option is use index time feature: ingest_eval:

  • ingest_eval - Modify _raw at index time to remove or re-arrange _raw

see: https://docs.splunk.com/Documentation/Splunk/7.2.6/Data/IngestEval

props.conf
[mysourcetype]
TRANSFORMS = evalatindextime

transforms.conf (remove string)
[evalatindextime]
ingest_eval = _raw=replace(_raw,"^(.*?)({.*)$))","\2")

You could also use REPLACE() here to, instead of removing the text, capture parts of _raw and then compile each part in a way that produces valid JSON.

[evalatindextime]
ingest_eval = _raw=replace(_raw,"^([a-zA-Z]{3}\s\d{2}:\d{2}:\d{2})\s(.?)\s(.?)({)(.*)$","\4\"timestamp\":\"\1\",\"server\":\"\2\":\"type\":\"\3\",\5")

  • another issue I had was my main event text was a json object nested inside the main json object. I'm using the Splunk http event collector & splunk java logging btw.

The issue I had was the nested json object had "\ around values & fields and the object itself had quotes around it. I replace() the \" with " and removed the quotes around the nested object.

replace(_raw,"\\\"","\"")

  • finally I had to also remove the " around the nested json object.

Here's what my bad events looked like:
{
"field":"value",
"field2":"value2",
"message":"12:50:43.222 Username Company address {\"nestedfield1\":\"nestedvalue1\",\"nestedfield3\",:\"nestedvalue3\" }"
}

Here's what my good event looks like after using REPLACE()

{
"field":"value",
"field2":"value2",
"message":{"timestamp":"12:50:43.222","username":"foo","company":"mycompany","address":"an address here","nestedfield1":"nestedvalue1","nestedfield2":"nestedvalue2","nestedfield3","nestedvalue3"}
}

Splunk parses out the top level fields/values fine (field1,field2,message). But the json object in message was not being parsed.

After replacing "\ with " and removing the " around the the nested json object all worked as expected.

woodcock
Esteemed Legend

Try using a calcluated field like this:

[YourSourcetypeHere]
EVAL-_raw = replace(_raw, "^[^\{]+", "")
0 Karma

maciep
Champion

this also happens after splunk's auto kv functionality which sort of defeats the overall goal having Splunk parse the json for me like it does for kv_mode=json on all full json events.

https://docs.splunk.com/Documentation/Splunk/7.2.6/Knowledge/Searchtimeoperationssequence

0 Karma

woodcock
Esteemed Legend

You can do this at index-time with SEDCMD like this:

[YourSourcetypeHere]
SEDCMD-json_only = s/^[^\{]+//

Deploy to your Indexers, restart Splunk there, send in fresh events, profit!

0 Karma

maciep
Champion

Thanks Woodcock. I know it can be done at index time. More than anything, I'm trying to confirm here that it is not possible to do at search time before Splunk does its kv processing.

0 Karma

woodcock
Esteemed Legend

Why can't you just add this:

| rex mode=sed "s/^[^\{]+//"
| kv

Like this:

| makeresults 
| eval _raw="Mar 26 13:44:57 myserver java {\"userName\":null,\"enterpriseName\":null,\"type\":\"UPDATE\",\"entityType\":\"letters\",\"entityVersion\":null,\"assoicatedEvent\":true,\"eventReceivedTime\":1522071897164,\"entities\":[{\"children\":null,\"parentType\":\"letters\",\"entityScope\":\"GLOBAL\",\"lastUpdatedBy\":\"GUID\",\"lastUpdatedDate\":1522071896000,\"creationDate\":1518127649000,\"alreadyMarkedForUnavailable\":false,\"unavailableTimestamp\":0,\"letters\":[],\"name\":\"server\",\"location\":null,\"description\":null,\"productVersion\":\"version\",\"status\":\"UP\",\"lastStateChange\":1521206768829,\"disks\":[{\"children\":null,\"parentType\":\"letters\",\"entityScope\":\"GLOBAL\",\"name\":\"name:\",\"unit\":\"GB\",\"used\":0.10546875,\"available\":0.90234375,\"ID\":\"GUID\",\"parentID\":\"GUID\",\"externalID\":null,\"size\":1.0087890625}],\"owner\":\"GUID\",\"ID\":\"GUID\",\"parentID\":\"GUID\",\"externalID\":null,\"address\":IP\",\"managementIP\":\"IP\",\"BGPPeers\":null,\"messages\":null,\"peakMemoryUsage\":74.54,\"averageMemoryUsage\":74.54,\"currentMemoryUsage\":74.54,\"peakCPUUsage\":37.79,\"averageCPUUsage\":25.45,\"currentCPUUsage\":25.57}],\"diffMap\":{\"EntityKey ( id = GUID    entityClassType = letters     )\":{\"added\":null,\"deleted\":null,\"modified\":{\"currentCPUUsage\":{\"oldValue\":\"25.26\",\"newValue\":\"25.57\"},\"peakCPUUsage\":{\"oldValue\":\"37.78\",\"newValue\":\"37.79\"}}}},\"ignoreDiffInMediationEvents\":false,\"updateMechanism\":\"DEFAULT\",\"requestID\":null,\"sourceEnterpriseID\":null}"
| rex mode=sed "s/^[^\{]+//"
| kv
0 Karma

maciep
Champion

The goal is to be able to use the resulting fields in the base search, e.g. index="blah" parent.child.subchild="whatever" ...i don't want to have to pull back all of the results and then parse the json and then filter events. Even if i specify a keyword in the base search to limit the initial results, it's not the desired functionality.

0 Karma

jpass
Contributor

were you able to achieve this?

0 Karma

maciep
Champion

I haven't looked at trying since around the time i posted this, but no I haven't

0 Karma

niketn
Legend

@maciep can you add some sample events (mock up/anonymize data if needed).. For JSON data start pattern and end pattern should be sufficient without the complete payload info.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

maciep
Champion

ok, updated the question with a sample scrubbed event. forgot the host name is in there before the json too.

0 Karma
Get Updates on the Splunk Community!

Customer Experience | Splunk 2024: New Onboarding Resources

In 2023, we were routinely reminded that the digital world is ever-evolving and susceptible to new ...

Celebrate CX Day with Splunk: Take our interactive quiz, join our LinkedIn Live ...

Today and every day, Splunk celebrates the importance of customer experience throughout our product, ...

How to Get Started with Splunk Data Management Pipeline Builders (Edge Processor & ...

If you want to gain full control over your growing data volumes, check out Splunk’s Data Management pipeline ...