How to use spath with string formatted events?

dbarba
Explorer

Hello!

As the subject says, I'm trying to create SPL queries for several visualizations, but it has become very tedious because spath does not work with the output events. They come in a string format, which makes more complex operations very hard to do.

The event contents are in valid JSON format (checked using jsonformatter).

Here's the event output:

{"time":"time_here","kubernetes":{"host":"host_name_here","pod_name":"pod_name_here","namespace_name":"namespace_name_here","labels":{"app":"app_label"}},"log":{"jobId":"job_id_here","dc":"dc_here","stdout":"{ \"Componente\" :  \"componente_here\", \"channel\" :  \"channel_here\", \"timestamp\" :  \"timestamp_here\", \"Code\" :  \"code_here\", \"logId\" :  \"logid_here\", \"service\" :  \"service_here\", \"responseMessage\" :  \"responseMessage_here\", \"flow\" :  \"flow_here\", \"log\" :  \"log_here\"}","level":"info","host":"host_worker_here","flow":"flow_here","projectName":"project_name_here","caller":"caller_here"},"cluster_id":"cluster_id_here"}

yuanliu
SplunkTrust

It seems that Splunk already gives you fields like cluster_id, log.projectName, and log.stdout. log.stdout is embedded JSON. Not sure why you say "spath does not work with outputted events." It certainly does. As @richgalloway demonstrated, you just need to use spath's input parameter.

| spath input=log.stdout

Your mock event gives you these extra fields:

Field              Value
Code               code_here
Componente         componente_here
channel            channel_here
flow               flow_here
log                log_here
logId              logid_here
responseMessage    responseMessage_here
service            service_here
timestamp          timestamp_here

Play with the emulation @richgalloway gave and compare it with your real data.
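To see the two-stage parse outside Splunk, here is a small Python model (an illustration only, not what Splunk runs internally): the event is parsed once, log.stdout comes back as a plain string, and parsing that string a second time, which is the job `spath input=log.stdout` does, yields the same nine fields listed above. The event is rebuilt with `json.dumps` from the mock's placeholder values, so the whitespace inside stdout differs slightly from the real event.

```python
import json

# Inner object from the mock event's log.stdout (placeholder values only).
inner = {
    "Componente": "componente_here", "channel": "channel_here",
    "timestamp": "timestamp_here", "Code": "code_here",
    "logId": "logid_here", "service": "service_here",
    "responseMessage": "responseMessage_here", "flow": "flow_here",
    "log": "log_here",
}

# Embed the inner object as a JSON *string*, the way it appears in the event.
raw = json.dumps({
    "log": {"level": "info", "stdout": json.dumps(inner)},
    "cluster_id": "cluster_id_here",
})

event = json.loads(raw)            # first parse: the whole event
stdout = event["log"]["stdout"]    # still a str, not a nested object
extracted = json.loads(stdout)     # second parse: the spath input=log.stdout step

# Print field/value pairs in plain (case-sensitive) sort order.
for name in sorted(extracted):
    print(f"{name:16} {extracted[name]}")
```

The case-sensitive sort puts `Code` and `Componente` first, matching the order in the field list above.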

dbarba
Explorer

I'm sorry I didn't see your reply sooner, thank you so much! You're a hero!!


richgalloway
SplunkTrust

Please explain what you mean by "spath does not work". It works for me in this run-anywhere example (escape characters were added to satisfy the SPL parser). What is your query? What results do you expect, and what do you get?
| makeresults | eval data="{\"time\":\"time_here\",\"kubernetes\":{\"host\":\"host_name_here\",\"pod_name\":\"pod_name_here\",\"namespace_name\":\"namespace_name_here\",\"labels\":{\"app\":\"app_label\"}},\"log\":{\"jobId\":\"job_id_here\",\"dc\":\"dc_here\",\"stdout\":\"{ \\\"Componente\\\" :  \\\"componente_here\\\", \\\"channel\\\" :  \\\"channel_here\\\", \\\"timestamp\\\" :  \\\"timestamp_here\\\", \\\"Code\\\" :  \\\"code_here\\\", \\\"logId\\\" :  \\\"logid_here\\\", \\\"service\\\" :  \\\"service_here\\\", \\\"responseMessage\\\" :  \\\"responseMessage_here\\\", \\\"flow\\\" :  \\\"flow_here\\\", \\\"log\\\" :  \\\"log_here\\\"}\",\"level\":\"info\",\"host\":\"host_worker_here\",\"flow\":\"flow_here\",\"projectName\":\"project_name_here\",\"caller\":\"caller_here\"},\"cluster_id\":\"cluster_id_here\"}"
| spath input=data
| transpose

And the results:

[Screenshot of the search results]

---
If this reply helps you, Karma would be appreciated.
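The escaping in the run-anywhere example follows a fixed pattern: each backslash in the raw event is doubled, then each double quote gets a backslash in front of it. The Python helper below (`spl_quote` is a made-up name) builds such a makeresults line from a copied event. It assumes SPL eval string literals treat `\\` and `\"` as escapes, which is what the example above relies on. It has not been checked against every SPL parsing edge case.

```python
def spl_quote(text: str) -> str:
    """Hypothetical helper: escape raw event text for an SPL eval "..." literal.

    Backslashes are doubled first, then double quotes are escaped; doing it in
    the other order would double the backslashes just added for the quotes.
    """
    return text.replace("\\", "\\\\").replace('"', '\\"')


# A shortened event with a JSON string nested inside it, as in the question.
event = r'{"log":{"stdout":"{ \"Code\" :  \"code_here\"}"}}'
print('| makeresults | eval data="' + spl_quote(event) + '"')
```

The nested `\"Code\"` comes out as `\\\"Code\\\"`, the same triple-backslash form used in the example. Appending `| spath input=data` to the printed line gives the same kind of emulation.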

dbarba
Explorer

Hello!! Thank you for your response! And I'm sorry I explained myself so poorly!

spath does not work: What I meant by this is that, taking the previous event string as an example, I am unable to use SPL queries such as

index="my_index" logid="log_id_here" service="service_here" responseMessage="response_message_here"

Instead, I have to use

index="my_index" "log_id_here" "service_here" "response_message_here" or index="my_index" "log_id_here" service logid responseMessage

This is because no data is found when I use "variables" (field=value terms) such as

responseMessage="response_message_here"

Instead, I must search for specific string fragments within the event output... This is because the output is formatted as a string instead of JSON, which makes creating SPL queries a real pain.

What is your query: One example would be getting each responseMessage individually, like this:

index="my_index" "log_id_here" logid service responseMessage \\\"responseMessage\\\" :  \\\"null\\\"

instead of the normal way, which would be:

index="my_index" logid="log_id_here" service responseMessage | stats count by responseMessage | dedup responseMessage

What results do I expect: I'm trying to get the unique services and sort them in descending order of error count (an error being determined by the responseMessage).

What results do I get: Currently I can get the count for each service by using string literals such as \\\"service\\\" :  \\\"desk\\\", but beyond that I'm stuck. I'm guessing this could be done with something like:

index="my_index" "logid" | stats count by service, responseMessage | eval isError=if(responseMessage!="success",1 ,0) | stats sum(isError) as errorCount by service



I apologize in advance if I've once again missed important details or given wrong queries; I haven't been able to try them out the way the documentation shows :C Thank you very much for your time!!
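Once service and responseMessage exist as fields, the goal described above is a per-service count of non-success events, sorted descending. The Python model below uses hypothetical rows (only "desk" comes from the post) and, like the guessed query, treats any responseMessage other than "success" as an error. It also shows a pitfall: summing a 0/1 error flag after `stats count by service, responseMessage` counts distinct error messages, not error events. In SPL, a common shape for the per-event version is `stats count(eval(responseMessage!="success")) AS errorCount BY service` followed by `sort - errorCount`, after the `spath input=log.stdout` step. Treat that as an untested sketch.

```python
from collections import Counter

# Hypothetical rows, shaped like events after the stdout JSON has been parsed.
rows = [
    {"service": "desk", "responseMessage": "success"},
    {"service": "desk", "responseMessage": "null"},
    {"service": "desk", "responseMessage": "null"},
    {"service": "service_here", "responseMessage": "timeout"},
    {"service": "service_here", "responseMessage": "success"},
]


def error_counts(rows, ok="success"):
    """Per-service count of events whose responseMessage is not `ok`,
    highest first; services with zero errors are still listed."""
    errors = Counter()
    for row in rows:
        errors[row["service"]] += 0  # register the service even with no errors
        if row.get("responseMessage") != ok:
            errors[row["service"]] += 1
    return errors.most_common()


def distinct_error_messages(rows, ok="success"):
    """What summing a 0/1 flag after grouping by (service, responseMessage)
    gives: the number of distinct non-ok messages per service."""
    pairs = {(r["service"], r.get("responseMessage")) for r in rows}
    result = Counter()
    for service, message in pairs:
        result[service] += int(message != ok)
    return dict(result)


print(error_counts(rows))             # [('desk', 2), ('service_here', 1)]
print(distinct_error_messages(rows))  # desk counts 1 here, not 2
```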


richgalloway
SplunkTrust

I find it interesting that you claim the spath command does not work, yet none of your searches use spath. The command won't work if it isn't invoked. See my example above.

Once the spath command has extracted the fields, then you can reference those fields in other commands.
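The point about invoking spath first can be modeled in a few lines of Python. This is a rough model, not how Splunk matches search terms internally: a plain substring check finds the event, which is why searching for string literals works, while a field=value check on service finds nothing until the stdout string has been parsed into fields. `matches` and `flatten` are names made up for this sketch.

```python
import json

# Shortened mock event: the interesting fields live inside the stdout string.
raw = json.dumps({
    "log": {"level": "info",
            "stdout": json.dumps({"service": "service_here",
                                  "responseMessage": "responseMessage_here"})},
})


def flatten(obj, prefix=""):
    """Flatten nested objects into dotted names (log.level, log.stdout)."""
    fields = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            fields.update(flatten(value, f"{prefix}{key}."))
        else:
            fields[f"{prefix}{key}"] = value
    return fields


def matches(fields, **criteria):
    """Exact field=value matching; a missing field never matches."""
    return all(fields.get(k) == v for k, v in criteria.items())


fields = flatten(json.loads(raw))
print("service_here" in raw)                     # literal search: True
print(matches(fields, service="service_here"))   # no 'service' field yet: False

fields.update(flatten(json.loads(fields["log.stdout"])))  # the spath step
print(matches(fields, service="service_here"))   # True
```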


dbarba
Explorer

I see. Should I copy and paste the event data into the search bar, as in the example you provided?

Edit: I used:

index="my_index" "log_id_here" logid responseMessage | spath input=data | transpose

Strangely, most if not all of the vital data was stored inside _raw as a single string.


PickleRick
SplunkTrust

OK. I think I see where it is going.

You have your data as a JSON structure and want to search it by referring to the fields by name in the base search, and that doesn't work. But the fields do get parsed if you find your events another way (for example, just by searching for the content, regardless of where in the event it is) and then push them through the spath command.

Am I right?

In other words - your events are not automatically interpreted as JSON structures.

There are three separate levels on which Splunk can handle JSON data.

1. On ingest - it can treat the JSON with INDEXED_EXTRACTIONS and parse your data into indexed fields. You generally don't want that, as indexed fields are not really what Splunk is typically about.

2. Manual invocation of the spath command - this can be useful if your JSON data is only part of the whole event (for example, a JSON structure forwarded as a syslog message and prepended with a syslog header; in such a case you'd want to extract the part after the syslog header and manually call the spath command to extract fields from that part).

3. Automatic search-time extraction - it's triggered by proper configuration of your sourcetype. By default, unless explicitly disabled by setting AUTO_KV_JSON to false, Splunk will extract your JSON fields when (and only when) the whole _raw event is a well-formed JSON structure. JSON extraction can also be triggered explicitly by properly configuring KV_MODE in your sourcetype (still, only when the whole event is well-formed JSON).
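The "whole _raw must be well-formed JSON" condition in points 2 and 3 can be illustrated with a small Python model. `auto_json_fields` is a made-up name and the syslog line is hypothetical. A JSON body behind a syslog header fails to parse as a whole, so nothing is extracted, while the isolated JSON part parses fine. In SPL the isolating step would come before `spath input=<that field>` (for example with rex); this sketch just slices from the first brace.

```python
import json


def auto_json_fields(raw: str) -> dict:
    """Model of automatic extraction: only when the *whole* event parses
    as a JSON object; otherwise nothing is extracted."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


# Hypothetical syslog-style line: header followed by a JSON body.
line = '<14>Dec  3 12:00:00 host_name_here app: {"level": "info", "flow": "flow_here"}'

print(auto_json_fields(line))   # {}: the header makes the whole line invalid JSON
body = line[line.index("{"):]   # isolate the JSON part (the header has no "{")
print(auto_json_fields(body))   # {'level': 'info', 'flow': 'flow_here'}
```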

Mind you that neither the 1st nor the 3rd option will extract data if you have, for example, a JSON structure as a string field within another JSON structure. In such a case you have to manually use spath to extract the JSON data from that string.

So, as you can see, JSON is a bit tricky to work with.
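To spot which fields need that manual step, a Python sketch can flatten the event and report string fields whose value is itself a JSON object. `embedded_json_fields` is a hypothetical helper, not a Splunk feature. For the shortened mock event, it points at log.stdout, which is the field to hand to `spath input=`.

```python
import json

# Shortened mock event; stdout holds a JSON object serialized as a string.
raw = json.dumps({
    "kubernetes": {"labels": {"app": "app_label"}},
    "log": {"level": "info",
            "stdout": json.dumps({"Code": "code_here", "service": "service_here"})},
    "cluster_id": "cluster_id_here",
})


def flatten(obj, prefix=""):
    """Flatten nested objects into dotted names; strings stay as leaves."""
    fields = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            fields.update(flatten(value, f"{prefix}{key}."))
        else:
            fields[f"{prefix}{key}"] = value
    return fields


def embedded_json_fields(fields):
    """Names of string fields whose value parses as a JSON object."""
    found = []
    for name, value in fields.items():
        if not isinstance(value, str):
            continue
        try:
            if isinstance(json.loads(value), dict):
                found.append(name)
        except json.JSONDecodeError:
            pass  # ordinary text, not embedded JSON
    return sorted(found)


fields = flatten(json.loads(raw))
print(sorted(fields))                 # cluster_id, kubernetes.labels.app, log.level, log.stdout
print(embedded_json_fields(fields))   # ['log.stdout']
```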

PS: There is an open idea about extracting only part of the event as json structure - feel free to support that 😉 https://ideas.splunk.com/ideas/EID-I-208

dbarba
Explorer

Hello!!

Thanks for your answer! You are indeed correct! Part of the event is treated as JSON, but nested in the "log" variable, the "stdout" variable has another dictionary within it that is being treated as a string, which makes it difficult to work with in SPL.

I did my research and it seems this might be an issue with the way the data is parsed before it arrives in Splunk. Until I check that, I guess I'm stuck with searching for string literals 💔

Thank you for your time and help!!


PickleRick
SplunkTrust

So you need to do

<your search>
| spath input=stdout

This way you'll parse the contents of the stdout field.

yuanliu
SplunkTrust

The field name is log.stdout.

| spath input=log.stdout

See my earlier comment https://community.splunk.com/t5/Splunk-Search/How-to-use-spath-with-string-formatted-events/m-p/6707... 
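The difference between input=stdout and input=log.stdout matters because the automatically extracted field carries its full path as its name. In the Python model below, `spath_input` is a hypothetical stand-in for the command, not its implementation. In this model, asking for a field name that doesn't exist yields no fields rather than an error, so it is worth checking the exact field name in your own field list.

```python
import json

raw = json.dumps({
    "log": {"level": "info",
            "stdout": json.dumps({"Code": "code_here", "service": "service_here"})},
})


def flatten(obj, prefix=""):
    """Flatten nested objects into dotted names (log.level, log.stdout)."""
    fields = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            fields.update(flatten(value, f"{prefix}{key}."))
        else:
            fields[f"{prefix}{key}"] = value
    return fields


def spath_input(fields, input_field):
    """Hypothetical stand-in for `| spath input=<field>`: parse that field's
    value as JSON (assumed valid) and return its fields, or {} if absent."""
    value = fields.get(input_field)
    if not isinstance(value, str):
        return {}
    return flatten(json.loads(value))


fields = flatten(json.loads(raw))         # names: log.level, log.stdout
print(spath_input(fields, "stdout"))      # {}: there is no field called stdout
print(spath_input(fields, "log.stdout"))  # {'Code': 'code_here', 'service': 'service_here'}
```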

richgalloway
SplunkTrust

I added data to the SPL because I don't have your data indexed in my Splunk.  Since you have the data indexed, you can skip that part of my example query.  You may need to change the spath command argument to match your events.


dbarba
Explorer

I see. I tried with different variables, but _raw seems to hold all the vital data in every case. Maybe I'm not doing something right; perhaps the part that is not in JSON format is the output inside the "stdout" variable.

EDIT: Here's the event in log format:

{
  cluster_id: cluster_id
  kubernetes: { ... }
  log: {
    caller: caller_here
    dc: dc_here
    flow: flow_here
    host: gatling_worker_here
    jobId: jobid_here
    level: info
    projectName: project_name_here
    stdout: { "Componente" : "componente_here", "channel" : "channel_here", "timestamp" : "timestamp_here", "Code" : "code_here", "logId" : "logid_here", "service" : "service_here", "responseMessage" : "responsemessage_here", "flow" : "flow_here", "log" : "log_here"}
  }
  time: time_here
}

It seems stdout is the issue.


richgalloway
SplunkTrust

The _raw field is where Splunk stores the raw event.  Many commands default to that field and a few work only on that field.  The spath command defaults to _raw, but you can use spath input=_raw, if you wish.

The example event looks fine to me and passes checks at jsonlint.com.


dbarba
Explorer

I see...

Well, it seems like spath (and SPL functionality in general) works fine with the events, except for the contents of stdout... I spoke with an acquaintance, and it looks like it's most likely due to the way the data is parsed before it arrives in Splunk.

I can't thank you enough for your time and effort helping me!! It looks like this has to be checked outside of Splunk, though. I'll close the ticket and come back with updates if I'm able to find a solution.


PickleRick
SplunkTrust

Look at my explanation above - your stdout field is not a JSON structure, it's a string containing a JSON structure, so it cannot be automatically parsed as a JSON structure. You have to take the stdout field and manually use spath on this field to parse out the fields from it.

dbarba
Explorer

Excellent! Is there a way of doing this directly with SPL?
