Splunk Search

Docker Sources

MrGlass
Explorer

I'm having some issues when looking at Docker HEC logs. Each event shows two values for source at the same time, but the search does not filter on stderr or stdout when I use source=stderr.

  {
    line: clusterrolebinding.rbac.authorization.k8s.io/ucp-kube-system:calico-node:crds unchanged
    source: stdout
    tag: $ucp-kubectl - 9382ee9db872
  }
  host = omit    source = http:syslog    source = stdout    sourcetype = dnrc:docker

MrGlass
Explorer

I did get results using spath, not sure if that is the best way but does seem to remove all other sources from the source field.

index=dnrc_docker sourcetype=dnrc:docker | spath source | search source="stderr"


PickleRick
SplunkTrust

If it works, it works. 🙂

It's worth noting though that it's a rather "heavy" way of doing it. Spath is a fairly intensive command and you're doing it over all your events. I suppose for a one-off ad-hoc search it might be OK but if you do it often, you might want to optimize it a bit.


livehybrid
Super Champion

Hi @MrGlass 

You are seeing source twice because it is an internal field as well as being specified inside your event. This can cause problems when searching for it because it has two values.

You might find that adding a TERM statement is enough to filter this down while retaining performance, rather than having to search all your data and then filter by source once all the events are loaded:

index=YourIndex sourcetype=dnrc:docker TERM(stderr)


PickleRick
SplunkTrust

@livehybrid Let me interject here 😉

If it was just because it resulted in a multivalued field, you could happily search for just one of those values. But in case of this particular field (as well as other indexed fields which are not (supposed to be) present in the raw data) it's a bit different.

When you do

index=something source=aaa

and check the job log you'll get

07-01-2025 10:47:22.082 INFO UnifiedSearch [3225984 searchOrchestrator] - Expanded index search = (index=something source=aaa)
07-01-2025 10:47:22.082 INFO UnifiedSearch [3225984 searchOrchestrator] - base lispy: [ AND index::something source::aaa ]

This means that Splunk will only look for those events which have metadata fields of index and source with given values. In case of index it's not really a field but in case of source, it's gonna be a search only for indexed terms in form of source::something. Splunk will not try to bother with parsing anything out of the event itself.

It's in the later part of the processing pipeline that the field might get parsed out and then be used for further manipulation.
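To make that two-stage behaviour concrete, here is a toy Python model. It is not Splunk code: the `Event` class and both helper functions are made up for illustration. It assumes the indexed source is `http:syslog`, as in the event from the original question, and that the JSON payload carries its own `source` key.

```python
import json
from dataclasses import dataclass


@dataclass
class Event:
    indexed_source: str  # index-time metadata, analogous to source::<value>
    raw: str             # raw JSON payload of the event (hypothetical sample data)


EVENTS = [
    Event("http:syslog", json.dumps({"line": "unchanged", "source": "stdout"})),
    Event("http:syslog", json.dumps({"line": "connection refused", "source": "stderr"})),
]


def filter_on_indexed_source(events, value):
    """Toy version of source=<value> in the base search: only metadata is consulted."""
    return [e for e in events if e.indexed_source == value]


def filter_on_extracted_source(events, value):
    """Toy version of parsing the payload later in the pipeline, then filtering."""
    return [e for e in events if json.loads(e.raw).get("source") == value]


indexed_hits = filter_on_indexed_source(EVENTS, "stderr")
extracted_hits = filter_on_extracted_source(EVENTS, "stderr")
print(len(indexed_hits), len(extracted_hits))
```

In this model the metadata filter finds nothing for "stderr" because every event was indexed with the source `http:syslog`, while the filter that parses the payload finds the one stderr event.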

The problem with the obvious approach

index=something | search source=something_else

is that Splunk's optimizer will turn this seemingly superfluous search command back into

index=something source=something_else

which will end up with what I've already shown.

That's why I used the where command - it works differently and won't get optimized out.

Of course, narrowing the search to only the events containing the value "stderr" will speed it up (but it won't be very effective if the "stderr" term also appears elsewhere in the event; tough luck). I'm not quite sure whether TERM() makes any difference here, though. I'm pretty sure just searching for "stderr" itself would suffice, and it doesn't make the resulting SPL look too cryptic 😉

livehybrid
Super Champion

This is true, and I guess there is also a chance that the term "stderr" could exist in the log for a source=stdout log...! 
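A rough Python sketch of that false-positive case, under loud assumptions: the sample events are invented, and `contains_token` is only a crude stand-in for a keyword search (Splunk's real segmentation rules are different). It shows why a keyword prefilter narrows the candidate set but still needs a filter on the parsed field afterwards.

```python
import json
import re

# Hypothetical raw events; the first one is a stdout event that mentions "stderr".
RAW_EVENTS = [
    '{"line": "redirecting stderr to /dev/null", "source": "stdout"}',
    '{"line": "connection refused", "source": "stderr"}',
    '{"line": "all good", "source": "stdout"}',
]


def contains_token(raw, token):
    # Crude keyword match: split on non-word characters and compare whole tokens.
    return token in re.split(r"\W+", raw)


# Step 1: cheap prefilter on the keyword anywhere in the raw text.
candidates = [r for r in RAW_EVENTS if contains_token(r, "stderr")]

# Step 2: exact filter on the field parsed out of the payload.
matches = [r for r in candidates if json.loads(r)["source"] == "stderr"]

print(len(candidates), len(matches))
```

The prefilter keeps two of the three events, one of which is the stdout event that merely mentions "stderr"; only the second step removes it.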

I tend to use TERM because I find it's sometimes the easiest way to improve search performance, and not enough people know of its existence (only 1% of Splunk Cloud customers in 2020, according to Rich Morgan).

PickleRick
SplunkTrust

Source is one of the default metadata fields, which are supposed to be indexed alongside the event rather than included in it. Therefore the initial search does not look at fields parsed out of the event itself when filtering on fields like source or sourcetype.

As a workaround, instead of

<rest of your search> source=stderr

I'd try

<rest of your search> stderr
| where source="stderr"

MrGlass
Explorer

Using the where command did not return any results; I'm not sure why.

PickleRick
SplunkTrust

That is intriguing because I was pretty sure it would work. I tried to recreate your case locally with makeresults | collect and it indeed doesn't find it with where. I'll keep digging.


PickleRick
SplunkTrust

OK. It seems that even with "where" Splunk optimizes this search and it turns into

index=whatever source=CASE("stderr") "stderr"

Which obviously again searches for the source as indexed field only. (same goes 

You can make it work if you disable optimizations

index=whatever stderr
| noop search_optimization=false
| where source="stderr"

MrGlass
Explorer

This did work, but I had to remove the "s" from "optimizations", and presto. Thank you.


PickleRick
SplunkTrust

Could be. I didn't copy-paste it but wrote it here by hand, so there might have been a typo.
