How do I generate reports and run stats on key=value pairs from just the message field, ignoring the rest of the fields?
{"cluster_id":"cluster", "message":"Excel someType=MY_TYPE totalItems=1 errors=ABC, XYZ status=success","source":"some_data"}
I have gone through multiple examples but could not find something concrete that will help me group by the key someType, compute stats on totalItems, and list top errors (ABC, XYZ).
These don't have to be in the same query. I assume top errors grouping would be a separate query.
This is the query I have figured out with help from the awesome Splunk community:
index=my-index "kubernetes.namespace_name"="namus" "cluster_id":"*stage*" "Env":"stg" "loggerName":"com.x.x.x.SomeClass" "My simple query for key=" "log.level"=INFO
| spath output=x log.message
| rex max_match=0 field=x "(?<key>\w+)=(?<value>\w+)"
| eval z=mvzip(key, value, "~")
| mvexpand z
| rex field=z "(?<key>[^~]+)~(?<value>.*)"
| table key value
| eval dummy=""
| xyseries dummy key value
| fields - dummy
Which results in this output. I am missing a lot of data. Can someone show how to list all the rows found? What is it that I am missing here?
1. We still have no idea what your raw data looks like. For example, how are we supposed to know whether the log.message path is the right one? I suppose it is, because you're getting _some_ result, but we have no way of knowing.
2. Your initial search is very inefficient. Wildcards in the middle of search terms can give strange and inconsistent results, and generally wildcards anywhere other than at the end of a search term slow your search down.
3. You're getting some result but you're not showing us anything. How are we supposed to even understand what you're getting?
4. Don't get hung up on this "177 events" number. It's just the count of all events matched by your initial search.
5. There are two main techniques for debugging searches - either you start from the beginning and add commands one by one until the results stop making sense, or you start with the whole search and remove commands one by one until the results start making sense (see the sketch after this list).
6. Honestly, I have no idea what you're trying to achieve with this mvzip/mvexpand/regex magic.
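To illustrate point 5, here is a sketch of the first technique applied to your own pipeline (only your index and field names are reused; nothing new is assumed) - run it, inspect the fields, then paste the next command back in and check again:
index=my-index "kubernetes.namespace_name"="namus" "log.level"=INFO
| spath output=x log.message
``` first verify that x is populated at all ```
| rex max_match=0 field=x "(?<key>\w+)=(?<value>\w+)"
``` then inspect key and value - \w+ in value stops at spaces, commas and slashes, which alone can explain missing data ```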
Here is the sample log.
{"cluster_id":"cluster","kubernetes":{"host":"host","labels":{"app":"app","version":"v1"},"namespace_name":"namespace","pod_name":"pod},"log":{"App":"app_name","Env":"stg","LogType":"Application","contextMap":{},"endOfBatch":false,"level":"INFO","loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","loggerName":"com.x.x.x.X","message":"Json path=/path feed=NAME sku=SKU_NAME status=failed errorCount=3 errors=ERROR_1, ERROR_2, MORE_ERROR_3 fields=Field 1, Field 2, More Fields Here","source":{"class":"com.x.x.x.X","file":"X.java","line":1,"method":"s"},"thread":"http-apr-8080-exec-4","threadId":1377,"threadPriority":5,"timeMillis":1727978156925},"time":"2024-10-03T17:55:56.925335046Z"}
Expected output from the message field:
path | feed | sku | status | errorCount | errors | fields |
/path | NAME | SKU_NAME | failed | 3 | ERROR_1, ERROR_2, MORE_ERROR_3 | Field 1, Field 2, More Fields Here |
If the data within the message field is ugly, I am willing to modify it. But I assume it will be treated as raw data and not as fields. @PickleRick
---
This seems to work when these regexes are removed: errors=(?P<errors>[^,]+) fields=(?P<fields>[^,]+). How do I fix the errors and fields extractions?
When tested on https://pythex.org/, the regex works.
index=item-interface "kubernetes.namespace_name"="namespace" "cluster_id":"*stage*" "Env":"stg" "loggerName":"com.x.x.x.X" "Json path=/validate feedType=" "log.level"=INFO
| rename log.message as _raw
| rex field=_raw "Json path=(?P<path>\/\w+) feedType=(?P<feedType>\w+) sku=(?P<sku>\w+) status=(?P<status>\w+) errorCount=(?P<errorCount>\w+) errors=(?P<errors>[^,]+) fields=(?P<fields>[^,]+)"
| table path, feedType, sku, status, errorCount, errors, fields
OK. We're getting somewhere 😉
Assuming you had a typo and that's indeed valid JSON, you can extract values from the log.message field.
The issue I still have with your data is that it's "half-pregnant" - it seems to have some structure to it, but the structure isn't kept strictly (I have the same problem with CEF, for example). You have a header, then some key=value pairs. There are several issues with those key=value pairs. What if a value contains an equals sign? What if a value contains a space? It seems that comma-space is a multivalued field separator, but is it?
I can modify the multivalued field [list of data] to something like this:
https://www.splunk.com/en_us/blog/tips-and-tricks/delimiter-base-kv-extraction-advanced.html?locale=...
FIELDS= "time", "client-ip", "cs-method", "sc-status"
@PickleRick Let me know if there are any changes to be made to log.message to make it "full-preg" so that Splunk can deliver 😉
Well, the stronger the assumptions you can make about the data, the easier it gets. If you have to care about exceptions, it gets complicated quickly.
Take a typical example of
key="string value"
or
key=number_value
The latter form is obviously pretty easy to parse
(?<key>\S+)=(?<value>\d+)
The former is way more complicated to do well. If you simply do
(?<key>\S+)="(?<value>.*)"
You'll capture way over the boundary of the k-v pair if you have many of them.
If you go lazy
(?<key>\S+)="(?<value>.*?)"
you'll stop matching in the middle if you have an escaped quote within your value.
(The same happens if, instead of matching .* lazily, you match [^"]* - you still stop at that same escaped quote.)
Ok. So how about we match everything that is not a quote, or a quote preceded by a backslash:
(?<key>\S+)="(?<value>([^"]|\\")*)"
Ok. Nice idea, but what if your value ends with a backslash (expressed as a double backslash due to escaping)? The regex sees the \" sequence, keeps matching, and goes way past the end of the value.
So maybe we should try a negative lookbehind, so that a backslash preceding a quote cannot itself be preceded by another backslash? But what about three, four and so on backslashes followed by a quote? It's getting messier and messier.
If you can either know for sure that your data will never look like those border cases, or can consciously decide that you don't care about them and can live with them being extracted wrongly, it's way easier.
In your case that would be, for example, a message containing
comment=This is actually an example of a key=value pair. Another_field=Another value
or
groceries=Let's buy bread, strawberries, butter and pork loin.
As you can see, without additional assumptions about the data, the extractions will be wrong.
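You can watch it go wrong with a quick emulation (a naive extraction over the first example; nothing here is specific to your data):
| makeresults
| eval _raw="comment=This is actually an example of a key=value pair. Another_field=Another value"
| rex max_match=0 "(?<key>\w+)=(?<value>\S+)"
| eval pairs=mvzip(key, value, "=")
| table pairs
``` yields comment=This, key=value and Another_field=Another - a truncation, a spurious pair and another truncation ```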
@PickleRick Thank you for your response and patience.
Data illustration could have saved everybody a ton of time reading your mind. The solution is the same as I suggested earlier: kv (aka extract) is your friend. But first, let me correct a JSON error in your mock data:
{"cluster_id":"cluster","kubernetes":{"host":"host","labels":{"app":"app","version":"v1"},"namespace_name":"namespace","pod_name":"pod"},"log":{"App":"app_name","Env":"stg","LogType":"Application","contextMap":{},"endOfBatch":false,"level":"INFO","loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","loggerName":"com.x.x.x.X","message":"Json path=/path feed=NAME sku=SKU_NAME status=failed errorCount=3 errors=ERROR_1, ERROR_2, MORE_ERROR_3 fields=Field 1, Field 2, More Fields Here"}}
Now this is compliant JSON. Second, are you saying that your developers are so inconsiderate as to not properly quote key-value pairs? Like I said earlier, in this case you need to deal with them first. The best route is to implore them to improve log hygiene. Failing that, you can deal with the pairs in a limited way using SPL. The following depends on errors and fields being the last pairs, in this order.
The field message is actually named log.message in Splunk. (Many other tools flatten nested JSON this way, too.)
| rename log.message as _raw
| rex mode=sed "s/errors=(.+) fields=(.+)/errors=\"\1\" fields=\"\2\"/"
| kv
| table path feed sku status errorCount errors fields
Output is
path | feed | sku | status | errorCount | errors | fields |
/path | NAME | SKU_NAME | failed | 3 | ERROR_1, ERROR_2, MORE_ERROR_3 | Field 1, Field 2, More Fields Here |
Here is full emulation of your mock data. Play with it and compare with real data.
| makeresults
| eval _raw ="{\"cluster_id\":\"cluster\",\"kubernetes\":{\"host\":\"host\",\"labels\":{\"app\":\"app\",\"version\":\"v1\"},\"namespace_name\":\"namespace\",\"pod_name\":\"pod\"},\"log\":{\"App\":\"app_name\",\"Env\":\"stg\",\"LogType\":\"Application\",\"contextMap\":{},\"endOfBatch\":false,\"level\":\"INFO\",\"loggerFqcn\":\"org.apache.logging.log4j.spi.AbstractLogger\",\"loggerName\":\"com.x.x.x.X\",\"message\":\"Json path=/path feed=NAME sku=SKU_NAME status=failed errorCount=3 errors=ERROR_1, ERROR_2, MORE_ERROR_3 fields=Field 1, Field 2, More Fields Here\"}}"
| spath
``` data emulation above ```
Thank you @yuanliu
I had to modify it a little to make it work:
| rename log.message as _raw
| rex mode=sed "s/errors=(.+) fields=(.+)/errors=\"\1\" fields=\"\2\"/"
| rex field=_raw "path=(?P<path>.*) feedType=(?P<feedType>.*) sku=(?P<sku>.*) status=(?P<status>.*) errorCount=(?P<errorCount>.*) errors=(?P<errors>.*) fields=(?P<fields>.*)"
| table path, feedType, sku, status, errorCount, errors, fields
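A slightly tighter variant that avoids relying on greedy .* backtracking, assuming errorCount is numeric and fields always runs to the end of the message:
| rex field=_raw "path=(?P<path>\S+) feedType=(?P<feedType>\S+) sku=(?P<sku>\S+) status=(?P<status>\S+) errorCount=(?P<errorCount>\d+) errors=(?P<errors>.+?) fields=(?P<fields>.+)$"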
Splunk will do aggregations on the fields you tell it to, as long as those fields are extracted. Until then, they are not fields; they are just parts of the raw data. You must define proper ways to extract the fields you want to aggregate or split your aggregations by.
One way is what @yuanliu has already shown. Another way is to define extractions at sourcetype level.
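To give you an idea, a sourcetype-level extraction can be a one-liner in props.conf. A rough sketch only - the sourcetype name is made up, and whether the dotted log.message field is available (and accepted) at that point of the search-time pipeline is something you would have to test:
[my:k8s:json]
EXTRACT-message_kv = status=(?<status>\w+) errorCount=(?<errorCount>\d+) in log.message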
Anyway, your data seems a bit "ugly" - it is a JSON structure with a string field containing some partly-structured data. It would be much better if the data was provided in a consistent format, so that you don't have to stand on your head in a bucket full of piranhas to get the values you need.
You need to better explain the desired results by illustrating them in a table, or elaborate on what "compute stats on totalItems" should do. Do not force volunteers to read your mind.
If I must try mind reading, I speculate that you want a sum of totalItems. This can be achieved with
| rename message as _raw
| kv
| stats sum(totalItems) as totalItems by someType
Here is an expansion of your mock data to make the total meaningful:
_raw |
{"cluster_id":"cluster","message":"Excel someType=MY_TYPE totalItems=1 errors=\"ABC, XYZ\" status=success","source":"some_data"} |
{"cluster_id":"cluster","message":"Excel someType=YOUR_TYPE totalItems=2 errors=\"ABC, XYZ\" status=failure","source":"some_data"} |
{"cluster_id":"cluster","message":"Excel someType=MY_TYPE totalItems=3 errors=\"AAA, XYZ\" status=questionable","source":"some_data"} |
Running the above search gives
someType | totalItems |
MY_TYPE | 4 |
YOUR_TYPE | 2 |
Is this table about right?
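If "compute stats on totalItems" means more than a sum, stats will take several functions in one pass over the same extracted fields, for example
| stats sum(totalItems) as total avg(totalItems) as average max(totalItems) as max by someType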
You are correct to call out that top errors need to be a separate search. (I mean, you can have them combined if you want to group the top error by someType, too.) To do this, however, I have to assume that your developers are nice to you and placed quotes around the values of errors. (See the difference between my mock data and yours.)
| rename message as _raw
| kv
| stats count by errors
| sort count
| tail 1
| rename errors as topErrors
The same expanded, "nicified" mock data would give
topErrors | count |
ABC, XYZ | 2 |
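An equivalent formulation, if you find it more readable:
| sort - count
| head 1
And if you do want the top error grouped by someType as mentioned above, a sketch:
| stats count by someType errors
| sort - count
| dedup someType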
Here is the emulation that produces the mock data; play with it and compare with real data:
| makeresults format=json data="[
{\"cluster_id\":\"cluster\", \"message\":\"Excel someType=MY_TYPE totalItems=1 errors=\\\"ABC, XYZ\\\" status=success\",\"source\":\"some_data\"},
{\"cluster_id\":\"cluster\", \"message\":\"Excel someType=YOUR_TYPE totalItems=2 errors=\\\"ABC, XYZ\\\" status=failure\",\"source\":\"some_data\"},
{\"cluster_id\":\"cluster\", \"message\":\"Excel someType=MY_TYPE totalItems=3 errors=\\\"AAA, XYZ\\\" status=questionable\",\"source\":\"some_data\"}
]"
``` data emulation above ```