Splunk Search

How to join a header of a JSON file to the rest of the rows by one common field?

jalau9
Explorer

I have this JSON file which has a first (header) row that is different from the body. Number of fields in header row is less than in body. However, two fields are common in all rows. Example:

{"project":"1.0","build":"20","milestone":"","platform":"","id":"A1"}
{"build":"20","area":"Area1","status":"Pass","owner":"Engr1","manager":"Mgr1","id":"A1"}

I tried to make 'project' from header to append to each of subsequent rows, then do 'stats count(status) by project, build, area'. Since build is common, I thought from other threads' advice to simply use it as the filter i.e. index=main build | stats count...

  1. index=main* build | stats values(build) as one by project -> this can combine project with build, as it's the header row anyway
  2. index=main* build | stats count(eval(status="FAIL") as "status" by build, area -> this is fine as it's all in the body anyway
  3. index=main* build | stats count(eval(status="FAIL") as "status" by build, area, project -> this gives me zero results

If instead I do a join, index=main* build | join id [search index=main* domain | dedup id] | stats count(eval(status="FAIL") as "status" by build, area, project -> this gets me correct data for the min(build) only. I have three distinct values of build but only the smallest is returned. How do I do a correct join?

Tags (4)
0 Karma
1 Solution

jalau9
Explorer

After much hair-tearing, reading Splunk blogs, documentation on stats, streamstats, etc, I found this query to work: index=main* | join build [search index=main* platform] | where status="FAIL" | stats count(status) by build, project, area
I find it so little different from my 4th approach, how did it not work previously?

View solution in original post

jalau9
Explorer

After much hair-tearing, reading Splunk blogs, documentation on stats, streamstats, etc, I found this query to work: index=main* | join build [search index=main* platform] | where status="FAIL" | stats count(status) by build, project, area
I find it so little different from my 4th approach, how did it not work previously?

lguinn2
Legend

You didn't use dedup is the biggest thing.

0 Karma

somesoni2
Revered Legend

If you data contains just two rows for each build and id combination (common fields), one header and one body, like your example here, try this

index=main* build | stats values(*) as * by id,build | stats count(eval(mvcount(mvfilter(match(status,"FAIL"))))) as "status" by build, area, project
0 Karma

jalau9
Explorer

My data is one header and one body (body has multiple rows), and id, build occurs in all rows, both header and body. When I tried this, every field of area was returned with "status"=1, even though their (original) status was not "FAIL". Why did that happen?

0 Karma

lguinn2
Legend

Because this stats command is wrong

0 Karma

fdi01
Motivator

move dedup command because you join with Id and in you subsearch you dedup eliminate other one of the events with same id.
try this:

index=main* build | join id [search index=lint* domain ] | stats c(eval(status="FAIL") as "status" by build, area, project

0 Karma

jalau9
Explorer

This again produced an answer where only my min(build) results were displayed. But it would work if instead I had done: index=main* | join build_num [search index=main* platform] | stats c(eval(status="FAIL") as "status" by build, area, project

0 Karma

lguinn2
Legend

Your naming in the join is not consistent with what you did before. However, I think your problems with the join are

(1) deduping the id eliminates all but one of the events.

(2) by default, the events from the main pipeline can join with only one event from the subsearch. You may need to use the the max option of the join command.

But I would do it differently. The following should work IF the header row always appears prior to the body rows

NOTE: Edited to fix a problem with the original streamstats I wrote

index=main* build 
| streamstats last(project) as lastproject current=f window=1 by build
| eval project = if(notnull(project),project,lastproject)
| where status=="FAIL"
| stats count as status by build, area, project

jalau9
Explorer

Yes, you're right, I edited my question for consistency. When I tried this, I could not get a statistics of stats count, but I could see the events filtered to those where status=="FAIL". However in these events, the project is blank as it wasn't appended. How can I append project to these events?

0 Karma

lguinn2
Legend

I didn't do the streamstats command properly - I edited my answer above and hope this will fix it!
Let me know if it does not work!

0 Karma

jalau9
Explorer

I edited 'notnull' to 'isnotnull'. It's the same. I can see the events up to

index=main* build 
 | streamstats last(project) as lastproject current=f window=1 by build
 | eval project = if(isnotnull(project),project,lastproject)
 | where status=="FAIL"

but when I view the search results at this query level, project field is empty. When I append | stats count as status by build, area, project to the end, no results come out unless I remove project.

Another thing I don't understand about this query is that when I do

index=main* build 
     | streamstats last(project) as lastproject current=f window=1 by build | table lastproject

, I get no results. Why is that?

0 Karma

lguinn2
Legend

Crud, something is still not working to propagate the project field across the events. The reason that you get nothing with stats command is that the project does not exist in all events. lastproject should be there, but since it isn't, then something is still probably wrong with the streamstats...

0 Karma
Get Updates on the Splunk Community!

Introducing the Splunk Community Dashboard Challenge!

Welcome to Splunk Community Dashboard Challenge! This is your chance to showcase your skills in creating ...

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Wednesday, May 29, 2024  |  11AM PST / 2PM ESTRegister now and join us to learn more about how you can ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer Certification at ...

We’re excited to announce a new Splunk certification exam being released at .conf24! If you’re headed to Vegas ...