Splunk Search

Superset of data fragments / Compress diagonal data with extra obstacles

rikinet
Path Finder

I have three tables. Each has one or more ID fields (out of ID_A, ID_B, ID_C) and assigns values Xn, Yn, Zn to these IDs. In effect, the tables each contain a fragment of information from a set of objects 1...5.

Table X:

ID_AID_BX1X2
A1B1X1_1X2_1
A2B2X1_2aX2_2
A2B2X1_2bX2_2
A3B3X1_3

X2_3


Table Y:

ID_AID_BY1Y2
A2B2Y1_2 
A2B2 Y2_2
A3B3 Y2_3a
A3B3 Y2_3b
A4B4Y1_4Y2_4

 

Table Z:

ID_BID_CZ1
B1C1Z1_1
B3C3Z1_3
B5C5Z1_5


How can I create the superset of all three tables, i.e. reconstruct the "full picture" about obects 1..5 as good as possible?

I tried with union and join in various ways, but I keep tripping over the following obstacles:

  1. The 1:n relation between ID and values (which should remain expanded as individual rows)
  2. Empty fields in between (bad for stats list(...) or stats values(...) because of different-sized MV results)
  3. There is no single table that has references to all objects (e.g. object 5 only present in table Z).

 

Desired result:

ID_AID_BID_CX1X2Y1Y2Z1
A1B1C1X1_1X2_1  Z1_1
A2B2 X1_2aX2_2Y1_2Y2_2 
A2B2 X1_2bX2_2Y1_2Y2_2 
A3B3 X1_3X2_3 Y2_3aZ1_3
A3B3 X1_3X2_3 Y2_3bZ1_3
A4B4   Y1_4Y2_4 
 B5C5    Z1_5



Sample data:

 

| makeresults
| eval _raw="ID_A;ID_B;X1;X2
A1;B1;X1_1;X2_1
A2;B2;X1_2A;X2_2
A2;B2;X1_2B;X2_2
A3;B3;X1_3;X2_3
"
| multikv forceheader=1
| table ID_A, ID_B, X1, X2

| append [
| makeresults
| eval _raw="ID_A;ID_B;Y1;Y2
A2;B2;Y1_2;
A2;B2;;Y2_2
A3;B3;Y1_3;Y2_3A
A3;B3;Y1_3;Y2_3B
A4;B4;Y1_4;Y2_4
"
| multikv forceheader=1
| table ID_A, ID_B, Y1, Y2
]

| append [
| makeresults
| eval _raw="ID_B;ID_C;Z1
B1;C1;Z1_1
B3;C3;Z1_3
B5;C5;Z1_5
"
| multikv forceheader=1
| table ID_B, ID_C, Z1
]

| table ID_A, ID_B, ID_C, X1, X2, Y1, Y2, Z1

 

 

Labels (4)
Tags (2)
0 Karma

rikinet
Path Finder

Small correction: In the result table, rows 4 and 5, ID_C should be "C3" (from table Z).

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

First of all, thanks for creating the dummy data, it was very useful, although slightly wrong.

I corrected the data to remove Y1_3 which doesn't exist in your tables nor your desired results.

Try something like this

| makeresults
| eval _raw="ID_A;ID_B;X1;X2
A1;B1;X1_1;X2_1
A2;B2;X1_2A;X2_2
A2;B2;X1_2B;X2_2
A3;B3;X1_3;X2_3
"
| multikv forceheader=1
| table ID_A, ID_B, X1, X2

| append [
| makeresults
| eval _raw="ID_A;ID_B;Y1;Y2
A2;B2;Y1_2;
A2;B2;;Y2_2
A3;B3;;Y2_3A
A3;B3;;Y2_3B
A4;B4;Y1_4;Y2_4
"
| multikv forceheader=1
| table ID_A, ID_B, Y1, Y2
]

| append [
| makeresults
| eval _raw="ID_B;ID_C;Z1
B1;C1;Z1_1
B3;C3;Z1_3
B5;C5;Z1_5
"
| multikv forceheader=1
| table ID_B, ID_C, Z1
]

| table ID_A, ID_B, ID_C, X1, X2, Y1, Y2, Z1
| eventstats values(ID_C) as ID_C values(Z1) as Z1 by ID_B
| eventstats values(X1) as X1 values(X2) as X2 values(Y1) as Y1 values(Y2) as Y2 by ID_A ID_B
| eventstats values(X1) as X1 values(X2) as X2 values(Y1) as Y1 values(Y2) as Y2 values(ID_A) as ID_A by ID_C ID_B
| mvexpand Y2
| mvexpand X1
| fillnull value="N/A"
| streamstats count by ID_A ID_B ID_C X1 X2 Y1 Y2 Z1
| where count==1
| foreach *
    [| eval <<FIELD>>=if(<<FIELD>>=="N/A",null(),<<FIELD>>)]
| fields - count

Note that mvexpand can cause memory issues, so you need to check your job status to ensure the search works correctly with real data.

rikinet
Path Finder

@ITWhisperer thanks for your reply. You have definitely put me on the right path.

My original data has more ID and attribute fields, so I needed to find a way to generalize your solution further. I had a bit of a struggle to understand why you chose these particular three eventstats lines with their respective values(...) and by ... clauses.

I believe a more generic recipe would be like this:

If you have n ID fields and k attribute fields, do one eventstats ... by ID for each of the n ID fields:

| eventstats
  values(ID_2) as ID_2, .., values(ID_n) as ID_n,
  values(attr_1) as attr_1, .., values(attr_k) as attr_k
  by ID_1
| eventstats
  values(ID_1) as ID_1, values(ID_3) as ID_3, .., values(ID_n) as ID_n,
  values(attr_1) as attr_1, .., values(attr_k) as attr_k
  by ID_2
| eventstats .. by ID_3
| eventstats .. by ID_n-1
| eventstats
  values(ID_1) as ID_1, .., values(ID_n-1) as ID_n-1,
  values(attr_1) as attr_1, .., values(attr_k) as attr_k
  by ID_n

 

The subsequent mvexpand and streamtstats count ... where count==1 can be simplified as:

| fillnull value="N/A" ```or stats by fieldlist ignores rows with at least one null value in fieldlist```
| stats count by ID_1, ..., ID_n, attr_1, ..., attr_k | fields - count
| foreach * [ | eval <<FIELD>>=if(<<FIELD>>=="N/A",null(),<<FIELD>>) ] ```undo the N/A replacement```


This stanza even expands possible multivalues in all of the n attributes - otherwise each attribute field with potential multivalues would need its own explicit mvexpand <attr> . It also avoids mvexpand's potential memory issue.

 

For my own sample data, it would read:

| makeresults
| eval _raw="ID_A;ID_B;X1;X2
A1;B1;X1_1;X2_1
A2;B2;X1_2A;X2_2
A2;B2;X1_2B;X2_2
A3;B3;X1_3;X2_3
"
| multikv forceheader=1
| table ID_A, ID_B, X1, X2

| append [
| makeresults
| eval _raw="ID_A;ID_B;Y1;Y2
A2;B2;Y1_2;
A2;B2;;Y2_2
A3;B3;;Y2_3A
A3;B3;;Y2_3B
A4;B4;Y1_4;Y2_4
"
| multikv forceheader=1
| table ID_A, ID_B, Y1, Y2
]

| append [
| makeresults
| eval _raw="ID_B;ID_C;Z1
B1;C1;Z1_1
B3;C3;Z1_3
B5;C5;Z1_5
"
| multikv forceheader=1
| table ID_B, ID_C, Z1
]
| table ID_A, ID_B, ID_C, X1, X2, Y1, Y2, Z1

```--- relevant code starts here ---```

| eventstats
  values(ID_B) as ID_B, values(ID_C) as ID_C,
  values(X1) as X1, values(X2) as X2, values(Y1) as Y1, values(Y2) as Y2, values(Z1) as Z1
  by ID_A
| eventstats
  values(ID_A) as ID_A, values(ID_C) as ID_C,
  values(X1) as X1, values(X2) as X2, values(Y1) as Y1, values(Y2) as Y2, values(Z1) as Z1
  by ID_B
| eventstats
  values(ID_A) as ID_A, values(ID_B) as ID_B,
  values(X1) as X1, values(X2) as X2, values(Y1) as Y1, values(Y2) as Y2, values(Z1) as Z1
  by ID_C

| fillnull value="N/A" ```or stats by fieldlist ignores rows with at least one null value in fieldlist```
| stats count by ID_A, ID_B, ID_C, X1, X2, Y1, Y2, Z1 | fields - count
| foreach * [ | eval <<FIELD>>=if(<<FIELD>>=="N/A",null(),<<FIELD>>) ] ```undo the N/A replacement```

 
Please let me know if I have overlooked something.

0 Karma
Get Updates on the Splunk Community!

Splunk Decoded: Service Maps vs Service Analyzer Tree View vs Flow Maps

It’s Monday morning, and your phone is buzzing with alert escalations – your customer-facing portal is running ...

What’s New in Splunk Observability – September 2025

What's NewWe are excited to announce the latest enhancements to Splunk Observability, designed to help ITOps ...

Fun with Regular Expression - multiples of nine

Fun with Regular Expression - multiples of nineThis challenge was first posted on Slack #regex channel ...