Splunk Search

Superset of data fragments / Compress diagonal data with extra obstacles

rikinet
Path Finder

I have three tables. Each has one or more ID fields (out of ID_A, ID_B, ID_C) and assigns values Xn, Yn, Zn to these IDs. In effect, the tables each contain a fragment of information from a set of objects 1...5.

Table X:

ID_AID_BX1X2
A1B1X1_1X2_1
A2B2X1_2aX2_2
A2B2X1_2bX2_2
A3B3X1_3

X2_3


Table Y:

ID_AID_BY1Y2
A2B2Y1_2 
A2B2 Y2_2
A3B3 Y2_3a
A3B3 Y2_3b
A4B4Y1_4Y2_4

 

Table Z:

ID_BID_CZ1
B1C1Z1_1
B3C3Z1_3
B5C5Z1_5


How can I create the superset of all three tables, i.e. reconstruct the "full picture" about obects 1..5 as good as possible?

I tried with union and join in various ways, but I keep tripping over the following obstacles:

  1. The 1:n relation between ID and values (which should remain expanded as individual rows)
  2. Empty fields in between (bad for stats list(...) or stats values(...) because of different-sized MV results)
  3. There is no single table that has references to all objects (e.g. object 5 only present in table Z).

 

Desired result:

ID_AID_BID_CX1X2Y1Y2Z1
A1B1C1X1_1X2_1  Z1_1
A2B2 X1_2aX2_2Y1_2Y2_2 
A2B2 X1_2bX2_2Y1_2Y2_2 
A3B3 X1_3X2_3 Y2_3aZ1_3
A3B3 X1_3X2_3 Y2_3bZ1_3
A4B4   Y1_4Y2_4 
 B5C5    Z1_5



Sample data:

 

| makeresults
| eval _raw="ID_A;ID_B;X1;X2
A1;B1;X1_1;X2_1
A2;B2;X1_2A;X2_2
A2;B2;X1_2B;X2_2
A3;B3;X1_3;X2_3
"
| multikv forceheader=1
| table ID_A, ID_B, X1, X2

| append [
| makeresults
| eval _raw="ID_A;ID_B;Y1;Y2
A2;B2;Y1_2;
A2;B2;;Y2_2
A3;B3;Y1_3;Y2_3A
A3;B3;Y1_3;Y2_3B
A4;B4;Y1_4;Y2_4
"
| multikv forceheader=1
| table ID_A, ID_B, Y1, Y2
]

| append [
| makeresults
| eval _raw="ID_B;ID_C;Z1
B1;C1;Z1_1
B3;C3;Z1_3
B5;C5;Z1_5
"
| multikv forceheader=1
| table ID_B, ID_C, Z1
]

| table ID_A, ID_B, ID_C, X1, X2, Y1, Y2, Z1

 

 

Labels (4)
Tags (2)
0 Karma

rikinet
Path Finder

Small correction: In the result table, rows 4 and 5, ID_C should be "C3" (from table Z).

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

First of all, thanks for creating the dummy data, it was very useful, although slightly wrong.

I corrected the data to remove Y1_3 which doesn't exist in your tables nor your desired results.

Try something like this

| makeresults
| eval _raw="ID_A;ID_B;X1;X2
A1;B1;X1_1;X2_1
A2;B2;X1_2A;X2_2
A2;B2;X1_2B;X2_2
A3;B3;X1_3;X2_3
"
| multikv forceheader=1
| table ID_A, ID_B, X1, X2

| append [
| makeresults
| eval _raw="ID_A;ID_B;Y1;Y2
A2;B2;Y1_2;
A2;B2;;Y2_2
A3;B3;;Y2_3A
A3;B3;;Y2_3B
A4;B4;Y1_4;Y2_4
"
| multikv forceheader=1
| table ID_A, ID_B, Y1, Y2
]

| append [
| makeresults
| eval _raw="ID_B;ID_C;Z1
B1;C1;Z1_1
B3;C3;Z1_3
B5;C5;Z1_5
"
| multikv forceheader=1
| table ID_B, ID_C, Z1
]

| table ID_A, ID_B, ID_C, X1, X2, Y1, Y2, Z1
| eventstats values(ID_C) as ID_C values(Z1) as Z1 by ID_B
| eventstats values(X1) as X1 values(X2) as X2 values(Y1) as Y1 values(Y2) as Y2 by ID_A ID_B
| eventstats values(X1) as X1 values(X2) as X2 values(Y1) as Y1 values(Y2) as Y2 values(ID_A) as ID_A by ID_C ID_B
| mvexpand Y2
| mvexpand X1
| fillnull value="N/A"
| streamstats count by ID_A ID_B ID_C X1 X2 Y1 Y2 Z1
| where count==1
| foreach *
    [| eval <<FIELD>>=if(<<FIELD>>=="N/A",null(),<<FIELD>>)]
| fields - count

Note that mvexpand can cause memory issues, so you need to check your job status to ensure the search works correctly with real data.

rikinet
Path Finder

@ITWhisperer thanks for your reply. You have definitely put me on the right path.

My original data has more ID and attribute fields, so I needed to find a way to generalize your solution further. I had a bit of a struggle to understand why you chose these particular three eventstats lines with their respective values(...) and by ... clauses.

I believe a more generic recipe would be like this:

If you have n ID fields and k attribute fields, do one eventstats ... by ID for each of the n ID fields:

| eventstats
  values(ID_2) as ID_2, .., values(ID_n) as ID_n,
  values(attr_1) as attr_1, .., values(attr_k) as attr_k
  by ID_1
| eventstats
  values(ID_1) as ID_1, values(ID_3) as ID_3, .., values(ID_n) as ID_n,
  values(attr_1) as attr_1, .., values(attr_k) as attr_k
  by ID_2
| eventstats .. by ID_3
| eventstats .. by ID_n-1
| eventstats
  values(ID_1) as ID_1, .., values(ID_n-1) as ID_n-1,
  values(attr_1) as attr_1, .., values(attr_k) as attr_k
  by ID_n

 

The subsequent mvexpand and streamtstats count ... where count==1 can be simplified as:

| fillnull value="N/A" ```or stats by fieldlist ignores rows with at least one null value in fieldlist```
| stats count by ID_1, ..., ID_n, attr_1, ..., attr_k | fields - count
| foreach * [ | eval <<FIELD>>=if(<<FIELD>>=="N/A",null(),<<FIELD>>) ] ```undo the N/A replacement```


This stanza even expands possible multivalues in all of the n attributes - otherwise each attribute field with potential multivalues would need its own explicit mvexpand <attr> . It also avoids mvexpand's potential memory issue.

 

For my own sample data, it would read:

| makeresults
| eval _raw="ID_A;ID_B;X1;X2
A1;B1;X1_1;X2_1
A2;B2;X1_2A;X2_2
A2;B2;X1_2B;X2_2
A3;B3;X1_3;X2_3
"
| multikv forceheader=1
| table ID_A, ID_B, X1, X2

| append [
| makeresults
| eval _raw="ID_A;ID_B;Y1;Y2
A2;B2;Y1_2;
A2;B2;;Y2_2
A3;B3;;Y2_3A
A3;B3;;Y2_3B
A4;B4;Y1_4;Y2_4
"
| multikv forceheader=1
| table ID_A, ID_B, Y1, Y2
]

| append [
| makeresults
| eval _raw="ID_B;ID_C;Z1
B1;C1;Z1_1
B3;C3;Z1_3
B5;C5;Z1_5
"
| multikv forceheader=1
| table ID_B, ID_C, Z1
]
| table ID_A, ID_B, ID_C, X1, X2, Y1, Y2, Z1

```--- relevant code starts here ---```

| eventstats
  values(ID_B) as ID_B, values(ID_C) as ID_C,
  values(X1) as X1, values(X2) as X2, values(Y1) as Y1, values(Y2) as Y2, values(Z1) as Z1
  by ID_A
| eventstats
  values(ID_A) as ID_A, values(ID_C) as ID_C,
  values(X1) as X1, values(X2) as X2, values(Y1) as Y1, values(Y2) as Y2, values(Z1) as Z1
  by ID_B
| eventstats
  values(ID_A) as ID_A, values(ID_B) as ID_B,
  values(X1) as X1, values(X2) as X2, values(Y1) as Y1, values(Y2) as Y2, values(Z1) as Z1
  by ID_C

| fillnull value="N/A" ```or stats by fieldlist ignores rows with at least one null value in fieldlist```
| stats count by ID_A, ID_B, ID_C, X1, X2, Y1, Y2, Z1 | fields - count
| foreach * [ | eval <<FIELD>>=if(<<FIELD>>=="N/A",null(),<<FIELD>>) ] ```undo the N/A replacement```

 
Please let me know if I have overlooked something.

0 Karma
Get Updates on the Splunk Community!

Say goodbye to manually analyzing phishing and malware threats with Splunk Attack ...

In today’s evolving threat landscape, we understand you’re constantly bombarded with phishing and malware ...

AppDynamics is now part of Splunk Ideas

Hello Splunkers, We have exciting news for you! AppDynamics has been added to the Splunk Ideas Portal. Which ...

Advanced Splunk Data Management Strategies

Join us on Wednesday, May 14, 2025, at 11 AM PDT / 2 PM EDT for an exclusive Tech Talk that delves into ...