Splunk Search

How do I split a column into two based on a condition?

shreyp
Explorer

Hi all, 

Please consider this subset of data:

... - Date - Fruit - Seller - Bad_count - ...
11/8 - Apple - X - 3
11/8 - Apple - Y - 10
11/8 - Apple - X - 3
11/8 - Apple - Y - 10
11/8 - Orange - Y - 6
11/8 - Orange - X - 1
11/8 - Orange - Y - 6
11/9 - Apple - X - 0
11/9 - Apple - Y - 9
11/9 - Apple - X - 0
11/9 - Orange - X - 7
11/9 - Orange - Y - 2

How to read it => Row 1: on 11/8, Seller X had 3 bad Apples; Row 8: on 11/9, Seller X had 0 bad Apples.

I would like to reformat the table into this:

... - Date - Fruit - Seller - Bad_count - X_bad_count - Y_bad_count - ...
11/8 - Apple - X - 3 - 3 - 10
11/8 - Apple - Y - 10 - 3 - 10
11/8 - Apple - X - 3 - 3 - 10
11/8 - Apple - Y - 10 - 3 - 10
11/8 - Orange - Y - 6 - 1 - 6
11/8 - Orange - X - 1 - 1 - 6
11/8 - Orange - Y - 6 - 1 - 6
11/9 - Apple - X - 0 - 0 - 9
11/9 - Apple - Y - 9 - 0 - 9
11/9 - Apple - X - 0 - 0 - 9
11/9 - Orange - X - 7 - 7 - 2
11/9 - Orange - Y - 2 - 7 - 2

How to read this => Row 1: on 11/8, for Apples, Seller X had a bad count of 3 and Seller Y had a bad count of 10.

The idea is to split the Bad_count column into two columns based on the unique combination of Date and Fruit. 

Any help would be greatly appreciated!

Thanks,

Shrey

PS: 1) There are years of data, many fruits, and multiple sellers in the original dataset. 2) I've sorted the sample data above by Fruit to make it easier to read. 3) Don't worry about the duplicate rows; there are other fields in the dataset as well (meaning, dedup with care).

1 Solution

shreyp
Explorer

I was able to solve the problem using @gcusello's snippet along with a left join.

 

<base_search>
| bin span=1d _time
| eval column=strftime(_time,"%d/%m")."-".Fruit
| join type=left column
[
| <same_base_search>
| bin span=1d _time
| eval column=strftime(_time,"%d/%m")."-".Fruit
| chart values(bad_count) AS bad_count OVER column BY Seller
]
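
For reference, a similar wide layout can usually be attached without a subsearch join: eventstats can write the per-seller values back onto every row directly. This is only a sketch, assuming the field names from the sample above (Seller, Fruit, Bad_count); join subsearches are subject to result limits, so on years of data the eventstats form tends to scale better.

```
<base_search>
| bin span=1d _time
``` create one column per seller, e.g. X_bad_count, Y_bad_count
| eval {Seller}_bad_count=Bad_count
``` copy each seller's value onto every row of the same day and fruit
| eventstats values(*_bad_count) AS *_bad_count BY _time Fruit
```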

 

 

Thanks, all.

View solution in original post



johnhuang
Motivator

Assuming your fieldnames are: Date, Fruit, Seller, Bad_count

 

| eval {Seller}_bad_count=Bad_count
| eventstats sum(*_bad_count) AS *_bad_count BY Date
| table Date Fruit Seller Bad_count *_bad_count

 


shreyp
Explorer

Hi @johnhuang 

Thanks for the response. 

The sum command is not helping in this case. I changed it to values() and it seemed to work, but only if I filter for a particular Fruit on a particular day.

Sincerely,

Shrey


johnhuang
Motivator

Are the fieldnames Date, Fruit, Seller, Bad_count?
Could you provide a sample of your data (screenshot)?


shreyp
Explorer

I'm sorry, I can't share the exact data, as it is sensitive for the organization. I've masked the use case and the variable names to the best of my ability. However, to be clear: the counts require no aggregation; only the table layout needs to change.


johnhuang
Motivator

Your table layout example is not clear. Do the dashes between the field values represent artificial delimiters, or are they part of the values?


shreyp
Explorer

I've put dashes there to indicate the separation between the columns. The input table has 4 columns, and the output table should have 6 columns (after adding X_bad_count and Y_bad_count). You could ignore the dashes or replace them with |, if that helps.


johnhuang
Motivator

This should match your output.

 

| makeresults
| eval data="11/8,Apple,X,3;11/8,Apple,Y,10;11/8,Apple,X,3;11/8,Apple,Y,10;11/8,Orange,Y,6;11/8,Orange,X,1;11/8,Orange,Y,6;11/9,Apple,X,0;11/9,Apple,Y,9;11/9,Apple,X,0;11/9,Orange,X,7;11/9,Orange,Y,2"
| eval data=split(data, ";")
| mvexpand data
| rex field=data "(?<Date>[^,]*)\,(?<Fruit>[^,]*)\,(?<Seller>[^,]*)\,(?<Bad_count>[^,]*)"
| table Date Fruit Seller Bad_count
| eval {Seller}_bad_count=Bad_count
| eventstats max(*_bad_count) AS *_bad_count BY Date Fruit
| table Date Fruit Seller Bad_count *_bad_count

 

 

 

 

shreyp
Explorer

To confirm, @johnhuang - your solution works as well! Thank you!


shreyp
Explorer

Thanks, I'll try this as well.


gcusello
SplunkTrust

Hi @shreyp,

you could try something like this:

<your_search>
| bin span=1d _time
| eval column=strftime(_time,"%d/%m")."-".Fruit
| chart values(bad_count) AS bad_count OVER column BY Seller

Ciao.

Giuseppe

shreyp
Explorer

Hi Giuseppe, 

Thanks for the response. This almost did it! 

As expected, the chart command outputs in "column - X_bad_count - Y_bad_count" format. How do I append these two count columns back onto the main table? I'm looking for the kind of view an eventstats-style command would give.

Thanks,

Shrey


gcusello
SplunkTrust

Hi @shreyp,

well done, see you next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉
