Splunk Search

How do I split a column into two based on a condition?

shreyp
Explorer

Hi all, 

Please consider this subset of data:

... - Date - Fruit - Seller - Bad_count - ...
11/8 - Apple - X - 3
11/8 - Apple - Y - 10
11/8 - Apple - X - 3
11/8 - Apple - Y - 10
11/8 - Orange - Y - 6
11/8 - Orange - X - 1
11/8 - Orange - Y - 6
11/9 - Apple - X - 0
11/9 - Apple - Y - 9
11/9 - Apple - X - 0
11/9 - Orange - X - 7
11/9 - Orange - Y - 2

How to read it => Row 1: on 11/8, Seller X had 3 bad Apples; Row 8: on 11/9, Seller X had 0 bad Apples.

I would like to reformat the table into this:

... - Date - Fruit - Seller - Bad_count - X_bad_count - Y_bad_count - ...
11/8 - Apple - X - 3 - 3 - 10
11/8 - Apple - Y - 10 - 3 - 10
11/8 - Apple - X - 3 - 3 - 10
11/8 - Apple - Y - 10 - 3 - 10
11/8 - Orange - Y - 6 - 1 - 6
11/8 - Orange - X - 1 - 1 - 6
11/8 - Orange - Y - 6 - 1 - 6
11/9 - Apple - X - 0 - 0 - 9
11/9 - Apple - Y - 9 - 0 - 9
11/9 - Apple - X - 0 - 0 - 9
11/9 - Orange - X - 7 - 7 - 2
11/9 - Orange - Y - 2 - 7 - 2

How to read this => Row 1: on 11/8, for Apples, Seller X had a bad count of 3 and Seller Y had a bad count of 10.

The idea is to split the Bad_count column into two columns based on the unique combination of Date and Fruit. 

Any help would be greatly appreciated!

Thanks,

Shrey

PS: 1) There are years of data, many fruits, and multiple sellers in the original dataset. 2) I've sorted the sample data above by Fruit to make it easier to read. 3) Don't worry about the duplicate rows; there are other fields in the dataset as well (meaning, dedup with care).

1 Solution

shreyp
Explorer

I was able to solve the problem using @gcusello's snippet along with a left join.

 

<base_search>
| bin span=1d _time
| eval column=strftime(_time,"%d/%m")."-".Fruit
| join type=left column
[
| <same_base_search>
| bin span=1d _time
| eval column=strftime(_time,"%d/%m")."-".Fruit
| chart values(bad_count) AS bad_count OVER column BY Seller
]
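
For reference, a similar wide layout can usually be attached without a subsearch join: eventstats can write the per-seller values back onto every row directly. This is only a sketch, assuming the field names from the sample above (Seller, Fruit, Bad_count); join subsearches are subject to result limits, so on years of data the eventstats form tends to scale better.

```
<base_search>
| bin span=1d _time
``` create one column per seller, e.g. X_bad_count, Y_bad_count
| eval {Seller}_bad_count=Bad_count
``` copy each seller's value onto every row of the same day and fruit
| eventstats values(*_bad_count) AS *_bad_count BY _time Fruit
```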

 

 

Thanks, all.

View solution in original post



johnhuang
Motivator

Assuming your fieldnames are: Date, Fruit, Seller, Bad_count

 

| eval {Seller}_bad_count=Bad_count
| eventstats sum(*_bad_count) AS *_bad_count BY Date
| table Date Fruit Seller Bad_count *_bad_count

 


shreyp
Explorer

Hi @johnhuang 

Thanks for the response. 

The sum command is not helping in this case. I changed it to values() and it seemed to work, but only if I filter for a particular Fruit on a particular day.

Sincerely,

Shrey


johnhuang
Motivator

Are the fieldnames Date, Fruit, Seller, Bad_count?
Could you provide a sample of your data (screenshot)?


shreyp
Explorer

I'm sorry, I can't share the exact data, as it is sensitive for the organization. I've masked the use case and the variable names to the best of my ability. However, to be clear: the counts require no aggregation; only the table layout needs to change.


johnhuang
Motivator

Your table layout example is not clear. Do the dashes between the field values represent artificial delimiters, or are they part of the values?


shreyp
Explorer

I've put dashes there to indicate the separation between the columns. The input table has 4 columns, and the output table should have 6 columns (after adding X_bad_count and Y_bad_count). You could ignore the dashes or replace them with |, if that helps.


johnhuang
Motivator

This should match your output.

 

| makeresults
| eval data="11/8,Apple,X,3;11/8,Apple,Y,10;11/8,Apple,X,3;11/8,Apple,Y,10;11/8,Orange,Y,6;11/8,Orange,X,1;11/8,Orange,Y,6;11/9,Apple,X,0;11/9,Apple,Y,9;11/9,Apple,X,0;11/9,Orange,X,7;11/9,Orange,Y,2"
| eval data=split(data, ";")
| mvexpand data
| rex field=data "(?<Date>[^,]*)\,(?<Fruit>[^,]*)\,(?<Seller>[^,]*)\,(?<Bad_count>[^,]*)"
| table Date Fruit Seller Bad_count
| eval {Seller}_bad_count=Bad_count
| eventstats max(*_bad_count) AS *_bad_count BY Date Fruit
| table Date Fruit Seller Bad_count *_bad_count

 

 

 

 

shreyp
Explorer

To confirm, @johnhuang - your solution works as well! Thank you!


shreyp
Explorer

Thanks, I'll try this as well.


gcusello
SplunkTrust

Hi @shreyp,

you could try something like this:

<your_search>
| bin span=1d _time
| eval column=strftime(_time,"%d/%m")."-".Fruit
| chart values(bad_count) AS bad_count OVER column BY Seller

Ciao.

Giuseppe

shreyp
Explorer

Hi Giuseppe, 

Thanks for the response. This almost did it! 

As expected, the chart command outputs in "column - X_bad_count - Y_bad_count" format. How do I append these two count columns back onto the main table? I'm looking for the kind of view an eventstats-style command would give.

Thanks,

Shrey


gcusello
SplunkTrust

Hi @shreyp,

well done, see you next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉
