Dedup with removal of null columns

Fenrir · ‎10-22-2019

I have a fairly complex query that ultimately outputs a large table with 23 fields and several dozen rows. Since the table is being populated from various data sourcetypes, I have an aggregate of all available fields from all sourcetypes, but all fields are not present in all sourcetypes. Assume there are 3 sourcetypes, they share 17 common fields, one has 3 unique, another 2, and the final sourcetype has 1 unique field. The resulting table will show the results, with all fields regardless if the original sourcetype contains them or not. Because of this, often one of those unique fields will be null across the entire column.

Let's assume for simplicity the data returned from the original query and output to the table before any dedup/null removal is:

| sourcetype | ip      | hostname | os      | status  | department |
|---------|---------|----------|---------|---------|------------|
| type1 | value1   | value2    | null    | value3   | null       |
| type2 |  value1   | value2    | null    | value3   | null       |
| type3 | value4   | value5  | value6 | value7 | null       |

In this instance, I would like to dedup rows 2 and 3 (I ignore sourcetype in the dedup, but I check all other fields present in the final tabl) in to a single result, and remove column 6, but I run in to the following problems:

Dedup requires the values you want to check against. If I remove column 6 (department) first, I cannot generate my dedup command properly, as having "department" as an option results in no full match and the dedup doesn't do anything.
I cannot run dedup first, as if I run it without using a fillnull, dedup will skip lines with null values in any of the specified columns. Now I cannot remove the null columns because the columns I want to remove are no longer null, they have a fillnull value.

Also, as mentioned above, the results are fairly dynamic based on the data being searched for (this is being used on a dashboard with a query parameter input). Sometimes column 4 will be all null, or column 3. The solution needs to account for the dynamic nature of the results.

Happy for any suggestions on how to overcome this, or just a confirmation that this will not be possible and I will need to find a different approach.

Apologies if this doesn't make any sense, I am happy to clarify any points as needed.

7splunk · ‎12-23-2020

I just had to deal with this same 'deduping' crushing results as well.. Did it with if() statements for any fields that were NULL at times:

<search> |eval os=if(ISNULL(os),"dummyValue",os)| eval department=if(ISNULL(department),"dummyValue",department) |dedup ip,hostname,status,department

and could then change the dummyValue back after the dedup if needed

arjunpkishore5 · ‎10-24-2019

How about this,

|stats max(*) as * by ip, hostname

Typically the by column should have fields which you consider as keys in the final result set. This would remove any columns which have all nulls.

Fenrir · ‎10-24-2019

This does not work. I need to dedup before running this operation, which means I am forced to use fillnull. I cannot dedup after because there is no way to predict which columns will be removed. Maybe if there's a way I can set the fields that are being fillnulle'd to - back to null this would work?

Dedup with removal of null columns

[Puzzles] Solve, Learn, Repeat: Character substitutions with Regular Expressions

Shape the Future of Splunk: Join the Product Research Lab!

Auto-Injector for Everything Else: Making OpenTelemetry Truly Universal

Are you a member of the Splunk Community?

Dedup with removal of null columns

[Puzzles] Solve, Learn, Repeat: Character substitutions with Regular Expressions

Shape the Future of Splunk: Join the Product Research Lab!

Auto-Injector for Everything Else: Making OpenTelemetry Truly Universal