Solved: Re: Dividing combined value into separate values

psomeshwar · ‎03-20-2024

Currently, I have a table that looks like this:

Table1

Hostname Vendor Product Version
-----------------------------------------------------------------
hostname1 vendor1 product1 version1
vendor2 product2 version2
vendor3 product3 version3
vendor4 product4 version4
-----------------------------------------------------------------
hostname2 vendor1 product2 version2
vendor2 product4 version1
vendor3 product3 version5
vendor4 product6 version3
-----------------------------------------------------------------

In this scenario, each hostname has a list of vendors, products and versions attached to it. What I want to create is the following:

Hostname Vendor Product Version

hostname1 vendor1 product1 version1
hostname1 vendor2 product2 version2
hostname1 vendor3 product3 version3
hostname1 vendor4 product4 version4
hostname2 vendor1 product2 version2
hostname2 vendor2 product4 version1
hostname2 vendor3 product3 version5
hostname2 vendor4 product6 version3

Does anyone have any ideas?

yuanliu · ‎03-22-2024

Based on this search, I suspect that the raw software field is not JSON. Regardless, @richgalloway's suggestion of mvexpand is sound. But you must give examples of your software values; additionally, you probably omitted max_match=0 from your real rex command. I say this because by reverse engineering (something you should not force volunteers to do for you), I see two distinct possible format that software field can take to give out the table you illustrated. Both possible formats require max_match=0, but each possible format requires a different approach to applying mvexpand. Let me illustrate. (You should have illustrated your data in this manner.)

If you do | table hostname software, you probably see the following in Splunk's statistics table

However, never use a screenshot to illustrate data. (Screenshot is only useful when illustrating visualization anomalies.) The same display could come from two fundamentally different values.

1. When software is a multivalue field with distinct values like "cpe:/a:vendor1:product1:version1", "cpe:/a:vendor2:product2:version2", and so on, but all in a single event. If this is the case, all you need is to apply mvexpand to software.

``` use when 'software' is multivalue ```
| mvexpand software
| rex field=software max_match=0 "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version

2. When software is single-value, but multiline, like

cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4

In this case, you need to first split software into single-line, multivalue before mvexpand. Like this

``` use when 'software' is multiline ```
| eval software = split(software, "
")
| mvexpand software
| rex field=software max_match=0 "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version

In both cases, you can get the exact result you illustrated in OP. But you must know which data format you have.

Here are two data emulations for you to play with and compare with real data. You can attach them to their corresponding mvexpand method to see how they turn into the desired tabulation.

1. multivalue 'software'

| makeresults format=csv data="hostname
hostname1"
| eval software = split("cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4","
")
| append
    [makeresults format=csv data="hostname
    hostname2"
    | eval software = split("cpe:/a:vendor1:product2:version2
cpe:/a:vendor2:product4:version1
cpe:/a:vendor3:product3:version5
cpe:/a:vendor4:product6:version3","
")]
``` emulates multivalue 'software' ```

2. multline 'software'

| makeresults format=csv data="hostname
hostname1"
| eval software = "cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4"
| append
    [makeresults format=csv data="hostname
    hostname2"
    | eval software = "cpe:/a:vendor1:product2:version2
cpe:/a:vendor2:product4:version1
cpe:/a:vendor3:product3:version5
cpe:/a:vendor4:product6:version3"]
``` emulates multiline 'software' ```

Of course, reverse engineering (aka mind-reading), though laborious and generally loathed by volunteers, are often incorrect. There could be some other data format that I haven't considered that will give you the undesired output after rex; it is even possible some format will give you that undesirable output without max_match=0. If so, only you can give us the real data format (anonymize as needed) to help yourself.

View solution in original post

richgalloway · ‎03-20-2024

Check out the filldown command.

---
If this reply helps you, Karma would be appreciated.

psomeshwar · ‎03-20-2024

The filldown command is not going to help me here. I don't think that I properly explained what I wanted to do. The original table looks like this:

Hostname	Vendor	Product	Version
hostname1	Vendor1 Vendor2 Vendor3 Vendor4	Product1 Product2 Product3 Prodcut4	Version1 Version2 Version3 Version4
hostname2	Vendor1 Vendor2 Vendor3 Vendor4	Product2 Product4 Product3 Prodcut6	Version2 Version1 Version5 Version3

I want the new table to look like this:

Hostname	Vendor	Product	Version
hostname1	Vendor1	Product1	Version1
hostname1	Vendor2	Product2	Version2
hostname1	Vendor3	Product3	Version3
hostname1	Vendor4	Product4	Version4
hostname2	Vendor1	Product2	Version2
hostname2	Vendor2	Product4	Version1
hostname2	Vendor3	Product3	Version5
hostname2	Vendor4	Product6	Version3

richgalloway · ‎03-20-2024

So, mvexpand may work, but it depends on how you got into this position to begin with. What's the query?

---
If this reply helps you, Karma would be appreciated.

psomeshwar · ‎03-21-2024

The search is the following:

Index=index1 sourcetype=sourcetype1 hostname=* software != ""
| rex field=software "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version

yuanliu · ‎03-22-2024

Based on this search, I suspect that the raw software field is not JSON. Regardless, @richgalloway's suggestion of mvexpand is sound. But you must give examples of your software values; additionally, you probably omitted max_match=0 from your real rex command. I say this because by reverse engineering (something you should not force volunteers to do for you), I see two distinct possible format that software field can take to give out the table you illustrated. Both possible formats require max_match=0, but each possible format requires a different approach to applying mvexpand. Let me illustrate. (You should have illustrated your data in this manner.)

If you do | table hostname software, you probably see the following in Splunk's statistics table

However, never use a screenshot to illustrate data. (Screenshot is only useful when illustrating visualization anomalies.) The same display could come from two fundamentally different values.

1. When software is a multivalue field with distinct values like "cpe:/a:vendor1:product1:version1", "cpe:/a:vendor2:product2:version2", and so on, but all in a single event. If this is the case, all you need is to apply mvexpand to software.

``` use when 'software' is multivalue ```
| mvexpand software
| rex field=software max_match=0 "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version

2. When software is single-value, but multiline, like

cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4

In this case, you need to first split software into single-line, multivalue before mvexpand. Like this

``` use when 'software' is multiline ```
| eval software = split(software, "
")
| mvexpand software
| rex field=software max_match=0 "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version

In both cases, you can get the exact result you illustrated in OP. But you must know which data format you have.

Here are two data emulations for you to play with and compare with real data. You can attach them to their corresponding mvexpand method to see how they turn into the desired tabulation.

1. multivalue 'software'

| makeresults format=csv data="hostname
hostname1"
| eval software = split("cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4","
")
| append
    [makeresults format=csv data="hostname
    hostname2"
    | eval software = split("cpe:/a:vendor1:product2:version2
cpe:/a:vendor2:product4:version1
cpe:/a:vendor3:product3:version5
cpe:/a:vendor4:product6:version3","
")]
``` emulates multivalue 'software' ```

2. multline 'software'

| makeresults format=csv data="hostname
hostname1"
| eval software = "cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4"
| append
    [makeresults format=csv data="hostname
    hostname2"
    | eval software = "cpe:/a:vendor1:product2:version2
cpe:/a:vendor2:product4:version1
cpe:/a:vendor3:product3:version5
cpe:/a:vendor4:product6:version3"]
``` emulates multiline 'software' ```

Of course, reverse engineering (aka mind-reading), though laborious and generally loathed by volunteers, are often incorrect. There could be some other data format that I haven't considered that will give you the undesired output after rex; it is even possible some format will give you that undesirable output without max_match=0. If so, only you can give us the real data format (anonymize as needed) to help yourself.

richgalloway · ‎03-21-2024

Thanks. I don't have a lot of experience with the JSON functions, but I think this should help.

index=index1 sourcetype=sourcetype1 hostname=* software != ""
| eval software=mvexpand(json_array_to_mv(software)
| rex field=software "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version

The idea is to convert the software field from a JSON array into a multi-value field so it can be split into separate events using mvexpand.

---
If this reply helps you, Karma would be appreciated.

Dividing combined value into separate values

fields

regex

rex

Preparing your Splunk Environment for OpenSSL3

Deprecation of Splunk Observability Kubernetes “Classic Navigator” UI starting ...

Now Available: Cisco Talos Threat Intelligence Integrations for Splunk Security Cloud ...