Splunk Search

Dividing combined value into separate values

psomeshwar
Path Finder

Currently, I have a table that looks like this:

Table1

Hostname   Vendor         Product              Version
-----------------------------------------------------------------
hostname1  vendor1      product1             version1
                          vendor2      product2             version2
                          vendor3      product3             version3
                          vendor4      product4             version4
-----------------------------------------------------------------
hostname2 vendor1      product2             version2
                         vendor2      product4             version1
                         vendor3      product3             version5
                         vendor4      product6             version3
-----------------------------------------------------------------

In this scenario, each hostname has a list of vendors, products and versions attached to it. What I want to create is the following:

Hostname      Vendor      Product        Version

hostname1    vendor1   product1      version1
hostname1    vendor2   product2      version2
hostname1    vendor3   product3      version3
hostname1    vendor4   product4      version4
hostname2    vendor1   product2      version2
hostname2    vendor2   product4      version1
hostname2    vendor3   product3      version5
hostname2    vendor4   product6      version3

 

Does anyone have any ideas?

Labels (3)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

Based on this search, I suspect that the raw software field is not JSON.  Regardless, @richgalloway's suggestion of mvexpand is sound.  But you must give examples of your software values; additionally, you probably omitted max_match=0 from your real rex command.   I say this because by reverse engineering (something you should not force volunteers to do for you), I see two distinct possible format that software field can take to give out the table you illustrated.  Both possible formats require max_match=0, but each possible format requires a different approach to applying mvexpand.  Let me illustrate. (You should have illustrated your data in this manner.)

If you do | table hostname software, you probably see the following in Splunk's statistics table

ambiguous-table.png

However, never use a screenshot to illustrate data. (Screenshot is only useful when illustrating visualization anomalies.) The same display could come from two fundamentally different values.

1. When software is a multivalue field with distinct values like "cpe:/a:vendor1:product1:version1", "cpe:/a:vendor2:product2:version2", and so on, but all in a single event.  If this is the case, all you need is to apply mvexpand to software.

``` use when 'software' is multivalue ```
| mvexpand software
| rex field=software max_match=0 "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version

2. When software is single-value, but multiline, like

cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4

In this case, you need to first split software into single-line, multivalue before mvexpand.  Like this

``` use when 'software' is multiline ```
| eval software = split(software, "
")
| mvexpand software
| rex field=software max_match=0 "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version

 

In both cases, you can get the exact result you illustrated in OP.  But you must know which data format you have.

Here are two data emulations for you to play with and compare with real data.  You can attach them to their corresponding mvexpand method to see how they turn into the desired tabulation.

1. multivalue 'software'

| makeresults format=csv data="hostname
hostname1"
| eval software = split("cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4","
")
| append
    [makeresults format=csv data="hostname
    hostname2"
    | eval software = split("cpe:/a:vendor1:product2:version2
cpe:/a:vendor2:product4:version1
cpe:/a:vendor3:product3:version5
cpe:/a:vendor4:product6:version3","
")]
``` emulates multivalue 'software' ```

2. multline 'software'

| makeresults format=csv data="hostname
hostname1"
| eval software = "cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4"
| append
    [makeresults format=csv data="hostname
    hostname2"
    | eval software = "cpe:/a:vendor1:product2:version2
cpe:/a:vendor2:product4:version1
cpe:/a:vendor3:product3:version5
cpe:/a:vendor4:product6:version3"]
``` emulates multiline 'software' ```

Of course, reverse engineering (aka mind-reading), though laborious and generally loathed by volunteers, are often incorrect. There could be some other data format that I haven't considered that will give you the undesired output after rex; it is even possible some format will give you that undesirable output without max_match=0.  If so, only you can give us the real data format (anonymize as needed) to help yourself.

View solution in original post

richgalloway
SplunkTrust
SplunkTrust

Check out the filldown command.

---
If this reply helps you, Karma would be appreciated.
0 Karma

psomeshwar
Path Finder

The filldown command is not going to help me here. I don't think that I properly explained what I wanted to do. The original table looks like this:

HostnameVendorProductVersion
hostname1Vendor1
Vendor2
Vendor3
Vendor4
Product1
Product2
Product3
Prodcut4
Version1
Version2
Version3
Version4
hostname2Vendor1
Vendor2
Vendor3
Vendor4
Product2
Product4
Product3
Prodcut6
Version2
Version1
Version5
Version3

I want the new table to look like this:

HostnameVendorProductVersion
hostname1Vendor1Product1Version1
hostname1Vendor2Product2Version2
hostname1Vendor3Product3Version3
hostname1Vendor4Product4Version4
hostname2Vendor1Product2Version2
hostname2Vendor2Product4Version1
hostname2Vendor3Product3Version5
hostname2Vendor4Product6Version3
0 Karma

richgalloway
SplunkTrust
SplunkTrust

So, mvexpand may work, but it depends on how you got into this position to begin with.  What's the query?

---
If this reply helps you, Karma would be appreciated.
0 Karma

psomeshwar
Path Finder

The search is the following:

Index=index1 sourcetype=sourcetype1 hostname=* software != ""
| rex field=software "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version
0 Karma

yuanliu
SplunkTrust
SplunkTrust

Based on this search, I suspect that the raw software field is not JSON.  Regardless, @richgalloway's suggestion of mvexpand is sound.  But you must give examples of your software values; additionally, you probably omitted max_match=0 from your real rex command.   I say this because by reverse engineering (something you should not force volunteers to do for you), I see two distinct possible format that software field can take to give out the table you illustrated.  Both possible formats require max_match=0, but each possible format requires a different approach to applying mvexpand.  Let me illustrate. (You should have illustrated your data in this manner.)

If you do | table hostname software, you probably see the following in Splunk's statistics table

ambiguous-table.png

However, never use a screenshot to illustrate data. (Screenshot is only useful when illustrating visualization anomalies.) The same display could come from two fundamentally different values.

1. When software is a multivalue field with distinct values like "cpe:/a:vendor1:product1:version1", "cpe:/a:vendor2:product2:version2", and so on, but all in a single event.  If this is the case, all you need is to apply mvexpand to software.

``` use when 'software' is multivalue ```
| mvexpand software
| rex field=software max_match=0 "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version

2. When software is single-value, but multiline, like

cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4

In this case, you need to first split software into single-line, multivalue before mvexpand.  Like this

``` use when 'software' is multiline ```
| eval software = split(software, "
")
| mvexpand software
| rex field=software max_match=0 "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version

 

In both cases, you can get the exact result you illustrated in OP.  But you must know which data format you have.

Here are two data emulations for you to play with and compare with real data.  You can attach them to their corresponding mvexpand method to see how they turn into the desired tabulation.

1. multivalue 'software'

| makeresults format=csv data="hostname
hostname1"
| eval software = split("cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4","
")
| append
    [makeresults format=csv data="hostname
    hostname2"
    | eval software = split("cpe:/a:vendor1:product2:version2
cpe:/a:vendor2:product4:version1
cpe:/a:vendor3:product3:version5
cpe:/a:vendor4:product6:version3","
")]
``` emulates multivalue 'software' ```

2. multline 'software'

| makeresults format=csv data="hostname
hostname1"
| eval software = "cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4"
| append
    [makeresults format=csv data="hostname
    hostname2"
    | eval software = "cpe:/a:vendor1:product2:version2
cpe:/a:vendor2:product4:version1
cpe:/a:vendor3:product3:version5
cpe:/a:vendor4:product6:version3"]
``` emulates multiline 'software' ```

Of course, reverse engineering (aka mind-reading), though laborious and generally loathed by volunteers, are often incorrect. There could be some other data format that I haven't considered that will give you the undesired output after rex; it is even possible some format will give you that undesirable output without max_match=0.  If so, only you can give us the real data format (anonymize as needed) to help yourself.

richgalloway
SplunkTrust
SplunkTrust

Thanks.  I don't have a lot of experience with the JSON functions, but I think this should help.

index=index1 sourcetype=sourcetype1 hostname=* software != ""
| eval software=mvexpand(json_array_to_mv(software)
| rex field=software "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version

 The idea is to convert the software field from a JSON array into a multi-value field so it can be split into separate events using mvexpand.

---
If this reply helps you, Karma would be appreciated.
0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...