Currently, I have a table that looks like this:
Table1
Hostname Vendor Product Version
-----------------------------------------------------------------
hostname1 vendor1 product1 version1
vendor2 product2 version2
vendor3 product3 version3
vendor4 product4 version4
-----------------------------------------------------------------
hostname2 vendor1 product2 version2
vendor2 product4 version1
vendor3 product3 version5
vendor4 product6 version3
-----------------------------------------------------------------
In this scenario, each hostname has a list of vendors, products and versions attached to it. What I want to create is the following:
Hostname Vendor Product Version
hostname1 vendor1 product1 version1
hostname1 vendor2 product2 version2
hostname1 vendor3 product3 version3
hostname1 vendor4 product4 version4
hostname2 vendor1 product2 version2
hostname2 vendor2 product4 version1
hostname2 vendor3 product3 version5
hostname2 vendor4 product6 version3
Does anyone have any ideas?
Based on this search, I suspect that the raw software field is not JSON. Regardless, @richgalloway's suggestion of mvexpand is sound. But you must give examples of your software values; additionally, you probably omitted max_match=0 from your real rex command. I say this because by reverse engineering (something you should not force volunteers to do for you), I see two distinct possible format that software field can take to give out the table you illustrated. Both possible formats require max_match=0, but each possible format requires a different approach to applying mvexpand. Let me illustrate. (You should have illustrated your data in this manner.)
If you do | table hostname software, you probably see the following in Splunk's statistics table
However, never use a screenshot to illustrate data. (Screenshot is only useful when illustrating visualization anomalies.) The same display could come from two fundamentally different values.
1. When software is a multivalue field with distinct values like "cpe:/a:vendor1:product1:version1", "cpe:/a:vendor2:product2:version2", and so on, but all in a single event. If this is the case, all you need is to apply mvexpand to software.
``` use when 'software' is multivalue ```
| mvexpand software
| rex field=software max_match=0 "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version
2. When software is single-value, but multiline, like
cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4
In this case, you need to first split software into single-line, multivalue before mvexpand. Like this
``` use when 'software' is multiline ```
| eval software = split(software, "
")
| mvexpand software
| rex field=software max_match=0 "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version
In both cases, you can get the exact result you illustrated in OP. But you must know which data format you have.
Here are two data emulations for you to play with and compare with real data. You can attach them to their corresponding mvexpand method to see how they turn into the desired tabulation.
1. multivalue 'software'
| makeresults format=csv data="hostname
hostname1"
| eval software = split("cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4","
")
| append
[makeresults format=csv data="hostname
hostname2"
| eval software = split("cpe:/a:vendor1:product2:version2
cpe:/a:vendor2:product4:version1
cpe:/a:vendor3:product3:version5
cpe:/a:vendor4:product6:version3","
")]
``` emulates multivalue 'software' ```
2. multline 'software'
| makeresults format=csv data="hostname
hostname1"
| eval software = "cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4"
| append
[makeresults format=csv data="hostname
hostname2"
| eval software = "cpe:/a:vendor1:product2:version2
cpe:/a:vendor2:product4:version1
cpe:/a:vendor3:product3:version5
cpe:/a:vendor4:product6:version3"]
``` emulates multiline 'software' ```
Of course, reverse engineering (aka mind-reading), though laborious and generally loathed by volunteers, are often incorrect. There could be some other data format that I haven't considered that will give you the undesired output after rex; it is even possible some format will give you that undesirable output without max_match=0. If so, only you can give us the real data format (anonymize as needed) to help yourself.
Check out the filldown command.
The filldown command is not going to help me here. I don't think that I properly explained what I wanted to do. The original table looks like this:
Hostname | Vendor | Product | Version |
hostname1 | Vendor1 Vendor2 Vendor3 Vendor4 | Product1 Product2 Product3 Prodcut4 | Version1 Version2 Version3 Version4 |
hostname2 | Vendor1 Vendor2 Vendor3 Vendor4 | Product2 Product4 Product3 Prodcut6 | Version2 Version1 Version5 Version3 |
I want the new table to look like this:
Hostname | Vendor | Product | Version |
hostname1 | Vendor1 | Product1 | Version1 |
hostname1 | Vendor2 | Product2 | Version2 |
hostname1 | Vendor3 | Product3 | Version3 |
hostname1 | Vendor4 | Product4 | Version4 |
hostname2 | Vendor1 | Product2 | Version2 |
hostname2 | Vendor2 | Product4 | Version1 |
hostname2 | Vendor3 | Product3 | Version5 |
hostname2 | Vendor4 | Product6 | Version3 |
So, mvexpand may work, but it depends on how you got into this position to begin with. What's the query?
The search is the following:
Index=index1 sourcetype=sourcetype1 hostname=* software != ""
| rex field=software "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version
Based on this search, I suspect that the raw software field is not JSON. Regardless, @richgalloway's suggestion of mvexpand is sound. But you must give examples of your software values; additionally, you probably omitted max_match=0 from your real rex command. I say this because by reverse engineering (something you should not force volunteers to do for you), I see two distinct possible format that software field can take to give out the table you illustrated. Both possible formats require max_match=0, but each possible format requires a different approach to applying mvexpand. Let me illustrate. (You should have illustrated your data in this manner.)
If you do | table hostname software, you probably see the following in Splunk's statistics table
However, never use a screenshot to illustrate data. (Screenshot is only useful when illustrating visualization anomalies.) The same display could come from two fundamentally different values.
1. When software is a multivalue field with distinct values like "cpe:/a:vendor1:product1:version1", "cpe:/a:vendor2:product2:version2", and so on, but all in a single event. If this is the case, all you need is to apply mvexpand to software.
``` use when 'software' is multivalue ```
| mvexpand software
| rex field=software max_match=0 "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version
2. When software is single-value, but multiline, like
cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4
In this case, you need to first split software into single-line, multivalue before mvexpand. Like this
``` use when 'software' is multiline ```
| eval software = split(software, "
")
| mvexpand software
| rex field=software max_match=0 "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version
In both cases, you can get the exact result you illustrated in OP. But you must know which data format you have.
Here are two data emulations for you to play with and compare with real data. You can attach them to their corresponding mvexpand method to see how they turn into the desired tabulation.
1. multivalue 'software'
| makeresults format=csv data="hostname
hostname1"
| eval software = split("cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4","
")
| append
[makeresults format=csv data="hostname
hostname2"
| eval software = split("cpe:/a:vendor1:product2:version2
cpe:/a:vendor2:product4:version1
cpe:/a:vendor3:product3:version5
cpe:/a:vendor4:product6:version3","
")]
``` emulates multivalue 'software' ```
2. multline 'software'
| makeresults format=csv data="hostname
hostname1"
| eval software = "cpe:/a:vendor1:product1:version1
cpe:/a:vendor2:product2:version2
cpe:/a:vendor3:product3:version3
cpe:/a:vendor4:product4:version4"
| append
[makeresults format=csv data="hostname
hostname2"
| eval software = "cpe:/a:vendor1:product2:version2
cpe:/a:vendor2:product4:version1
cpe:/a:vendor3:product3:version5
cpe:/a:vendor4:product6:version3"]
``` emulates multiline 'software' ```
Of course, reverse engineering (aka mind-reading), though laborious and generally loathed by volunteers, are often incorrect. There could be some other data format that I haven't considered that will give you the undesired output after rex; it is even possible some format will give you that undesirable output without max_match=0. If so, only you can give us the real data format (anonymize as needed) to help yourself.
Thanks. I don't have a lot of experience with the JSON functions, but I think this should help.
index=index1 sourcetype=sourcetype1 hostname=* software != ""
| eval software=mvexpand(json_array_to_mv(software)
| rex field=software "cpe:\/a:(?<Vendor>[^:]+):(?<Product>[^:]+):(?<Version>.*)"
| table hostname, Vendor, Product, Version
| dedup hostname, Vendor, Product, Version
The idea is to convert the software field from a JSON array into a multi-value field so it can be split into separate events using mvexpand.