I have a CSV and a Keystore with data that I would like to join together. I read the documentation:
https://docs.splunk.com/Documentation/SplunkCloud/8.1.2101/SearchReference/Append
https://docs.splunk.com/Documentation/SplunkCloud/8.1.2101/SearchReference/Join
It appears my type of search might not be covered by either join or append. I have a scan.csv that has about 19,000 IP's with scan data in it and another CSV called IP's.csv that has about 40,000 IP's. The IP's.csv has some systems that have a system name and some that do not.
Shortened example below:
IP's.csv
System IP
ABC 192.168.1.1
ABC 192.168.1.2
DEF 192.168.1.3
DEF 192.168.1.4
DEF 192.168.1.5
Scan.csv
IP scan-data
192.168.1.1 yes
192.168.1.2 yes
192.168.1.3 yes
192.168.1.4 yes
192.168.1.5 yes
192.168.1.6 yes
192.168.1.7 yes
When I do a join with left, outer, or inner, lookup, or and appendcols, etc, it only returns the data that has "system" field. The data will return and look something like this:
System IP scan-data
ABC 192.168.1.1 yes
ABC 192.168.1.2 yes
DEF 192.168.1.3 yes
DEF 192.168.1.4 yes
DEF 192.168.1.5 yes
However, 192.168.1.6 and 192.168.1.7 do not return at all (it appears joins and appends only return what is in common, like by the System field), how do I do a search that returns data that does not have a "System"?
I want the data to return to look something like this:
System IP scan-data
ABC 192.168.1.1 yes
ABC 192.168.1.2 yes
DEF 192.168.1.3 yes
DEF 192.168.1.4 yes
DEF 192.168.1.5 yes
192.168.1.6 yes
192.168.1.7 yes
Any ideas?
You're right - join and appendcols are not right for this. You can use append or you can let inputlookup do the append for you. The key part is to re-group the results using the stats command.
```Read the first CSV```
| inputlookup IPs.csv
```Append the second CSV to the result set```
| inputlookup append=true Scan.csv
```Group the results by IP address```
| stats values(*) as * by IP
| inputlookup scan_data.csv
|append [
|inputlookup ip_kv_system
|where isnotnull(stuff)
|eval stuff=split(stuff, "|delim|")
|mvexpand stuff
|mvexpand data
|spath input=stuff "IP Addr" output=ip
|spath input=devices "OS"
|spath input=data "fqdn"
|fields ip "OS" "fqdn"
]
The append returns results, but breaks if I add anything after the append. I need to do some data normalization after the append because we have a lot of bad data. For example, If I add
|fillnull systems value="N/A"
| where systems != "N/A"
or
|rename field as field
or
|dedup ip systems
The data that is in the ip_kv_system gets dropped from the search results. How can I run commands after the append?
Side note: The scan_data.csv is 500MB, not sure if that has an impact.
To debug the appended search, run it by itself starting with just the inputlookup command. Then run it again with one more command added. Verify the results are as expected after each run. When the results change the cause is the command just added.
Try something like this:
| inputlookup scan.csv
| append [| inputlookup ips.csv ]
| stats values(system) as system values(scanData) as scanData by IP