Splunk Search

Join and appendcols returns results, but not giving the desired results

UMDTERPS
Communicator

I have a CSV and a Keystore with data that I would like to join together.  I read the documentation:

https://docs.splunk.com/Documentation/SplunkCloud/8.1.2101/SearchReference/Append

https://docs.splunk.com/Documentation/SplunkCloud/8.1.2101/SearchReference/Join

It appears my type of search might not be covered by either join or append.  I have a scan.csv that has about 19,000 IP's with scan data in it and another CSV called IP's.csv that has about 40,000 IP's.  The IP's.csv has some systems that have a system name and some that do not.

Shortened example below:

IP's.csv

System              IP                    
ABC        192.168.1.1                      
ABC        192.168.1.2                      
DEF         192.168.1.3                      
DEF         192.168.1.4                     
DEF         192.168.1.5                    

Scan.csv

IP                                 scan-data
192.168.1.1                 yes
192.168.1.2                 yes
192.168.1.3                 yes
192.168.1.4                 yes 
192.168.1.5                 yes
192.168.1.6                 yes
192.168.1.7                 yes

When I do a join with left, outer, or inner, lookup, or and appendcols, etc, it only returns the data that has "system" field.  The data will return and look something like this:

System              IP           scan-data
ABC        192.168.1.1       yes                   
ABC        192.168.1.2       yes                 
DEF         192.168.1.3       yes                  
DEF          192.168.1.4      yes                 
DEF         192.168.1.5       yes


However, 192.168.1.6   and 192.168.1.7 do not return at all (it appears joins and appends only return what is in common, like by the System field), how do I do a search that returns data that does not have a "System"?

I want the data to return to look something like this:

System              IP           scan-data
ABC        192.168.1.1       yes                   
ABC        192.168.1.2       yes                 
DEF         192.168.1.3       yes                  
DEF         192.168.1.4       yes                 
DEF         192.168.1.5       yes
                 192.168.1.6       yes
                  192.168.1.7      yes

Any ideas?

 

Labels (4)
0 Karma

richgalloway
SplunkTrust
SplunkTrust

You're right - join and appendcols are not right for this.  You can use append or you can let inputlookup do the append for you.   The key part is to re-group the results using the stats command.

```Read the first CSV```
| inputlookup IPs.csv
```Append the second CSV to the result set```
| inputlookup append=true Scan.csv
```Group the results by IP address```
| stats values(*) as * by IP

 

---
If this reply helps you, an upvote would be appreciated.
0 Karma

UMDTERPS
Communicator

 

 

 

| inputlookup scan_data.csv
|append [
                |inputlookup ip_kv_system  
                |where isnotnull(stuff)
                |eval stuff=split(stuff, "|delim|")
                |mvexpand stuff
                |mvexpand data
                |spath input=stuff "IP Addr" output=ip
                |spath input=devices "OS"
                |spath input=data "fqdn"
                |fields ip "OS" "fqdn"
      ]
   

 

 

The append returns results, but breaks if I add anything after the append.  I need to do some data normalization after the append because we have a lot of bad data.  For example, If I add 

|fillnull systems value="N/A"
| where systems != "N/A"

or 

|rename field as field 

or 

|dedup ip systems


The data that is in the ip_kv_system gets dropped from the search results.  How can I run commands after the append?

Side note: The scan_data.csv is 500MB, not sure if that has an impact. 

0 Karma

richgalloway
SplunkTrust
SplunkTrust

To debug the appended search, run it by itself starting with just the inputlookup command.  Then run it again with one more command added.  Verify the results are as expected after each run.  When the results change the cause is the command just added.

---
If this reply helps you, an upvote would be appreciated.
0 Karma

ITWhisperer
Ultra Champion

Try something like this:

| inputlookup scan.csv
| append [| inputlookup ips.csv ]
| stats values(system) as system values(scanData) as scanData by IP
0 Karma
.conf21 CFS Extended through 5/20!

Don't miss your chance
to share your Splunk
wisdom in-person or
virtually at .conf21!

Call for Speakers has
been extended through
Thursday, 5/20!