Getting Data In

How to query a CSV file from another CSV file?

jip31
Motivator

Hi

I have 2 CSV file in lookups directory :
The first CSV is called "host.csv" and has a field called "host" which corresponds to the hostname
The second CSV is called "DCM.csv" and has afield called "Computer Name" which corresponds to the hostname and another field called "Flag"
I need to count host by Flag for hosts which exists in the first CSV and in the second CSV
Could you help me please?

Tags (2)
0 Karma
1 Solution

jeffland
Champion

Just to clarify, I understood this as "There can be hosts which are in only one of the two lookups. I want to count all that appear in the first file by Flag". If you need something else, please clarify.
First you'll have to get the lookup file as your search results. This is done with the inputlookup command. Since it is a generating command, you need to put a pipe before it:

| inputlookup host.csv

Run this and you'll see all hosts from your first file. Next, you need to lookup each line of this data in your second file using the correct fields:

| lookup DCM.csv "Computer Name" as host

With this command, you have the new field Flag in your data for hosts that were in both lookups and which have a value for Flag in the second (you'll still have results that are only in the first lookup and also those that are in the first and in the seconds but don't have a value for the field Flag in the second, either of these will not have a value in Flag however). Now we can count:

| stats count by host Flag

This will give you a count per host and Flag combination, but hosts without a value in Flag will be thrown away. If you want them, you'll need to provide a value before stats, for example with fillnull:

| fillnull value="no Flag value set" Flag

Final search might look like this:

| inputlookup host.csv
| lookup DCM.csv "Computer Name" as host
| fillnull value="no Flag value set" Flag
| stats count by host Flag

View solution in original post

0 Karma

woodcock
Esteemed Legend

Like this:

|inputcsv host.csv | eval which="host"
| appendpipe [|inputcsv DCM.csv | rename "Computer Name" AS host | eval which="DCM" ]
| stats values(*) AS * dc(which) AS whichCount BY host
| where whichCount=2
| stats count BY Flag
0 Karma

jip31
Motivator

thanks woodcock your code is perfect and works fine but I have to accept jeffland answer which has answered first
have a nice day!!

0 Karma

jeffland
Champion

Thanks. Generally, if you get more than one answer I would recommend choosing the answers which gives a better explanation or links to relevant documentation.

0 Karma

jeffland
Champion

Why use | appendpipe? Why | inputcsv? Also, why use a subsearch with and not | inputlookup append=t?

0 Karma

woodcock
Esteemed Legend

The user said that he has CSV files, which I took very literally. I agree that inputlookup would be better than inputcsv. This kind of a subsearch is unlimited and I have used it for a decade since I invented it to bypass the subsearch-limit, long before append=t existed. Maybe that works just as well but I have no reason to switch now.

0 Karma

jeffland
Champion

I would argue that having used something for a decade is not a good reason to continue doing it if better options become available. However, I hadn't realized that inputcsv also has an append=t argument, so it hardly makes a difference in this case. Since the question doesn't say where the csv files are kept (in etc/apps/<appname>/lookups or /var/run/splunk/csv), either should work depending on lookup location.

0 Karma

jeffland
Champion

Just to clarify, I understood this as "There can be hosts which are in only one of the two lookups. I want to count all that appear in the first file by Flag". If you need something else, please clarify.
First you'll have to get the lookup file as your search results. This is done with the inputlookup command. Since it is a generating command, you need to put a pipe before it:

| inputlookup host.csv

Run this and you'll see all hosts from your first file. Next, you need to lookup each line of this data in your second file using the correct fields:

| lookup DCM.csv "Computer Name" as host

With this command, you have the new field Flag in your data for hosts that were in both lookups and which have a value for Flag in the second (you'll still have results that are only in the first lookup and also those that are in the first and in the seconds but don't have a value for the field Flag in the second, either of these will not have a value in Flag however). Now we can count:

| stats count by host Flag

This will give you a count per host and Flag combination, but hosts without a value in Flag will be thrown away. If you want them, you'll need to provide a value before stats, for example with fillnull:

| fillnull value="no Flag value set" Flag

Final search might look like this:

| inputlookup host.csv
| lookup DCM.csv "Computer Name" as host
| fillnull value="no Flag value set" Flag
| stats count by host Flag
0 Karma

jip31
Motivator

perfect jeffland thanks for your clear answer!

0 Karma

grittonc
Contributor

What have you tried?

0 Karma
Get Updates on the Splunk Community!

The Splunk Success Framework: Your Guide to Successful Splunk Implementations

Splunk Lantern is a customer success center that provides advice from Splunk experts on valuable data ...

Splunk Training for All: Meet Aspiring Cybersecurity Analyst, Marc Alicea

Splunk Education believes in the value of training and certification in today’s rapidly-changing data-driven ...

Investigate Security and Threat Detection with VirusTotal and Splunk Integration

As security threats and their complexities surge, security analysts deal with increased challenges and ...