Splunk Search

Compare CSV to search

H4waiianPunch
Engager

Hello everyone,

I'd like to start out by saying I'm really quite new to Splunk, and we run older versions(6.6.3 and 7.2.3).

I'm looking to have a search that will do the following:

- Look up the current hosts in our system, which I can get with the following search

 

 

index=* "daily.cvd"
| dedup host | table host

 

 

 - Then compare to a CSV file that has 1 column with A1 being "host" and then all other entries are the hosts that SHOULD be present/accounted for.

-- Using ChatGPT I was able to get something like below which on it's own will properly read the CSV file and output the hosts in it.

 

 

| append [
    | inputlookup hosts.csv
    | rename host as known_hosts
    | stats values(known_hosts) as known_hosts
]
| eval source="current"
| eval status=if(isnull(mvfind(known_hosts, current_hosts)), "New", "Existing")
| eval status=if(isnull(mvfind(current_hosts, known_hosts)), "Missing", status)
| mvexpand current_hosts
| mvexpand known_hosts
| table current_hosts, known_hosts, status

 

 

- However when I combine the 2, it will show me 118 results(should only be 59) and there are no results in the "current_hosts" column, and after 59 blank results, the "known_hosts" will then show the correct results from the CSV.

 

 

index=* "daily.cvd"
| dedup host | table host
| append [
    | inputlookup hosts.csv
    | rename host as known_hosts
    | stats values(known_hosts) as known_hosts
]
| eval source="current"
| eval status=if(isnull(mvfind(known_hosts, current_hosts)), "New", "Existing")
| eval status=if(isnull(mvfind(current_hosts, known_hosts)), "Missing", status)
| mvexpand current_hosts
| mvexpand known_hosts
| table current_hosts, known_hosts, status

 

 

I'd love to have any help on this, I'm wouldn't be surprised if ChatGPT is making things more difficult than needed. 

Thanks in advance!

Labels (2)
0 Karma
1 Solution

marnall
Motivator

Have a go at this:

index=* "daily.cvd"
| dedup host
| table host
| append
[| inputlookup hosts.csv]
| stats count by host
| where count = 1
| lookup hosts.csv host outputnew host as host_found
| eval status = if(isnull(host_found),"NEW","MISSING")
| table host status

Make sure you have a lookup table (hosts.csv) with a single "host" column containing all your expected hosts.

View solution in original post

0 Karma

yuanliu
SplunkTrust
SplunkTrust

ChatGPT is perhaps the last place you want to learn SPL from.  The task is relative straightforward.

index=* "daily.cvd"
| fields host ``` only needed if sources have too many fields ```
| eval source = "INDEX"
| append
    [inputlookup hosts.csv
    | eval source = "CSV"]
| stats values(source) as source by host
| eval status = case(mvcount(source) > 1, null(), source == "CSV", "Missing", true(), "New")
| fields - source
0 Karma

H4waiianPunch
Engager

Hey, I certainly agree that ChatGPT isn't the best place to learn, but it comes in handy sometimes. I need to start taking some actual training though.

 

Your solution did work, so thank you for sharing it with me. I did then go and use GPT to help explain the details to me and I think I understand it all so that's nice. Setting sources to different values and comparing them that way is neat and I'm glad I've seen that now. 

0 Karma

marnall
Motivator

If I understand correctly, you would like the final output to be two columns, where one shows the machines that SHOULD appear, and the second shows the machines that DO appear? Then you could see which machines are not appearing and therefore need attention?

E.g.

SHOULD_APPEARDO_APPEAR
host1host1
host2 
host3host3
......
0 Karma

H4waiianPunch
Engager

Either two columns as you described, or two columns with machines that SHOULD appear and another column saying Missing if it's not there or New if it's new and unexpected. That way I wouldn't need to look through them as thoroughly and at a glance be able to see if something is wrong.

0 Karma

marnall
Motivator

Have a go at this:

index=* "daily.cvd"
| dedup host
| table host
| append
[| inputlookup hosts.csv]
| stats count by host
| where count = 1
| lookup hosts.csv host outputnew host as host_found
| eval status = if(isnull(host_found),"NEW","MISSING")
| table host status

Make sure you have a lookup table (hosts.csv) with a single "host" column containing all your expected hosts.

0 Karma

H4waiianPunch
Engager

Hey, thank you very much for this query! I've decided to go with yours out of the 2 responses here as it displays just the one host in the end instead of all of them which will be nicer at a glance.

 

You made it seem very simple and I appreciate that, I have a lot to learn!

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...