
S.o.S: multiple problems with the Security Health Check view on Windows

cmeo
Contributor

I tried to raise a case but the support portal wouldn't play ball.

We've found several problems with SoS on Windows, specifically the Security Health Check dashboard.

  1. The module closing tags in the XML are in the wrong place, so the drop-down value you select is thrown away and never used to filter the results of securityinfo.py. We fixed this and that bit works OK now.
  2. It's a very strange strategy anyway. The XML is supposed to set a filter on one of the Splunk servers, but securityinfo.py reports on all the servers it knows about and the XML then filters the results. Kinda like doing a full table scan and then a select...
  3. securityinfo.py doesn't know anything about splunk_server_cache.csv, so it has the wrong idea of what the Splunk servers are anyway. Customising splunk_server_cache.csv is now supported, but the Python script doesn't look at it.
  4. securityinfo.py is borked. The run-as-root test is not implemented on Windows, so the value returned is 'Undefined'. This raises a red traffic light unnecessarily, which may confuse or upset end users. It should return 'false', and that's what we've done here, so that bit works OK too.
  5. securityinfo.py doesn't return correct values for whether Splunk Web SSL is enabled. Or at least, it has a bet each way and returns two rows, one that says true and one that says false. This may be a side effect of our architecture: two search heads, one running PCI and one running ES, which are peered to the indexer but not to each other. This is supposed to be supported by SoS, but plainly isn't fully.
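For point 4, the workaround amounts to returning 'false' instead of 'Undefined' when there is no root user to test for. The following Python is a minimal sketch of such a check, not the actual code in securityinfo.py; the function name `run_as_root_value` is hypothetical, and it assumes the dashboard expects the strings 'true' and 'false'.

```python
import os
import sys


def run_as_root_value(platform_name: str = sys.platform) -> str:
    """Hypothetical helper: report 'true' or 'false' for the run-as-root check.

    Windows has no root user, so report 'false' rather than 'Undefined',
    which would light up a red traffic light for no good reason.
    """
    if platform_name.startswith("win"):
        return "false"
    # os.geteuid() only exists on POSIX systems; effective UID 0 means root.
    geteuid = getattr(os, "geteuid", None)
    if geteuid is None:
        return "false"
    return "true" if geteuid() == 0 else "false"
```

The `platform_name` parameter only exists so the Windows branch can be exercised on any machine; in normal use you would call `run_as_root_value()` with no arguments.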

What this code should do is use the drop-down you select in the XML as a parameter to the script, to get the security info from just one server. However, we weren't able to fix this because the code is pretty obfuscated, and it isn't really clear how it does anything at all. For instance, having set sos_server (incorrectly), it doesn't seem to do anything with it.

There must be some implicit or hidden stuff going on.

PS: if anyone knows what the 'Raise Case' support page will accept as a valid phone number, that would be good to know. I tried around a dozen variations, including Support's own number as given on the page, and it rejected all of them without giving me a hint.

hexx
Splunk Employee

Thank you for reporting these issues! I'll do my best to answer point by point:

  1. Wow. I have no idea how this could have escaped our testing. That is a gross mistake. I have filed and resolved this as a bug (SUP-928). I hope to post a small maintenance release for S.o.S 3.2 with this fix once we have a few other bugs sorted out.
  2. This is a limitation of custom search commands: there is no way to write them so that an argument like splunk_server restricts the search to a given host and executes the command only there. The best you can do is some introspective logic: the command is given a target hostname as an argument, runs on every search peer, and first checks whether the local hostname matches that argument. The command then exits immediately on every peer except the one whose hostname matches. So you're still running some Python everywhere, really. Given that securityinfo.py is fairly lightweight, that optimization didn't seem worth the trouble.
  3. I think I found what the error was there and fixed it along with SUP-928.
  4. I have opened a bug against this behavior. Internal reference: SUP-929
  5. Are your two search heads on the same machine? If they are, did you set different values for "host" in the [default] stanza of etc/system/local/inputs.conf? Unless this requirement is fulfilled, problems will occur in S.o.S when instances that run on the same machine (and therefore have the same default value for "host") are selected.
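The introspective workaround described in point 2 could look something like the following minimal Python sketch. It is not S.o.S code and does not use any Splunk search-command API; `should_report`, `security_rows` and the `collect` callback are hypothetical names, and comparing short hostnames case-insensitively is an assumption about how peers would be identified.

```python
import socket
from typing import Callable, List, Optional


def should_report(target_host: str, local_host: Optional[str] = None) -> bool:
    """Return True only on the peer whose hostname matches target_host."""
    if local_host is None:
        local_host = socket.gethostname()
    # Compare short names case-insensitively (Windows hostnames ignore case).
    return local_host.split(".")[0].lower() == target_host.split(".")[0].lower()


def security_rows(
    target_host: str,
    collect: Callable[[], List[dict]],
    local_host: Optional[str] = None,
) -> List[dict]:
    """Run the (hypothetical) collect() step only on the matching peer.

    Every other peer returns an empty list straight away, which is the
    'exit immediately' behaviour described above.
    """
    if not should_report(target_host, local_host):
        return []
    return collect()
```

Every peer still starts the script, but only the matching peer does any real work, which is why the saving is small for a lightweight script like securityinfo.py.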

cmeo
Contributor

Anyway, as long as all of SoS is aware of splunk_server_cache, it should work fine.
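Making securityinfo.py aware of the lookup (point 3 in the original post) would mean reading it to get the list of servers to report on. A minimal Python sketch follows; the column name `sos_server` is an assumption, so check the header row of your own splunk_server_cache.csv, and `known_servers` is a hypothetical helper, not part of S.o.S.

```python
import csv
from typing import Iterable, Set


def known_servers(lines: Iterable[str], column: str = "sos_server") -> Set[str]:
    """Collect server names from a splunk_server_cache.csv-style lookup.

    'sos_server' is an assumed column name; pass the real one if it differs.
    Blank rows and empty values are skipped.
    """
    servers: Set[str] = set()
    for row in csv.DictReader(lines):
        name = (row.get(column) or "").strip()
        if name:
            servers.add(name)
    return servers
```

For example, `with open(path, newline="") as f: servers = known_servers(f)`, where `path` points at your copy of the lookup; result rows for servers not in `servers` could then be dropped before they reach the dashboard.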


cmeo
Contributor

The PCI and ES search heads are on different machines. As far as we can tell, it's not best practice to peer PCI and ES search heads. We've offloaded the data to a single indexer, and each of the search heads is peered with it.

There's a third search head running SOS and general searches. The two other SHs forward their internal logs to the indexer, so SOS on its own SH can see everything it needs, in theory!

In practice, our rather arcane setup was probably not envisaged in the design of SOS, even though each individual configuration step we made is supported and legal.
