Hi All.
We are using Splunk to collect logs from different devices, but the logs from one device on the network are not present on the Splunk server. After some hours, the logs from that device appear on the Splunk server again. In the period where we missed logs from this device, there were no network changes and no changes on the client. We are looking for the reason for this. The logs were missing for around 6 hours, starting early in the morning.
Could it be some memory issue on the server, or something with the indexes? If there was some preparation work for maintenance on the backend, could that have had any effect on the Splunk server's log performance? The device we are missing logs from during these hours has been online the whole time.
Any tips on how and where to look/troubleshoot in the Splunk environment when logs are not present from one or more hosts?
Thanks in advance.
DD
There is too little information to even blindly guess.
Firstly, how are those events getting into your Splunk infrastructure? Do you have a UF installed on the remote hosts with monitor file inputs defined on them? Or maybe those are event log inputs? Or are you receiving syslog data over the network? Directly, or via a third-party syslog daemon?
Secondly, how did you verify that the data for those "outages" isn't ingested at all? Maybe the sources (or receivers) are getting clogged, so your ingestion process stops for a while but then resumes and catches up, while your data onboarding is incomplete so you don't have reliable timestamps?
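One quick way to check that second possibility is to compare index time with event time for the affected host. A rough sketch (the index and host values are placeholders, adjust them to your environment):

    index=* host=your_problem_host earliest=-24h
    | eval delay_seconds = _indextime - _time
    | timechart span=1h max(delay_seconds) avg(delay_seconds)

If you see a large spike in delay around the "outage", the events arrived late and were indexed in a batch rather than lost.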
There are many things that can go wrong.
Hi PickleRick.
Thank you for replying to the post.
Our devices are sending syslog to the Splunk server over the network (there have not been any network issues).
Secondly, our supplier noticed that they were not receiving logs from one specific host. After some hours (approx. 5), our supplier was receiving logs from that specific host again. While the supplier was not receiving logs from this host, they received a lot of logs from other hosts on our network. It happened around 04:47 (am) local time; at that time there is no load on the network.
Our supplier maintains the indexes and does the system work.
About the ingestion process stopping: could that process stop for one host (one out of many) while the other hosts are not impacted?
Brgds DD
Well, we can't say retroactively for sure what happened. Syslog, especially when transmitted over UDP, is sensitive both to network disruptions and to the receiver's performance.
If the receiving Splunk infrastructure listens for syslog directly with the splunkd process, without external syslog daemons, a burst of data from other hosts might have "overwhelmed" the receiver and caused it to not process the incoming syslog data properly.
Performance is one of the reasons why, in a production environment, you generally shouldn't listen for syslog data directly with the Splunk process. You should use an external syslog daemon. See https://docs.splunk.com/Documentation/SVA/current/Architectures/Syslog for possible syslog ingestion architectures.
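If your _internal logs from that period haven't aged out yet, you can check whether any of the receiver's queues were blocked around that time. A minimal sketch (set the time range to the outage window):

    index=_internal source=*metrics.log* group=queue blocked=true
    | timechart span=10m count BY name

A burst of blocked-queue events on the parsing or indexing queues during the gap would support the "overwhelmed receiver" theory.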
Hi Isoutamo.
Sorry for the late response, I had some time off.
But if I share our Splunk environment, perhaps you could recommend some places to troubleshoot.
Our clients are sending syslog through the NetScaler environment to the syslog servers.
The syslog servers are connected to the Splunk indexer cluster, which is connected to the search head cluster and the heavy forwarders.
The heavy forwarders are connected to a universal forwarder. There were no changes in our network. The logs from this specific host were missing for about 5 hours. Earlier we had some issues with the indexes in our Splunk environment, which they say have been fixed. In this case our syslog servers have the NetScaler between themselves and the clients.
In the end, the logs are sent via the Internet to a cybersecurity environment.
Does this make the picture clearer, or can I send you our diagram of the Splunk environment (somewhere that is not public)?
Brgds DD
1. Load-balancing syslog usually doesn't work very well.
2. Your description doesn't make much sense on some points.
3. This is a public community where volunteers share their experience/expertise for the common good. If you want a private audit - well, that's a service you normally pay for. Reach out to a friendly Splunk Partner in your area and ask them to review your environment.
4. Without knowing the details of your environment and without seeing what really happened within it (checking internal logs if they haven't aged out yet, maybe verifying some other logs and external monitoring), it's impossible to say what exactly happened.
What _might have_ happened is the usual: a loss of connectivity while there was already enough data buffered, so the extra data simply overflowed and never made it into the buffer. Maybe - as you say you had "some issues with indexes" - some data was indexed but got lost. We don't know. And depending on how much data is still left in your environment, it might or might not be something that can only be found on-site by examining the environment in question.
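If you still have the data from that period, it can also help to chart hourly event counts for the affected host next to a couple of healthy hosts, to see whether the gap is a hard cut (suggesting a dropped input) or whether events show up late with skewed timestamps. A rough sketch, with placeholder host names you'd need to replace:

    | tstats count WHERE index=* AND (host="problem_host" OR host="healthy_host") BY _time span=1h, host

A clean gap for one host while the other keeps a steady rate points at something specific to that host's path into Splunk rather than at the indexers.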
As a side note, vis-a-vis sharing your diagram "somewhere that is not public" - are you sure you're in a position to freely disclose such information to a third party? Without a prior service agreement and possibly an NDA?
Hi PickleRick.
Thanks for your info. We have a lot of data in our network. We lost logs from this device at approx. 05:15 in the morning, local time. At that time there isn't a lot of traffic on our network. We had not experienced any lack of connectivity in the period when we were missing these logs from this device.
If it was the load balancer, then we should be missing logs from more than one device. Our syslog sources are sending logs through the NetScaler to the syslog servers. The syslog servers then send the syslog data to the Splunk indexer cluster, which sends it on to the heavy forwarders.
Brgds DD