Getting Data In

Duplicate events because of hosts writing to the same log file shared on NFS

AbhinandGokul
New Member

Hello All,
I am a total newbie to Splunk and would appreciate an expert's help in creating a query/dashboard.

We have a set of servers writing to the same log file, which resides on an NFS share mounted on all of them. I want to count the number of connection resets I have been getting. Unfortunately, since the same file is monitored as a separate source on each host, the events are duplicated in my output.

This is my Splunk query:
sourcetype="LogFile" "LogEvent level=\"SEVERE\"" | rex "(?<message>.*)" | search message="*Connection reset" "StackTrace" | rex field=source "/path/to/the/log/file/(?<company>\w+|\d+)-" | convert timeformat="%Y-%m-%d" ctime(_time) AS date | top company by host,date

If there were 2 connection resets, they will unfortunately be counted once per host across all my hosts, skewing my results.

I tried using dedup and cluster but somehow never got them working. Could somebody please help?

Warm Regards,
Abhi


echalex
Builder

Hello AbhinandGokul,

I'm afraid Splunk has no way of identifying which host the event is coming from. My guess is that you won't know which host is generating the event, either. To Splunk it will always look like several hosts have an identically named log file.

Now, there are a number of options:

  1. Don't log to the same file. This is the option I would recommend, so I'm putting it first. Either use an individual file name per host or keep the file locally, not on NFS.
  2. If the count of events is the only information you are interested in, you can count the events and divide the count by the number of hosts.
  3. Don't monitor the file on every host. Just monitor on one.
  4. Change your logging parameters so that each event includes the hostname, e.g. "server=foo.bar.domain.com". Then you can dedup on _time and server combined.
  5. You can specify the host in your search, can you not? Then you will see the results only once. But you won't be able to calculate stats per host, as long as Splunk can't tell which event is caused by which host.
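To make options 2 and 4 concrete, here is a rough sketch. The base search is simplified from the one in the question, and the server field in the second search is an assumption: it only exists if you change your logging as described in option 4. The first search counts all events and divides by the number of distinct hosts reporting them; the second dedups on _time and server once that field is available:

```
sourcetype="LogFile" "LogEvent level=\"SEVERE\"" "Connection reset"
| stats count dc(host) AS num_hosts
| eval actual_resets = count / num_hosts

sourcetype="LogFile" "LogEvent level=\"SEVERE\"" "Connection reset"
| dedup _time server
| stats count by server
```

Note that the divide-by-hosts approach only works if every host monitors the file reliably; the dedup approach is more robust, which is why adding a host identifier to the log line is worth the effort.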

I don't know the reason why you are logging to the same file on NFS, but if it's at all possible, I would strongly recommend splitting up so each server has its own log file. If this then resides on NFS, it doesn't matter, as long as the path and/or filename is different.

grijhwani
Motivator

I would agree with the above. If you have any control over the generation of the logs, your solution lies there rather than within Splunk. Centralising your scrutiny of multiple sources is what Splunk is all about.


grijhwani
Motivator

Logs over NFS are not ideal, but you could place a forwarder on the NFS server itself, include the shared logs as an input there, and blacklist them on the remote hosts so they are not picked up twice.
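For example, a minimal inputs.conf sketch (the paths /export/logs/app.log and /mnt/nfs/logs are hypothetical; substitute your own). The monitor stanza goes on the forwarder running on the NFS server; the blacklist goes on each application host, where blacklist is a regular expression matched against the file path:

```
# inputs.conf on the forwarder on the NFS server
[monitor:///export/logs/app.log]
sourcetype = LogFile

# inputs.conf on each application host: monitor the directory,
# but exclude the shared NFS file so it is only indexed once
[monitor:///mnt/nfs/logs]
blacklist = app\.log$
```

This way the file is indexed exactly once, and host-level attribution still has to come from the log contents themselves, as discussed above.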


echalex
Builder

Can you provide us with a few sample lines showing what it looks like? Does the line contain information about the host causing the problem?


AbhinandGokul
New Member

Thanks for responding!

A sample log entry:

<LogEvent level="SEVERE" time="2014-09-17T16:07:32Z" shapename="shape20" shapetype="Connector" shapelabel="" shapeextendedinfo="Master Name of the/process(Nameofthe process Default): cookie-Q2Q93V-cookie-name_of Connector; Name of What Operation">
0 java.net.SocketException: Connection reset
com.sun.xml.ws.transport.http.client.HttpClientTransport.readResponseCodeAndMessage(HttpClientTransport.java:212)
.
.
.
(lots of stack trace)
java.lang.Thread.run(Thread.java:745)
</LogEvent>
