Hi,
We are trying to use selective parsing in Splunk to index only those events that contain a timestamp. The context is that the JVM of one of our Java applications is printing garbage values into the logs, and we don't want that data parsed into the Splunk system. So we are moving to parse events selectively based on timestamps, i.e. keep only the events that carry a timestamp.
When we ingest log files that contain Java dumps, we configure the line breaker manually. This way the whole Java dump ends up in one event instead of being broken up by Splunk's automatic parsing.
In those cases our props.conf looks like this:
[yoursourcetype]
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}
SHOULD_LINEMERGE = false
TRUNCATE = 100000
MAX_TIMESTAMP_LOOKAHEAD = 30
NO_BINARY_CHECK = true
TIME_FORMAT = %F %T
TIME_PREFIX = ^
LINE_BREAKER should contain the timestamp pattern after the ([\r\n]+), and you probably want to increase TRUNCATE to at least 100000.
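Note that the sample logs shared later in this thread use a two-digit year with slashes (19/01/21 05:32:15), which the \d{4}-\d{2}-\d{2} pattern above would not match. A sketch adapted for that format (an assumption; verify against your actual timestamps) could be:

```
[yoursourcetype]
LINE_BREAKER = ([\r\n]+)\d{2}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}
SHOULD_LINEMERGE = false
TRUNCATE = 100000
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_PREFIX = ^
TIME_FORMAT = %y/%m/%d %H:%M:%S
```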
Hi Juhi28,
could you share an example of your logs?
Anyway, the method for filtering events is described at https://docs.splunk.com/Documentation/Splunk/7.2.3/Forwarding/Routeandfilterdatad
Bye.
Giuseppe
Here is the sample data. I only want Splunk to parse the data with a timestamp in it and exclude the other garbage data.
19/01/21 05:32:15 WARN YarnAllocator: Expected to find pending requests, but found none.
^@^Fstdout^@^A0^@^A^@^@^E^@
^@^GVERSION/-^@+container_e217_1537606163373_5158_01_000001^E^Dnone^A^PÑ,Ñ,^C^Qdata:BCFile.index^DnoneÑ}^K^K^Pdata:TFile.index^DnoneÑB;;^Odata:TFile.meta^DnoneÑ<^F^F^@^@^@^@^@^@^E~H^@^A^@^@Ñ^QÓh~Qµ×¶9ßA@~RºáP
Ñ^QÓh~Qµ×¶9ßA@~RºáP ^@^GVERSION^D^@^@^@^A^Q^@^OAPPLICATION_ACL,^@
MODIFY_APP^@ s_ptheon ^@^HVIEW_APP^@ s_ptheon ^S^@^QAPPLICATION_OWNER
^@^Hs_ptheon-^@+container_e217_1537606163373_5158_01_000004Î^C^@^Fstderr^@^C491SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/alluxio/alluxio-1.4.0/core/client/target/alluxio-core-client-1.4.0-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
^@^Fstdout^@^A0^@^A^@^@^D^@
^@^GVERSION/-^@+container_e217_1537606163373_5158_01_000004^D^Dnone^A^PÎ| Î| ^C^Qdata:BCFile.index^DnoneÎñ^K^K^Pdata:TFile.index^Dnoneζ;;^Odata:TFile.meta^Dnoneΰ^F^F^@^@^@^@^@^@^Bü^@^A^@^@Ñ^QÓh~Qµ×¶9ßA@~RºáP
If the above is a single event, you could send it to the nullQueue per the steps in the link Giuseppe posted above. However, if you only want to remove the contents after 'found none', then you would need to filter them before indexing, with something like the following:
https://docs.splunk.com/Documentation/Splunk/7.2.3/Admin/Propsconf
[yoursourcetype]
SEDCMD-removejvmlogs = s/^@.*//
This will require a restart of the indexer, and any future messages will be masked/removed.
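One caveat: the ^@ sequences in the sample are how caret notation renders the NUL control character, not a literal caret followed by an at-sign, so s/^@.*// may not match the raw bytes. A hedged variant that targets the NUL byte directly (untested; assumes PCRE-style \x escapes are honoured by SEDCMD) would be:

```
[yoursourcetype]
SEDCMD-removejvmlogs = s/\x00.*//g
```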
Can this be done at the forwarder level instead of at indexing? I want the data without a timestamp not to reach the Splunk system at all.
Yes, if it's a heavy forwarder. If you only have a universal forwarder and an indexer, you can add the rule on the indexer and it will filter the events out before indexing. So this will not consume your license/storage, and the events will not be available in searches.
Thanks lakshman. What rule will I have to add in inputs.conf?
As this data is already reaching the indexer, adding the SEDCMD-* setting to props.conf on the indexer or heavy forwarder will do; there is no need to change inputs.conf.
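For completeness, the nullQueue route from the documentation link Giuseppe posted earlier can be sketched as a props/transforms pair on the indexer or heavy forwarder. The stanza name drop_no_timestamp and the timestamp regex here are illustrative assumptions; adjust them to your environment:

```
# props.conf
[yoursourcetype]
TRANSFORMS-null = drop_no_timestamp

# transforms.conf
[drop_no_timestamp]
REGEX = ^(?!\d{2}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2})
DEST_KEY = queue
FORMAT = nullQueue
```

Events whose first characters do not match the timestamp pattern are routed to the nullQueue, so they are dropped before indexing and do not count against the license.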
It may be possible that the JVM is throwing exceptions and unwanted lines. You may want to double-check the line breaking (some events could form part of a multi-line event and hence give the impression that they are without a timestamp) and parse them correctly. Each event has to have a timestamp.
If you share examples, that would help.
Yes, the scenario is similar to a JVM throwing exceptions; I shared sample data with Giuseppe in the thread above.