CSV log events unexpectedly breaking across lines

NoSpaces
Path Finder

Hello everyone!
I have a Windows server with a Splunk UF installed that ingests MS Exchange logs.
These logs are stored in CSV format.

The Splunk UF settings look like this:
props.conf

[exch_file_httpproxy-mapi]
ANNOTATE_PUNCT = false
BREAK_ONLY_BEFORE_DATE = true
INDEXED_EXTRACTIONS = csv
initCrcLength = 2735
HEADER_FIELD_LINE_NUMBER = 1
MAX_TIMESTAMP_LOOKAHEAD = 24
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = DateTime
TRANSFORMS-no_column_headers = no_column_headers

transforms.conf

[no_column_headers]
REGEX = ^#.*
DEST_KEY = queue
FORMAT = nullQueue

 

Thanks to the data quality report on the indexer layer, I found out that this source type has some timestamp issues.
I investigated by running a search on the search head layer and found events breaking in unexpected places.
You can see an example in the attachment.
The _raw data looks fine and does not contain any unexpected newline characters.
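
One way to double-check the claim about newline characters outside Splunk is to scan the original CSV file for lines that are neither events, preamble, nor the header. The following is a minimal sketch, assuming every real event starts with an ISO-8601 DateTime value and that the header line starts with "DateTime,"; the function name and the sample lines are hypothetical and only illustrate the idea.

```python
import re

# Assumption: every real event starts with an ISO-8601 DateTime,
# for example 2024-02-20T14:00:00.980Z.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines that are neither
    events, '#' preamble lines, the CSV header, nor blank."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        # Skip blank lines, the '#' preamble, and the header row.
        if not text or text.startswith("#") or text.startswith("DateTime,"):
            continue
        if not EVENT_START.match(text):
            suspects.append((number, text))
    return suspects


# Hypothetical, shortened sample; the last line simulates a record
# that was split by an embedded newline.
sample = [
    "DateTime,RequestId,Protocol\n",
    "#Software: Microsoft Exchange Server\n",
    "2024-02-20T14:00:00.980Z,c3581a8e,Mapi\n",
    "continuation of a broken record\n",
]
print(find_suspect_lines(sample))
```

An open file handle for the real log can be passed in place of `sample`. An empty result would suggest the file itself has no stray line breaks and the problem lies in the parsing settings.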

What is wrong with my settings?


richgalloway
SplunkTrust

The props.conf settings are missing TIME_FORMAT.  Other settings may need to be changed, but we need to see the raw data (the CSV file before it gets to Splunk) to determine that.

---
If this reply helps you, Karma would be appreciated.

NoSpaces
Path Finder

Is TIME_FORMAT really necessary? Events that are not broken have no problem with timestamp detection.

This is a log example that contains the header, a preamble, and one event:

DateTime,RequestId,MajorVersion,MinorVersion,BuildVersion,RevisionVersion,ClientRequestId,Protocol,UrlHost,UrlStem,ProtocolAction,AuthenticationType,IsAuthenticated,AuthenticatedUser,Organization,AnchorMailbox,UserAgent,ClientIpAddress,ServerHostName,HttpStatus,BackEndStatus,ErrorCode,Method,ProxyAction,TargetServer,TargetServerVersion,RoutingType,RoutingHint,BackEndCookie,ServerLocatorHost,ServerLocatorLatency,RequestBytes,ResponseBytes,TargetOutstandingRequests,AuthModulePerfContext,HttpPipelineLatency,CalculateTargetBackEndLatency,GlsLatencyBreakup,TotalGlsLatency,AccountForestLatencyBreakup,TotalAccountForestLatency,ResourceForestLatencyBreakup,TotalResourceForestLatency,ADLatency,SharedCacheLatencyBreakup,TotalSharedCacheLatency,ActivityContextLifeTime,ModuleToHandlerSwitchingLatency,ClientReqStreamLatency,BackendReqInitLatency,BackendReqStreamLatency,BackendProcessingLatency,BackendRespInitLatency,BackendRespStreamLatency,ClientRespStreamLatency,KerberosAuthHeaderLatency,HandlerCompletionLatency,RequestHandlerLatency,HandlerToModuleSwitchingLatency,ProxyTime,CoreLatency,RoutingLatency,HttpProxyOverhead,TotalRequestTime,RouteRefresherLatency,UrlQuery,BackEndGenericInfo,GenericInfo,GenericErrors,EdgeTraceId,DatabaseGuid,UserADObjectGuid,PartitionEndpointLookupLatency,RoutingStatus
#Software: Microsoft Exchange Server
#Version: 15.02.1118.040
#Log-type: HttpProxy Logs
#Date: 2024-02-20T14:00:01.019Z
#Fields: DateTime,RequestId,MajorVersion,MinorVersion,BuildVersion,RevisionVersion,ClientRequestId,Protocol,UrlHost,UrlStem,ProtocolAction,AuthenticationType,IsAuthenticated,AuthenticatedUser,Organization,AnchorMailbox,UserAgent,ClientIpAddress,ServerHostName,HttpStatus,BackEndStatus,ErrorCode,Method,ProxyAction,TargetServer,TargetServerVersion,RoutingType,RoutingHint,BackEndCookie,ServerLocatorHost,ServerLocatorLatency,RequestBytes,ResponseBytes,TargetOutstandingRequests,AuthModulePerfContext,HttpPipelineLatency,CalculateTargetBackEndLatency,GlsLatencyBreakup,TotalGlsLatency,AccountForestLatencyBreakup,TotalAccountForestLatency,ResourceForestLatencyBreakup,TotalResourceForestLatency,ADLatency,SharedCacheLatencyBreakup,TotalSharedCacheLatency,ActivityContextLifeTime,ModuleToHandlerSwitchingLatency,ClientReqStreamLatency,BackendReqInitLatency,BackendReqStreamLatency,BackendProcessingLatency,BackendRespInitLatency,BackendRespStreamLatency,ClientRespStreamLatency,KerberosAuthHeaderLatency,HandlerCompletionLatency,RequestHandlerLatency,HandlerToModuleSwitchingLatency,ProxyTime,CoreLatency,RoutingLatency,HttpProxyOverhead,TotalRequestTime,RouteRefresherLatency,UrlQuery,BackEndGenericInfo,GenericInfo,GenericErrors,EdgeTraceId,DatabaseGuid,UserADObjectGuid,PartitionEndpointLookupLatency,RoutingStatus
2024-02-20T14:00:00.980Z,c3581a8e-2033-4fa0-8dbf-3efdc06ba7c3,15,2,1118,40,{5745B4EE-6A69-4E12-8EBD-6AD2820CA5D1},Mapi,mail.domain.com,/mapi/nspi/,,,false,,,,Microsoft Office/15.0 (Windows NT 10.0; Microsoft Outlook 15.0.5589; Pro),172.16.5.94,SERVERMBX06,401,,,POST,,,,,,,,,13,,,,,,,,,,,,,,,38,,,,,,,,,,,,,,0,,0,0,,?MailboxId=5918ae5a-9281-4301-b94e-407395ba2824@domain.com,,BeginRequest=2024-02-20T14:00:00.980Z;CorrelationID=<empty>;SharedCacheGuard=0;EndRequest=2024-02-20T14:00:00.980Z;S:ServiceLatencyMetadata.AuthModuleLatency=0,,,,,,
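
As a quick sanity check outside Splunk, the DateTime value from the sample event can be measured and parsed. This is only a sketch in Python: its strptime directives differ from Splunk's TIME_FORMAT syntax (Python uses `%f` for fractional seconds), so it confirms the shape of the timestamp, not how Splunk interprets it.

```python
from datetime import datetime, timezone

# Value copied from the DateTime column of the sample event.
sample_timestamp = "2024-02-20T14:00:00.980Z"

# %f accepts the three millisecond digits; the trailing Z is matched
# as a literal character and the result is then marked as UTC.
parsed = datetime.strptime(
    sample_timestamp, "%Y-%m-%dT%H:%M:%S.%fZ"
).replace(tzinfo=timezone.utc)

# Length of the timestamp, to compare against MAX_TIMESTAMP_LOOKAHEAD.
print(len(sample_timestamp))
print(parsed.isoformat())
```

The timestamp is 24 characters long, which is the same as the MAX_TIMESTAMP_LOOKAHEAD = 24 value in the settings above.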

 


richgalloway
SplunkTrust

TIME_FORMAT is one of the "Great 8" settings every sourcetype should have.  They help ensure events are onboarded properly.  See if these settings help.

[exch_file_httpproxy-mapi]
ANNOTATE_PUNCT = false
LINE_BREAKER = ([\r\n]+)\d\d\d\d-\d\d
INDEXED_EXTRACTIONS = csv
initCrcLength = 2735
HEADER_FIELD_LINE_NUMBER = 1
MAX_TIMESTAMP_LOOKAHEAD = 24
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = DateTime
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%Z
TRANSFORMS-no_column_headers = no_column_headers
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)\d\d\d\d-\d\d
TRUNCATE = 10000
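
To see where the LINE_BREAKER pattern above would place event boundaries, the regex can be emulated outside Splunk. This is a rough sketch, assuming the documented behavior that the text matched by the first capturing group is discarded and marks the boundary between two events. It does not reproduce the rest of Splunk's pipeline (for example the structured parsing done with INDEXED_EXTRACTIONS), and the `split_events` helper and the shortened sample data are hypothetical.

```python
import re

# Same pattern as LINE_BREAKER / EVENT_BREAKER in the settings above.
LINE_BREAKER = re.compile(r"([\r\n]+)\d\d\d\d-\d\d")


def split_events(raw):
    """Split raw text where the pattern matches, dropping only the
    text captured by group 1 (the newline run)."""
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return events


# Hypothetical, shortened sample with a header, one preamble line,
# and two events.
raw = (
    "DateTime,RequestId\r\n"
    "#Software: Microsoft Exchange Server\r\n"
    "2024-02-20T14:00:00.980Z,aaa\r\n"
    "2024-02-20T14:00:01.100Z,bbb"
)

events = split_events(raw)
for event in events:
    print(repr(event))
```

In this emulation the header and the "#" preamble line stay together in one chunk, because a break only happens before a line that starts with four digits, a hyphen, and two digits. Each line that begins with a date becomes its own event.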

 


NoSpaces
Path Finder

Sorry for the late reply.
I tested your settings and can say with confidence that they make no difference.
Events still break unexpectedly.


NoSpaces
Path Finder

Up
