Getting Data In

CSV log unexpectedly breaking by lines

NoSpaces
Contributor

Hello to everyone!
I have a Win server with Splunk UF installed that consumes MS Exchange logs
This logs is stored in CSV format

Splunk UF settings look like this:
props.conf

[exch_file_httpproxy-mapi]
ANNOTATE_PUNCT = false
BREAK_ONLY_BEFORE_DATE = true
INDEXED_EXTRACTIONS = csv
initCrcLength = 2735
HEADER_FIELD_LINE_NUMBER = 1
MAX_TIMESTAMP_LOOKAHEAD = 24
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = DateTime
TRANSFORMS-no_column_headers = no_column_headers

transforms.conf

[no_column_headers]
REGEX = ^#.*
DEST_KEY = queue
FORMAT = nullQueue

 

Thanks to the data quality report on the indexers layer, I found out that this source type has some timestamp issues
I investigated this problem by executing a search on the searched layer and found surprising events breaking
You can see an example in the attachment
_raw data is OK and is not contain "unxepected" next-line characters

What is wrong with my settings?

Labels (1)
0 Karma

richgalloway
SplunkTrust
SplunkTrust

The props.conf settings are missing TIME_FORMAT.  Other settings may need to be changed, but we need to see the raw data (the CSV file before it gets to Splunk) to determine that.

---
If this reply helps you, Karma would be appreciated.
0 Karma

NoSpaces
Contributor

Is TIME_FORMAT necessarily? Events that are not broken haven't problem with timestamp determination 

This is the log example that contains headers, a preamble, and one event

DateTime,RequestId,MajorVersion,MinorVersion,BuildVersion,RevisionVersion,ClientRequestId,Protocol,UrlHost,UrlStem,ProtocolAction,AuthenticationType,IsAuthenticated,AuthenticatedUser,Organization,AnchorMailbox,UserAgent,ClientIpAddress,ServerHostName,HttpStatus,BackEndStatus,ErrorCode,Method,ProxyAction,TargetServer,TargetServerVersion,RoutingType,RoutingHint,BackEndCookie,ServerLocatorHost,ServerLocatorLatency,RequestBytes,ResponseBytes,TargetOutstandingRequests,AuthModulePerfContext,HttpPipelineLatency,CalculateTargetBackEndLatency,GlsLatencyBreakup,TotalGlsLatency,AccountForestLatencyBreakup,TotalAccountForestLatency,ResourceForestLatencyBreakup,TotalResourceForestLatency,ADLatency,SharedCacheLatencyBreakup,TotalSharedCacheLatency,ActivityContextLifeTime,ModuleToHandlerSwitchingLatency,ClientReqStreamLatency,BackendReqInitLatency,BackendReqStreamLatency,BackendProcessingLatency,BackendRespInitLatency,BackendRespStreamLatency,ClientRespStreamLatency,KerberosAuthHeaderLatency,HandlerCompletionLatency,RequestHandlerLatency,HandlerToModuleSwitchingLatency,ProxyTime,CoreLatency,RoutingLatency,HttpProxyOverhead,TotalRequestTime,RouteRefresherLatency,UrlQuery,BackEndGenericInfo,GenericInfo,GenericErrors,EdgeTraceId,DatabaseGuid,UserADObjectGuid,PartitionEndpointLookupLatency,RoutingStatus
#Software: Microsoft Exchange Server
#Version: 15.02.1118.040
#Log-type: HttpProxy Logs
#Date: 2024-02-20T14:00:01.019Z
#Fields: DateTime,RequestId,MajorVersion,MinorVersion,BuildVersion,RevisionVersion,ClientRequestId,Protocol,UrlHost,UrlStem,ProtocolAction,AuthenticationType,IsAuthenticated,AuthenticatedUser,Organization,AnchorMailbox,UserAgent,ClientIpAddress,ServerHostName,HttpStatus,BackEndStatus,ErrorCode,Method,ProxyAction,TargetServer,TargetServerVersion,RoutingType,RoutingHint,BackEndCookie,ServerLocatorHost,ServerLocatorLatency,RequestBytes,ResponseBytes,TargetOutstandingRequests,AuthModulePerfContext,HttpPipelineLatency,CalculateTargetBackEndLatency,GlsLatencyBreakup,TotalGlsLatency,AccountForestLatencyBreakup,TotalAccountForestLatency,ResourceForestLatencyBreakup,TotalResourceForestLatency,ADLatency,SharedCacheLatencyBreakup,TotalSharedCacheLatency,ActivityContextLifeTime,ModuleToHandlerSwitchingLatency,ClientReqStreamLatency,BackendReqInitLatency,BackendReqStreamLatency,BackendProcessingLatency,BackendRespInitLatency,BackendRespStreamLatency,ClientRespStreamLatency,KerberosAuthHeaderLatency,HandlerCompletionLatency,RequestHandlerLatency,HandlerToModuleSwitchingLatency,ProxyTime,CoreLatency,RoutingLatency,HttpProxyOverhead,TotalRequestTime,RouteRefresherLatency,UrlQuery,BackEndGenericInfo,GenericInfo,GenericErrors,EdgeTraceId,DatabaseGuid,UserADObjectGuid,PartitionEndpointLookupLatency,RoutingStatus
2024-02-20T14:00:00.980Z,c3581a8e-2033-4fa0-8dbf-3efdc06ba7c3,15,2,1118,40,{5745B4EE-6A69-4E12-8EBD-6AD2820CA5D1},Mapi,mail.domain.com,/mapi/nspi/,,,false,,,,Microsoft Office/15.0 (Windows NT 10.0; Microsoft Outlook 15.0.5589; Pro),172.16.5.94,SERVERMBX06,401,,,POST,,,,,,,,,13,,,,,,,,,,,,,,,38,,,,,,,,,,,,,,0,,0,0,,?MailboxId=5918ae5a-9281-4301-b94e-407395ba2824@domain.com,,BeginRequest=2024-02-20T14:00:00.980Z;CorrelationID=<empty>;SharedCacheGuard=0;EndRequest=2024-02-20T14:00:00.980Z;S:ServiceLatencyMetadata.AuthModuleLatency=0,,,,,,

 

0 Karma

richgalloway
SplunkTrust
SplunkTrust

TIME_FORMAT is one of the "Great 8" settings every sourcetype should have.  They help ensure events are onboarded properly.  See if these settings help.

[exch_file_httpproxy-mapi]
ANNOTATE_PUNCT = false
LINE_BREAKER = ([\r\n]+)\d\d\d\d-\d\d
INDEXED_EXTRACTIONS = csv
initCrcLength = 2735
HEADER_FIELD_LINE_NUMBER = 1
MAX_TIMESTAMP_LOOKAHEAD = 24
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = DateTime
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%Z
TRANSFORMS-no_column_headers = no_column_headers
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)\d\d\d\d-\d\d
TRUNCATE = 10000

 

---
If this reply helps you, Karma would be appreciated.
0 Karma

NoSpaces
Contributor

Sorry for the long answer
I tested your settings and I can say with confidence that there is no difference
Events also unexpectedly break

0 Karma

NoSpaces
Contributor

Up

0 Karma
Get Updates on the Splunk Community!

Aligning Observability Costs with Business Value: Practical Strategies

 Join us for an engaging Tech Talk on Aligning Observability Costs with Business Value: Practical ...

Mastering Data Pipelines: Unlocking Value with Splunk

 In today's AI-driven world, organizations must balance the challenges of managing the explosion of data with ...

Splunk Up Your Game: Why It's Time to Embrace Python 3.9+ and OpenSSL 3.0

Did you know that for Splunk Enterprise 9.4, Python 3.9 is the default interpreter? This shift is not just a ...