Getting Data In

CSV log events unexpectedly breaking at line boundaries

NoSpaces
Contributor

Hello to everyone!
I have a Windows server with a Splunk UF installed that consumes MS Exchange logs.
These logs are stored in CSV format.

Splunk UF settings look like this:
props.conf

[exch_file_httpproxy-mapi]
ANNOTATE_PUNCT = false
BREAK_ONLY_BEFORE_DATE = true
INDEXED_EXTRACTIONS = csv
initCrcLength = 2735
HEADER_FIELD_LINE_NUMBER = 1
MAX_TIMESTAMP_LOOKAHEAD = 24
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = DateTime
TRANSFORMS-no_column_headers = no_column_headers

transforms.conf

[no_column_headers]
REGEX = ^#.*
DEST_KEY = queue
FORMAT = nullQueue
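For reference, the intent of this transform can be sanity-checked outside Splunk: any line matching `^#.*` is routed to the nullQueue and discarded at parse time. A minimal Python sketch (the sample lines below are illustrative, not taken from the real log; Python's `re` is close enough to Splunk's PCRE for this simple pattern):

```python
import re

# Mirrors the no_column_headers transform: drop lines starting with '#'
pattern = re.compile(r"^#.*")

lines = [
    "#Software: Microsoft Exchange Server",   # preamble -> dropped
    "#Fields: DateTime,RequestId",            # preamble -> dropped
    "2024-02-20T14:00:00.980Z,abc,15,2",      # event line -> kept
]

kept = [line for line in lines if not pattern.match(line)]
print(kept)  # only the event line survives
```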

 

Thanks to the data quality report at the indexer layer, I found out that this source type has some timestamp issues.
I investigated the problem by running a search at the search layer and found events breaking unexpectedly.
You can see an example in the attachment.
The _raw data is OK and does not contain unexpected newline characters.

What is wrong with my settings?

0 Karma

richgalloway
SplunkTrust

The props.conf settings are missing TIME_FORMAT.  Other settings may need to be changed, but we need to see the raw data (the CSV file before it gets to Splunk) to determine that.

---
If this reply helps you, Karma would be appreciated.
0 Karma

NoSpaces
Contributor

Is TIME_FORMAT really necessary? Events that are not broken have no problem with timestamp determination.

Here is a log example that contains the header line, a preamble, and one event:

DateTime,RequestId,MajorVersion,MinorVersion,BuildVersion,RevisionVersion,ClientRequestId,Protocol,UrlHost,UrlStem,ProtocolAction,AuthenticationType,IsAuthenticated,AuthenticatedUser,Organization,AnchorMailbox,UserAgent,ClientIpAddress,ServerHostName,HttpStatus,BackEndStatus,ErrorCode,Method,ProxyAction,TargetServer,TargetServerVersion,RoutingType,RoutingHint,BackEndCookie,ServerLocatorHost,ServerLocatorLatency,RequestBytes,ResponseBytes,TargetOutstandingRequests,AuthModulePerfContext,HttpPipelineLatency,CalculateTargetBackEndLatency,GlsLatencyBreakup,TotalGlsLatency,AccountForestLatencyBreakup,TotalAccountForestLatency,ResourceForestLatencyBreakup,TotalResourceForestLatency,ADLatency,SharedCacheLatencyBreakup,TotalSharedCacheLatency,ActivityContextLifeTime,ModuleToHandlerSwitchingLatency,ClientReqStreamLatency,BackendReqInitLatency,BackendReqStreamLatency,BackendProcessingLatency,BackendRespInitLatency,BackendRespStreamLatency,ClientRespStreamLatency,KerberosAuthHeaderLatency,HandlerCompletionLatency,RequestHandlerLatency,HandlerToModuleSwitchingLatency,ProxyTime,CoreLatency,RoutingLatency,HttpProxyOverhead,TotalRequestTime,RouteRefresherLatency,UrlQuery,BackEndGenericInfo,GenericInfo,GenericErrors,EdgeTraceId,DatabaseGuid,UserADObjectGuid,PartitionEndpointLookupLatency,RoutingStatus
#Software: Microsoft Exchange Server
#Version: 15.02.1118.040
#Log-type: HttpProxy Logs
#Date: 2024-02-20T14:00:01.019Z
#Fields: DateTime,RequestId,MajorVersion,MinorVersion,BuildVersion,RevisionVersion,ClientRequestId,Protocol,UrlHost,UrlStem,ProtocolAction,AuthenticationType,IsAuthenticated,AuthenticatedUser,Organization,AnchorMailbox,UserAgent,ClientIpAddress,ServerHostName,HttpStatus,BackEndStatus,ErrorCode,Method,ProxyAction,TargetServer,TargetServerVersion,RoutingType,RoutingHint,BackEndCookie,ServerLocatorHost,ServerLocatorLatency,RequestBytes,ResponseBytes,TargetOutstandingRequests,AuthModulePerfContext,HttpPipelineLatency,CalculateTargetBackEndLatency,GlsLatencyBreakup,TotalGlsLatency,AccountForestLatencyBreakup,TotalAccountForestLatency,ResourceForestLatencyBreakup,TotalResourceForestLatency,ADLatency,SharedCacheLatencyBreakup,TotalSharedCacheLatency,ActivityContextLifeTime,ModuleToHandlerSwitchingLatency,ClientReqStreamLatency,BackendReqInitLatency,BackendReqStreamLatency,BackendProcessingLatency,BackendRespInitLatency,BackendRespStreamLatency,ClientRespStreamLatency,KerberosAuthHeaderLatency,HandlerCompletionLatency,RequestHandlerLatency,HandlerToModuleSwitchingLatency,ProxyTime,CoreLatency,RoutingLatency,HttpProxyOverhead,TotalRequestTime,RouteRefresherLatency,UrlQuery,BackEndGenericInfo,GenericInfo,GenericErrors,EdgeTraceId,DatabaseGuid,UserADObjectGuid,PartitionEndpointLookupLatency,RoutingStatus
2024-02-20T14:00:00.980Z,c3581a8e-2033-4fa0-8dbf-3efdc06ba7c3,15,2,1118,40,{5745B4EE-6A69-4E12-8EBD-6AD2820CA5D1},Mapi,mail.domain.com,/mapi/nspi/,,,false,,,,Microsoft Office/15.0 (Windows NT 10.0; Microsoft Outlook 15.0.5589; Pro),172.16.5.94,SERVERMBX06,401,,,POST,,,,,,,,,13,,,,,,,,,,,,,,,38,,,,,,,,,,,,,,0,,0,0,,?MailboxId=5918ae5a-9281-4301-b94e-407395ba2824@domain.com,,BeginRequest=2024-02-20T14:00:00.980Z;CorrelationID=<empty>;SharedCacheGuard=0;EndRequest=2024-02-20T14:00:00.980Z;S:ServiceLatencyMetadata.AuthModuleLatency=0,,,,,,

 

0 Karma

richgalloway
SplunkTrust

TIME_FORMAT is one of the "Great 8" settings every sourcetype should have.  They help ensure events are onboarded properly.  See if these settings help.

[exch_file_httpproxy-mapi]
ANNOTATE_PUNCT = false
LINE_BREAKER = ([\r\n]+)\d\d\d\d-\d\d
INDEXED_EXTRACTIONS = csv
initCrcLength = 2735
HEADER_FIELD_LINE_NUMBER = 1
MAX_TIMESTAMP_LOOKAHEAD = 24
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = DateTime
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%Z
TRANSFORMS-no_column_headers = no_column_headers
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)\d\d\d\d-\d\d
TRUNCATE = 10000
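The breaker regex and timestamp format above can be sanity-checked outside Splunk. A minimal Python sketch (the sample data is shortened and illustrative; Splunk discards the LINE_BREAKER capture group, which a lookahead approximates here, and %3N/%Z are Splunk strptime extensions, so Python's %f/%z stand in for them):

```python
import re
from datetime import datetime

# LINE_BREAKER = ([\r\n]+)\d\d\d\d-\d\d breaks events only at newlines
# followed by a date. A lookahead split approximates that outside Splunk:
raw = (
    "2024-02-20T14:00:00.980Z,req1,field with\nan embedded newline\n"
    "2024-02-20T14:00:01.120Z,req2,next event"
)
events = re.split(r"[\r\n]+(?=\d{4}-\d\d)", raw)
print(len(events))  # 2 -- the embedded newline does not start a new event

# TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%Z, expressed with Python's closest
# equivalent format codes for the same sample timestamp:
ts = datetime.strptime("2024-02-20T14:00:00.980Z", "%Y-%m-%dT%H:%M:%S.%f%z")
```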

 

---
If this reply helps you, Karma would be appreciated.
0 Karma

NoSpaces
Contributor

Sorry for the late reply.
I tested your settings and I can say with confidence that there is no difference.
Events still break unexpectedly.

0 Karma

NoSpaces
Contributor

Up

0 Karma