CSV log unexpectedly breaking by lines

NoSpaces
Contributor

Hello everyone!
I have a Windows server with a Splunk UF installed that ingests MS Exchange logs.
These logs are stored in CSV format.

Splunk UF settings look like this:
props.conf

[exch_file_httpproxy-mapi]
ANNOTATE_PUNCT = false
BREAK_ONLY_BEFORE_DATE = true
INDEXED_EXTRACTIONS = csv
initCrcLength = 2735
HEADER_FIELD_LINE_NUMBER = 1
MAX_TIMESTAMP_LOOKAHEAD = 24
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = DateTime
TRANSFORMS-no_column_headers = no_column_headers

transforms.conf

[no_column_headers]
REGEX = ^#.*
DEST_KEY = queue
FORMAT = nullQueue
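
To illustrate which lines the `no_column_headers` transform is meant to drop, here is a minimal Python sketch that applies the same pattern as `REGEX` to a few shortened sample lines. The helper name `routed_to_null_queue` and the truncated sample lines are made up for illustration, and the sketch only shows what the regex matches; it does not model where in the Splunk pipeline the transform actually runs.

```python
import re

# Same pattern as REGEX in the [no_column_headers] stanza.
COMMENT_LINE = re.compile(r"^#.*")


def routed_to_null_queue(event: str) -> bool:
    """Return True when the event text starts with '#', i.e. the regex matches."""
    return COMMENT_LINE.search(event) is not None


# Shortened, illustrative lines (not the full Exchange log lines).
sample_events = [
    "#Software: Microsoft Exchange Server",
    "#Fields: DateTime,RequestId,MajorVersion",
    "DateTime,RequestId,MajorVersion",
    "2024-02-20T14:00:00.980Z,c3581a8e-2033-4fa0-8dbf-3efdc06ba7c3,15",
]

for event in sample_events:
    action = "dropped" if routed_to_null_queue(event) else "kept"
    print(f"{action:8} {event[:50]}")
```

Note that a header line without a leading `#` is not matched by this pattern, so only the `#`-prefixed preamble lines would be affected.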

 

Thanks to the data quality report on the indexer layer, I found out that this source type has some timestamp issues.
I investigated by running a search on the search head layer and found events breaking in unexpected places.
You can see an example in the attachment.
The _raw data looks OK and does not contain unexpected newline characters.

What is wrong with my settings?


richgalloway
SplunkTrust

The props.conf settings are missing TIME_FORMAT.  Other settings may need to be changed, but we need to see the raw data (the CSV file before it gets to Splunk) to determine that.

---
If this reply helps you, Karma would be appreciated.

NoSpaces
Contributor

Is TIME_FORMAT really necessary? Events that are not broken have no problem with timestamp detection.

Here is a log example that contains the header line, a preamble, and one event:

DateTime,RequestId,MajorVersion,MinorVersion,BuildVersion,RevisionVersion,ClientRequestId,Protocol,UrlHost,UrlStem,ProtocolAction,AuthenticationType,IsAuthenticated,AuthenticatedUser,Organization,AnchorMailbox,UserAgent,ClientIpAddress,ServerHostName,HttpStatus,BackEndStatus,ErrorCode,Method,ProxyAction,TargetServer,TargetServerVersion,RoutingType,RoutingHint,BackEndCookie,ServerLocatorHost,ServerLocatorLatency,RequestBytes,ResponseBytes,TargetOutstandingRequests,AuthModulePerfContext,HttpPipelineLatency,CalculateTargetBackEndLatency,GlsLatencyBreakup,TotalGlsLatency,AccountForestLatencyBreakup,TotalAccountForestLatency,ResourceForestLatencyBreakup,TotalResourceForestLatency,ADLatency,SharedCacheLatencyBreakup,TotalSharedCacheLatency,ActivityContextLifeTime,ModuleToHandlerSwitchingLatency,ClientReqStreamLatency,BackendReqInitLatency,BackendReqStreamLatency,BackendProcessingLatency,BackendRespInitLatency,BackendRespStreamLatency,ClientRespStreamLatency,KerberosAuthHeaderLatency,HandlerCompletionLatency,RequestHandlerLatency,HandlerToModuleSwitchingLatency,ProxyTime,CoreLatency,RoutingLatency,HttpProxyOverhead,TotalRequestTime,RouteRefresherLatency,UrlQuery,BackEndGenericInfo,GenericInfo,GenericErrors,EdgeTraceId,DatabaseGuid,UserADObjectGuid,PartitionEndpointLookupLatency,RoutingStatus
#Software: Microsoft Exchange Server
#Version: 15.02.1118.040
#Log-type: HttpProxy Logs
#Date: 2024-02-20T14:00:01.019Z
#Fields: DateTime,RequestId,MajorVersion,MinorVersion,BuildVersion,RevisionVersion,ClientRequestId,Protocol,UrlHost,UrlStem,ProtocolAction,AuthenticationType,IsAuthenticated,AuthenticatedUser,Organization,AnchorMailbox,UserAgent,ClientIpAddress,ServerHostName,HttpStatus,BackEndStatus,ErrorCode,Method,ProxyAction,TargetServer,TargetServerVersion,RoutingType,RoutingHint,BackEndCookie,ServerLocatorHost,ServerLocatorLatency,RequestBytes,ResponseBytes,TargetOutstandingRequests,AuthModulePerfContext,HttpPipelineLatency,CalculateTargetBackEndLatency,GlsLatencyBreakup,TotalGlsLatency,AccountForestLatencyBreakup,TotalAccountForestLatency,ResourceForestLatencyBreakup,TotalResourceForestLatency,ADLatency,SharedCacheLatencyBreakup,TotalSharedCacheLatency,ActivityContextLifeTime,ModuleToHandlerSwitchingLatency,ClientReqStreamLatency,BackendReqInitLatency,BackendReqStreamLatency,BackendProcessingLatency,BackendRespInitLatency,BackendRespStreamLatency,ClientRespStreamLatency,KerberosAuthHeaderLatency,HandlerCompletionLatency,RequestHandlerLatency,HandlerToModuleSwitchingLatency,ProxyTime,CoreLatency,RoutingLatency,HttpProxyOverhead,TotalRequestTime,RouteRefresherLatency,UrlQuery,BackEndGenericInfo,GenericInfo,GenericErrors,EdgeTraceId,DatabaseGuid,UserADObjectGuid,PartitionEndpointLookupLatency,RoutingStatus
2024-02-20T14:00:00.980Z,c3581a8e-2033-4fa0-8dbf-3efdc06ba7c3,15,2,1118,40,{5745B4EE-6A69-4E12-8EBD-6AD2820CA5D1},Mapi,mail.domain.com,/mapi/nspi/,,,false,,,,Microsoft Office/15.0 (Windows NT 10.0; Microsoft Outlook 15.0.5589; Pro),172.16.5.94,SERVERMBX06,401,,,POST,,,,,,,,,13,,,,,,,,,,,,,,,38,,,,,,,,,,,,,,0,,0,0,,?MailboxId=5918ae5a-9281-4301-b94e-407395ba2824@domain.com,,BeginRequest=2024-02-20T14:00:00.980Z;CorrelationID=<empty>;SharedCacheGuard=0;EndRequest=2024-02-20T14:00:00.980Z;S:ServiceLatencyMetadata.AuthModuleLatency=0,,,,,,
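
One way to sanity-check a raw file shaped like the sample above, before it reaches Splunk, is to parse it outside Splunk and flag rows whose field count differs from the header or whose first field does not parse as a timestamp. The following Python sketch does that on a small made-up sample; the function name `find_suspect_rows`, the three-column header, and the deliberately broken rows are assumptions for illustration. Python's `%f` stands in for the millisecond part here and is not the same directive as Splunk's `%3N`.

```python
import csv
import io
from datetime import datetime


def find_suspect_rows(text: str):
    """Return (physical_line_number, reason) for data rows that look malformed."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = None
    problems = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the '#' preamble
        if header is None:
            header = row  # first non-comment row is treated as the header
            continue
        if len(row) != len(header):
            problems.append(
                (reader.line_num, f"expected {len(header)} fields, got {len(row)}")
            )
            continue
        try:
            # Format of the DateTime column in the sample, e.g. 2024-02-20T14:00:00.980Z
            datetime.strptime(row[0], "%Y-%m-%dT%H:%M:%S.%fZ")
        except ValueError:
            problems.append((reader.line_num, f"unparseable DateTime: {row[0]!r}"))
    return problems


# Illustrative data: lines 5 and 6 imitate one event split by a stray line break.
sample = (
    "DateTime,RequestId,HttpStatus\r\n"
    "#Software: Microsoft Exchange Server\r\n"
    "#Fields: DateTime,RequestId,HttpStatus\r\n"
    "2024-02-20T14:00:00.980Z,c3581a8e,401\r\n"
    "2024-02-20T14:00:01.100Z,broken\r\n"
    "row,401\r\n"
)

for line_number, reason in find_suspect_rows(sample):
    print(f"line {line_number}: {reason}")
```

If a check like this reports nothing on the real file, the line breaks are more likely introduced during parsing than present in the source data.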

 


richgalloway
SplunkTrust

TIME_FORMAT is one of the "Great 8" settings every sourcetype should have.  They help ensure events are onboarded properly.  See if these settings help.

[exch_file_httpproxy-mapi]
ANNOTATE_PUNCT = false
LINE_BREAKER = ([\r\n]+)\d\d\d\d-\d\d
INDEXED_EXTRACTIONS = csv
initCrcLength = 2735
HEADER_FIELD_LINE_NUMBER = 1
MAX_TIMESTAMP_LOOKAHEAD = 24
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = DateTime
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%Z
TRANSFORMS-no_column_headers = no_column_headers
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)\d\d\d\d-\d\d
TRUNCATE = 10000
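
To see where the `LINE_BREAKER` pattern above would place event boundaries, it can be tried against sample text outside Splunk. The Python sketch below is a simplified model only: it treats the first capture group as the discarded delimiter between events, which is how `LINE_BREAKER` is documented to behave, but it does not reproduce `INDEXED_EXTRACTIONS`, `TRUNCATE`, or the rest of the parsing pipeline. The function name `split_events` and the shortened sample lines are made up for illustration.

```python
import re

# Same pattern as LINE_BREAKER / EVENT_BREAKER in the stanza above.
LINE_BREAKER = re.compile(r"([\r\n]+)\d\d\d\d-\d\d")


def split_events(stream: str):
    """Split text at each match, discarding only the text of capture group 1."""
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(stream):
        events.append(stream[start:match.start(1)])  # previous event ends here
        start = match.end(1)                         # next event starts here
    events.append(stream[start:])
    return events


# Shortened, illustrative lines (not the full Exchange log lines).
stream = (
    "#Software: Microsoft Exchange Server\r\n"
    "#Date: 2024-02-20T14:00:01.019Z\r\n"
    "2024-02-20T14:00:00.980Z,aaa,401\r\n"
    "2024-02-20T14:00:01.250Z,bbb,200\r\n"
)

for number, event in enumerate(split_events(stream), start=1):
    print(number, repr(event))
```

In this model, a break happens only before a line that starts with four digits, a hyphen, and two digits, so any line that does not start that way stays attached to the event before it.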

 


NoSpaces
Contributor

Sorry for the late reply.
I tested your settings and can say with confidence that they make no difference.
Events still break unexpectedly.


NoSpaces
Contributor

Up
