CSV log unexpectedly breaking by lines

NoSpaces
Communicator

Hello everyone!
I have a Windows server with the Splunk UF installed that ingests MS Exchange logs.
These logs are stored in CSV format.

Splunk UF settings look like this:
props.conf

[exch_file_httpproxy-mapi]
ANNOTATE_PUNCT = false
BREAK_ONLY_BEFORE_DATE = true
INDEXED_EXTRACTIONS = csv
initCrcLength = 2735
HEADER_FIELD_LINE_NUMBER = 1
MAX_TIMESTAMP_LOOKAHEAD = 24
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = DateTime
TRANSFORMS-no_column_headers = no_column_headers

transforms.conf

[no_column_headers]
REGEX = ^#.*
DEST_KEY = queue
FORMAT = nullQueue

 

Thanks to the data quality report on the indexer layer, I found out that this source type has some timestamp issues.
I investigated the problem by running a search on the search head layer and found events breaking in unexpected places.
You can see an example in the attachment.
The _raw data looks fine and does not contain any unexpected newline characters.
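
One way to double-check that the source file has no stray line breaks is to inspect the CSV on disk before Splunk reads it. The following is a minimal Python sketch, not part of the original setup. The function names `find_suspect_rows` and `count_line_endings` are hypothetical, and it assumes that preamble lines start with `#`, as the no_column_headers transform above does. It reports rows whose field count differs from the header and counts the kinds of line terminators present.

```python
import csv
import io


def find_suspect_rows(text, comment_prefix="#"):
    """Return (line_number, field_count) for data rows whose field count
    differs from the header row (assumed to be the first line)."""
    # newline="" keeps the original line endings so the csv module sees them as-is
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    suspects = []
    for row in reader:
        # Skip empty lines and preamble lines such as "#Software: ..."
        if not row or row[0].startswith(comment_prefix):
            continue
        if len(row) != len(header):
            suspects.append((reader.line_num, len(row)))
    return suspects


def count_line_endings(data: bytes):
    """Count CRLF, bare LF, and bare CR terminators in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LF not preceded by CR
        "CR": data.count(b"\r") - crlf,  # CR not followed by LF
    }
```

To use it, read the monitored log file in binary mode for `count_line_endings` and pass the decoded text to `find_suspect_rows`. A mix of terminator types, or any row with a different field count, would suggest the break originates in the file rather than in the Splunk settings. Note that a quoted field containing a newline is parsed as a single row, so it would show up in the terminator counts but not as a suspect row.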

What is wrong with my settings?


richgalloway
SplunkTrust

The props.conf settings are missing TIME_FORMAT.  Other settings may need to be changed, but we need to see the raw data (the CSV file before it gets to Splunk) to determine that.

---
If this reply helps you, Karma would be appreciated.

NoSpaces
Communicator

Is TIME_FORMAT really necessary? Events that are not broken have no problems with timestamp recognition.

This is a log example that contains the header, a preamble, and one event:

DateTime,RequestId,MajorVersion,MinorVersion,BuildVersion,RevisionVersion,ClientRequestId,Protocol,UrlHost,UrlStem,ProtocolAction,AuthenticationType,IsAuthenticated,AuthenticatedUser,Organization,AnchorMailbox,UserAgent,ClientIpAddress,ServerHostName,HttpStatus,BackEndStatus,ErrorCode,Method,ProxyAction,TargetServer,TargetServerVersion,RoutingType,RoutingHint,BackEndCookie,ServerLocatorHost,ServerLocatorLatency,RequestBytes,ResponseBytes,TargetOutstandingRequests,AuthModulePerfContext,HttpPipelineLatency,CalculateTargetBackEndLatency,GlsLatencyBreakup,TotalGlsLatency,AccountForestLatencyBreakup,TotalAccountForestLatency,ResourceForestLatencyBreakup,TotalResourceForestLatency,ADLatency,SharedCacheLatencyBreakup,TotalSharedCacheLatency,ActivityContextLifeTime,ModuleToHandlerSwitchingLatency,ClientReqStreamLatency,BackendReqInitLatency,BackendReqStreamLatency,BackendProcessingLatency,BackendRespInitLatency,BackendRespStreamLatency,ClientRespStreamLatency,KerberosAuthHeaderLatency,HandlerCompletionLatency,RequestHandlerLatency,HandlerToModuleSwitchingLatency,ProxyTime,CoreLatency,RoutingLatency,HttpProxyOverhead,TotalRequestTime,RouteRefresherLatency,UrlQuery,BackEndGenericInfo,GenericInfo,GenericErrors,EdgeTraceId,DatabaseGuid,UserADObjectGuid,PartitionEndpointLookupLatency,RoutingStatus
#Software: Microsoft Exchange Server
#Version: 15.02.1118.040
#Log-type: HttpProxy Logs
#Date: 2024-02-20T14:00:01.019Z
#Fields: DateTime,RequestId,MajorVersion,MinorVersion,BuildVersion,RevisionVersion,ClientRequestId,Protocol,UrlHost,UrlStem,ProtocolAction,AuthenticationType,IsAuthenticated,AuthenticatedUser,Organization,AnchorMailbox,UserAgent,ClientIpAddress,ServerHostName,HttpStatus,BackEndStatus,ErrorCode,Method,ProxyAction,TargetServer,TargetServerVersion,RoutingType,RoutingHint,BackEndCookie,ServerLocatorHost,ServerLocatorLatency,RequestBytes,ResponseBytes,TargetOutstandingRequests,AuthModulePerfContext,HttpPipelineLatency,CalculateTargetBackEndLatency,GlsLatencyBreakup,TotalGlsLatency,AccountForestLatencyBreakup,TotalAccountForestLatency,ResourceForestLatencyBreakup,TotalResourceForestLatency,ADLatency,SharedCacheLatencyBreakup,TotalSharedCacheLatency,ActivityContextLifeTime,ModuleToHandlerSwitchingLatency,ClientReqStreamLatency,BackendReqInitLatency,BackendReqStreamLatency,BackendProcessingLatency,BackendRespInitLatency,BackendRespStreamLatency,ClientRespStreamLatency,KerberosAuthHeaderLatency,HandlerCompletionLatency,RequestHandlerLatency,HandlerToModuleSwitchingLatency,ProxyTime,CoreLatency,RoutingLatency,HttpProxyOverhead,TotalRequestTime,RouteRefresherLatency,UrlQuery,BackEndGenericInfo,GenericInfo,GenericErrors,EdgeTraceId,DatabaseGuid,UserADObjectGuid,PartitionEndpointLookupLatency,RoutingStatus
2024-02-20T14:00:00.980Z,c3581a8e-2033-4fa0-8dbf-3efdc06ba7c3,15,2,1118,40,{5745B4EE-6A69-4E12-8EBD-6AD2820CA5D1},Mapi,mail.domain.com,/mapi/nspi/,,,false,,,,Microsoft Office/15.0 (Windows NT 10.0; Microsoft Outlook 15.0.5589; Pro),172.16.5.94,SERVERMBX06,401,,,POST,,,,,,,,,13,,,,,,,,,,,,,,,38,,,,,,,,,,,,,,0,,0,0,,?MailboxId=5918ae5a-9281-4301-b94e-407395ba2824@domain.com,,BeginRequest=2024-02-20T14:00:00.980Z;CorrelationID=<empty>;SharedCacheGuard=0;EndRequest=2024-02-20T14:00:00.980Z;S:ServiceLatencyMetadata.AuthModuleLatency=0,,,,,,
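
For reference, the DateTime value in the sample event is 24 characters long, which lines up with MAX_TIMESTAMP_LOOKAHEAD = 24. The short Python sketch below, run outside Splunk, checks that length and parses the value. It rests on two assumptions: the trailing `Z` means UTC, and Python's `%f` directive stands in for the millisecond part (it is not the same directive Splunk uses). The helper name `parse_exchange_ts` is hypothetical.

```python
from datetime import datetime, timezone

# DateTime value copied from the sample event above
SAMPLE_TS = "2024-02-20T14:00:00.980Z"


def parse_exchange_ts(value):
    """Parse an Exchange-style ISO 8601 timestamp, assuming 'Z' means UTC."""
    # %f accepts 1 to 6 fractional digits; the literal Z is matched as plain text
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


print(len(SAMPLE_TS))
print(parse_exchange_ts(SAMPLE_TS).isoformat())
```

This only shows that the timestamp itself is well formed and fits in the lookahead window. It says nothing about how Splunk picks the timestamp when an event has been split in the wrong place.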

 


richgalloway
SplunkTrust

TIME_FORMAT is one of the "Great 8" settings every sourcetype should have.  They help ensure events are onboarded properly.  See if these settings help.

[exch_file_httpproxy-mapi]
ANNOTATE_PUNCT = false
LINE_BREAKER = ([\r\n]+)\d\d\d\d-\d\d
INDEXED_EXTRACTIONS = csv
initCrcLength = 2735
HEADER_FIELD_LINE_NUMBER = 1
MAX_TIMESTAMP_LOOKAHEAD = 24
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = DateTime
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%Z
TRANSFORMS-no_column_headers = no_column_headers
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)\d\d\d\d-\d\d
TRUNCATE = 10000
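
To see where the LINE_BREAKER pattern above would match, it can be tried outside Splunk. The following is a rough Python sketch under stated assumptions. It mimics the documented LINE_BREAKER convention that the text matched by the first capture group is discarded and the next event starts right after it. The function name `split_events` is hypothetical. Python's `re` engine is not the PCRE engine Splunk uses, although a pattern this simple should behave the same in both. With INDEXED_EXTRACTIONS enabled, Splunk's structured-data parsing is also involved, so this illustrates the pattern only, not the full pipeline.

```python
import re

# Same pattern as LINE_BREAKER / EVENT_BREAKER in the stanza above
LINE_BREAKER = re.compile(r"([\r\n]+)\d\d\d\d-\d\d")


def split_events(raw):
    """Split raw text at each match, dropping the text of capture group 1."""
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])  # event ends before the newline run
        start = match.end(1)                      # next event starts at the date
    events.append(raw[start:])
    return events


sample = (
    "DateTime,RequestId\r\n"
    "#Software: Microsoft Exchange Server\r\n"
    "#Date: 2024-02-20T14:00:01.019Z\r\n"
    "2024-02-20T14:00:00.980Z,first\r\n"
    "2024-02-20T14:00:01.100Z,second"
)
for event in split_events(sample):
    print(repr(event))
```

In this shortened sample the header and the `#` preamble lines stay together in one chunk, because none of them starts with a four-digit year directly after a newline, while each data row becomes its own event.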

 


NoSpaces
Communicator

Sorry for the late reply.
I tested your settings and can say with confidence that they make no difference.
Events still break unexpectedly.


NoSpaces
Communicator

Up
