Splunk Enterprise

Why are events getting split by Splunk parser?

jabezds
Path Finder

Hi,

We are trying to process and ingest AWS S3 events into Splunk, but we noticed that a few events are getting split. After checking the configuration, we believe this is caused by Splunk's internal parsing algorithm.

Please let us know whether there are any issues in my configuration, or whether this could be related to the Splunk parser.

Below are the entries in props.conf and transforms.conf:

props.conf -->

[proxy]
REPORT-proxylogs-fields = proxylogs_fields,extract_url_domain
LINE_BREAKER = ([\r\n]+)
# EVENT_BREAKER = ([\r\n]+)
# EVENT_BREAKER_ENABLE = true
SHOULD_LINEMERGE = false
CHARSET = AUTO
disabled = false
TRUNCATE = 1000000
MAX_EVENTS = 1000000
EVAL-product = "Umbrella"
EVAL-vendor = "xyz"
EVAL-vendor_product = "abc"
MAX_TIMESTAMP_LOOKAHEAD = 22
NO_BINARY_CHECK = true
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TZ = UTC

 

transforms.conf -->

[proxylogs_fields]
DELIMS = ","
FIELDS = Timestamp,policy_identities,src,src_translated_ip,dest,content_type,action,url,http_referrer,http_user_agent,status,requestSize,responseSize,responseBodySize,sha256,category,av_detection,pua,amp_disposition,amp_malwarename,amp_score,policy_identity_type,blocked_category,identities,identity_type,request_method,dlp_status,certificate_errors,filename,rulesetID,ruleID,destinationListID,s3_filename

Example of the events:

"2022-06-27 08:57:14","wer.com","1.1.1.1","1.1.1.1","10.10.10.10","image/gif","ALLOWED","https://www.moug.net/img/btn_learning.gif","https://www.mikhgg.net/tech/woopr/0025.html","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.124 Safari/537.36 Edg/102.0.1245.44","200","","3571","3328","1a146b09676811234dddccd6dc0ee3cf11aa1803e774df17aa9a49a7370a40ec","Allow List,Fashion","","","","","","AD Users","","wer.com","AD Users,Network Tunnels","GET","ALLOWED","","btn_learning.gif","13347559","346105","15065619",2022-06-27-09-50-ade8.csv.gz

 

Events as seen in Splunk:

jabezds_0-1658500519516.png

 

 


matt8679
Path Finder

I noticed one issue with your TIME_FORMAT.

You have:

TIME_FORMAT = %Y-%m-%d %H:%M:%S

Should be:

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
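
As a rough illustration of why the quotes could matter, here is a minimal Python sketch. It assumes that Python's `datetime.strptime` treats literal characters in a format string the way Splunk's timestamp parser does, which is only an analogy and not a statement about Splunk internals. The `sample` string is the quoted timestamp from the example event; `try_parse` is a helper made up for this sketch.

```python
from datetime import datetime

# Timestamp exactly as it appears at the start of the sample event,
# including the surrounding double quotes.
sample = '"2022-06-27 08:57:14"'

def try_parse(fmt):
    """Return the parsed datetime, or None if the format does not match."""
    try:
        return datetime.strptime(sample, fmt)
    except ValueError:
        return None

# Format without the quotes: the leading '"' in the data does not match %Y.
unquoted = try_parse("%Y-%m-%d %H:%M:%S")

# Format with literal quotes: matches the data character for character.
quoted = try_parse('"%Y-%m-%d %H:%M:%S"')

print("unquoted format:", unquoted)
print("quoted format:  ", quoted)
```

In Python the unquoted format fails on this input and the quoted one succeeds. Splunk may be more forgiving than Python here, so treat this as a hint about what to check rather than proof of the cause.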

For the parsing issue, I have seen problems like this when a different app defined a sourcetype with the same name. I would run btool to double-check that this is not the issue.

Another thing you could try is breaking on the gz at the end of the log, assuming that value is in every event:

LINE_BREAKER = \.csv\.gz([\r\n]+)


sloshburch
Splunk Employee

Where are these props and transforms loading from? I don't see that sourcetype in Splunk Enterprise, any of your apps, or the Splunk Add-on for Amazon Web Services (AWS). I was looking to play with it and I don't see them.

The sourcetype here doesn't match the sourcetype in the screenshot. Here it's "proxy", but in the screenshot there appears to be more to the name. I'm guessing you changed the name here in the text to hide the part that is blocked out in the screenshot. Nonetheless, it's worth highlighting to make sure the names match the data being viewed.

This is interesting because the config appears to be for splitting just on newlines but it's splitting in the middle of that text. Is there a hidden newline in that text? The truncate value seems high enough to not be at play.
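
One way to check for a hidden newline is to scan the raw file before it reaches Splunk. The following is a minimal Python sketch under stated assumptions: the `raw` text is made-up sample data (not the poster's file) with a newline deliberately embedded in one field, and `unexpected_newlines` is a hypothetical helper that flags any newline run not followed by a quoted date.

```python
import re

# A boundary we expect: newline(s) followed by a quoted date such as "2022-06-27 ...
EXPECTED_BREAK = re.compile(r'[\r\n]+(?="\d{4}-\d\d-\d\d )')
ANY_NEWLINE = re.compile(r'[\r\n]+')

def unexpected_newlines(raw):
    """Return offsets of newline runs that are NOT followed by a quoted date.

    A newline run at the very end of the input is ignored.
    """
    expected = {m.start() for m in EXPECTED_BREAK.finditer(raw)}
    return [
        m.start()
        for m in ANY_NEWLINE.finditer(raw)
        if m.start() not in expected and m.end() != len(raw)
    ]

# Hypothetical data: the first event has a newline hidden inside a field.
raw = (
    '"2022-06-27 08:57:14","wer.com","ALLOWED","first\n half","a.csv.gz"\n'
    '"2022-06-27 08:57:15","wer.com","ALLOWED","intact","a.csv.gz"\n'
)

offsets = unexpected_newlines(raw)
for off in offsets:
    # Show a little context around each suspicious newline.
    print(off, repr(raw[max(0, off - 10):off + 10]))
```

If this kind of scan reports nothing on the real (decompressed) file, a hidden newline is probably not the explanation and the configuration side deserves more attention.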

What does `extract_url_domain` do? What's the config for that?

Have you used btool or the UI to ensure the fields are properly loading? Sometimes it's easy to overlook another config file that is overriding your desired settings.


jabezds
Path Finder

Thanks @richgalloway. We tried this expression (I had to tweak it to ([\r\n]+)\"\d{4}-\d\d to accept the double quote before the year), but I'm still facing the same issue: the event splits at exactly the same place as before.

Are there any other parameters I'm missing in props.conf?

 


sloshburch
Splunk Employee

The config all looks correct. Just to be safe, make sure you restart Splunk after making changes to the config. It's possible that your changes were added to the conf file but Splunk didn't load them because it wasn't prompted to reload (which can be done with a restart, /debug/refresh, or "extract reload=t").

Also, it might help to see whether others have the same issue or whether this happens on a clean install of Splunk.

Finally, if you have customer support then this type of basic sourcetype functionality could be something they may be able to help with.


richgalloway
SplunkTrust

It looks like line breaks need to be isolated to newlines that precede a date.  Try LINE_BREAKER = ([\r\n]+)\d{4}-\d\d
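
To see what such a pattern does to a stream, here is a minimal Python sketch. It approximates the documented LINE_BREAKER behavior (the event ends where the first capturing group starts, the next event begins where that group ends, and the group's text is discarded) using Python's `re` module rather than Splunk's own regex engine. The `split_events` helper and the `raw` sample, which has a newline embedded in one field, are assumptions made for illustration.

```python
import re

def split_events(raw, line_breaker):
    """Approximate LINE_BREAKER: end the event where group 1 starts,
    start the next event where group 1 ends, and drop group 1 itself."""
    pattern = re.compile(line_breaker)
    events, start = [], 0
    for m in pattern.finditer(raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    if start < len(raw):
        events.append(raw[start:])
    return events

# Hypothetical stream: two events, the first with a newline inside a field.
raw = (
    '"2022-06-27 08:57:14","wer.com","first\n half","a.csv.gz"\n'
    '"2022-06-27 08:57:15","wer.com","intact","a.csv.gz"'
)

naive = split_events(raw, r'([\r\n]+)')                   # break on every newline
unquoted_date = split_events(raw, r'([\r\n]+)\d{4}-\d\d')  # newline + bare date
quoted_date = split_events(raw, r'([\r\n]+)"\d{4}-\d\d')   # newline + quoted date

for name, events in [("naive", naive),
                     ("unquoted_date", unquoted_date),
                     ("quoted_date", quoted_date)]:
    print(name, len(events))
```

On this sample, the plain newline pattern yields three pieces, the bare-date pattern never matches because each event starts with a double quote, and the quoted-date pattern yields the two intended events. If real data still splits in the same place after such a change, that would be consistent with the setting not being applied where parsing happens, which is what the btool and restart suggestions in this thread are meant to check.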

---
If this reply helps you, Karma would be appreciated.