Hi guys!
I want to filter data out on my forwarder by using a regular expression in transforms.conf. The strange thing is that it only works partially, even though my regex itself is (or should be) fine.
transforms.conf
[deleteAdvertisingTracking]
REGEX=(\t)hostname=.*(adnxs|doubleclick|adsafeprotected|pubmatic|xiti|smartadserver|lijit|ads\.yahoo|insurads)\.(com|net)
DEST_KEY = queue
FORMAT = nullQueue
[deleteShopping]
REGEX=(\t)hostname=.*(amazon|ebay)\.(com|net)
DEST_KEY = queue
FORMAT = nullQueue
props.conf
[STANZA_NAME]
TRANSFORMS-DeleteStuff = deleteAdvertisingTracking,deleteShopping
The second stanza, named "deleteShopping", works just fine, but the first one does not. I noticed that it stopped working once the regex had 3 or more alternatives (e.g. adnxs|doubleclick|adsafeprotected|pubmatic). I've tried adding "LOOKAHEAD = 65535", but that didn't help.
Of course I restarted the forwarder after the changes. Do you have any idea what's going wrong? I'm using Splunk v8.1.1.
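Whether the number of alternatives matters can be checked outside Splunk. The sketch below uses Python's `re` as a stand-in for PCRE (the syntax in this particular pattern is the same in both) and runs the deleteAdvertisingTracking REGEX against one hypothetical hostname per alternative, each preceded by a literal tab as the `(\t)` group requires. The hostnames and event fragments are made up for the test:

```python
import re

# REGEX from the [deleteAdvertisingTracking] stanza, copied verbatim.
AD_REGEX = re.compile(
    r"(\t)hostname=.*(adnxs|doubleclick|adsafeprotected|pubmatic|xiti"
    r"|smartadserver|lijit|ads\.yahoo|insurads)\.(com|net)"
)

# One hypothetical hostname per alternative in the pattern.
ALTERNATIVES = ["adnxs", "doubleclick", "adsafeprotected", "pubmatic", "xiti",
                "smartadserver", "lijit", "ads.yahoo", "insurads"]
hosts = [f"www.{alt}.com" for alt in ALTERNATIVES]

# Each hypothetical event puts a literal tab before "hostname=", as (\t) requires.
matched = [h for h in hosts
           if AD_REGEX.search(f"url=x\thostname={h}\tclientIP=1.1.1")]
print(f"{len(matched)} of {len(hosts)} hostnames matched")  # 9 of 9 hostnames matched
```

All nine alternatives match in this test, which suggests the regex engine handles the full alternation and the cause is more likely in the events themselves than in the pattern length.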
Hi again. I've tried more things, but nothing worked. I think my assumption that it works with fewer alternatives was wrong; I can no longer reproduce that. Things I've tried:
- Set MATCH_LIMIT to a higher value
- Replaced the sourcetype stanza name in props.conf with source::tcp:1422
- Included a named group in the regex, e.g.:
REGEX=(\t)hostname=(.*\.)?(?<site>amazon|ebay)\.(de|com|net)
Data to test:
2021-01-27 16:55:42 action=Allowed event_id=6922469125184356358 protocol=SSL category=Corporate Marketing dest=1.1.1.1 http_referrer=None http_user_agent=XXXXXXX clientpublicIP=1.1.1 status=NA user=something.something@something.com url=ebayimg.ebay.com hostname=ebayimg.ebay.com clientIP=1.1.1 threatcategory=None threatname=None appname=XXX pagerisk=0 department=XXXX supercategory=XXXX appclass=File Share urlclass=XXXXX threatclass=None bytes_out=2272 bytes_in=5100
props.conf
[XXXXXX]
DATETIME_CONFIG = CURRENT
TZ = UTC
[source::tcp:1422]
TRANSFORMS-DeleteStuff = deleteAdvertisingTracking,deleteShopping
transforms.conf
[deleteAdvertisingTracking]
REGEX=(\t)hostname=.*(adnxs|doubleclick|adsafeprotected|pubmatic|xiti|smartadserver|lijit|ads\.yahoo|insurads)\.(com|net)
DEST_KEY = queue
FORMAT = nullQueue
[deleteShopping]
REGEX=(\t)hostname=(.*\.)?(amazon|ebay)\.(de|com|net|cn)
DEST_KEY = queue
FORMAT = nullQueue
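Because the REGEX starts with `(\t)`, it only matches when a literal tab character immediately precedes `hostname=`. In the sample event as posted, the fields are separated by spaces (the forum may have converted tabs, so this is worth checking in the raw data). A minimal sketch using Python's `re` as a stand-in for PCRE checks the deleteShopping REGEX against a fragment of that event, plus a hypothetical whitespace-tolerant variant; ALT_REGEX is an illustration only, not a setting tested in Splunk:

```python
import re

# REGEX from the second [deleteShopping] stanza, copied verbatim.
SHOP_REGEX = re.compile(r"(\t)hostname=(.*\.)?(amazon|ebay)\.(de|com|net|cn)")

# Hypothetical variant: accepts any whitespace (or start of line) before
# "hostname=", and \S* keeps the match inside the hostname value.
ALT_REGEX = re.compile(r"(?:^|\s)hostname=(\S*\.)?(amazon|ebay)\.(de|com|net|cn)\b")

# Fragment of the sample event exactly as posted (fields separated by spaces).
posted = "url=ebayimg.ebay.com hostname=ebayimg.ebay.com clientIP=1.1.1"
# The same fragment with a literal tab before "hostname=".
tabbed = posted.replace(" hostname=", "\thostname=")

results = {
    "posted, original": bool(SHOP_REGEX.search(posted)),
    "tabbed, original": bool(SHOP_REGEX.search(tabbed)),
    "posted, variant": bool(ALT_REGEX.search(posted)),
}
print(results)
```

If the raw events do use spaces, the `(\t)` group would have to be replaced by something like the variant above before either stanza can match; using `\S*` instead of `.*` also stops the match from running past the hostname value into later fields.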
This is what the command "cmd btool transforms list deleteAds" says:
CAN_OPTIMIZE = True
CLEAN_KEYS = True
DEFAULT_VALUE =
DEPTH_LIMIT = 1000
DEST_KEY = queue
FORMAT = nullQueue
KEEP_EMPTY_VALS = False
LOOKAHEAD = 4096
MATCH_LIMIT = 100000
MV_ADD = False
REGEX = (\t)hostname=(.*\.)?(amazon|ebay)\.(de|com|net|cn)
SOURCE_KEY = _raw
WRITE_META = False
Is the stanza named deleteAdvertisingTracking or deleteAds in transforms.conf?
Both, I want to execute both stanzas. Is that not possible?
Yes, it is possible. Make sure that the regex of each stanza matches the logs you want to send to the null queue.
Hi, thanks for your response. The regex does match, and that's my problem.
Try the configurations below, without the "\" characters.
[deleteAdvertisingTracking]
REGEX = hostname=.*(adnxs|doubleclick|adsafeprotected|pubmatic|xiti|smartadserver|lijit|ads.yahoo|insurads).(com|net)
DEST_KEY = queue
FORMAT = nullQueue
[deleteShopping]
REGEX = hostname=.*(amazon|ebay).(de|com|net|cn)
DEST_KEY = queue
FORMAT = nullQueue
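Assuming the raw events are space-separated as posted, dropping `(\t)` is what lets these patterns match. Dropping the backslash before the dot is a separate change: an unescaped `.` matches any character. A quick check with Python's `re` (as a stand-in for PCRE) compares the suggested deleteShopping pattern with an escaped version, using a fragment of the posted event and a hypothetical look-alike hostname:

```python
import re

# The suggested [deleteShopping] REGEX (dots unescaped) and, for comparison,
# the same pattern with the dots escaped.
SUGGESTED = re.compile(r"hostname=.*(amazon|ebay).(de|com|net|cn)")
ESCAPED = re.compile(r"hostname=.*(amazon|ebay)\.(de|com|net|cn)")

# Space-separated fragment of the posted sample event.
posted = "url=ebayimg.ebay.com hostname=ebayimg.ebay.com clientIP=1.1.1"
# Hypothetical hostname with "-" where the pattern expects a dot.
lookalike = "url=x hostname=ebay-de.example.org clientIP=1.1.1"

results = {
    label: (bool(SUGGESTED.search(line)), bool(ESCAPED.search(line)))
    for label, line in [("posted", posted), ("lookalike", lookalike)]
}
print(results)  # {'posted': (True, True), 'lookalike': (True, False)}
```

Both patterns match the posted fragment, but only the unescaped one also matches the look-alike, so keeping `\.` while removing `(\t)` may be the safer combination.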