Getting Data In

Filtering data out with RegEx based on blacklist works only partially

kepffr
Explorer

Hi guys!

I want to filter out data on my forwarder using regular expressions in transforms.conf. Strangely, it only works partially, even though my regex should be fine.

transforms.conf

 

[deleteAdvertisingTracking]
REGEX=(\t)hostname=.*(adnxs|doubleclick|adsafeprotected|pubmatic|xiti|smartadserver|lijit|ads\.yahoo|insurads)\.(com|net)
DEST_KEY = queue
FORMAT = nullQueue

[deleteShopping]
REGEX=(\t)hostname=.*(amazon|ebay)\.(com|net)
DEST_KEY = queue
FORMAT = nullQueue

 

props.conf

 

[STANZA_NAME]
TRANSFORMS-DeleteStuff = deleteAdvertisingTracking,deleteShopping

 

The second stanza, "deleteShopping", works just fine, but the first does not. I observed that it stopped working with 3 or more substrings (e.g. adnxs|doubleclick|adsafeprotected|pubmatic) in the regex. I tried adding "LOOKAHEAD = 65535", but that didn't help.

Of course I restarted the forwarder after the changes. Do you have any idea what's going wrong? I'm using Splunk v8.1.1.


kepffr
Explorer

Hi again. I've tried more things, but nothing worked. I think my assumption that it works with fewer substrings was wrong; I can no longer reproduce that. Things I've tried:

- Set MATCH_LIMIT to a higher value

- Replaced the sourcetype in the stanza specified in props.conf with source::tcp:1422

- Included a named group in the regex, e.g.:

REGEX=(\t)hostname=(.*\.)?(?<site>amazon|ebay)\.(de|com|net)

Data to test:

 

2021-01-27 16:55:42	action=Allowed	event_id=6922469125184356358	protocol=SSL	category=Corporate Marketing	dest=1.1.1.1	http_referrer=None	http_user_agent=XXXXXXX	clientpublicIP=1.1.1	status=NA	user=something.something@something.com	url=ebayimg.ebay.com	hostname=ebayimg.ebay.com	clientIP=1.1.1	threatcategory=None	threatname=None	appname=XXX	pagerisk=0	department=XXXX	supercategory=XXXX	appclass=File Share	urlclass=XXXXX	threatclass=None
	bytes_out=2272	bytes_in=5100
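
For reference, both patterns can be checked against this kind of event outside Splunk. The following is only a minimal sketch in Python: it assumes that Python's `re` module treats these particular patterns the same way Splunk's regex engine does (no named groups are used), and it uses a shortened, hypothetical copy of the sample event with tab-separated fields.

```python
import re

# Shortened, hypothetical copy of the sample event; fields are tab-separated.
event = "\t".join([
    "2021-01-27 16:55:42",
    "action=Allowed",
    "url=ebayimg.ebay.com",
    "hostname=ebayimg.ebay.com",
    "clientIP=1.1.1",
])

# Patterns copied from the two transforms.conf stanzas.
shopping = re.compile(r"(\t)hostname=(.*\.)?(amazon|ebay)\.(de|com|net|cn)")
ads = re.compile(
    r"(\t)hostname=.*(adnxs|doubleclick|adsafeprotected|pubmatic|xiti"
    r"|smartadserver|lijit|ads\.yahoo|insurads)\.(com|net)"
)

shopping_hit = shopping.search(event) is not None
ads_hit = ads.search(event) is not None
print(shopping_hit, ads_hit)  # True False
```

If a pattern matches here but the event still reaches the index, the regex itself is less likely to be the cause than where and how the transform is applied.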

 

props.conf

 

[XXXXXX]
DATETIME_CONFIG = CURRENT
TZ = UTC
[source::tcp:1422]
TRANSFORMS-DeleteStuff = deleteAdvertisingTracking,deleteShopping

 

transforms.conf

 

[deleteAdvertisingTracking]
REGEX=(\t)hostname=.*(adnxs|doubleclick|adsafeprotected|pubmatic|xiti|smartadserver|lijit|ads\.yahoo|insurads)\.(com|net)
DEST_KEY = queue
FORMAT = nullQueue

[deleteShopping]
REGEX=(\t)hostname=(.*\.)?(amazon|ebay)\.(de|com|net|cn)
DEST_KEY = queue
FORMAT = nullQueue

 

 

This is what the command "cmd btool transforms list deleteAds" says:

CAN_OPTIMIZE = True
CLEAN_KEYS = True
DEFAULT_VALUE =
DEPTH_LIMIT = 1000
DEST_KEY = queue
FORMAT = nullQueue
KEEP_EMPTY_VALS = False
LOOKAHEAD = 4096
MATCH_LIMIT = 100000
MV_ADD = False
REGEX = (\t)hostname=(.*\.)?(amazon|ebay)\.(de|com|net|cn)
SOURCE_KEY = _raw
WRITE_META = False


manjunathmeti
Champion

Is the stanza name deleteAdvertisingTracking or deleteAds in transforms.conf?


kepffr
Explorer

Both, I want to execute both stanzas. Is that not possible?


manjunathmeti
Champion

Yes, it is possible. Make sure that the regex of at least one of them matches the logs you want to send to the null queue.


kepffr
Explorer

Hi, thanks for your response. The regex does match the events; that's exactly my problem.


manjunathmeti
Champion

Try the configurations below, without "\".

[deleteAdvertisingTracking]
REGEX = hostname=.*(adnxs|doubleclick|adsafeprotected|pubmatic|xiti|smartadserver|lijit|ads.yahoo|insurads).(com|net)
DEST_KEY = queue
FORMAT = nullQueue

[deleteShopping]
REGEX = hostname=.*(amazon|ebay).(de|com|net|cn)
DEST_KEY = queue
FORMAT = nullQueue
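
One thing to keep in mind when dropping the backslashes: an unescaped "." matches any character, so the pattern becomes broader rather than stricter. A minimal sketch in Python, assuming its `re` module behaves like Splunk's regex engine for these patterns; the second hostname is a hypothetical example, not taken from the logs above.

```python
import re

# Escaped dot (original style) versus unescaped dot (suggested style).
escaped = re.compile(r"hostname=.*(amazon|ebay)\.(de|com|net|cn)")
unescaped = re.compile(r"hostname=.*(amazon|ebay).(de|com|net|cn)")

real = "hostname=www.amazon.de"
lookalike = "hostname=ebay-net.example.org"  # hypothetical host name

# Both variants match the real shopping domain.
both_match_real = bool(escaped.search(real)) and bool(unescaped.search(real))

# Only the unescaped variant matches the lookalike, because "." accepts "-".
escaped_lookalike = escaped.search(lookalike) is not None
unescaped_lookalike = unescaped.search(lookalike) is not None
print(both_match_real, escaped_lookalike, unescaped_lookalike)  # True False True
```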

 
