Getting Data In

Filtering data out with RegEx based on blacklist works only partially

kepffr
Explorer

Hi guys!

I want to filter out data on my forwarder using regular expressions in transforms.conf. Strangely, it only works partially, even though my regex should be fine.

transforms.conf

 

[deleteAdvertisingTracking]
REGEX=(\t)hostname=.*(adnxs|doubleclick|adsafeprotected|pubmatic|xiti|smartadserver|lijit|ads\.yahoo|insurads)\.(com|net)
DEST_KEY = queue
FORMAT = nullQueue

[deleteShopping]
REGEX=(\t)hostname=.*(amazon|ebay)\.(com|net)
DEST_KEY = queue
FORMAT = nullQueue

 

props.conf

 

[STANZA_NAME]
TRANSFORMS-DeleteStuff = deleteAdvertisingTracking,deleteShopping

 

The second stanza, "deleteShopping", works fine, but the first does not. I observed that it stops working once the regex contains three or more alternatives (e.g. adnxs|doubleclick|adsafeprotected|pubmatic). I tried adding "LOOKAHEAD = 65535", but that didn't help.

Of course I restarted the forwarder after the changes. Do you have any idea what's going wrong? I'm using Splunk v8.1.1.

kepffr
Explorer

Hi again. I've tried more things, but nothing worked. I think my assumption that it works with fewer alternatives was wrong; I can no longer reproduce that. Things I've tried:

- Set MATCH_LIMIT to a higher value

- Replaced the sourcetype in the stanza specified in props.conf with source::tcp:1422

- Included a named group in the regex, e.g:

REGEX=(\t)hostname=(.*\.)?(?<site>amazon|ebay)\.(de|com|net)

Data to test:

 

2021-01-27 16:55:42	action=Allowed	event_id=6922469125184356358	protocol=SSL	category=Corporate Marketing	dest=1.1.1.1	http_referrer=None	http_user_agent=XXXXXXX	clientpublicIP=1.1.1	status=NA	user=something.something@something.com	url=ebayimg.ebay.com	hostname=ebayimg.ebay.com	clientIP=1.1.1	threatcategory=None	threatname=None	appname=XXX	pagerisk=0	department=XXXX	supercategory=XXXX	appclass=File Share	urlclass=XXXXX	threatclass=None
	bytes_out=2272	bytes_in=5100

 

props.conf

 

[XXXXXX]
DATETIME_CONFIG = CURRENT
TZ = UTC

[source::tcp:1422]
TRANSFORMS-DeleteStuff = deleteAdvertisingTracking,deleteShopping

 

transforms.conf

 

[deleteAdvertisingTracking]
REGEX=(\t)hostname=.*(adnxs|doubleclick|adsafeprotected|pubmatic|xiti|smartadserver|lijit|ads\.yahoo|insurads)\.(com|net)
DEST_KEY = queue
FORMAT = nullQueue

[deleteShopping]
REGEX=(\t)hostname=(.*\.)?(amazon|ebay)\.(de|com|net|cn)
DEST_KEY = queue
FORMAT = nullQueue
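
As a rough sanity check outside Splunk, the two patterns can be run against the test event with Python's `re` module. This is only a sketch: Splunk uses PCRE, not Python's engine, so agreement here does not prove Splunk behaves the same way. The event below is abbreviated from the test data above, and the advertising hostname `ad.doubleclick.net` is a made-up value used purely for illustration.

```python
import re

# Abbreviated copy of the tab-separated test event from the post
# (only a few fields around hostname are kept).
fields = [
    "2021-01-27 16:55:42",
    "action=Allowed",
    "protocol=SSL",
    "url=ebayimg.ebay.com",
    "hostname=ebayimg.ebay.com",
    "clientIP=1.1.1",
    "threatname=None",
]
event = "\t".join(fields)

# Patterns copied from the two transforms.conf stanzas.
delete_shopping = re.compile(r"(\t)hostname=(.*\.)?(amazon|ebay)\.(de|com|net|cn)")
delete_ads = re.compile(
    r"(\t)hostname=.*(adnxs|doubleclick|adsafeprotected|pubmatic|xiti"
    r"|smartadserver|lijit|ads\.yahoo|insurads)\.(com|net)"
)

shopping_match = delete_shopping.search(event)
ads_match = delete_ads.search(event)

# Hypothetical advertising event (assumption, not taken from the post).
ad_event = event.replace("ebayimg.ebay.com", "ad.doubleclick.net")
ad_event_match = delete_ads.search(ad_event)

print("deleteShopping matches test event:", shopping_match is not None)
print("deleteAdvertisingTracking matches test event:", ads_match is not None)
print("deleteAdvertisingTracking matches hypothetical ad event:", ad_event_match is not None)
```

If both patterns behave as expected here, the regex syntax itself is probably not the issue, and the cause is more likely in how or where the transforms are applied.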

 

 

This is the output of the command "cmd btool transforms list deleteAds":

CAN_OPTIMIZE = True
CLEAN_KEYS = True
DEFAULT_VALUE =
DEPTH_LIMIT = 1000
DEST_KEY = queue
FORMAT = nullQueue
KEEP_EMPTY_VALS = False
LOOKAHEAD = 4096
MATCH_LIMIT = 100000
MV_ADD = False
REGEX = (\t)hostname=(.*\.)?(amazon|ebay)\.(de|com|net|cn)
SOURCE_KEY = _raw
WRITE_META = False
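
LOOKAHEAD sets how many characters into an event the regex is allowed to search. A quick way to rule it out is to check where the hostname field starts in the event. The sketch below uses the redacted fields from the test data above, so the offset in real events (for example with a full user agent string) will be larger; treat it as an estimate only.

```python
# Fields of the test event up to and including hostname, copied from the post.
# The values are redacted there, so real events are likely longer.
fields = [
    "2021-01-27 16:55:42",
    "action=Allowed",
    "event_id=6922469125184356358",
    "protocol=SSL",
    "category=Corporate Marketing",
    "dest=1.1.1.1",
    "http_referrer=None",
    "http_user_agent=XXXXXXX",
    "clientpublicIP=1.1.1",
    "status=NA",
    "user=something.something@something.com",
    "url=ebayimg.ebay.com",
    "hostname=ebayimg.ebay.com",
]
event_prefix = "\t".join(fields)

LOOKAHEAD = 4096  # value reported by btool above

# Position of the tab that precedes the hostname field.
offset = event_prefix.find("\thostname=")

print("offset of hostname field:", offset)
print("within LOOKAHEAD:", 0 <= offset < LOOKAHEAD)
```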

manjunathmeti
SplunkTrust

Is the stanza name deleteAdvertisingTracking or deleteAds in transforms.conf?

kepffr
Explorer

Both, I want to execute both stanzas. Is that not possible?

manjunathmeti
SplunkTrust

Yes, it is possible. Make sure that the regex of at least one of them matches the logs you want to send to the null queue.

kepffr
Explorer

Hi, thanks for your response. The regex does match the events; that's exactly my problem.

manjunathmeti
SplunkTrust

Try the configuration below, without the "\".

[deleteAdvertisingTracking]
REGEX = hostname=.*(adnxs|doubleclick|adsafeprotected|pubmatic|xiti|smartadserver|lijit|ads.yahoo|insurads).(com|net)
DEST_KEY = queue
FORMAT = nullQueue

[deleteShopping]
REGEX = hostname=.*(amazon|ebay).(de|com|net|cn)
DEST_KEY = queue
FORMAT = nullQueue
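
Note that dropping the backslash changes what the pattern means: an unescaped "." matches any character, so the pattern gets looser rather than stricter. The Python sketch below illustrates the difference. It uses Python's `re` instead of Splunk's PCRE, the escaped variant is written here only for comparison, and the hostname `ebay-net.example.org` is invented for the example.

```python
import re

# Variant with escaped dot (for comparison) and the suggested unescaped variant.
escaped = re.compile(r"hostname=.*(amazon|ebay)\.(de|com|net|cn)")
unescaped = re.compile(r"hostname=.*(amazon|ebay).(de|com|net|cn)")

# Hostname from the test event in the thread.
real = "hostname=ebayimg.ebay.com"
# Hypothetical hostname, made up to show the looser match.
lookalike = "hostname=ebay-net.example.org"

for name, pattern in (("escaped", escaped), ("unescaped", unescaped)):
    print(name, "matches real:", pattern.search(real) is not None)
    print(name, "matches lookalike:", pattern.search(lookalike) is not None)
```

Both variants match the real hostname, so removing the escapes should not make a non-matching event match unless the escaping itself was being misread.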

 
