(I am an absolute novice at this. The answer may be obvious, but I am still learning the trade, so please bear with me.)
For this exercise I am trying to index the whole site, e.g. www.lkm93.com, whilst excluding requests for massive files that may push my daily indexing allowance over the limit.
The regex I have figured out so far is:
[waf_exclude]
DEST_KEY = queue
FORMAT = nullQueue
REGEX = .*\(tif|mp3|jpg|js|css|mp4|java|waf|png|gif|svg|jpeg|JPG|JS|JPEG|MID|MIDI|MP3|MP4|MPG|MPEG|PDF|PNG|TIFF|TXT|WAV|ZIP)
(I have repeated some extensions in capital letters to make sure I match the extensions in both cases.)
I believe this should index everything on my site www.lkm93.com, and the regex I have added should exclude the named file extensions. I have reloaded the transforms.conf file, but I don't seem to be pulling in any data beyond what I was already pulling in. Is there anything obvious that I could be missing here?
Hi @lkm93,
first, I assume that you also added the transform to props.conf:
[your_sourcetype]
TRANSFORMS-waf_exclude = waf_exclude
Then, where did you put props.conf and transforms.conf? They must be on the Indexers or (when present) on the Heavy Forwarders.
Then, did you restart Splunk after modifying props.conf and transforms.conf?
Then, you didn't escape the last parenthesis: the correct regex is .*\(tif|mp3|jpg|js|css|mp4|java|waf|png|gif|svg|jpeg|JPG|JS|JPEG|MID|MIDI|MP3|MP4|MPG|MPEG|PDF|PNG|TIFF|TXT|WAV|ZIP\)
Also, check your regex using the regex command:
index=your_index
| regex ".*\(tif|mp3|jpg|js|css|mp4|java|waf|png|gif|svg|jpeg|JPG|JS|JPEG|MID|MIDI|MP3|MP4|MPG|MPEG|PDF|PNG|TIFF|TXT|WAV|ZIP\)"
Finally, I saw that some extensions appear in uppercase but not in lowercase, or the reverse (ZIP, css, etc.).
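The same kind of check can also be run offline before touching the indexers. The following is only a minimal sketch: it uses Python's re module rather than Splunk's own regex engine (the constructs used here behave the same way in both, as far as I know), the sample events are made up, and the extension list is shortened. It compares the pattern with both parentheses escaped against a variant that escapes the dot, keeps the parentheses as a real group, and uses (?i) so that one spelling covers both cases.

```python
import re

# Hypothetical sample events; the paths are made up for illustration.
samples = [
    "GET /images/logo.png HTTP/1.1",
    "GET /downloads/archive.ZIP HTTP/1.1",
    "GET /styles/site.CSS HTTP/1.1",
    "GET /index.html HTTP/1.1",
]

# Both parentheses escaped: they match literal "(" and ")" characters,
# so only the bare alternatives in the middle (such as "png") can hit here.
escaped = re.compile(r".*\(tif|png|css|ZIP\)")

# Escaped dot plus a real group; (?i) makes the match case-insensitive.
grouped = re.compile(r"(?i)\.(tif|png|css|zip)\b")


def matching(pattern, events):
    """Return the events that the pattern matches anywhere in the text."""
    return [event for event in events if pattern.search(event)]


escaped_hits = matching(escaped, samples)
grouped_hits = matching(grouped, samples)

print("escaped parentheses:", escaped_hits)
print("grouped, case-insensitive:", grouped_hits)
```

With these samples the escaped variant only catches the .png request, while the grouped variant catches the .png, .ZIP and .CSS requests and leaves index.html alone.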
Ciao.
Giuseppe
Hello Giuseppe,
thank you for your prompt reply.
I have re-arranged my props.conf file after reading your reply and also re-configured the transforms.conf file.
Here's how my props.conf file looks now:
[waf_log]
pulldown_type = true
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = False
TRANSFORMS-null = waf_include,waf_exclude,waf_include_xapi,waf_drop_x
LEARN_SOURCETYPE = false
TZ = GMT
Transforms.conf looks like this:
[waf_include]
DEST_KEY = queue
FORMAT = indexQueue
REGEX = .*
[waf_exclude]
DEST_KEY = queue
FORMAT = nullQueue
REGEX = .*\.(tif|mp3|jpg|js|css|java|Ico|waf|png|gif|svg|jpeg|avi|mid|midi|mpg|mpeg|mov|qt|png|ram|rar|tiff|txt|wav|zip|TIF|MP3|CSS|JAVA|ICO|WAF|PNG|SVG|AVI|CSS|EXE|GIF|JPG|JS|JPEG|MID|MIDI|MPG|MPEG|MOV|QT|PNG|RAM|RAR|TIFF|TXT|WAV|ZIP).*
[waf_include_xapi]
DEST_KEY = queue
FORMAT = indexQueue
REGEX = blah-blah
[waf_drop_x]
DEST_KEY = queue
FORMAT = nullQueue
REGEX = blahblah
My props.conf and transforms.conf files are on the Splunk manager; I thought that would be the reasonable place to have them.
I also discovered that I could refresh all the .conf files via https://splunk-fqdn/en-US/debug/refresh. Do I definitely need to restart Splunk for the changes I have just made?
And lastly, I have fixed the regex to pick up whole URLs on that domain; it is picking up everything I need in the test I have done. The extensions have also been fixed. I was in a rush to get the question out to the world. Thank you!
What do you think of this now?
Hi @lkm93,
first, you don't need the waf_include stanza, but I usually insert it!
Then, you don't need the * in REGEX = .* ; you can use REGEX = . instead.
Then, you don't need the other include stanzas when you have REGEX = . , because you already keep everything that you didn't discard, so try something like this:
in props.conf
[waf_log]
pulldown_type = true
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = False
TRANSFORMS-null = waf_include,waf_exclude
LEARN_SOURCETYPE = false
TZ = GMT
in transforms.conf
[waf_include]
DEST_KEY = queue
FORMAT = indexQueue
REGEX = .*
[waf_exclude]
DEST_KEY = queue
FORMAT = nullQueue
REGEX = .*\.(tif|mp3|jpg|js|css|java|Ico|waf|png|gif|svg|jpeg|avi|mid|midi|mpg|mpeg|mov|qt|png|ram|rar|tiff|txt|wav|zip|TIF|MP3|CSS|JAVA|ICO|WAF|PNG|SVG|AVI|CSS|EXE|GIF|JPG|JS|JPEG|MID|MIDI|MPG|MPEG|MOV|QT|PNG|RAM|RAR|TIFF|TXT|WAV|ZIP).*
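To see why the order in TRANSFORMS-null matters, here is a rough sketch in Python. It assumes that the transforms are applied in the order listed and that the last matching one decides the queue. The route function and the shortened extension list are hypothetical and are not part of Splunk.

```python
import re

# Simplified stand-ins for the two stanzas above; the extension list is
# shortened for readability.
TRANSFORMS = [
    ("waf_include", re.compile(r"."), "indexQueue"),
    ("waf_exclude", re.compile(r"\.(png|css|js|zip|PNG|CSS|JS|ZIP)"), "nullQueue"),
]


def route(event, transforms=TRANSFORMS, default="indexQueue"):
    """Apply each transform in order; the last matching one sets the queue."""
    queue = default
    for _name, regex, dest in transforms:
        if regex.search(event):
            queue = dest
    return queue


kept = route("GET /index.html HTTP/1.1")
dropped = route("GET /images/logo.PNG HTTP/1.1")

# With the order reversed, the catch-all include runs last and wins.
reversed_result = route("GET /images/logo.PNG HTTP/1.1", list(reversed(TRANSFORMS)))

print(kept, dropped, reversed_result)
```

Under that assumption, putting waf_include first and waf_exclude second keeps everything except the listed extensions, while the reverse order would send every event back to the index queue.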
Ciao.
Giuseppe
Hi @gcusello
Thank you for this. I applied this configuration and it seems to be working as you described! It is no longer picking up the unwanted extensions.