Splunk Search

Need rex help with URL

Path Finder

Hi, I have this rex that I'm trying to use to filter for any URL that points to a file with two or more extensions. So far I have this:

^(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/|hxxp:\/\/|hxxps:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$

Any help is appreciated. Thanks!

Champion

hmm still not sure but i will give this a try

    | makeresults
    | eval url="hxxp://static.zipcloud.com/a/zipcloud//img/footer.break.png.exe"
    | rex field=url ".*\/(?<ext>.*)"
    | eval ext=split(ext,".")
    | eval ext_count=mvcount(ext)

What this does is extract everything after the last /, split it on dots into a multivalue field, and count the values.
In the example above this gives a count of 4: footer, break, png and exe. Note that the count includes the base file name, so it is the number of dots plus one.
So anything with a count of 3 or more has at least 2 dots, something like xx.yyy.zzz.
That's the easy part. Matching against all possible extensions is trickier; you could compare against a list of common extensions in the rex or with a like function. But I will wait to hear from you first on whether this works for you.
For your use, assuming the field is named url, you just need to copy and reuse the code from the rex onwards.
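
For the "compare against some common extensions" idea, a minimal sketch could look like the following. It assumes the field is named url. The field names fname, parts, part_count, last_ext and prev_ext are only illustrative, and both extension lists are placeholders built from the extensions mentioned in this thread, so adapt them to your data.

```spl
| makeresults
| eval url="hxxp://static.zipcloud.com/a/zipcloud//img/footer.break.png.exe"
| rex field=url ".*\/(?<fname>.*)"
| eval parts=split(fname, ".")
| eval part_count=mvcount(parts)
| eval last_ext=lower(mvindex(parts, -1))
| eval prev_ext=lower(mvindex(parts, -2))
| where part_count>=3 AND match(last_ext, "^(exe|scr)$") AND match(prev_ext, "^(png|pdf|docx)$")
```

The idea is to keep only file names where a document-style extension is immediately followed by an executable-style one, which should cut down on false positives from file names that merely contain dots. URLs with a query string after the file name would need extra handling that this sketch does not attempt.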

Path Finder
 | rex field=url ".*\/(?<ext>.*)" 
 | eval ext=split(ext,".")
 | eval ext_count=mvcount(ext)

This works great! So the split tells you how many sections are separated by dots. How do I only display ext_count of 3 or higher? How about 3 exactly?

Thanks!

Path Finder
 | rex field=url ".*\/(?<ext>.*)" 
 | eval ext=split(ext,".")
 | eval ext_count=mvcount(ext)
 | search ext_count>=3
 | dedup ext

Got it! Thanks for all your help!
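
For the "3 exactly" case asked about earlier, a small variation should work. This is a sketch that assumes the same url field and ext_count logic as above, and only swaps the filter for an exact comparison:

```spl
| rex field=url ".*\/(?<ext>.*)"
| eval ext=split(ext,".")
| eval ext_count=mvcount(ext)
| where ext_count==3
```

A count of exactly 3 corresponds to exactly two dots in the file name, for example name.docx.scr.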

Champion

Glad to see you figured it out, @fdevera. Sorry, I am on IST and it was too late at night for me to see your comments.

Motivator

If you are just aiming to get everything after the last slash, this is the regex:

^.*\/([^\/]+)$

https://regex101.com/r/y0D5rr/1
If you'd like to fine-tune it to match only specific extensions, you can do something like this:

^.*\/([^\/]+\.(png|pdf|docx|scr|exe))$

https://regex101.com/r/tv2Th5/1

Cheers,
Jacob

Path Finder

I added this:
|rex url="^.*\/([^\/]+)$"

And received this error:

Error in 'rex' command: The regex 'url=^.*\/([^\/]+)$' does not extract anything. It should specify at least one named group. Format: (?<name>...).

Motivator

Apologies, I was just trying to assist with the regex. If that's the error, here's what you need:

 ^.*\/(?<ThisIsWhatIWantMyFieldNamed>([^\/]+))$
Cheers,
Jacob
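
Put together as a full command, that might look like the sketch below. It assumes the source field is url, and filename is just a placeholder name for the new field. Note the field=url argument: without it, rex treats url=... as part of the regex itself, which is what the error message above shows.

```spl
| rex field=url "^.*\/(?<filename>[^\/]+)$"
| table url filename
```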

Champion

Hi @fdevera,
can you share a sample event and exactly what you want to extract?

Path Finder

index=webproxy |table url

example output:
hxxp://static.zipcloud.com/a/zipcloud//img/footer.break.png

I only want to display events whose url has more than one extension. I know this will be difficult because of the random presence of periods, which will cause a lot of false positives, but that's fine. Any ideas to reduce those would be great too.

Champion

Hi @fdevera,
I'm a bit confused about the 'extensions'. Is it 2 here because footer.break.png contains 2 dots? Or how do you count the extensions for this URL?

Path Finder

I'm looking for direct links to files that have two extensions like .docx.scr or .pdf.exe. What would be the best way to do that in rex? I'm ok with false positives in the results.

Champion

uh ha so the example you gave above
hxxp://static.zipcloud.com/a/zipcloud//img/footer.break.png
qualifies as it has break.png, right?

Path Finder

What am I doing wrong here?

| rex field=url "^.*\/([^\/]+)$" | table urlrisk_gibson src_host src_ip dst_host dst_ip mwg_client_sent sent user_agent url field10 http_message http_method http_response http_version

Motivator

As they said, we need to see your data and what you expect to see in order to help you.

Cheers,
Jacob

Path Finder

Correct - no way around that since extensions can have more than 3 letters, sometimes 5 or 6. And filenames commonly have periods in them. At the very least I'd like to limit my results to those that have only two periods in the file name.

Revered Legend

Agreed. For questions like this, sample data is required: from the regex alone we can tell what your current regex is doing, but not whether it is doing what you want. Please share which events/values you want to include and which you want to exclude, and scrub any sensitive data before posting samples.
