Splunk Search

Rex not extracting expected value

DEAD_BEEF
Builder

Using web logs, I am trying to use rex to extract all text after the last / of the url field into a field called "filename". The catch is that I only want the text if it ends in .zip.

After trying multiple variations of my regex, Splunk keeps returning values that do not match it, even though the regex works in several online testers.

index=bc_logs | rex field=url "(?<filename>[^/]+\.zip)" | stats count by filename | sort -count

    filename                     count
1   sprint.zip                    400
2   message.zip                   31
3   track.zip                     4
4   www.zip                       4
5   Software%20Update             3
6   signaturerq.png               2
7   3po.zip                       1
8   W2n=41#cb=fb4&domain=www.zip  1
9   [455DE-DA3-4A-BCE-69F56D4]    1
10  americaninfidelmiddlefi.jpg   1

Some results end in .zip and some don't... not sure what's going on.

EDIT: added url log samples

url=track.ziprecruiter.com
url=files.getsoftfree.com/get/click/479ymt8s/?uid=6X102VhaCZ&filename=Software%20Update&sid=173652
url=desmond.imageshack.us/Himg62/scaled.php?server=62&filename=americaninfidelmiddlefi.jpg&res=medium   
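A quick way to see what the pattern alone can produce is to run it over the posted url samples. The sketch below is only an approximation: it uses Python's re module in place of Splunk's PCRE engine (the two agree on this simple pattern) and writes the named group as (?P<filename>...), which is Python's spelling of Splunk's (?<filename>...).

```python
import re

# Pattern from the question; Python spells the named group (?P<name>...).
PATTERN = re.compile(r"(?P<filename>[^/]+\.zip)")

# The three url values posted in the EDIT.
samples = [
    "track.ziprecruiter.com",
    "files.getsoftfree.com/get/click/479ymt8s/?uid=6X102VhaCZ&filename=Software%20Update&sid=173652",
    "desmond.imageshack.us/Himg62/scaled.php?server=62&filename=americaninfidelmiddlefi.jpg&res=medium",
]


def rex_like(url):
    """Return the first match anywhere in url (rex's default), or None."""
    match = PATTERN.search(url)
    return match.group("filename") if match else None


for url in samples:
    print(rex_like(url))
```

The first sample yields track.zip because the pattern is unanchored and matches inside the hostname track.ziprecruiter.com. The other two yield nothing, so values like Software%20Update and americaninfidelmiddlefi.jpg in the results cannot have come from this rex.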

Ayn
Legend

From your log samples, it seems likely that Splunk's automatic key-value (auto-kv) extraction is overwriting your own field extraction in events that contain "filename=<something>". Verify this by giving your field a different name and checking whether the results are correct.

EDIT: Or rather, it's the other way around. Splunk's auto-kv runs first and finds some "filename" values. Your rex then writes to the "filename" field only when the regex matches. For events where the regex does NOT match but auto-kv has already extracted something, that value is not overwritten, so you end up with values from both kinds of extraction.
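The overlap described above can be reproduced with a small Python sketch. The auto_kv_from_url function is a hypothetical, much-simplified stand-in for Splunk's automatic key-value extraction (it only reads key=value pairs from the url's query string and does not URL-decode them); apply_rex mimics rex by writing the target field only when the pattern matches.

```python
import re

URLS = [
    "track.ziprecruiter.com",
    "files.getsoftfree.com/get/click/479ymt8s/?uid=6X102VhaCZ&filename=Software%20Update&sid=173652",
    "desmond.imageshack.us/Himg62/scaled.php?server=62&filename=americaninfidelmiddlefi.jpg&res=medium",
]

REX = re.compile(r"([^/]+\.zip)")


def auto_kv_from_url(url):
    """Hypothetical stand-in for auto-kv: each key=value pair in the query
    string becomes a field, left URL-encoded (e.g. Software%20Update)."""
    fields = {"url": url}
    _, _, query = url.partition("?")
    for key, value in re.findall(r"([^&=]+)=([^&]*)", query):
        fields.setdefault(key, value)
    return fields


def apply_rex(fields, target):
    """Like rex on the url field: set `target` only if the pattern matches,
    otherwise leave any existing value (e.g. from auto-kv) untouched."""
    match = REX.search(fields["url"])
    if match:
        fields[target] = match.group(1)
    return fields


# Same field name as auto-kv: non-matching events keep the auto-kv value.
same_name = [apply_rex(auto_kv_from_url(u), "filename").get("filename") for u in URLS]
# Fresh field name (the renaming check suggested above): only rex matches remain.
new_name = [apply_rex(auto_kv_from_url(u), "blabla").get("blabla") for u in URLS]

print(same_name)  # ['track.zip', 'Software%20Update', 'americaninfidelmiddlefi.jpg']
print(new_name)   # ['track.zip', None, None]
```

Renaming the field removes the auto-kv values, but track.zip still comes through because the unanchored pattern matches inside a hostname, which is why a filter such as url=*.zip is still useful.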


DEAD_BEEF
Builder

@gpradeepkumarreddy: This kinda works. It does what I intended, but it also eliminates all logs that already contain 'filename' in the URL. The final solution combined your addition with Ayn's.

@rroberts: I tried it with the anchor; it doesn't help. The documentation says, "The rex command matches the value of the specified field against the unanchored regular expression..."

DEAD_BEEF
Builder

That was it! When I changed 'filename' to 'blabla' and re-ran the search, it worked perfectly. Thank you all!

Final query for any future readers:
index=bc_logs url=*.zip | rex field=url "(?<blabla>[^/]+\.zip)" | stats count by blabla | sort -count
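For readers who want to check the logic outside Splunk, here is a rough Python analogue of that final query. It is only a sketch: the url values are the three samples from the question plus three hypothetical .zip urls, and it assumes url=*.zip behaves like a case-insensitive "ends with .zip" test on the field value.

```python
import re
from collections import Counter

REX = re.compile(r"([^/]+\.zip)")


def final_query(urls):
    """Approximate the final search: url=*.zip, rex into blabla,
    stats count by blabla, sort -count."""
    counts = Counter()
    for url in urls:
        # url=*.zip: keep only urls ending in .zip (assumed case-insensitive).
        if not url.lower().endswith(".zip"):
            continue
        # rex into a fresh field, so no auto-kv "filename" value can leak in.
        match = REX.search(url)
        if match:  # stats count by blabla ignores events without the field
            counts[match.group(1)] += 1
    # sort -count: most frequent first.
    return counts.most_common()


urls = [
    # Samples from the question; none ends in .zip, so all are filtered out.
    "track.ziprecruiter.com",
    "files.getsoftfree.com/get/click/479ymt8s/?uid=6X102VhaCZ&filename=Software%20Update&sid=173652",
    "desmond.imageshack.us/Himg62/scaled.php?server=62&filename=americaninfidelmiddlefi.jpg&res=medium",
    # Hypothetical .zip urls, for illustration only.
    "example.com/a/sprint.zip",
    "example.com/b/sprint.zip",
    "example.net/message.zip",
]

print(final_query(urls))  # [('sprint.zip', 2), ('message.zip', 1)]
```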

rroberts
Splunk Employee

Did you try: index=bc_logs | rex field=url "(?<filename>[^/]+\.zip$)" | stats count by filename | sort -count

Putting a $ after zip to declare "ends with zip"?
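To illustrate the difference the $ makes, here is a Python sketch (re standing in for Splunk's PCRE) comparing the two patterns on the posted track.ziprecruiter.com sample and on a hypothetical url that really ends in a .zip file. The anchor only changes what rex itself extracts; it does not remove values that auto-kv has already put in the filename field.

```python
import re

UNANCHORED = re.compile(r"[^/]+\.zip")
ANCHORED = re.compile(r"[^/]+\.zip$")  # $: .zip must end the value


def first_match(pattern, value):
    """Return the first match anywhere in value, or None."""
    match = pattern.search(value)
    return match.group(0) if match else None


values = [
    "track.ziprecruiter.com",            # sample from the question
    "example.com/downloads/sprint.zip",  # hypothetical url, for illustration only
]

for value in values:
    print(value, first_match(UNANCHORED, value), first_match(ANCHORED, value))
```

The unanchored pattern pulls track.zip out of the hostname, while the anchored one returns None there and still extracts sprint.zip from the hypothetical url.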


pradeepkumarg
Influencer

Can you post some of the values of url?

Another option would be to keep only the events whose url ends in .zip before your extraction, by adding url=*.zip to the search.
