Splunk Search

Rex not extracting expected value

DEAD_BEEF
Builder

Using web logs, I am trying to use rex to extract all text after the last / of the URL field and put it into a field called "filename". The catch is that I only want the text if it ends in .zip.

After trying multiple variations of my regex, Splunk keeps returning values that do not match it (I tested the regex on multiple online testers).

index=bc_logs | rex field=url "(?<filename>[^/]+\.zip)" | stats count by filename | sort -count

    filename                     count
1   sprint.zip                    400
2   message.zip                   31
3   track.zip                     4
4   www.zip                       4
5   Software%20Update             3
6   signaturerq.png               2
7   3po.zip                       1
8   W2n=41#cb=fb4&domain=www.zip  1
9   [455DE-DA3-4A-BCE-69F56D4]    1
10  americaninfidelmiddlefi.jpg   1

Some results end in .zip and some don't... not sure what's going on.

EDIT: added url log samples

url=track.ziprecruiter.com
url=files.getsoftfree.com/get/click/479ymt8s/?uid=6X102VhaCZ&filename=Software%20Update&sid=173652
url=desmond.imageshack.us/Himg62/scaled.php?server=62&filename=americaninfidelmiddlefi.jpg&res=medium   
1 Solution

Ayn
Legend

From your log samples it seems likely that Splunk's auto-kv extraction is overwriting your own field extraction in cases where there's a "filename=<something>" as part of a log event. Verify this by calling your field something else and checking whether the results are correct.

EDIT: Or rather, it's the other way around - Splunk's auto-kv will run first, and find some "filename" values. Then you apply your own field extraction which will only write results to the "filename" field if it finds anything that matches your regex. However, for results where it DOESN'T match, but auto-kv has extracted something, that value will not get overwritten and so you're left with matches from both kinds of extractions.
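
To make the order of operations concrete, here is a minimal Python sketch of the behavior described above. It is not Splunk's implementation: the `auto_kv` and `apply_rex` helpers are hypothetical stand-ins, the key=value parsing is deliberately naive, and the `example.com` URL is made up. Note that Python writes named groups as `(?P<name>...)` where rex uses `(?<name>...)`.

```python
import re

# Naive stand-in for automatic key=value extraction (an assumption for
# illustration, not how Splunk actually parses events).
KV_PATTERN = re.compile(r"([A-Za-z_]\w*)=([^&\s]+)")

# Python spells named groups (?P<name>...); rex uses (?<name>...).
SAME_NAME = re.compile(r"(?P<filename>[^/]+\.zip)")
NEW_NAME = re.compile(r"(?P<zip_name>[^/]+\.zip)")


def auto_kv(raw_event):
    """Collect key=value pairs found in the raw event text."""
    return dict(KV_PATTERN.findall(raw_event))


def apply_rex(fields, source_field, pattern):
    """Write the named groups into fields only when the pattern matches."""
    match = pattern.search(fields.get(source_field, ""))
    if match:
        fields.update(match.groupdict())
    return fields


# Sample event from the question, plus a made-up URL that ends in .zip.
with_query = ("url=files.getsoftfree.com/get/click/479ymt8s/"
              "?uid=6X102VhaCZ&filename=Software%20Update&sid=173652")
with_zip = "url=example.com/downloads/sprint.zip"

# Same field name: the auto-extracted value survives when the regex misses.
print(apply_rex(auto_kv(with_query), "url", SAME_NAME).get("filename"))
# Same field name: a regex match sets the value.
print(apply_rex(auto_kv(with_zip), "url", SAME_NAME).get("filename"))
# Different field name: nothing is left behind when the regex misses.
print(apply_rex(auto_kv(with_query), "url", NEW_NAME).get("zip_name"))
```

With a distinct group name the two extractions no longer share a field, so only genuine regex matches end up in `zip_name`.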


DEAD_BEEF
Builder

@gpradeepkumarreddy: This kinda works. It works as I intended, but eliminates all logs that already contain 'filename' in the URL. The final solution was combining your addition along with Ayn's.

@rroberts: tried with the anchor, doesn't help. Documentation says, "The rex command matches the value of the specified field against the unanchored regular expression..."
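
The anchor cannot remove the stray filename= values, since those come from a different extraction. As a side note, the small Python sketch below suggests an unanchored pattern can still pick up partial host names such as the `track.zip` in `track.ziprecruiter.com`. It assumes Python's `re` treats this simple pattern the same way rex does; `zip_name` is a hypothetical helper and the `example.com` URL is invented.

```python
import re

# Python spells named groups (?P<name>...); rex uses (?<name>...).
UNANCHORED = re.compile(r"(?P<zip_name>[^/]+\.zip)")
ANCHORED = re.compile(r"(?P<zip_name>[^/]+\.zip)$")


def zip_name(url, pattern):
    """Return the captured name, or None when the pattern does not match."""
    match = pattern.search(url)
    return match.group("zip_name") if match else None


sample = "track.ziprecruiter.com"  # from the log samples in the question
print(zip_name(sample, UNANCHORED))  # matches the "track.zip" prefix
print(zip_name(sample, ANCHORED))    # no match: the URL ends in .com

# Made-up URL that really ends in .zip: both patterns agree.
print(zip_name("example.com/files/3po.zip", ANCHORED))
```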

DEAD_BEEF
Builder

That was it! When I changed 'filename' to 'blabla' and re-ran it, it worked perfectly. Thank you all!

Final query for any future readers:
index=bc_logs url=*.zip | rex field=url "(?<blabla>[^/]+\.zip)" | stats count by blabla | sort -count

rroberts
Splunk Employee

Did you try: index=bc_logs | rex field=url "(?<filename>[^/]+\.zip$)" | stats count by filename | sort -count

Putting a $ after zip to declare "ends with zip"?

pradeepkumarg
Influencer

Can you post some of the values of url?

One more way would be to keep only the events where the url ends in ".zip" before your extraction: url=*.zip
