Splunk Search

Rex not extracting expected value

Builder

Using web logs, I am trying to use rex to extract all text after the last / of the url field and put it into a field called "filename". The catch is that I only want the text if it ends in .zip.

After trying multiple variations of my regex, Splunk keeps returning values that do not match it (I tested the regex on multiple online testers).

index=bc_logs | rex field=url "(?<filename>[^/]+\.zip)" | stats count by filename | sort -count

    filename                     count
1   sprint.zip                    400
2   message.zip                   31
3   track.zip                     4
4   www.zip                       4
5   Software%20Update             3
6   signaturerq.png               2
7   3po.zip                       1
8   W2n=41#cb=fb4&domain=www.zip  1
9   [455DE-DA3-4A-BCE-69F56D4]    1
10  americaninfidelmiddlefi.jpg   1

Some results end in .zip and some don't... not sure what's going on.

EDIT: added url log samples

url=track.ziprecruiter.com
url=files.getsoftfree.com/get/click/479ymt8s/?uid=6X102VhaCZ&filename=Software%20Update&sid=173652
url=desmond.imageshack.us/Himg62/scaled.php?server=62&filename=americaninfidelmiddlefi.jpg&res=medium   

Legend

From your log samples it seems likely that Splunk's auto-kv extraction is overwriting your own field extraction in cases where there's a "filename=<something>" as part of a log event. Verify this by calling your field something else and checking whether the results are correct.

EDIT: Or rather, it's the other way around - Splunk's auto-kv will run first, and find some "filename" values. Then you apply your own field extraction which will only write results to the "filename" field if it finds anything that matches your regex. However, for results where it DOESN'T match, but auto-kv has extracted something, that value will not get overwritten and so you're left with matches from both kinds of extractions.
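
To see this in isolation, here is a rough run-anywhere sketch. It assumes rex leaves an existing field untouched when the regex does not match, as described above. The eval on filename only stands in for what auto-kv would have extracted, and zip_name is just a made-up field name for comparison.

```spl
| makeresults
| eval url="files.getsoftfree.com/get/click/479ymt8s/?uid=6X102VhaCZ&filename=Software%20Update&sid=173652"
| eval filename="Software%20Update"
| rex field=url "(?<filename>[^/]+\.zip)"
| rex field=url "(?<zip_name>[^/]+\.zip)"
| table url filename zip_name
```

If the explanation holds, filename should still show the auto-kv style value while zip_name stays empty, because this url contains no .zip match.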


Builder

@gpradeepkumarreddy: This kind of works. It works as I intended, but it eliminates all logs that already contain 'filename' in the URL. The final solution was combining your addition with Ayn's.

@rroberts: tried with the anchor, doesn't help. Documentation says, "The rex command matches the value of the specified field against the unanchored regular expression..."

Builder

That was it! When I changed 'filename' to 'blabla' and re-ran it, it worked perfectly. Thank you all!

Final query for any future readers:
index=bc_logs url=*.zip | rex field=url "(?<blabla>[^/]+\.zip)" | stats count by blabla | sort -count
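
For anyone who wants to tighten this further, the renamed field could be combined with the $ anchor suggested elsewhere in this thread, so that only the last path segment is captured. This is only a sketch, not something tested in this thread; zip_name is a hypothetical field name chosen to avoid colliding with auto-kv's "filename".

```spl
index=bc_logs url=*.zip
| rex field=url "(?<zip_name>[^/]+\.zip)$"
| stats count by zip_name
| sort -count
```

Note that [^/]+ still allows query-string characters, so a value like "W2n=41#cb=fb4&domain=www.zip" would presumably still be captured whole. Excluding ?, & and # in the character class is one option if that matters for your data.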

Splunk Employee

Did you try: index=bc_logs | rex field=url "(?<filename>[^/]+\.zip$)" | stats count by filename | sort -count

Putting a $ after zip to declare "ends with zip"?


Influencer

Can you post some of the values of url?

One more way would be to keep only the results where the url ends in .zip before your extraction, e.g. url=*.zip