Splunk Search

Rex not extracting expected value

DEAD_BEEF
Builder

Using web logs, I am trying to use rex to extract all text after the last / in the url field and put it into a field called "filename". The catch is that I only want the text if it ends in .zip.

After trying multiple variations of my regex, Splunk keeps returning values that do not match it (I tested the regex on multiple online testers).

index=bc_logs | rex field=url "(?<filename>[^/]+\.zip)" | stats count by filename | sort -count

    filename                     count
1   sprint.zip                    400
2   message.zip                   31
3   track.zip                     4
4   www.zip                       4
5   Software%20Update             3
6   signaturerq.png               2
7   3po.zip                       1
8   W2n=41#cb=fb4&domain=www.zip  1
9   [455DE-DA3-4A-BCE-69F56D4]    1
10  americaninfidelmiddlefi.jpg   1

Some results end in .zip and some don't... not sure what's going on.

EDIT: added url log samples

url=track.ziprecruiter.com
url=files.getsoftfree.com/get/click/479ymt8s/?uid=6X102VhaCZ&filename=Software%20Update&sid=173652
url=desmond.imageshack.us/Himg62/scaled.php?server=62&filename=americaninfidelmiddlefi.jpg&res=medium   

Ayn
Legend

From your log samples it seems likely that Splunk's auto-kv extraction is overwriting your own field extraction in cases where there's a "filename=<something>" as part of a log event. Verify this by calling your field something else and check if results are correct.

EDIT: Or rather, it's the other way around - Splunk's auto-kv will run first, and find some "filename" values. Then you apply your own field extraction which will only write results to the "filename" field if it finds anything that matches your regex. However, for results where it DOESN'T match, but auto-kv has extracted something, that value will not get overwritten and so you're left with matches from both kinds of extractions.
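
To make that order of operations concrete, here is a minimal sketch in Python rather than SPL. It is only a simulation: `auto_kv` and `apply_rex` are hypothetical helpers standing in for Splunk's automatic key=value extraction and the rex command, not Splunk APIs, and Python spells the named group `(?P<filename>...)` where rex uses `(?<filename>...)`. The URLs are the three samples from the question.

```python
import re

# Same pattern as in the question; Python writes the named group as (?P<name>...).
ZIP_RE = re.compile(r"(?P<filename>[^/]+\.zip)")

def auto_kv(url):
    """Hypothetical stand-in for auto-kv: collect key=value pairs from the query string."""
    return dict(re.findall(r"[?&](\w+)=([^&\s]*)", url))

def apply_rex(fields, url):
    """Hypothetical stand-in for rex: set 'filename' only when the pattern matches."""
    match = ZIP_RE.search(url)
    if match:
        fields["filename"] = match.group("filename")
    return fields  # no match: whatever auto-kv found is left untouched

urls = [
    "track.ziprecruiter.com",
    "files.getsoftfree.com/get/click/479ymt8s/?uid=6X102VhaCZ&filename=Software%20Update&sid=173652",
    "desmond.imageshack.us/Himg62/scaled.php?server=62&filename=americaninfidelmiddlefi.jpg&res=medium",
]

# Run the simulated auto-kv first, then the simulated rex, as described above.
results = [apply_rex(auto_kv(url), url).get("filename") for url in urls]
for url, filename in zip(urls, results):
    print(filename, "<-", url)
```

Under these assumptions the first URL yields track.zip from the regex, while the other two keep the filename value taken from the query string, which is the same mix of values seen in the stats output.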

DEAD_BEEF
Builder

@gpradeepkumarreddy: This kinda works. It works as I intended, but it eliminates all logs that already contain 'filename' in the URL. The final solution was combining your addition with Ayn's.

@rroberts: tried with the anchor, doesn't help. Documentation says, "The rex command matches the value of the specified field against the unanchored regular expression..."

DEAD_BEEF
Builder

That was it! When I changed 'filename' to 'blabla' and re-ran the search, it worked perfectly. Thank you all!

Final query for any future readers:
index=bc_logs url=*.zip | rex field=url "(?<blabla>[^/]+\.zip)" | stats count by blabla | sort -count

rroberts
Splunk Employee

Did you try: index=bc_logs | rex field=url "(?<filename>[^/]+\.zip$)" | stats count by filename | sort -count

Putting a $ after zip to declare "ends with zip"?
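
As a rough illustration of what the anchor changes, here is a small Python sketch. It uses Python's `re` module rather than Splunk's regex engine, so the named group is written `(?P<filename>...)`, and the `extract` helper and the second URL are hypothetical. The first URL is from the question.

```python
import re

# The pattern from the question, with and without the trailing $ anchor.
UNANCHORED = re.compile(r"(?P<filename>[^/]+\.zip)")
ANCHORED = re.compile(r"(?P<filename>[^/]+\.zip$)")

def extract(pattern, url):
    """Hypothetical helper: return the captured filename, or None if there is no match."""
    match = pattern.search(url)
    return match.group("filename") if match else None

samples = [
    "track.ziprecruiter.com",            # sample from the question
    "example.com/downloads/sprint.zip",  # hypothetical URL that really ends in .zip
]

unanchored = [extract(UNANCHORED, url) for url in samples]
anchored = [extract(ANCHORED, url) for url in samples]
print(unanchored)
print(anchored)
```

In this sketch the unanchored pattern picks track.zip out of the hostname, while the anchored one only matches when the URL actually ends in .zip. The anchor has no effect on values that never came from the regex, such as filename= parameters picked up by automatic extraction.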

pradeepkumarg
Influencer

Can you post some of the values of url?

Another option is to keep only the events whose url ends in .zip before your extraction, by adding url=*.zip to the base search.