Splunk Search

Extract data from URL string

Vfinney
Observer

I am trying to extract the file types, file names, and URLs from proxy logs for monitoring purposes. Here is what I'm looking for. Thanks in advance for any and all assistance.

URL                              Filetype   Filename
http://dmp.truoptik.com          .gif       sync
http://r14---sn-bvvbax jpl.gvt1  .exe       Chrome_updater
http://workforce-ks.com/         .pdf       2019-One-Stop-Advisory-Council-Meeting-Packet

Proxy log examples:

7/30/19 1:29:52.000 PM

Jul 30 13:29:52 10.140.24.233 Jul 30 13:29:52 Access_Logs_Splunk: Info: 1564511389.352 80 10.140.6.27 TCP_MISS/204 793 GET http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945 "DOL\sroth@KDOL_Web_Auth" DIRECT/dmp.truoptik.com - DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",79.30,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

7/30/19 1:29:42.000 PM Jul 30 13:29:42 10.140.24.233 Jul 30 13:29:42 Access_Logs_Splunk: Info: 1564511379.248 324 10.140.10.21 TCP_MISS/206 1587824 GET http://r14---sn-bvvbax jpl.gvt1.com/edgedl/release2/chrome/AOnIEhGH7WaH0jVMgWzb_TU_76.0.3809.87/76.0.3809.87_75.0.3770.142_chrome_updater.exe?cms_redirect=yes&mip=165.201.56.130&mm=28&mn=sn-bvvbax-hjpl&ms=nvh&mt=1564511187&mv=m&mvi=13&nh=EAE&pl=16&shardbypass=yes "DOL\dingels@KDOL_Web_Auth" DIRECT/r14---sn-bvvbax-hjpl.gvt1.com application/octet-stream DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",39205.53,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

7/30/19 1:33:50.000 PM

Jul 30 13:33:50 10.140.24.234 Jul 30 13:33:50 Access_Logs_Splunk: Info: 1564511627.609 1779 10.140.4.14 TCP_MISS/200 3461685 GET http://workforce-ks.com/wp-content/uploads/2015/05/08.01.2019-One-Stop-Advisory-Council-Meeting-Pack... "DOL\nstruckhoff@KDOL_Web_Auth" DIRECT/workforce-ks.com application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",15566.88,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

7/30/19 1:33:11.000 PM

Jul 30 13:33:11 10.140.24.234 Jul 30 13:33:11 Access_Logs_Splunk: Info: 1564511588.080 44 10.140.4.104 TCP_MISS/200 35005 GET http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf "DOL\njanco@KDOL_Web_Auth" DIRECT/ts.intra.dol.ks.gov application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",6364.55,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

1 Solution

jacobpevans
Motivator

Greetings @Vfinney,

Please try the following run-anywhere search.

| makeresults
| eval _raw="7/30/19 1:29:52.000 PM Jul 30 13:29:52 10.140.24.233 Jul 30 13:29:52 Access_Logs_Splunk: Info: 1564511389.352 80 10.140.6.27 TCP_MISS/204 793 GET http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945 \"DOL\sroth@KDOL_Web_Auth\" DIRECT/dmp.truoptik.com - DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",79.30,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -"
| append [ makeresults | eval _raw="7/30/19 1:29:42.000 PM Jul 30 13:29:42 10.140.24.233 Jul 30 13:29:42 Access_Logs_Splunk: Info: 1564511379.248 324 10.140.10.21 TCP_MISS/206 1587824 GET http://r14---sn-bvvbaxjpl.gvt1.com/edgedl/release2/chrome/AOnIEhGH7WaH0jVMgWzb_TU_76.0.3809.87/76.0.... \"DOL\dingels@KDOL_Web_Auth\" DIRECT/r14---sn-bvvbax-hjpl.gvt1.com application/octet-stream DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",39205.53,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| append [ makeresults | eval _raw="7/30/19 1:33:50.000 PM Jul 30 13:33:50 10.140.24.234 Jul 30 13:33:50 Access_Logs_Splunk: Info: 1564511627.609 1779 10.140.4.14 TCP_MISS/200 3461685 GET http://workforce-ks.com/wp-content/uploads/2015/05/08.01.2019-One-Stop-Advisory-Council-Meeting-Pack... \"DOL\nstruckhoff@KDOL_Web_Auth\" DIRECT/workforce-ks.com application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",15566.88,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| append [ makeresults | eval _raw="7/30/19 1:33:11.000 PM  Jul 30 13:33:11 10.140.24.234 Jul 30 13:33:11 Access_Logs_Splunk: Info: 1564511588.080 44 10.140.4.104 TCP_MISS/200 35005 GET http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf \"DOL\njanco@KDOL_Web_Auth\" DIRECT/ts.intra.dol.ks.gov application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",6364.55,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| rex field=_raw     "GET (?<Full_URL>https?://[^\s]+)"
| rex field=Full_URL "(?<URL>https?://[^/]+/)"
| rex field=Full_URL "/(?<Filename>[^/]+)(?<Filetype>\.(gif|exe|pdf))\??"
| table URL Filename Filetype

These are the results:

URL                                    Filename                                              Filetype
http://dmp.truoptik.com/               sync                                                  .gif
http://r14---sn-bvvbaxjpl.gvt1.com/    76.0.3809.87_75.0.3770.142_chrome_updater             .exe
http://workforce-ks.com/               08.01.2019-One-Stop-Advisory-Council-Meeting-Packet   .pdf
http://ts.intra.dol.ks.gov/            EmployeeRecognition                                   .pdf
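
The three rex patterns can also be checked outside Splunk. The following is a minimal Python sketch, not part of the original search: it replays the same patterns on the two sample URLs that appear untruncated in the question. The function name `extract` is made up for illustration, and Python spells named groups `(?P<name>...)` where rex uses `(?<name>...)`.

```python
import re

# Same three patterns as the rex commands, in Python's named-group syntax.
FULL_URL = re.compile(r"GET (?P<Full_URL>https?://[^\s]+)")
URL = re.compile(r"(?P<URL>https?://[^/]+/)")
FILE = re.compile(r"/(?P<Filename>[^/]+)(?P<Filetype>\.(gif|exe|pdf))\??")


def extract(raw):
    """Return (URL, Filename, Filetype) for one log line, or None if no URL follows 'GET '."""
    m = FULL_URL.search(raw)
    if m is None:
        return None
    full_url = m.group("Full_URL")
    host = URL.search(full_url)
    name = FILE.search(full_url)
    return (
        host.group("URL") if host else None,
        name.group("Filename") if name else None,
        name.group("Filetype") if name else None,
    )


# Shortened copies of two events from the question (only the part around the URL).
samples = [
    'TCP_MISS/204 793 GET http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945 "DOL\\sroth@KDOL_Web_Auth"',
    'TCP_MISS/200 35005 GET http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf "DOL\\njanco@KDOL_Web_Auth"',
]
for line in samples:
    print(extract(line))
```

Events whose URL is cut off before the extension (as in the forum's shortened "..." samples) would come back with an empty Filename and Filetype, since the third pattern needs the literal extension to be present.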

Assumptions:
- The URL is always preceded by "GET " and does not contain spaces.
- The filename does not contain spaces or the "/" character.
- The filetype is one of .gif, .exe, or .pdf. To match other extensions, append them to the alternation, for example gif|exe|pdf|zip.
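
If the fixed extension list becomes hard to maintain, one alternative is to take whatever follows the last dot in the final path segment. The sketch below illustrates that idea in Python using only the standard library; it is an assumption about how one might generalize the approach, not something from the original search, and `split_url` is a hypothetical name.

```python
import posixpath
from urllib.parse import urlsplit


def split_url(full_url):
    """Split a URL into (base URL, filename, filetype) without a fixed extension list."""
    parts = urlsplit(full_url)
    base = f"{parts.scheme}://{parts.netloc}/"
    # parts.path excludes the query string and fragment, so "?dm=..." is ignored.
    last_segment = posixpath.basename(parts.path)
    # splitext splits on the last dot, so dotted names keep their earlier dots.
    filename, filetype = posixpath.splitext(last_segment)
    return base, filename, filetype


print(split_url("http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945"))
print(split_url("http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf"))
```

The trade-off is that any dotted last segment yields a "filetype", including ones that are not file extensions, so some filtering of the results is likely still needed.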

Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.
