Extract data from URL string

I am trying to extract the file types, file names, and URLs from proxy logs for monitoring purposes. Here is what I'm looking for. Thanks in advance for any and all assistance.

URL                              Filetype  Filename
http://dmp.truoptik.com          .gif      sync
http://r14---sn-bvvbax jpl.gvt1  .exe      Chrome_updater
http://workforce-ks.com/         .pdf      2019-One-Stop-Advisory-Council-Meeting-Packet

Proxy log examples:
7/30/19 1:29:52.000 PM
Jul 30 13:29:52 10.140.24.233 Jul 30 13:29:52 Access_Logs_Splunk: Info: 1564511389.352 80 10.140.6.27 TCP_MISS/204 793 GET http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945 "DOL\sroth@KDOL_Web_Auth" DIRECT/dmp.truoptik.com - DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",79.30,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

7/30/19 1:29:42.000 PM Jul 30 13:29:42 10.140.24.233 Jul 30 13:29:42 Access_Logs_Splunk: Info: 1564511379.248 324 10.140.10.21 TCP_MISS/206 1587824 GET http://r14---sn-bvvbax jpl.gvt1.com/edgedl/release2/chrome/AOnIEhGH7WaH0jVMgWzb_TU_76.0.3809.87/76.0.3809.87_75.0.3770.142_chrome_updater.exe?cms_redirect=yes&mip=165.201.56.130&mm=28&mn=sn-bvvbax-hjpl&ms=nvh&mt=1564511187&mv=m&mvi=13&nh=EAE&pl=16&shardbypass=yes "DOL\dingels@KDOL_Web_Auth" DIRECT/r14---sn-bvvbax-hjpl.gvt1.com application/octet-stream DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",39205.53,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

7/30/19 1:33:50.000 PM
Jul 30 13:33:50 10.140.24.234 Jul 30 13:33:50 Access_Logs_Splunk: Info: 1564511627.609 1779 10.140.4.14 TCP_MISS/200 3461685 GET http://workforce-ks.com/wp-content/uploads/2015/05/08.01.2019-One-Stop-Advisory-Council-Meeting-Pack... "DOL\nstruckhoff@KDOL_Web_Auth" DIRECT/workforce-ks.com application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",15566.88,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

7/30/19 1:33:11.000 PM
Jul 30 13:33:11 10.140.24.234 Jul 30 13:33:11 Access_Logs_Splunk: Info: 1564511588.080 44 10.140.4.104 TCP_MISS/200 35005 GET http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf "DOL\njanco@KDOL_Web_Auth" DIRECT/ts.intra.dol.ks.gov application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",6364.55,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

Greetings @Vfinney,

Please try the following run-anywhere search.

| makeresults
| eval _raw="7/30/19 1:29:52.000 PM Jul 30 13:29:52 10.140.24.233 Jul 30 13:29:52 Access_Logs_Splunk: Info: 1564511389.352 80 10.140.6.27 TCP_MISS/204 793 GET http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945 \"DOL\sroth@KDOL_Web_Auth\" DIRECT/dmp.truoptik.com - DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",79.30,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -"
| append [ makeresults | eval _raw="7/30/19 1:29:42.000 PM Jul 30 13:29:42 10.140.24.233 Jul 30 13:29:42 Access_Logs_Splunk: Info: 1564511379.248 324 10.140.10.21 TCP_MISS/206 1587824 GET http://r14---sn-bvvbaxjpl.gvt1.com/edgedl/release2/chrome/AOnIEhGH7WaH0jVMgWzb_TU_76.0.3809.87/76.0.... \"DOL\dingels@KDOL_Web_Auth\" DIRECT/r14---sn-bvvbax-hjpl.gvt1.com application/octet-stream DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",39205.53,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| append [ makeresults | eval _raw="7/30/19 1:33:50.000 PM Jul 30 13:33:50 10.140.24.234 Jul 30 13:33:50 Access_Logs_Splunk: Info: 1564511627.609 1779 10.140.4.14 TCP_MISS/200 3461685 GET http://workforce-ks.com/wp-content/uploads/2015/05/08.01.2019-One-Stop-Advisory-Council-Meeting-Pack... \"DOL\nstruckhoff@KDOL_Web_Auth\" DIRECT/workforce-ks.com application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",15566.88,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| append [ makeresults | eval _raw="7/30/19 1:33:11.000 PM  Jul 30 13:33:11 10.140.24.234 Jul 30 13:33:11 Access_Logs_Splunk: Info: 1564511588.080 44 10.140.4.104 TCP_MISS/200 35005 GET http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf \"DOL\njanco@KDOL_Web_Auth\" DIRECT/ts.intra.dol.ks.gov application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",6364.55,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| rex field=_raw     "GET (?<Full_URL>https?://[^\s]+)"
| rex field=Full_URL "(?<URL>https?://[^/]+/)"
| rex field=Full_URL "/(?<Filename>[^/?]+)(?<Filetype>\.(?:gif|exe|pdf))(?:\?|$)"
| table URL Filename Filetype

These are the results:

URL                                    Filename                                              Filetype
http://dmp.truoptik.com/               sync                                                  .gif
http://r14---sn-bvvbaxjpl.gvt1.com/    76.0.3809.87_75.0.3770.142_chrome_updater             .exe
http://workforce-ks.com/               08.01.2019-One-Stop-Advisory-Council-Meeting-Packet   .pdf
http://ts.intra.dol.ks.gov/            EmployeeRecognition                                   .pdf

Assumptions:
- The URL is always preceded by "GET " and does not contain spaces.
- The filename does not contain spaces or the "/" character.
- The file type is .gif, .exe, or .pdf. To match others, append | and the new extension inside the group, for example gif|exe|pdf|zip.
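
If you want to check these assumptions outside Splunk, here is a minimal Python sketch modeled on the three rex calls. It is not part of the search itself. Python's re module writes named groups as (?P<name>...) rather than (?<name>...), and the file pattern here requires the extension to be followed by "?" or the end of the URL. The extract() helper, the shortened sample events, and the example.com URL are hypothetical and exist only for illustration.

```python
import re

# Patterns modeled on the rex calls in the search; Python needs (?P<name>...).
FULL_URL = re.compile(r"GET (?P<Full_URL>https?://\S+)")
BASE_URL = re.compile(r"(?P<URL>https?://[^/]+/)")
# Assumption of this sketch: the extension must be followed by "?" or end of URL.
FILE = re.compile(r"/(?P<Filename>[^/?]+)(?P<Filetype>\.(?:gif|exe|pdf))(?:\?|$)")


def extract(event):
    """Return URL, Filename and Filetype for one raw event (hypothetical helper).

    Returns None when the event contains no "GET <url>" token.
    """
    match = FULL_URL.search(event)
    if match is None:
        return None
    full_url = match.group("Full_URL")
    base = BASE_URL.search(full_url)
    file_part = FILE.search(full_url)
    return {
        "URL": base.group("URL") if base else None,
        "Filename": file_part.group("Filename") if file_part else None,
        "Filetype": file_part.group("Filetype") if file_part else None,
    }


# Shortened copies of two events from the question, plus one made-up URL
# whose only ".pdf" sits inside the query string.
events = [
    r'TCP_MISS/204 793 GET http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945 "DOL\sroth@KDOL_Web_Auth"',
    r'TCP_MISS/200 35005 GET http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf "DOL\njanco@KDOL_Web_Auth"',
    r'TCP_MISS/200 100 GET http://example.com/view?file=report.pdf "-"',
]

for event in events:
    print(extract(event))
```

The third event shows why the "?" boundary matters: the only ".pdf" is in the query string, so no filename is extracted from it.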

Cheers,
Jacob
