Extract fields from URL

vrmandadi — Tue, 29 Sep 2020 15:27:44 GMT

Hello,

I have the URL types below and am trying to extract three fields from them:

LIVE as form
hls as rule
TWAMCPH as mode

URL
Example URL1:
http://linear-scope010.abc.com/LIVE/1002/hls/ae/TWAMCPH/98.m3u8

Example URL2:
http://mmdai-linear-west-03.abc.com/linear-scope010.abc.com/LIVE/1008/hls/ae/Nat_HD/.swn71c39e69-9b76-45a0-a2da-005056b23b1dapple2apple/.rate_2737280/index_v_2737280_6.m3u8?nw=376521&prof=376521:twc_hls_live&mode=live&vdur=600&caid=NGC_LIVE&csid=stva_android_ph_live&vcid=369573a4-4f5b-3aa7-a42b-2eec0477efda&z5=79912&ads=VAST_LIVE&tagset_name=VAST&_fw_lpu=http://linear-scope010.abc.com/LIVE/1008/hl...

I also need help with a regex to extract another field: the number inside the "(id ...)" at the end of each sample event below, as Identity. For the first sample event that would be:
Identity: 33752527

Sample event:

19 Aug 2017 01:09:41 [WARN ] http_srv: DONE 5018465 0.010309 404[Not Found] UNKNOWN-ID 69.134.235.12:12113 GET http://mmdai-linear-west-03.abc.com/linear-scope010.abc.com/LIVE/1008/hls/ae/Nat_HD/.swn71c39e69-9b76-45a0-a2da-005056b23b1dapple2apple/.rate_2737280/index_v_2737280_6.m3u8?nw=376521&prof=376521:twc_hls_live&mode=live&vdur=600&caid=NGC_LIVE&csid=stva_android_ph_live&vcid=369573a4-4f5b-3aa7-a42b-2eec0477efda&z5=79912&ads=VAST_LIVE&tagset_name=VAST&_fw_lpu=http://linear-scope010.abc.com/LIVE/1008/hl... (id 33752527)

Sample event 2:

19 Aug 2017 01:16:22 [WARN ] http_cli: Origin latency exceeded threshold: 0.068990 seconds GET Status: 200[OK] Bytes: 10079 Origin URL: http://linear-scope010.abc.com/LIVE/1008/hls/ae/FX_HD/98.m3u8 refReqId 34040636 reqWait 0 (id 3291791648)

Sample event 3:

19 Aug 2017 01:16:22 [WARN ] http_srv: Total latency exceeded threshold: 0.054962 seconds (internal 0.055000 s) origin 0.000000 seconds MCHit 0 Status: 404 IP: 69.134.235.13:9290 URL: http://mmdai-linear-west-03.abc.com/linear-scope010.abc.com/LIVE/1007/hls/ae/MSNBC_HD/.swn0f1c1094-9a82-4a38-9396-005056b23b1dapple2apple/.rate_730944/index_v_730944_2.m3u8?nw=376521&prof=376521:twc_hls_live&mode=live&vdur=600&caid=MSNBC_LIVE&csid=stva_roku_tv_live&vcid=54550573-abff-36b4-b9aa-78deceeccdc6&z5=76051&ads=VAST_LIVE&tagset_n... (id 34040648)

Re: Extract fields from URL

gcusello — Sat, 19 Aug 2017 06:28:50 GMT

Hi vrmandadi,
the regex for the second field you asked about isn't difficult:

\(id\s(?<identity>\d+)\)

test it at https://regex101.com/r/F1dWey/1
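
For anyone who wants to check the pattern outside Splunk, here is a minimal Python sketch run against the second sample event from the question. Python spells named groups as (?P<name>...) where Splunk's rex uses (?<name>...), and the helper name extract_identity is made up for this illustration.

```python
import re

# Same pattern as above; only the named-group spelling differs in Python.
ID_PATTERN = re.compile(r"\(id\s(?P<identity>\d+)\)")

# Sample event 2 from the question, copied verbatim.
EVENT = (
    "19 Aug 2017 01:16:22 [WARN ] http_cli: Origin latency exceeded threshold: "
    "0.068990 seconds GET Status: 200[OK] Bytes: 10079 Origin URL: "
    "http://linear-scope010.abc.com/LIVE/1008/hls/ae/FX_HD/98.m3u8 "
    "refReqId 34040636 reqWait 0 (id 3291791648)"
)


def extract_identity(event):
    """Return the digits inside "(id N)", or None when the marker is absent."""
    match = ID_PATTERN.search(event)
    return match.group("identity") if match else None


print(extract_identity(EVENT))
```

The refReqId number is not picked up, because the pattern requires the literal "(id " in front of the digits.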

The first one is harder, because you have two different paths with a different number of segments before the form field.
If you're sure that "LIVE" is always followed by a number, you can use this one:

\/(?<form>[^\/]*)\/\d+\/(?<rule>[^\/]*)\/\w+\/(?<mode>[^\/]*)

test it at https://regex101.com/r/HaU7mr/1
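
As a quick check outside Splunk, the sketch below applies the same pattern to both URL shapes from the question. It is only an illustration: Python needs (?P<name>...) for named groups, the slashes do not need escaping there, the second URL is the one from sample event 3 with its query string left off, and extract_fields is a made-up helper name.

```python
import re

# The three-field pattern from above, rewritten with Python's named-group syntax.
FIELDS = re.compile(r"/(?P<form>[^/]*)/\d+/(?P<rule>[^/]*)/\w+/(?P<mode>[^/]*)")

URLS = [
    # Example URL1 from the question (single host).
    "http://linear-scope010.abc.com/LIVE/1002/hls/ae/TWAMCPH/98.m3u8",
    # URL from sample event 3 (extra host in front), query string omitted.
    "http://mmdai-linear-west-03.abc.com/linear-scope010.abc.com/LIVE/1007/hls/ae/"
    "MSNBC_HD/.swn0f1c1094-9a82-4a38-9396-005056b23b1dapple2apple/.rate_730944/"
    "index_v_730944_2.m3u8",
]


def extract_fields(url):
    """Return a dict with form, rule and mode, or None if the pattern does not match."""
    match = FIELDS.search(url)
    return match.groupdict() if match else None


for url in URLS:
    print(extract_fields(url))
```

The pattern works for both shapes because the all-digit segment after the form value is the anchor: host segments are never followed by a purely numeric segment in these samples, so the first match starts at "/LIVE".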

Bye.
Giuseppe

Re: Extract fields from URL

niketn — Sat, 19 Aug 2017 12:07:12 GMT

@vrmandadi, your field extraction is a bit complicated only because you are looking at two different log patterns, http_srv and http_cli. Are both coming from the same log (source/sourcetype), or did your query bring them together? Your use case would be simpler if these were two separate sources or sourcetypes (so please confirm).

If both types of events are in the same source/sourcetype, you can try the following:

 <YourBaseSearch>
| rex field=_raw "http:\/\/(?<URL>[^\s]+)\s"
| rex field=_raw "\(id\s(?<id>\d+)\)"
| eval URL=split(URL,"/")
| eval index=if(match(mvindex(URL,1),"\.com$"),"2,4,6","1,3,5")
| eval index=split(index,",")
| eval firstIndex=tonumber(mvindex(index,0))
| eval secondIndex=tonumber(mvindex(index,1))
| eval thirdIndex=tonumber(mvindex(index,2))
| eval form=mvindex(URL,firstIndex)
| eval rule=mvindex(URL,secondIndex)
| eval mode=mvindex(URL,thirdIndex)
| table id form mode rule
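
The split-and-index idea can also be sketched in Python to see which segment positions are involved. This is an illustration under assumptions, not a translation of the search: the positions (1, 3 and 5, shifted by one when a second host name follows the first) are read off the sample URLs in the question, and split_fields is a made-up helper name.

```python
def split_fields(url):
    """Pick form, rule and mode out of a URL by segment position."""
    # Drop the scheme, then split the rest on "/" (host becomes segment 0).
    segments = url.split("://", 1)[-1].split("/")
    # If a second host name follows the first, every path segment shifts by one.
    offset = 1 if len(segments) > 1 and segments[1].endswith(".com") else 0
    if len(segments) <= offset + 5:
        return None  # URL is too short to contain all three fields
    return {
        "form": segments[offset + 1],
        "rule": segments[offset + 3],
        "mode": segments[offset + 5],
    }


# URLs taken from the question: Example URL1 and the (truncated) sample event 1.
print(split_fields("http://linear-scope010.abc.com/LIVE/1002/hls/ae/TWAMCPH/98.m3u8"))
print(split_fields(
    "http://mmdai-linear-west-03.abc.com/linear-scope010.abc.com/LIVE/1008/hls/ae/"
    "Nat_HD/.swn71c39e69-9b76-45a0-a2da-005056b23b1dapple2apple/.rate_2737280/"
    "index_v_2737280_6.m3u8"
))
```

Compared with a single regex, this approach depends on the host check to decide the offset, so it would need adjusting if the extra host does not end in ".com".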

Re: Extract fields from URL

vrmandadi — Sat, 19 Aug 2017 16:09:55 GMT

Hello niketnilay,

I am trying to extract those fields and create new fields using the IFX, but it's not working, since they don't follow a single pattern and they are from the same sourcetype.

I need help with a regex to extract:
LIVE as form
hls as rule
TWAMCPH as mode

Can you please help with a regex for each field?

Re: Extract fields from URL

vrmandadi — Sat, 19 Aug 2017 16:18:08 GMT

Hello Giuseppe,

I am using IFX to extract a separate field for each of them. Can you please help me with a regex for each field, like the one you gave for the ID?

Thanks for your time

Re: Extract fields from URL

niketn — Sat, 19 Aug 2017 16:23:46 GMT

You can create extractions for URL and id in IFX, since the regular expressions stay the same as in the rex commands.

You can move the eval section into a macro that takes URL as input. However, given the two different types of events for http_srv and http_cli, I was not able to find a single pattern that applies to both.

Re: Extract fields from URL

gcusello — Tue, 29 Sep 2020 15:27:49 GMT

Hi vrmandadi,
you don't need to create a regex for each field; even with IFX, a single regex can extract several fields.

In IFX, when you reach the extraction step, there's a link to show the regex. Click it, then click to edit the regex, and paste in my regex with all the fields.

Otherwise, if you already have the regex, it's easier to create the extraction from the web interface without IFX. Go to [Settings -- Fields -- Field extractions -- New] and enter:

the Destination App,
a name for the extraction (e.g. form_rule_mode),
the sourcetype (this is the most important one!),
and finally the full regex.

Remember that the fields will not be available immediately, only after a few minutes (I don't know why!).
Bye.
Giuseppe

Re: Extract fields from URL

vrmandadi — Sat, 19 Aug 2017 19:59:52 GMT

Thank you, Giuseppe. Thanks a lot.