How to extract the field from _raw logs

aditsss
Motivator

Hi Everyone,

Below are my logs:

2020-10-12 23:52:22,228 INFO [Web Server-25646] o.a.n.w.s.Filter Attempting request for (<drath20><CN=50088.phx.aexp.com, OU=Middleware Utilities, L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghk50088.phx.exp.com:9091/api/flow/process-groups/ef451556-016d-1000-0000-00005025535d (source ip)

2020-10-12 23:52:22,228 INFO [Web Server-25646] o.a.n.w.s.Filter Attempting request for (<drath20><CN=50088.phx.aexp.com, OU=Middleware Utilities, L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghk50088.phx.exp.com:9091/api/flow/process-groups/22b93621-b347-1f81-964a-a87c2019828c (source ip:)

 

I want to extract the highlighted field as Request URL. Can someone guide me on how to extract it?

Thanks in advance.

Dalganjan
Observer
Log received as: 2022-03-30T15:00:24.355Z method = GET url = /app/mvtf9U2tAQ6fesPgoufB?detail=verbose status = 400 response-time = 28.628 ms referrer = - user-agent = rest-client/2.0.2 (linux-gnu x86_64) ruby/2.4.2p198 remote-addr = 52.215.37.123

I need to collect the distinct "url" values.

Please provide a solution, thanks.
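
A minimal sketch for this "key = value" layout, assuming the URL is always the non-whitespace token after "url =". The field name url is taken from the log line itself; the makeresults/eval lines only reproduce the event above for testing and should be replaced by your own base search.

```spl
| makeresults
| eval _raw="2022-03-30T15:00:24.355Z method = GET url = /app/mvtf9U2tAQ6fesPgoufB?detail=verbose status = 400 response-time = 28.628 ms referrer = - user-agent = rest-client/2.0.2 (linux-gnu x86_64) ruby/2.4.2p198 remote-addr = 52.215.37.123"
``` capture everything after "url =" up to the next whitespace ```
| rex "\surl\s*=\s*(?<url>\S+)"
``` one result row per distinct url value ```
| stats count by url
```

If the query string should not make two URLs distinct, capturing [^\s?]+ instead of \S+ would stop at the "?".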


gcusello
SplunkTrust

Hi @aditsss,

You can use a regex, something like this:

| rex "GET\s+(?<Request_URL>[^ ]+)"

that you can test at https://regex101.com/r/VZQjXU/1

Ciao.

Giuseppe


aditsss
Motivator

@gcusello 

| rex "GET\s+(?<Request_URL>[^ ]+)"

 

It's not always GET before the URL. Is there a general way to extract it without relying on GET?

I want it to stop here:

https://abcdefghu50089.phx.aexp.com:9091/api/process-groups/abd637bd-ec30-1447-0000-00002c0daf8d


gcusello
SplunkTrust

Hi @aditsss,

The only way to extract a field is to identify a rule (a regex).

If your logs can also contain POST, or another word, instead of GET, you have to find a more general rule.

Can you confirm that you always have, in order:

  • a closing parenthesis,
  • GET or POST or another word,
  • the URL to extract,
  • an opening parenthesis?

If this is your situation, you could try something like this:

| rex "\)\s+\w+\s+(?<Request_URL>[^ ]+)\s+\("

that you can test at https://regex101.com/r/VZQjXU/2
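
To check this rule against more than one HTTP method, here is a minimal sketch that runs the same pattern over two events posted in this thread (one GET, one POST). The field name sample and the ";" delimiter are illustrative assumptions. Capturing the method into Request_Type is an addition for convenience, not part of the regex above.

```spl
| makeresults
``` two events from this thread, joined with ";" and split into separate results ```
| eval sample=split("2020-10-13 00:19:06,574 INFO [24962] o.a.n.w.s.Filter Attempting request for (<agane22><CN=com, OU=Middleware Utilities,L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source );2020-10-13 00:19:06,574 INFO [24962] o.a.n.w.s.Filter Attempting request for (<agane22><CN=com, OU=Middleware Utilities,L=Phoenix, ST=Arizona, C=US>) POST https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/20d3839e-0175-1000-0000-00007664d52e/snippet-instance (source )", ";")
| mvexpand sample
``` ")" + method word + URL + "(" , in that order ```
| rex field=sample "\)\s+(?<Request_Type>\w+)\s+(?<Request_URL>[^ ]+)\s+\("
| table Request_Type Request_URL
```

Note that the pattern requires whitespace between the URL and the following "(", so an event where "(source)" directly follows the URL would not match.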

Ciao.

Giuseppe


aditsss
Motivator

@bowesmana @gcusello 

 

some more logs:

2020-10-13 00:19:06,574 INFO [24962] o.a.n.w.s.Filter Attempting request for (<agane22><CN=com, OU=Middleware Utilities,L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source )

2020-10-13 00:19:06,574 INFO [24962] o.a.n.w.s.Filter Attempting request for (<agane22><CN=com, OU=Middleware Utilities,L=Phoenix, ST=Arizona, C=US>) POST https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/20d3839e-0175-1000-0000-00007664d52e/snippet-instance (source )

I want to extract the IDs. Can someone provide the exact regex for it?


gcusello
SplunkTrust

Hi @aditsss,

you could use the rex command twice, something like this:

| makeresults 
| eval my_field="2020-10-13 00:19:06,574 INFO [24962] o.a.n.w.s.Filter Attempting request for (<agane22><CN=com, OU=Middleware Utilities,L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source )"
| rex field=my_field "\)\s+\w+\s+(?<Request_URL>[^ ]+)\s+\("
| rex field=Request_URL "(?<Request_URL>.*)\/.*$"
| table Request_URL

Ciao.

Giuseppe

aditsss
Motivator

@gcusello 

I tried the search below, but I don't see any Request_URL:

index=abc sourcetype=xyz source="user.log" process-groups (Request_Type ="*") (ADS_Id ="*")| convert timeformat="%Y-%m-%d" ctime(_time) AS Date| rex field=my_field "\)\s+\w+\s+(?<Request_URL>[^ ]+)\s+\("
| rex field=Request_URL "(?<Request_URL>.*)\/.*$"
| table Request_URL


gcusello
SplunkTrust

Hi @aditsss,

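That was just a test example: my_field exists only in the makeresults search. rex reads _raw by default, so drop field=my_field when you run it against your own events. Try this: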

index=abc sourcetype=xyz source="user.log" process-groups (Request_Type ="*") (ADS_Id ="*")
| rex "\)\s+\w+\s+(?<Request_URL>[^ ]+)\s+\("
| rex field=Request_URL "(?<Request_URL>.*)\/.*$"
| table Request_URL

Ciao.

Giuseppe

aditsss
Motivator

Can someone guide me on this, please?

Can someone provide the exact regex?


inventsekar
Ultra Champion

Hi @aditsss, do you want to extract the field highlighted in red, or the full URL?

POST https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/20d3839e-0175-1000-0000-00007664d52e/snippet-instance (source )

 


aditsss
Motivator

@inventsekar 

I want to extract the Request URL that I have highlighted in blue:

GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source )

POST https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/65883f7c-6c05-1bad-b438-26aa4783bf51/snippet-instance (source )

DELETE https://abcdefgh50090.phx.aexp.com:9091/api/process-groups/e9123704-0d3a-10fd-937d-ec61cdddff63(source)

 


inventsekar
Ultra Champion

Hi @aditsss, @gcusello's rex works perfectly:

| makeresults 
| eval my_field="2020-10-13 00:19:06,574 INFO [24962] o.a.n.w.s.Filter Attempting request for (<agane22><CN=com, OU=Middleware Utilities,L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source )"
| rex field=my_field "\)\s+\w+\s+(?<Request_URL>[^ ]+)\s+\("
| rex field=Request_URL "(?<Request_URL>.*)\/.*$"
| table Request_URL


aditsss
Motivator

@bowesmana I tried this:

rex field=_raw ".*(?<RequestURL>http[^ $]*)"

But it also picks up the part after the ID, like below:

https://abcdefghu50089.phx.aexp.com:9091/api/process-groups/abd637bd-ec30-1447-0000-00002c0daf8d/pro...

It should always take the Request_URL only up to here:

https://abcdefghu50089.phx.aexp.com:9091/api/process-groups/abd637bd-ec30-1447-0000-00002c0daf8d


bowesmana
SplunkTrust

You'll need to keep playing with how to define the rule that matches your URL syntax, e.g. this one will match the example you've given now

https://regex101.com/r/tnieWO/1
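
One possible rule, as a sketch only (not necessarily what the regex101 link above contains): anchor the end of the capture on the ID itself. This assumes the ID always follows "/process-groups/" and has the 8-4-4-4-12 hexadecimal layout seen in the examples. The field name sample and the ";" delimiter are illustrative; the three test strings are the GET, POST and DELETE fragments posted earlier in this thread.

```spl
| makeresults
| eval sample=split("GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source );POST https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/65883f7c-6c05-1bad-b438-26aa4783bf51/snippet-instance (source );DELETE https://abcdefgh50090.phx.aexp.com:9091/api/process-groups/e9123704-0d3a-10fd-937d-ec61cdddff63(source)", ";")
| mvexpand sample
``` lazy match up to /process-groups/, then exactly one 8-4-4-4-12 hex ID ```
| rex field=sample "(?<Request_URL>https?://\S+?/process-groups/[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
| table sample Request_URL
```

Unlike trimming the last path segment, this does not depend on whether anything follows the ID, so it should behave the same for URLs that end at the ID and for the DELETE case where "(source)" follows without a space. Against real events, drop the makeresults, eval and mvexpand lines and run the rex against _raw.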

 


bowesmana
SplunkTrust

You can use this rex statement - this example will run in the search bar

| makeresults
| eval _raw="2020-10-12 23:52:22,228 INFO [Web Server-25646] o.a.n.w.s.Filter Attempting request for (<drath20><CN=50088.phx.aexp.com, OU=Middleware Utilities, L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghk50088.phx.exp.com:9091/api/flow/process-groups/22b93621-b347-1f81-964a-a87c2019828c (source ip:)"
| rex field=_raw ".*(?<RequestURL>http[^ $]*)"

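The character class [^ $] makes the capture stop at the first space (or at a literal $, since $ is not an anchor inside brackets), or at the end of the event if neither follows. You may need to test it against more than the example URLs.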

 

 
