
How to extract the field from _raw logs

aditsss
Motivator

Hi Everyone,

Below are my logs:

2020-10-12 23:52:22,228 INFO [Web Server-25646] o.a.n.w.s.Filter Attempting request for (<drath20><CN=50088.phx.aexp.com, OU=Middleware Utilities, L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghk50088.phx.exp.com:9091/api/flow/process-groups/ef451556-016d-1000-0000-00005025535d (source ip)

2020-10-12 23:52:22,228 INFO [Web Server-25646] o.a.n.w.s.Filter Attempting request for (<drath20><CN=50088.phx.aexp.com, OU=Middleware Utilities, L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghk50088.phx.exp.com:9091/api/flow/process-groups/22b93621-b347-1f81-964a-a87c2019828c (source ip:)

 

I want to extract the highlighted part (the request URL) as a field named Request URL. Can someone guide me on how to extract it?

Thanks in advance.


Dalganjan
Observer
Log received as: 2022-03-30T15:00:24.355Z method = GET url = /app/mvtf9U2tAQ6fesPgoufB?detail=verbose status = 400 response-time = 28.628 ms referrer = - user-agent = rest-client/2.0.2 (linux-gnu x86_64) ruby/2.4.2p198 remote-addr = 52.215.37.123

 

I need to collect the distinct values of "url".

Please provide a solution, thanks.
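A minimal Python sketch of that logic, assuming every event contains url = <value> with the value ending at the next whitespace, as in the single line above. It captures the value after url = and keeps each distinct value once, in first-seen order. In Splunk the equivalent would presumably be a rex such as rex "url\s*=\s*(?<url>\S+)" followed by | stats count by url, but that search has not been tested here.

```python
import re

# Assumption: each event carries "url = <value>", and the value ends at the
# next whitespace, as in the one sample line posted above.
URL_FIELD = re.compile(r"\burl\s*=\s*(?P<url>\S+)")


def distinct_urls(events):
    """Return each url value once, in the order it was first seen."""
    seen = {}
    for event in events:
        m = URL_FIELD.search(event)
        if m:
            seen.setdefault(m.group("url"), None)
    return list(seen)


sample = ("2022-03-30T15:00:24.355Z method = GET url = "
          "/app/mvtf9U2tAQ6fesPgoufB?detail=verbose status = 400 "
          "response-time = 28.628 ms referrer = - "
          "user-agent = rest-client/2.0.2 (linux-gnu x86_64) ruby/2.4.2p198 "
          "remote-addr = 52.215.37.123")

# The same event twice still yields a single distinct url.
print(distinct_urls([sample, sample]))
```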


gcusello
SplunkTrust

Hi @aditsss,

You can use a regex, something like this:

| rex "GET\s+(?<Request_URL>[^ ]+)"

that you can test at https://regex101.com/r/VZQjXU/1

Ciao.

Giuseppe
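The pattern can also be replayed outside Splunk. The sketch below uses Python's re module, assuming the only translation needed is the named-group syntax (rex takes PCRE's (?<name>...), Python writes (?P<name>...)). It returns the URL from the first event in the question and nothing once the method is POST, which is the limitation raised in the next reply.

```python
import re

# rex "GET\s+(?<Request_URL>[^ ]+)" with the named group in Python syntax.
GET_URL = re.compile(r"GET\s+(?P<Request_URL>[^ ]+)")

# First event from the question, split over several literals for readability.
log = ("2020-10-12 23:52:22,228 INFO [Web Server-25646] o.a.n.w.s.Filter "
       "Attempting request for (<drath20><CN=50088.phx.aexp.com, "
       "OU=Middleware Utilities, L=Phoenix, ST=Arizona, C=US>) GET "
       "https://abcdefghk50088.phx.exp.com:9091/api/flow/process-groups/"
       "ef451556-016d-1000-0000-00005025535d (source ip)")

match = GET_URL.search(log)
request_url = match.group("Request_URL") if match else None
print(request_url)

# Same event with POST instead of GET: the literal "GET" is gone, so no match.
post_log = log.replace(" GET ", " POST ")
print(GET_URL.search(post_log))  # None
```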


aditsss
Motivator

@gcusello 

| rex "GET\s+(?<Request_URL>[^ ]+)"

 

It's not always GET before the URL. Is there a general way to extract it without relying on GET?

I want it up to here:

https://abcdefghu50089.phx.aexp.com:9091/api/process-groups/abd637bd-ec30-1447-0000-00002c0daf8d


gcusello
SplunkTrust

Hi @aditsss,

The only way to extract a field is to identify a rule (a regex).

If your logs can also have POST or another word instead of GET, you have to find a rule.

Can you say that you always have, in this order:

  • a closing parenthesis,
  • GET, POST or another word,
  • the URL to extract,
  • an opening parenthesis?

If this is your situation, you could try something like this:

| rex "\)\s+\w+\s+(?<Request_URL>[^ ]+)\s+\("

that you can test at https://regex101.com/r/VZQjXU/2

Ciao.

Giuseppe
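A Python sketch of the method-independent rule, with rex's (?<name>...) group rewritten as Python's (?P<name>...). Assumption: the GET, POST and DELETE fragments posted later in this thread are preceded by the same "... C=US>) " header as the full events, so that header is reused here as a prefix. Under that assumption GET and POST match, while the DELETE fragment, which has no space between the ID and "(source)", does not, because the pattern requires whitespace before "(".

```python
import re

# ")" + spaces + one word (the HTTP method) + spaces + URL + spaces + "(".
METHOD_URL = re.compile(r"\)\s+\w+\s+(?P<Request_URL>[^ ]+)\s+\(")

# Assumption: every fragment is preceded by the same header as the full events.
PREFIX = ("2020-10-13 00:19:06,574 INFO [24962] o.a.n.w.s.Filter Attempting "
          "request for (<agane22><CN=com, OU=Middleware Utilities,L=Phoenix, "
          "ST=Arizona, C=US>) ")

events = {
    "GET": PREFIX + "GET https://abcdefghj50089.phx.xp.com:9091/api/flow/"
                    "process-groups/00cbae21-0174-1000-ffff-ffffeaa916f7/"
                    "controller-services (source )",
    "POST": PREFIX + "POST https://abcdefghj50089.phx.xp.com:9091/api/flow/"
                     "process-groups/65883f7c-6c05-1bad-b438-26aa4783bf51/"
                     "snippet-instance (source )",
    # As posted: no space between the ID and "(source)".
    "DELETE": PREFIX + "DELETE https://abcdefgh50090.phx.aexp.com:9091/api/"
                       "process-groups/e9123704-0d3a-10fd-937d-ec61cdddff63(source)",
}

results = {}
for method, event in events.items():
    m = METHOD_URL.search(event)
    results[method] = m.group("Request_URL") if m else None
    print(method, results[method])
```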


aditsss
Motivator

@bowesmana @gcusello 

 

Some more logs:

2020-10-13 00:19:06,574 INFO [24962] o.a.n.w.s.Filter Attempting request for (<agane22><CN=com, OU=Middleware Utilities,L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source )

2020-10-13 00:19:06,574 INFO [24962] o.a.n.w.s.Filter Attempting request for (<agane22><CN=com, OU=Middleware Utilities,L=Phoenix, ST=Arizona, C=US>) POST https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/20d3839e-0175-1000-0000-00007664d52e/snippet-instance (source )

I want to extract the IDs. Can someone provide the exact regex for it?
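One possible rule, sketched in Python: an ID is the UUID-shaped token (8-4-4-4-12 hex digits) that follows process-groups/. That holds for every sample URL in this thread but is an assumption about the rest of the logs, and the group name Process_Group_Id is hypothetical. A rex form would presumably be rex "process-groups/(?<Process_Group_Id>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"; only the Python version below has been run. For brevity, only the method and URL part of each event is kept.

```python
import re

# Assumption: the ID is the 8-4-4-4-12 hex token right after "process-groups/".
# Process_Group_Id is a hypothetical field name.
PG_ID = re.compile(
    r"process-groups/(?P<Process_Group_Id>"
    r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)

# Method-and-URL tails of the two events above.
events = [
    "GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/"
    "00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source )",
    "POST https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/"
    "20d3839e-0175-1000-0000-00007664d52e/snippet-instance (source )",
]

ids = []
for event in events:
    m = PG_ID.search(event)
    if m:
        ids.append(m.group("Process_Group_Id"))
print(ids)
```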


gcusello
SplunkTrust

Hi @aditsss,

You could use the rex command twice, something like this:

| makeresults 
| eval my_field="2020-10-13 00:19:06,574 INFO [24962] o.a.n.w.s.Filter Attempting request for (<agane22><CN=com, OU=Middleware Utilities,L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source )"
| rex field=my_field "\)\s+\w+\s+(?<Request_URL>[^ ]+)\s+\("
| rex field=Request_URL "(?<Request_URL>.*)\/.*$"
| table Request_URL

Ciao.

Giuseppe
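Replaying the two rex steps in Python (named groups written as (?P<name>...)) makes the second step visible: its greedy .* keeps everything before the last "/". On the controller-services event that removes the trailing segment, but on an event whose URL ends at the ID, such as the first one in the question, it removes the ID itself. This is a sketch for checking the behaviour, not a replacement search.

```python
import re

# First rex: the URL between ") <METHOD> " and " (".
STEP1 = re.compile(r"\)\s+\w+\s+(?P<Request_URL>[^ ]+)\s+\(")
# Second rex (field=Request_URL): greedy .* keeps everything before the LAST "/".
STEP2 = re.compile(r"(?P<Request_URL>.*)\/.*$")


def two_step(event):
    m = STEP1.search(event)
    if m is None:
        return None
    url = m.group("Request_URL")
    # Every captured http(s) URL contains "/", so STEP2 always matches here.
    return STEP2.search(url).group("Request_URL")


# Event from the makeresults example: a segment follows the ID.
with_suffix = ("2020-10-13 00:19:06,574 INFO [24962] o.a.n.w.s.Filter Attempting request "
               "for (<agane22><CN=com, OU=Middleware Utilities,L=Phoenix, ST=Arizona, "
               "C=US>) GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/"
               "00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source )")

# First event from the question: the URL ends at the ID.
ends_at_id = ("2020-10-12 23:52:22,228 INFO [Web Server-25646] o.a.n.w.s.Filter Attempting "
              "request for (<drath20><CN=50088.phx.aexp.com, OU=Middleware Utilities, "
              "L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghk50088.phx.exp.com:9091/"
              "api/flow/process-groups/ef451556-016d-1000-0000-00005025535d (source ip)")

print(two_step(with_suffix))  # trailing segment removed, ID kept
print(two_step(ends_at_id))   # the ID itself is removed
```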

aditsss
Motivator

@gcusello 

I tried the search below, but I don't see any Request_URL values:

index=abc sourcetype=xyz source="user.log" process-groups (Request_Type ="*") (ADS_Id ="*")
| convert timeformat="%Y-%m-%d" ctime(_time) AS Date
| rex field=my_field "\)\s+\w+\s+(?<Request_URL>[^ ]+)\s+\("
| rex field=Request_URL "(?<Request_URL>.*)\/.*$"
| table Request_URL


gcusello
SplunkTrust

Hi @aditsss,

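That was only an example: my_field exists only in the makeresults test above. On your own events, leave out field=my_field so that rex runs on _raw (its default), like this: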

index=abc sourcetype=xyz source="user.log" process-groups (Request_Type ="*") (ADS_Id ="*")
| rex "\)\s+\w+\s+(?<Request_URL>[^ ]+)\s+\("
| rex field=Request_URL "(?<Request_URL>.*)\/.*$"
| table Request_URL

Ciao.

Giuseppe

aditsss
Motivator

Can someone guide me on this, please?

Can someone provide the exact regex?


inventsekar
SplunkTrust

Hi @aditsss, do you want to extract the red-colored part or the full URL?


POST https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/20d3839e-0175-1000-0000-00007664d52e/snippet-instance (source )

 


aditsss
Motivator

@inventsekar 

I want to extract the Request URL that I have highlighted in blue.

GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source )

POST https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/65883f7c-6c05-1bad-b438-26aa4783bf51/snippet-instance (source )

DELETE https://abcdefgh50090.phx.aexp.com:9091/api/process-groups/e9123704-0d3a-10fd-937d-ec61cdddff63(source)

 


inventsekar
SplunkTrust

Hi @aditsss, @gcusello's rex works perfectly:

| makeresults 
| eval my_field="2020-10-13 00:19:06,574 INFO [24962] o.a.n.w.s.Filter Attempting request for (<agane22><CN=com, OU=Middleware Utilities,L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source )"
| rex field=my_field "\)\s+\w+\s+(?<Request_URL>[^ ]+)\s+\("
| rex field=Request_URL "(?<Request_URL>.*)\/.*$"
| table Request_URL


aditsss
Motivator

@bowesmana I tried this:

rex field=_raw ".*(?<RequestURL>http[^ $]*)"

But it also picks up what comes after the ID, like below:

https://abcdefghu50089.phx.aexp.com:9091/api/process-groups/abd637bd-ec30-1447-0000-00002c0daf8d/pro...

It should always take the Request_URL only up to here:

https://abcdefghu50089.phx.aexp.com:9091/api/process-groups/abd637bd-ec30-1447-0000-00002c0daf8d
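One way to stop exactly after the ID, sketched in Python, is to describe the ID itself instead of what follows it. Assumptions: the wanted URL always contains /process-groups/ followed by a UUID-shaped ID (8-4-4-4-12 hex digits), and the URL never contains whitespace or "(". The pattern is hypothetical; its rex form would presumably be rex "(?<Request_URL>https?://[^\s(]*?/process-groups/[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})", which has not been tested in Splunk.

```python
import re

# Hypothetical pattern: http(s)://, then the shortest run of characters that are
# neither whitespace nor "(" up to "/process-groups/", then exactly one
# UUID-shaped ID. Nothing after the ID is captured.
URL_TO_ID = re.compile(
    r"(?P<Request_URL>https?://[^\s(]*?/process-groups/"
    r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)

samples = [
    # As quoted above (the forum truncated the tail to "pro...").
    "https://abcdefghu50089.phx.aexp.com:9091/api/process-groups/"
    "abd637bd-ec30-1447-0000-00002c0daf8d/pro...",
    "GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/"
    "00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source )",
    # No space before "(source)" in this fragment.
    "DELETE https://abcdefgh50090.phx.aexp.com:9091/api/process-groups/"
    "e9123704-0d3a-10fd-937d-ec61cdddff63(source)",
]

results = []
for s in samples:
    m = URL_TO_ID.search(s)
    results.append(m.group("Request_URL") if m else None)
    print(results[-1])
```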


bowesmana
SplunkTrust

You'll need to keep refining the rule that matches your URL syntax; for example, this one matches the example you've given now:

https://regex101.com/r/tnieWO/1

 


bowesmana
SplunkTrust

You can use this rex statement; this example will run in the search bar:

| makeresults
| eval _raw="2020-10-12 23:52:22,228 INFO [Web Server-25646] o.a.n.w.s.Filter Attempting request for (<drath20><CN=50088.phx.aexp.com, OU=Middleware Utilities, L=Phoenix, ST=Arizona, C=US>) GET https://abcdefghk50088.phx.exp.com:9091/api/flow/process-groups/22b93621-b347-1f81-964a-a87c2019828c (source ip:)"
| rex field=_raw ".*(?<RequestURL>http[^ $]*)"

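The capture ends at the first space or at the end of the event (inside [^ $] the "$" is a literal dollar sign, not an end-of-line anchor, so a "$" character would also end it), but you may need to test it against more than the example URLs.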
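A quick Python replay of that pattern, with the named group written as (?P<RequestURL>...), shows two properties. The capture runs to the first space, so it returns the whole URL including any trailing segment such as /controller-services, which is what the earlier reply ran into. And because "$" inside [...] is a literal character, a dollar sign also ends it. Only the method, URL and trailing text of each event are used, and the "$" case is a contrived string.

```python
import re

# rex field=_raw ".*(?<RequestURL>http[^ $]*)" in Python syntax.
# The leading greedy .* makes the capture start at the LAST "http" in the event;
# [^ $]* then runs until a space or a literal "$" ("$" is not an anchor in [...]).
HTTP_TAIL = re.compile(r".*(?P<RequestURL>http[^ $]*)")


def request_url(event):
    m = HTTP_TAIL.search(event)
    return m.group("RequestURL") if m else None


ends_at_id = ("GET https://abcdefghk50088.phx.exp.com:9091/api/flow/process-groups/"
              "22b93621-b347-1f81-964a-a87c2019828c (source ip:)")
with_suffix = ("GET https://abcdefghj50089.phx.xp.com:9091/api/flow/process-groups/"
               "00cbae21-0174-1000-ffff-ffffeaa916f7/controller-services (source )")
with_dollar = "GET http://example.invalid/a$b (x)"  # contrived input

for event in (ends_at_id, with_suffix, with_dollar):
    print(request_url(event))
```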

 

 
