Splunk Search

How to truncate a field value after a regex pattern?

mbolostk
Explorer

How can I truncate a field value after a given pattern? For example, if I am looking at web page logs, how can I truncate everything after .html so that no parameters or variables are reported in my web page count?

1 Solution

mbolostk
Explorer

I figured it out, never mind. It was based on the first post, but I had to redo the regex because, in this case, the URI didn't start with http://

cpetterborg
SplunkTrust

If you already have the field extracted, you can use eval or rex to create a new field containing the first part of the URL, with something like this (using eval):

eval mainpart=replace(origurl,"(.*)[?].*","\1")

Here origurl is the already-extracted URL field, and [?] matches the literal ? that separates the parameters from the rest of the URL. That lets you handle URLs ending in more than just .html (like jpeg, js, css, etc.). The rex version would be like the example already given by aljohnson_splunk. If your logs don't include the http:// (as is the case in many Apache log files), then your rex would need to find the URL differently from his example.
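
To try this without real data, here is a minimal sketch that builds one event with makeresults. The sample URI is made up, and url_path is just an illustrative field name. The rex line is one possible way to handle URIs that do not start with http://: it anchors at the start of the field and captures everything up to the first ?.

```
| makeresults
| eval origurl="/products/index.html?id=42&ref=home"
| eval mainpart=replace(origurl,"(.*)[?].*","\1")
| rex field=origurl "^(?<url_path>[^?]+)"
| table origurl mainpart url_path
```

With this sample value, both mainpart and url_path should come out as /products/index.html. The two differ only when a URI contains more than one ?: the greedy (.*) in the replace cuts at the last ?, while [^?]+ stops at the first. If a value has no ? at all, the replace pattern does not match and the value is returned unchanged.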

aljohnson_splun
Splunk Employee

Things that will help us help you:

  1. Post sample data
  2. Post sample search
  3. Post desired output

It sorta sounds like you want to use the rex command.
E.g.

| rex field=url_field "http://(?<url_path>.+html)"
| stats count by url_path

jrodman
Splunk Employee

For this particular goal, I would usually make the .+ be ungreedy with .+? so that the match stops at the first html rather than the last.

e.g.

| rex field=url_field "http://(?<url_path>.+?html)"
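
To see the difference between the two forms, here is a small sketch that runs both against the same value. The URL is a made-up example with html appearing twice, and greedy_path and lazy_path are illustrative field names, not anything from the original search.

```
| makeresults
| eval url_field="http://example.com/a.html?next=/b.html"
| rex field=url_field "http://(?<greedy_path>.+html)"
| rex field=url_field "http://(?<lazy_path>.+?html)"
| table url_field greedy_path lazy_path
```

For this value, greedy_path should be example.com/a.html?next=/b.html, because .+ runs to the last html in the string, while lazy_path should be example.com/a.html, because .+? stops at the first one. Only the lazy form drops the parameters.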