Inconsistent number of spaces in a space-delimited event.

rturk
Builder

Oh hai.

So I have some logs from a web cache. Here's an example (note the spaces between 'TimeStamp' & 'Operation' in the header):

TimeStamp        Operation Priority URL
1281654000.385657  refreshed 0.7850 http://xxx.xxx.xxx/drm4/OnlineMovies/DWS/28/836/summersam_270x390.jpg.http
#Number of transaction records: 1

My props.conf for this sourcetype is:

[cache_content]
pulldown_type=true
KV_MODE=none
SHOULD_LINEMERGE=false
TZ=Australia/Melbourne
TRANSFORMS-toNull=cache_content_header,cache_content_comment
REPORT-cacheContentFields=cache_content_fields

My transforms.conf is as follows:

[cache_content_header]
REGEX = ^T
DEST_KEY = queue
FORMAT = nullQueue

[cache_content_comment]
REGEX = ^#
DEST_KEY = queue
FORMAT = nullQueue

[cache_content_fields]
DELIMS = " "
FIELDS = "TimeStamp", "Operation", "Priority", "URL"


Given this example, the fields are extracted as follows (visible in the field picker):

TimeStamp = 1281654000.385657 (yep that's fine)
Operation... Doesn't show up in list of available fields (default behavior with zero value?)
Priority = refreshed
URL = 0.7850

I've checked for special characters (e.g. tabs) in vi and there are none, so it looks as though the number of spaces is the issue: the extra space leaves nothing allocated to the Operation field, which throws off every subsequent extraction.
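
The shift can be reproduced outside Splunk. The sketch below is only an analogy: it uses Python's str.split(" ") to stand in for a single-space delimiter, on the assumption that two consecutive delimiters produce an empty value, which is consistent with the extraction observed above. The names sample, FIELDS, tokens and extracted are illustrative and are not part of any Splunk configuration.

```python
# Sample event from the post: note the two spaces after the timestamp.
sample = (
    "1281654000.385657  refreshed 0.7850 "
    "http://xxx.xxx.xxx/drm4/OnlineMovies/DWS/28/836/summersam_270x390.jpg.http"
)
FIELDS = ["TimeStamp", "Operation", "Priority", "URL"]

# Splitting on exactly one space turns the double space into an empty token.
tokens = sample.split(" ")

# zip() stops after the four field names, so the real URL is never assigned.
extracted = dict(zip(FIELDS, tokens))

for name, value in extracted.items():
    print(f"{name} = {value!r}")
```

Operation receives the empty token, and every later value lands one field to the left of where it belongs.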

The transforms.conf docs don't cover using a regex in the DELIMS setting, so I'm wondering what I can do here.

As always, any help would be greatly appreciated 🙂


Hajime
Path Finder

Hello.

I think it doesn't work because there are multiple spaces between the "TimeStamp" and "Operation" values.

So, try this one.

In transforms.conf (replacing the existing [cache_content_fields] stanza):

[cache_content_fields]
REGEX = ^([^\s]+)\s+([^\s]+)\s([^\s]+)\s(.*)$
FORMAT = TimeStamp::"$1" Operation::"$2" Priority::"$3" URL::"$4"
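
As a quick sanity check, the pattern can be tried against the sample event before touching the configuration. This is a minimal sketch that assumes Python's re module behaves like Splunk's regex engine for this pattern. PATTERN is copied from the stanza above, while TOLERANT and the padded line are hypothetical additions for illustration.

```python
import re

# Pattern copied from the suggested stanza: only the first gap allows several spaces.
PATTERN = re.compile(r"^([^\s]+)\s+([^\s]+)\s([^\s]+)\s(.*)$")
# Hypothetical variant that allows one or more whitespace characters in every gap.
TOLERANT = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+(.*)$")

NAMES = ("TimeStamp", "Operation", "Priority", "URL")

sample = (
    "1281654000.385657  refreshed 0.7850 "
    "http://xxx.xxx.xxx/drm4/OnlineMovies/DWS/28/836/summersam_270x390.jpg.http"
)
# Made-up line with extra padding between the later columns as well.
padded = "1281654000.385657  refreshed   0.7850  http://example.invalid/a.jpg"

match = PATTERN.match(sample)
fields = dict(zip(NAMES, match.groups())) if match else {}
print(fields)

# The original pattern rejects the padded line; the tolerant one accepts it.
print(PATTERN.match(padded))
print(TOLERANT.match(padded).groups())
```

If other columns in the real logs can also be padded with extra spaces, using \s+ in every gap, as in the hypothetical TOLERANT pattern, may be the safer choice.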

rturk
Builder

Your REGEX-fu is stronger than my REGEX-fu! Thanks so much 🙂
