Why is the KV pair extraction with custom delimiters not working?

Communicator

Hello all,

I'm trying to get extraction to work on a dynamic key value log.

I've tried the following without success (I'm open to other approaches as well).

Ideally the output should be:

Thread=5=/blah/blah
Method=GET
URI=/
Protocol=HTTP/1.1
IP=1.2.3.4
Port=54809
Referer=https://referrer
field=value
.
.
.
field=value

props.conf

[testsourcetype_log]
CHARSET=UTF-8
KV_MODE=none
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=false
category=Testing
description=Test KV log sourcetype
disabled=false
pulldown_type=true
REPORT-kv=kv_extraction
EXTRACT-status=^(\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2})\s\[(?<status>\w+)

transforms.conf

[kv_extraction]
DELIMS = "]", ":"
MV_ADD=true   

log snip:

2019-03-01T09:42:01 [status] [Thread: 5=/blah/blah] [Method: GET] [URI: /blah/blah]  [Protocol: HTTP/1.1] [IP: 1.2.3.4] [Port: 54809] [Referer: https://referrer] [..] ... [..] text string here
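
To make the target concrete, here is a minimal Python sketch that turns the sample event into the key=value list described above. It runs outside Splunk and is only an illustration: the pattern and the helper name extract_pairs are assumptions, not part of the original configuration.

```python
import re

# Sample event copied from the log snippet above (elisions kept as-is).
EVENT = (
    "2019-03-01T09:42:01 [status] [Thread: 5=/blah/blah] [Method: GET] "
    "[URI: /blah/blah]  [Protocol: HTTP/1.1] [IP: 1.2.3.4] [Port: 54809] "
    "[Referer: https://referrer] [..] ... [..] text string here"
)

# Assumed pattern: "[", a key without ":" or "]", ":", whitespace,
# a value without "]", then the closing "]".
PAIR = re.compile(r"\[([^:\]]+):\s+([^\]]+)\]")


def extract_pairs(event):
    """Return (key, value) tuples for every [Key: value] block in the event."""
    return PAIR.findall(event)


pairs = extract_pairs(EVENT)
for key, value in pairs:
    print(f"{key}={value}")
```

Blocks without a colon, such as [status] and [..], are skipped because the key part must be followed directly by ":".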

References:
https://www.splunk.com/blog/2008/02/12/delimiter-based-key-value-pair-extraction.html
https://answers.splunk.com/answers/170826/set-delimiter.html

Thanks in advance

UPDATE 6/25:
I've tried combinations from @FrankVl, @VatsalJagani, @woodcock but it seems none of them work.

Naturally, I've restarted Splunk after each change. Here is the btool output, to show that I'm not going insane:

/opt/splunk/bin/splunk cmd btool props list
[testsourcetype_log]
ADD_EXTRA_TIME_FIELDS = True
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
DEPTH_LIMIT = 1000
HEADER_MODE =
KV_MODE = none
LEARN_MODEL = true
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MATCH_LIMIT = 100000
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 128
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
NO_BINARY_CHECK = true
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = false
TRANSFORMS =
TRANSFORMS-kv = kv_extraction
TRUNCATE = 10000
category = Testing
description = Test KV log sourcetype
detect_trailing_nulls = false
disabled = false
maxDist = 100
priority =
pulldown_type = true
sourcetype =

/opt/splunk/bin/splunk cmd btool transforms list
[kv_extraction]
CAN_OPTIMIZE = True
CLEAN_KEYS = True
DEFAULT_VALUE =
DEPTH_LIMIT = 1000
DEST_KEY =
FORMAT = $1::$2
KEEP_EMPTY_VALS = False
LOOKAHEAD = 4096
MATCH_LIMIT = 100000
MV_ADD = true
REGEX = \[([^:[]+):\s+([^\]]+)]
SOURCE_KEY = _raw
WRITE_META = False

Updated props.conf

[testsourcetype_log]
CHARSET=UTF-8
KV_MODE=none
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=false
category=Testing
description=Test KV log sourcetype
disabled=false
pulldown_type=true
TRANSFORMS-kv=kv_extraction

updated transforms.conf

[kv_extraction]
REGEX = \[([^:[]+):\s+([^\]]+)]
FORMAT = $1::$2
MV_ADD=true

UPDATE 6/27:

Using a clean splunk docker image, I:

  • recreated indexers, inputs, props, transforms on the docker instance (external volume)
  • stripped those files to a bare minimum
  • renamed the sourcetype (to be sure that Splunk is reading props/transforms)
  • moved the configs from being inside an app to system/local/*.conf
  • checked the knowledge object existence via the gui (new/renamed transform is listed)
  • checked the knowledge object permissions (global)
  • and restarted after each change

Nada: the log is being ingested, but no new fields are created (except from the Thread value, which produces a field named 5 with the value /blah/blah/).

current config:

$ cat system/local/inputs.conf
[default]
host = 7278c011e1e0

[monitor:///opt/splunk/var/log/testlogs/*.log]
disabled=false
sourcetype=blahblah
index = testindex

$ cat system/local/props.conf
[blahblah]
CHARSET=UTF-8
KV_MODE=none
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=false
category=Testing
description=Test KV log sourcetype access
disabled=false
pulldown_type=true
TRANSFORMS-blahkv=blahkvextraction
#TRANSFORMS-replace_source = replacedefaultsource2

$ cat system/local/transforms.conf
[blahkvextraction]
FORMAT = $1::$2
MV_ADD = 1
#REGEX = \[([^:[]+):\s+([^\]]+)]
REGEX = \[([^:\]]+):\s+([^\]]+)\]

BTW: @FrankVl, @VatsalJagani, @woodcock, thanks. I have tried iterations of each of your suggestions and strongly believe that they work. I've run variations of the search below to confirm this, and they do (I get one instance of field1=Thread, field2=value):

index=testindex sourcetype=blahblah
| rex field=_raw "\[(?<field1>[^:\]]+):\s+(?<field2>[^\]]+)\]"
1 Solution

Communicator

Firstly, thanks to @VatsalJagani, @woodcock for your answers and a special mention to @FrankVl for persisting.

There were 3 main problems:

  1. The regex was incorrect. All three contributors provided working solutions (I upvoted all three).
  2. Knowledge object sharing: inputs/props/transforms are part of an app, and @FrankVl pointed out that permissions/sharing could be incorrect, which turned out to be the case. Object sharing was not set, so it defaulted to app-only rather than global.
  3. The field extraction should have been REPORT- but was incorrectly set to TRANSFORMS- somewhere between changes. I believe this happened because the original app was pushed down to the server, overwriting my changes. I had to revert to the original app at one stage to restore the defaults and missed this change when doing so. This was the main reason I built a new server: to avoid pushing applications multiple times while troubleshooting.
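
For the third problem: in props.conf, TRANSFORMS-<class> is applied at index time and REPORT-<class> at search time, so the prefix is easy to get wrong and hard to spot. The sketch below is a hypothetical helper (the function name and the inline sample are assumptions, not a Splunk tool) that uses Python's configparser to list every TRANSFORMS- setting in a props.conf text so it can be reviewed by eye.

```python
import configparser

# Inline sample modelled on the props.conf shown in the question.
PROPS = """
[blahblah]
KV_MODE = none
SHOULD_LINEMERGE = false
TRANSFORMS-blahkv = blahkvextraction
"""


def index_time_transform_settings(props_text):
    """Return (stanza, setting, value) for every TRANSFORMS-* setting."""
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.optionxform = str  # keep the original case of setting names
    parser.read_string(props_text)
    hits = []
    for stanza in parser.sections():
        for name, value in parser.items(stanza):
            if name.startswith("TRANSFORMS-"):
                hits.append((stanza, name, value))
    return hits


for stanza, name, value in index_time_transform_settings(PROPS):
    print(f"[{stanza}] {name} = {value}  (index time; REPORT- is search time)")
```

This only parses the text it is given; it does not resolve Splunk's configuration layering the way btool does.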


Motivator

Hello @splunked38,

Please give the transform below a shot instead of DELIMS.

[kv_extraction]
REGEX = \[([^:\[]+):\s+([^\]]+)\]
FORMAT = $1::$2

Hope this helps!!!

Ultra Champion

You're missing the : between key and value. Also: using *? is typically not the best performing construct and is best avoided if it is possible to just use a more specific regex that doesn't require backtracking.

Motivator

Yeah, I agree with you @FrankVl. Here is a better-performing regex: \[([^:]+):\s*([^\]]+)\].

Ultra Champion

There is still one small mistake (I made it myself as well initially). Have a look at what this regex does with the provided sample event: https://regex101.com/r/NtE461/1

That is why I added a \] in the character class of the first capture group. @woodcock solved it by adding a [ in there, which has roughly the same effect.

Motivator

Yeah, right. To be on the safe side, I always add \ before all special characters in a regex. Our new regex takes only 87 steps to complete on the sample event.

Ultra Champion

My point was that it trips over the [status] part when using your regex. It takes status] [Thread as the first fieldname.
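
The behaviour can be reproduced outside Splunk with a few lines of Python. This is only a sketch on a shortened copy of the sample event; the names LOOSE and STRICT are labels chosen here for illustration.

```python
import re

# Shortened copy of the sample event from the question.
EVENT = "2019-03-01T09:42:01 [status] [Thread: 5=/blah/blah] [Method: GET]"

# Key class excludes only ":", so it can run across "] [".
LOOSE = re.compile(r"\[([^:]+):\s*([^\]]+)\]")
# Key class also excludes "]", so a key cannot leave its own bracket pair.
STRICT = re.compile(r"\[([^:\]]+):\s+([^\]]+)\]")

loose_keys = [key for key, _ in LOOSE.findall(EVENT)]
strict_keys = [key for key, _ in STRICT.findall(EVENT)]

print(loose_keys)   # ['status] [Thread', 'Method']
print(strict_keys)  # ['Thread', 'Method']
```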

Motivator

This regex is working for me - \[([^:\[]+):\s+([^\]]+)\].

Ultra Champion

Yes, adding the \[ to the first capture group's negative character set solves it indeed 🙂

Ultra Champion

If you can't get it working with the DELIMS suggestion from @VatsalJagani then try it using REGEX:

[kv_extraction]
REGEX = \[([^:\]]+):\s+([^\]]+)\]
FORMAT = $1::$2
MV_ADD=true 

See: https://regex101.com/r/PDHjzk/1
Note: this assumes fieldnames do not contain : or ] and field values do not contain ].
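
A small Python sketch of what happens when those assumptions are violated. The two test strings are made up for illustration and do not come from the original log.

```python
import re

# Same pattern as the REGEX above.
PAIR = re.compile(r"\[([^:\]]+):\s+([^\]]+)\]")

# Hypothetical value containing "]": the value is cut at the first "]".
truncated = PAIR.findall("[Path: /a[1]/b] [User: bob]")
print(truncated)  # [('Path', '/a[1'), ('User', 'bob')]

# Hypothetical key containing ":": the pair is not matched at all.
skipped = PAIR.findall("[ns:key: v]")
print(skipped)  # []
```

A colon inside a value is fine (as in the Referer URL), because only the key's character class excludes it.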

Motivator

Can you try: DELIMS = "] [", ": "?

Ultra Champion

Really curious whether that works, because I don't think DELIMS in transforms.conf is intended to contain multi-character delimiter strings. Each character is interpreted on its own as a delimiter, and especially with the space occurring in both delimiters, I expect this will fail.

Communicator

I tried @VatsalJagani's example but it didn't work (then again, it could other things as well)

Motivator

Try the transform I've given in my answer. If that doesn't work, comment with the issues you are seeing and we'll debug further.
Also, what do you mean by "it could other things as well"? Please describe.

Motivator

Yeah, right, my bad: it will not work. Instead, use REGEX in transforms.conf as suggested in the answer.
