Getting Data In

How do I assign unique field names to a JSON payload and capture multiple occurrences at index time?

beetlegeuse
Path Finder

I'm currently indexing a JSON payload that looks like this (snippet):

"data":[{"dimensions":["HTTP_CHECK-F009EA2B6AA8E2C0","SYNTHETIC_LOCATION-833A207E28766E49"],"dimensionMap":{"dt.entity.synthetic_location":"SYNTHETIC_LOCATION-833A207E28766E49","dt.entity.http_check":"HTTP_CHECK-F009EA2B6AA8E2C0"},"timestamps":[1617467520000],"values":[186]},{"dimensions":["HTTP_CHECK-F06A1F4F9C3252AD","SYNTHETIC_LOCATION-1D85D445F05E239A"],"dimensionMap":{"dt.entity.synthetic_location":"SYNTHETIC_LOCATION-1D85D445F05E239A","dt.entity.http_check":"HTTP_CHECK-F06A1F4F9C3252AD"},"timestamps":[1617467520000],"values":[187]},{"dimensions":["HTTP_CHECK-F06A1F4F9C3252AD","SYNTHETIC_LOCATION-833A207E28766E49"],"dimensionMap":{"dt.entity.synthetic_location":"SYNTHETIC_LOCATION-833A207E28766E49","dt.entity.http_check":"HTTP_CHECK-F06A1F4F9C3252AD"},"timestamps":[1617467520000],"values":[188]}

This is being collected by a REST API modular input and assigned to a specific sourcetype called "smoketest_json_dyn_tcp". Similar inputs are configured with unique sourcetype names; they make REST calls to the same destination to collect different metrics. Because the various calls return the same field names, it's quite a conundrum to sort out which value belongs to which metric. The conventional way of assigning field names via search-time extraction doesn't work, as only the first occurrence of each field/value pair is returned; as my sample data shows, more than one occurrence exists.

To make my life easier, I'd like to assign unique field names to the values during index time, using props.conf and transforms.conf. This is what I have in place currently:

props.conf:

[smoketest_json_dyn_tcp]

#TZ = US/Eastern
#TZ = EST5EDT
INDEXED_EXTRACTIONS = json
KV_MODE = none
DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
TRUNCATE = 200000

REPORT-mv_jdt = mv_jdt

transforms.conf:

[mv_jdt]

REGEX = \"dt.entity.synthetic_location\":\"(\w+)\",\"dt.entity.http_check\":\"(\w+)\",\"timestamps\":\[(\d+)\],\"values\":\[(\d+)\]
FORMAT = testLocation::$1 testName::$2 unixTimeStamp::$3 TCPconnectTime::$4
MV_ADD = true
REPEAT_MATCH = true

Unfortunately, this is not working for me. I've also tried the following in transforms.conf...

[mv_jdt]

REGEX = \"dt.entity.synthetic_location\":\"(?<testLocation>\w+)\",\"dt.entity.http_check\":\"(?<testName>\w+)\",\"timestamps\":\[(?<unixTimeStamp>\d+)\],\"values\":\[(?<TCPconnectTime>\d+)\]
MV_ADD = true
REPEAT_MATCH = true

...but still no luck. Is what I'm attempting possible? If so, what am I missing in my stanzas?

Thank you for any assistance provided!

 

1 Solution

tscroggins
Influencer

@beetlegeuse 

I would recommend the Dynatrace Add-on for Splunk, but as I recall, it uses the Timeseries v1 API rather than the Metrics v2 API.

You're already using INDEXED_EXTRACTIONS = JSON, so the _raw JSON field names should be available directly.

sourcetype=smoketest_json_dyn_tcp "data{}.dimensionMap.dt.entity.synthetic_location"="SYNTHETIC_LOCATION-1D85D445F05E239A"

or:

| tstats count where sourcetype=smoketest_json_dyn_tcp "data{}.dimensionMap.dt.entity.synthetic_location"="SYNTHETIC_LOCATION-833A207E28766E49" "data{}.dimensionMap.dt.entity.http_check"=* by "data{}.dimensionMap.dt.entity.synthetic_location" "data{}.dimensionMap.dt.entity.http_check"

tstats should be exceptionally fast because it uses the indexed fields directly, but if you want friendlier names, you could add search-time field aliases, e.g.:

# props.conf
[smoketest_json_dyn_tcp]
FIELDALIAS-testName = "data{}.dimensionMap.dt.entity.http_check" AS testName
FIELDALIAS-testLocation = "data{}.dimensionMap.dt.entity.synthetic_location" AS testLocation
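
With those aliases in place, something like this should work at search time (both values come from your sample payload):

sourcetype=smoketest_json_dyn_tcp testLocation="SYNTHETIC_LOCATION-833A207E28766E49" testName="HTTP_CHECK-F009EA2B6AA8E2C0"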

The difficulty with either solution is the data array itself. To properly analyze values by synthetic_location and http_check, you'll want to place the data points in separate events:

sourcetype=smoketest_json_dyn_tcp
| rex max_match=0 "(?:,|{\"data\":\\[)(?<data_point>{(?>[^{}]+|(?1))*})"
| table _time data_point
| mvexpand data_point
| spath input=data_point
| stats avg(values{}) as avg_value by dimensionMap.dt.entity.synthetic_location dimensionMap.dt.entity.http_check
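
If you'd rather avoid the recursive regex, a rough spath-only sketch of the same idea (assuming each event is the complete JSON object from the question) would be:

sourcetype=smoketest_json_dyn_tcp
| spath path=data{} output=data_point
| mvexpand data_point
| spath input=data_point
| stats avg(values{}) as avg_value by dimensionMap.dt.entity.synthetic_location dimensionMap.dt.entity.http_check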


to4kawa
Ultra Champion

 

index=_internal | head 1 | fields _raw
| eval _raw="{\"data\":[{\"dimensions\":[\"HTTP_CHECK-F009EA2B6AA8E2C0\",\"SYNTHETIC_LOCATION-833A207E28766E49\"],\"dimensionMap\":{\"dt.entity.synthetic_location\":\"SYNTHETIC_LOCATION-833A207E28766E49\",\"dt.entity.http_check\":\"HTTP_CHECK-F009EA2B6AA8E2C0\"},\"timestamps\":[1617467520000],\"values\":[186]},{\"dimensions\":[\"HTTP_CHECK-F06A1F4F9C3252AD\",\"SYNTHETIC_LOCATION-1D85D445F05E239A\"],\"dimensionMap\":{\"dt.entity.synthetic_location\":\"SYNTHETIC_LOCATION-1D85D445F05E239A\",\"dt.entity.http_check\":\"HTTP_CHECK-F06A1F4F9C3252AD\"},\"timestamps\":[1617467520000],\"values\":[187]},{\"dimensions\":[\"HTTP_CHECK-F06A1F4F9C3252AD\",\"SYNTHETIC_LOCATION-833A207E28766E49\"],\"dimensionMap\":{\"dt.entity.synthetic_location\":\"SYNTHETIC_LOCATION-833A207E28766E49\",\"dt.entity.http_check\":\"HTTP_CHECK-F06A1F4F9C3252AD\"},\"timestamps\":[1617467520000],\"values\":[188]}]}"
| spath
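
Running this shows spath flattening the array into multivalue fields such as data{}.dimensionMap.dt.entity.http_check and data{}.values{}, the same names the indexed extraction produces, which is why the field aliases above line up.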

 

https://docs.splunk.com/Documentation/Splunk/8.1.3/Knowledge/Configurefieldaliaseswithprops.conf

If INDEXED_EXTRACTIONS is working properly, then field aliases in props.conf are sufficient.

 

props.conf

[smoketest_json_dyn_tcp]
CHARSET=UTF-8
KV_MODE=json
SHOULD_LINEMERGE=false
category=Structured
disabled=false
pulldown_type=true
TIME_FORMAT=%s%3N
TIME_PREFIX=timestamps\":\[
LINE_BREAKER=(.){"dimensions|}(\])(})
TRANSFORMS-nulls=null1, null2

transforms.conf

[null1]
# route events containing the opening {"data" envelope fragment to the null queue
REGEX={\"data\".*
DEST_KEY=queue
FORMAT=nullQueue

[null2]
# route events that begin with a closing brace to the null queue
REGEX=^}.*
DEST_KEY=queue
FORMAT=nullQueue

This approach may also work: the LINE_BREAKER splits each data point into its own event, and the null-queue transforms discard the leftover JSON envelope fragments.

 

 


beetlegeuse
Path Finder

The extractions are working correctly, so I'll apply the field aliases as previously suggested by @tscroggins.

Thank you!



beetlegeuse
Path Finder

The JSON indexed extraction is working correctly; I'll apply the field-alias approach to make the field names a bit more human-friendly.

Also, thank you for the SPL snippet demonstrating mvexpand. It will be helpful as I build visualizations and monitoring.
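
As a starting point, here is a rough sketch of a timechart over the expanded data points. It reuses the spath/mvexpand pipeline from @tscroggins' answer; the 5-minute span and the avg_connect_time name are placeholders, and it assumes one timestamp per data point, as in the sample:

sourcetype=smoketest_json_dyn_tcp
| spath path=data{} output=data_point
| mvexpand data_point
| spath input=data_point
| eval _time='timestamps{}'/1000
| timechart span=5m avg(values{}) as avg_connect_time by dimensionMap.dt.entity.http_check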
