Getting Data In

How to assign unique field names to JSON payload and capture multiple occurrences at index time?

beetlegeuse
Explorer

I'm currently indexing a JSON payload that looks like this (snippet):

"data":[{"dimensions":["HTTP_CHECK-F009EA2B6AA8E2C0","SYNTHETIC_LOCATION-833A207E28766E49"],"dimensionMap":{"dt.entity.synthetic_location":"SYNTHETIC_LOCATION-833A207E28766E49","dt.entity.http_check":"HTTP_CHECK-F009EA2B6AA8E2C0"},"timestamps":[1617467520000],"values":[186]},{"dimensions":["HTTP_CHECK-F06A1F4F9C3252AD","SYNTHETIC_LOCATION-1D85D445F05E239A"],"dimensionMap":{"dt.entity.synthetic_location":"SYNTHETIC_LOCATION-1D85D445F05E239A","dt.entity.http_check":"HTTP_CHECK-F06A1F4F9C3252AD"},"timestamps":[1617467520000],"values":[187]},{"dimensions":["HTTP_CHECK-F06A1F4F9C3252AD","SYNTHETIC_LOCATION-833A207E28766E49"],"dimensionMap":{"dt.entity.synthetic_location":"SYNTHETIC_LOCATION-833A207E28766E49","dt.entity.http_check":"HTTP_CHECK-F06A1F4F9C3252AD"},"timestamps":[1617467520000],"values":[188]}

This is being collected by a REST API modular input, and is assigned to a specific sourcetype called "smoketest_json_dyn_tcp". Similar inputs are configured with unique sourcetype names; they are making REST calls to the same destination to collect different metrics. Since the same field names are being returned by the various calls, it makes for quite a conundrum when I'm trying to sort out what value belongs to what metric. The conventional way of assigning field names via extraction doesn't work, as only the first occurrence of the field/value pair is returned; as noted in my sample data, more than one occurrence exists.
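To illustrate the underlying problem outside Splunk, here's a quick Python sketch (the pattern below is illustrative, not my actual transforms.conf regex): a single-match extraction sees only the first occurrence, while the payload actually contains several.

```python
import re

# Two of the repeated dimensionMap blocks from the sample payload (illustrative).
payload = (
    '"dimensionMap":{"dt.entity.synthetic_location":"SYNTHETIC_LOCATION-833A207E28766E49",'
    '"dt.entity.http_check":"HTTP_CHECK-F009EA2B6AA8E2C0"}'
    '"dimensionMap":{"dt.entity.synthetic_location":"SYNTHETIC_LOCATION-1D85D445F05E239A",'
    '"dt.entity.http_check":"HTTP_CHECK-F06A1F4F9C3252AD"}'
)

pattern = r'"dt\.entity\.http_check":"(\w+-\w+)"'

first_only = re.search(pattern, payload).group(1)  # what a first-match-only extraction sees
all_matches = re.findall(pattern, payload)         # what I actually need: every occurrence

print(first_only)   # HTTP_CHECK-F009EA2B6AA8E2C0
print(all_matches)  # ['HTTP_CHECK-F009EA2B6AA8E2C0', 'HTTP_CHECK-F06A1F4F9C3252AD']
```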

To make my life easier, I'd like to assign unique field names to the values during index time, using props.conf and transforms.conf. This is what I have in place currently:

props.conf:

[smoketest_json_dyn_tcp]

#TZ = US/Eastern
#TZ = EST5EDT
INDEXED_EXTRACTIONS = json
KV_MODE = none
DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
TRUNCATE = 200000

REPORT-mv_jdt = mv_jdt

transforms.conf:

[mv_jdt]

REGEX = \"dt.entity.synthetic_location\":\"(\w+)\",\"dt.entity.http_check\":\"(\w+)\",\"timestamps\":\[(\d+)\],\"values\":\[(\d+)\]
FORMAT = testLocation::$1 testName::$2 unixTimeStamp::$3 TCPconnectTime::$4
MV_ADD = true
REPEAT_MATCH = true

Unfortunately, this is not working for me. I've also tried the following in transforms.conf...

[mv_jdt]

REGEX = \"dt.entity.synthetic_location\":\"(?<testLocation>\w+)\",\"dt.entity.http_check\":\"(?<testName>\w+)\",\"timestamps\":\[(?<unixTimeStamp>\d+)\],\"values\":\[(?<TCPconnectTime>\d+)\]
MV_ADD = true
REPEAT_MATCH = true

...but still no luck. Is what I'm attempting possible? If so, what am I missing in my stanzas?

Thank you for any assistance provided!

 

1 Solution

tscroggins
Builder

@beetlegeuse 

I would normally recommend the Dynatrace Add-on for Splunk, but as I recall, it uses the Timeseries v1 API rather than the Metrics v2 API.

You're already using INDEXED_EXTRACTIONS = JSON, so the _raw JSON field names should be available directly.

sourcetype=smoketest_json_dyn_tcp "data{}.dimensionMap.dt.entity.synthetic_location"="SYNTHETIC_LOCATION-1D85D445F05E239A"

or:

| tstats count where sourcetype=smoketest_json_dyn_tcp "data{}.dimensionMap.dt.entity.synthetic_location"="SYNTHETIC_LOCATION-833A207E28766E49" "data{}.dimensionMap.dt.entity.http_check"=* by "data{}.dimensionMap.dt.entity.synthetic_location" "data{}.dimensionMap.dt.entity.http_check"

tstats should be exceptionally fast using fields directly, but if you want to use search-time field aliases, you could try e.g.:

# props.conf
[smoketest_json_dyn_tcp]
FIELDALIAS-testName = "data{}.dimensionMap.dt.entity.http_check" AS testName
FIELDALIAS-testLocation = "data{}.dimensionMap.dt.entity.synthetic_location" AS testLocation

The difficulty with either solution is the data array itself. To properly analyze values by synthetic_location and http_check, you'll want to place the data points in separate events:

sourcetype=smoketest_json_dyn_tcp
| rex max_match=0 "(?:,|{\"data\":\\[)(?<data_point>{(?>[^{}]+|(?1))*})"
| table _time data_point
| mvexpand data_point
| spath input=data_point
| stats avg(values{}) as avg_value by dimensionMap.dt.entity.synthetic_location dimensionMap.dt.entity.http_check
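In plain Python, the expand-and-aggregate logic of that search (one record per element of the data array, then an average per dimension pair) looks roughly like this sketch, using the sample payload from the question:

```python
from collections import defaultdict

# Subset of the Metrics v2-style payload from the question (illustrative).
raw = {
    "data": [
        {"dimensionMap": {"dt.entity.synthetic_location": "SYNTHETIC_LOCATION-833A207E28766E49",
                          "dt.entity.http_check": "HTTP_CHECK-F009EA2B6AA8E2C0"},
         "timestamps": [1617467520000], "values": [186]},
        {"dimensionMap": {"dt.entity.synthetic_location": "SYNTHETIC_LOCATION-1D85D445F05E239A",
                          "dt.entity.http_check": "HTTP_CHECK-F06A1F4F9C3252AD"},
         "timestamps": [1617467520000], "values": [187]},
        {"dimensionMap": {"dt.entity.synthetic_location": "SYNTHETIC_LOCATION-833A207E28766E49",
                          "dt.entity.http_check": "HTTP_CHECK-F06A1F4F9C3252AD"},
         "timestamps": [1617467520000], "values": [188]},
    ]
}

# "mvexpand"-like step: walk one data point at a time, then compute
# avg(values) grouped by (synthetic_location, http_check).
sums = defaultdict(lambda: [0, 0])  # key -> [running total, count]
for point in raw["data"]:
    key = (point["dimensionMap"]["dt.entity.synthetic_location"],
           point["dimensionMap"]["dt.entity.http_check"])
    for v in point["values"]:
        sums[key][0] += v
        sums[key][1] += 1

averages = {key: total / n for key, (total, n) in sums.items()}
```

Each dimension pair ends up with its own average, which is what the `stats ... by` clause produces once the array elements live in separate events.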


to4kawa
Ultra Champion

 

index=_internal | head 1 | fields _raw
| eval _raw="{\"data\":[{\"dimensions\":[\"HTTP_CHECK-F009EA2B6AA8E2C0\",\"SYNTHETIC_LOCATION-833A207E28766E49\"],\"dimensionMap\":{\"dt.entity.synthetic_location\":\"SYNTHETIC_LOCATION-833A207E28766E49\",\"dt.entity.http_check\":\"HTTP_CHECK-F009EA2B6AA8E2C0\"},\"timestamps\":[1617467520000],\"values\":[186]},{\"dimensions\":[\"HTTP_CHECK-F06A1F4F9C3252AD\",\"SYNTHETIC_LOCATION-1D85D445F05E239A\"],\"dimensionMap\":{\"dt.entity.synthetic_location\":\"SYNTHETIC_LOCATION-1D85D445F05E239A\",\"dt.entity.http_check\":\"HTTP_CHECK-F06A1F4F9C3252AD\"},\"timestamps\":[1617467520000],\"values\":[187]},{\"dimensions\":[\"HTTP_CHECK-F06A1F4F9C3252AD\",\"SYNTHETIC_LOCATION-833A207E28766E49\"],\"dimensionMap\":{\"dt.entity.synthetic_location\":\"SYNTHETIC_LOCATION-833A207E28766E49\",\"dt.entity.http_check\":\"HTTP_CHECK-F06A1F4F9C3252AD\"},\"timestamps\":[1617467520000],\"values\":[188]}]}"
| spath

 

https://docs.splunk.com/Documentation/Splunk/8.1.3/Knowledge/Configurefieldaliaseswithprops.conf

If INDEXED_EXTRACTIONS is working properly, then field aliases in props.conf are sufficient.

 

props.conf

[smoketest_json_dyn_tcp]
CHARSET=UTF-8
KV_MODE=json
SHOULD_LINEMERGE=false
category=Structured
disabled=false
pulldown_type=true
TIME_FORMAT=%s%2N
TIME_PREFIX=timestamps\":\[
LINE_BREAKER=(.){"dimensions|}(\])(})
TRANSFORMS-nulls=null1, null2

transforms.conf

[null1]
REGEX={\"data\".*
DEST_KEY=queue
FORMAT=nullQueue
[null2]
REGEX=^}.*
DEST_KEY=queue
FORMAT=nullQueue
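As a side note, the timestamps in the sample payload are Unix epoch milliseconds, which the TIME_PREFIX/TIME_FORMAT settings above are meant to parse. A quick Python sanity check of the sample value (illustrative only):

```python
from datetime import datetime, timezone

# The sample timestamps are Unix epoch milliseconds.
ts_millis = 1617467520000
dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
print(dt.isoformat())  # 2021-04-03T16:32:00+00:00
```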

This approach may also work.


beetlegeuse
Explorer

The extractions are working correctly, so I'll apply the field aliases as suggested by @tscroggins.

Thank you!



beetlegeuse
Explorer

The JSON indexed extraction is working correctly; I'll apply the field aliases approach to make the field names a bit more human friendly.

Also: Thank you for the SPL snippet that points out the use of "mvexpand". This will be helpful as I create visualizations and monitoring.
