Hi,
I'm trying to pull a multivalue field from a JSON array to get statistics from it. The data looks as follows:
"APIHEADERRESPONSE":{...specific information...
},
"SERVICES": [
{
"HTTPStatus":"200",
"ServRespCode":"Success",
"ServRespTime":"1200",
"ServRespCached":"Y",
"ServiceShortName":"Service1",
"DataSource":"Source1",
"ServiceURI":"Service1Url"
},
{
"HTTPStatus":"200",
"ServRespCode":"Success",
"ServRespTime":"1200",
"ServRespCached":"Y",
"ServiceShortName":"Service2",
"DataSource":"Service2Source",
"ServiceURI":"Service2URL"
},
{
"HTTPStatus":"200",
"ServRespCode":"Success",
"ServRespTime":"12",
"ServRespCached":"Y",
"ServiceShortName":"Service3",
"DataSource":"Service3Source",
"ServiceURI":"Service3URL"
}
]
Our data follows the structure of an initial block of identifying information, followed by specific details of further backend calls our APIs make. We are printing the results of those backend calls into an array in the JSON (services block) we print out to splunk.
However, it is not a standardized amount of elements within the array. We can have 1 to many different services printed, and it varies from event to event. Further, the order is not normalized either, so we have no idea which service will print first.
Is there a way for us to extract the details of a specific service without using regex? If we had multiple events, how would I get something like the median response time of service1? Is there a way we can use SPATH to get this information without relying on the expensive nature of regex?
Give this a try (everything before spath
is to generate sample data, replace it with your base search)
| gentimes start=-1 | eval _raw="{
\"SERVICES\": [
{
\"HTTPStatus\":\"200\", \"ServRespCode\":\"Success\", \"ServRespTime\":\"1200\", \"ServRespCached\":\"Y\", \"ServiceShortName\":\"Service1\", \"DataSource\":\"Source1\", \"ServiceURI\":\"Service1Url\"
},
{
\"HTTPStatus\":\"200\", \"ServRespCode\":\"Success\", \"ServRespTime\":\"1200\", \"ServRespCached\":\"Y\", \"ServiceShortName\":\"Service2\", \"DataSource\":\"Service2Source\",
\"ServiceURI\":\"Service2URL\"
},
{
\"HTTPStatus\":\"200\", \"ServRespCode\":\"Success\", \"ServRespTime\":\"12\", \"ServRespCached\":\"Y\", \"ServiceShortName\":\"Service3\", \"DataSource\":\"Service3Source\",
\"ServiceURI\":\"Service3URL\"
}
]
}
"
| spath | rename SERVICES{}.* as *| eval temp=mvzip(HTTPStatus,mvzip(ServRespCode,mvzip(ServRespTime,mvzip(ServiceShortName,mvzip(DataSource,ServiceURI,"#"),"#"),"#"),"#"),"#")| mvexpand temp | rex field=temp "(?<HTTPStatus>.+)#(?<ServRespCode>.+)#(?<ServRespTime>.+)#(?<ServiceShortName>.+)#(?<DataSource>.+)#(?<ServiceURI>.+)" | fields - temp
Give this a try (everything before spath
is to generate sample data, replace it with your base search)
| gentimes start=-1 | eval _raw="{
\"SERVICES\": [
{
\"HTTPStatus\":\"200\", \"ServRespCode\":\"Success\", \"ServRespTime\":\"1200\", \"ServRespCached\":\"Y\", \"ServiceShortName\":\"Service1\", \"DataSource\":\"Source1\", \"ServiceURI\":\"Service1Url\"
},
{
\"HTTPStatus\":\"200\", \"ServRespCode\":\"Success\", \"ServRespTime\":\"1200\", \"ServRespCached\":\"Y\", \"ServiceShortName\":\"Service2\", \"DataSource\":\"Service2Source\",
\"ServiceURI\":\"Service2URL\"
},
{
\"HTTPStatus\":\"200\", \"ServRespCode\":\"Success\", \"ServRespTime\":\"12\", \"ServRespCached\":\"Y\", \"ServiceShortName\":\"Service3\", \"DataSource\":\"Service3Source\",
\"ServiceURI\":\"Service3URL\"
}
]
}
"
| spath | rename SERVICES{}.* as *| eval temp=mvzip(HTTPStatus,mvzip(ServRespCode,mvzip(ServRespTime,mvzip(ServiceShortName,mvzip(DataSource,ServiceURI,"#"),"#"),"#"),"#"),"#")| mvexpand temp | rex field=temp "(?<HTTPStatus>.+)#(?<ServRespCode>.+)#(?<ServRespTime>.+)#(?<ServiceShortName>.+)#(?<DataSource>.+)#(?<ServiceURI>.+)" | fields - temp
Hi,
Thanks for the response. That doesn't get quite what I need...in fact it seems to result in duplicating events? However, I was able to build off of what you gave me and various other questions(which actually led me to other answers you have given!) to get this query:
initial search | table SERVICES{}.* | eval respTable=mvzip('SERVICES{}.ServRespTime', 'SERVICES{}.ServiceShortName', "&&") | eval ResponseTime=mvindex(split(mvfilter(match(respTable,".*&&Service1Name")),"&&"),0) | table ResponseTime
Since I don't use spath, are there any concerns with the cost of using this type of query extensively?
With just the base search ran on Smart Mode, what all fields do you see(either on the left field sidebar OR by adding | fieldsummary
in your base search)??
I can see all of the fields inside the service array properly - they are labeled like:
Services{}.DataSource
Services{}.ServiceRespTime
etc.
Splunk is correctly identifying the fields as multivalue - it captures multiple response times and lists them all under Services{}.ServiceRespTime.
However, there is nothing in the fields that seem to indicate which block each response time corresponds to. So while it captures all the response times, I don't know how to(without using regex) get the response time of service1 specifically.