How to convert a json into table with some unstruc...

jonthree · ‎03-24-2021

So I have two different services where an API call starts from service A and propagates to service B. I want to trace the errors for this and creating a dashboard to show the consolidated errors.

Logs are as follows.

Service A logs:

10.0.9.456 - - 23/Mar/2021:17:29:52 +0000 "POST Error occured in service A status 400 bad request referenceid 1615 msg Some bad request error occured in application B status 400 url /test/user/myuserfield/authorize?service=myservicename&serviceT=myserviceTypeid

Service B logs:

{
  "userId": "/myuserfield",
  "transactionId": "abcd",
  "timestamp": "2021-03-24T15:41:25.770Z",
  "eventName": "myevent",
  "component": "mycomponent",
  "response": {
    "statusCode": "400",
    "detail": {
      "reason": "Bad Request"
    }
  },
  "http": {
    "request": {
      "method": "POST",
      "path": "http://dummyurl",
      "queryParameters": {
        "serviceId": [
          "myservicename"
        ],
        "serviceType": [
          "myservicetype"
        ]
      }
    }
  }
}

So the mapping of fields between these two service is as follows:

Service A Field	Service B Field
status	statusCode
service	serviceId
serviceType	serviceT
user	userId

I have tried to use subsearch extract fields from service A :

index=*serviceB*
| spath | rename userId as user, http.request.queryParameters.serviceId{} as service, http.request.queryParameters.serviceType{} as serviceT
| search [search index=*serviceA* | rex "/test/user(?<user>/\w+)+/authorize.*\?+service+\=(?<service>\w+)+\&+serviceT\=(?<serviceT>.*)\"" | dedup user service serviceT
| fields user service serviceT]

Above expression provides me the logs of Service B which are propagated from service A. What i want now is to display the data in Tabular format for better readability. So i have two questions:

1. Am i going with the right approach by using subsearch here ? Is the expression seems to be correct and best possible solution?

2. Above expression provides me different error logs for different users in json format. How do i convert these to tabular format having userId, time, status etc. ? I also want to filter my table based on multiple status filters (like 4XX, 5XX etc)..how to achieve that?

bowesmana · ‎03-24-2021

Your approach of using service A results to filter the outer search of service B is fine, but anywhere you have subsearches, you may need to consider the size of results coming back from the subsearch to constrain the outer one. If the subsearch result set is small then this is probably a good way to filter service B search.

There are a number of possible solutions to 'merge' two or more data sets. I tend to favour using stats to combine the data types, as that does not have any subsearch limitations. It's not always appropriate, as it will often depend on the data set sizes you're searching but uses the principle of

search (data_set_1_characteristics) OR (data_set_2_characteristics)
| eval user=if(DS1,user_from_DS1, user_from_DS2), ...
| eval .... further conditional eval to munge data ....
| stats values(*) as * by X Y Z

and then after you have the merged data sets, you then manipulate into the form needed to visualise.

If you have used the spath expression in your example, then you have the JSON extracted fields, so you can table them as needed.

As for filtering on status, I'm assuming this is via a dashboard. Then you need to set up the appropriate inputs, either with dropdowns with fixed status values, or populated from values in the data and then in your queries, use the tokens in where or search clauses as part of your query.

See the XML guide and token usage in dashboards to help you with dashboards

https://docs.splunk.com/Documentation/Splunk/8.1.2/Viz/PanelreferenceforSimplifiedXML

https://docs.splunk.com/Documentation/Splunk/8.1.2/Viz/tokens

jonthree · ‎03-30-2021

I want to fetch logs of only service B based on the userid(which is common field) fetched from service A. I don't want to merge these two data sets.

Also, as i am new to this...can you please give an example on this?

Like after extracting fields from rex, how do i use them in eval to join the conditions.

bowesmana · ‎03-30-2021

Your existing search is already giving you the data, so you just need to table the data as you need it. Look at your list of fields if you run the search in verbose mode. You already seem to know what you are doing having used spath and renaming the fields.

You say your search is giving you the logs you need, so just use the following command to put these into a table.

| table _time user service serviceT response.statusCode

jonthree · ‎03-30-2021

Thanks..one last problem that i am facing is this:

When i convert the data into tabular format, i am getting double values in my table:

time	userid	status
2021-03-29 18:46:05	testuserid testuserid	401 401

I have used below query to extract the results:

index=*serviceB*
| spath | rename userId as user, http.request.queryParameters.serviceId{} as service, http.request.queryParameters.serviceType{} as serviceT, rename statusCode as status 
| search [search index=*serviceA* | rex "Status/s+(?<status>/d+)/test/user(?<user>/\w+)+/authorize.*\?+service+\=(?<service>\w+)+\&+serviceT\=(?<serviceT>.*)\"" | dedup user service serviceT status
| fields user service serviceT status]| table _time user status

Does using spath alone extracts all the fields again here ? I am not using spatch individually for these fields.

Or is there any other reason?

bowesmana · ‎03-30-2021

For the time being you can always add

| eval user=mvdedup(user), status=mvdedup(status)

to remove the duplicates. If the fields are already index (you can check this by running index=serviceB and looking at the verbose fields list) then there is no need to run the spath statement, which will probably also fix your issue.

bowesmana · ‎03-30-2021

I suspect it depends on how you are ingesting your JSON data for serviceB. If you are using INDEXED_EXTRACTIONS=json then that's most likely the reason. In that case, your indexing process has already extracted all the JSON fields AND indexed them. Doing the spath with then auto extract all fields in the first 5000 bytes of JSON, so probably now creating multi value fields.

Do you really need to INDEX all fields. That will increase the size of your index and is not necessary.

How to convert a json into table with some unstructured data?

regex

subsearch

table

Operationalizing TDIR: Building a More Resilient, Scalable SOC

Pro Tips for First-Time .conf Attendees: Advice from SplunkTrust

Raise Your Skills at the .conf25 Builder Bar: Your Splunk Developer Destination

Are you a member of the Splunk Community?