Splunk Search

How to extract in Splunk at index time (with tstats) json field with same child-key from different father-key using regex?

piefragnisp
Explorer

We have to model a regex in order to extract in Splunk (at index time) some fileds from our event. These fields will be used in search using the tstats command. The regex will be used in a configuration file in Splunk settings transformation.conf.

The main aspect of the fields we want extract at index time is that they have the same json key but a different father json-key.

Is it possible modelling this extraction using regex?

This is an example of Splunk event having the structure described before (json by the way):

{
       "info":{
          "eventSource":"",
          "sourceType":"I/O",
          "status":{
             "code":"",
             "msg":"",
             "msgError":""
          },
          "transactionId":null,
          "traceId":null,
          "timestampStart":"2019-05-16T21:30:55.174Z",
          "timestampEnd":"2019-05-16T21:30:55.174Z",
          "companyIDCode":"",
          "channelIDCode":"",
          "branchCode":"",
          "searchFields":{
             "key_3":"value",
             "key_2":"value",
             "key_1":"value"
          },
          "annotation":{},
          "caller":{
             "id":"",
             "version":"",
             "acronym":""
          },
          "called":{
             "id":"",
             "version":"",
             "acronym":""
          },
             "storage":{
                "id":"",
                "start":"",
                "end":""
             }
          }
       },
       "headers":[],
       "payLoad":{
          "input":{
             "encoding":"1024",
             "ccsid":"1024",
             "data":"dati_in"
          },
          "output":{
             "encoding":"1024",
             "ccsid":"1024",
             "data":"dati_out"
          }
       }
    }

The attended result is something like that:

  • calledid -> aaa
  • callerversion -> 1
  • callerid -> bbb

We tried something like that

[calledid] 
REGEX =(?<=called).*"id":"(?P<calledid>.*?)(?=") 
FORMAT = calledid::"$1" 
WRITE_META =true

but it dowsn't work cause it matches until the last id he finds. Such as:

":{"id":"","version":"","acronym":""},"storage":{"id":"

Thanks in advance.

0 Karma
1 Solution

piefragnisp
Explorer

We solve it using these regex:

[called_id]
REGEX = (?:info":{)(?:[\s\S]*?)(?:"called":{)(?:[\s\S]*?)(?:"id":)(?:(?:(?:")([^"]*)(?:"))|(null))(?:(?:[\}])|,)(?:(?:[^}]*)(?:}))
FORMAT = chiamato::"$1"
WRITE_META = true

[caller_id]
REGEX = (?:info":{)(?:[\s\S]*?)(?:"caller":{)(?:[\s\S]*?)(?:"id":)(?:(?:(?:")([^"]*)(?:"))|(null))(?:(?:[\}])|,)(?:(?:[^}]*)(?:}))
FORMAT = chiamante::"$1"
WRITE_META = true

in transforms.conf

TRANSFORMS-callee_id = callee_id
TRANSFORMS-caller_id = caller_id

and enabling the fields in field.conf

Hope will help other people.

View solution in original post

0 Karma

piefragnisp
Explorer

We solve it using these regex:

[called_id]
REGEX = (?:info":{)(?:[\s\S]*?)(?:"called":{)(?:[\s\S]*?)(?:"id":)(?:(?:(?:")([^"]*)(?:"))|(null))(?:(?:[\}])|,)(?:(?:[^}]*)(?:}))
FORMAT = chiamato::"$1"
WRITE_META = true

[caller_id]
REGEX = (?:info":{)(?:[\s\S]*?)(?:"caller":{)(?:[\s\S]*?)(?:"id":)(?:(?:(?:")([^"]*)(?:"))|(null))(?:(?:[\}])|,)(?:(?:[^}]*)(?:}))
FORMAT = chiamante::"$1"
WRITE_META = true

in transforms.conf

TRANSFORMS-callee_id = callee_id
TRANSFORMS-caller_id = caller_id

and enabling the fields in field.conf

Hope will help other people.

0 Karma

wenthold
Communicator

*Update: Based on the data you provided in another comment I tweaked the regex.

I would avoid lookaheads and lookbehinds if possible, especially with a big payload. It's too easy to have an poorly performing or broken regex.

You also don't need to use the FORMAT command in transforms.conf if your regex is formatted to include the field names with the extractions.

You can extract the caller and called ID and version fields with two stanzas, one for caller and one for called.

[callerid]
REGEX = caller\"\s*:\s*\{\s*\"id\":\s*\"(?<callerid>[^\"]+)?\"\,\s*\"version\":\s*\"(?<callerversion>[^\"]+)?\"

[calledid]
REGEX = called\":\s*\{\s*\"id\":\s*\"(?<calledid>[^\"]+)?\"\,\s*\"version\":\s*\"(?<calledversion>[^\"]+)?\"

This was the inline search I used to test it:

| makeresults | eval test="{\"info\": {\"eventSource\": \"RPA\", \"sourceType\": \"I\", \"status\": {\"code\": \"0000\", \"msg\": \"Inizio Schedulazione\", \"msgError\": \"\"}, \"transactionId\": \"66083\", \"traceId\": \"124021\", \"timestampStart\": \"2019-10-16T11:34:00.000Z\", \"timestampEnd\": \"null\", \"companyIDCode\": \"01\", \"channelIDCode\": \"\", \"branchCode\": \"\", \"searchFields\": [{\"VDI\": \"WPVRTM2004\"}, {\"PROCESSO\": \"Assegni\"}], \"annotation\": [{\"TIPO\": \"SCHEDULAZIONE\"}, {\"RISORSE POOL\": \"SI\"}], \"caller\": {\"id\": \"VWFM\", \"version\": \"1\", \"acronym\": \"WRPA0\"}, \"called\": {\"id\": \"Assegni\", \"version\": \"1\", \"acronym\": \"WRPA0\"}}, \"payLoad\": {\"output\": {\"encoding\": \"\", \"ccsid\": \"\", \"data\": \"\"}, \"input\": {\"encoding\": \"\", \"ccsid\": \"\", \"data\": \"\"}}}"
| rex field=test "called\":\s*\{\s*\"id\":\s*\"(?<calledid>[^\"]+)?\"\,\s*\"version\":\s*\"(?<calledversion>[^\"]+)?\""
| rex field=test "caller\"\s*:\s*\{\s*\"id\":\s*\"(?<callerid>[^\"]+)?\"\,\s*\"version\":\s*\"(?<callerversion>[^\"]+)?\""
0 Karma

piefragnisp
Explorer

@wenthold this is how I've configured based on your tweaks:

in transformation.conf

[caller]
REGEX = caller\"\s*:\s*\{\s*\"id\":\s*\"(?<callerid>[^\"]+)?\"\,\s*\"version\":\s*\"(?<callerversion>[^\"]+)?\"
[called]
REGEX = called\":\s*\{\s*\"id\":\s*\"(?<calledid>[^\"]+)?\"\,\s*\"version\":\s*\"(?<calledversion>[^\"]+)?\"

in fields.conf

[caller]
INDEXED = true
[called]
INDEXED = true

in props.conf

[evento_srctyp]
TRANSFORMS-caller = caller
TRANSFORMS-called = called

then I tried to perform a search like

| tstats values where index=nbp_index_application by callerid

but I had no results. Also if I search in the extracted fields I don't found the fields callerid, callerversion, calledid, calledversion

0 Karma

wenthold
Communicator

If you run btool from the command line do you see the configuration items listed?
/opt/splunk/bin/splunk cmd btool props list evento_srctyp --debug | less

Replace /opt/splunk with whatever your $SPLUNK_HOME path is. I don't know if "evento_srctyp" is the actual sourcetype or if you masked it, but "splunk cmd btool props list {SOURCETYPE} --debug" will dump out the configuration.

I wouldn't worry about indexing the field until you have the extractions working. When you can run the search:sourcetype=evento_srctyp | table _time host source callerid callerversion calledid calledversion and get results, then review link text and then work on the indexed extraction.

0 Karma

piefragnisp
Explorer

From the deployment server cli do you mean?

0 Karma

wenthold
Communicator

Use btool on a system where the configuration has been deployed, not on the deployment server.

I would probably test the field extraction part on a search head first, then when the extraction syntax has been deployed I would remove it from the testing search head and deploy it to the indexers or heavy forwarders and work on the indexed extraction.

0 Karma

piefragnisp
Explorer

@wenthold We fixed it adding WRITE_META = true in the transformation.conf; anyway running a search it extract only two fields (keeping the event I gave to you):

  • called (with the caller id): "Assegni"
  • caller (with the called id): "VWFM"

It seems to forget the callerversion and the calledversion

0 Karma

wenthold
Communicator

If two of the fields are working then my first guess would be that it's an issue with whitespacing in the regular expression.

Try running a search that will give you the raw json results and add the following:

| rex field=_raw "caller\"\s*:\s*\{\s*\"id\":\s*\"(?<test_callerid>[^\"]+)?\"\,\s*\"version\"\s*:\s*\"(?<test_callerversion>[^\"]+)?\""

See if you get the test_* fields extracted from the json. If the only thing at this point that needs to be tweaked is the regular expression, you might want to try tweaking it on regex101: https://regex101.com/r/4lGvKg/1

0 Karma
Get Updates on the Splunk Community!

Splunk Enterprise Security 8.0.2 Availability: On cloud and On-premise!

A few months ago, we released Splunk Enterprise Security 8.0 for our cloud customers. Today, we are excited to ...

Logs to Metrics

Logs and Metrics Logs are generally unstructured text or structured events emitted by applications and written ...

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...