Splunk Search

How to extract in Splunk at index time (with tstats) json field with same child-key from different father-key using regex?

piefragnisp
Explorer

We have to model a regex in order to extract in Splunk (at index time) some fileds from our event. These fields will be used in search using the tstats command. The regex will be used in a configuration file in Splunk settings transformation.conf.

The main aspect of the fields we want extract at index time is that they have the same json key but a different father json-key.

Is it possible modelling this extraction using regex?

This is an example of Splunk event having the structure described before (json by the way):

{
       "info":{
          "eventSource":"",
          "sourceType":"I/O",
          "status":{
             "code":"",
             "msg":"",
             "msgError":""
          },
          "transactionId":null,
          "traceId":null,
          "timestampStart":"2019-05-16T21:30:55.174Z",
          "timestampEnd":"2019-05-16T21:30:55.174Z",
          "companyIDCode":"",
          "channelIDCode":"",
          "branchCode":"",
          "searchFields":{
             "key_3":"value",
             "key_2":"value",
             "key_1":"value"
          },
          "annotation":{},
          "caller":{
             "id":"",
             "version":"",
             "acronym":""
          },
          "called":{
             "id":"",
             "version":"",
             "acronym":""
          },
             "storage":{
                "id":"",
                "start":"",
                "end":""
             }
          }
       },
       "headers":[],
       "payLoad":{
          "input":{
             "encoding":"1024",
             "ccsid":"1024",
             "data":"dati_in"
          },
          "output":{
             "encoding":"1024",
             "ccsid":"1024",
             "data":"dati_out"
          }
       }
    }

The attended result is something like that:

  • calledid -> aaa
  • callerversion -> 1
  • callerid -> bbb

We tried something like that

[calledid] 
REGEX =(?<=called).*"id":"(?P<calledid>.*?)(?=") 
FORMAT = calledid::"$1" 
WRITE_META =true

but it dowsn't work cause it matches until the last id he finds. Such as:

":{"id":"","version":"","acronym":""},"storage":{"id":"

Thanks in advance.

0 Karma
1 Solution

piefragnisp
Explorer

We solve it using these regex:

[called_id]
REGEX = (?:info":{)(?:[\s\S]*?)(?:"called":{)(?:[\s\S]*?)(?:"id":)(?:(?:(?:")([^"]*)(?:"))|(null))(?:(?:[\}])|,)(?:(?:[^}]*)(?:}))
FORMAT = chiamato::"$1"
WRITE_META = true

[caller_id]
REGEX = (?:info":{)(?:[\s\S]*?)(?:"caller":{)(?:[\s\S]*?)(?:"id":)(?:(?:(?:")([^"]*)(?:"))|(null))(?:(?:[\}])|,)(?:(?:[^}]*)(?:}))
FORMAT = chiamante::"$1"
WRITE_META = true

in transforms.conf

TRANSFORMS-callee_id = callee_id
TRANSFORMS-caller_id = caller_id

and enabling the fields in field.conf

Hope will help other people.

View solution in original post

0 Karma

piefragnisp
Explorer

We solve it using these regex:

[called_id]
REGEX = (?:info":{)(?:[\s\S]*?)(?:"called":{)(?:[\s\S]*?)(?:"id":)(?:(?:(?:")([^"]*)(?:"))|(null))(?:(?:[\}])|,)(?:(?:[^}]*)(?:}))
FORMAT = chiamato::"$1"
WRITE_META = true

[caller_id]
REGEX = (?:info":{)(?:[\s\S]*?)(?:"caller":{)(?:[\s\S]*?)(?:"id":)(?:(?:(?:")([^"]*)(?:"))|(null))(?:(?:[\}])|,)(?:(?:[^}]*)(?:}))
FORMAT = chiamante::"$1"
WRITE_META = true

in transforms.conf

TRANSFORMS-callee_id = callee_id
TRANSFORMS-caller_id = caller_id

and enabling the fields in field.conf

Hope will help other people.

0 Karma

wenthold
Communicator

*Update: Based on the data you provided in another comment I tweaked the regex.

I would avoid lookaheads and lookbehinds if possible, especially with a big payload. It's too easy to have an poorly performing or broken regex.

You also don't need to use the FORMAT command in transforms.conf if your regex is formatted to include the field names with the extractions.

You can extract the caller and called ID and version fields with two stanzas, one for caller and one for called.

[callerid]
REGEX = caller\"\s*:\s*\{\s*\"id\":\s*\"(?<callerid>[^\"]+)?\"\,\s*\"version\":\s*\"(?<callerversion>[^\"]+)?\"

[calledid]
REGEX = called\":\s*\{\s*\"id\":\s*\"(?<calledid>[^\"]+)?\"\,\s*\"version\":\s*\"(?<calledversion>[^\"]+)?\"

This was the inline search I used to test it:

| makeresults | eval test="{\"info\": {\"eventSource\": \"RPA\", \"sourceType\": \"I\", \"status\": {\"code\": \"0000\", \"msg\": \"Inizio Schedulazione\", \"msgError\": \"\"}, \"transactionId\": \"66083\", \"traceId\": \"124021\", \"timestampStart\": \"2019-10-16T11:34:00.000Z\", \"timestampEnd\": \"null\", \"companyIDCode\": \"01\", \"channelIDCode\": \"\", \"branchCode\": \"\", \"searchFields\": [{\"VDI\": \"WPVRTM2004\"}, {\"PROCESSO\": \"Assegni\"}], \"annotation\": [{\"TIPO\": \"SCHEDULAZIONE\"}, {\"RISORSE POOL\": \"SI\"}], \"caller\": {\"id\": \"VWFM\", \"version\": \"1\", \"acronym\": \"WRPA0\"}, \"called\": {\"id\": \"Assegni\", \"version\": \"1\", \"acronym\": \"WRPA0\"}}, \"payLoad\": {\"output\": {\"encoding\": \"\", \"ccsid\": \"\", \"data\": \"\"}, \"input\": {\"encoding\": \"\", \"ccsid\": \"\", \"data\": \"\"}}}"
| rex field=test "called\":\s*\{\s*\"id\":\s*\"(?<calledid>[^\"]+)?\"\,\s*\"version\":\s*\"(?<calledversion>[^\"]+)?\""
| rex field=test "caller\"\s*:\s*\{\s*\"id\":\s*\"(?<callerid>[^\"]+)?\"\,\s*\"version\":\s*\"(?<callerversion>[^\"]+)?\""
0 Karma

piefragnisp
Explorer

@wenthold this is how I've configured based on your tweaks:

in transformation.conf

[caller]
REGEX = caller\"\s*:\s*\{\s*\"id\":\s*\"(?<callerid>[^\"]+)?\"\,\s*\"version\":\s*\"(?<callerversion>[^\"]+)?\"
[called]
REGEX = called\":\s*\{\s*\"id\":\s*\"(?<calledid>[^\"]+)?\"\,\s*\"version\":\s*\"(?<calledversion>[^\"]+)?\"

in fields.conf

[caller]
INDEXED = true
[called]
INDEXED = true

in props.conf

[evento_srctyp]
TRANSFORMS-caller = caller
TRANSFORMS-called = called

then I tried to perform a search like

| tstats values where index=nbp_index_application by callerid

but I had no results. Also if I search in the extracted fields I don't found the fields callerid, callerversion, calledid, calledversion

0 Karma

wenthold
Communicator

If you run btool from the command line do you see the configuration items listed?
/opt/splunk/bin/splunk cmd btool props list evento_srctyp --debug | less

Replace /opt/splunk with whatever your $SPLUNK_HOME path is. I don't know if "evento_srctyp" is the actual sourcetype or if you masked it, but "splunk cmd btool props list {SOURCETYPE} --debug" will dump out the configuration.

I wouldn't worry about indexing the field until you have the extractions working. When you can run the search:sourcetype=evento_srctyp | table _time host source callerid callerversion calledid calledversion and get results, then review link text and then work on the indexed extraction.

0 Karma

piefragnisp
Explorer

From the deployment server cli do you mean?

0 Karma

wenthold
Communicator

Use btool on a system where the configuration has been deployed, not on the deployment server.

I would probably test the field extraction part on a search head first, then when the extraction syntax has been deployed I would remove it from the testing search head and deploy it to the indexers or heavy forwarders and work on the indexed extraction.

0 Karma

piefragnisp
Explorer

@wenthold We fixed it adding WRITE_META = true in the transformation.conf; anyway running a search it extract only two fields (keeping the event I gave to you):

  • called (with the caller id): "Assegni"
  • caller (with the called id): "VWFM"

It seems to forget the callerversion and the calledversion

0 Karma

wenthold
Communicator

If two of the fields are working then my first guess would be that it's an issue with whitespacing in the regular expression.

Try running a search that will give you the raw json results and add the following:

| rex field=_raw "caller\"\s*:\s*\{\s*\"id\":\s*\"(?<test_callerid>[^\"]+)?\"\,\s*\"version\"\s*:\s*\"(?<test_callerversion>[^\"]+)?\""

See if you get the test_* fields extracted from the json. If the only thing at this point that needs to be tweaked is the regular expression, you might want to try tweaking it on regex101: https://regex101.com/r/4lGvKg/1

0 Karma
Get Updates on the Splunk Community!

Thanks for the Memories! Splunk University, .conf24, and Community Connections

Thank you to everyone in the Splunk Community who joined us for .conf24 – starting with Splunk University and ...

.conf24 | Day 0

Hello Splunk Community! My name is Chris, and I'm based in Canberra, Australia's capital, and I travelled for ...

Enhance Security Visibility with Splunk Enterprise Security 7.1 through Threat ...

 (view in My Videos)Struggling with alert fatigue, lack of context, and prioritization around security ...