Getting Data In

help parsing mixed unstructured/JSON events coming from DBX

Communicator

Hello, I'm having trouble parsing this events for a client. here is the data route:

1 the data is stored into some DB
2 Splunk brings the data with DBX
3 in the same instance, splunk indexes the data, in this raw format, this data contains fields added from the DBX input, my field of interest is log_json, here is an example of my raw data:

2019-07-12 12:34:38 log_json="{ "Timestamp": "2019-07-12 11:30:15.320","Resultado": { "timestamp": "2019-07-12 11:30:15.320", "idSolicitud": 1168, "path": "/solicitudes/solicitudes" },"Swagger": {"tipoSolicitud":"07","idCanal":"O","estado":"01","segmento":"4000","asistente":"FDCZ","idSucursal":46,"sucursal":"0046","gestionDocumental":"DANY000000005120","fechaCreacion":"2018-01-10 16:57:11","alzada":"Z002","Cliente":[{"idSolicitud":0,"tipoDocumento":"3","numeroDocumento":"123467","relacionDependencia":"1","tipoPersona":"F","nombre":"XXXX","segundoNombre":"J.","apellido":"Fernanez","fechaNacimiento":"1984-01-01 16:57:11","cuil":"XXXXXX","sexo":"F","titular":true,"paisNacimiento":"ARG","nacionalidad":"ARG","residencia":"1","estadoCivil":"1","manejoBienes":"N","telefonoFijo":"4485-XXXX","telefonoCelular":"11XXXXXXX","email":"gaston.XXXXXX","aportaIngreso":"1","empleado":true,"esListaInformados":false,"listaInformadosPorcentaje":1,"esAdicional":false,"poseeResidenciaDistintaArgentina":true,"PEP":{"esPEP":true,"cargo":"Secretario","funcion":"Secretario de Salud","jerarquia":"Ministro\/Secretario"},"FATCA":[{"idFATCA":0,"fechaInicio":"2018-12-18 16:57:11","paisResidencia":"ARG","indicadorCRS":true,"indicadorDDJJ":false,"indicadorFatca":true,"indicadorNIFTIF":false,"indicadorExceptuado":false,"numeroNIFTIF":"1","pais":"AR","ciudad":"La Plata","calle":"XX","numero":0,"numeroIdentificacionFiscal":1}],"SujetoObligado":{"esSujetoObligado":false,"fechaInscripcion":"2018-12-18 16:57:11","fechaVencConstanciaUIF":"2018-12-18 16:57:11","indicadorDDJJ":false,"indicadorInscripcionUIF":false},"Referencia":{"nombreRef":"GASTON","apellidoRef":"TABARES","telefonoRef":"XXXXXX"},"DatosImpositivos":{"condicionesFrenteGanancias":"04","condicionFrenteIVA":"04","condicionIB":"04","numeroInscripcionIB":"XXXXXX","practicaAjusteInflacion":1},"Veraz":{"fechaConsulta":"2018-12-17 16:57:11","nivelSocioEconomico":"1","resultado":"1","compromiso":0E+0,"compromisoSinTarjeta":0E+0,"score":0E+0,"rangoScore":0E+0,"poblacion":"1","aplicaPersonalBank":"1","montoFinanciado":0E+0,"movistar":"NO"},"Direccion":[{"tipoDireccion":"01","contactoTelCaracteristica":"11","contactoTelNumero":"4215-0343","telCaracteristica":"11","telNumero":"XXXXX","orden":1,"calle":"XXXX","numero":"360","piso":"1","dpto":"1","barrio":"Guillermo ","manzana":"9X","lote":"24","torre":"-","entreCalles":"Entre A y B","cp":"1754","provincia":"02","localidad":"San XXXX"}],"DatosFamiliares":[{"parentesco":"CO","nombre":"Conyugue","apellido":"Apelldo","tipoDocumento":"3","numeroDocumento":"123456X78","fechaNacimiento":"1999-11-13 16:57:11","cuil":"27123456785","sexo":"F","ingresoNetosMensuales":4.5E+4,"paisNacimiento":"ARG","nacionalidad":"ARG"},{"parentesco":"PA","nombre":"nombrePadre","apellido":"apellidoPadre","tipoDocumento":"3","numeroDocumento":"12345678","fechaNacimiento":"1999-11-13 16:57:11","cuil":"1","sexo":"M","ingresoNetosMensuales":4.5E+4,"paisNacimiento":"ARG","nacionalidad":"ARG"},{"parentesco":"MA","nombre":"nombreMadre","apellido":"apellidoMadre","tipoDocumento":"3","numeroDocumento":"12345678","fechaNacimiento":"1999-11-13 16:57:11","cuil":"1","sexo":"F","ingresoNetosMensuales":4.5E+4,"paisNacimiento":"ARG","nacionalidad":"ARG"}],"DatosLaborales":[{"actividadPrincipal":"03","acreditaSueldo":true,"grupoPSI":"0","empresa":"EMPRESA X","cuit":XXXXXX1,"ramo":"000XXXXX","cargo":"Desarrollador","actividad":"03","ingresosNetosMensuales":4.5E+4,"fechaIngreso":"2000-01-01 16:57:11","profesion":"0334","otrosIngresos":false,"montoNetoMensual":4.5E+4,"origenFondos":"S"}]}]}}"

what I need to achieve is to see my events in JSON format with highlights and such stuff at index time, a second option could be spath at search time.
I wasnt able to do it so far. I was able to extract a field called json with transforms. but no hightlights at all.

Can you please help me out?

0 Karma
1 Solution

Esteemed Legend

To do spath at search time automatically, just set this in your props.conf on your Search Head:

[<Your sourcetype value here]
KV_MODE = json

But you need to make sure that your _raw is proper JSON so you need this on your indexers, too:

[<Your sourcetype value here]
SEDCMD-strip_non_JSON = s/^[^{]+|[^}]+$//g

View solution in original post

0 Karma

SplunkTrust
SplunkTrust

How about this wonderful regex that only has 80 steps compared to 800,000 (I kid you not) steps from @woodcock 's solution.

SEDCMD-strip_non_JSON_ending = s/^.+?\{(.+)}"$/\1/g

Esteemed Legend

To do spath at search time automatically, just set this in your props.conf on your Search Head:

[<Your sourcetype value here]
KV_MODE = json

But you need to make sure that your _raw is proper JSON so you need this on your indexers, too:

[<Your sourcetype value here]
SEDCMD-strip_non_JSON = s/^[^{]+|[^}]+$//g

View solution in original post

0 Karma

Communicator

Thanks a lot for your answer @woodcock

I was able to achieve the parsing and highlight in search time with this: | rex mode=sed "s/^[^{]+|[^}]$//" at the end of the query.

But still unable to do it at index time (so far I tried adding the data in my lab, with no forwarders, because I can go to my client environment on fridays only).

I put this on my lab instance's(single instance) props file in search app:

[testjson]
SEDCMD-strip_non_JSON = s/^[^{]+|[^}]$//
  KV_MODE = json

but no highlight or strip the non json part is being made. any tip?

thanks alot!

0 Karma

Esteemed Legend

I fixed it thanks to @xpac; I just need to add a trailing g. Now you can do it in one line.

Communicator

Thanks a lot, will try on friday

Esteemed Legend

If you are doing a sourcetype override/overwrite, you must use the ORIGINAL value, NOT the new value, then you must deploy this to the first full instance(s) of Splunk that handles the events (usually either the HF tier, if you use one, or else your Indexer tier), restart all Splunk instances there, send in new events (old events will stay broken), then test using _index_earliest=-5m to be absolutely certain that you are only examining the newly indexed events.

0 Karma

Communicator

Thanks for your answer @woodcock

Im doing |delete with my sample events, then I do a CRCSALT on my local input to re-ingest the events.
So far I managed to get rid of the extra " doing this on my props.conf:

[testjson]
SEDCMD-strip_non_JSON = s/^[^{]+|[^}]$//
SEDCMD-strip_non_JSON_ending = s/"$//
KV_MODE = json

IT works! but I'd like to invoke the sed script just once. I mean, it mustnt be the most performant thing to let the events be modified with sed twice instead of just once.

thanks!

0 Karma

Esteemed Legend

My SEDCMD does the front and back end BOTH at once. Why are you not using mine? I gave you a COMPLETE answer.

0 Karma

Esteemed Legend

In any case, if this got you a working solution, please do come back and click Accept to close it.

0 Karma

Communicator

When I try with this sed:

 SEDCMD-strip_non_JSON = s/^[^{]+|[^}]$//

it does not capture the final " character at the end of the string. the only way I managed to remove the " character is to make another sed sentence, to remove it with:

SEDCMD-strip_non_JSON_ending = s/"$//

along with your sed.

I will mark it as the answer anyway! as always, thanks a lot woodcock!

0 Karma

Esteemed Legend

You are correct; it should do both but doesn't and I am not sure why but I will get to the bottom of it shortly and post a correction here.

0 Karma

Communicator

I found the problem at index time:

this original event:
2019-07-05 16:34:57 timestamp="2019-07-05 16:34:57.497", log_json="{ "Timestamp": "2019-07-05 16:34:57.497","Resultado": { "timestamp": "2019-07-05 16:34:57.497", "idSolicitud": XXXX, "path": "/solicitudes/solicitudes" },"Swagger": {"tipoSolicitud":"08","idCanal":"O","estado":"01","segmento":"4000","asistente":"FDCZ","idSucursal":X6,"sucursal":"004X","gestionDocumental":"DANY0000000XXXX","fechaCreacion":"2018-01-10 16:57:11","alzada":"Z002","Cliente":[{"idSolicitud":0,"tipoDocumento":"3","numeroDocumento":"3XXXXXX","relacionDependencia":"1","tipoPersona":"F","nombre":"DXXXXX","segundoNombre":"J.","apellido":"FerXXez","fechaNacimiento":"XXXX-01-01 16:57:11","cuil":"3XXXXXX","sexo":"F","titular":true,"paisNacimiento":"ARG","nacionalidad":"ARG","residencia":"1","estadoCivil":"1","manejoBienes":"N","telefonoFijo":"4485-XXXX","telefonoCelular":"11324XXXXX","email":"XXXXab@XXX.com.XX","aportaIngreso":"1","empleado":true,"esListaInformados":false,"listaInformadosPorcentaje":1,"esAdicional":false,"poseeResidenciaDistintaArgentina":true,"PEP":{"esPEP":true,"cargo":"Secretario","funcion":"Secretario de Salud","jerarquia":"Ministro\/Secretario"},"FATCA":[{"idFATCA":0,"fechaInicio":"2018-12-18 16:57:11","paisResidencia":"ARG","indicadorCRS":true,"indicadorDDJJ":false,"indicadorFatca":true,"indicadorNIFTIF":false,"indicadorExceptuado":false,"numeroNIFTIF":"1","pais":"AR","ciudad":"La Plata","calle":"45","numero":0,"numeroIdentificacionFiscal":1}],"SujetoObligado":{"esSujetoObligado":false,"fechaInscripcion":"2018-12-18 16:57:11","fechaVencConstanciaUIF":"2018-12-18 16:57:11","indicadorDDJJ":false,"indicadorInscripcionUIF":false},"Referencia":{"nombreRef":"GASTON","apellidoRef":"TABARES","telefonoRef":"XXXXXXXX"},"DatosImpositivos":{"condicionesFrenteGanancias":"04","condicionFrenteIVA":"04","condicionIB":"04","numeroInscripcionIB":"100XXXXX","practicaAjusteInflacion":1},"Veraz":{"fechaConsulta":"2018-12-17 16:57:11","nivelSocioEconomico":"1","resultado":"1","compromiso":0E+0,"compromisoSinTarjeta":0E+0,"score":0E+0,"rangoScore":0E+0,"poblacion":"1","aplicaPersonalBank":"1","montoFinanciado":0E+0,"XXXXar":"NO"},"Direccion":[{"tipoDireccion":"01","contactoTelCaracteristica":"11","contactoTelNumero":"4XXXX3","telCaracteristica":"11","telNumero":"44XXX3","orden":1,"calle":"OCAMPO","numero":"36XXX0","piso":"X","dpto":"1","barrio":"Guillermo ","manzana":"92","lote":"XX","torre":"-","entreCalles":"Entre XXX","cp":"1XXX","provincia":"02","localidad":"San JXXXX"}],"DatosFamiliares":[{"parentesco":"CO","nombre":"Conyugue","apellido":"Apelldo","tipoDocumento":"3","numeroDocumento":"1XXXX","fechaNacimiento":"1999-11-13 16:57:11","cuil":"2XXXXXX5","sexo":"F","ingresoNetosMensuales":4.5E+4,"paisNacimiento":"ARG","nacionalidad":"ARG"},{"parentesco":"PA","nombre":"nombrePadre","apellido":"apellidoPadre","tipoDocumento":"3","numeroDocumento":"123XX","fechaNacimiento":"1999-11-13 16:57:11","cuil":"1","sexo":"M","ingresoNetosMensuales":4.5E+4,"paisNacimiento":"ARG","nacionalidad":"ARG"},{"parentesco":"MA","nombre":"nombreMadre","apellido":"apellidoMadre","tipoDocumento":"3","numeroDocumento":"1XXXX678","fechaNacimiento":"1999-11-13 16:57:11","cuil":"1","sexo":"X","ingresoNetosMensuales":4.5E+4,"paisNacimiento":"ARG","nacionalidad":"ARG"}],"DatosLaborales":[{"actividadPrincipal":"03","acreditaSueldo":true,"grupoPSI":"0","empresa":"EMPRESA X","cuit":XXXX991,"ramo":"000XXXX","cargo":"Desarrollador","actividad":"03","ingresosNetosMensuales":4.5E+4,"fechaIngreso":"2000-01-01 16:57:11","profesion":"0334","otrosIngresos":false,"montoNetoMensual":4.5E+4,"origenFondos":"S"}]}]}}"

becomes this after the ingest and the sedcmd i put up there:

{ "Timestamp": "2019-07-05 16:34:57.497","Resultado": { "timestamp": "2019-07-05 16:34:57.497", "idSolicitud": XXXX, "path": "/solicitudes/solicitudes" },"Swagger": {"tipoSolicitud":"08","idCanal":"O","estado":"01","segmento":"4000","asistente":"FDCZ","idSucursal":X6,"sucursal":"004X","gestionDocumental":"DANY0000000XXXX","fechaCreacion":"2018-01-10 16:57:11","alzada":"Z002","Cliente":[{"idSolicitud":0,"tipoDocumento":"3","numeroDocumento":"3XXXXXX","relacionDependencia":"1","tipoPersona":"F","nombre":"DXXXXX","segundoNombre":"J.","apellido":"FerXXez","fechaNacimiento":"XXXX-01-01 16:57:11","cuil":"3XXXXXX","sexo":"F","titular":true,"paisNacimiento":"ARG","nacionalidad":"ARG","residencia":"1","estadoCivil":"1","manejoBienes":"N","telefonoFijo":"4485-XXXX","telefonoCelular":"11324XXXXX","email":"XXXXab@XXX.com.XX","aportaIngreso":"1","empleado":true,"esListaInformados":false,"listaInformadosPorcentaje":1,"esAdicional":false,"poseeResidenciaDistintaArgentina":true,"PEP":{"esPEP":true,"cargo":"Secretario","funcion":"Secretario de Salud","jerarquia":"Ministro\/Secretario"},"FATCA":[{"idFATCA":0,"fechaInicio":"2018-12-18 16:57:11","paisResidencia":"ARG","indicadorCRS":true,"indicadorDDJJ":false,"indicadorFatca":true,"indicadorNIFTIF":false,"indicadorExceptuado":false,"numeroNIFTIF":"1","pais":"AR","ciudad":"La Plata","calle":"45","numero":0,"numeroIdentificacionFiscal":1}],"SujetoObligado":{"esSujetoObligado":false,"fechaInscripcion":"2018-12-18 16:57:11","fechaVencConstanciaUIF":"2018-12-18 16:57:11","indicadorDDJJ":false,"indicadorInscripcionUIF":false},"Referencia":{"nombreRef":"GASTON","apellidoRef":"TABARES","telefonoRef":"XXXXXXXX"},"DatosImpositivos":{"condicionesFrenteGanancias":"04","condicionFrenteIVA":"04","condicionIB":"04","numeroInscripcionIB":"100XXXXX","practicaAjusteInflacion":1},"Veraz":{"fechaConsulta":"2018-12-17 16:57:11","nivelSocioEconomico":"1","resultado":"1","compromiso":0E+0,"compromisoSinTarjeta":0E+0,"score":0E+0,"rangoScore":0E+0,"poblacion":"1","aplicaPersonalBank":"1","montoFinanciado":0E+0,"XXXXar":"NO"},"Direccion":[{"tipoDireccion":"01","contactoTelCaracteristica":"11","contactoTelNumero":"4XXXX3","telCaracteristica":"11","telNumero":"44XXX3","orden":1,"calle":"OCAMPO","numero":"36XXX0","piso":"X","dpto":"1","barrio":"Guillermo ","manzana":"92","lote":"XX","torre":"-","entreCalles":"Entre XXX","cp":"1XXX","provincia":"02","localidad":"San JXXXX"}],"DatosFamiliares":[{"parentesco":"CO","nombre":"Conyugue","apellido":"Apelldo","tipoDocumento":"3","numeroDocumento":"1XXXX","fechaNacimiento":"1999-11-13 16:57:11","cuil":"2XXXXXX5","sexo":"F","ingresoNetosMensuales":4.5E+4,"paisNacimiento":"ARG","nacionalidad":"ARG"},{"parentesco":"PA","nombre":"nombrePadre","apellido":"apellidoPadre","tipoDocumento":"3","numeroDocumento":"123XX","fechaNacimiento":"1999-11-13 16:57:11","cuil":"1","sexo":"M","ingresoNetosMensuales":4.5E+4,"paisNacimiento":"ARG","nacionalidad":"ARG"},{"parentesco":"MA","nombre":"nombreMadre","apellido":"apellidoMadre","tipoDocumento":"3","numeroDocumento":"1XXXX678","fechaNacimiento":"1999-11-13 16:57:11","cuil":"1","sexo":"X","ingresoNetosMensuales":4.5E+4,"paisNacimiento":"ARG","nacionalidad":"ARG"}],"DatosLaborales":[{"actividadPrincipal":"03","acreditaSueldo":true,"grupoPSI":"0","empresa":"EMPRESA X","cuit":XXXX991,"ramo":"000XXXX","cargo":"Desarrollador","actividad":"03","ingresosNetosMensuales":4.5E+4,"fechaIngreso":"2000-01-01 16:57:11","profesion":"0334","otrosIngresos":false,"montoNetoMensual":4.5E+4,"origenFondos":"S"}]}]}}"

Notice the extra " at the end of the event. At search time it works like a charm, in index time the events are left with the extra " that breaks the json highlight

0 Karma