Splunk Search

Is it possible to replace null fields at index-time?

bvivi57
Observer

Hi,

My saved searches have to run as quickly as possible. I index CSV files whose columns are sometimes empty, and I have to fill in a default value with the fillnull command because the data is consumed by external software (Tableau).

The docs say (https://docs.splunk.com/Documentation/ODBC/2.1.0/UseODBC/Troubleshooting):

"Null fields are not handled in the same way as you might be used to with other database systems. For example, they might inconsistently appear when you add or remove columns to your query.
This behavior is expected. To prevent this from happening, add functionality to your report (saved search in Splunk Enterprise 5) that gives null fields a constant literal value—for example, the string "Null". This ensures that null fields appear consistently."

But the fillnull command slows down the search. So I would like empty fields to be tagged with a default value at index time, to avoid calling fillnull at search time. Is that possible?

0 Karma

sjohnson_splunk
Splunk Employee

Your props.conf setting looks correct. This operation is performed at index time. Are you pushing it in an app to your indexer or heavy forwarder?

0 Karma

bvivi57
Observer

My app is on the Heavy Forwarder (Windows Server 2012 R2) and on the Search Head (CentOS 7). I have nothing on my Indexer (CentOS 7).

0 Karma

somesoni2
Revered Legend

fillnull done at search time makes searching less efficient, and filling at index time makes indexing less efficient. If you're willing to accept that, props.conf has something called SEDCMD (to be put in the sourcetype definition at the indexers), which you can use to replace blank values with something suiting your need (e.g., for your CSV data, replacing ,, with ,Null,).

E.g.

[yoursourcetype]
..other settings..
SEDCMD-replaceblanks = s/,,/,Null,/g
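To see what this substitution does to an event (and where it falls short, as noted below in the thread), here is a minimal Python sketch of the same sed expression; the sample row is made up, and Python's re.sub replaces non-overlapping matches left to right, just like sed's /g flag:

```python
import re

# A hypothetical CSV row with single and consecutive empty fields.
row = "a,,b,,,c"

# Equivalent of SEDCMD s/,,/,Null,/g: one left-to-right,
# non-overlapping pass over the raw event text.
once = re.sub(",,", ",Null,", row)
print(once)  # a,Null,b,Null,,c
# The middle comma of ",,," was consumed by the first match,
# so one blank in the run of consecutive empties survives.
```

Running the same substitution a second time catches the leftover pair, which is why the replies further down apply it more than once.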

LIS
Path Finder

Hello @somesoni2 ,

Thank you for this approach. However, it only works when there is a single empty value in a row; with consecutive empty values it doesn't replace every one properly.

Example: 

D,2,,200,00,8842,,USA,,1989,,2,320301120086,,,,,19899717024,,,320335100002,,,,,:,,,0,0,0,S,00000000,0,0.0,19899717024,104129,,,0,,,,,

 

Could you please suggest a solution.

 

0 Karma

LIS
Path Finder

solution:

SEDCMD-replaceblanks1 = s/,,/,-,/g
SEDCMD-replaceblanks2 = s/,,/,-,/g
SEDCMD-replaceblanks3 = s/,,/,-,/g
SEDCMD-replaceblanks4 = s/,,/,-,/g
SEDCMD-replaceblanks5 = s/,,/,-,/g
SEDCMD-replaceblanks6 = s/,,/,-,/g
SEDCMD-replaceblanks7 = s/,,/,-,/g
SEDCMD-replaceblanks8 = s/,,/,-,/g

 

0 Karma

isoutamo
SplunkTrust

This seems to work in the GUI.

| makeresults 
| eval data="D,2,,200,00,8842,,USA,,1989,,2,320301120086,,,,,19899717024,,,320335100002,,,,,:,,,0,0,0,S,00000000,0,0.0,19899717024,104129,,,0,,,,,"
| rex mode=sed field=data "s/,,/,Null,/g"
| rex mode=sed field=data "s/,,/,Null,/g"
| rex mode=sed field=data "s/^,/Null,/g"
| rex mode=sed field=data "s/,$/,Null/g"
| table data

This is needed twice because sed substitutions don't overlap: each ,, match consumes both commas, so in a run of consecutive empty fields every other blank is skipped on the first pass, and the second pass picks up the rest.

| rex mode=sed field=data "s/,,/,Null,/g"

These two lines are needed to handle the first and last fields (,Null and Null,) correctly, since a leading or trailing empty field has only one adjacent comma.

| rex mode=sed field=data "s/^,/Null,/g"
| rex mode=sed field=data "s/,$/,Null/g"
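As a sanity check outside Splunk, the same four substitutions can be reproduced with Python's re module (a sketch of the logic, not Splunk itself); two interior passes are always enough, no matter how long the run of blanks is, because after the first pass every remaining ,, pair is surrounded by filler:

```python
import re

def fill_blanks(row, filler="Null"):
    # Two passes for interior blanks: each pass replaces
    # non-overlapping ,, pairs, and after the first pass the
    # remaining blanks are no longer adjacent to each other.
    row = re.sub(",,", f",{filler},", row)
    row = re.sub(",,", f",{filler},", row)
    # Leading / trailing empty fields have only one comma,
    # so they need their own anchored substitutions.
    row = re.sub("^,", f"{filler},", row)
    row = re.sub(",$", f",{filler}", row)
    return row

print(fill_blanks("a,,,,b"))  # a,Null,Null,Null,b
```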

I think you could set up the same replacements at index time in props.conf/transforms.conf based on the above?

0 Karma

bvivi57
Observer

Hi,
Thanks for your help. I have almost reached my goal, but the fields do not get the value "Null".
I have this configuration in my props.conf:

SEDCMD-replaceblanks = s/;;/;Null;/g

And the result is:
(screenshot)

But I am looking for this result:
(screenshot)

0 Karma

somesoni2
Revered Legend

Seems like the field extraction is broken. Could you post the props/transforms from your Search Head for this sourcetype?

0 Karma

bvivi57
Observer

Hi,
Thanks for your help!

My props.conf :

[csv_report_tab]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = date
TIME_FORMAT = %d/%m/%Y
category = Structured
description = "Source type du fichier CSV"
disabled = false
pulldown_type = true
SEDCMD-replaceblanks = s/;;/;Null;/g
TRANSFORMS-id_source = trans_id_source

My transforms.conf

[trans_id_source]
SOURCE_KEY = MetaData:Source
REGEX = ^(?:[^\\\n]*\\){7}\w+_(?P<portefeuille_id>\d+)_(?P<date_trt>\d+)_(?P<id_dollaru>\d+)
FORMAT = portefeuille_id::$1 date_trt::$2 id_dollaru::$3 base::$1"_"$2
WRITE_META = true
0 Karma