Splunk Search

How to check extracted field data are consistent across the events and it can be used for data normalization?

Hemnaath
Motivator

Hi All,

Hey I had couple of  fields extracted and most of the field values are Null and contains lesser field value captured in it.

Example:

raw data 

{"name":"X-ABC-ConversationID","value":"79xxxxxxxxxxxxxxxxf76"}
{"name":"ABC-ConversationID","value":"3xxxxxxxxxxxxxxxxxxxxxb7a"}
{"name":"abc-conversationid","value":"cxxxxxxxxxxxxxxxxxee993d"}

 

Query: 

index=xxx sourcetype=xxx:xxx:xxx httpsourcename=xxx
| rex field=_raw "\{\"name\"\:\"(xxx|xxx|X)\-(ConversationID|conversationid|xxx)\-?(ConversationID|\")?(\"|,)(\"|,)\"value\"\:\"(?<xxx_ConversationID>[^\"]+)"
| table xxx_ConversationID | fillnull value=NULL

When I checked the statistic view , I could see that most of them have "NULL" value only few fields are filled with the exact value.

I do understand that there is a possibility that only few field might be having the value, but I wanted to check what is the percentage of values captured for that fields, so I followed the below steps.

In Splunk, I selected  the verbose mode and field name, executed the query for last 30 days and when checked the Percentage for the selected field I could see the below value 

Total Events captured for 30 days is 13,628,581 (All Events).

For selected extracted field value

24 Values, 0.732% of events

Question:

1) The Percentage of the event is less than 1% , so  whether should I consider this field value for data normalization .

2) Is there a query to find the amount of data captured for the extracted field containing the value alone, excluding the NULL value.

I had used the below  query for finding the unique value and their count.

index=xxx sourcetype=xxx:xxx:xxx httpsourcename=xxx
| rex field=_raw "\{\"name\"\:\"(xxx|xxx|X)\-(ConversationID|conversationid|xxx)\-?(ConversationID|\")?(\"|,)(\"|,)\"value\"\:\"(?<xxx_ConversationID>[^\"]+)"
| table xxx_ConversationID | fillnull value=NULL | where xxx_ConversationID!="NULL" | stats values(xxx_ConversationID),count(xxx_ConversationID)

But when I want to find for 36 fields, I am not sure how to write a query to check the same, can you guide me on the query please.

Thanks in advance. 

 

 

Labels (2)
Tags (2)
0 Karma

tscroggins
Builder

@Hemnaath 

For arbitrary JSON data, a simple way to summarize field values may be:

index=xxx sourcetype=xxx:xxx:xxx httpsourcename=xxx
| spath
| fieldsummary

0 Karma
Take the 2021 Splunk Career Survey

Help us learn about how Splunk has
impacted your career by taking the 2021 Splunk Career Survey.

Earn $50 in Amazon cash!