Hi All,
Hey I had couple of fields extracted and most of the field values are Null and contains lesser field value captured in it.
Example:
raw data
{"name":"X-ABC-ConversationID","value":"79xxxxxxxxxxxxxxxxf76"} |
{"name":"ABC-ConversationID","value":"3xxxxxxxxxxxxxxxxxxxxxb7a"} |
{"name":"abc-conversationid","value":"cxxxxxxxxxxxxxxxxxee993d"} |
Query:
index=xxx sourcetype=xxx:xxx:xxx httpsourcename=xxx
| rex field=_raw "\{\"name\"\:\"(xxx|xxx|X)\-(ConversationID|conversationid|xxx)\-?(ConversationID|\")?(\"|,)(\"|,)\"value\"\:\"(?<xxx_ConversationID>[^\"]+)"
| table xxx_ConversationID | fillnull value=NULL
When I checked the statistic view , I could see that most of them have "NULL" value only few fields are filled with the exact value.
I do understand that there is a possibility that only few field might be having the value, but I wanted to check what is the percentage of values captured for that fields, so I followed the below steps.
In Splunk, I selected the verbose mode and field name, executed the query for last 30 days and when checked the Percentage for the selected field I could see the below value
Total Events captured for 30 days is 13,628,581 (All Events).
For selected extracted field value
24 Values, 0.732% of events
Question:
1) The Percentage of the event is less than 1% , so whether should I consider this field value for data normalization .
2) Is there a query to find the amount of data captured for the extracted field containing the value alone, excluding the NULL value.
I had used the below query for finding the unique value and their count.
index=xxx sourcetype=xxx:xxx:xxx httpsourcename=xxx
| rex field=_raw "\{\"name\"\:\"(xxx|xxx|X)\-(ConversationID|conversationid|xxx)\-?(ConversationID|\")?(\"|,)(\"|,)\"value\"\:\"(?<xxx_ConversationID>[^\"]+)"
| table xxx_ConversationID | fillnull value=NULL | where xxx_ConversationID!="NULL" | stats values(xxx_ConversationID),count(xxx_ConversationID)
But when I want to find for 36 fields, I am not sure how to write a query to check the same, can you guide me on the query please.
Thanks in advance.
For arbitrary JSON data, a simple way to summarize field values may be:
index=xxx sourcetype=xxx:xxx:xxx httpsourcename=xxx
| spath
| fieldsummary