Getting Data In

One record, two values per field?

aaron_sakovich
Path Finder

We've been acquiring data for some time now via manual imports with CSV files. We're finishing up the process of automating that by importing JSON on a cron schedule. So far, it's been going simply great. Today, we hit a snag.
We have a source that has multiple date or date-time fields in it, so in order to ensure we get the right field to be used as the timestamp, we created a new sourcetype called dateTimeJSON that specifies the TimeStamp field as "DateDeleted", the field we're looking for.
If we search the index, we see the data and the correct number of events (13 in this test case). However, when we look at the data in a table, each field has two values in it, which doubles the results in all of our searches and dashboards. Here's what we see from a search as simple as "index=koha_dcards":
[Screenshot not preserved: a results table in which each field shows two values.]
What the heck can even cause something like this? How do we rectify it? None of our other indexes and data inputs has ever had a problem like this, and we've had to specify fields in the sourcetype before without issue.
I'm gobsmacked...

1 Solution

aaron_sakovich
Path Finder

Adding that one line (AUTO_KV_JSON = false) to the local/props.conf worked. Thanks!!


aaron_sakovich
Path Finder

Would this setting have any pertinence? Should it be "none" instead of "json"? Will that accomplish the same thing as setting AUTO_KV_JSON, but for this one sourcetype only?
[Screenshot not preserved: the sourcetype's indexed extractions setting, currently "json".]

maciep
Champion

You can certainly try setting indexed extractions to none. That may affect other things you're configuring, though, like the timestamp field: if those fields don't exist when the data is ingested into Splunk, you probably can't reference one as the timestamp field the way you do now. Indexed extractions are sort of an easy button for parsing data (getting data into Splunk).
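To make that trade-off concrete, here is a rough sketch of what a search-time-only version of the stanza might look like. It is untested and rests on assumptions: that DateDeleted appears in the raw JSON as a quoted string, and that TIMESTAMP_FIELDS can no longer be used once indexed extractions are off, so the timestamp has to be located in the raw text with TIME_PREFIX instead. The regex and the commented-out TIME_FORMAT are placeholders, not values taken from this thread, and line breaking may also need attention if the JSON events span multiple lines.

```ini
[dateTimeJSON]
# INDEXED_EXTRACTIONS is intentionally left out: no index-time JSON parsing.
# Fields would then come only from search-time JSON extraction.
KV_MODE = json
# Assumption: DateDeleted is a quoted string in the raw event, for example
# "DateDeleted": "<value>", so this regex stops right before the value.
TIME_PREFIX = "DateDeleted"\s*:\s*"
# Placeholder: set this to a strptime pattern matching your DateDeleted values.
# TIME_FORMAT =
```

Because the timestamp settings apply when data is indexed, a change like this would only affect newly ingested events, not the ones already in the index.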

The other option would be to set AUTO_KV_JSON to false for this sourcetype in props.conf. That would just turn off the JSON extraction that tries to run at search time.

[dateTimeJSON]
AUTO_KV_JSON = false
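
For reference, a sketch of how the complete stanza in etc/apps/search/local/props.conf might look with that line added. Every other setting is copied from the stanza posted in this thread. It assumes a single standalone instance, so the search-time setting can live in the same file as the index-time ones; in a distributed setup it would need to be on the search head.

```ini
[dateTimeJSON]
# Index-time settings (copied from the stanza posted in this thread)
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
TIMESTAMP_FIELDS = DateDeleted
category = Structured
description = Get the right date from record with multiple dates included.
disabled = false
pulldown_type = 1
# Search-time setting: skip the automatic JSON extraction so that fields
# already indexed by INDEXED_EXTRACTIONS are not extracted a second time.
AUTO_KV_JSON = false
```

Rerunning the btool command from this thread afterwards should show AUTO_KV_JSON = false coming from the local props.conf rather than from system/default.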

If you don't understand parse time and search time yet, I'd suggest reading this wiki article:

https://wiki.splunk.com/Where_do_I_configure_my_Splunk_settings%3F

maciep
Champion

Not sure if something is different with 8.x, but this typically means that KV_MODE=json is set for this sourcetype somehow. So the fields get indexed via the INDEXED_EXTRACTIONS setting and then get extracted again at search time via the KV_MODE setting.

Maybe at least rule it out with btool on your search head?

splunk btool props list dateTimeJSON --debug

And if not set on the sourcetype, maybe make sure it's not set in props for the source/host either?
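
To illustrate that last point: the btool command above lists only the [dateTimeJSON] stanza, so a search-time setting that arrives through a source:: or host:: stanza would not show up in its output. The stanzas below are purely hypothetical examples of what to look for in your props.conf files; the path and host patterns are placeholders, not anything taken from this thread.

```ini
# Hypothetical: a source-based stanza that would also enable
# search-time JSON extraction for matching events.
[source::/path/to/exports/*.json]
KV_MODE = json

# Hypothetical: a host-based stanza with the same effect.
[host::example-host*]
KV_MODE = json
```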

aaron_sakovich
Path Finder

Here are the results of the btool command. I don't know how to parse this; I hope you can let me know if there's anything pertinent:

/Applications/Splunk/etc/apps/search/local/props.conf [dateTimeJSON]
/Applications/Splunk/etc/system/default/props.conf    ADD_EXTRA_TIME_FIELDS = True
/Applications/Splunk/etc/system/default/props.conf    ANNOTATE_PUNCT = True
/Applications/Splunk/etc/system/default/props.conf    AUTO_KV_JSON = true
/Applications/Splunk/etc/system/default/props.conf    BREAK_ONLY_BEFORE = 
/Applications/Splunk/etc/system/default/props.conf    BREAK_ONLY_BEFORE_DATE = True
/Applications/Splunk/etc/system/default/props.conf    CHARSET = UTF-8
/Applications/Splunk/etc/apps/search/local/props.conf DATETIME_CONFIG = 
/Applications/Splunk/etc/system/default/props.conf    DEPTH_LIMIT = 1000
/Applications/Splunk/etc/system/default/props.conf    HEADER_MODE = 
/Applications/Splunk/etc/apps/search/local/props.conf INDEXED_EXTRACTIONS = json
/Applications/Splunk/etc/system/default/props.conf    LEARN_MODEL = true
/Applications/Splunk/etc/system/default/props.conf    LEARN_SOURCETYPE = true
/Applications/Splunk/etc/apps/search/local/props.conf LINE_BREAKER = ([\r\n]+)
/Applications/Splunk/etc/system/default/props.conf    LINE_BREAKER_LOOKBEHIND = 100
/Applications/Splunk/etc/system/default/props.conf    MATCH_LIMIT = 100000
/Applications/Splunk/etc/system/local/props.conf      MAX_DAYS_AGO = 5000
/Applications/Splunk/etc/system/default/props.conf    MAX_DAYS_HENCE = 2
/Applications/Splunk/etc/system/default/props.conf    MAX_DIFF_SECS_AGO = 3600
/Applications/Splunk/etc/system/default/props.conf    MAX_DIFF_SECS_HENCE = 604800
/Applications/Splunk/etc/system/default/props.conf    MAX_EVENTS = 256
/Applications/Splunk/etc/system/default/props.conf    MAX_TIMESTAMP_LOOKAHEAD = 128
/Applications/Splunk/etc/system/default/props.conf    MUST_BREAK_AFTER = 
/Applications/Splunk/etc/system/default/props.conf    MUST_NOT_BREAK_AFTER = 
/Applications/Splunk/etc/system/default/props.conf    MUST_NOT_BREAK_BEFORE = 
/Applications/Splunk/etc/apps/search/local/props.conf NO_BINARY_CHECK = true
/Applications/Splunk/etc/system/default/props.conf    SEGMENTATION = indexing
/Applications/Splunk/etc/system/default/props.conf    SEGMENTATION-all = full
/Applications/Splunk/etc/system/default/props.conf    SEGMENTATION-inner = inner
/Applications/Splunk/etc/system/default/props.conf    SEGMENTATION-outer = outer
/Applications/Splunk/etc/system/default/props.conf    SEGMENTATION-raw = none
/Applications/Splunk/etc/system/default/props.conf    SEGMENTATION-standard = standard
/Applications/Splunk/etc/system/default/props.conf    SHOULD_LINEMERGE = True
/Applications/Splunk/etc/apps/search/local/props.conf TIMESTAMP_FIELDS = DateDeleted
/Applications/Splunk/etc/system/default/props.conf    TRANSFORMS = 
/Applications/Splunk/etc/system/default/props.conf    TRUNCATE = 10000
/Applications/Splunk/etc/apps/search/local/props.conf category = Structured
/Applications/Splunk/etc/apps/search/local/props.conf description = Get the right date from record with multiple dates included.
/Applications/Splunk/etc/system/default/props.conf    detect_trailing_nulls = false
/Applications/Splunk/etc/apps/search/local/props.conf disabled = false
/Applications/Splunk/etc/system/default/props.conf    maxDist = 100
/Applications/Splunk/etc/system/default/props.conf    priority = 
/Applications/Splunk/etc/apps/search/local/props.conf pulldown_type = 1
/Applications/Splunk/etc/system/default/props.conf    sourcetype = 

maciep
Champion

Yeah, I would suggest setting AUTO_KV_JSON=false for your sourcetype as well.

From the docs:

AUTO_KV_JSON = <boolean>
* Used for search-time field extractions only.
* Specifies whether to try json extraction automatically.
* Default: true

somesoni2
SplunkTrust

Does the raw data in the source CSV file (assuming you have access to it) have single values in it? If yes, then it looks like field extraction is being done twice for that sourcetype. As @richgalloway suggested, find every props.conf stanza set up for that sourcetype so we can see whether there are any duplicate field-extraction configurations (for example, both index-time and search-time extraction being set up).

aaron_sakovich
Path Finder

Yes, all the CSV data, which goes back one year, has single entries. Only the JSON we input today is duplicated. I was only able to find the one props.conf with [dateTimeJSON] in it, in etc/apps/search/local/.
I should also mention that we're on 8.0.1.

jscraig2006
Communicator

I have the same issue with JSON from our Azure blob. _raw shows only one value per field, but the extracted fields have two values for a single event. Only when I use eval to rename the field do my reports drop the duplicate values. I only have one stanza per sourcetype as well.

richgalloway
SplunkTrust

Can you share the props.conf stanza for that sourcetype?

---
If this reply helps you, Karma would be appreciated.

aaron_sakovich
Path Finder
[dateTimeJSON]
DATETIME_CONFIG = 
INDEXED_EXTRACTIONS = json
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
TIMESTAMP_FIELDS = DateDeleted
category = Structured
description = Get the right date from record with multiple dates included.
disabled = false
pulldown_type = 1