Getting Data In

Duplicate entries in index with JSON and missing values

New Member

When I index JSON files, I get duplicate entries in the Splunk index, and some values are not indexed at all.

Example of the JSON files:

{
"State": "value"
"TimeStarted": "03-jan-2018 10:13:29",
"RBName": "Value",
"Tower": "Value",
"RBType": "Value",
"ManualTimeToExecute": 20,
"RefGUID": "cad8efd8-58c4-4924-add7-78c8f9768b83",
"TicketDetails": {
"TimeData": "03-jan-2018 10:13:30",
"Description": "Value",
"TicketNo": "Value",
"TimeCreated": "03-jan-2018 10:13:12",
"ShortDescription": "Value",
"State": "Value",
"ClientRefNumber": "Value"
},
"Activities": [
{
"LogLevel": "Information",
"LogTime": "03-jan-2018 10:13:31",
"Completion": "Success",
"Severity": "GOOD",
"ImpactedUser": "Value",
"Condition": "GOOD",
"LogMessage": " Value",
"ActionTaskName": "Value"
}
],
"Comment": "Value",
"Completion": "Success",
"Condition": "BAD",
"EndTime": "03-jan-2018 10:13:57",
"Severity": "WARNING"
}

Each JSON file contains one array, which can hold up to 30 items, and the file name of each JSON file is unique.
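As a sanity check outside Splunk, a minimal sketch like the one below could confirm that a file parses as JSON and report how many items its Activities array holds. The helper name `summarize_json` and the two inline sample strings are illustrative assumptions, not part of the original files.

```python
import json

def summarize_json(text):
    """Return (is_valid, activity_count, error_message) for one JSON document."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        # Malformed input, e.g. a missing comma between two keys
        return False, 0, str(exc)
    # Only a top-level object can carry an "Activities" key
    activities = doc.get("Activities", []) if isinstance(doc, dict) else []
    return True, len(activities), ""

# Hypothetical samples: one valid document, one with a missing comma
good = '{"State": "value", "Activities": [{"LogLevel": "Information"}]}'
bad = '{"State": "value" "TimeStarted": "03-jan-2018 10:13:29"}'

print(summarize_json(good))
print(summarize_json(bad))
```

Running this over every file in the monitored folder would show whether any file is malformed before it reaches Splunk.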

The result of indexing the JSON files is:
[screenshot]

I use Splunk 7.1 and the default _json source type to index the files. The JSON files are in a folder on the same server where Splunk is installed:
[screenshot]

Any idea how to fix the duplicate entries in the index and why some values are not indexed at all?

New Member

For screenshot no. 3, I started a new index, and it shows that each file is indexed twice.

Champion

Are you sure the duplicate RefGUIDs are incorrect? That would make it sound like events were indexed twice, instead of parsed twice.

And the Condition may not necessarily be wrong either. Do you see a single event that has the Condition value in the JSON but not parsed out by Splunk?

New Member

I tried a complete reinstall of Splunk, with the same results. 😞
What I noticed is that, for the JSON files with missing values, only roughly the first 160 lines are indexed; somehow Splunk doesn't index the complete file. Is there a limit somewhere that I need to increase? Some of the JSON files are up to 500-600 lines long.

New Member

I fixed the missing values by adding the following settings to the _json source type:
- TRUNCATE = 0
- MAX_EVENTS = 1000
Now the complete JSON files get indexed, but still twice.
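To see which files were likely affected before the change, a rough sketch such as the following compares each file against the defaults documented in props.conf.spec (TRUNCATE = 10000 bytes, MAX_EVENTS = 256 lines). How Splunk applies these limits depends on the line-breaking configuration, so this only flags files worth a closer look; the function name and the generated sample text are assumptions for illustration.

```python
# Defaults as documented in props.conf.spec; verify against your own version
DEFAULT_TRUNCATE_BYTES = 10000
DEFAULT_MAX_EVENTS_LINES = 256

def check_against_defaults(text,
                           max_bytes=DEFAULT_TRUNCATE_BYTES,
                           max_lines=DEFAULT_MAX_EVENTS_LINES):
    """Report line and byte counts for one file and compare them to the limits."""
    line_count = len(text.splitlines())
    total_bytes = len(text.encode("utf-8"))
    return {
        "lines": line_count,
        "total_bytes": total_bytes,
        # More lines than MAX_EVENTS would merge into a single event
        "over_max_events": line_count > max_lines,
        # Only relevant if the whole file is treated as one unit
        "over_truncate_if_single_event": total_bytes > max_bytes,
    }

# Hypothetical 600-line file, similar in length to the larger JSON files
sample = "\n".join('"k%03d": "v",' % i for i in range(600))
report = check_against_defaults(sample)
print(report)
```

A file of 500-600 lines exceeds the default MAX_EVENTS value, which would be consistent with only the first part of such files being indexed.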

Any idea how to stop the JSON files from being indexed twice?

New Member

It really looks like all files are indexed twice rather than parsed twice. I started over with a clean index, and right after indexing starts you can see that the same file is indexed twice (check the file name GUID in screenshot 3).

Yes, I checked the original JSON files and they all contain a value in the Condition field.
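One way to back this up with numbers is to export the events (source file plus RefGUID) and count events per source. The sketch below assumes such an export already exists as a list of pairs; the sample data and the function name are hypothetical. Since each JSON file should yield one event, any source with a count above one points to the file having been indexed more than once.

```python
from collections import Counter

def sources_indexed_more_than_once(events):
    """events: iterable of (source, ref_guid) pairs exported from a search."""
    counts = Counter(source for source, _ in events)
    return {source: n for source, n in counts.items() if n > 1}

# Hypothetical export: a.json appears twice, b.json once
events = [
    ("a.json", "cad8efd8-58c4-4924-add7-78c8f9768b83"),
    ("a.json", "cad8efd8-58c4-4924-add7-78c8f9768b83"),
    ("b.json", "00000000-0000-0000-0000-000000000000"),
]
print(sources_indexed_more_than_once(events))
```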

Champion

Check the output of splunk list monitor to see if the file somehow shows up twice.

New Member

This lists exactly the 78 JSON files that are in the folder.
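With 78 entries, a duplicate that differs only in letter case or slash direction is easy to miss by eye. A small sketch like this could check a list of path lines copied from the command output; it makes no assumption about the exact output format, and the sample paths are hypothetical.

```python
from collections import Counter
import ntpath

def find_duplicate_paths(paths):
    """Return Windows-normalized paths that occur more than once, with counts."""
    normalized = [
        ntpath.normcase(ntpath.normpath(p.strip()))
        for p in paths
        if p.strip()  # skip empty lines
    ]
    counts = Counter(normalized)
    return {path: n for path, n in counts.items() if n > 1}

# Hypothetical path lines; the first two differ only in letter case
paths = [r"C:\json\a.json", r"c:\JSON\a.json", r"C:\json\b.json"]
print(find_duplicate_paths(paths))
```

An empty result would support the observation that each file is listed only once.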

Champion

Will you add the output of:

splunk btool props list _json --debug

(From your screenshot it looks like the sourcetype is _json)

New Member

Here is the output:
[screenshot]

Champion

There is certainly nothing in there that I'd expect to be causing this. Can you also send the inputs.conf responsible for this data?

New Member

I am testing on a clean install of Splunk.

Inputs.conf in splunk\etc\apps\search\default:

# Version 7.0.1

Inputs.conf in splunk\etc\system\default:

# Version 7.0.1
# DO NOT EDIT THIS FILE!
# Changes to default files will be lost on update and are difficult to
# manage and support.
# Please make any changes to system defaults by overriding them in
# apps or $SPLUNK_HOME/etc/system/local
# (See "Configuration file precedence" in the web documentation).
# To override a specific setting, copy the name of the stanza and
# setting to the file where you wish to override it.
# This file contains possible attributes and values you can use to
# configure inputs, distributed inputs and file system monitoring.

[default]
index = default
_rcvbuf = 1572864
host = $decideOnStartup
evt_resolve_ad_obj = 0
evt_dc_name=
evt_dns_name=

[blacklist:$SPLUNK_HOME\etc\auth]

[monitor://$SPLUNK_HOME\var\log\splunk]
index = _internal

[monitor://$SPLUNK_HOME\var\log\splunk\license_usage_summary.log]
index = _telemetry

[monitor://$SPLUNK_HOME\etc\splunk.version]
_TCP_ROUTING = *
index = _internal
sourcetype=splunk_version

[batch://$SPLUNK_HOME\var\spool\splunk]
move_policy = sinkhole
crcSalt =

[batch://$SPLUNK_HOME\var\spool\splunk\...stash_new]
queue = stashparsing
sourcetype = stash_new
move_policy = sinkhole
crcSalt =

[fschange:$SPLUNK_HOME\etc]
# poll every 10 minutes
pollPeriod = 600
# generate audit events into the audit index, instead of fschange events
signedaudit=true
recurse=true
followLinks=false
hashMaxSize=-1
fullEvent=false
sendEventMaxSize=-1
filesPerDelay = 10
delayInMills = 100

[udp]
connection_host=ip

[tcp]
acceptFrom=*
connection_host=dns

[splunktcp]
route=has_key:_replicationBucketUUID:replicationQueue;has_key:_dstrx:typingQueue;has_key:_linebreaker:indexQueue;absent_key:_linebreaker:parsingQueue
acceptFrom=*
connection_host=ip

[script]
interval = 60.0
start_by_shell = false

[SSL]
# SSL settings
# The following provides modern TLS configuration that guarantees forward-
# secrecy and efficiency. This configuration drops support for old Splunk
# versions (Splunk 5.x and earlier).
# To add support for Splunk 5.x set sslVersions to tls and add this to the
# end of cipherSuite:
# DHE-RSA-AES256-SHA:AES256-SHA:DHE-RSA-AES128-SHA:AES128-SHA
# and this, in case Diffie Hellman is not configured:
# AES256-SHA:AES128-SHA
sslVersions = tls1.2
cipherSuite = ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
ecdhCurves = prime256v1, secp384r1, secp521r1

allowSslRenegotiation = true
sslQuietShutdown = false

[script://$SPLUNK_HOME\bin\scripts\splunk-wmi.path]
disabled = 0
interval = 10000000
source = wmi
sourcetype = wmi
queue = winparsing
persistentQueueSize=200MB

# default single instance modular input restarts

[admon]
interval=60
baseline=0

[MonitorNoHandle]
interval=60

[WinEventLog]
interval=60
evt_resolve_ad_obj = 0
evt_dc_name=
evt_dns_name=

[WinNetMon]
interval=60

[WinPrintMon]
interval=60

[WinRegMon]
interval=60
baseline=0

[perfmon]
interval=300

[powershell]
interval=60

[powershell2]
interval=60
