
Why are only some of the JSON files from the same folder being indexed?

Loves-to-Learn

Environment: heavy forwarder -> indexer cluster -> search head

On the HWF side:
I fetch logs with a curl command that writes to the directory DIR-A. The files are downloaded every day at 10:00 am; before that, a script cleans up all the old files from both DIR-A and DIR-B. The following files are created:

A1.json
B1.json
C1.json
D1.json

These files have a header and a footer that need to be removed before they are indexed as JSON.

So I have another script, scheduled to run 10 minutes after the files are downloaded to DIR-A.
It removes the header and footer from these files and copies them to a new directory, DIR-B, as follows:

A2.json
B2.json
C2.json
D2.json

Up to this point everything works fine.
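For reference, a minimal sketch of what such a cleanup step could look like. The original script is not shown in this thread, so everything here is an assumption: the function name `strip_header_footer` is hypothetical, and the header and footer are assumed to be exactly one line each. The sketch writes to a temporary file and renames it into place, so that a reader of DIR-B never sees a half-written file.

```python
import os
import tempfile


def strip_header_footer(src_path, dst_path, header_lines=1, footer_lines=1):
    """Copy src_path to dst_path without its leading/trailing lines.

    Assumption: header and footer are whole lines (one each by default).
    Returns the number of lines written.
    """
    with open(src_path, "r", encoding="utf-8") as src:
        lines = src.readlines()

    end = len(lines) - footer_lines
    body = lines[header_lines:end] if end > header_lines else []

    # Write to a temp file in the destination directory, then rename it,
    # so the destination file appears all at once.
    dst_dir = os.path.dirname(os.path.abspath(dst_path))
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.writelines(body)
    os.replace(tmp_path, dst_path)
    return len(body)
```

Usage would be something like `strip_header_footer("/home/DIR-A/A1.json", "/home/DIR-B/A2.json")`, repeated for each of the four files.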

The issue is that only 3 of the 4 files, or sometimes only 2 of the 4, get indexed in Splunk.
I don't see any errors in the internal logs for the files that are not indexed.

Here is my inputs.conf:

[monitor:///home/DIR-B/A2.json]
index = test
crcSalt =
sourcetype = test1
disabled = false

[monitor:///home/DIR-B/B2.json]
index = test
crcSalt =
sourcetype = test2
disabled = false

[monitor:///home/DIR-B/C2.json]
index = test
crcSalt =
sourcetype = test3
disabled = false

[monitor:///home/DIR-B/D2.json]
index = test
crcSalt =
sourcetype = test4
disabled = false

props.conf (identical for all four sourcetypes: test1, test2, test3, test4):

DATETIME_CONFIG = CURRENT
INDEXED_EXTRACTIONS = json
KV_MODE = false
AUTO_KV_JSON = false
NO_BINARY_CHECK = true
category = Structured
disabled = false
pulldown_type = true

Settings on the SH side:

props.conf for sourcetypes test1, test2, test3, test4:

KV_MODE = false
AUTO_KV_JSON = false

The strange part is that if I edit a file that was not indexed, add something like #test at the beginning, and restart Splunk, it gets indexed fine.

Here is the pattern of a file that has the issue:

[
{"AAA":"ZZZZ-000","lastSeen":XXXX,"hash":"XXXXXXXXXXXX"},
{"BBB":"MMMM-000","lastSeen":XXXX,"hash":"XXXXXXXXXXXX"},
{"CCC":"yyyy-000","lastSeen":XXXX,"hash":"XXXXXXXXXXXX"}
]
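One way to rule out a malformed file is to confirm that each cleaned file parses as a single top-level JSON array, as in the pattern above. The following is only a sketch: the helper name `count_json_array_items` is hypothetical, and it assumes the real files contain valid JSON values where the sample shows XXXX placeholders.

```python
import json


def count_json_array_items(path):
    """Return the number of elements in a file holding one JSON array.

    Raises ValueError if the file is not valid JSON (for example, if a
    header or footer line was left behind) or is not a top-level array.
    """
    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)  # json.JSONDecodeError is a ValueError
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a top-level JSON array")
    return len(data)
```

Running this against each file in DIR-B before Splunk reads it would show whether the skipped files differ structurally from the ones that are indexed.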

Should I use batch instead of monitor, or is there another suggestion?


Path Finder

If your script overwrites these files in DIR-B every time, a batch input will work. Just keep in mind that Splunk deletes each file after it has been indexed. You could do something like the following:

# inputs.conf
[batch:///home/DIR-B]
index = test
move_policy = sinkhole
whitelist = .*\.json



# props.conf
# Add the following to your props on the forwarder
[source::...A2.json]
sourcetype = test1

[source::...B2.json]
sourcetype = test2

[source::...C2.json]
sourcetype = test3

[source::...D2.json]
sourcetype = test4

Communicator

Hi

Are you salting the files on purpose using crcSalt (I could not tell from the conf files)?
Do these files have any form of timestamp in them?
Do they by any chance generate the same hash value (in which case Splunk may think it has already indexed them)?
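A rough way to check this: by default the Splunk file monitor fingerprints a file using a CRC over its first 256 bytes (the `initCrcLength` setting in inputs.conf). The sketch below groups files whose first 256 bytes are identical. It uses Python's `zlib.crc32` only as a stand-in, which is an assumption and not Splunk's actual checksum, and the helper names `head_crc` and `find_matching_heads` are hypothetical.

```python
import zlib


def head_crc(path, length=256):
    """CRC32 of the first `length` bytes of a file (stand-in checksum)."""
    with open(path, "rb") as fh:
        return zlib.crc32(fh.read(length))


def find_matching_heads(paths, length=256):
    """Return groups of paths whose first `length` bytes share a checksum."""
    groups = {}
    for p in paths:
        groups.setdefault(head_crc(p, length), []).append(p)
    # Only groups with more than one file indicate a possible clash.
    return [g for g in groups.values() if len(g) > 1]
```

Comparing today's file with a saved copy of yesterday's file from the same path would show whether they start with the same 256 bytes, which would be consistent with the observation that adding a line at the top of a skipped file gets it indexed.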


Loves-to-Learn

Hi,

Yes, I am using crcSalt on purpose. This is the setting I have in place:

crcSalt =

Yes, some of the files contain timestamps, but they are in the future, so I am forcing the timestamp to the current time with:
DATETIME_CONFIG = CURRENT

No, all the hash values are different.
