Hi Team,
Because Splunk retains data for only a limited time, we run a scheduled task that exports data from Splunk to AWS S3 through the Splunk SDK.
SDK output mode: JSON
SPL:
search index=dput | fields - _raw date_* _cd _kv _bkt _si splunk_server punct timeendpos exectime index lang | table *
Recently I ran into a problem. When I query a 10-minute window of data in one batch (about 400,000 logs), some logs come back with some of their fields missing. For example:
raw data:
"2022-03-01T20:47:04.435Z [XNIO-1 task-16] INFO c.m.assertservice.service.impl.NotebookServiceImpl env=\"PROD\" hostname=\"\" client_ip=\"\" service_name=\"assetservice\" service_version=\"release-1.12.0\" request_id=\"98ad59ad-e973-4258-b559-a5c82476f14d\" event_type=\"read\" event_status=\"success\" event_severity=\"low\" notebook_topics=\"[Manager Research]\" object_type=\"Notebook\" object_id=\"6bcb4ad5-596c-4738-90b9-4bdff9515f12\" component=\"\" event_id=\"98ad59ad-e973-4258-b559-a5c82476f14d\" application=\"\" user_id=\"\" notebook_title=\"Portfolio Manager Performance History\" action=\"GET\" details=\"Get a notebook,title:Portfolio Manager Performance History, type:[LIBRARY]\" eventtype=\"usage\" timestamp=\"2022-03-01T20:47:04.435348Z\" application_area=\"NONE\" event_description=\"Get Notebook By Id UsageTracking\""
search result:
{
"_indextime": "1646167627",
"_sourcetype": "dput_usage",
"_subsecond": ".435",
"_time": "2022-03-01T14:47:04.435-06:00",
"action": "GET",
"application": "",
"application_area": "NONE",
"component": "",
"details": "Get a notebook,title:Portfolio Manager Performance History, type:[LIBRARY]",
"env": "PROD",
"event_id": "98ad59ad-e973-4258-b559-a5c82476f14d",
"event_length": "899",
"event_status": "success",
"eventtype": "usage",
"extracted_sourcetype": "dput_usage",
"host": "",
"hostname": "",
"linecount": "1",
"object_id": "6bcb4ad5-596c-4738-90b9-4bdff9515f12",
"object_type": "Notebook",
"source": "",
"sourcetype": "dput_usage",
"timestamp": "2022-03-01T20:47:04.435348Z",
"timestartpos": "0",
"user_id": ""
}
You can see that fields present in the raw data, such as notebook_title and notebook_topics, do not appear in the search result. (I seem to have the same problem when exporting JSON from the Web UI.)
This happens when I query a lot of data at once. When I query this one log on its own and return it through the SDK, the problem does not occur and all fields are returned:
{
"_indextime": "1646167627",
"_sourcetype": "dput_usage",
"_subsecond": ".435",
"_time": "2022-03-01T14:47:04.435-06:00",
"action": "GET",
"application": "",
"application_area": "NONE",
"client_ip": "",
"component": "",
"details": "Get a notebook,title:Portfolio Manager Performance History, type:[LIBRARY]",
"env": "PROD",
"event_description": "Get Notebook By Id UsageTracking",
"event_id": "98ad59ad-e973-4258-b559-a5c82476f14d",
"event_length": "899",
"event_severity": "low",
"event_status": "success",
"event_type": "read",
"eventtype": "usage",
"extracted_sourcetype": "dput_usage",
"host": "",
"hostname": "",
"linecount": "1",
"notebook_title": "Portfolio Manager Performance History",
"notebook_topics": "[Manager Research]",
"object_id": "6bcb4ad5-596c-4738-90b9-4bdff9515f12",
"object_type": "Notebook",
"request_id": "98ad59ad-e973-4258-b559-a5c82476f14d",
"service_name": "assetservice",
"service_version": "release-1.12.0",
"source": "",
"sourcetype": "dput_usage",
"timestamp": "2022-03-01T20:47:04.435348Z",
"timestartpos": "0",
"user_id": ""
}
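For reference, the fields that differ between the two results can be listed mechanically. This is only a minimal sketch in Python, not part of the export job: `missing_fields` is a hypothetical helper name, and the two records are trimmed copies of the results above (most fields omitted for brevity).

```python
import json


def missing_fields(record, expected_fields):
    """Return the expected field names that are absent from one exported record."""
    return sorted(set(expected_fields) - set(record))


# Trimmed copy of the single-event result (assumed to contain every field).
single_result = json.loads(
    '{"action": "GET",'
    ' "notebook_title": "Portfolio Manager Performance History",'
    ' "notebook_topics": "[Manager Research]",'
    ' "user_id": ""}'
)

# Trimmed copy of the same event as returned by the batch query.
batch_result = json.loads('{"action": "GET", "user_id": ""}')

# Fields the batch export dropped for this event.
print(missing_fields(batch_result, single_result.keys()))
```

Running the same comparison over every record of a batch export would show how many events are affected and which fields are dropped most often.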
I am using Java SDK version 1.8.0 and C# SDK version 2.2.9.
Can anyone explain why this happens? Thanks a lot!