CSV Field Extraction Cutting Off at 1,000 Characters

Jornoh · 07-15-2024

I am trying to ingest data from a CSV file. One of the columns in the CSV file contains SQL queries. The header has comma-separated field names, but the field containing the SQL queries is not being extracted correctly. It seems that if a SQL query is longer than 1,000 characters, Splunk extracts only the first 1,000 characters of the query into the field. How can I make Splunk extract the entire SQL query from the CSV file?

For example, here is the _raw for one of the rows in the CSV file:

2024-07-15 16:30:12.207504,job_name,24 9 * * *,Warehouse_ABCDEF,indexName,sourcetype,"SELECT Lowest_Tier_Rating, Application_Name, Base_Application_Name, Application_Status, Application_Short_Description, Application_Comments, Application_CDL_Associated_Legal_Entities, Application_Environment, Application_Function, Application_Platform, RegSCI, RegSCI_Indirect, RegSCI_Critical, Application_Title, Application_Support_Group, Application_Info_Classification, Application_Info_Risk, Sr_Systems_Director_Responsible_Name, Sr_Systems_Director_Responsible_ID, Team_Process_Lead_Responsible_Name, Team_Process_Lead_Responsible_ID, Executive_Director_Responsible_Name, Executive_Director_Responsible_ID, Systems_Director_Responsible_Name, Systems_Director_Responsible_ID, Managing_Director_Responsible_Name, Managing_Director_Responsible_ID, Team_Content_Lead_Responsible_Name, Team_Content_Lead_Responsible_ID, Area_Process_Lead_Responsible_Name, Area_Process_Lead_Responsible_ID, Area_Content_Lead_Responsible_Name, Area_Content_Lead_Responsible_ID, manager_Responsible_Name, Manager_Responsible_ID, Infrastructure_Team_Lead_Responsible_Name, Infrastructure_Team_Lead_Responsible_ID, Domain_Lead_Responsible_Name, Domain_Lead_Responsible_ID, ResolvedDeviceName, DeviceName, IP_Address, Common_Name, Host, Valid_From, Valid_To, Source, Validity_Period, Validity_Period_Months, Key_Size, Signature_Algorithm, Organizational_Unit, Issuer, Serial_Number, Contact, Installations, Nickname, Problems, NC, Port, Device, DN, Validity_Status, IsManaged\ FROM ""ab_cd_ef"".""Dashboards"".""ABC_Venafi_Certificate_MappedTo_ABCDEF_Applications"";",0,1

And here is the extracted SQL Query:

SELECT Lowest_Tier_Rating, Application_Name, Base_Application_Name, Application_Status, Application_Short_Description, Application_Comments, Application_CDL_Associated_Legal_Entities, Application_Environment, Application_Function, Application_Platform, RegSCI, RegSCI_Indirect, RegSCI_Critical, Application_Title, Application_Support_Group, Application_Info_Classification, Application_Info_Risk, Sr_Systems_Director_Responsible_Name, Sr_Systems_Director_Responsible_ID, Team_Process_Lead_Responsible_Name, Team_Process_Lead_Responsible_ID, Executive_Director_Responsible_Name, Executive_Director_Responsible_ID, Systems_Director_Responsible_Name, Systems_Director_Responsible_ID, Managing_Director_Responsible_Name, Managing_Director_Responsible_ID, Team_Content_Lead_Responsible_Name, Team_Content_Lead_Responsible_ID, Area_Process_Lead_Responsible_Name, Area_Process_Lead_Responsible_ID, Area_Content_Lead_Responsible_Name, Area_Content_Lead_Responsible_ID, manager_Responsible_Name, Manager_Respo
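
To rule out a problem in the file itself, it may help to check outside Splunk that the query really is one quoted CSV field longer than 1,000 characters. Below is a minimal Python sketch of that check. The row is a hypothetical stand-in with the same shape as the one above (generated column names, made-up schema and table), and the helper names longest_field and is_cut_at_limit are illustrative only, not part of any Splunk tooling.

```python
import csv
import io


def longest_field(raw_line):
    """Parse one CSV line and return (index, value) of its longest field."""
    row = next(csv.reader(io.StringIO(raw_line)))
    index = max(range(len(row)), key=lambda i: len(row[i]))
    return index, row[index]


def is_cut_at_limit(full, extracted, limit=1000):
    """True if `extracted` is exactly the first `limit` characters of a longer `full`."""
    return len(full) > limit and extracted == full[:limit]


# Hypothetical stand-in for the row in the post: a long SELECT with many
# columns, wrapped in double quotes, with embedded quotes doubled ("").
columns = ", ".join(f"Column_{i}" for i in range(150))
query = f'SELECT {columns} FROM "schema"."table";'
quoted = '"' + query.replace('"', '""') + '"'
raw = f"2024-07-15 16:30:12.207504,job_name,24 9 * * *,{quoted},0,1"

index, value = longest_field(raw)
print("longest field index:", index)
print("length in source file:", len(value))
print("parsed back intact:", value == query)

# Simulate what the post describes: only the first 1,000 characters survive.
extracted = value[:1000]
print("matches a 1,000-character cutoff:", is_cut_at_limit(value, extracted))
```

If the field parses back intact here and is well over 1,000 characters, the file is well-formed and the cutoff is happening somewhere in Splunk's handling of the field rather than in the CSV quoting.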

PickleRick · 07-16-2024

And your props for this sourcetype are...?

Jornoh · 07-16-2024

Hi, here is the props.conf for the CSV file:

DATETIME_CONFIG =
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Structured
description = This sourcetype stores all the DB connect information
disabled = false
pulldown_type = true
