Solved: Loading csv file with 39 columns, multiline and se...

jlaverick1 · ‎04-01-2014

Hi,

I have some data extracted from a table in an SQL database which has 39 columns and uses a semicolon as a field separator. Some of the larger fields seem to put a line feed instead of a semicolon as my field separator so i end up with rows of data over a number of lines.

Also my date which i found find useful is in column 16.

I can't seem to work out how to load this data (csv) in to splunk for analysis. Here is an example of 3 rows of data with a header row for each column.

I am very new to splunk and this is the first piece of data i have ever tried to load. I have been reading through the documentation but not really worked out how to do this.

Thanks in advance

James

AlertID;ConnectorID;MDRAlarmID;DeviceID;ModelElementID;SituationMessage;SituationType;AlertDetail;ClassID;CIID;Active;Acknowledged;AssignedTo;EscalationTime;LoggedTime;ReportedTime;EscalationStartTime;ClearedTime;SvcDeskTicket;Severity;AlertLICURL;TicketURL;AlertNamespaceMapID;MDRticketURL;SDTicketProps;UserAttribute1;UserAttribute2;UserAttribute3;UserAttribute4;UserAttribute5;ssaAcknowledged;ssaCleared;ssaAssigned;ElapsedTime;IsClearable;RepeatCount;RetireTimestamp;SeverityTrend;isExempt 1;23;d8992700-19b5-4145-8352-991f847745c5;dnsname,machinename1.com;0af1fb8d-c859-48b6-06ee-d8a160b4a2e0;NTFS - Delayed Write Lost;Risk;{Delayed Write Failed} Windows was unable to save all the data for the file . The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere. ;9;1;0;0;;;2013-09-09 12:36:07.320000000;2013-09-07 14:31:14;;2013-09-09 12:36:54.827000000;;1;;;;;;STORE;;;notops;;0;0;0;0;1;1672;2013-09-09 13:36:07.323000000;0;0 2;23;a0ec2c73-839d-4f37-9377-21e0e432ce9b;devicednsname,machinename2.com;fd0a5008-bd8b-65ef-3354-3abb35db5474;Alert generation was temporarily suspended due to too many alerts.;Risk;A rule has generated 50 alerts in the last 60 seconds. Usually, when a rule generates this many alerts, it is because the rule definition is misconfigured. Please examine the rule for errors. In order to avoid excessive load, this rule will be temporarily suspended until 2013-09-09T13:38:25.6666944+01:00.

Rule: Microsoft.Windows.Server.2003.OperatingSystem.NTFSDelayedWriteLost.Alert

Instance: Microsoft(R) Windows(R) Server 2003, Standard Edition

Instance ID: {419F0930-EB2F-B85B-E8D9-BE471D40CE86}

Management Group: ISS
;53;1;0;0;;;2013-09-09 12:36:07.347000000;2013-09-03 23:52:45;;2013-09-09 12:36:54.823000000;;1;;;;;;MAINTENANCE;;;notops;;0;0;0;0;1;788;2013-09-09 13:36:07.347000000;0;0
3;23;6bfa55dd-9383-48d3-b532-3a17179c6940;identifyingname,placeholder419f0930-eb2f-b85b-e8d9-be471d40ce86;419f0930-eb2f-b85b-e8d9-be471d40ce86;NTFS - Delayed Write Lost;Risk;The driver detected a controller error on \Device\Harddisk2.
;32;1;0;0;;;2013-09-09 12:36:07.367000000;2013-09-07 00:39:39;;2013-09-09 12:36:54.820000000;;1;;;;;;STORE;;;notops;;0;0;0;0;1;18198;2013-09-09 13:36:07.367000000;0;0

richgalloway · ‎04-01-2014

Too bad about the air gap. You'll have to it the import "manually". Based on your last comment, I assume you have some control over how the CSV file is created. If so, put your timestamp field first in your CSV. That will make it easier for Splunk to find it. Also, I suggest putting quotes around the text fields to avoid problems with any unexpected embedded semicolons.

Put this into your local transforms.conf file:

[mydata-transform]
REGEX=(?<ReportedTime>.*?);(?<AlertID>.*?);(?<ConnectorID>.*?);(?<MDRAlarmID>.*?);(?<DeviceID>.*?);(?<ModelElementID>.*?);(?<SituationMessage>.*?);(?<SituationType>.*?);\"(?<AlertDetail>.*?)\";(?<ClassID>.*?);(?<CIID>.*?);(?<Active>.*?);(?<Acknowledged>.*?);(?<AssignedTo>.*?);(?<EscalationTime>.*?);(?<LoggedTime>.*?);(?<EscalationStartTime>.*?);(?<ClearedTime>.*?);(?<SvcDeskTicket>.*?);(?<Severity>.*?);(?<AlertLICURL>.*?);(?<TicketURL>.*?);(?<AlertNamespaceMapID>.*?);(?<MDRticketURL>.*?);(?<SDTicketProps>.*?);(?<UserAttribute1>.*?);(?<UserAttribute2>.*?);(?<UserAttribute3>.*?);(?<UserAttribute4>.*?);(?<UserAttribute5>.*?);(?<ssaAcknowledged>.*?);(?<ssaCleared>.*?);(?<ssaAssigned>.*?);(?<ElapsedTime>.*?);(?<IsClearable>.*?);(?<RepeatCount>.*?);(?<RetireTimestamp>.*?);(?<SeverityTrend>.*?);(?<isExempt>.*?)

And put this into your local props.conf file:

[mydata]
REPORT-mydata = mydata-transform
SHOULD_LINEMERGE = true

You could do it with DELIM and FIELD statements, but that wouldn't allow for quoted fields.

---
If this reply helps you, Karma would be appreciated.

View solution in original post

richgalloway · ‎04-01-2014

Too bad about the air gap. You'll have to it the import "manually". Based on your last comment, I assume you have some control over how the CSV file is created. If so, put your timestamp field first in your CSV. That will make it easier for Splunk to find it. Also, I suggest putting quotes around the text fields to avoid problems with any unexpected embedded semicolons.

Put this into your local transforms.conf file:

[mydata-transform]
REGEX=(?<ReportedTime>.*?);(?<AlertID>.*?);(?<ConnectorID>.*?);(?<MDRAlarmID>.*?);(?<DeviceID>.*?);(?<ModelElementID>.*?);(?<SituationMessage>.*?);(?<SituationType>.*?);\"(?<AlertDetail>.*?)\";(?<ClassID>.*?);(?<CIID>.*?);(?<Active>.*?);(?<Acknowledged>.*?);(?<AssignedTo>.*?);(?<EscalationTime>.*?);(?<LoggedTime>.*?);(?<EscalationStartTime>.*?);(?<ClearedTime>.*?);(?<SvcDeskTicket>.*?);(?<Severity>.*?);(?<AlertLICURL>.*?);(?<TicketURL>.*?);(?<AlertNamespaceMapID>.*?);(?<MDRticketURL>.*?);(?<SDTicketProps>.*?);(?<UserAttribute1>.*?);(?<UserAttribute2>.*?);(?<UserAttribute3>.*?);(?<UserAttribute4>.*?);(?<UserAttribute5>.*?);(?<ssaAcknowledged>.*?);(?<ssaCleared>.*?);(?<ssaAssigned>.*?);(?<ElapsedTime>.*?);(?<IsClearable>.*?);(?<RepeatCount>.*?);(?<RetireTimestamp>.*?);(?<SeverityTrend>.*?);(?<isExempt>.*?)

And put this into your local props.conf file:

[mydata]
REPORT-mydata = mydata-transform
SHOULD_LINEMERGE = true

You could do it with DELIM and FIELD statements, but that wouldn't allow for quoted fields.

---
If this reply helps you, Karma would be appreciated.

jlaverick1 · ‎04-01-2014

Thanks Rich. You got me thinking about the database option too. So I have dumped the DB from one system and imported locally in to a local SQL instance and will go for the DBConnect option.

Thanks for your help 🙂

richgalloway · ‎04-01-2014

Consider using Splunk DB Connect to directly extract the data from your SQL database. DB Connect lets you specify which column to use for the timestamp. It should also handle multi-line fields.

---
If this reply helps you, Karma would be appreciated.

jlaverick1 · ‎04-01-2014

Thanks for the reply.

The server is on a remote network with an air gap so direct connection is not really possible. This was the reason I exported the table to csv file and was trying to load that way.

Loading csv file with 39 columns, multiline and semicolon separated

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Deep Dive: Accelerate threat investigation with Splunk’s AI Assistant in Security

Announcing Modern Navigation: A New Era of Splunk User Experience

Detection Engineering Office Hours: Real-World Troubleshooting & Q&A

Join the Conversation

Loading csv file with 39 columns, multiline and semicolon separated

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Deep Dive: Accelerate threat investigation with Splunk’s AI Assistant in Security

Announcing Modern Navigation: A New Era of Splunk User Experience

Detection Engineering Office Hours: Real-World Troubleshooting & Q&A