
Line splitting at a regular expression for ""

Splunk Employee

I need to split a stream of CSV data (from a REST API call) into events. The line lengths vary in the initial data set. The split should occur wherever the stream contains two consecutive double quotes -> "" <-

Example data is:

"AmazonEC2","Asia Pacific (Sydney)","AWS Region","m3.2xlarge","Yes","General purpose","8","Intel Xeon E5-2670 v2 (Ivy Bridge/Sandy Bridge)","2.5 GHz","30 GiB","2 x 80 SSD","High","64-bit",,,,,,,,"Dedicated","Windows","No License required",,,,,,,,"APS2-DedicatedUsage:m3.2xlarge","RunInstances:0002",,,"26",,,,,,,,,,,,,,,"16",,"NA","Intel AVX; Intel Turbo","Amazon Elastic Compute Cloud""FGNPDK5ZFJP4S9NC","MZU6U2429S","FGNPDK5ZFJP4S9NC.MZU6U2429S.2TG2D8R56U","Reserved","Upfront Fee","2016-08-31",,,"Quantity","17896","USD","3yr","All Upfront","convertible","Compute Instance","AmazonEC2","US West (N. California)","AWS Region","c3.4xlarge","Yes","Compute optimized","16","Intel Xeon E5-2680 v2 (Ivy Bridge)","2.8 GHz","30 GiB","2 x 160 SSD","High","64-bit",,,,,,,,"Shared","RHEL","No License required",,,,,,,,"USW1-BoxUsage:c3.4xlarge","RunInstances:0010",,,"55","Yes",,,,,,,,,,,,,,"32",,"NA","Intel AVX; Intel Turbo","Amazon Elastic Compute Cloud""QD2X48Z37JG3VNFX","HU7G6KETJZ","QD2X48Z37JG3VNFX.HU7G6KETJZ.6YS6EN2CT7","Reserved","Windows with SQL Server Enterprise (Amazon VPC), r3.2xlarge reserved instance applied","2016-11-30","0","Inf","Hrs","1.9700000000","USD","1yr","Partial Upfront","standard","Compute Instance","AmazonEC2","Asia Pacific (Tokyo)","AWS Region","r3.2xlarge","Yes","Memory optimized","8","Intel Xeon E5-2670 v2 (Ivy Bridge)","2.5 GHz","61 GiB","1 x 160 SSD","High","64-bit",,,,,,,,"Dedicated","Windows","No License required",,,,,,,,"APN1-DedicatedUsage:r3.2xlarge","RunInstances:0102",,,"26","Yes",,,,,,,,,,,,,,"16",,"SQL Ent","Intel AVX; Intel Turbo","Amazon Elastic Compute Cloud""DCM8ZJ894B27CQ8G","4NA7Y494T4","DCM8ZJ894B27CQ8G.4NA7Y494T4.6YS6EN2CT7","Reserved","Linux/UNIX (Amazon VPC), g3.8xlarge reserved instance applied","2017-06-30","0","Inf","Hrs","2.1400000000","USD","1yr","No Upfront","standard","Compute Instance","AmazonEC2","US West (N. 
California)","AWS Region","g3.8xlarge","Yes","GPU instance","32","Intel Xeon E5-2686 v4 (Broadwell)","2.3 GHz","244 GiB","EBS only","10 Gigabit","64-bit",,,,,,,,"Shared","Linux","No License required",,,,,,,,"USW1-BoxUsage:g3.8xlarge","RunInstances",,"7000 Mbps","0","Yes","2",,,,,,,,,,"Yes","Yes","Yes","64",,"NA","Intel AVX, Intel AVX2, Intel Turbo","Amazon Elastic Compute Cloud""EX33FD39CKVCKNYQ","MZU6U2429S","EX33FD39CKVCKNYQ.MZU6U2429S.2TG2D8R56U","Reserved","Upfront Fee","2017-04-30",,,"Quantity","14074","USD","3yr","All Upfront","convertible","Compute Instance","AmazonEC2","US West (N. California)","AWS Region","m4.4xlarge","Yes","General purpose","16","Intel Xeon E5-2676 v3 (Haswell)","2.4 GHz","64 GiB","EBS only","High","64-bit",,,,,,,,"Dedicated","Linux","No License required",,,,,,,,"USW1-DedicatedUsage:m4.4xlarge","RunInstances",,"2000 Mbps","53.5","Yes",,,,,,,,,,,,,,"32",,"NA","Intel AVX; Intel AVX2; Intel Turbo","Amazon Elastic Compute Cloud""QGQ2W8XX4J2CGD82","4NA7Y494T4","QGQ2W8XX4J2CGD82.4NA7Y494T4.6YS6EN2CT7","Reserved","Red Hat Enterprise Linux (Amazon VPC), m4.xlarge reserved instance applied","2017-04-30","0","Inf","Hrs","0.2154000000","USD","1yr","No Upfront","standard","Compute Instance","AmazonEC2","Asia Pacific (Singapore)","AWS Region","m4.xlarge","Yes","General purpose","4","Intel Xeon E5-2676 v3 (Haswell)","2.4  GHz","16 GiB","EBS only","High","64-bit",,,,,,,,"Shared","RHEL","No License required",,,,,,,,"APS1-BoxUsage:m4.xlarge","RunInstances:0010",,"750 Mbps","13","Yes",,,,,,,,,,,,,,"8",,"NA","Intel AVX; Intel AVX2; Intel Turbo","Amazon Elastic Compute Cloud""DZS3NEJDE8E98442","4NA7Y494T4","DZS3NEJDE8E98442.4NA7Y494T4.6YS6EN2CT7","Reserved","Windows with SQL Server Standard (Amazon VPC), i3.4xlarge reserved instance applied","2017-06-30","0","Inf","Hrs","3.5970000000","USD","1yr","No Upfront","standard","Compute Instance","AmazonEC2","EU (Ireland)","AWS Region","i3.4xlarge","Yes","Storage optimized","16","Intel Xeon E5-2686 v4 
(Broadwell)","2.3 GHz","122 GiB","2 x 1.9 NVMe SSD","Up to 10 Gigabit","64-bit",,,,,,,,"Shared","Windows","No License required",,,,,,,,"EU-BoxUsage:i3.4xlarge","RunInstances:0006",,"3500 Mbps","99","Yes",,,,,,,,,,,,,,"32",,"SQL Std","Intel AVX, Intel AVX2, Intel Turbo","Amazon Elastic Compute Cloud"

Re: Line splitting at a regular expression for ""

SplunkTrust

I'm assuming that if you cannot use:

LINE_BREAKER = \"\"

Then something like:

LINE_BREAKER = \x22\x22

Perhaps?
I can convert this to an answer if it works.


Re: Line splitting at a regular expression for ""

Influencer

I believe the props.conf settings you want for your sourcetype, on whichever Splunk instance parses your data (indexer or heavy forwarder), are:

[yoursourcetype]
LINE_BREAKER = "()"
SHOULD_LINEMERGE = false

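LINE_BREAKER must contain a capturing group; whatever that group matches is treated as the separator between events and is removed from the data. The default is ([\r\n]+), i.e. any run of consecutive newline and carriage-return characters. In this case the group is empty, so it matches the zero-width position between two consecutive double quotes: nothing is discarded, the first quote closes one event, and the second quote opens the next.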
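
To see why the empty group works, here is a rough Python sketch of the rule described above. It is only an approximation for illustration, not how Splunk is actually implemented; the function name `break_events` and the shortened sample string are made up for this example.

```python
import re
from typing import List


def break_events(stream: str, line_breaker: str) -> List[str]:
    """Approximate LINE_BREAKER semantics (hypothetical helper, not a Splunk API).

    Text before capture group 1 ends the current event, the text matched
    by group 1 is discarded, and text after group 1 starts the next event.
    """
    pattern = re.compile(line_breaker)
    events = []
    start = 0
    for match in pattern.finditer(stream):
        events.append(stream[start:match.start(1)])  # up to the group
        start = match.end(1)                         # resume after the group
    events.append(stream[start:])
    return [event for event in events if event]      # drop empty events


# Shortened stand-in for the API stream: three records joined by "".
sample = '"a","b","Amazon Elastic Compute Cloud""FGN","c",,"d""QD2","e"'
events = break_events(sample, r'"()"')
for event in events:
    print(event)
```

Each printed event keeps its own opening and closing quote, because the empty group sits between the two quotes and removes nothing. One caveat with this pattern: it would also break on an empty quoted field ("") or on a doubled quote used as an escape inside a field. The sample data shows empty fields as bare commas, so that may not matter here, but it is worth checking against the real feed.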

You will probably also want to configure timestamp recognition, as well as the search-time field extractions that describe what the columns of your CSV mean, but those are separate (if similar) steps 🙂


Re: Line splitting at a regular expression for ""

Splunk Employee

That set me on the right track. I'm still having trouble keeping certain events from being ingested, so I'm still working on that front, but the events now break as desired at each "" in the data stream from the API.

Thanks!
