Ingest CSV removing Top X Lines from file

wilcoxj
New Member

I am ingesting a CSV file from my server. I have tried many configurations in props.conf without success. Any help identifying what I am doing incorrectly would be appreciated.

Props.conf
[OpsCenter]
INDEXED_EXTRACTRIONS = csv
FIELD_DELIMITER = ,
FIELD_QUOTE = "
DATETIME_CONFIG = CURRENT
FIELD_HEADER_REGEX = ^Report
HEADER_FIELD_LINE_NUMBER = 4

File being ingested:

Daily Status Splunk

Customizable tabular report of all backup data (size, start time, status code etc) by client. In this report, accelerator optimization, savings, or factor is not considered as part of deduplication optimization, savings, or factor. In all other reports, the deduplication values include accelerator values.

Report Time Frame: Previous 24 Hours

Client Name Job Duration Job File Count Throughput (KB/sec) Job Primary ID Policy Name Post Deduplication Size(MB) Job Status Storage Unit Name
cpchiisi04.chi.cintas.com 4:31:00 1,439 106,432 161510 Isilon_SQL_Chi 824,210.78 Successful stu_disk_cpchibk002
cpmasisi04 23:40:57 1,595,462 55,103 952085 Isilon_Common 7,455.01 Successful stu_disk_cpmasbk006
cpmasisi04 24:00:58 1,051,251 101,290 952091 Isilon_Shares 3,440.43 Successful stu_disk_cpmasbk006
cpmasisi04 22:19:14 4,180,140 59,338 952093 Isilon_Shares 1,683.77 Successful stu_disk_cpmasbk006
cpmasisi04 35:13:56 9,271,112 51,439 952095 Isilon_Shares 11,470.67 Successful stu_disk_cpmasbk006
cpmasisi04 9:01:01 0 0 952807 Isilon_SQL 0 Failed UNKNOWN
crbo042.na.cintas.com 1:05:14 31,711,314 128,426 953499 VMDK_NonProd 7,785.33 Successful stu_disk_cpmasbk007
cdmasalc13.na.cintas.com 0:06:15 328,231 636,212 953575 VMDK_NonProd 297.52 Successful stu_disk_cpmasbk007
cpmasisi04 7:04:13 23,488 126,773 953825 Isilon_SQL 1,635,803.66 Successful stu_disk_cpmasbk007
cpmasfs01 1:30:53 53,000 24,894 953915 Creative_Marketing 126,956 Failed cpmasbk006-hcart-robot-tld-0
Report generated on May 20, 2018 8:00:17 AM

poete
Builder

I copied your data, added commas, and removed the thousand separators. It now looks like this:

Daily Status Splunk 
Customizable tabular report of all backup data (size, start time, status code etc) by client. In this report, accelerator optimization, savings, or factor is not considered as part of deduplication optimization, savings, or factor. In all other reports, the deduplication values include accelerator values. 
Report Time Frame: Previous 24 Hours 
Client Name,Job Duration,Job File Count,Throughput (KB/sec),Job Primary ID,Policy Name,Post Deduplication Size(MB),Job Status,Storage Unit Name
cpchiisi04.chi.cintas.com,4:31:00,1439,106432,161510,Isilon_SQL_Chi,824210.78,Successful,stu_disk_cpchibk002
cpmasisi04,23:40:57,1595462,55103,952085,Isilon_Common,7455.01,Successful,stu_disk_cpmasbk006
cpmasisi04,24:00:58,1051251,101290,952091,Isilon_Shares,3440.43,Successful,stu_disk_cpmasbk006
cpmasisi04,22:19:14,4180140,59338,952093,Isilon_Shares,1683.77,Successful,stu_disk_cpmasbk006
cpmasisi04,35:13:56,9271112,51439,952095,Isilon_Shares,11470.67,Successful,stu_disk_cpmasbk006
cpmasisi04,9:01:01,0,0,952807,Isilon_SQL,0,Failed,UNKNOWN
crbo042.na.cintas.com,1:05:14,31711314,128426,953499,VMDK_NonProd,7785.33,Successful,stu_disk_cpmasbk007
cdmasalc13.na.cintas.com,0:06:15,328231,636212,953575,VMDK_NonProd,297.52,Successful,stu_disk_cpmasbk007
cpmasisi04,7:04:13,23488,126773,953825,Isilon_SQL,1635803.66,Successful,stu_disk_cpmasbk007
cpmasfs01,1:30:53,53000,24894,953915,Creative_Marketing,126956,Failed,cpmasbk006-hcart-robot-tld-0
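
For anyone who wants to script the clean-up instead of doing it by hand, here is a minimal Python sketch. It assumes the report was exported from Excel as tab-separated text, which is a guess on my part since the original delimiter is not visible in the post. The function name `clean_report` is hypothetical.

```python
import csv
import io


def clean_report(tsv_text: str) -> str:
    """Convert tab-separated report text to comma-separated text.

    Commas are removed only from cells that are plain numbers
    (digits, thousand separators, at most one decimal point).
    All other cells are kept as they are, apart from trimmed whitespace.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        cleaned = []
        for cell in row:
            stripped = cell.strip()
            candidate = stripped.replace(",", "")
            # Drop the commas only when what remains is a plain number.
            if candidate.replace(".", "", 1).isdigit():
                cleaned.append(candidate)
            else:
                cleaned.append(stripped)
        writer.writerow(cleaned)
    return out.getvalue()


# Assumed input: two tab-separated lines with a subset of the columns.
sample = (
    "Client Name\tJob File Count\tPost Deduplication Size(MB)\n"
    "cpmasisi04\t1,595,462\t7,455.01\n"
)
print(clean_report(sample))
```

Text cells that contain commas, such as the long description line, are quoted by the `csv` writer rather than altered.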

As the sample shows, the field names are on line 4.

The following sourcetype configuration indexes the data correctly:

[sourcetype_answer]
DATETIME_CONFIG = CURRENT
HEADER_FIELD_LINE_NUMBER = 4
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Structured
description = Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled = false
pulldown_type = true

Just a few comments:

  1. I did not use the regex setting (FIELD_HEADER_REGEX), as it is not necessary here.
  2. I set the timestamp to the time of indexing, as there is no information in the file that could be used for this purpose.
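
As a rough mental model of what pointing the header at line 4 means, here is a short Python sketch. It is only an illustration and not how Splunk parses the file internally. The helper name `parse_with_header_line` is hypothetical, and the sample is trimmed to three columns.

```python
import csv
import io


def parse_with_header_line(text, header_line):
    """Parse CSV text whose column names sit on the given 1-based line.

    Lines before the header are discarded; every later line becomes a dict
    keyed by the column names.
    """
    lines = text.splitlines()
    table = "\n".join(lines[header_line - 1:])
    return list(csv.DictReader(io.StringIO(table)))


report = "\n".join([
    "Daily Status Splunk",
    "Customizable tabular report of all backup data ...",  # shortened here
    "Report Time Frame: Previous 24 Hours",
    "Client Name,Job Duration,Job Status",  # columns trimmed for brevity
    "cpmasisi04,23:40:57,Successful",
    "cpmasisi04,9:01:01,Failed",
])

events = parse_with_header_line(report, header_line=4)
print(events[1]["Job Status"])  # Failed
```

The three preamble lines never become events in this sketch, and each data row is keyed by the names found on line 4.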

I hope this answers your question.

wilcoxj
New Member

Help me understand why this worked. I had tried a config with:

DATETIME_CONFIG = CURRENT
HEADER_FIELD_LINE_NUMBER = 4
INDEXED_EXTRACTIONS = CSV
FIELD_DELIMITER = ,

I am not sure of the order I used, but this did not work. Is there a specific order you need to follow in props.conf? Any insight into why it may not have worked for me would be wonderful. Thank you so much for your response.

poete
Builder

Hi. Can you please share the content of the CSV file (first 10 lines), as I did?
Did you take care of the thousand separators?

wilcoxj
New Member

I am not able to extract the header fields correctly. Only fields named EXTRA_FIELD_X are extracted.

poete
Builder

Hello. Can you confirm that the names of the fields are 'Client Name Job Duration Job File Count Throughput (KB/sec) Job Primary ID Policy Name Post Deduplication Size(MB) Job Status Storage Unit Name'? If so, why are there no commas on this line? Also, why not set FIELD_HEADER_REGEX to ^Client?
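
To see why the pattern matters, here is a small Python sketch that reports which line of the file each pattern matches first. It assumes, as I understand the props.conf documentation, that the first line matching FIELD_HEADER_REGEX is taken as the header. It only tests the regular expressions and does not reproduce Splunk's parser. The helper `first_match` is hypothetical.

```python
import re
from typing import List, Optional

# First four lines of the report as posted in this thread.
lines = [
    "Daily Status Splunk",
    "Customizable tabular report of all backup data ...",  # shortened here
    "Report Time Frame: Previous 24 Hours",
    "Client Name,Job Duration,Job File Count,Throughput (KB/sec),"
    "Job Primary ID,Policy Name,Post Deduplication Size(MB),"
    "Job Status,Storage Unit Name",
]


def first_match(pattern: str, lines: List[str]) -> Optional[int]:
    """Return the 1-based number of the first line matching pattern, or None."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            return number
    return None


print(first_match(r"^Report", lines))  # 3
print(first_match(r"^Client", lines))  # 4
```

With ^Report the match lands on the "Report Time Frame" line, which has no column names, while ^Client lands on the line that actually holds them.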

wilcoxj
New Member

I copied it out of Excel, so it sees the values as separate cells without the commas. Those are the headers. I thought FIELD_HEADER_REGEX should match the last line I wanted removed. I am still learning about props.conf files.
