Getting Data In

LINE_BREAKER & EXTRACT not working

wgawhh5hbnht
Communicator

I'm attempting to ingest Veracode data into Splunk. There isn't anything on Splunkbase, and based on Veracode's forums, the best way is to make API queries and write the output to a .csv file. The API calls run on the UF host, and the UF sends the data directly to our indexer cluster.
inputs.conf:

#veracode
[monitor:///opt/splunk/logs/veracode/.../*.csv]
index = veracode
sourcetype = veracode
crcSalt = <SOURCE>

The props (below) were created by Splunk using "Add Data" and uploading a sample of the data in a file. When I do it this way, Splunk auto-parses everything, without combining any events and with no need to create an EXTRACT-, using the following props (which I put on the indexers).
props.conf:

#[veracode]
[source::/opt/splunk/logs/veracode/.../*.csv]
DATETIME_CONFIG=CURRENT
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
EXTRACT-veracode=^"(?<app_id>[^"]*)","(?<app_name>[^"]*)","(?<app_business_unit>[^"]*)","(?<build_id>[^"]*)","(?<build_name>[^"]*)","(?<build_type>[^"]*)","(?<build_policy_updated_date>[^"]*)","(?<build_published_date>[^"]*)","(?<build_analysis_size_bytes>[^"]*)","(?<flaw_id>[^"]*)","(?<flaw_date_first_occurrence>[^"]*)","(?<flaw_severity>[^"]*)","(?<flaw_cweid>[^"]*)","(?<flaw_categoryname>[^"]*)","(?<flaw_affects_policy_compliance>[^"]*)","(?<flaw_remediationeffort>[^"]*)","(?<flaw_remediation_status>[^"]*)","(?<flaw_mitigation_status_desc>[^"]*)","(?<flaw_exploitLevel>[^"]*)","(?<flaw_module>[^"]*)","(?<flaw_sourcefile>[^"]*)","(?<flaw_line>[^"]*)"

Sample data:

"app_id","app_name","app_business_unit","build_id","build_name","build_type","build_policy_updated_date","build_published_date","build_analysis_size_bytes","flaw_id","flaw_date_first_occurrence","flaw_severity","flaw_cweid","flaw_categoryname","flaw_affects_policy_compliance","flaw_remediationeffort","flaw_remediation_status","flaw_mitigation_status_desc","flaw_exploitLevel","flaw_module","flaw_sourcefile","flaw_line"
"527625","Vault Microservice","Corp Development","4130495","10.4.104 Promoted","static","2019-05-16 22:05:39+00:00","2019-05-16 22:02:38+00:00","1125889","219","2019-04-10 23:22:12+00:00","3","259","Use of Hard-coded Password","false","4","New","Not Mitigated","1","Vault.MessageHandlerService.exe","connectionconfiguration.cs","1"
"527625","Vault Microservice","Corp Development","4130495","10.4.104 Promoted","static","2019-05-16 22:05:39+00:00","2019-05-16 22:02:38+00:00","1125889","220","2019-04-10 23:22:12+00:00","3","73","External Control of File Name or Path","false","2","New","Not Mitigated","0","Vault.MessageHandlerService.exe","exhibittrackervaultrepository.cs","518"
"527625","Vault Microservice","Corp Development","4130495","10.4.104 Promoted","static","2019-05-16 22:05:39+00:00","2019-05-16 22:02:38+00:00","1125889","222","2019-04-10 23:22:12+00:00","3","331","Insufficient Entropy","false","2","New","Not Mitigated","-1","Vault.Web.exe","auditwritelogger.cs","43"
"527625","Vault Microservice","Corp Development","4130495","10.4.104 Promoted","static","2019-05-16 22:05:39+00:00","2019-05-16 22:02:38+00:00","1125889","223","2019-04-10 23:22:12+00:00","3","73","External Control of File Name or Path","false","2","New","Not Mitigated","0","Vault.MessageHandlerService.exe","exhibittrackervaultrepository.cs","532"
"527625","Vault Microservice","Corp Development","4130495","10.4.104 Promoted","static","2019-05-16 22:05:39+00:00","2019-05-16 22:02:38+00:00","1125889","224","2019-04-10 23:22:12+00:00","3","73","External Control of File Name or Path","false","2","New","Not Mitigated","0","Vault.MessageHandlerService.exe","chunkconcierge.cs","67"
"527625","Vault Microservice","Corp Development","4130495","10.4.104 Promoted","static","2019-05-16 22:05:39+00:00","2019-05-16 22:02:38+00:00","1125889","225","2019-04-10 23:22:12+00:00","2","404","Improper Resource Shutdown or Release","false","2","New","Not Mitigated","0","Vault.MessageHandlerService.exe","vaultfilechunkstream.cs","8"
"527625","Vault Microservice","Corp Development","4130495","10.4.104 Promoted","static","2019-05-16 22:05:39+00:00","2019-05-16 22:02:38+00:00","1125889","227","2019-04-10 23:22:12+00:00","3","352","Cross-Site Request Forgery (CSRF)","false","4","New","Not Mitigated","0","Vault.Web.exe","itemscontroller.cs","24"
"527625","Vault Microservice","Corp Development","4130495","10.4.104 Promoted","static","2019-05-16 22:05:39+00:00","2019-05-16 22:02:38+00:00","1125889","228","2019-04-10 23:22:12+00:00","2","100","Technology-Specific Input Validation Problems","false","3","New","Not Mitigated","0","Vault.Web.exe","patchmetadatamodel.cs","8"
"527625","Vault Microservice","Corp Development","4130495","10.4.104 Promoted","static","2019-05-16 22:05:39+00:00","2019-05-16 22:02:38+00:00","1125889","229","2019-04-10 23:22:12+00:00","3","352","Cross-Site Request Forgery (CSRF)","false","4","New","Not Mitigated","0","Vault.Web.exe","aycoadvisorvaultuploadcontroller.cs","77"

Some of the files contain only the header, which is annoying, but whatever. The biggest problems are:

1. Events are being combined.

2. Fields are not being extracted via EXTRACT.


maciep
Champion

Different settings in props take effect at different times during data ingestion. I would suggest checking out this article if you haven't yet: https://wiki.splunk.com/Where_do_I_configure_my_Splunk_settings%3F

That said, if you want to use the INDEXED_EXTRACTIONS=csv setting, then it needs to be on the universal forwarder, because that's where indexed extractions happen. If you use that, the fields from the csv will be stored as indexed fields once the data gets to the indexers. If any other parse-time settings are needed, they also have to go on the forwarder.

At that point, you wouldn't need the EXTRACT setting since it's just creating those same fields at search time, which would just be redundant. But that said, if you ever want to use EXTRACT in props, just remember that belongs on your search head - not the forwarder or indexer.

So as a first step, I would simply put this in props.conf on the forwarder and see how it goes.

[veracode]
INDEXED_EXTRACTIONS=csv
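
For reference, a slightly fuller version of that forwarder-side stanza might look like the sketch below. It assumes the sourcetype is still veracode, as set in inputs.conf, and that the header is always on the first line of each file. DATETIME_CONFIG is carried over from the original props; the HEADER_FIELD_LINE_NUMBER, FIELD_DELIMITER and FIELD_QUOTE lines are assumptions based on the sample data and have not been tested against this feed.

```
# props.conf on the universal forwarder (sketch, not tested against this feed)
[veracode]
# Structured parsing happens on the UF when this is set
INDEXED_EXTRACTIONS = csv
# Assumption: the header row is always line 1 of each file
HEADER_FIELD_LINE_NUMBER = 1
# Assumption: comma-delimited, double-quoted values, as in the sample data
FIELD_DELIMITER = ,
FIELD_QUOTE = "
# Carried over from the original props: stamp events with the current time
DATETIME_CONFIG = CURRENT
```

The forwarder needs a restart to pick up the change, and only data ingested after that is affected; events that are already indexed keep their old parsing.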


woodcock
Esteemed Legend

INDEXED_EXTRACTIONS=csv must go on the UF, not the indexers, and it causes most of the parsing that would otherwise happen on the indexers to happen on the UF instead. This means that the EXTRACT stuff is most likely redundant.
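
The original props also contain KV_MODE=none. That is a search-time setting, so it only has an effect on the search head. A minimal sketch of what could live there, assuming the sourcetype is veracode and the EXTRACT- is dropped as redundant:

```
# props.conf on the search head (sketch)
[veracode]
# Fields already arrive as indexed fields from the UF, so skip automatic
# search-time key/value extraction for this sourcetype
KV_MODE = none
```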


marycordova
SplunkTrust

Could you remove lines 2 and 14 from your props, try it again, and see what happens?


wgawhh5hbnht
Communicator

The "#" on [veracode] didn't get copied over from props.conf; I've updated my post.
But to answer your question: I commented out lines 2 and 14 (and uncommented line 1), and got the same outcome, even with |extract reload=T in the search.
