Splunk Search

Props.conf extract not working

milesmedboe
Explorer

Hi folks,

I recently onboarded a new sourcetype configured with search-time extractions. The regex works when tested on sample data; however, at search time about 400 fields are extracted which are complete nonsense, and the desired fields aren't extracted at all.

The config is on the heavy forwarder and the search head cluster.

Any guidance would be much appreciated!

Thanks

[aam_wss]
DATETIME_CONFIG =
NO_BINARY_CHECK = true
category = Custom
disabled = false
KV_MODE = none
pulldown_type = true
TZ = UCT

    EXTRACT-wss = " ^(?<x_bluecoat_request_tenant_id>[^\s]+) (?<date>\d+\-\d+\-\d+) (?<time>\d+:\d+:\d+) \"(?<x_bluecoat_appliance_name>[^\s]+)\" (?<time_taken>[^\s]+) (?<c_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (?<cs_userdn>[^\s]+) \"?(?<cs_auth_groups>[^\s\"]+)\"? (?<x_exception_id>[^\s]+) (?<sc_filter_result>[^\s]+) \"(?<cs_categories>.*?)\" (?<cs_Referer>[^\s]+) (?<sc_status>[^\s]+) (?<s_action>[^\s]+) (?<cs_method>[^\s]+) (?<rs_Content_Type>[^\s]+) (?<cs_uri_scheme>[^\s]+) (?<cs_host>[^\s]+) (?<cs_uri_port>[^\s]+) (?<cs_uri_path>[^\s]+) (?<cs_uri_query>[^\s]+) (?<cs_uri_extension>[^\s]+) \"?(?<cs_User_Agent>.*?)\"? (?<s_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (?<sc_bytes>[^\s]+) (?<cs_bytes>[^\s]+) (?<x_data_leak_detected>[^\s]+) (?<x_virus_id>[^\s]+) (?<x_bluecoat_location_id>[^\s]+) \"(?<x_bluecoat_location_name>.*?)\" (?<x_bluecoat_access_type>[^\s]+) \"(?<x_bluecoat_application_name>.*?)\" \"(?<x_bluecoat_application_operation>.*?)\" (?<r_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) \"(?<r_supplier_country>.*?)\" (?<x_rs_certificate_validate_status>[^\s]+) (?<x_rs_certificate_observed_errors>[^\s]+) (?<x_cs_ocsp_error>[^\s]+) (?<x_rs_ocsp_error>[^\s]+) (?<ssl_version>[^\s]+) (?<negotiated_cipher>[^\s]+) (?<cipher_size>[^\s]+) (?<x_rs_certificate_hostname>[^\s]+) \"?(?<certificate_hostname_categories>.*?)\"? 
(?<x_cs_negotiated_ssl_version>[^\s]+) (?<x_cs_negotiated_cipher>[^\s]+) (?<x_cs_negotiated_cipher_size>[^\s]+) (?<x_cs_certificate_subject>[^\s]+) (?<cs_icap_status>[^\s]+) (?<cs_icap_error_details>[^\s]+) (?<rs_icap_status>[^\s]+) (?<rs_icap_error_details>[^\s]+) (?<s_supplier_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (?<s_supplier_country>[^\s]+) (?<s_supplier_failures>[^\s]+) \"(?<x_cs_client_ip_country>.*?)\" (?<cs_threat_risk>[^\s]+) (?<x_rs_certificate_threat_risk>[^\s]+) (?<x_client_agent_type>[^\s]+) (?<x_client_os>[^\s]+) (?<x_client_agent_sw>[^\s]+) (?<x_client_device_id>[^\s]+) (?<x_client_device_name>[^\s]+) (?<x_client_device_type>[^\s]+) (?<x_client_security_details>[^\s]+) (?<x_client_security_risk_score>[^\s]+) (?<x_bluecoat_reference_id>[^\s]+) (?<x_sc_connection_issuer_keyring>[^\s]+) (?<x_scissuer_keyring_alias>[^\s]+) (?<x_cloud_rs>[^\s]+) (?<x_bluecoat_placeholder>[^\s]+) (?<cs_X_Requested_With>[^\s]+) (?<x_bluecoat_transaction_uuid>[^\s]+)"
0 Karma

woodcock
Esteemed Legend

The garbage fields are due to automatic key-value extraction, so you need to set KV_MODE = none against your sourcetype on your Search Head. As for the broken field extractions, that is the Splunk life; you are just going to have to work through it. I like to use RegEx101.com. We could help more, but you did not post your broken events.

milesmedboe
Explorer

Thanks for the advice. The regex is tested and functional, and KV_MODE is also set to none. It's a bit of a weird one that I've not come up against before. I'm raising a support case with Splunk to see if I can get a resolution.

0 Karma

woodcock
Esteemed Legend

What did they say/find?

0 Karma

FrankVl
Ultra Champion

He already has KV_MODE = none and in the comments below my answer he also shared a sample event, which seems to match the regex (after removing the quotes surrounding the regex, which he claims he also tried already). He mentions he even used btool to confirm the config is correct.

So it is a bit of a mystery. Unless he is actually using the wrong sourcetype or so.

0 Karma

jnudell_2
Builder

Hi @milesmedboe ,

I have tested the following setting for props.conf and it works:

EXTRACT-wss = ^(?<x_bluecoat_request_tenant_id>[^\s]+) (?<date>\d+\-\d+\-\d+) (?<time>\d+:\d+:\d+) "(?<x_bluecoat_appliance_name>[^\s]+)" (?<time_taken>[^\s]+) (?<c_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (?<cs_userdn>[^\s]+) "?(?<cs_auth_groups>[^\s"]+)"? (?<x_exception_id>[^\s]+) (?<sc_filter_result>[^\s]+) "(?<cs_categories>.*?)" (?<cs_Referer>[^\s]+) (?<sc_status>[^\s]+) (?<s_action>[^\s]+) (?<cs_method>[^\s]+) (?<rs_Content_Type>[^\s]+) (?<cs_uri_scheme>[^\s]+) (?<cs_host>[^\s]+) (?<cs_uri_port>[^\s]+) (?<cs_uri_path>[^\s]+) (?<cs_uri_query>[^\s]+) (?<cs_uri_extension>[^\s]+) "?(?<cs_User_Agent>.*?)"? (?<s_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (?<sc_bytes>[^\s]+) (?<cs_bytes>[^\s]+) (?<x_data_leak_detected>[^\s]+) (?<x_virus_id>[^\s]+) (?<x_bluecoat_location_id>[^\s]+) "(?<x_bluecoat_location_name>.*?)" (?<x_bluecoat_access_type>[^\s]+) "(?<x_bluecoat_application_name>.*?)" "(?<x_bluecoat_application_operation>.*?)" (?<r_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) "(?<r_supplier_country>.*?)" (?<x_rs_certificate_validate_status>[^\s]+) (?<x_rs_certificate_observed_errors>[^\s]+) (?<x_cs_ocsp_error>[^\s]+) (?<x_rs_ocsp_error>[^\s]+) (?<ssl_version>[^\s]+) (?<negotiated_cipher>[^\s]+) (?<cipher_size>[^\s]+) (?<x_rs_certificate_hostname>[^\s]+) "?(?<certificate_hostname_categories>.*?)"? 
(?<x_cs_negotiated_ssl_version>[^\s]+) (?<x_cs_negotiated_cipher>[^\s]+) (?<x_cs_negotiated_cipher_size>[^\s]+) (?<x_cs_certificate_subject>[^\s]+) (?<cs_icap_status>[^\s]+) (?<cs_icap_error_details>[^\s]+) (?<rs_icap_status>[^\s]+) (?<rs_icap_error_details>[^\s]+) (?<s_supplier_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (?<s_supplier_country>[^\s]+) (?<s_supplier_failures>[^\s]+) "(?<x_cs_client_ip_country>.*?)" (?<cs_threat_risk>[^\s]+) (?<x_rs_certificate_threat_risk>[^\s]+) (?<x_client_agent_type>[^\s]+) (?<x_client_os>[^\s]+) (?<x_client_agent_sw>[^\s]+) (?<x_client_device_id>[^\s]+) (?<x_client_device_name>[^\s]+) (?<x_client_device_type>[^\s]+) (?<x_client_security_details>[^\s]+) (?<x_client_security_risk_score>[^\s]+) (?<x_bluecoat_reference_id>[^\s]+) (?<x_sc_connection_issuer_keyring>[^\s]+) (?<x_scissuer_keyring_alias>[^\s]+) (?<x_cloud_rs>[^\s]+) (?<x_bluecoat_placeholder>[^\s]+) (?<cs_X_Requested_With>[^\s]+) (?<x_bluecoat_transaction_uuid>[^\s]+)

If that doesn't work, I would look at your props.conf with btool to see if something is taking precedence over your setting.
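
One place precedence can bite, as far as I know: in props.conf, a [source::...] or [host::...] stanza overrides a [<sourcetype>] stanza for the same setting, and listing only the sourcetype stanza with btool will not show it. A minimal sketch of what such an override could look like (the source path and the second stanza are hypothetical, not taken from this thread):

```
# props.conf as seen at search time on the search head

[aam_wss]
KV_MODE = none

# Hypothetical stanza, e.g. shipped by some other app.
# If its pattern matches the event's source, its KV_MODE
# takes precedence over the sourcetype stanza above.
[source::/var/log/hypothetical/wss/*.log]
KV_MODE = auto
```

If something like this exists, listing the whole of props with btool and searching the output for KV_MODE might reveal it.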

0 Karma

FrankVl
Ultra Champion

Try removing the " around the regex. I guess that was copy-pasted from the search bar (where you do need them)? There is also no need to write \" inside the regex; a plain " should do.
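
To illustrate, here is a shortened sketch of the unquoted form. It keeps only the first four capture groups of the regex from the question, cut down for readability, so it is not a drop-in replacement for the full extraction:

```
[aam_wss]
KV_MODE = none
# No quotes around the whole regex, and " is written without a backslash.
EXTRACT-wss = ^(?<x_bluecoat_request_tenant_id>[^\s]+) (?<date>\d+\-\d+\-\d+) (?<time>\d+:\d+:\d+) "(?<x_bluecoat_appliance_name>[^\s]+)"
```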

milesmedboe
Explorer

Thanks for the advice. I had attempted this in the first instance, then thought it might need to be formatted the same way as in Splunk search, since it was not working. I have reverted as per your suggestions, to no avail.

KV_MODE is set to none, yet Splunk is attempting to automatically extract hundreds of fields. I have used btool to ensure the correct config is in memory. Bit stumped!

Thanks again!

0 Karma

FrankVl
Ultra Champion

Any chance you can share some screenshots of what the data looks like and the kind of fields that get extracted?

0 Karma

milesmedboe
Explorer

Unfortunately I don't yet have the Karma required to upload anything.

This is a scrubbed example from the logs -

26111 1007-03-27 15:00:41 "BV1-ZC0_VvsbkBI" 20 125.20.105.50 EVERETTE\Naida%00Ldbrljloh "EVERETTE\ROLE-U-ILA-QujqvtyGucatk" - OBSERVED "Business/Economy;Web Ads/Annamaria" https://app.jackqueline.com/player?course=call-monitoring-measure-quality&author=shaunte-miller&name... 200 TCP_BY_MISS GET text/plain https tim-ei00-g0.czmrorwaya01.com 131 /ping ?michAela=00523&bitrate=-1&throughput=-1&playhead=261.3046330&hldxyqaPsczrp=0&playrate=1&timemark=1001312020210&system=anlbjtthfrjtbfk&guillerMina=U_20000036_renf5gzemojr05fo_1530010403312&joaqUina=02&code=U_20000036_renf5gzemojr05fo_1530010403312 - "Mozilla/5.0 (Windows NT 6.1; DOZ04; Kennith/7.0; fm:01.0) like Gecko" 042.047.1.2 051 605 no - 310211 "Dannielle Jonelle Data Iraida (IDA)" explicit_proxy "-" "-" 00.200.105.023 "Charlesetta" RONI_VALID none - - CVRq0.2 VELMA-LEA-WEZ145-JJG202 255 *.czmrorwaya01.com "Business/Economy" CVRq0.2 VELMA-LEA-WEZ145-JJG202 255 - LENA_NOT_SCANNED - LENA_NO_MODIFICATION - 00.200.105.023 - - "United Kingdom" 3 2 sep-windows Windows%207%00Tbvtpgqngo 14.2.1023.0100 020NPG02I10P0S5E002I101B4G00B002 OX2-P-GSU1004 FW - - - - - - - - i0erfy049100v30m-0000000022uqo0o1-000000001p012d53

The selected fields area on the left-hand side displays the following:

Selected Fields

a 29
acc 1
action 9
app 3
Architecture 1
atyp 2
c 27
cd 21
charset 22
color 1
component 2
ct 7
culture 4
date_hour 1
date_mday 1
date_minute 1
date_month 1
date_second 18
date_wday 1
date_year 1
date_zone 1
domain 4
dst 1
ei 27
eventtype 1
expires 2
f_dir 1
factoryName 1
fname 6
h 18
hash 3
hl 10
host 1
ht 2
id 36
idclient 2
ima 8
imn 5
index 1
ip 2
linecount 1
lng 2
loc 2
location 2
mode 1
name 11
p 18
pid 16
product 1
ptag 1
punct 100+
q 69
r 22
re 4
resourceGroupName 1
s 100+
SID 11
size 2
source 1
sourcetype 1
splunk_server 1
src 7
src_is_expected 1
src_pci_domain 1
src_requires_av 1
src_should_timesync 1
src_should_update 1
status 1
subscriptionId 1
sysparm_auto_request 1
t 55
tag 1
tag::eventtype 1
time 9
timeendpos 1
timestartpos 1
ts 22
TYPE 1
type 8
uid 7
url 22
v 80
ved 7
Version 2
vtag 1
zx 47

Thanks again for your assistance!

0 Karma

FrankVl
Ultra Champion

You can upload screenshots elsewhere (e.g. imgur) and share the links here 🙂

But it looks like auto KV is not disabled, for starters.

0 Karma

milesmedboe
Explorer

Not really possible in this corporate environment, sorry 😞

I agree, it definitely looks like auto kv is being applied. Btool however only shows "KV_MODE = none" for this sourcetype.

Can you think of anywhere else this could be getting overridden?

Thanks again

0 Karma

FrankVl
Ultra Champion

And the events actually have the correct sourcetype assigned (and only 1)?

0 Karma

milesmedboe
Explorer

It is indeed getting the correct sourcetype. The extractions work well (for over 99% of events, anyway) when tested as part of a search.

Have raised a support case with Splunk, will update here if I get a resolution.

Thanks for your help!

0 Karma

skalliger
SplunkTrust

What does the data look like? Did you try setting KV_MODE = none? Did you do a | extract reload=T after setting that regex on the SH?

Skalli

0 Karma

milesmedboe
Explorer

Thanks Skalli, already had KV_MODE = none, not sure why Splunk is still attempting to extract fields itself.

| extract reload=T didn't help either, wasn't aware of this command though so thanks for bringing it to my attention!

0 Karma