Hi,
I am trying to extract a value from one of the existing fields. The regex works fine when used with "rex" directly in the search string, but the same regex fails when applied from configuration files. Here's an example:
| rex field=threat "^(?<cat>.+?)/"
But when used via props / transforms, this doesn't work. I tried several methods:
Using EXTRACT in props.conf
Using REPORT in props with a corresponding transforms.conf stanza
REPORT-category_for_sophos_central = category_for_sophos_central
Using TRANSFORMS in props with a corresponding transforms.conf stanza
TRANSFORMS-category_for_sophos_central = category_for_sophos_central
For transforms.conf, I tried two ways of writing the REGEX, but neither works.
[category_for_sophos_central]
SOURCE_KEY= threat
REGEX = "^(?<cat>.+?)/"
FORMAT = cat::$1
[category_for_sophos_central]
REGEX = threat= "^(?<cat>.+?)/"
FORMAT = cat::$1
Can't figure out what I am missing here. As per the documentation, a simple EXTRACT should have worked because "threat" is not a search-time extracted field.
Thanks in advance,
~ Abhi
If the extraction of the threat field relies on automatic key-value extraction, I guess explicit extractions from that field won't work, since EXTRACT and REPORT run before automatic key-value extraction at search time?
So you either need to write an explicit extraction for the cat field that works on _raw instead of threat, or use a calculated field.
The earlier suggestion should work, but it is missing the name of the capturing group (it probably disappeared because it wasn't posted as code, and the Splunk Answers board software strips out anything between < and >).
Try this: EXTRACT-cat = \"threat":\s+\"(?<cat>\w+)\/
See also: https://regex101.com/r/Ki6GpX/2
You could set SOURCE_KEY to _raw and use an appropriate regex. Or did you try escaping the slash as \/?
REGEX = "^(?.+?)\/"
In addition, I suspect that FORMAT = is not required in your case. According to the documentation, if the fields are extracted by named capturing groups in REGEX, there is no need to define FORMAT:
* REGEX and the FORMAT setting:
* Name-capturing groups in the REGEX are extracted directly to fields.
This means that you do not need to specify the FORMAT setting for
simple field extraction cases (see the description of FORMAT, below).
* If the REGEX extracts both the field name and its corresponding field
value, you can use the following special capturing groups if you want to
skip specifying the mapping in FORMAT:
_KEY_<string>, _VAL_<string>.
* For example, the following are equivalent:
* Using FORMAT:
* REGEX = ([a-z]+)=([a-z]+)
* FORMAT = $1::$2
* Without using FORMAT
* REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
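Applied to the transforms.conf stanza from the question, the documentation excerpt above suggests something like the following. This is a minimal sketch, not a verified fix: the sourcetype name [sophos:central] is hypothetical, the character class [^"/]+ is my own choice, and it assumes the raw JSON layout posted in this thread. As far as I know, quotes around the REGEX value are not stripped and would be matched literally, so the regex is written without surrounding quotes.

```ini
# props.conf
# [sophos:central] is a hypothetical sourcetype name; substitute your own.
[sophos:central]
REPORT-category_for_sophos_central = category_for_sophos_central

# transforms.conf
[category_for_sophos_central]
# SOURCE_KEY is left at its default (_raw), so the transform does not
# depend on the threat field already existing when it runs.
# The value is not wrapped in quotes; the quotes below are part of the
# pattern and match the JSON quotes in the raw event.
REGEX = "threat":\s*"(?<cat>[^"/]+)/
# No FORMAT line: the named group (?<cat>...) creates the field directly.
```

Against the sample event with "threat": "Mal/Generic-S", the named group would capture Mal.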
this may have to do with the order of precedence:
https://docs.splunk.com/Documentation/Splunk/7.2.3/Knowledge/Searchtimeoperationssequence#Search-tim...
as commented, we would need to know how threat is being derived
You could also write your regex to extract from _raw and just omit the "in threat" part of the props config.
a calculated field would be an eval statement in props instead of an extract
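Calculated fields are evaluated after automatic key-value extraction in the search-time sequence, so they can read threat even when that field only exists because of automatic JSON extraction. A minimal sketch, assuming threat is available at that stage; the sourcetype name [sophos:central] is hypothetical:

```ini
# props.conf
# [sophos:central] is a hypothetical sourcetype name; substitute your own.
[sophos:central]
# split() returns a multivalue result (for Mal/Generic-S: Mal and Generic-S);
# mvindex(..., 0) keeps only the first element, so cat holds just Mal.
EVAL-cat = mvindex(split(threat, "/"), 0)
```

The same expression can be tried first in a search with | eval cat=mvindex(split(threat,"/"),0) before committing it to props.conf.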
Hi Mary,
I tried applying the EXTRACT directly on _raw, but the results were quite different. It ended up matching a few other strings in the raw text before it could reach the "threat" part.
e.g.
cat={"appCerts": null, "id": "e3f4c2a10a24", "origin": null, "endpoint_type": "server", "name": "Manual cleanup required: 'Mal
cat={"expiration_date": "02
cat={"customer_name": "xxxxxxx", "threat": "ML
Thanks,
~ Abhi
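The values above are what a regex anchored with ^ produces when it runs against _raw instead of threat: it captures everything from the start of the event up to the first slash anywhere in the JSON. One way to avoid that is to tie the pattern to the "threat" key itself. A minimal sketch, assuming the raw layout posted in this thread; the sourcetype name [sophos:central] is hypothetical and the character class [^"/]+ is my own choice:

```ini
# props.conf
# [sophos:central] is a hypothetical sourcetype name; substitute your own.
[sophos:central]
# There is no "in <field>" clause, so the regex runs against _raw.
# The literal "threat": prefix stops the match from starting at the
# beginning of the event; [^"/]+ cannot cross a slash or a quote, so the
# capture ends at the first slash inside the threat value.
EXTRACT-cat = "threat":\s*"(?<cat>[^"/]+)/
```

Using [^"/]+ rather than \w+ also allows category names that contain characters such as hyphens.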
How is field threat being extracted?
Can you try using calculated fields?
Hi,
I could not find any extractions in props/transforms that are specific to the "threat" field. Could it be that Splunk is applying some default key:value settings on the raw data to extract this field? It is definitely part of the raw event.
{ [-]
appCerts: null
appSha256: null
core_remedy_items: null
created_at: 2019-01-15T23:04:44.229Z
customer_id: 9beb-578eabb1179c
customer_name:
endpoint_id: 7dac66fc3eaf
endpoint_type: server
expiration_date: 02/27/2019
group: MALWARE
id: e3f4c2a10a24
location:
name: Manual cleanup required: 'Mal/Generic-S' at 'xxxxxxxx'
origin: null
severity: high
source: n/a
source_info: { [+]
}
threat: Mal/Generic-S
type: Event::Endpoint::Threat::CleanupFailed
user_id: null
when: 2019-01-15T23:04:39.000Z
}
As raw text
{"appCerts": null, "id": "e3f4c2a10a24", "origin": null, "endpoint_type": "server", "name": "Manual cleanup required: 'Mal/Generic-S' at 'xxxxx'", "created_at": "2019-01-15T23:04:44.229Z", "location": "", "core_remedy_items": null, "source": "n/a", "group": "MALWARE", "endpoint_id": "7dac66fc3eaf", "source_info": {"ip": "x.x.x.x"}, "user_id": null, "customer_id": "578eabb1179c", "type": "Event::Endpoint::Threat::CleanupFailed", "appSha256": null, "customer_name": "", "when": "2019-01-15T23:04:39.000Z", "expiration_date": "02/27/2019", "severity": "high", "threat": "Mal/Generic-S"}
I tried using EVAL with split, but it adds both values (before and after the delimiter) to the field. I'm not sure how to keep just the first part.
E.g. if I do | eval cat=split(threat,"/") on threat=Mal/Generic-S, then I end up with cat=Mal and cat=Generic-S.
Thanks,
~ Abhi
Are you using the Splunk_TA_sophos add-on? We had issues with some of its extractions, so we had to create custom extractions/apps.
Can you try this?
EXTRACT-cat = ^{.*\"threat":\s+?\"(?\w+)\/
Hi Lakshman,
I tried using that EXTRACT (under props.conf), but it did not create any new field.
Thanks,
~ Abhi