REGEX works in search but not as a field extraction.

Contributor

Hi,

I am trying to extract a value from one of the existing fields. The regex works fine when used with "rex" directly in the search string, but the same regex fails when applied from configuration files. Here's an example:

| rex field=threat "^(?<cat>.+?)/" 

But when used via props/transforms, this doesn't work. I tried several methods:

  1. Using EXTRACT in props.conf

  2. Using REPORT in props with a corresponding transforms.conf stanza
    REPORT-category_for_sophos_central = category_for_sophos_central

  3. Using TRANSFORMS in props with a corresponding transforms.conf stanza
    TRANSFORMS-category_for_sophos_central = category_for_sophos_central

For transforms.conf, I tried two ways of writing the REGEX, but neither works:

[category_for_sophos_central]
SOURCE_KEY= threat
REGEX = "^(?<cat>.+?)/"
FORMAT = cat::$1

[category_for_sophos_central]
REGEX = threat= "^(?<cat>.+?)/"
FORMAT = cat::$1

I can't figure out what I am missing here. As per the documentation, a simple EXTRACT should have worked because "threat" is not a search-time extracted field.

Thanks in advance,

~ Abhi

Ultra Champion

If the extraction of the threat field relies on auto key value extractions, I guess explicit extractions from that field won't work?

So you either need to write an explicit extraction for the cat field that works on _raw instead of threat, or use a calculated field.

The earlier suggestion should work, but it is missing the name of the capturing group (it probably disappeared because it wasn't posted as code, and the Splunk Answers board software strips out anything between <>).

Try this: EXTRACT-cat = \"threat":\s+\"(?<cat>\w+)\/
See also: https://regex101.com/r/Ki6GpX/2

SplunkTrust

You could use _raw as the SOURCE_KEY with an appropriate regex. Or did you try escaping the slash as \/?

REGEX = "^(?<cat>.+?)\/"

SplunkTrust

In addition to this, I suspect that FORMAT is not required in your case. According to the documentation, if the fields are extracted by named capturing groups in REGEX alone, there is no need to define FORMAT:

* REGEX and the FORMAT setting:
  * Name-capturing groups in the REGEX are extracted directly to fields.
    This means that you do not need to specify the FORMAT setting for
    simple field extraction cases (see the description of FORMAT, below).
  * If the REGEX extracts both the field name and its corresponding field
    value, you can use the following special capturing groups if you want to
    skip specifying the mapping in FORMAT:
      _KEY_<string>, _VAL_<string>.
  * For example, the following are equivalent:
    * Using FORMAT:
      * REGEX  = ([a-z]+)=([a-z]+)
      * FORMAT = $1::$2
    * Without using FORMAT
      * REGEX  = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
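
Applied to the transforms.conf stanza from the question, the documentation excerpt above suggests something like the following sketch. It rests on two assumptions that are not confirmed in this thread: that the double quotes around the original REGEX were being read as literal characters of the pattern, and that the threat field already exists when the REPORT runs. If threat comes from automatic key-value extraction, which runs later, this stanza would still not match.

```ini
# transforms.conf
[category_for_sophos_central]
SOURCE_KEY = threat
# Assumption: surrounding quotes would be matched literally, so they are left out.
# The named group (?<cat>...) creates the field, so FORMAT is omitted.
REGEX = ^(?<cat>.+?)/
```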

SplunkTrust

this may have to do with the order of precedence:
https://docs.splunk.com/Documentation/Splunk/7.2.3/Knowledge/Searchtimeoperationssequence#Search-tim...

As commented, we would need to know how threat is being derived.

You could also write your regex to extract from _raw and just omit the "in threat" part of the props config.

A calculated field would be an EVAL statement in props instead of an EXTRACT.

Contributor

Hi Mary,

I tried applying the EXTRACT directly on _raw, but the results were quite different. It ended up matching other strings in the raw text before it could reach the "threat" part, e.g.

cat={"appCerts": null, "id": "e3f4c2a10a24", "origin": null, "endpoint_type": "server", "name": "Manual cleanup required: 'Mal 

cat={"expiration_date": "02 

cat={"customer_name": "xxxxxxx", "threat": "ML 
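
The results above are consistent with how the pattern is anchored: against _raw, ^ matches at the start of the whole event, and .+? then runs to the first "/" anywhere in the JSON. Below is a sketch of an extraction anchored on the "threat" key instead. The stanza name is a placeholder, and \s* assumes optional whitespace after the colon, as seen in the output above.

```ini
# props.conf
# [your_sourcetype] is a placeholder for the sourcetype of these events.
[your_sourcetype]
# Find the "threat" key, then capture its value up to the first "/".
EXTRACT-cat = "threat":\s*"(?<cat>[^/"]+)/
```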

Thanks,

~ Abhi

SplunkTrust

How is field threat being extracted?
Can you try using calculated fields?

Contributor

Hi,

I could not find any extractions in props/transforms that are specific to the "threat" field. Could it be that Splunk is applying some default key:value settings to the raw data to extract this field? It is definitely part of the raw event.

    {   [-] 
   appCerts: null   
   appSha256: null  
   core_remedy_items: null  
   created_at: 2019-01-15T23:04:44.229Z 
   customer_id: 9beb-578eabb1179c   
   customer_name: 
   endpoint_id: 7dac66fc3eaf    
   endpoint_type: server    
   expiration_date: 02/27/2019  
   group: MALWARE   
   id: e3f4c2a10a24 
   location:    
   name: Manual cleanup required: 'Mal/Generic-S' at 'xxxxxxxx' 
   origin: null 
   severity: high   
   source: n/a  
   source_info: {   [+] 
   }    
   threat: Mal/Generic-S    
   type: Event::Endpoint::Threat::CleanupFailed 
   user_id: null    
   when: 2019-01-15T23:04:39.000Z   
}   

As raw text

{"appCerts": null, "id": "e3f4c2a10a24", "origin": null, "endpoint_type": "server", "name": "Manual cleanup required: 'Mal/Generic-S' at 'xxxxx'", "created_at": "2019-01-15T23:04:44.229Z", "location": "", "core_remedy_items": null, "source": "n/a", "group": "MALWARE", "endpoint_id": "7dac66fc3eaf", "source_info": {"ip": "x.x.x.x"}, "user_id": null, "customer_id": "578eabb1179c", "type": "Event::Endpoint::Threat::CleanupFailed", "appSha256": null, "customer_name": "", "when": "2019-01-15T23:04:39.000Z", "expiration_date": "02/27/2019", "severity": "high", "threat": "Mal/Generic-S"}

I tried using EVAL with split, but it adds both values (before and after the delimiter) to the field. I'm not sure how to keep just the first part.

e.g. if I do | eval cat=split(threat,"/") on threat=Mal/Generic-S, then I end up with cat=Mal & cat=Generic-S
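
One way to keep only the first part is to index into the multivalue result of split with mvindex. Here is a minimal sketch as a calculated field, assuming threat is already available at search time when calculated fields are evaluated. The stanza name is a placeholder.

```ini
# props.conf
# [your_sourcetype] is a placeholder for the sourcetype of these events.
[your_sourcetype]
# split() returns a multivalue field; mvindex(..., 0) keeps only the first element.
EVAL-cat = mvindex(split(threat, "/"), 0)
```

The same expression can be tried inline first with | eval cat=mvindex(split(threat,"/"),0). For threat=Mal/Generic-S it gives cat=Mal.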

Thanks,

~ Abhi

SplunkTrust

Are you using the Splunk_TA_sophos add-on? We had issues with some extractions, so we had to create custom extractions/apps.

Can you try this?

EXTRACT-cat = ^{.*\"threat":\s+?\"(?\w+)\/

Contributor

Hi Lakshman,

I tried using that EXTRACT (in props.conf), but it did not create any new field.

Thanks,

~ Abhi
