REGEX works in search but not as a field extraction.

att35
Builder

Hi,

I am trying to extract a value from one of the existing fields. The regex works fine when used with "rex" directly in the search string, but the same regex fails when applied from configuration files. Here's an example:

| rex field=threat "^(?<cat>.+?)/" 
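
To illustrate what this pattern captures, here is a minimal Python sketch that applies the same regex to a sample value of threat. Two assumptions: Python's re module writes named groups as (?P<cat>...) rather than (?<cat>...), and the sample value is the one from the event posted further down in this thread.

```python
import re

# Same pattern as the rex command, using Python's (?P<name>...) group syntax.
pattern = re.compile(r"^(?P<cat>.+?)/")

threat = "Mal/Generic-S"  # sample value of the threat field

match = pattern.search(threat)
cat = match.group("cat") if match else None
print(cat)
```

The lazy .+? stops at the first slash, so only the part before it ends up in cat.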

But when used via props/transforms, this doesn't work. I tried several methods.

  1. using EXTRACT in props.conf

  2. Using REPORT in props with a corresponding transforms.conf stanza
    REPORT-category_for_sophos_central = category_for_sophos_central

  3. Using TRANSFORMS in props with a corresponding transforms.conf stanza
    TRANSFORMS-category_for_sophos_central = category_for_sophos_central

For transforms.conf, I tried two ways of writing the REGEX, but neither combination works.

[category_for_sophos_central]
SOURCE_KEY= threat
REGEX = "^(?<cat>.+?)/"
FORMAT = cat::$1

[category_for_sophos_central]
REGEX = threat= "^(?<cat>.+?)/"
FORMAT = cat::$1

I can't figure out what I am missing here. As per the documentation, a simple EXTRACT should have worked because "threat" is not a search-time extracted field.

Thanks in advance,

~ Abhi

0 Karma

FrankVl
Ultra Champion

If the extraction of the threat field relies on auto key value extractions, I guess explicit extractions from that field won't work?

So you either need to write an explicit extraction for the cat field that works on _raw instead of threat, or use a calculated field.

The earlier suggestion should work, but is missing the name of the capturing group (it probably disappeared because it wasn't posted as code, and the Splunk Answers board software strips out anything between <>).

Try this: EXTRACT-cat = \"threat":\s+\"(?<cat>\w+)\/
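
As a quick check outside Splunk, the following Python sketch runs an equivalent pattern against a trimmed copy of the raw event. It assumes Python's (?P<cat>...) spelling for the named group and a shortened, hypothetical event that keeps only three keys; json.dumps is used only to reproduce the "key": value spacing of the raw text.

```python
import json
import re

# Trimmed, hypothetical copy of the raw event (only three keys kept).
raw = json.dumps({
    "name": "Manual cleanup required: 'Mal/Generic-S' at 'xxxxx'",
    "severity": "high",
    "threat": "Mal/Generic-S",
})

# Anchor on the "threat" key instead of the start of the event.
pattern = re.compile(r'"threat":\s+"(?P<cat>\w+)/')

match = pattern.search(raw)
cat = match.group("cat") if match else None
print(cat)
```

Because the pattern starts at the quoted key name, the slash inside the name field is never considered.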
See also: https://regex101.com/r/Ki6GpX/2

0 Karma

lakshman239
Influencer

You could set SOURCE_KEY to _raw and use an appropriate regex. Or did you try escaping the slash as \/?

REGEX = "^(?.+?)\/"

0 Karma

harsmarvania57
Ultra Champion

In addition to this, I suspect that FORMAT is not required in your case. According to the documentation, if the named capturing groups in REGEX already extract the fields, there is no need to define FORMAT:

* REGEX and the FORMAT setting:
  * Name-capturing groups in the REGEX are extracted directly to fields.
    This means that you do not need to specify the FORMAT setting for
    simple field extraction cases (see the description of FORMAT, below).
  * If the REGEX extracts both the field name and its corresponding field
    value, you can use the following special capturing groups if you want to
    skip specifying the mapping in FORMAT:
      _KEY_<string>, _VAL_<string>.
  * For example, the following are equivalent:
    * Using FORMAT:
      * REGEX  = ([a-z]+)=([a-z]+)
      * FORMAT = $1::$2
    * Without using FORMAT
      * REGEX  = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
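
The equivalence described in this excerpt can be mimicked in Python. This is only an analogy under stated assumptions: the dictionaries below stand in for Splunk's field mapping, the sample text "severity=high" is made up, and Python needs the (?P<name>...) spelling for named groups.

```python
import re

text = "severity=high"  # hypothetical sample input

# Variant 1: positional groups, mapped the way FORMAT = $1::$2 would.
m1 = re.search(r"([a-z]+)=([a-z]+)", text)
via_format = {m1.group(1): m1.group(2)}

# Variant 2: the key and value are identified by the group names themselves.
m2 = re.search(r"(?P<_KEY_1>[a-z]+)=(?P<_VAL_1>[a-z]+)", text)
via_named = {m2.group("_KEY_1"): m2.group("_VAL_1")}

print(via_format, via_named)
```

Both variants produce the same key/value pair, which is the point the documentation makes about skipping FORMAT.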
0 Karma

marycordova
SplunkTrust

this may have to do with the order of precedence:
https://docs.splunk.com/Documentation/Splunk/7.2.3/Knowledge/Searchtimeoperationssequence#Search-tim...

as commented, we would need to know how threat is being derived

you could also write your regex to extract from _raw, and just omit the "in threat" part of the props config

a calculated field would be an eval statement in props instead of an extract
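
To make the ordering argument concrete, here is a toy Python model. It is not how Splunk is implemented. It assumes that threat is produced by automatic JSON key-value extraction (which this thread only suspects), and that inline extractions run before automatic key-value extraction while calculated fields run after it, as described on the linked page. All function names are hypothetical.

```python
import json
import re

raw = json.dumps({"severity": "high", "threat": "Mal/Generic-S"})

def inline_extract(fields):
    # Models "EXTRACT-... in threat": it can only run if threat already exists.
    source = fields.get("threat")
    if source is not None:
        m = re.search(r"^(?P<cat>.+?)/", source)
        if m:
            fields["cat"] = m.group("cat")

def auto_kv(fields):
    # Models automatic key-value extraction from the JSON event.
    fields.update(json.loads(fields["_raw"]))

def calculated_field(fields):
    # Models an EVAL-style calculated field that reads threat.
    if "threat" in fields:
        fields["cat_calc"] = fields["threat"].split("/")[0]

fields = {"_raw": raw}
for stage in (inline_extract, auto_kv, calculated_field):
    stage(fields)

print("cat" in fields, fields.get("cat_calc"))
```

In this model the inline extraction finds no threat field yet and yields nothing, while the calculated field runs late enough to see it.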

@marycordova
0 Karma

att35
Builder

Hi Mary,

I tried applying the EXTRACT directly on _raw, but the results were quite different. It ended up matching a few other strings in the raw text before it could reach the "threat" part.
e.g.

cat={"appCerts": null, "id": "e3f4c2a10a24", "origin": null, "endpoint_type": "server", "name": "Manual cleanup required: 'Mal 

cat={"expiration_date": "02 

cat={"customer_name": "xxxxxxx", "threat": "ML 
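
These results are what the pattern ^(?<cat>.+?)/ would be expected to give on _raw: ^ anchors at the start of the whole event, and the lazy .+? runs up to the first slash anywhere in it. A small Python sketch reproduces the effect, assuming Python's (?P<cat>...) group syntax and two trimmed, hypothetical events built with json.dumps to mimic the raw spacing.

```python
import json
import re

# Same pattern as used with rex, in Python's named-group syntax.
pattern = re.compile(r"^(?P<cat>.+?)/")

# Trimmed, hypothetical copies of two raw events.
event_a = json.dumps({
    "appCerts": None,
    "id": "e3f4c2a10a24",
    "origin": None,
    "endpoint_type": "server",
    "name": "Manual cleanup required: 'Mal/Generic-S' at 'xxxxx'",
    "threat": "Mal/Generic-S",
})
event_b = json.dumps({"expiration_date": "02/27/2019", "threat": "Mal/Generic-S"})

cat_a = pattern.search(event_a).group("cat")
cat_b = pattern.search(event_b).group("cat")
print(cat_a)
print(cat_b)
```

In the first event the first slash sits inside the name value, in the second inside the date, so neither capture ever reaches the threat key.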

Thanks,

~ Abhi

0 Karma

somesoni2
Revered Legend

How is field threat being extracted?
Can you try using calculated fields?

att35
Builder

Hi,

I could not find any extractions in props/transforms that are specific to the "threat" field. Could it be possible that Splunk is applying some default key:value settings on the raw data to extract this field? It is definitely part of the raw event.

{
   appCerts: null
   appSha256: null
   core_remedy_items: null
   created_at: 2019-01-15T23:04:44.229Z
   customer_id: 9beb-578eabb1179c
   customer_name:
   endpoint_id: 7dac66fc3eaf
   endpoint_type: server
   expiration_date: 02/27/2019
   group: MALWARE
   id: e3f4c2a10a24
   location:
   name: Manual cleanup required: 'Mal/Generic-S' at 'xxxxxxxx'
   origin: null
   severity: high
   source: n/a
   source_info: { ... }
   threat: Mal/Generic-S
   type: Event::Endpoint::Threat::CleanupFailed
   user_id: null
   when: 2019-01-15T23:04:39.000Z
}

As raw text

{"appCerts": null, "id": "e3f4c2a10a24", "origin": null, "endpoint_type": "server", "name": "Manual cleanup required: 'Mal/Generic-S' at 'xxxxx'", "created_at": "2019-01-15T23:04:44.229Z", "location": "", "core_remedy_items": null, "source": "n/a", "group": "MALWARE", "endpoint_id": "7dac66fc3eaf", "source_info": {"ip": "x.x.x.x"}, "user_id": null, "customer_id": "578eabb1179c", "type": "Event::Endpoint::Threat::CleanupFailed", "appSha256": null, "customer_name": "", "when": "2019-01-15T23:04:39.000Z", "expiration_date": "02/27/2019", "severity": "high", "threat": "Mal/Generic-S"}

I tried using EVAL with split, but it adds both values (before and after the delimiter) to the field. I'm not sure how to keep just the first part.

e.g. If I do | eval cat=split(threat,"/") on threat=Mal/Generic-S, then I'll end up with cat=Mal & cat=Generic-S
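
The general idea for keeping only the first part is to split and then pick the first element of the result. A minimal Python sketch of that idea follows; to my understanding the analogous eval expression in Splunk would wrap the split in mvindex, as in mvindex(split(threat,"/"),0), but treat that as an untested assumption here.

```python
threat = "Mal/Generic-S"  # sample value from the event above

parts = threat.split("/")  # both pieces, like the multivalue result of split
cat = parts[0]             # keep only the piece before the delimiter

print(parts)
print(cat)
```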

Thanks,

~ Abhi

0 Karma

lakshman239
Influencer

Are you using the Splunk_TA_sophos add-on? We had issues with some extractions, so we had to create custom extractions/apps.

Can you try this?

EXTRACT-cat = ^{.*\"threat":\s+?\"(?\w+)\/

0 Karma

att35
Builder

Hi Lakshman,

Tried using that EXTRACT (Under props.conf) but it did not create any new field.

Thanks,

~ Abhi

0 Karma