Splunk Search

How to remove spaces in XML data after field extraction

michaelrosello
Path Finder

I am trying to extract a field from XML data, but special ASCII characters are being captured as well.

What is the best approach to exclude these characters?

[image: screenshot of the event data]

This is what the data looks like. I am trying to capture "Process Name" using the regex below.

index=main | rex field=_raw "Process Name\:(?<process_name>.+)Permissions\W"

This is the result I am getting for the process_name field:

\tC:\Windows\System32\svchost.exe\n\n


koshyk
Super Champion

Looking at the data, it seems to be a Windows event log. If so, there is already an official add-on (Splunk_TA_windows) that extracts these fields. Just install it on your search heads, indexers, and heavy forwarders.
The relevant logic from the TA is:

[system_props_xml_kv]
# Extracts anything in the form of <tag>value</tag> as tag::value
SOURCE_KEY = System_Props_Xml
REGEX = (?ms)<(\w*)>([^<]*)<\/\1>
FORMAT = $1::$2
MV_ADD = 1

[system_props_xml_attributes]
# Extracts values from following fields:
# Provider: Name, Guid
# TimeCreated: SystemTime, RawTime
# Correlation: ActivityID, RelativeActivityID
# Execution: ProcessID, ThreadID, ProcessorID, SessionID, KernelTime, UserTime, ProcessorTime
# Security: UserID
SOURCE_KEY = System_Props_Xml
REGEX = (?ms)([^\s=]+)\s*=\s*(\'[^<\']*\'|"[^<"]*")
FORMAT = $1::$2
MV_ADD = 1

But if you want to do it manually, the regex should be:

 index=main |  rex field=_raw "Process Name:\s*(?<process_name>.+)"
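
If the field has already been extracted with leading or trailing whitespace, it can also be cleaned up after the fact. A minimal sketch, assuming the field is named process_name as in the question and that only whitespace at the start and end of the value should be dropped:

```spl
index=main
| rex field=_raw "Process Name:\s*(?<process_name>.+)"
| eval process_name=replace(process_name, "^\s+|\s+$", "")
```

The replace() pattern strips tabs, carriage returns, and newlines as well as spaces. trim() with no second argument only removes spaces and tabs, so it would leave trailing newlines in place.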

niketn
Legend

@michaelrosello, if you don't need the whitespace in the XML data, the best approach would be to remove it while indexing the data, using SEDCMD: https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata

1) At search time, you would not need extra formatting to remove spaces from the data.
2) You will index only the required data.
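
A minimal sketch of such a setting in props.conf, assuming a hypothetical sourcetype name (my_xml_sourcetype) and that collapsing runs of tabs into a single space is the cleanup you want. SEDCMD rewrites _raw before it is indexed, so the change is permanent for new events and has to live on the indexer or heavy forwarder that parses the data:

```ini
# props.conf (the sourcetype name is a placeholder, not taken from this thread)
[my_xml_sourcetype]
# Replace each run of tab characters in _raw with one space at index time
SEDCMD-collapse_tabs = s/\t+/ /g
```

Test the expression on a non-production index first, since events that are already indexed are not changed and the original whitespace cannot be recovered afterwards.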

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

DavidHourani
Super Champion

Hi @michaelrosello,

You can go for something like this:

index=main | rex field=_raw "Process Name\:[\s]+(?<process_name>.+)\n\nPermissions\W"

The right way to extract fields from XML at search time is to use the spath command:
https://docs.splunk.com/Documentation/Splunk/7.2.6/SearchReference/spath
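
As a rough illustration, assuming the events are stored as Windows Event Log XML with the usual Event/System layout (the output field name event_id is made up for this example):

```spl
index=main
| spath input=_raw output=event_id path=Event.System.EventID
| table _time event_id
```

Running spath with no arguments attempts to extract every element it finds, which is a quick way to see which paths are available before picking one.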

In your case, since these are Windows logs, I wouldn't bother with custom extractions. Just grab the Windows TA from Splunkbase, because all the field extractions are already done there:
https://splunkbase.splunk.com/app/742/

Cheers,
David
