Splunk Search

how to remove spaces in xml data after field extraction

michaelrosello
Path Finder

I am trying to make a field extraction from xml data and but I am having a problem with special ascii characters being captures as well.

what is the best approach to exclude this characters?

alt text

This is what the data looks like and when I try to capture "Process Name" using regex below.

index=main | rex field=_raw "Process Name\:(?<process_name>.+)Permissions\W

this is the result I am getting for process_name field

\tC:\Windows\System32\svchost.exe\n\n

Tags (3)
0 Karma

koshyk
Super Champion

Looking into the data, it seems it is Windows Eventlog? Then there is already ready-made Official addon (Splunk_TA_windows) to do this and extract fields. Just install them into your Search Heads & indexers & HF
The Logic from the TA is

[system_props_xml_kv]
# Extracts anything in the form of <tag>value</tag> as tag::value
SOURCE_KEY = System_Props_Xml
REGEX = (?ms)<(\w*)>([^<]*)<\/\1>
FORMAT = $1::$2
MV_ADD = 1

[system_props_xml_attributes]
# Extracts values from following fields:
# Provider: Name, Guid
# TimeCreated: SystemTime, RawTime
# Correlation: ActivityID, RelativeActivityID
# Execution: ProcessID, ThreadID, ProcessorID, SessionID, KernelTime, UserTime, ProcessorTime
# Security: UserID
SOURCE_KEY = System_Props_Xml
REGEX = (?ms)([^\s=]+)\s*=\s*(\'[^<\']*\'|"[^<"]*")
FORMAT = $1::$2
MV_ADD = 1

But if you want to do it manually, the regex should be:

 index=main |  rex field=_raw "Process Name:\s*(?<process_name>.+)"
0 Karma

niketn
Legend

@michaelrosello if you don't need spaces (white-spaces) in xml data, best approach would be to removed while indexing the data using SEDCMD : https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata

1) Search time you would not need extra formatting to remove spaces from data
2) You will index only required data.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

DavidHourani
Super Champion

Hi @michaelrosello,

You can go for something like this :

index=main | rex field=_raw "Process Name\:[\s]+(?<process_name>.+)\n\nPermissions\W

The right way to go about extracting fields from XML at search time is using the spath command :
https://docs.splunk.com/Documentation/Splunk/7.2.6/SearchReference/spath

In your case since it's windows logs I wouldn't bother with the extractions, just grab the windows TA from splunkbase because all the field extractions are already done there :
https://splunkbase.splunk.com/app/742/

Cheers,
David

0 Karma
Get Updates on the Splunk Community!

What You Read The Most: Splunk Lantern’s Most Popular Articles!

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

See your relevant APM services, dashboards, and alerts in one place with the updated ...

As a Splunk Observability user, you have a lot of data you have to manage, prioritize, and troubleshoot on a ...

Index This | What goes away as soon as you talk about it?

May 2025 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this month’s ...