How to remove spaces in XML data after field extraction

michaelrosello
Path Finder

I am trying to extract a field from XML data, but special ASCII characters (tabs and newlines) are being captured as well.

What is the best approach to exclude these characters?

[Screenshot of the sample event data]

This is what the data looks like. I am trying to capture "Process Name" using the regex below:

index=main | rex field=_raw "Process Name\:(?<process_name>.+)Permissions\W"

This is the result I get for the process_name field:

\tC:\Windows\System32\svchost.exe\n\n
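The post doesn't show whether the \t and \n here are real tab/newline characters or literal backslash sequences stored in the XML. A minimal Python sketch that cleans an already-extracted value in either case (the function name clean_value is hypothetical, and Python's re stands in for Splunk's PCRE):

```python
import re

def clean_value(value: str) -> str:
    """Strip leading/trailing whitespace from an extracted field value.

    Handles both real whitespace characters (tab, newline) and the
    literal two-character sequences "\\t", "\\n", "\\r", in case the raw
    event stores them escaped (an assumption; the original data isn't shown).
    """
    # Remove runs of whitespace or literal \t, \n, \r at either end only.
    return re.sub(r"^(?:\\[tnr]|\s)+|(?:\\[tnr]|\s)+$", "", value)

# Real tab/newline characters around the path.
print(clean_value("\tC:\\Windows\\System32\\svchost.exe\n\n"))
# Literal backslash sequences around the path.
print(clean_value(r"\tC:\Windows\System32\svchost.exe\n\n"))
```

Only the ends of the value are touched, so backslashes inside the Windows path are left alone.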


koshyk
Super Champion

Looking at the data, it seems to be Windows Event Log data? If so, there is already an official, ready-made add-on (Splunk_TA_windows) that extracts these fields. Just install it on your search heads, indexers, and heavy forwarders (HF).
The relevant logic from the TA (transforms.conf) is:

[system_props_xml_kv]
# Extracts anything in the form of <tag>value</tag> as tag::value
SOURCE_KEY = System_Props_Xml
REGEX = (?ms)<(\w*)>([^<]*)<\/\1>
FORMAT = $1::$2
MV_ADD = 1

[system_props_xml_attributes]
# Extracts values from following fields:
# Provider: Name, Guid
# TimeCreated: SystemTime, RawTime
# Correlation: ActivityID, RelativeActivityID
# Execution: ProcessID, ThreadID, ProcessorID, SessionID, KernelTime, UserTime, ProcessorTime
# Security: UserID
SOURCE_KEY = System_Props_Xml
REGEX = (?ms)([^\s=]+)\s*=\s*(\'[^<\']*\'|"[^<"]*")
FORMAT = $1::$2
MV_ADD = 1
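To see what those two REGEX values capture, here is a minimal Python sketch that runs them unchanged against a hypothetical, trimmed-down System block (the sample XML is illustrative, not taken from the original data). FORMAT = $1::$2 with MV_ADD = 1 then turns each (group 1, group 2) pair into a field and its value.

```python
import re

# The two REGEX values from the TA stanzas above, unchanged
# (Python's re accepts the same syntax for these patterns).
TAG_VALUE = re.compile(r"(?ms)<(\w*)>([^<]*)<\/\1>")
ATTRIBUTE = re.compile(r"""(?ms)([^\s=]+)\s*=\s*(\'[^<\']*\'|"[^<"]*")""")

# Hypothetical, trimmed-down <System> block used only for illustration.
system_xml = (
    "<Provider Name='Microsoft-Windows-Security-Auditing'/>"
    "<EventID>4663</EventID>"
    "<Computer>host01</Computer>"
)

# Each tuple is (field name, value), i.e. what $1::$2 would produce.
tag_fields = TAG_VALUE.findall(system_xml)
attr_fields = ATTRIBUTE.findall(system_xml)
print(tag_fields)
print(attr_fields)
```

Note that the attribute regex keeps the surrounding quotes in its second capture, and [^<]* in the first regex keeps any whitespace inside a tag's text as-is.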

But if you want to do it manually, the regex should be:

index=main | rex field=_raw "Process Name:\s*(?<process_name>.+)"
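A quick Python check of that regex, assuming the raw event contains real tab and newline characters (the sample event is hypothetical). Python writes named groups as (?P<name>...), while rex accepts (?<name>...); the pattern is otherwise the same.

```python
import re

# Same pattern as the rex above; Python spells the named group (?P<...>).
PROCESS_NAME = re.compile(r"Process Name:\s*(?P<process_name>.+)")

# Hypothetical raw event text with real tab and newline characters.
raw = "Process Name:\tC:\\Windows\\System32\\svchost.exe\n\nPermissions:\n"

match = PROCESS_NAME.search(raw)
# None if the pattern does not occur in the event.
process_name = match.group("process_name") if match else None
print(process_name)
```

\s* swallows the leading tab, and because . does not match a newline by default, .+ stops at the end of the line. Trailing spaces on that same line would still be captured.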

niketn
Legend

@michaelrosello if you don't need the whitespace in the XML data, the best approach is to remove it at index time using SEDCMD: https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata

1) At search time, you will not need extra formatting to remove whitespace from the data.
2) You will index only the data you need.
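SEDCMD takes a sed-style s/regex/replacement/flags expression in props.conf. A minimal Python sketch of what a global substitution would do to one event; the stanza SEDCMD-strip_ws = s/[\t\n]+/ /g and the sample event are hypothetical, chosen only for illustration:

```python
import re

def apply_sedcmd(event: str, pattern: str, replacement: str) -> str:
    """Emulate a global sed substitution (s/pattern/replacement/g) on one event.

    This only mimics the effect for illustration; Splunk applies SEDCMD
    itself at index time, before the event is written.
    """
    return re.sub(pattern, replacement, event)

# Hypothetical props.conf line being emulated:
#   SEDCMD-strip_ws = s/[\t\n]+/ /g
raw = "Process Name:\tC:\\Windows\\System32\\svchost.exe\n\nPermissions:"
print(apply_sedcmd(raw, r"[\t\n]+", " "))
```

Each run of tabs/newlines collapses to a single space, so the indexed event would read "Process Name: C:\Windows\System32\svchost.exe Permissions:".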


DavidHourani
Super Champion

Hi @michaelrosello,

You can go for something like this:

index=main | rex field=_raw "Process Name\:[\s]+(?<process_name>.+)\n\nPermissions\W"
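A Python check of this pattern against a hypothetical event, assuming real tab/newline characters and with the group renamed to Python's (?P<...>) syntax:

```python
import re

# DavidHourani's pattern, with (?<...>) rewritten as Python's (?P<...>).
PATTERN = re.compile(r"Process Name\:[\s]+(?P<process_name>.+)\n\nPermissions\W")

# Hypothetical event text with real tab and newline characters.
raw = "Process Name:\tC:\\Windows\\System32\\svchost.exe\n\nPermissions:\n"

match = PATTERN.search(raw)
# None if the two-newline layout is not present in the event.
process_name = match.group("process_name") if match else None
print(process_name)
```

The literal \n\n requires exactly two newline characters between the two lines; with \r\n line endings, for example, this pattern would not match.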

The right way to extract fields from XML at search time is the spath command:
https://docs.splunk.com/Documentation/Splunk/7.2.6/SearchReference/spath
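spath itself is SPL and only runs inside Splunk. As a rough Python analog of the same idea (parse the XML structurally instead of matching it with a regex, then trim the element text), here is a minimal sketch with xml.etree.ElementTree; the event and its element/attribute names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML event; the element and attribute names are illustrative only.
event = """<Event>
  <EventData>
    <Data Name='ProcessName'>
      C:\\Windows\\System32\\svchost.exe
    </Data>
  </EventData>
</Event>"""

root = ET.fromstring(event)
# Find the Data element whose Name attribute is ProcessName, then strip the
# surrounding whitespace that the pretty-printed XML leaves in its text.
data = root.find(".//Data[@Name='ProcessName']")
process_name = data.text.strip() if data is not None and data.text else None
print(process_name)
```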

In your case, since these are Windows logs, I wouldn't bother with the extractions. Just grab the Windows TA from Splunkbase, because all the field extractions are already done there:
https://splunkbase.splunk.com/app/742/

Cheers,
David