Hi Community,
I'm trying to build a regex that can help me reduce the size of events with a particular EventCode, in my case 4627.
The idea is to use props and transforms:
props.conf
[XmlWinEventLog:Security]
TRANSFORMS-reduce_raw = reduce_event_raw
transforms.conf
[reduce_event_raw]
REGEX = <Event[^>]*>.*?<System>.*?<Provider\s+Name='(?<ProviderName>[^']*)'\s+Guid='(?<ProviderGuid>[^']*)'.*?<EventID>(?<EventID>\d+)</EventID>.*?<Version>(?<Version>\d+)</Version>.*?<Level>(?<Level>\d+)</Level>.*?<Task>(?<Task>\d+)</Task>.*?<Opcode>(?<Opcode>\d+)</Opcode>.*?<Keywords>(?<Keywords>[^<]*)</Keywords>.*?<TimeCreated\s+SystemTime='(?<SystemTime>[^']*)'.*?<EventRecordID>(?<EventRecordID>\d+)</EventRecordID>.*?<Correlation\s+ActivityID='(?<ActivityID>[^']*)'.*?<Execution\s+ProcessID='(?<ProcessID>\d+)'\s+ThreadID='(?<ThreadID>\d+)'.*?<Channel>(?<Channel>[^<]*)</Channel>.*?<Computer>(?<Computer>[^<]*)</Computer>.*?<EventData>.*?<Data\s+Name='SubjectUserSid'>(?<SubjectUserSid>[^<]*)</Data>.*?<Data\s+Name='SubjectUserName'>(?<SubjectUserName>[^<]*)</Data>.*?<Data\s+Name='SubjectDomainName'>(?<SubjectDomainName>[^<]*)</Data>.*?<Data\s+Name='SubjectLogonId'>(?<SubjectLogonId>[^<]*)</Data>.*?<Data\s+Name='TargetUserSid'>(?<TargetUserSid>[^<]*)</Data>.*?<Data\s+Name='TargetUserName'>(?<TargetUserName>[^<]*)</Data>.*?<Data\s+Name='TargetDomainName'>(?<TargetDomainName>[^<]*)</Data>.*?<Data\s+Name='TargetLogonId'>(?<TargetLogonId>[^<]*)</Data>.*?<Data\s+Name='LogonType'>(?<LogonType>[^<]*)</Data>.*?<Data\s+Name='EventIdx'>(?<EventIdx>[^<]*)</Data>.*?<Data\s+Name='EventCountTotal'>(?<EventCountTotal>[^<]*)</Data>.*?<Data\s+Name='GroupMembership'>(?<GroupMembership>.*?)</Data>.*?</EventData>.*?</Event>
FORMAT = ProviderName::$1 ProviderGuid::$2 EventID::$3 Version::$4 Level::$5 Task::$6 Opcode::$7 Keywords::$8 SystemTime::$9 EventRecordID::$10 ActivityID::$11 ProcessID::$12 ThreadID::$13 Channel::$14 Computer::$15 SubjectUserSid::$16 SubjectUserName::$17 SubjectDomainName::$18 SubjectLogonId::$19 TargetUserSid::$20 TargetUserName::$21 TargetDomainName::$22 TargetLogonId::$23 LogonType::$24 EventIdx::$25 EventCountTotal::$26 GroupMembership::$27
DEST_KEY = _raw
Then I would be able to pick which parts of the raw data get indexed.
However, the regex does not seem to capture the fields correctly.
Here is the raw event:
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-a5ba-3e3bxxxxxx}'/><EventID>4627</EventID><Version>0</Version><Level>0</Level><Task>12554</Task><Opcode>0</Opcode><Keywords>0x8020000000000000</Keywords><TimeCreated SystemTime='2024-11-27T11:27:45.6695363Z'/><EventRecordID>2177113</EventRecordID><Correlation ActivityID='{01491b93-40a4-0002-6926-4901a440db01}'/><Execution ProcessID='1196' ThreadID='1312'/><Channel>Security</Channel><Computer>Computer1</Computer><Security/></System><EventData><Data Name='SubjectUserSid'>S-1-5-18</Data><Data Name='SubjectUserName'>CXXXXXX</Data><Data Name='SubjectDomainName'>CXXXXXXXX</Data><Data Name='SubjectLogonId'>0x3e7</Data><Data Name='TargetUserSid'>S-1-5-18</Data><Data Name='TargetUserName'>SYSTEM</Data><Data Name='TargetDomainName'>NT AUTHORITY</Data><Data Name='TargetLogonId'>0x3e7</Data><Data Name='LogonType'>5</Data><Data Name='EventIdx'>1</Data><Data Name='EventCountTotal'>1</Data><Data Name='GroupMembership'>
%{S-1-5-32-544}
%{S-1-1-0}
%{S-1-5-11}
%{S-1-16-16384}</Data></EventData></Event>
Any help troubleshooting the problem would be highly appreciated.
Thank you in advance!
Ouch.
1. If you're referencing numbered capture groups, you don't have to name them. (I'm not even sure index-time extractions support named capture groups.)
2. Assuming your regex was right, you'd get key::value pairs in your raw event. Are you sure that's what you want? This will also cause "interesting" side effects, since that data would be split into terms at major breakers and indexed as indexed fields.
3. Manipulating structured data with regexes is asking for trouble. You have no guarantee that the fields will always be in the same order (and they might not always contain full data). That's why you use a structured data format.
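To illustrate point 3 outside Splunk: a real XML parser looks elements up by name, so it does not care about element order the way a chained regex does. A minimal Python sketch using the standard xml.etree.ElementTree module; EVENT_A and EVENT_B are hypothetical trimmed copies of the 4627 event with their elements reordered, and event_fields is an illustrative helper, not anything Splunk provides.

```python
import xml.etree.ElementTree as ET

# Windows event XML uses a default namespace; lookups need a prefix for it.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hypothetical trimmed events: same values, elements in a different order.
EVENT_A = (
    "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>"
    "<System><EventID>4627</EventID><Computer>Computer1</Computer></System>"
    "<EventData><Data Name='TargetUserName'>SYSTEM</Data>"
    "<Data Name='LogonType'>5</Data></EventData></Event>"
)
EVENT_B = (
    "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>"
    "<System><Computer>Computer1</Computer><EventID>4627</EventID></System>"
    "<EventData><Data Name='LogonType'>5</Data>"
    "<Data Name='TargetUserName'>SYSTEM</Data></EventData></Event>"
)


def event_fields(xml_text):
    """Map EventID, Computer and every EventData <Data Name=...> to its text."""
    root = ET.fromstring(xml_text)
    fields = {
        "EventID": root.findtext("e:System/e:EventID", namespaces=NS),
        "Computer": root.findtext("e:System/e:Computer", namespaces=NS),
    }
    # Each <Data> is keyed by its Name attribute, wherever it appears.
    for data in root.iterfind("e:EventData/e:Data", NS):
        fields[data.get("Name")] = data.text or ""
    return fields


print(event_fields(EVENT_A))
print(event_fields(EVENT_B))
```

Both calls return the same dictionary, whereas a regex that expects TargetUserName before LogonType would fail on EVENT_B.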
Hi @PickleRick,
Thank you for your valuable feedback.
Index-Time Extractions: You're right that named capture groups might not be supported at index time. I'll modify my configurations to use numbered capture groups to ensure they function correctly.
Rewriting _raw: I appreciate you highlighting the potential issues with rewriting _raw to contain key-value pairs. My intention was to reduce the size of the events by removing unnecessary data, but I see how this could lead to unintended side effects during indexing. I'll reconsider this approach.
Structured Data Parsing: Your point about the risks of using regex to parse XML is well-taken. Given that XML fields may vary in order and presence, relying on regex could indeed cause problems. Utilizing Splunk's structured data parsing capabilities seems like a better solution.
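On the capture-group question from the first point of the reply: in PCRE, as in Python's re, a named group still gets a number, counted by the position of its opening parenthesis, so $1-style references and names refer to the same capture. Whether Splunk's index-time FORMAT resolves them the same way is an assumption this sketch cannot check. A minimal Python illustration on a trimmed fragment of the event (Python requires the (?P<name>...) spelling; PCRE also accepts (?<name>...)):

```python
import re

# Two groups from the original REGEX, written in Python's (?P<name>...) syntax.
PATTERN = re.compile(
    r"<Computer>(?P<Computer>[^<]*)</Computer>"
    r".*?<Data\s+Name='LogonType'>(?P<LogonType>[^<]*)</Data>"
)
# Trimmed fragment of the 4627 event from the question.
SAMPLE = (
    "<Computer>Computer1</Computer><Security/></System><EventData>"
    "<Data Name='LogonType'>5</Data></EventData>"
)

match = PATTERN.search(SAMPLE)
# Named groups are still numbered left to right: group 1 is "Computer".
print(match.group(1), match.group("Computer"))
print(match.group(2), match.group("LogonType"))
# A FORMAT-style template that only uses numbers works with named groups too.
print(match.expand(r"Computer::\1 LogonType::\2"))
```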
Next steps:
To achieve my goal of reducing the indexed data volume for EventID=4627 events, I'd like to leverage Splunk's XML parsing features. Specifically, I'm thinking of using INDEXED_EXTRACTIONS = xml and configuring EXCLUDE rules in props.conf to omit the unwanted fields at index time.
Example Configuration BEFORE:
[reduce_event_raw]
REGEX = (?ms)<Event[^>]*>.*?<System>.*?<EventID>4627<\/EventID>.*?<Computer>(?<Computer>[^<]*)<\/Computer>.*?<Data\s+Name='SubjectUserName'>(?<SubjectUserName>[^<]*)<\/Data>.*?<Data\s+Name='TargetUserName'>(?<TargetUserName>[^<]*)<\/Data>.*?<Data\s+Name='LogonType'>(?<LogonType>[^<]*)<\/Data>
FORMAT = Computer::$1 SubjectUserName::$2 TargetUserName::$3 LogonType::$4
DEST_KEY = _raw
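To see roughly what the BEFORE transform would hand to the indexer, the REGEX + FORMAT + DEST_KEY = _raw combination can be emulated in Python. This is a sketch under assumptions: apply_format is a hypothetical helper, Splunk's own $n substitution is approximated with re.sub, the \/ escapes (only needed on regex101) are dropped, and the sample is a trimmed copy of the event from the question. The result is exactly the key::value _raw that point 2 of the reply warns about.

```python
import re

# BEFORE REGEX translated to Python: (?<name>...) becomes (?P<name>...).
REDUCE_REGEX = re.compile(
    r"(?ms)<Event[^>]*>.*?<System>.*?<EventID>4627</EventID>"
    r".*?<Computer>(?P<Computer>[^<]*)</Computer>"
    r".*?<Data\s+Name='SubjectUserName'>(?P<SubjectUserName>[^<]*)</Data>"
    r".*?<Data\s+Name='TargetUserName'>(?P<TargetUserName>[^<]*)</Data>"
    r".*?<Data\s+Name='LogonType'>(?P<LogonType>[^<]*)</Data>"
)
FORMAT = "Computer::$1 SubjectUserName::$2 TargetUserName::$3 LogonType::$4"

# Trimmed copy of the 4627 event from the question.
SAMPLE = (
    "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>"
    "<System><EventID>4627</EventID><Channel>Security</Channel>"
    "<Computer>Computer1</Computer><Security/></System><EventData>"
    "<Data Name='SubjectUserSid'>S-1-5-18</Data>"
    "<Data Name='SubjectUserName'>CXXXXXX</Data>"
    "<Data Name='TargetUserSid'>S-1-5-18</Data>"
    "<Data Name='TargetUserName'>SYSTEM</Data>"
    "<Data Name='LogonType'>5</Data></EventData></Event>"
)


def apply_format(regex, template, raw):
    """Rough stand-in for REGEX + FORMAT + DEST_KEY = _raw (hypothetical).

    Returns the rewritten _raw, or the original text if the regex does not match.
    """
    match = regex.search(raw)
    if match is None:
        return raw
    # Replace each $n in the template with the text of capture group n.
    return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), template)


new_raw = apply_format(REDUCE_REGEX, FORMAT, SAMPLE)
print(new_raw)
print(f"{len(SAMPLE)} -> {len(new_raw)} characters")
```

Events with any other EventID do not match and pass through unchanged, which is the behavior you would want from an index-time transform scoped to 4627.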
Example Configuration AFTER:
[XmlWinEventLog:Security]
INDEXED_EXTRACTIONS = xml
KV_MODE = none
EXCLUDE = (?i)(SubjectUserSid|SubjectDomainName|SubjectLogonId|TargetUserSid|TargetDomainName|TargetLogonId|EventIdx|EventCountTotal|GroupMembership)
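It may be worth checking the AFTER block against the props.conf spec before relying on it: as far as I know, INDEXED_EXTRACTIONS accepts CSV, TSV, PSV, W3C, JSON and HEC (not XML), and I am not aware of an EXCLUDE setting in props.conf. The intended outcome, the event XML with the unwanted Data elements removed, can at least be sketched offline with Python's xml.etree.ElementTree. The helper drop_excluded_data and the trimmed SAMPLE are hypothetical; the name list mirrors the EXCLUDE line but is matched against whole Name attributes rather than substrings.

```python
import re
import xml.etree.ElementTree as ET

NS_URI = "http://schemas.microsoft.com/win/2004/08/events/event"
# Serialize the event namespace as the default xmlns instead of "ns0:".
ET.register_namespace("", NS_URI)

# Same alternation as the EXCLUDE line; fullmatch() below anchors it.
EXCLUDE = re.compile(
    r"(?i)(SubjectUserSid|SubjectDomainName|SubjectLogonId|TargetUserSid|"
    r"TargetDomainName|TargetLogonId|EventIdx|EventCountTotal|GroupMembership)"
)

# Trimmed copy of the 4627 event from the question.
SAMPLE = (
    "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>"
    "<System><EventID>4627</EventID><Computer>Computer1</Computer></System>"
    "<EventData><Data Name='SubjectUserSid'>S-1-5-18</Data>"
    "<Data Name='SubjectUserName'>CXXXXXX</Data>"
    "<Data Name='TargetUserName'>SYSTEM</Data>"
    "<Data Name='LogonType'>5</Data>"
    "<Data Name='GroupMembership'>\n%{S-1-5-32-544}\n%{S-1-1-0}</Data>"
    "</EventData></Event>"
)


def drop_excluded_data(xml_text):
    """Return the event XML without <Data> elements whose Name is excluded."""
    root = ET.fromstring(xml_text)
    event_data = root.find(f"{{{NS_URI}}}EventData")
    if event_data is not None:
        # Iterate over a copy so removing children is safe.
        for data in list(event_data):
            if EXCLUDE.fullmatch(data.get("Name", "")):
                event_data.remove(data)
    return ET.tostring(root, encoding="unicode")


reduced = drop_excluded_data(SAMPLE)
print(reduced)
print(f"{len(SAMPLE)} -> {len(reduced)} characters")
```

Unlike the regex rewrite, the result is still well-formed XML, so the remaining fields stay extractable by name at search time.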
Do you think this approach would effectively remove the unnecessary fields before indexing while maintaining reliable field extraction for the essential data? If you have any suggestions or best practices for this method, I'd greatly appreciate your guidance.
Regards,
Dan
Try something like this:
(?ms)<Event[^>]*>.*?<System>.*?<Provider\s+Name='(?<ProviderName>[^']*)'\s+Guid='(?<ProviderGuid>[^']*)'.*?<EventID>(?<EventID>\d+)<\/EventID>.*?<Version>(?<Version>\d+)<\/Version>.*?<Level>(?<Level>\d+)<\/Level>.*?<Task>(?<Task>\d+)<\/Task>.*?<Opcode>(?<Opcode>\d+)<\/Opcode>.*?<Keywords>(?<Keywords>[^<]*)<\/Keywords>.*?<TimeCreated\s+SystemTime='(?<SystemTime>[^']*)'.*?<EventRecordID>(?<EventRecordID>\d+)<\/EventRecordID>.*?<Correlation\s+ActivityID='(?<ActivityID>[^']*)'.*?<Execution\s+ProcessID='(?<ProcessID>\d+)'\s+ThreadID='(?<ThreadID>\d+)'.*?<Channel>(?<Channel>[^<]*)<\/Channel>.*?<Computer>(?<Computer>[^<]*)<\/Computer>.*?<EventData>.*?<Data\s+Name='SubjectUserSid'>(?<SubjectUserSid>[^<]*)<\/Data>.*?<Data\s+Name='SubjectUserName'>(?<SubjectUserName>[^<]*)<\/Data>.*?<Data\s+Name='SubjectDomainName'>(?<SubjectDomainName>[^<]*)<\/Data>.*?<Data\s+Name='SubjectLogonId'>(?<SubjectLogonId>[^<]*)<\/Data>.*?<Data\s+Name='TargetUserSid'>(?<TargetUserSid>[^<]*)<\/Data>.*?<Data\s+Name='TargetUserName'>(?<TargetUserName>[^<]*)<\/Data>.*?<Data\s+Name='TargetDomainName'>(?<TargetDomainName>[^<]*)<\/Data>.*?<Data\s+Name='TargetLogonId'>(?<TargetLogonId>[^<]*)<\/Data>.*?<Data\s+Name='LogonType'>(?<LogonType>[^<]*)<\/Data>.*?<Data\s+Name='EventIdx'>(?<EventIdx>[^<]*)<\/Data>.*?<Data\s+Name='EventCountTotal'>(?<EventCountTotal>[^<]*)<\/Data>.*?<Data\s+Name='GroupMembership'>(?<GroupMembership>.*?)<\/Data>.*?<\/EventData>.*?<\/Event>
https://regex101.com/r/19eJtB/1
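Compared with the original REGEX in the question, the change that matters here is the leading (?ms). The s flag lets . match newlines, and the GroupMembership value in the raw event spans several lines, so without it the match fails at that field. (The \/ escapes are only needed for regex101's / delimiters and are harmless in PCRE.) A minimal Python sketch of the difference, using a shortened pattern and a trimmed EventData fragment from the question; Python spells named groups (?P<name>...):

```python
import re

# Trimmed EventData from the question; GroupMembership spans several lines.
EVENT_DATA = (
    "<EventData><Data Name='LogonType'>5</Data>"
    "<Data Name='EventIdx'>1</Data><Data Name='EventCountTotal'>1</Data>"
    "<Data Name='GroupMembership'>\n"
    "%{S-1-5-32-544}\n%{S-1-1-0}\n%{S-1-5-11}\n%{S-1-16-16384}</Data>"
    "</EventData>"
)

# Shortened version of the tail of the regex.
PATTERN = (
    r"<Data\s+Name='LogonType'>(?P<LogonType>[^<]*)</Data>"
    r".*?<Data\s+Name='GroupMembership'>(?P<GroupMembership>.*?)</Data>"
)

without_s = re.search(PATTERN, EVENT_DATA)         # like the original REGEX
with_s = re.search("(?ms)" + PATTERN, EVENT_DATA)  # like the (?ms) version

print(without_s)  # '.' cannot cross the newline after GroupMembership'>
print(with_s.group("LogonType"))
print(with_s.group("GroupMembership").split())
```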
Hi @ITWhisperer,
Thank you for your feedback.
The regex works, but based on @PickleRick's feedback I will need to adjust my approach.
Kind regards,
Dan