Hi community,
I need help figuring out where I went wrong.
Here is my test SPL:
| makeresults
| fields - _time
| eval _raw="<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-a5ba-3xxxxxxxxx}'/><EventID>4662</EventID><Version>0</Version><Level>0</Level><Task>12804</Task><Opcode>0</Opcode><Keywords>0x8020000000000000</Keywords><TimeCreated SystemTime='2020-09-01T07:00:18.999999800Z'/><EventRecordID>35xxxx65</EventRecordID><Correlation ActivityID='{5xxxxxxxx-b61d-0004-afc0-ac531db6d901}'/><Execution ProcessID='1520' ThreadID='1628'/><Channel>Security</Channel><Computer>XXXXXXXXXXXXXX.riv</Computer><Security/></System><EventData><Data Name='SubjectUserSid'>NT AUTHORITY\SYSTEM</Data><Data Name='SubjectUserName'>XXXXXXX$</Data><Data Name='SubjectDomainName'>XXXXXXXX</Data><Data Name='SubjectLogonId'>0x3e7</Data><Data Name='ObjectServer'>WMI</Data><Data Name='ObjectType'>WMI Namespace</Data><Data Name='ObjectName'>ROOT\CIMV2\Security\MicrosoftTpm</Data><Data Name='OperationType'>Object Access</Data><Data Name='HandleId'>0x0</Data><Data Name='AccessList'>%%1552 %%1553 </Data><Data Name='AccessMask'>0x3</Data><Data Name='Properties'>-</Data><Data Name='AdditionalInfo'>Local Execute (ExecMethod)</Data><Data Name='AdditionalInfo2'>ROOT\CIMV2\Security\MicrosoftTpm:Win32_Tpm=@::GetOwnerAuthForEscrow</Data></EventData></Event>"
| rex mode=sed "s/.*(?<eventId><EventID>4662<\/EventID>).*(?<userName><[Data Name='SubjectUserName']>*.*<\/Data>).*/\1\2/g"
The result is not what I want: I need the SubjectUserName data, not the AdditionalInfo2 data.
Can anyone help me with this, please?
Thank you!
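For anyone reading along, the problem can be reproduced outside Splunk. The sketch below replays the pattern with Python's `re` module on a trimmed copy of the test event. It assumes that, for this pattern, Python's engine behaves like the PCRE engine behind `rex`; plain numbered groups are used because Python spells named groups `(?P<name>...)`. Two things go wrong in the original. First, `[Data Name='SubjectUserName']` is a character class that matches any *single* character from that set, not the literal text. Second, the greedy `.*` in front of the group makes the engine settle on the *last* `<D...</Data>` in the event, which is `AdditionalInfo2`. Writing the tag literally and using `[^<]*` for the value keeps the capture inside the right element.

```python
import re

# Trimmed copy of the test event: only the elements relevant to the regex.
raw = (
    "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>"
    "<System><EventID>4662</EventID></System><EventData>"
    "<Data Name='SubjectUserName'>XXXXXXX$</Data>"
    "<Data Name='SubjectDomainName'>XXXXXXXX</Data>"
    "<Data Name='AdditionalInfo2'>ROOT\\CIMV2\\Security\\MicrosoftTpm:Win32_Tpm=@::GetOwnerAuthForEscrow</Data>"
    "</EventData></Event>"
)

# Original idea: [Data Name='SubjectUserName'] is a one-character class, and the
# greedy .* before the group lets the engine pick the LAST <D...</Data> element.
broken = r".*(<EventID>4662</EventID>).*(<[Data Name='SubjectUserName']>*.*</Data>).*"

# Fixed: literal tag text, and [^<]* so the value stops at its own </Data>.
fixed = r".*(<EventID>4662</EventID>).*?(<Data Name='SubjectUserName'>[^<]*</Data>).*"

before = re.sub(broken, r"\1\2", raw)
after = re.sub(fixed, r"\1\2", raw)
print(before)  # ends with the AdditionalInfo2 element
print(after)   # ends with the SubjectUserName element
```

Carried back into the search, the fixed pattern would presumably read `s/.*(<EventID>4662<\/EventID>).*?(<Data Name='SubjectUserName'>[^<]*<\/Data>).*/\1\2/` in `rex mode=sed`; that exact form has not been tested in Splunk here.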
Why not just use spath or xpath? Manipulating structured data with regexes alone might not give the best results, especially if at some point the fields get reordered (while the logical structure stays the same).
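As a rough illustration of the structured approach (using Python's standard `xml.etree.ElementTree` as a stand-in, not `spath`/`xpath` themselves), the lookup below selects the `Data` element by its `Name` attribute rather than by its position, so it gives the same answer when the `Data` elements are reordered. The event's default namespace has to be mapped to a prefix (`e` here, an arbitrary choice) for the path query.

```python
import xml.etree.ElementTree as ET

# Map the event's default namespace to a prefix usable in path queries.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

def data_field(event_xml: str, name: str):
    """Return the text of <Data Name='name'> inside EventData, or None if absent."""
    root = ET.fromstring(event_xml)
    node = root.find(f"./e:EventData/e:Data[@Name='{name}']", NS)
    return None if node is None else node.text

original = (
    "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>"
    "<System><EventID>4662</EventID></System><EventData>"
    "<Data Name='SubjectUserName'>XXXXXXX$</Data>"
    "<Data Name='AdditionalInfo2'>ROOT\\CIMV2\\Security\\MicrosoftTpm:Win32_Tpm=@::GetOwnerAuthForEscrow</Data>"
    "</EventData></Event>"
)
# Same logical content, Data elements in a different order.
reordered = (
    "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>"
    "<System><EventID>4662</EventID></System><EventData>"
    "<Data Name='AdditionalInfo2'>ROOT\\CIMV2\\Security\\MicrosoftTpm:Win32_Tpm=@::GetOwnerAuthForEscrow</Data>"
    "<Data Name='SubjectUserName'>XXXXXXX$</Data>"
    "</EventData></Event>"
)

for ev in (original, reordered):
    print(data_field(ev, "SubjectUserName"))
```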
Thanks for the reply @PickleRick
Can those work at the SEDCMD level? I would like to reduce the log volume without breaking the logic.
Regards,
Dan
No. They are search-time commands. In fact, manipulating structured data (like XML or JSON) at ingest is not Splunk's strong suit, and if you have the option I'd advise doing it in another tool before sending the data to Splunk.
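A minimal sketch of what such a pre-processing step could look like, assuming a hypothetical Python stage between the event source and the forwarder (no specific tool is named in the thread). It removes `Data` elements by their `Name` attribute on the parsed tree, so field order does not matter and the result is always well-formed XML. Which field to drop (`AdditionalInfo2` here) is purely an example. Note that re-serialising normalises the markup (for instance, attribute quotes become `"`), so the output is not byte-identical to the input.

```python
import xml.etree.ElementTree as ET

EVT_NS = "http://schemas.microsoft.com/win/2004/08/events/event"
# Register the event namespace as the default so output keeps xmlns="..." (no ns0: prefix).
ET.register_namespace("", EVT_NS)

def drop_data_fields(event_xml: str, names: set) -> str:
    """Remove <Data Name=...> children of EventData whose Name is in `names`.

    The decision is made on the parsed tree, so the order and quoting style of
    the input do not matter, and the returned XML is always well-formed.
    """
    root = ET.fromstring(event_xml)
    event_data = root.find(f"{{{EVT_NS}}}EventData")
    if event_data is not None:
        for node in list(event_data):  # iterate over a copy while removing
            if node.tag == f"{{{EVT_NS}}}Data" and node.get("Name") in names:
                event_data.remove(node)
    return ET.tostring(root, encoding="unicode")

sample = (
    "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>"
    "<System><EventID>4662</EventID></System><EventData>"
    "<Data Name='SubjectUserName'>XXXXXXX$</Data>"
    "<Data Name='AdditionalInfo'>Local Execute (ExecMethod)</Data>"
    "<Data Name='AdditionalInfo2'>ROOT\\CIMV2\\Security\\MicrosoftTpm:Win32_Tpm=@::GetOwnerAuthForEscrow</Data>"
    "</EventData></Event>"
)

# Hypothetical choice of field to drop, for illustration only.
slim = drop_data_fields(sample, {"AdditionalInfo2"})
print(slim)
```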
Having said that, I'm always cautious when I hear that someone wants to cut some data from an event "because no one needs that and it only consumes license". I can understand data privacy concerns and data masking; that's of course a legitimate use case. But chopping away half of an event... that's tricky, because it can break things and leave you in a situation where you think you have some data but actually you don't.
I'd rather verify which events, as a whole, I need and which I don't.
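To make the contrast concrete, here is a minimal sketch of event-level filtering, again assuming a hypothetical Python pre-processing stage and a purely hypothetical allowlist of Event IDs. Each event is either kept intact or dropped as a whole, so anything that reaches the index still has every field the extractions expect. Inside Splunk itself, the usual mechanism for this is routing unwanted events to the `nullQueue` via `props.conf`/`transforms.conf`; the sketch only shows the decision logic.

```python
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hypothetical policy: the Event IDs you decided you actually need.
WANTED_EVENT_IDS = {"4624", "4662"}

def keep_event(event_xml: str) -> bool:
    """Keep or drop the WHOLE event based on its EventID; never edit its content."""
    node = ET.fromstring(event_xml).find("./e:System/e:EventID", NS)
    return node is not None and node.text in WANTED_EVENT_IDS

def make_event(event_id: str) -> str:
    """Build a tiny synthetic event carrying only an EventID (test data)."""
    return ("<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>"
            f"<System><EventID>{event_id}</EventID></System></Event>")

events = [make_event("4662"), make_event("4658"), make_event("4624")]
kept = [ev for ev in events if keep_event(ev)]
print(len(kept), "of", len(events), "events kept")
```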
Thanks for the reply @PickleRick
I totally get your point, and cutting/trimming events is something Splunk would not appreciate.
I am a firm believer that SEDCMD is the best way to exercise control over forwarded logs during the parsing phase of the data stream, before they hit the indexers.
I agree with you that trimming should happen only after a careful study of what to log and how, and of course in agreement with your peers.
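Since a `SEDCMD` applies a sed-style `s/regex/replacement/` to the raw text at parse time, its effect can be previewed outside Splunk before deploying it. A minimal sketch with Python's `re`, assuming it behaves like Splunk's regex engine for these simple patterns (the choice of `SubjectLogonId` as the field to remove is hypothetical). It shows how a single greedy `.*` silently removes a neighbouring field as well.

```python
import re

raw = ("<EventData><Data Name='SubjectUserName'>XXXXXXX$</Data>"
       "<Data Name='SubjectLogonId'>0x3e7</Data>"
       "<Data Name='ObjectServer'>WMI</Data></EventData>")

# Roughly what s/<Data Name='SubjectLogonId'>[^<]*<\/Data>//g would do:
# [^<]* cannot run past the element's own closing tag.
tight = re.sub(r"<Data Name='SubjectLogonId'>[^<]*</Data>", "", raw)

# Same intent with a greedy .*: it runs to the LAST </Data>,
# so ObjectServer is silently removed as well.
greedy = re.sub(r"<Data Name='SubjectLogonId'>.*</Data>", "", raw)

print(tight)   # → <EventData><Data Name='SubjectUserName'>XXXXXXX$</Data><Data Name='ObjectServer'>WMI</Data></EventData>
print(greedy)  # → <EventData><Data Name='SubjectUserName'>XXXXXXX$</Data></EventData>
```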
Regards,
Dan
Remember that most of what happens in Splunk's internals happens at search time, so it's not as simple as affecting only the parsing "before the data hits the indexers". That's the problem: a lot depends on how the data is extracted in the end.
For example, let's take an event from my home firewall logs:
firewall,info DEFAULT_OUTGOING accept forward: in:bridge1 out:pppoe-out1, src-mac 52:54:00:a3:cc:92, proto TCP (ACK), 172.16.0.6:51372->6.211.152.94:22, NAT (172.16.0.3:51372->89.15.25.178:51372)->6.211.152.94:22, len 40
Let's assume I don't care about the MAC address and want to cut it out of the event.
If the fields were defined as several small extractions, each anchored just to the preceding field name tag, like
in:(?<in_interface>\S+)
out:(?<out_interface>\S+)
[...]
the extractions would still work. But if someone decided to write the extractions as one big regex with several capture groups,
firewall,info\s(?<matching_rule>.*)\s(?<action>accept|block)\sforward:\sin:(?<in_interface>\S+)\sout:(?<out_interface>\S+),\ssrc-mac\s(?<src_mac>[\da-f:]+),\sproto\s(?<proto>\S+)[...]
because the event is always supposed to be in that format, well... it stops working when you chop away part of the event from the middle.
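The two extraction styles can be compared outside Splunk with a small Python sketch. Assumptions: Python's `re` stands in for Splunk's engine and spells named groups `(?P<name>...)`; the big regex is cut off after `proto`, just like the `[...]` above; and `[^\s,]+` is used instead of `\S+` so the interface names don't pick up the trailing comma. Cutting the MAC address leaves the small anchored extractions unaffected, while the single big regex stops matching entirely.

```python
import re

EVENT = ("firewall,info DEFAULT_OUTGOING accept forward: in:bridge1 out:pppoe-out1, "
         "src-mac 52:54:00:a3:cc:92, proto TCP (ACK), "
         "172.16.0.6:51372->6.211.152.94:22, "
         "NAT (172.16.0.3:51372->89.15.25.178:51372)->6.211.152.94:22, len 40")

# Ingest-time trim of the MAC address, sed-style: s/ src-mac [^,]*,//
CUT = re.sub(r" src-mac [^,]*,", "", EVENT)

# Style 1: several small extractions, each anchored only to its own field tag.
SMALL = [
    re.compile(r"in:(?P<in_interface>[^\s,]+)"),
    re.compile(r"out:(?P<out_interface>[^\s,]+)"),
    re.compile(r"proto\s(?P<proto>\S+)"),
]

# Style 2: one big regex that assumes the full, fixed layout (truncated after proto).
BIG = re.compile(
    r"firewall,info\s(?P<matching_rule>.*)\s(?P<action>accept|block)\sforward:\s"
    r"in:(?P<in_interface>[^\s,]+)\sout:(?P<out_interface>[^\s,]+),\s"
    r"src-mac\s(?P<src_mac>[\da-f:]+),\sproto\s(?P<proto>\S+)"
)

def extract_small(event: str) -> dict:
    """Apply each small extraction independently; a miss affects only that field."""
    fields = {}
    for rx in SMALL:
        m = rx.search(event)
        if m:
            fields.update(m.groupdict())
    return fields

def extract_big(event: str) -> dict:
    """All-or-nothing: if any part of the layout differs, no field is extracted."""
    m = BIG.search(event)
    return m.groupdict() if m else {}

for label, ev in (("original", EVENT), ("MAC cut", CUT)):
    print(label, "| small:", extract_small(ev), "| big:", extract_big(ev))
```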
So it's not always as easy as it seems.
Hello @PickleRick thanks for the feedback.
As I am extensively involved in log analysis (unfortunately not very skilled in regex at the moment, but getting there :-)), I have a different view on this subject.
Please have a look at one of my threads: "SEDCMD log filtering regex needed" (Splunk Community).
It has an example where my colleagues and I managed to be very granular and selective when applying SEDCMD.
I agree that most data changes happen at search time. However, a "small" but extremely important part happens at parsing time, where we have full control over what flows through, well before indexing.
I'm more than happy to keep investigating together: although I have built up knowledge of data filtering with the support of Splunk consultants, I may well be missing something important that I want to understand.
Thank you for taking time to discuss!
I always say: if you can make strong assumptions about the format of your data, you can try to manipulate structured data with simple text operations (like cutting out a field), but you can usually still hit a border case in which your regex fails (I love nested strings/comments/escaped symbols; they can be so nasty).
The remark about cutting data away was meant more as a side note. Of course you can do that. And I understand that sometimes you might want to do that. But sometimes there are reasons why you should not do that. That's all. 🙂
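A small illustration of such border cases, using synthetic variants of the same `Data` element (none of them taken from the thread's logs), with Python's `re` and `xml.etree.ElementTree` as stand-ins. A regex written against one textual form either misses the field or returns the still-escaped text, while an XML parser returns the same logical value each time.

```python
import re
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}
# Regex written against the exact textual form seen in the thread.
TEXT_RX = re.compile(r"<Data Name='SubjectUserName'>([^<]*)</Data>")

def by_regex(event_xml: str):
    m = TEXT_RX.search(event_xml)
    return m.group(1) if m else None

def by_parser(event_xml: str):
    node = ET.fromstring(event_xml).find(".//e:Data[@Name='SubjectUserName']", NS)
    return None if node is None else node.text

HEAD = "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><EventData>"
TAIL = "</EventData></Event>"
variants = {
    "as in the thread": HEAD + "<Data Name='SubjectUserName'>XXXXXXX$</Data>" + TAIL,
    "double quotes": HEAD + '<Data Name="SubjectUserName">XXXXXXX$</Data>' + TAIL,
    "escaped value": HEAD + "<Data Name='SubjectUserName'>a&lt;b</Data>" + TAIL,
    "CDATA section": HEAD + "<Data Name='SubjectUserName'><![CDATA[x</Data>y]]></Data>" + TAIL,
}

for label, ev in variants.items():
    print(label, "| regex:", repr(by_regex(ev)), "| parser:", repr(by_parser(ev)))
```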