I'm trying to get a new sourcetype (NetApp user-level audit logs, exported as XML) to work, and I think my fields.conf tokenizer is breaking things. But I'm not really sure how, or why, or what to do...
See more...
I'm trying to get a new sourcetype (NetApp user-level audit logs, exported as XML) to work, and I think my fields.conf tokenizer is breaking things. But I'm not really sure how, or why, or what to do about it. The raw data is XML, but I'm not using KV_MODE=xml because that doesn't properly handle all the attributes. So, I've got a bunch of custom regular expressions, the true backbone of all enterprise software. Here's a single sample event (but you can probably disregard most of it, it's just here for completeness): <Event><System><Provider Name="NetApp-Security-Auditing" Guid="{guid-edited}"/><EventID>4656</EventID><EventName>Open Object</EventName><Version>101.3</Version><Source>CIFS</Source><Level>0</Level><Opcode>0</Opcode><Keywords>0x8020000000000000</Keywords><Result>Audit Success</Result><TimeCreated SystemTime="2022-01-12T15:42:41.096809000Z"/><Correlation/><Channel>Security</Channel><Computer>server-name-edited</Computer><ComputerUUID>guid-edited</ComputerUUID><Security/></System><EventData><Data Name="SubjectIP" IPVersion="4">1.2.3.4</Data><Data Name="SubjectUnix" Uid="1234" Gid="1234" Local="false"></Data><Data Name="SubjectUserSid">S-1-5-21-3579272529-1234567890-2280984729-123456</Data><Data Name="SubjectUserIsLocal">false</Data><Data Name="SubjectDomainName">ACCOUNTS</Data><Data Name="SubjectUserName">davidsmith</Data><Data Name="ObjectServer">Security</Data><Data Name="ObjectType">Directory</Data><Data Name="HandleID">00000000000444;00;002a62a7;0d3d88a4</Data><Data Name="ObjectName">(Shares);/LogTestActivity/dsmith/wordpress-shared/plugins-shared</Data><Data Name="AccessList">%%4416 %%4423 </Data><Data Name="AccessMask">81</Data><Data Name="DesiredAccess">Read Data; List Directory; Read Attributes; </Data><Data Name="Attributes">Open a directory; </Data></EventData></Event> My custom app's props.conf has a couple dozen lines like this, for each element I want to be able to search on: EXTRACT-DesiredAccess = <Data Name="DesiredAccess">(?<DesiredAccess>.*?)<\/Data> EXTRACT-HandleID = <Data Name="HandleID">(?<HandleID>.*?)<\/Data> EXTRACT-InformationRequested = <Data Name="InformationRequested">(?<InformationRequested>.*?)<\/Data> This works as you'd expect, except for a couple of fields where they're composites. This is most noticeable in the DesiredAccess element, which in our example looks like: <Data Name="DesiredAccess">Read Data; List Directory; Read Attributes; </Data> Thus you get a single field with "Read Data; List Directory; Read Attributes; " and if you only need to look for, say, "List Directory," you have to get clever with your searches. So, I added a fields.conf file with this in it: [DesiredAccess] TOKENIZER = \s?(.*?); When I paste the 'raw' contents of that field, and that regex, into a tool like regex101.com, it works and returns the expected results. Similarly, it also works if I remove it from fields.conf, and put it in as a makemv command: index=nonprod_pe | makemv tokenizer="\s?(.*?);" DesiredAccess With the TOKENIZER element in fields.conf, the DesiredAccess attribute just doesn't populate, period. So I assume it's the problem. (Since this is in an app, the app's metadata does contain explicit "export = system" lines for both [props] and [fields]. And the app is on indexers and search heads. Probably doesn't need to be in both places, but hey I'm still learning...) So, what am I doing wrong with my fields.conf tokenizer, that's caused it to fail completely to identify any elements?