Hi,
I have a request from a client to index the .aud files generated by Oracle. I have been searching Splunk Answers but for all questions I read the answers point out to syslog audit. The client specified the .aud files so I cannot use Oracle Audit Trail app. I have also tried to manually extract the fields to no avail, main reason being the files have a header that messes up with the interpretation of the fields, plus not all field values come in the same format, for instance, LENGTH values come in double quotes and single quotes, which confuses Splunk. I found a partial solution in this question:
Problem is, apart from the header of the file, it seems there are two audit event formats:
Format 1 - line breaks after event's values
Mon Jun 30 17:55:59 2014 -03:00
LENGTH : '152'
ACTION :[7] 'CONNECT'
DATABASE USER:[1] '/'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[6] 'oracle'
CLIENT TERMINAL:[0] ''
STATUS:[1] '0'
DBID:[8] '33405459'
Format 2 - single line for all events
Sun Jun 29 09:51:00 2014 -03:00
LENGTH: "369"
SESSIONID:[5] "61298" ENTRYID:[1] "1" STATEMENT:[1] "1" USERID:[23] "EDITED TO PRESERVE PRIVACY" USERHOST:[19] "EDITED TO PRESERVE PRIVACY" TERMINAL:[7] "SERVINT" ACTION:[3] "100" RETURNCODE:[1] "0" COMMENT$TEXT:[97] "Authenticated by: DATABASE; Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=X.X.X.X)(PORT=64423))" OS$USERID:[7] "iadabas" DBID:[8] "33405459" PRIV$USED:[1] "5"
As I am not an regex expert, the format #2 is giving me a headache. How can I get rid of the header on each file? And how can I extract the the key/value pairs from format #2? And finally how should I deal with LENGTH values that are enclosed in single or double quotes without creating one regex for each?
I see a few problems with the input and will offer suggestions for each of them.
Hope this helps,
chanfoli
The two formats are two different Oracle audits:
Format 1
Administrative SYSDBA audit - always in OS when *Nix, always in Event Viewer when *Win.
When initialization parameter audit_sys_operations=TRUE you will get all SYSDBAs Top-SQL commands, other then normal administrative audits, like instance startup-shutdown and others
Format 2
Standard audit. These can go in OS (and is this case) when initialization parameter audit_trail=OS
Given the new information about both formats being multi-line events, I might take a different approach of not splitting the events and just writing appropriate regexes to extract the fields. Some of them seem like they could be single regexes since the fieldnames precede the field values with the len in brackets and only quote characters being different, this could be accomplished in the regex extraction with the alternation operator (|).
I did a quick test by pasting a few of your example events together and used the following props (I did not have an example header to drop so I skipped that part):
[db-audit-test]
MAX_TIMESTAMP_LOOKAHEAD=35
NO_BINARY_CHECK=1 TIME_FORMAT=%a %b %d
%H:%M:%S %Y
After building extractions in the interactive field extractor, I eventually used the following search with rex to test the extraction of LENGTH and ACTION:
index=main sourcetype="db-audit-test"
| head 10000 | rex "(?i)LENGTH.?:
('|\")(?P\d+)(?=(\"|'))" |
rex
"(?i)ACTION\s:\[\d+\]\s*(\"|')(?P\w*)(\"|')"
|table LENGTH ACTION
The actual regex displayed in the IFE for ACTION after I built it to extract for both formats was like this:
(?i)ACTION\s*:[\d+]\s*("|')(?P\w*)("|')
The header for both formats is as follow:
Audit file /path_to_audit_files/oraprod_dm00_11796784_20140626182611606394143795.aud
Oracle Database 11g Release 11.2.0.4.0 - 64bit Production
ORACLE_HOME = /oracle/11.2
System name: AIX
Node name: node_name_here
Release: 1
Version: 7
Machine: aabbccddeeff
Instance name: instance_name_here
Redo thread mounted by this instance: 1
Oracle process number: 40
Unix process pid: 11796784, image: oracle@node_name_here (DM00)
All events (including the header) are set apart by a blank line, in the above example, after the Unix...
line, there is a blank line, then the events follow, beginning with a timestamp-lenght-event data, another blank line, another block of events, and so on.
Looks to me like all of these lines could be sent to the null queue with simple regex considering that all the lines start with words which do not start lines in any events. You could either set up a filter to discard all lines which start with the words Audit, Oracle, ORACLE, System, Node, Release, Version, Machine, Instance, Redo or Unix OR you could discard everything which does not have either 3 colons (:) or any number of single or double quotes.
I've tried, based on the links you posted in your first answer, could not check straight away due to a license problem; today I was able to reset my license and test it, did not seem to work. Here is an extract of my props.conf
and transforms.conf
:
props.conf
[oracle_audit]
SHOULD_LINEMERGE = False
LINE_BREAKER = ([\r\n]+)\w{3}\s+\w{3}\s+\d+\s+\d+\:\d+\:\d+\s+\d{4}
KV_MODE = None
...
TRANSFORMS-Audit = eliminate-audit
TRANSFORMS-Oracle-Database = eliminate-oracle-database
...
transforms.conf
[eliminate-audit]
REGEX=^Audit;.*$
DEST_KEY=queue
FORMAT=nullQueue
[eliminate-oracle-database]
REGEX=^Oracle Database;.*$
DEST_KEY=queue
FORMAT=nullQueue
I've tried to understand the example in the link, then adapt it to my case. As the header is still showing up in search, it seems I did not understand it...
What is the intent of the semicolons in your regexes above?
I was not sure, then I adapted the example and left it there as it was. I have been running tests with regex online testers and found out the semicolon has no use there, thus I got rid of it. I also changed the regex to:
REGEX=^Oracle.*$
...meaning "a line beginning with ( ^
) 'Oracle' plus whatever comes after it (any char any length - .*
) till the end of the line ( $
)". But it is still not working. More tweaking needed here.
On the other hand, I found out how to insert the "|'
pair in transforms.conf
, it should be in the format:
[\'|\"]
This way, both LENGTH
and ACTION
fields are being correctly extracted.
Hi, chanfoli,
Regarding format 1, I was able to use props.conf and transforms.conf, following some models I gathered from other answers, to extract most of the key:value pairs. Here is an example of what I am using now:
in props.conf, new stanza oracle_audit created
[oracle_audit]
SHOULD_LINEMERGE = False
LINE_BREAKER = ([\r\n]+)\w{3}\s+\w{3}\s+\d+\s+\d+:\d+:\d+\s+\d{4}
KV_MODE = None
REPORT-ACTION_for_oracle = ACTION_for_oracle
...
in transforms.conf, one of the rex I am using to extract the pairs from format 1
[ACTION_for_oracle]
REGEX = ACTION[\s+]?:[\d+]\s+\'(.*?)\'([\r\n]+)(DATABASE\s+USER)
FORMAT = ACTION::"$1"
I couldn't help but noticing the regex you came by is simpler than mine, thus I'll give yours a try in order to see if it does the same and then replace mine by yours in case the answer is positive.
Nevertheless, the ("|')
solution I will use for sure, since it solves my single quote|double quote problem.
I see a few problems with the input and will offer suggestions for each of them.
Hope this helps,
chanfoli
I was able to, using the info from the links you sent, extract all necessary key:value pairs, after testing each rex in http://www.regexr.com/. Only thing is, now the client wants NOT to ditch the header but quite de opposite, store it somewhere and associate it to each event in the file it came from, in order to have the events fully identified. The reason is the fact that, since they have many servers/instances/databases, they want to know from which server/instance/database each even came. I do understand the "why", but don't have the "how" yet. More tweaking.
Edited format 2 in question and added the missing timestamp
Hi, chanfoli,
Many thanks for the prompt answers! I'll give it a go and let you know as soon as I make it work. Regarding format 2, my bad, it does have timestamp indeed, I just copy-pasted it wrongly, so no issues there, the timestamp is the same as the one in format 1 😉
Thanks again!