Control File: /dir/dir/dir/file_name
Data File: /dir/dir/dir/file_name.dat
Bad File: /dir/dir/dir/file_name.log
Discard File: /dir/dir/dir/file_name.log
(Allow all discards)
Number to load: ALL
Number to skip: 0
Errors allowed: 50000
Bind array: 1 rows, maximum of 256000 bytes
Continuation: none specified
Path used: Conventional
Silent options: FEEDBACK
Table TABLE_NAME, loaded from every logical record.
Insert option in effect for this table: APPEND
TRAILING NULLCOLS option in effect
Column Name Position Len Term Encl Datatype
NAME_ID FIRST * | O(") CHARACTER
Table TABLE:
1 Row successfully loaded.
0 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Space allocated for bind array: 1542 bytes(1 rows)
Read buffer bytes: 1048576
Total logical records skipped: 0
Total logical records read: 1
Total logical records rejected: 0
Total logical records discarded: 0
Run began on Wed Sep 09 08:50:36 2015
Run ended on Wed Sep 09 08:50:36 2015
Elapsed time was: 00:00:00.22
CPU time was: 00:00:00.05
The log file is above. For each log file that comes in during a 24-hour period, I need to search for it and send an email containing the following summary fields:
Data File:
Table
Number of Rows loaded -> 1 Row successfully loaded.
Number of Rows failed -> 0 Rows not loaded because all WHEN clauses were failed.
START -> Run began on Wed Sep 09 08:57:36 2015
END -> Run ended on Wed Sep 09 08:57:37 2015
Elapsed time was: 00:00:00.46
CPU time was: 00:00:00.10
I wrote the questions at the bottom of this answer, then realized it may not matter much, though there are several ways to do this. The lines are probably separate events, so the first step is to combine them; we can use transaction for that. I'm picking a one-minute maximum span between the first line and the last line of the log file to keep it more efficient - adjust as necessary.
... | transaction startswith="Control File" endswith="CPU time was" maxspan=1m
That should group the events together. Now, let's extract the data you need with rex. To the end of the above...
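If it helps to see what that grouping does conceptually, here's a rough Python sketch outside Splunk (the function name and sample lines are made up for illustration; transaction itself is doing the real work in the search):

```python
def group_loader_logs(lines):
    """Collect the lines between a 'Control File' marker and the next
    'CPU time was' marker into one multi-line record, mimicking what
    the transaction command does to the separate events."""
    records, current = [], []
    for line in lines:
        if line.startswith("Control File"):
            current = [line]                     # start a new record
        elif current:
            current.append(line)
            if line.startswith("CPU time was"):  # end of this record
                records.append("\n".join(current))
                current = []
    return records

sample = [
    "Control File: /dir/dir/dir/file_name",
    "Data File: /dir/dir/dir/file_name.dat",
    "CPU time was: 00:00:00.05",
]
print(len(group_loader_logs(sample)))  # 1
```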
... | rex "Data File: (?<data_file>[^\s]+)" | rex "Table (?<target_table>[^:]+)"
I took your string "Data File: /dir/dir/dir/file_name.dat" and made a field called "data_file" out of everything that isn't a space following the "Data File: " string. Right after that, I used rex to create a field called "target_table" out of everything that comes after the word "Table " up to the colon. Several of the other strings/captures will be much like that, so I'm leaving them as an exercise for you to build, but if you have any problems, add a comment to this and I or someone will try to help with that particular one!
One that's different will be 1 Row successfully loaded.
... | rex "(?<rows_success_string>\d+ Rows? successfully loaded\.)" | rex field=rows_success_string "(?<rows_success_count>\d+)"
This creates a field called "rows_success_string" holding the full "1 Row successfully loaded." text, then immediately runs rex on that new field and pulls the leading digits into a field called "rows_success_count". You didn't mention needing the count, but I thought I'd show the technique. You could easily do this in one rex, but this seemed easier to understand.
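The two-step extraction looks like this in Python, if that's easier to follow (illustrative only; "Rows?" also covers the plural form the log uses for counts other than one, and the escaped dot matches a literal period):

```python
import re

log_line = "1 Row successfully loaded."

# Step 1: capture the whole sentence into one field.
m = re.search(r"(?P<rows_success_string>\d+ Rows? successfully loaded\.)", log_line)
rows_success_string = m.group("rows_success_string")

# Step 2: pull the leading digits back out of that new field.
rows_success_count = int(re.search(r"\d+", rows_success_string).group())

print(rows_success_string)  # 1 Row successfully loaded.
print(rows_success_count)   # 1
```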
Those are the pieces I think you may need to get your data into fields. Next, you need to format it. I think the easiest way to format it might be to create a table out of your fields that you want, then "transpose" them to make it vertically oriented instead of left-right oriented. That would be something like
... | table data_file, target_table, rows_success, Field3, Field4, ... FieldN | transpose
Obviously, fill in the rest of the fields you need to show.
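To preview the effect of transpose: it turns the single left-to-right result row into a vertical column/value listing. A rough Python analogue (field names and values here are just the examples from this thread):

```python
# One extracted record as a mapping of field -> value.
row = {
    "data_file": "/dir/dir/dir/file_name.dat",
    "target_table": "TABLE",
    "rows_success": "1",
}

# "| table ... | transpose" produces roughly this vertical layout.
transposed = list(row.items())
for column, value in transposed:
    print(f"{column}: {value}")
```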
Last, create an alert from the "Save As" menu. Maybe have it run once per hour (or every 5 minutes, or once per day - whatever you want, keeping system load in mind), alert when the number of results is greater than 0, and send an email with the contents in-line, and perhaps attached too.
A couple of quick questions:
Is that log file being ingested already into Splunk?
Does each line come in as a separate event or does the entire log come in as a single event?
About how many of these log files get ingested each day? Less than 100? More than 1000?