I am trying to ingest the structured logs from our main Perforce server. The structured logs are split across multiple files (Commands.csv, Errors.csv, Audit.csv, Track.csv, User.csv, Events.csv, Integrity.csv, and Auth.csv). The problem is that the events within a file do not all share the same format. Perforce writes an "event type" (1-16) as the first field of every event, and each event type has its own set of fields. The Commands and Track CSVs contain multiple event types. Has anyone run into this, or does anyone know how to extract the fields based on that first field? Below are some sample lines from Track.csv; as you can see, event types 7, 8, 9, and 14 each have a different set of fields.
9,1464803105,104007315,2016/06/01 18:45:05 104007315,144447,1,user,server,user-sync,IP,p4,2015.2/NTX64/1311674,file,db,db.counters,3,0,2,0,0,0,0,1,0,0,0,0
8,1464803105,104007315,2016/06/01 18:45:05 104007315,144447,1,user,server,user-sync,IP,p4,2015.2/NTX64/1311674,file,rpc,2,4,222,483,318788,523588,92,0
7,1464803105,104007315,2016/06/01 18:45:05 104007315,144447,1,user,server,user-sync,IP,p4,2015.2/NTX64/1311674,file,usage,.098s,3,2,0,8,0,0,4332,0
14,1464803105,103663508,2016/06/01 18:45:05 103663508,144447,1,user,server,user-sync,IP,p4,2015.2/NTX64/1311674,file,0,0,0,0,0,0
14,1464803105,100381835,2016/06/01 18:45:05 100381835,144447,1,user,server,user-sync,IP,p4,2015.2/NTX64/1311674,file,0,0,0,0,0,0
If I understood you correctly, it sounds like you need to configure a per-event sourcetype. Have a look and let us know if it makes sense.
About your older comment: First up, I'm not familiar with REPORT-fields and I have no time to google right now, but did it work at all? Did the Eventtype field get populated with numbers? I wouldn't be surprised if the other fields didn't get populated as I think there is an issue in your regexes. For instance in:
[schema_14]
REGEX = ^(14)([^,]*)
FORMAT = Eventtype::$1 timestamp::$2 timestamp2::$3 date::$4 pid::$5 cmdno::$6 user::$7 client::$8 func::$9 host::$10 prog::$11 version::$12 args::$13 filesAdded::$14 fileUpdated::$15 filesDeleted::$16 bytesAdded::$17 bytesUpdated::$18 bytesDeleted::$19
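Your regex will match the 14 at the start of the line and assign it to $1, which should then go into Eventtype. But the 14 is followed by a comma in the data, so ([^,]*) matches nothing and $2 will be empty. And of course there is no $3, $4, etc. I believe the regex line should be something like:
REGEX = ^(14),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
You basically need to match the comma as a separator and then write out the comma-followed-by-some-non-comma-chars group once for each of the other fields (18 here). Repeating a single group with a quantifier, as in (?:,([^,]*)){18}, is shorter but would not populate $2 to $19, because a repeated capture group only keeps its last match.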
Now, about your second comment, this is closer to the kind of things I've seen before. If I'm not mistaken this should set the sourcetype field to "command_log" if the event line starts with 7. Did that work? If yes, then you're mostly there, as you found a way to separate your initial data into bunches with the same schema. Just create as many TRANSFORMS as there are schemas. The next step is to define the format of each sourcetype somewhere. I'm not sure how to do that but that's a much simpler issue than your initial problem.
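A rough sketch of what "as many TRANSFORMS as there are schemas" could look like for the four event types in your Track.csv sample. The stanza names and the sourcetype names p4_event_8, p4_event_9 and p4_event_14 are made up for illustration; command_log and the source path are taken from your own config. As far as I know, sourcetype overrides are index-time transforms, so they have to be referenced with TRANSFORMS- (not REPORT-) on the instance that parses the data, and they only apply to events indexed after the change.

```ini
# transforms.conf
# One stanza per event type. Anchoring on "^7," avoids matching
# any other line that merely begins with the digit 7.
[set_st_event_7]
REGEX = ^7,
FORMAT = sourcetype::command_log
DEST_KEY = MetaData:Sourcetype

# Hypothetical sourcetype names from here on.
[set_st_event_8]
REGEX = ^8,
FORMAT = sourcetype::p4_event_8
DEST_KEY = MetaData:Sourcetype

[set_st_event_9]
REGEX = ^9,
FORMAT = sourcetype::p4_event_9
DEST_KEY = MetaData:Sourcetype

[set_st_event_14]
REGEX = ^14,
FORMAT = sourcetype::p4_event_14
DEST_KEY = MetaData:Sourcetype

# props.conf
[source::/p4rotatedlogs/structuredlogs/track.csv]
TRANSFORMS-set_sourcetype = set_st_event_7, set_st_event_8, set_st_event_9, set_st_event_14
```

Each event should match at most one of these patterns, so the order of the transforms in the list should not matter here.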
I appreciate I'm not being extremely helpful here... I'm afraid it's the blind leading the blind! Don't hesitate to comment back. Hopefully you'll be able to make some progress...
So from my understanding of the per-event sourcetype, these are my new props and transforms. I am still confused about where I would set the field names for the events:
transforms.conf
[schema_7]
REGEX = ^(7)([^,]*)
FORMAT = sourcetype::command_log
DEST_KEY = MetaData:Sourcetype
props.conf
[source::/p4rotatedlogs/structuredlogs/track.csv]
TRANSFORMS-fields = schema_7
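On where the field names could go: once each event type has its own sourcetype, one option is a search-time, delimiter-based extraction attached to that sourcetype. A minimal sketch for command_log, assuming the field names from the schema_7 FORMAT line in the earlier attempt; the stanza name command_log_fields is made up.

```ini
# transforms.conf
# Search-time extraction: split each event on commas and name
# the resulting values in order (23 fields for event type 7).
[command_log_fields]
DELIMS = ","
FIELDS = "Eventtype", "timestamp", "timestamp2", "date", "pid", "cmdno", "user", "client", "func", "host", "prog", "version", "args", "tracktype", "timer", "utime", "stime", "io_in", "io_out", "net_in", "net_out", "maxrss", "page_faults"

# props.conf
# Applies to events whose sourcetype was rewritten to command_log.
[command_log]
KV_MODE = none
REPORT-fields = command_log_fields
```

The same pattern would be repeated for every other sourcetype, each with its own FIELDS list. One caveat: this splits on every comma, so if a value such as args can itself contain commas, the later fields would shift; I have not checked how Perforce escapes those.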
Hi JScordo, did you manage to get it working in the end?
Yes, I guess a per-event sourcetype is the direction I need to go. I will read up on that. The road I have gone down so far is based on the answer/comments here:
https://answers.splunk.com/answers/316273/field-extracting-lines-from-a-single-file-based-on.html
That helped me get to these props and transforms. Clearly this isn't working, though:
transforms.conf
[schema_7]
REGEX = ^(7)([^,]*)
FORMAT = Eventtype::$1 timestamp::$2 timestamp2::$3 date::$4 pid::$5 cmdno::$6 user::$7 client::$8 func::$9 host::$10 prog::$11 version::$12 args::$13 tracktype::$14 timer::$15 utime::$16 stime::$17 io_in::$18 io_out::$19 net_in::$20 net_out::$21 maxrss::$22 page_faults::$23
[schema_8]
REGEX = ^(8)([^,]*)
FORMAT = Eventtype::$1 timestamp::$2 timestamp2::$3 date::$4 pid::$5 cmdno::$6 user::$7 client::$8 func::$9 host::$10 prog::$11 version::$12 args::$13 tracktype::$14 recvCount::$15 SendCount::$16 recvBytes::$17 sendBytes::$18 rpc_hi_mark_fwd::$19 rpc_hi_mark_rev::$20 recvTime::$21 sendTime::$22
[schema_9]
REGEX = ^(9)([^,]*)
FORMAT = Eventtype::$1 timestamp::$2 timestamp2::$3 date::$4 pid::$5 cmdno::$6 user::$7 client::$8 func::$9 host::$10 prog::$11 version::$12 args::$13 tracktype::$14 dbName::$15 pagesIn::$16 pagesOut::$17 pagesCached::$18 reorderIntl::$19 reorderLeaf::$20 readLocks::$21 writeLocks::$22 f_gets::$23 f_positions::$24 f_scans::$25 f_puts::$26 f_deletes::$27
[schema_14]
REGEX = ^(14)([^,]*)
FORMAT = Eventtype::$1 timestamp::$2 timestamp2::$3 date::$4 pid::$5 cmdno::$6 user::$7 client::$8 func::$9 host::$10 prog::$11 version::$12 args::$13 filesAdded::$14 fileUpdated::$15 filesDeleted::$16 bytesAdded::$17 bytesUpdated::$18 bytesDeleted::$19
props.conf
[track_log]
SHOULD_LINEMERGE=false
KV_MODE = none
TIME_FORMAT = %Y/%m/%d %H:%M:%S
TZ = +1:00
REPORT-fields = schema_7, schema_8, schema_9, schema_14