I am wondering if it's possible to take two entirely different source file formats (containing the same data) and report against them together. Real-life scenario: I have a mail server that writes logs in a tab-delimited format. I have another mail server that uses CSV. They contain the same general fields.
I would like to take these two different sources, consolidate them into one sourcetype in Splunk, and do logical analysis from that point.
Is it possible, and what are some broad guidelines to achieve this?
My concerns are:
Parsing the timestamps, as they differ between the two formats. Report-time extractions: how will Splunk extract the fields when the formats are completely different?
Thanks for any tips!
Be sure to configure your sourcetypes properly -- after that the rest should fall into place.
Make sure that each of your formats is assigned a unique sourcetype. You can either assign the sourcetype per input or per source in Splunk, or you can use a transform to assign it based on a value in the event data. For more information, see the Splunk documentation on sourcetypes.
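As a rough sketch of the per-input approach, an inputs.conf along these lines would assign one sourcetype to each server's logs. The monitor paths are made up for illustration, and the sourcetype names format1 and format2 are simply the ones used in the example searches in this answer; substitute your own.

```ini
# inputs.conf (sketch; both paths below are hypothetical)

# Tab-delimited logs from the first mail server
[monitor:///var/log/mailserver_a]
sourcetype = format1

# CSV logs from the second mail server
[monitor:///var/log/mailserver_b]
sourcetype = format2
```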
Timestamping will usually work out-of-the-box, even for unknown data formats. If it doesn't, you can customize the timestamp extraction.
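If you do need to customize it, the usual place is props.conf, one stanza per sourcetype. The sketch below assumes the timestamp is at the start of each event and guesses at two plausible formats; the actual TIME_FORMAT strings depend on what your mail servers write, so treat these values as placeholders.

```ini
# props.conf (sketch; the time formats are assumptions, not taken from real logs)

# Assumed format1 timestamp: 2011-10-10 13:55:36
[format1]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19

# Assumed format2 timestamp: 10/Oct/2011:13:55:36 -0700
[format2]
TIME_PREFIX = ^
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 26
```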
Create field extractions for each sourcetype. You'll need a unique extraction per-field, per-sourcetype. Take a look at the Common Information Model for suggested naming conventions. For delimited data, you may also want to look at DELIMS in transforms.conf.
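For delimited data, a DELIMS-based extraction might look roughly like the following. The field names and their order are assumptions for illustration (the stanza names format1_fields and format2_fields are also made up); the point is that both sourcetypes end up with the same field names even though the column order and delimiter differ.

```ini
# transforms.conf (sketch; field order is hypothetical)

# Tab-delimited events
[format1_fields]
DELIMS = "\t"
FIELDS = "event_time", "user", "src_ip", "action"

# Comma-delimited events, columns in a different order
[format2_fields]
DELIMS = ","
FIELDS = "event_time", "src_ip", "user", "action"

# props.conf (sketch; ties each sourcetype to its extraction)

[format1]
REPORT-mailfields = format1_fields

[format2]
REPORT-mailfields = format2_fields
```

Because both transforms emit user, src_ip, and action, a single search can report across both sourcetypes without caring which format an event came from.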
Search across all of your sourcetypes together, and pipe them to reporting commands to get values based on both.
sourcetype=format1 OR sourcetype=format2 | table user, src_ip, action
sourcetype=format1 OR sourcetype=format2 | stats count by user
sourcetype=format* | stats count by user
The first example just gives you nice formatting. The second and third examples both gather basic usage stats. Note the third example -- if you name your sourcetypes appropriately, you can use wildcards to catch all of the variations at once. More examples of the various reporting commands are in the manual or the Cheat Sheet.
Okay, so I do want different source types. I just need to use the same field names in the extractions and then things should work. Okay, Thanks! Makes perfect sense!