Solved: Merging different log formats

jgauthier · 03-29-2011

All,

I am wondering if it's possible to take two entirely different source file formats (containing the same data) and report against them. Real-life scenario: I have a mail server that writes logs in a tab-delimited format. I have another mail server that uses CSV. They contain the same general fields.

I would like to take these two different sources, consolidate them into one sourcetype in Splunk, and do logical analysis from that point.

Is it possible, and what are some broad guidelines to achieve this?

My concerns are:
Parsing the timestamps, as they are different. Report-time extractions: how will Splunk extract the fields when the formats are completely different?

Thanks for any tips!

southeringtonp · 03-29-2011

Be sure to configure your sourcetypes properly -- after that the rest should fall into place.

Configuration Steps:

1. Make sure that each of your formats is assigned a unique sourcetype. You can either assign the sourcetype per input or source in Splunk, or use a transform to assign it based on a value in the event data. For more information, see the Splunk documentation on sourcetypes.
2. Timestamping will usually work out of the box, even for unknown data formats. If it doesn't, you can customize the timestamp extraction.
3. Create field extractions for each sourcetype. You'll need a separate extraction per field, per sourcetype, and the extracted field names should match across sourcetypes so that one search can report on both. Take a look at the Common Information Model for suggested naming conventions. For delimited data, you may also want to look at FIELDS and DELIMS in transforms.conf.
4. Search across all of your sourcetypes together, and pipe the results to reporting commands to get values based on both.
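
As a rough sketch of the configuration steps above, the stanzas below show one way the pieces could fit together. The monitor paths, the sourcetype names (mail_tsv, mail_csv), the transform names, the field order, and the TIME_FORMAT value are all hypothetical and need to be adjusted to the real logs. The tab escape in DELIMS is also an assumption worth checking against the transforms.conf spec for your Splunk version.

```ini
# inputs.conf -- one sourcetype per log format (paths are hypothetical)
[monitor:///var/log/mailserver_a/]
sourcetype = mail_tsv

[monitor:///var/log/mailserver_b/]
sourcetype = mail_csv

# props.conf -- point each sourcetype at its own search-time extraction
[mail_tsv]
REPORT-mailfields = mail_tsv_fields
# Only needed if automatic timestamp detection fails; this format is an assumption
TIME_FORMAT = %Y-%m-%d %H:%M:%S

[mail_csv]
REPORT-mailfields = mail_csv_fields

# transforms.conf -- different delimiters and column order, same field names
[mail_tsv_fields]
# Tab delimiter; the escape syntax is an assumption, verify it for your version
DELIMS = "\t"
FIELDS = log_time, user, src_ip, action

[mail_csv_fields]
DELIMS = ","
FIELDS = log_time, action, user, src_ip
```

Because both transforms emit the same field names (user, src_ip, action), a single search across both sourcetypes can report on them without caring which server produced the event.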

Example searches:

sourcetype=format1 OR sourcetype=format2 | table user, src_ip, action
sourcetype=format1 OR sourcetype=format2 | stats count by user
sourcetype=format* | stats count by user

The first example just gives you nice formatting. The second and third examples both gather basic usage stats. Note the third example -- if you name your sourcetypes appropriately, you can use wildcards to catch all of the variations at once. More examples of the various reporting commands are in the manual or the Cheat Sheet.

jgauthier · 03-29-2011

Okay, so I do want different source types. I just need to use the same field names in the extractions and then things should work. Okay, Thanks! Makes perfect sense!
