I'm converting old unstructured log files whose format differs from what Splunk expects, and I don't know how to convert the entries below into a format that Splunk can parse and extract fields from:
29/11/07 08:36:59 4064 (xxxx) : Sending....: 0099"recordtype=ZZ" "user=aaa123" "counterparty=384" "device=aaa123" "system=winner" "time=31019"
29/11/07 08:36:59 4064 (xxxx) : Read: "recordtype=ZZR" "user=aaa123" "counterparty=384" "device=aaa123" "system=winner" "time=30886" "reply_reason=0"
Example 1 of a log entry containing double quotes is:
29/11/07 08:36:59 4064 (xxxx) : Field1="record_type=ZZR;user=aaa123" Field2="counterparty=384;device=aaa123"
Example 2 of a log entry containing double quotes is:
10/01/12 14:06:30 ["NAME"]="XSHE.L", ["AK"]="966.93", ["BD"]="960.66"
These already try to represent the fields and data, but in a way that breaks Splunk's formatting.
Any suggestions for the above would be highly appreciated.
Well, there are a couple of ways. First, I can't tell from your question whether you are able to do this, but you mention converting formats. Splunk automatically extracts key/value pairs.
E.g. if you convert the log output to record_type=ZZ, without the quotation marks around the whole pair, Splunk will automatically extract the value into the correct field.
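If you do control the conversion step, the quote stripping could look something like the Python sketch below. This is only an illustration: the function name unquote_pairs and the pattern are my own assumptions, not anything Splunk provides, and it assumes values never contain spaces or embedded quotes.

```python
import re

# Matches a double-quoted key=value token such as "user=aaa123".
# Assumption: keys are word characters and values contain no double quotes.
QUOTED_PAIR = re.compile(r'"(\w+)=([^"]*)"')

def unquote_pairs(line: str) -> str:
    """Rewrite "key=value" tokens as key=value, leaving the rest of the line alone."""
    return QUOTED_PAIR.sub(r'\1=\2', line)

sample = ('29/11/07 08:36:59 4064 (xxxx) : Read: "recordtype=ZZR" "user=aaa123" '
          '"counterparty=384" "device=aaa123" "system=winner" "time=30886" "reply_reason=0"')
print(unquote_pairs(sample))
```

If a value can contain spaces, it would probably be safer to emit key="value" instead, so that the whole value stays together.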
Option two: let's assume you have to work with what you have. Using the props.conf and transforms.conf files, you can create a search-time extraction. This means that when a user runs a search on your data, Splunk looks in props.conf, and then in transforms.conf, for any regular expressions that relate to the data being searched. If it finds any, it applies them to extract fields at search time. For example, if you already have the above logs indexed in Splunk, try running this search:
searchqueryforlogs | rex "Field1=\"record_type=(?<record_type>[^;]+);user=(?<user>[^\"]+)"
On data such as: 29/11/07 08:36:59 4064 (xxxx) : Field1="record_type=ZZR;user=aaa123" Field2="counterparty=384;device=aaa123" it will extract the record_type and user fields correctly (it assumes that the record_type value runs up to the ; and that the user value runs up to the closing quote).
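To sanity-check the pattern outside Splunk, a quick Python sketch like the one below could be used. Note that Python's re module needs the (?P<name>...) spelling for named groups, whereas the rex command accepts (?<name>...), so this checks the logic of the pattern rather than the exact Splunk syntax. The variable names are just for illustration.

```python
import re

# Same pattern as the rex example, rewritten with Python's (?P<name>...) group syntax.
PATTERN = re.compile(
    r'Field1="record_type=(?P<record_type>[^;]+);user=(?P<user>[^"]+)'
)

event = ('29/11/07 08:36:59 4064 (xxxx) : '
         'Field1="record_type=ZZR;user=aaa123" '
         'Field2="counterparty=384;device=aaa123"')

# search() scans the whole event, much as rex scans _raw for the first match.
match = PATTERN.search(event)
fields = match.groupdict() if match else {}
print(fields)  # {'record_type': 'ZZR', 'user': 'aaa123'}
```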
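The pattern in the rex command is an ordinary regex. You can put it in a transforms.conf stanza, referenced from props.conf with a REPORT-<class> setting for your sourcetype, so that the extraction is applied automatically whenever a search runs, like this: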
[extract_record]
REGEX = Field1="record_type=([^;]+);user=([^"]+)
FORMAT = record_type::$1 user::$2
If you read the configs via the links above you can learn more about what you can do 🙂
Also, the above isn't tested, so you may need to make slight alterations. Feel free to post back if this isn't what you were after or if you hit problems.
Thanks for your response. It partly answers my question, but I have an issue with changing the format of some of the data I receive from other sources (data not formatted by my programs). Often it's data from another provider, and it's best to preserve the format as it is.
I'll try to post examples of these as and when I come across them while using Splunk.
[Note: I converted your note from an answer to a comment so that it would appear threaded properly and can be further replied to. Only create answers when you are actually answering the initial top level question. thanks!]
In the case of @drainy's answer, the format of the data was not changed! The props and transforms configurations perform search-time field extractions, which leave the original raw data as-is. In fact, this is how Splunk normally works: fields are extracted when a search runs, and the indexed raw event is left untouched.