Getting data into Splunk


How can I change the format of the input data to suit our needs before it is indexed in Splunk? My original log is in this format:
SNM4 YAHOO3SN.#### :: 03/03/13 00:00:07 :: User yahoo3sn logged in
SNM4 YAHOO3SN.871F :: 03/03/13 00:00:07 :: User logged off, Processing will begin
SNM4 YAHOO3SN.871F :: 03/03/13 00:00:07 :: Autoforward profile found for site YAHOO3SN
I want to change the format of the above log, before indexing starts in Splunk, to the format below:

YAHOO3SN.871F|logged in|03/03/2013|00:00:07



Well, if you are not familiar with regexes, you should use a tool like QuickREx, which is also available as a portable version. The regex breaks down as follows:

^         find the beginning of the line
(.*?)\s   find some text followed by a space and store it in variable $1
(.*?)\s   find some text followed by a space and store it in variable $2
::\s      find two colons followed by a space
(\d\d)\/  find 2 digits followed by a slash and store them in variable $3 (day)
(\d\d)\/  find 2 digits followed by a slash and store them in variable $4 (month)
(\d\d)\s  find 2 digits followed by a space and store them in variable $5 (2-digit year)
(\d\d\:\d\d\:\d\d)\s::\s
          find the hour, minutes and seconds and store them in variable $6; they must be followed by space, colon, colon, space
(.*?)((logged off)|(logged in))(.*)
          find some text ($7) followed by either "logged in" or "logged off" and store the matched phrase in variable $8

Write the following text to the _raw event:

The content of variable $2, followed by a pipe; then the day ($3), a slash, the month ($4), a slash, and "20" followed by the 2-digit year ($5) to form a proper 4-digit year; then a pipe and the time ($6); then a pipe and "logged in" or "logged off" ($8).
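To check the breakdown above outside of Splunk, here is a minimal Python sketch that applies the same pattern to the three sample lines from the question. The function name rewrite is just an illustrative helper, not anything from Splunk, and Python's re module is only assumed to behave like Splunk's regex engine for this simple pattern.

```python
import re

# Same pattern as explained above; group numbers correspond to $1..$8.
PATTERN = re.compile(
    r"^(.*?)\s(.*?)\s::\s(\d\d)/(\d\d)/(\d\d)\s(\d\d:\d\d:\d\d)\s::\s"
    r"(.*?)((logged off)|(logged in))(.*)"
)


def rewrite(line: str) -> str:
    """Return the rewritten event, or the line unchanged if it does not match."""
    m = PATTERN.match(line)
    if m is None:
        # Neither "logged in" nor "logged off" was found: leave the event as is.
        return line
    # Python equivalent of: $2|$3/$4/20$5|$6|$8
    return f"{m.group(2)}|{m.group(3)}/{m.group(4)}/20{m.group(5)}|{m.group(6)}|{m.group(8)}"


samples = [
    "SNM4 YAHOO3SN.#### :: 03/03/13 00:00:07 :: User yahoo3sn logged in",
    "SNM4 YAHOO3SN.871F :: 03/03/13 00:00:07 :: User logged off, Processing will begin",
    "SNM4 YAHOO3SN.871F :: 03/03/13 00:00:07 :: Autoforward profile found for site YAHOO3SN",
]

for sample in samples:
    print(rewrite(sample))
# YAHOO3SN.####|03/03/2013|00:00:07|logged in
# YAHOO3SN.871F|03/03/2013|00:00:07|logged off
# (third line is printed unchanged, because it contains neither phrase)
```

Note that this puts the "logged in"/"logged off" text last, while the question asks for it directly after the ID. If that order matters, the groups would need to be arranged as $2|$8|$3/$4/20$5|$6 instead.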


thanks a lot



You can use a combination of props.conf and transforms.conf on your Indexer for that. In this example, props.conf tells Splunk to use the transformation called "rewrite_MyLogs" for the sourcetype "MySourceType". The transformation applies a regular expression to the input, finds the term "logged in" or "logged off", and creates the new data for the Indexer. For the date, "20" is prepended to the 2-digit year. The case where neither of the two terms is found is not yet covered in this snippet.

Note that this rewriting of logs requires additional system resources and may therefore impact the performance of your Splunk installation. To mitigate that, you could instead place this configuration on a Heavy Forwarder in front of the Indexer(s).

Note 2: Since you are rewriting the date/time anyway, you should consider using a standard time format like ISO 8601; this may avoid trouble in the future 🙂

# props.conf
[MySourceType]
TRANSFORMS-MyLogs = rewrite_MyLogs

# transforms.conf
[rewrite_MyLogs]
REGEX = ^(.*?)\s(.*?)\s::\s(\d\d)\/(\d\d)\/(\d\d)\s(\d\d\:\d\d\:\d\d)\s::\s(.*?)((logged off)|(logged in))(.*)
FORMAT = $2|$3/$4/20$5|$6|$8
DEST_KEY = _raw
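Regarding the ISO 8601 suggestion above, here is a small Python sketch of what the converted timestamp would look like. It assumes the log date is day/month/2-digit year (the sample 03/03/13 does not reveal which comes first, so day_first is a switch), and it adds no time zone because the log does not state one. The helper name to_iso8601 is hypothetical.

```python
from datetime import datetime


def to_iso8601(date_part: str, time_part: str, day_first: bool = True) -> str:
    """Convert e.g. '03/03/13' and '00:00:07' to an ISO 8601 timestamp string."""
    # Assumption: day/month/year unless day_first is False (then month/day/year).
    fmt = "%d/%m/%y %H:%M:%S" if day_first else "%m/%d/%y %H:%M:%S"
    return datetime.strptime(f"{date_part} {time_part}", fmt).isoformat()


print(to_iso8601("03/03/13", "00:00:07"))
# 2013-03-03T00:00:07
```

Under the same day/month assumption, the transform could build this directly from the capture groups with something like 20$5-$4-$3T$6 in place of $3/$4/20$5|$6. Treat that as an untested sketch, not a verified setting.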


Can you explain the format and regex pattern in detail?


Well, there are some ways to 'change' data prior to indexing, as described in:

The steps described there are mostly for removing unwanted pieces of information, such as credit card numbers etc. For more extensive rewriting of log data, it might be better to look at the logging application, and see what output options it provides.

