I have different FTP log files coming in. Each log file is in a different format, but some data is common across the files. I want to extract that common information and transform it into one source file AT INDEX TIME (not at search time).
I'm able to transform the data if it's from one source.
Please advise on how to combine the data from different files into one source file AT INDEX TIME.
This is what we are trying to do AT INDEX TIME. The requirement is to use Splunk only to transform the data from different logs into a specified format (as shown below). We don't want to search any information. Additionally, please confirm whether Splunk would be used as a forwarder in the scenario below.
I have included sample events from six different log files, where each log file contains a different piece of information. For example, log 1 may contain the "IP address", log 2 may contain the "No of bytes transferred", and so on. I want to combine the logs together so that I can change the format of the logs to the output format below.
2013-11-22 00:03:06,124  INFO: Secure storage keystore is disabled. [.utils.security.KeystoreReader]
2013-11-22 00:03:59,585 [68ea68ea] LOGON: There have been 1 logon failures for unrecognized userids in 0 seconds for these userids, IP addresses, and times: [null 188.8.131.52 11/22/13 0:3:59]. [.config.clientitf.FuncUser]
2013-11-22 06:32:41,057  FINE: Processing socket READ event [com.maverick.nio.SocketConnection]
2013-11-22 06:32:41,057  FINE: 127845 bytes transferred
13-11-22 05:26:54  FINE: Received file c:/demo.csv
13-11-22 05:26:54 [5f7e5f7e] FINE: Processing socket READ event
13-11-22 00:18:03 [69b269b2] INFO: Secure storage keystore is disabled.
13-11-22 00:33:03  INFO: file c:/demo.csv in ascii mode.
2013-11-22 00:03:19,224  SUCCESS: Compressed the file demo.csv: systemid=27, machinename=rsbe.bnymellon.net, module=Transformation Service, description=Transformation Service [.alarms.heartbeatgenerator.GeneratorThread]
2013-11-22 00:10:19,523 [6d886d88] SUCCESS: Success, Alarm Heartbeat updated: systemid=27, machinename=rsbe.bnymellon.net, module=Transformation Service, description=Transformation Service [.alarms.heartbeatgenerator.GeneratorThread]
2013-11-22 00:00:29,024 [51d451d4] SUCCESS: Scan successful: There are no documents for throttler Throttling Integration with owner ENCRYPT_DECRYPT to output. [.io.agents.throttle.ThrottleAgent]
2013-11-22 00:00:36,018 [516c516c] FINE: Scanning directory [/ftxprd1/BNYM_NONPROD_01/biz_outbound] [.io.agents.dirmon.DirMonInbound]
2013-11-22 00:00:36,018 [516c516c] FINE: Scanning dir [/ftxprd1/BNYM_NONPROD_01/biz_outbound] [.io.agents.dirmon.DirMonInbound]
2013-11-22 00:03:07,734  SUCCESS: Success, Alarm Heartbeat updated: systemid=25, machinename=rsbe.bnymellon.net, module=Inbound Listeners, description=Inbound Listeners [.alarms.heartbeatgenerator.GeneratorThread]
2013-11-22 00:10:08,023  SUCCESS: Success, Alarm Heartbeat updated: systemid=25, machinename=rsbe.bnymellon.net, module=Inbound Listeners, description=Inbound Listeners [.alarms.heartbeatgenerator.GeneratorThread]
You should be able to install a full Splunk instance and configure it to:
a) read some files
b) transform event data on a per event basis.
c) not index any events locally, and
d) forward them as syslog traffic
On your syslog server you can write the incoming data into a file.
I have had mixed results with sending syslog out of Splunk, but that was a long time ago.
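As a rough sketch, the steps (a) through (d) above map onto Splunk's configuration files roughly like this. The file paths, stanza names, regex, and syslog host are all placeholders invented for illustration, not tested config:

```ini
# inputs.conf -- (a) read some files
[monitor:///var/log/ftp/*.log]
sourcetype = ftp_common

# props.conf -- (b) apply an index-time transform per event
[ftp_common]
TRANSFORMS-reformat = ftp_rewrite

# transforms.conf -- rewrite the raw event text before it leaves
[ftp_rewrite]
REGEX = ^(\d{4}-\d{2}-\d{2} [\d:,]+)\s+(\w+): (.*)$
FORMAT = $1|$2|$3
DEST_KEY = _raw

# outputs.conf -- (c)/(d) don't index locally, forward as syslog
[indexAndForward]
index = false

[syslog]
defaultGroup = my_syslog_group

[syslog:my_syslog_group]
server = syslog.example.com:514
```

Note this requires a full (heavy) Splunk instance rather than a universal forwarder, because index-time transforms happen in the parsing pipeline, which a universal forwarder does not run.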
Thanks for the response. If Splunk can extract specific data from different logs at index time, we thought we could continue with Splunk. We are still in the learning process.
Also, can you please confirm whether the same can be achieved if Splunk were used as a forwarder? With data routing, is it possible to select data based on a pattern and send it to a receiver? (http://docs.splunk.com/Splexicon:Datarouting)
Can the receiver be a database or a file, i.e., can the routed data be exported?
The one thing that I wonder about is: if you do not want to search the data, are you sure you picked the right tool? While Splunk certainly can transform data on a per-event basis, its primary benefit is the ability to collect vast amounts of logs and search through them.
What Ayn said. Splunk does not combine the files. What you can do in Splunk:
1 - place all the data in the same index (probably a good idea)
2 - give all the inputs the same sourcetype (possibly a good idea)
3 - give all the inputs the same source (probably a bad idea)
4 - transform the incoming data based on source, sourcetype, host, or regex pattern matching, at index time
5 - extract fields at search time or index time, based on regex pattern matching plus source, sourcetype, or host
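For example, item 4 could look something like the following, keying the transform on the source path instead of the sourcetype. The path, stanza names, and regex here are made up for illustration:

```ini
# props.conf -- apply a transform only to events from this source path
[source::/var/log/ftp/*.log]
TRANSFORMS-strip = strip_fine_prefix

# transforms.conf -- index-time rewrite of _raw driven by a regex
[strip_fine_prefix]
REGEX = ^(.*)\s+FINE:\s+(.*)$
FORMAT = $1 $2
DEST_KEY = _raw
```

Because DEST_KEY is _raw, this permanently changes the event text as it is indexed; events that don't match the regex pass through unchanged.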
I don't understand your statement that you can "transform the data if it's from one source". Having all the data come from "one source" is not a requirement of Splunk.
I am not sure what you are extracting, so I cannot give specific advice about index-time parsing. However, if you want to extract fields at index time, I will tell you now that 99.9% of the time it is a bad idea. Search-time field extraction is almost always better, and everyone else in the community will say the same thing. With more information about the data, though, we might see it differently and be able to give clearer advice.
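To show how little search-time extraction costs: against the sample events above, one props.conf line per field is usually enough, with no reindexing required if you change your mind later. The sourcetype name and regexes below are invented for this sample data:

```ini
# props.conf -- search-time field extractions (no reindexing needed)
[ftp_log]
# pull "127845 bytes transferred" into a bytes_transferred field
EXTRACT-bytes = (?<bytes_transferred>\d+) bytes transferred
# pull the severity word (INFO, FINE, SUCCESS, LOGON) into level
EXTRACT-level = ^\S+\s+\S+\s+(?:\[\w+\]\s+)?(?<level>[A-Z]+):
```

Unlike index-time extractions, these can be edited at any time and take effect on the next search.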
So why do you need one source file? And why do you need to do this at index time?
What do you mean by "into one source file"? Splunk doesn't keep files that way. Splunk will, however, add some metadata about where it got each event from (recorded in the "source" field). Is this what you mean? That you want to write the same source metadata for several different sources?
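If that is what you mean, that specific thing can be done at index time by overwriting the source metadata key. The sourcetype and the "ftp_combined" value below are placeholders:

```ini
# props.conf
[ftp_log]
TRANSFORMS-source = set_common_source

# transforms.conf -- give every event the same source value
[set_common_source]
REGEX = .
FORMAT = source::ftp_combined
DEST_KEY = MetaData:Source
```

Every event from inputs with that sourcetype would then report source=ftp_combined, even though they arrived from different files.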