How to extract data from the raw data of each event before it is sent to the indexer?

lsmkelvin · 05-22-2013

Hi all,

I am new to Splunk. I'm stuck on how to extract data from the original log before it is indexed.

Below is my original log:

160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET /lbHealthMon/index.jsp HTTP/1.1 200 322 - 5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c 28 1369122410946 0.0020
160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET /lbHealthMon/index.jsp HTTP/1.1 200 322 - 5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c 28 1369122410946 0.0020

I want to extract some of the information (let's say the IP address, date, and time) and use a heavy forwarder to send it to the indexer for indexing.

Can anyone please help me figure this out?

Best regards,

Kelvin

lsmkelvin · 05-22-2013

Yes, you are right: I want to remove some unused data before forwarding it to the indexer.

For example:
Original log:
160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET /lbHealthMon/index.jsp HTTP/1.1 200 322 - 5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c 28 1369122410946 0.0020

After forwarding to the indexer:
2013-05-21 15:46:50 /lbHealthMon/index.jsp 0.0020
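To make the requested transformation concrete, here is a minimal Python sketch that keeps only the columns that appear in the "after" line. It assumes single-space separators and a fixed column layout; the name `trim_event` and the `KEEP` indices are hypothetical and only illustrate which parts of the line survive. This is not how a heavy forwarder is configured.

```python
# Hypothetical sketch: pick the whitespace-separated columns of the sample
# event that appear in the desired output. Assumes a fixed column layout.
SAMPLE = (
    "160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET "
    "/lbHealthMon/index.jsp HTTP/1.1 200 322 - "
    "5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c "
    "28 1369122410946 0.0020"
)

# 0-based column positions: date, time, URI path, and the last column
# (presumably the response time the poster wants to analyze).
KEEP = (1, 2, 5, 13)


def trim_event(line):
    """Return only the kept columns, joined by single spaces."""
    columns = line.split()
    return " ".join(columns[i] for i in KEEP)


if __name__ == "__main__":
    print(trim_event(SAMPLE))  # 2013-05-21 15:46:50 /lbHealthMon/index.jsp 0.0020
```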

dwaddle · 05-22-2013

If I understand your question, I might suggest that you are thinking too much in a 'relational database' mindset. You do not need to do any preprocessing of the data before indexing it in order to associate "160.19.104.25" with the name "IP address"; the same goes for the date and time.

Splunk will by default produce a full text index of all of the "tokens" from your event. "160.19.104.25" is a token, as are "2013-05-21", "15:46:50", and "GET" (and so on). There is no need to tell it in advance that 160.19.104.25 is the IP address.
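As a rough illustration of a full-text token index, here is a Python sketch that maps every token to the events containing it. It assumes tokens are split on whitespace only; Splunk's real segmentation is configurable (segmenters.conf) and also breaks on minor breakers such as periods and colons, so it yields more tokens than this. The functions `tokenize` and `build_index` are hypothetical.

```python
from collections import defaultdict

SAMPLE = (
    "160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET "
    "/lbHealthMon/index.jsp HTTP/1.1 200 322 - "
    "5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c "
    "28 1369122410946 0.0020"
)


def tokenize(event):
    """Split an event into tokens on whitespace (simplified segmentation)."""
    return event.split()


def build_index(events):
    """Map each token to the set of event ids that contain it.

    No field names are involved: the index only knows tokens.
    """
    index = defaultdict(set)
    for event_id, event in enumerate(events):
        for token in tokenize(event):
            index[token].add(event_id)
    return index


if __name__ == "__main__":
    index = build_index([SAMPLE])
    for token in ("160.19.104.25", "2013-05-21", "15:46:50", "GET"):
        print(token, sorted(index[token]))
```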

When you run a search, Splunk applies various rules at that time to associate field names with values. These rules can include regular expressions, key=value matching, or delimiter-based operations.
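A minimal Python sketch of a search-time regular-expression extraction, similar in spirit to Splunk's `rex` command or an EXTRACT rule. The pattern and the field names (`ip_address`, `uri_path`, `response_time`, and so on) are assumptions written for this one sample line; Splunk does not define them automatically.

```python
import re

SAMPLE = (
    "160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET "
    "/lbHealthMon/index.jsp HTTP/1.1 200 322 - "
    "5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c "
    "28 1369122410946 0.0020"
)

# Hypothetical extraction rule for this log layout only.
ACCESS_LOG = re.compile(
    r"^(?P<ip_address>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"\S+ "                          # host:port column, e.g. 160.80.38.178:15010
    r"(?P<method>[A-Z]+) "
    r"(?P<uri_path>\S+) "
    r"\S+ "                          # protocol, e.g. HTTP/1.1
    r"(?P<status>\d{3}) "
    r".* (?P<response_time>\S+)$"    # greedy skip to the last column
)


def extract_fields(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = ACCESS_LOG.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

Because the rule lives in the search layer, changing or adding a pattern later does not require reindexing anything.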

The date and time are parsed at index time in order to create an epoch time for the event, which is stored in the index. This is key to Splunk's whole time-series data approach.
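To illustrate what "an epoch time for the event" means, here is a Python sketch that turns the date and time columns into Unix epoch seconds. The log line carries no time zone, so the sketch assumes UTC+8; in Splunk the zone would come from configuration (for example the TZ setting in props.conf) or the host's time zone. With that assumed offset, the result equals the 13-digit value 1369122410946 from the same line divided by 1000, which suggests (but does not prove) that this column is an epoch timestamp in milliseconds.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the log was written in UTC+8. The raw line itself does not say.
ASSUMED_TZ = timezone(timedelta(hours=8))


def to_epoch(date_str, time_str, tz=ASSUMED_TZ):
    """Parse 'YYYY-MM-DD' and 'HH:MM:SS' in zone tz into Unix epoch seconds."""
    naive = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    return int(naive.replace(tzinfo=tz).timestamp())


if __name__ == "__main__":
    print(to_epoch("2013-05-21", "15:46:50"))  # 1369122410
```

Choosing the wrong zone shifts every event by whole hours, which is why the time zone is worth pinning down at index time.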

The upshot is that you can still run searches like ip_address=160.19.104.25, and Splunk will use the full-text index in combination with your field-extraction rules to find your results. However, it does not require you to define those rules (or a schema) at index time.
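A toy Python model of that two-step idea, not Splunk's actual search pipeline. It assumes tokens are split on whitespace and colons (roughly a small subset of Splunk's breakers) and uses one hypothetical search-time rule saying the first column is `ip_address`. The token index narrows down candidate events; the extraction rule, applied only at search time, confirms the value sits in the named field.

```python
import re
from collections import defaultdict

EVENTS = [
    "160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET "
    "/lbHealthMon/index.jsp HTTP/1.1 200 322 - "
    "5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c "
    "28 1369122410946 0.0020",
]

# Hypothetical search-time rule: the first column is the client IP.
FIELD_RULES = {"ip_address": re.compile(r"^(?P<value>\S+) ")}


def tokenize(event):
    """Simplified segmentation: break on whitespace and colons only."""
    return [t for t in re.split(r"[\s:]+", event) if t]


def build_index(events):
    """Index time: record which events contain each token (no field names)."""
    index = defaultdict(set)
    for event_id, event in enumerate(events):
        for token in tokenize(event):
            index[token].add(event_id)
    return index


def search(events, index, field, value):
    """Search time: token lookup first, then verify with the field rule."""
    candidates = sorted(index.get(value, set()))
    rule = FIELD_RULES[field]
    hits = []
    for event_id in candidates:
        match = rule.search(events[event_id])
        if match and match.group("value") == value:
            hits.append(event_id)
    return hits


if __name__ == "__main__":
    idx = build_index(EVENTS)
    print(search(EVENTS, idx, "ip_address", "160.19.104.25"))  # [0]
    # The token exists in the event, but not in the ip_address position:
    print(search(EVENTS, idx, "ip_address", "160.80.38.178"))  # []
```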

It's also possible I have entirely misinterpreted your question. If so, please elaborate/clarify. 🙂

lsmkelvin · 05-23-2013

Anyway, thanks to all of you for paying attention to my question. ^^

dwaddle · 05-23-2013

Yeah, I completely misunderstood what you were saying. In typical Splunk lingo, the word "extract" is strongly associated with the idea of pulling data out of an event and giving it a name. I jumped to the wrong conclusion. Glad you were able to get sedcmd to work.

lsmkelvin · 05-23-2013

Some of the data is not useful or meaningful for analysis in Splunk, although that unused data is useful for other purposes.
In my case, I just want to analyze each URL's response time along with its timestamp. If I index every full line, the cost is high.

However, it seems I found the answer by using "sedcmd":
http://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedatausingconfigurationfiles
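SEDCMD applies a sed-style `s/regex/replacement/` expression to the raw event during parsing, so it takes effect on a heavy forwarder or an indexer rather than on a universal forwarder. As a rough way to test such an expression before putting it in props.conf, here is a Python sketch that performs the equivalent substitution with `re.sub`. The regex is an assumption written for the sample line only, and the props.conf lines in the comment are a hypothetical, untested illustration (the stanza and class names are placeholders, and it assumes the regex flavor accepts `\S`).

```python
import re

SAMPLE = (
    "160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET "
    "/lbHealthMon/index.jsp HTTP/1.1 200 322 - "
    "5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c "
    "28 1369122410946 0.0020"
)

# Hypothetical props.conf equivalent (placeholder names, untested):
#   [my_access_log]
#   SEDCMD-trim = s/^\S+ (\S+ \S+) \S+ \S+ (\S+) .* (\S+)$/\1 \2 \3/
PATTERN = r"^\S+ (\S+ \S+) \S+ \S+ (\S+) .* (\S+)$"
REPLACEMENT = r"\1 \2 \3"  # date+time, URI path, last column


def sed_like(line):
    """Apply the substitution; lines that do not match are left unchanged."""
    return re.sub(PATTERN, REPLACEMENT, line)


if __name__ == "__main__":
    trimmed = sed_like(SAMPLE)
    print(trimmed)  # 2013-05-21 15:46:50 /lbHealthMon/index.jsp 0.0020
    print(len(SAMPLE), "->", len(trimmed), "characters")
```

For this sample the trimmed event is well under a third of the original length. Because SEDCMD rewrites the raw event before it is written to the index, the dropped columns cannot be recovered from Splunk later.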

amiritc · 09-21-2018

Hi, dear Splunker.
If you have solved your problem, could you please share your solution? I have the same problem now.

Ayn · 05-22-2013

Not that it can't be done, but why would you need to remove data? To save license costs?...

lsmkelvin · 05-22-2013

Maybe my question was not clear; anyway, thanks for your reply. Let me try to explain with the example below.

Original log before forwarding:
"160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET /lbHealthMon/index.jsp HTTP/1.1 200 322 - 5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c 28 1369122410946 0.0020"

After forwarding to the indexer:
"2013-05-21 15:46:50 /lbHealthMon/index.jsp 0.0020"

I just want to remove the unused data before indexing.

Thanks so much for your kind help!

Best regards.

sbrant_splunk · 05-22-2013

By extract, do you mean that you want to remove the data prior to indexing?
