Getting Data In

How to extract data from the raw data of each event before it is sent to the indexer?

lsmkelvin
New Member

Hi all,

I am new to Splunk. I am stuck on how to extract data from the original log before indexing it.

Below is my original log:

160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET /lbHealthMon/index.jsp HTTP/1.1 200 322 - 5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c 28 1369122410946 0.0020
160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET /lbHealthMon/index.jsp HTTP/1.1 200 322 - 5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c 28 1369122410946 0.0020

I want to extract some of the information (let's say "IP address", "date" and "time") and use a Heavy Forwarder to send it to the indexer for indexing.

Can anyone please kindly help me to figure it out?

Best regards,

Kelvin

0 Karma

lsmkelvin
New Member

Yes, you are right. I want to remove some unused data before forwarding to the indexer.

For example:
Original log:
160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET /lbHealthMon/index.jsp HTTP/1.1 200 322 - 5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c 28 1369122410946 0.0020

After forwarding to the indexer:
2013-05-21 15:46:50 /lbHealthMon/index.jsp 0.0020

0 Karma

dwaddle
SplunkTrust

If I understand your question, I might suggest you are thinking too much in a 'relational database' mindset. You do not need to do any preprocessing of the data prior to indexing it in order to associate "160.19.104.25" with the name "IP address". The same goes for date and time.

Splunk will by default produce a full text index of all of the "tokens" from your event. "160.19.104.25" is a token, as are "2013-05-21", "15:46:50", and "GET" (and so on). There is no need to tell it in advance that 160.19.104.25 is the IP address.

When you run a search, Splunk will apply various rules at that time to associate field names with values. These rules can include regular expressions, searching for key=value, or delimiter-based operations.
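To make the idea of a search-time rule concrete, here is a minimal Python sketch of a regular-expression rule that names values only when the event is read. This is an illustration of the concept, not how Splunk implements it, and the field names (ip_address, date, time) are hypothetical labels chosen for this example.

```python
import re

# Hypothetical search-time rule: the raw event is stored untouched and
# the names are attached only when the rule is applied to it.
FIELD_RULE = re.compile(
    r"^(?P<ip_address>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
)


def extract_fields(raw):
    """Return a dict of named fields, or an empty dict if the rule does not match."""
    match = FIELD_RULE.match(raw)
    return match.groupdict() if match else {}


event = (
    "160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET "
    "/lbHealthMon/index.jsp HTTP/1.1 200 322 - "
    "5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c "
    "28 1369122410946 0.0020"
)
fields = extract_fields(event)
```

Changing the rule later changes the field names without touching the stored event, which is the point of doing this at search time.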

The date and time are parsed at index time in order to create an epoch time for the event, which is stored in the index. This is key to Splunk's whole time-series data approach.
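As a rough illustration of turning the date and time into an epoch value, the sketch below parses the second and third space-separated tokens of the sample line. The UTC+8 offset is purely an assumption: it was picked because the result then agrees with the first ten digits of the 13-digit number near the end of the sample line, which looks like an epoch in milliseconds, although nothing in the thread confirms that.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the log timestamps are local time at UTC+8.
ASSUMED_TZ = timezone(timedelta(hours=8))


def event_epoch(raw, tz=ASSUMED_TZ):
    """Parse the date and time tokens of a log line into epoch seconds."""
    date_part, time_part = raw.split(" ")[1:3]
    stamp = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    return int(stamp.replace(tzinfo=tz).timestamp())


sample = "160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET /lbHealthMon/index.jsp"
epoch_local = event_epoch(sample)                # 1369122410 under the UTC+8 assumption
epoch_utc = event_epoch(sample, timezone.utc)    # 1369151210 if the log were in UTC
```

The two results differ by exactly eight hours, which shows why the time zone of the source matters when the timestamp is parsed.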

The net of it is that you can still do searches on stuff like ip_address=160.19.104.25, and Splunk will use the full-text index in combination with your rules for field extraction to find your results. But, it does not require you to define at index time the rules (or schema) for finding these results.


It's also possible I have entirely misinterpreted your question. If so, please elaborate/clarify. 🙂

lsmkelvin
New Member

Anyway, thank you all for paying attention to my question.
^^

0 Karma

dwaddle
SplunkTrust

Yeah, I completely misunderstood what you were saying. In typical Splunk lingo the word "extract" is strongly associated with the idea of pulling data out of an event and giving it a name. I jumped to the wrong conclusion. Glad you were able to get sedcmd to work.

0 Karma

lsmkelvin
New Member

Some of the data is not useful or meaningful for analysis in Splunk, although that unused data is useful for other purposes.
In my case, I just want to analyze every URL's response time together with the timestamp. If I index every single line in full, the cost is high.

However, it seems I found the answer by using "sedcmd".
http://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedatausingconfigurationfiles
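The thread does not show the configuration that was actually used. The linked page covers the SEDCMD setting in props.conf, which takes a sed-style s/regex/replacement/ expression, so the Python sketch below only prototypes one possible regular expression for the trimming described in this thread. The pattern and the names TRIM_RULE and trim_event are assumptions made for illustration, and the pattern should be tested against real data before being adapted.

```python
import re

# Assumed layout: client IP, date, time, server:port, method, URL,
# several fields that are dropped, and the response time as the last token.
TRIM_RULE = re.compile(
    r"^\S+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ \S+ (\S+) .* (\S+)$"
)


def trim_event(raw):
    """Keep only the timestamp, URL and response time; leave other lines as-is."""
    return TRIM_RULE.sub(r"\1 \2 \3", raw)


original = (
    "160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET "
    "/lbHealthMon/index.jsp HTTP/1.1 200 322 - "
    "5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c "
    "28 1369122410946 0.0020"
)
trimmed = trim_event(original)
# trimmed == "2013-05-21 15:46:50 /lbHealthMon/index.jsp 0.0020"
```

Lines that do not match the pattern pass through unchanged, so an unexpected log format would be indexed in full rather than silently mangled.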

0 Karma

amiritc
New Member

Hi, dear Splunker.
If you have solved your problem, could you please share your solution? I have the same problem now.

0 Karma

Ayn
Legend

Not that it can't be done, but why would you need to remove data? To save license costs?...

0 Karma

lsmkelvin
New Member

Maybe my question was not clear. Anyway, thanks for your reply. Let me try to explain with the example below.

Original log before forwarding:
"160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET /lbHealthMon/index.jsp HTTP/1.1 200 322 - 5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c 28 1369122410946 0.0020"

After forwarding to the indexer:
"2013-05-21 15:46:50 /lbHealthMon/index.jsp 0.0020"

I just want to remove the unused data before indexing.

Thanks so much for your kind help!

Best regards.

0 Karma

sbrant_splunk
Splunk Employee

By extract, do you mean that you want to remove the data prior to indexing?

0 Karma