Getting Data In

collect is not storing the extracted fields into a new index; how do I save all extracted fields into a new index?

splunk_worker
Path Finder


Description:
I'm currently experimenting with the task below in a test environment on a standalone Splunk instance. Once it's tested, I have to move it to production.
The original log data contains partial JSON and is sent over syslog, so the format is: _time server-name ipaddress INFO .... { <> }

What I need:
The search query that extracts the JSON key-value pairs and loads the values is very slow, which is not acceptable. Hence, I want to extract all key-value pairs and store them in a new index, so that I can write my queries against the new index, which should have a plain log format with name-value pairs.

QUESTION-1: Should my new index be a summary index or a regular index?
QUESTION-2: How do I save all the extracted field values into the new index?


Since the log contains only partial JSON, Splunk generally takes a long time to extract the key-value pairs with the 'spath' command. Hence, I decided to create a scheduled search that runs every 5 minutes and writes the name/value pairs into a new summary index using the 'collect' command.

Below is a sample query, where the original log is indexed in the index web_analytics. In this query I'm extracting the key-value pairs from the JSON value:
index=web_analytics | rex max_match=10 "(?<json_field>{[^}]+})" | mvexpand json_field | spath input=json_field | rename active_features{} as A_Features, actions{} as A_Actions | collect index=si_web_analytics

QUESTION-3: After the above scheduled search executed, I'm still seeing the same JSON log in si_web_analytics. The collect command has not stored the key-values extracted from the JSON. Why is that?

QUESTION-4: In a distributed environment, where does si_web_analytics need to be created? On the search head or on the indexer?

Thanks in advance.


rsennett_splunk
Splunk Employee

I don't think collect is doing what you think it's doing... so set that aside for now.

See this answer: http://answers.splunk.com/answers/131911/collect-command
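For what it's worth, the gist as a minimal sketch (reusing the fields from your own query): collect copies an event's original _raw when one is present, which is why the untouched JSON shows up again in si_web_analytics. If you make the results tabular first, so there is no _raw left, they are written to the summary index as plain name=value pairs instead:

index=web_analytics
| rex max_match=10 "(?<json_field>{[^}]+})"
| mvexpand json_field
| spath input=json_field
| rename active_features{} as A_Features, actions{} as A_Actions
| table _time A_Features A_Actions
| collect index=si_web_analytics

The field list in the table command is illustrative; list whichever of your extracted fields you want stored. Still, the props.conf/transforms.conf route below avoids the scheduled search altogether.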

To extract the JSON fields, add the following to props.conf:

REPORT-myJson = grabJsonField, breakUpJson

And in the matching stanzas in transforms.conf, extract your JSON key-value pairs.
You can do that with many techniques; something like:

[grabJsonField]
REGEX= ^[\{](?<json_field>{[^\}]+})
FORMAT= json_field::$1

(or whatever you need to get the whole field...)

[breakUpJson]
SOURCE_KEY=json_field
DELIMS= "," , ":"

This is just an example because often when JSON is embedded in a message, it isn't necessarily properly structured, so you can allow for whatever nuances there are.
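Putting it together, a minimal sketch of both files, assuming your events arrive under a sourcetype called web_analytics_syslog (substitute your real sourcetype; the stanza names are the ones used above, and the regex here is a simplified variant of the one shown):

In props.conf:

[web_analytics_syslog]
REPORT-myJson = grabJsonField, breakUpJson

In transforms.conf:

[grabJsonField]
# capture the embedded {...} blob from the mixed syslog line
REGEX = (?<json_field>\{[^}]+\})
FORMAT = json_field::$1

[breakUpJson]
# split the captured blob into key/value pairs on "," and ":"
SOURCE_KEY = json_field
DELIMS = ",", ":"

After restarting (or reloading) Splunk, the fields should be available at search time without running spath in every query.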

Take a look at this:

http://docs.splunk.com/Documentation/Splunk/6.1.1/Admin/Transformsconf

I don't think you really want a summary index here, since this isn't a summarized subset of the data and you aren't actually summarizing anything.

There are also a number of examples on Splunk Answers of different techniques for extracting JSON from within a "mixed" event.

When you are reading these, note the following nuance: my answer uses "REPORT", which extracts fields at search time (index time would be "TRANSFORMS").
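The difference is literally one attribute name in the props.conf stanza (sketch only, reusing the stanza names above; index-time extraction additionally needs WRITE_META = true in the transform and a matching INDEXED = true entry in fields.conf, and the DELIMS-style transform applies only at search time):

# search-time extraction (what this answer uses)
REPORT-myJson = grabJsonField, breakUpJson

# index-time extraction (permanent, written into the index at ingest)
TRANSFORMS-myJson = grabJsonField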

Since you are looking for an alert, take the fields from your JSON and create a lookup, then query the lookup.
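A minimal sketch of that lookup approach (the CSV name web_analytics_fields.csv is made up for illustration): one scheduled search writes the extracted fields out to a lookup, and the alert search reads from it.

Scheduled search (writes the lookup):
index=web_analytics | rex max_match=10 "(?<json_field>{[^}]+})" | mvexpand json_field | spath input=json_field | rename active_features{} as A_Features, actions{} as A_Actions | outputlookup web_analytics_fields.csv

Alert search (reads the lookup):
| inputlookup web_analytics_fields.csv | search A_Actions="*"

The filter on A_Actions is just a placeholder; substitute whatever pattern should trigger your alert.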

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

splunk_worker
Path Finder

Thanks for the response. Any help on step 2 in the above post?

Also, I might store only a subset of the data as well, since only 30-40 of the 60+ fields from index_1 are useful.


rsennett_splunk
Splunk Employee

I've amended the answer to reflect search-time extractions only.
(There is nothing wrong with index-time extractions; the best practice is just to consider them carefully... because they are FOREVER.)

I have also suggested that you may want to create a lookup from your search and then query the lookup for faster response.

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

splunk_worker
Path Finder

In brief, this is what I'm looking for:
1. Index the _raw data as it is into, say, index_1.
2. Extract the names and values that are in JSON format from index_1 into a new index, say index_2. I'm able to extract all 60 fields from index_1, but how do I save the extracted fields into a new index?
3. Write queries against index_2, where searches will be faster.
4. Create real-time alerts based on pre-defined patterns from index_1 (hence the customer doesn't want extraction during index time). This is not the problem right now.

I'm facing a problem with step 2; please also suggest any other ways of solving this.


splunk_worker
Path Finder

Thanks for the response.

But I don't want to extract fields during indexing (the customers don't want it either), as it increases the latency of event arrival. Even the Splunk documentation suggests not doing extraction at index time.
