Topic: collect is not storing the extracted fields into a new index; what is the way to save all extracted fields into a new index?
Description:
Currently I'm experimenting with the task below in a test environment on a standalone Splunk instance. Once tested, I have to move it to production.
The original log data has partial JSON which is sent over syslog, so the format is " _time server-name ipaddress INFO .... {
What I need:
A search query that extracts the JSON key-values at search time is very slow, which isn't acceptable. Hence, I want to extract all key-value pairs and store them in a new index, so that I can write my queries against the new index, which should have a plain log format with name=value pairs.
QUESTION-1: Should my new index be a summary index or a regular index?
QUESTION-2: How do I save all the extracted field values into the new index?
If the log contains partial JSON, Splunk generally takes a long time to extract the key-values with the 'spath' command. Hence, I decided to create a scheduled search that runs every 5 minutes and writes the name/value pairs into a new summary index using the 'collect' command.
Below is a sample query. The original log is indexed into the index web_analytics, and the query extracts the key-value pairs from the JSON value:
index=web_analytics | rex max_match=10 "(?
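In full, it was along these lines (the regex and field names here are simplified placeholders, not my exact query):

index=web_analytics | rex "(?<json_field>\{.+\})" | spath input=json_field | collect index=si_web_analytics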
QUESTION-3: After the above scheduled search executed, I'm still seeing the same JSON log in si_web_analytics. The collect command has not stored the key-values extracted from the JSON.
QUESTION-4: In a distributed search setup, where does si_web_analytics need to be created? On the search head or the indexer?
Thanks in advance.
I don't think collect is doing what you think it's doing... so set that aside for now.
see this answer http://answers.splunk.com/answers/131911/collect-command
To extract the JSON fields, add
REPORT-myJson = grabJsonField , breakUpJson
in props.conf
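For example, under the stanza that matches your events (the sourcetype name here is a placeholder):

[your_sourcetype]
REPORT-myJson = grabJsonField , breakUpJson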
And in the matching stanza in transforms.conf, extract your JSON key value pairs.
You can do that with many techniques.
something like
[grabJsonField]
REGEX = (?<json_field>\{.+\})
FORMAT = json_field::$1
(or whatever you need to get the whole field)
[breakUpJson]
SOURCE_KEY=json_field
DELIMS = ",", ":"
This is just an example because often when JSON is embedded in a message, it isn't necessarily properly structured, so you can allow for whatever nuances there are.
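For instance, if a simplified event ends in {"user":"alice","status":"200"}, grabJsonField captures that blob into json_field, and breakUpJson splits it on the commas and colons into pairs like "user" -> "alice" (the quote characters come along for the ride, so you may need to tighten the regex or clean them up afterwards).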
Take a look at this:
http://docs.splunk.com/Documentation/Splunk/6.1.1/Admin/Transformsconf
I don't think you really want a summary index here, since you aren't keeping just a subset of the data and you aren't summarizing anything.
There are also a number of examples in answers of different techniques for extracting JSON from within a "mixed" event.
When you are reading these, note the following nuance: my answer uses REPORT, which extracts fields at search time (index-time extraction would use TRANSFORMS instead).
Since you are looking for an alert, take the fields from your JSON and create a lookup, then query the lookup.
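Something like this, assuming the REPORT extractions above are in place (the lookup file name and field names are placeholders):

index=web_analytics | table _time, user, action, status | outputlookup web_json_fields.csv

and then query the lookup directly:

| inputlookup web_json_fields.csv | search status=500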
Thanks for the response. Any help on step #2 in the post above?
Also, I might store only a subset of the data as well, since only 30-40 of the 60+ fields in index_1 are useful.
I've amended the answer to reflect search time extractions only.
(There is nothing wrong with index-time extractions. Best practice is to consider them carefully... because they are FOREVER.)
I have also suggested that you may want to create a lookup from your search and then query the lookup for faster response.
In brief, this is what I'm looking for:
1. Index the _raw data as-is into, say, index_1.
2. Extract the names and values that are in JSON format from index_1 into a new index, say index_2. I'm able to extract all 60 fields from index_1, but how do I save the extracted fields into the new index?
3. Write queries on index_2; the searches will be faster.
4. Create real-time alerts based on pre-defined patterns from index_1 (hence the customer doesn't want extraction at index time). This is not the problem now.
I'm facing a problem in step 2 (see the sketch below); other ways of solving this are also welcome.
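If I understand the linked collect answer correctly, collect writes the original _raw when it is still present in the results, which would explain why si_web_analytics still shows the raw JSON. Dropping _raw before collecting should make collect write the remaining fields as name=value pairs instead. A rough sketch (the regex and field names are placeholders):

index=index_1 | rex "(?<json_field>\{.+\})" | spath input=json_field | fields - _raw | collect index=index_2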
Thanks for the response.
But I don't want to extract fields during indexing (the customers don't want it either), since it increases the latency of event arrival. Even the Splunk documentation suggests not extracting at index time.