Thanks. Since I will have a new 5 million event CSV file every day, generated from the previous day's web events, I guess loading that into Splunk, adding it to the config file, and restarting Splunk is not an efficient way to do this.
Real-time lookups are not practical either; it would take forever to retrieve all the data from Oracle.
Is there a way to manage adding a lookup file daily without manual intervention?
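I'm wondering if a scheduled search with outputlookup would handle this, something like the sketch below. This assumes the nightly CSV is indexed under a hypothetical sourcetype isbn_daily_csv and that the lookup file is title_lookup.csv; the field names are placeholders for whatever the nightly job produces.

sourcetype=isbn_daily_csv earliest=-1d@d latest=@d
| table isbn_tag title pub
| outputlookup title_lookup.csv

Saved as a scheduled search that runs shortly after the nightly CSV lands, this should rewrite title_lookup.csv each morning, and since lookup files are read at search time there would be no config change or restart involved. Is that the right pattern?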
I have created a CSV lookup table, successfully loaded it into Splunk, and used it in a search command:
sourcetype="access_log-2" | lookup title_lookup isbn_tag OUTPUTNEW title, pub
Now when I go off and use other search commands that don't use this lookup, the data that was looked up is no longer available in those searches. I have to re-issue the lookup command to have the new fields available.
Do the new fields get added to the events permanently, so I don't need to issue the lookup command on every search?
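From what I can tell, lookup output fields are never written into the indexed events; they are computed at search time. What I'm considering instead is an automatic lookup so the fields show up in every search without typing the lookup command, roughly like this (the stanza and file names are my assumptions based on the search above):

# transforms.conf
[title_lookup]
filename = title_lookup.csv

# props.conf
[access_log-2]
LOOKUP-title = title_lookup isbn_tag OUTPUTNEW title, pub

With that in place, title and pub would appear in any search over that sourcetype, but they would still be resolved from the current lookup file at search time rather than stored permanently in the events.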
I wanted to be able to enrich the event log with one day's worth of lookup data, then create a new lookup table the next day and enrich the new events, and so on, with this data permanently part of the events so the old lookup tables would no longer be needed.
Is this possible?
Another way I might do this is to append the new fields, retrieved from remote DBs, onto the web log events nightly. This would be done in Perl outside of Splunk.
Since this web data is ingested into Splunk in real time, would a search on the web logs see the new fields that were added after ingestion?
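My understanding so far: fields appended to the log files after Splunk has already indexed those events won't show up on the indexed copies; only lines that get the extra fields before Splunk reads them would carry them. If the Perl job writes them as key=value pairs on the raw line, Splunk's automatic search-time key/value extraction may pick them up, or I could add an explicit extraction like the sketch below (field names are just placeholders for what the Perl job would append):

# props.conf -- hypothetical search-time extraction for fields appended by the Perl job
[access_log-2]
EXTRACT-enrichment = isbn13=(?<isbn13>\d{13})\s+title="(?<title>[^"]*)"\s+pub="(?<pub>[^"]*)"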
I need to enrich my event data (web logs) with several other fields based on the value of one of the event's fields. I plan to use a lookup that calls a script to go get the fields.
When I run the search again, containing the lookup, will it go and look up events that were already looked up? Does Splunk know to only enrich events that are not yet enriched?
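As far as I can tell, lookups are evaluated every time the search runs; nothing is remembered per event, so previously enriched events get looked up again. The scripted (external) lookup I have in mind would be configured roughly like this, with the script sitting in the app's bin directory (script and field names are my assumptions):

# transforms.conf -- external lookup backed by a script
[title_external_lookup]
external_cmd = isbn_title_lookup.py isbn_tag title pub
fields_list = isbn_tag, title, pub

Then | lookup title_external_lookup isbn_tag OUTPUTNEW title, pub would call the script on each search. Since the Oracle round trip is expensive, it probably makes more sense to have the script (or a nightly job) cache results into a static CSV lookup instead of querying live every time.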
I need to parse Apache web logs that can run into the billions of requests per month. I need to correlate and aggregate all this data and be able to display these results for up to a year back. I have two approaches and, being new to Splunk, can't decide which is better.
Problem: for a given web event, I need to create a custom search expression to extract several fields that are not cleanly parsed by the default field extraction.
Then take the extracted ISBN field and run it through a Perl or Python ISBN converter script to get normalized 13-digit values. I then need to call an external Oracle DB system through Perl scripts to obtain the title and publisher. I then need to aggregate the occurrences of this ISBN per client, which is another field in the event. The user would want to view this data by client or top clients, all clients, by ISBN, or by publisher. There could be thousands of unique clients.
This data would need to be viewed by marketing separate from the views of the IT dept.
I have proposed to do all the field extractions, ISBN normalization, title/publisher lookups, and aggregation in Perl scripts and create a CSV file that represents one day's worth of event parsing, with this parser running as a cron job once a day. I then proposed that this CSV data would be fed into Splunk, using the search and stats commands to build views of this tabular data. This of course requires that Splunk now has to keep the original event logs, used by the IT guys, and my new CSV tables, to be used by marketing.
I would build a dashboard for the marketing team and have custom searches created, like by-client, all-clients, by-publisher, etc.
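For the pre-parsed CSV approach, I imagine the dashboard panels would just be stats searches over the ingested CSV, something like the sketches below. The sourcetype and field names are placeholders, and this assumes each CSV row carries a pre-aggregated count field per client/ISBN.

Top clients:
sourcetype=isbn_daily_csv | stats sum(count) as requests by client | sort -requests

By publisher:
sourcetype=isbn_daily_csv | stats sum(count) as requests by pub | sort -requests

One client, broken down by ISBN:
sourcetype=isbn_daily_csv client="ACME" | stats sum(count) as requests by isbn13, title | sort -requests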
They want me to try to do this all in a custom search in real time and not pre-parse/stage this data as I just mentioned. That means rewriting all the Perl into Python (unless Perl can be used?). I see conflicting examples: some say only Python scripts can be used, while in other cases Perl is used. Can this be done?
Can Perl be used in a search command?
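Leaving the Perl-vs-Python question aside, the all-in-one-search version I picture would look something like the pipeline below: a rex extraction for the ISBN, lookups for normalization and for title/publisher (which could be scripted lookups), and a stats roll-up. Every field, lookup, and sourcetype name here is an assumption, and this would be re-run over the raw events for the chosen time range each time it is viewed.

sourcetype="access_log-2"
| rex field=uri "isbn=(?<isbn_tag>[0-9Xx-]+)"
| lookup isbn_normalize isbn_tag OUTPUT isbn13
| lookup title_lookup isbn_tag OUTPUTNEW title, pub
| stats count by client, isbn13, title, pub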
When I retrieve the normalized ISBN, title and publisher, will this data be added to the original indexed set of events so this data is not retrieved every time a user views the data?
Should I use summary indexing and aggregate daily to speed up user reports?
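If summary indexing is the way to go, my understanding is that a nightly scheduled search would do the heavy lifting once, write the aggregated rows to a summary index, and the marketing dashboards would search the small summary instead of billions of raw events. A rough sketch, with index, lookup, and field names as assumptions (the isbn_summary index would have to be created):

Nightly scheduled search over the previous day:
sourcetype="access_log-2" earliest=-1d@d latest=@d
| rex field=uri "isbn=(?<isbn_tag>[0-9Xx-]+)"
| lookup isbn_normalize isbn_tag OUTPUT isbn13
| lookup title_lookup isbn_tag OUTPUTNEW title, pub
| stats count by client, isbn13, title, pub
| collect index=isbn_summary

Dashboard searches then run against the summary:
index=isbn_summary | stats sum(count) as requests by pub | sort -requests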
We have a licensed version of Splunk so I have access to all its capabilities.
What is the suggested approach?
It seems that if I built a custom search command that parses billions of events, normalizes the ISBN, and retrieves the title/publisher for every event, then provides a roll-up analysis of this data per client for the selected time frame, it would take a long time. If the user then clicks by-publisher, would the entire data set be processed again? Shouldn't all this data be saved into a new table somehow, so the next user request doesn't go through it all again?
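Besides summary indexing, one option I've seen for avoiding a full re-crunch when the user just switches views is reusing the results of an already-completed saved search with loadjob, something like the line below (the saved-search name is made up):

| loadjob savedsearch="marketing:isbn_app:daily_isbn_rollup" | stats sum(count) as requests by pub | sort -requests

Would that be a reasonable way to serve the by-publisher, by-client, etc. views off one expensive search?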
I need to aggregate the values found in the Apache web logs. First I need to parse out several fields; I can get these fields parsed out. But now I need to aggregate the counts of these fields. For example, the number of elements requested per client over a selected time range. So I need to count all the elements for each client and display them in a graph, and also show, in descending order, the clients that requested an element. Is this doable? If so, what components do I use?
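I'm guessing this is what stats (and timechart for the graph over time) are for; a sketch of what I have in mind, assuming the extracted fields are called client and element and reusing the sourcetype from my other search:

Count of elements per client, descending:
sourcetype="access_log-2" | stats count as elements by client | sort -elements

Clients requesting a particular element, descending:
sourcetype="access_log-2" element="somepage" | stats count by client | sort -count

Requests over time per client, for a chart panel:
sourcetype="access_log-2" | timechart count by client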