Need to create a summary index continuously in real time

indeed_2000
Motivator

I need to create a summary index continuously in real time, and I have two questions:

1. I run a Splunk forwarder on a client that sends logs to the Splunk server. Each line contains a lot of data, so I need to create a summary index as soon as a log is received and store a summary of each line in that summary index, continuously and in real time.

2. Is it possible to automatically create a new index for each day, like myindex-20240115, myindex-20240116, as data comes in from the forwarder?

Thanks


PickleRick
SplunkTrust

While you could technically build a solution that writes to a different index each day (you'd still need to pre-create those indexes, though), Splunk is not Elasticsearch, so don't carry habits over from there. Splunk works differently and has its own methods of storing, indexing and searching data.
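For illustration only, a rough sketch of what that dynamic routing would look like (my_sourcetype and the myindex- prefix are made-up names, and it assumes events start with a YYYY-MM-DD timestamp; crucially, every myindex-YYYYMMDD target would still have to be pre-created in indexes.conf):

    # props.conf (on the indexer or heavy forwarder)
    [my_sourcetype]
    TRANSFORMS-daily_route = route_to_daily_index

    # transforms.conf
    [route_to_daily_index]
    # capture the date at the start of the raw event...
    REGEX = ^(\d{4})-(\d{2})-(\d{2})
    # ...and rewrite the destination index per event
    FORMAT = myindex-$1$2$3
    DEST_KEY = _MetaData:Index

That pre-creation requirement alone should tell you this is the wrong tool for the job.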

And on the summary indexing thing - well, if you want to summarize something, you must have something to summarize, so you're always summarizing over some period of time. That conflicts with summarizing in "realtime". Even if you could create a summary for, let's say, a sliding 30-minute window looking backwards, the moment you summarized your data that summary would be invalidated by newly arriving data. So it makes no real sense.
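To make that concrete: a summary is always a snapshot of a closed window, along these lines (my_summary and response_time are invented names, and the my_summary index has to exist before you collect into it):

    index=myindex earliest=-30m@m latest=@m
    | stats count avg(response_time) AS avg_rt BY host
    | collect index=my_summary

The moment this search finishes, late events for that same 30-minute window can still arrive, so the snapshot is immediately out of date.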


indeed_2000
Motivator

@PickleRick So what is your suggestion for this?


PickleRick
SplunkTrust

Unless you have a very strange use case, there is hardly ever a need for splitting your data (of the same kind) into separate indexes only because you want to make searches faster.

There are "two and a half" (two basic and one more advanced) cases in which you need to split your data among indexes:

1. Since you grant roles access to data on a per-index basis, you need to split data into indexes if you want to differentiate access to different parts of it.

2. Retention policies (time to frozen, maximum index size and so on) are specified on a per-index basis, so if you need to store some data longer than the rest, you send it to a separate index (see the indexes.conf sketch after this list).

3. You might want to split your data into separate indexes if you have distinct "kinds" of data (different sourcetypes, or data from different sources) with greatly differing volume characteristics. For example, you're getting a huge load of flow data from your network devices but just a few alerts daily from your DAM solution; in that case you'd want to separate the DAM events from the network events so that you can search the DAM events quickly, without having to shovel through buckets of network data.
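To illustrate case 2, a minimal indexes.conf sketch with two indexes that differ only in retention (index names, paths and retention values are arbitrary examples):

    [short_retention]
    homePath   = $SPLUNK_DB/short_retention/db
    coldPath   = $SPLUNK_DB/short_retention/colddb
    thawedPath = $SPLUNK_DB/short_retention/thaweddb
    # roll buckets to frozen after 30 days
    frozenTimePeriodInSecs = 2592000

    [long_retention]
    homePath   = $SPLUNK_DB/long_retention/db
    coldPath   = $SPLUNK_DB/long_retention/colddb
    thawedPath = $SPLUNK_DB/long_retention/thaweddb
    # roll buckets to frozen after 365 days
    frozenTimePeriodInSecs = 31536000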

But apart from that, Splunk can handle huge loads of data pretty efficiently. And since Splunk stores data in buckets and searches only the buckets relevant to your search time range, there is no need to split your data further into indexes based on time - it would only make managing your data harder, because you'd have to keep track of where your data is, which indexes you have to search, and so on.
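You can see that bucket layout for yourself with dbinspect (myindex is a placeholder):

    | dbinspect index=myindex
    | table bucketId state startEpoch endEpoch eventCount

Each bucket carries its own time range, which is exactly what lets Splunk skip buckets that cannot match your search window - no per-day indexes required.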


indeed_2000
Motivator

@PickleRick thanks, how about this part:

1. I run a Splunk forwarder on a client that sends logs to the Splunk server. Each line contains a lot of data, so I need to create a summary index as soon as a log is received and store a summary of each line in that summary index, continuously and in real time.


isoutamo
SplunkTrust

Hi

As others have already said, with Splunk there is no reason to split your index based on time: Splunk always stores data as a time series and does this splitting by time automatically when it stores events into buckets.

Also, as said, a summary index means that you want to do some calculations (statistical summaries/functions) on your data. If you can open up your use case, we could probably propose some Splunk best practices for it.

When you refer to Elasticsearch or an RDBMS vs. Splunk, it usually makes no sense to compare them, as Splunk works in a totally different way. Quite often their best practices (and what you must do to follow them) are almost worst-case solutions in Splunk. There is usually a much better way to handle the same need in Splunk.

r. Ismo


indeed_2000
Motivator

@isoutamo Hi, forget about elastic and separate indexes.

I need to send raw logs via a forwarder to Splunk and create a dashboard that works with the metrics that exist in the logs.

What is the most performant solution in Splunk that works on both real-time and historical data? The dashboard needs to load quickly and accurately, e.g. with a 1s span in a timechart.

FYI: the data comes from several servers, with lots of log lines arriving every second.


PickleRick
SplunkTrust

Again - you are very vague about your needs. Also, you might have chosen your tool badly - Splunk can do "realtime", but realtime searches have their limitations and are very resource-intensive.

To show you an analogy - it's as if you asked "what car should I buy that is most cost-effective? It must be red." We don't know what it is you need to do with that car - whether you need a sports car, a semi-truck or a bus - and we don't know your reason for owning that car, but you want it to be cost-effective and painted red.

Depending on context, it could be a Mazda MX-5, a city bus or a Caterpillar 797 in red paint.

PickleRick
SplunkTrust

As I wrote before - there is no such thing as a "realtime summary". Depending on your particular use case, you might create a fairly frequent scheduled search summarizing your data (every 5 minutes? maybe even every minute if you have enough resources, though you might run into problems with event lag). But there is no general solution here - it will depend on the particular requirements. Maybe a summary isn't needed at all; maybe it's just a matter of properly searching the data you have. I don't know - you're very vague in describing your problem.
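As a hedged sketch, such a scheduled summary could look like this in savedsearches.conf (every name here is a placeholder; the one-minute lag on the time window gives late events a chance to arrive before their window is summarized):

    [summarize_myindex_5m]
    enableSched = 1
    # run every 5 minutes
    cron_schedule = */5 * * * *
    # summarize the closed 5-minute window ending one minute ago
    search = index=myindex earliest=-6m@m latest=-1m@m | sistats count avg(response_time) BY host
    # write the results into a summary index
    action.summary_index = 1
    action.summary_index._name = my_summary

But again - whether this, a plain scheduled report, or no summary at all is appropriate depends entirely on the use case.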


gcusello
SplunkTrust

Hi @indeed_2000,

avoid creating an index for each day; Splunk isn't a database and an index isn't a table. You should create a different index only when you have different retention requirements or different access grants.

You can search within a single index using the timestamp; you don't need a different index per day!

The update frequency of a summary index depends on the scheduled search you are using; as the name itself says, it's scheduled, so you have to give it a schedule frequency. That can also be quite frequent, depending on the execution time of the search itself: e.g. if the search completes in 30 seconds, you can schedule it every minute. But I don't advise running it too frequently, because you could end up with skipped searches.
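If you do schedule a search aggressively, you can check whether the scheduler is skipping runs by looking at its own logs (this assumes you can search the _internal index):

    index=_internal sourcetype=scheduler status=skipped
    | stats count BY savedsearch_name reason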

Running a search in real time is also possible, but it requires a lot of resources, so avoid it.

Ciao.

Giuseppe 


indeed_2000
Motivator

Hi @gcusello, each day there are lots of datapoints; that's why I need to break them down into multiple indexes. It will probably increase search speed, like the sharding mechanism in Elasticsearch or InfluxDB.


gcusello
SplunkTrust

Hi @indeed_2000 ,

yes, parallelizing data across more than one index is probably a good idea for increasing search speed, but having an index per day is a bit exaggerated and very difficult to manage!

If anything, divide your data into two or three indexes, but no more!

Ciao.

Giuseppe

 


indeed_2000
Motivator

@gcusello Is there any other technique that work with this condition?


gcusello
SplunkTrust

Hi @indeed_2000 ,

about the indexes, as I said, you could use one index or divide your data into a few manageable indexes, based e.g. on the type of data flow.

To get performant searches over large amounts of events using only some predefined fields (rather than searching raw logs), you could use a custom (or CIM-based) accelerated data model; I'm using one for a customer that needs fast searches on 8 fields across billions of events.

Then I configured a drilldown on the indexes to display the raw events, filtered with the results of the data model search.
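As a sketch of the fast half of that setup, a tstats search against an accelerated data model looks roughly like this (MyModel and response_time stand in for the real model and field names):

    | tstats count avg(MyModel.response_time) AS avg_rt
        from datamodel=MyModel
        by _time span=1s

Because tstats reads the pre-built acceleration summaries instead of raw events, even a 1s-span timechart over billions of events stays responsive; the drilldown then runs a normal raw-event search only for the narrow slice the user selects.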

Ciao.

Giuseppe
