About cantgetnosleep

cantgetnosleep · ‎09-04-2014

Thanks for your help!

cantgetnosleep · ‎09-04-2014

"Since the search-time extractions are not applied until after all other filtering criteria, Splunk only has to "chew through" a relatively small set of potential matches" -- Yeah, that makes sense, and I understand that. But my main question is really about how index-time fields work. Are they used to initially reduce the set of buckets?

cantgetnosleep · ‎09-03-2014

What I would expect is that if the user field is added as a custom index field, there is some sort of hash table or lookup table or other optimization such that if I specify a specific user or groups of users to search for, Splunk wouldn't have to actually go back and search through all of the events, or buckets, for those values. On the other hand, with search-time extracted fields, I get the understanding/feeling that Splunk is going back and actually searching for these fields event by event, and will have to basically chew through every event in the time span to do so.
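The contrast described above, an index lookup versus an event-by-event scan, can be sketched as a toy example. This only illustrates the two strategies as imagined in the post; it makes no claim about how Splunk stores or searches data internally. The event strings and the function names `linear_scan`, `build_index`, and `index_lookup` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample events; each is a whitespace-separated line of key=value tokens.
events = [
    "action=login user=XYZ result=success",
    "action=login user=ABC result=failure",
    "action=logout user=XYZ result=success",
]

def linear_scan(events, user):
    """Check every event for the token user=<user> (cost grows with event count)."""
    token = f"user={user}"
    return [i for i, event in enumerate(events) if token in event.split()]

def build_index(events):
    """Build a token -> set-of-event-positions map once, up front."""
    index = defaultdict(set)
    for i, event in enumerate(events):
        for token in event.split():
            index[token].add(i)
    return index

def index_lookup(index, user):
    """Find matching events with a single dictionary lookup, without rescanning."""
    return sorted(index.get(f"user={user}", ()))

index = build_index(events)
print(linear_scan(events, "XYZ"))
print(index_lookup(index, "XYZ"))
```

Both calls return the same event positions; the difference is that the lookup does not touch events that cannot match.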

cantgetnosleep · ‎09-03-2014

Thanks for the reply. Yeah, I've already read through the docs you linked to. The crux of my question is hinted at in this: "Second, the host, source and sourcetype are special fields that help to quickly identify the buckets." This makes it sound like these indexed fields act like indexes on a database table. However, we're told that adding more indexed fields doesn't speed up the search. What I'm trying to understand is if Splunk can avoid having to do a linear, backwards text search when searching for fields that are extracted at search time. Say I want to find all events with user=XYZ.

cantgetnosleep · ‎09-03-2014

Yeah, I've already read those other docs. Thanks for the thoughtful reply; however, it doesn't really answer what I'm trying to understand.

cantgetnosleep · ‎09-03-2014

"why not run one search that lists the first successful auth event for each user that appears in the data and then compare that to the list of 700,000 users" -- That's a great idea! Thanks.
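The quoted suggestion amounts to one pass over the data followed by a set comparison, instead of one search per user. Here is a minimal sketch of that logic outside Splunk, assuming the auth events are available as `(timestamp, user, success)` tuples. That tuple layout and the function name `users_without_login` are assumptions made for illustration only.

```python
def users_without_login(known_users, auth_events):
    """Return the known users that have no successful auth event.

    known_users: iterable of user IDs (e.g. the 700,000-entry list).
    auth_events: iterable of (timestamp, user, success) tuples (assumed layout).
    """
    first_success = {}
    for timestamp, user, success in auth_events:
        if not success:
            continue
        # Keep only the earliest successful event seen for each user.
        if user not in first_success or timestamp < first_success[user]:
            first_success[user] = timestamp
    # One set difference replaces one search per user.
    return set(known_users) - first_success.keys()

sample_events = [
    (100, "alice", True),
    (105, "bob", False),
    (110, "alice", True),
]
print(sorted(users_without_login(["alice", "bob", "carol"], sample_events)))
```

The data is read once regardless of how many user IDs are on the list, so the cost no longer scales with 700,000 separate searches.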

cantgetnosleep · ‎09-03-2014

Thanks for the reply. A question. If "all keywords in events are already indexed", why do the default, index-time fields exist at all? What's the point of them? How are they different than keyword indexes, or are they?

cantgetnosleep · ‎09-03-2014

Where can I find a detailed explanation of how the Splunk search algorithm works? There is a pretty good explanation in the docs of how the indexes themselves are created, but I can't find anything as detailed on how the indexes are used. I found this: http://www.splunk.com/web_assets/pdfs/secure/Splunk_and_MapReduce.pdf Is there anything else? For example, if I search for all events where IP=111.11.11.111, how does Splunk use the default indexed fields, the raw data files, and the extracted fields to find the matches? If a field is extracted at search time, does Splunk essentially have to do a raw-text search through all possible events? Do fields parsed at index time behave differently? If Splunk search is MapReduce-based and distributed, why do I always receive search results in reverse time order? Thanks, Andrew

cantgetnosleep · ‎09-03-2014

I've read the docs in the Splunk manual on parse-time indexed fields. http://docs.splunk.com/Documentation/Splunk/6.1.3/Data/Configureindex-timefieldextraction But I still have a question. We're going to be searching 15 months' worth of authentication data to see whether users have logged in within the previous 15 months. We'll have to do this search for 700,000 different user IDs, so the speed of each individual search is very important. We've already decided to create a summary index that extracts the auth information from the main LDAP and Active Directory logs and creates a new, reduced data set. However, I'm still concerned that searching 15 months' worth of data will take a LONG time when repeated 700,000 times. For example, if each search requires an average of 0.5 seconds, the whole run will take about 4 days. I'm wondering whether creating an index-time field for the user ID would speed things up dramatically. This is what we'd do with a database table, but I'm not sure if "indexed" means the same thing in Splunk. Basically, each search would need to go back and look for the first successful auth event for each user ID, and could stop there. Unfortunately, we expect a significant number of these searches to find nothing, and thus to have to search the entire data set each time. Does this sound like a good use case for creating an index-time field? Thanks, Andrew
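The 4-day figure follows directly from the numbers in the post, assuming the searches run strictly one after another. A quick check (the function name `total_runtime_days` is just for illustration):

```python
def total_runtime_days(num_searches: int, seconds_per_search: float) -> float:
    """Total wall-clock time, in days, for searches run sequentially."""
    seconds_per_day = 24 * 60 * 60
    return num_searches * seconds_per_search / seconds_per_day

# 700,000 searches at an average of 0.5 seconds each (figures from the post).
estimate = total_runtime_days(700_000, 0.5)
print(f"{estimate:.2f} days")  # prints "4.05 days"
```

That is 350,000 seconds in total, so the estimate scales linearly with the per-search time: at 1 second per search it would be roughly 8 days.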

cantgetnosleep · ‎08-21-2014

Awesome. Thanks! Those were very helpful answers.

cantgetnosleep · ‎08-20-2014

How does Splunk handle transactions that span search time boundaries? If a transaction starts before the search interval but finishes within it, is it included in the results? And if a transaction begins within the search interval but ends after it, how is that handled? Thanks, Andrew

Posts	11
Solutions	0
Karma Given	1
Karma Received	2
Member Since	‎01-17-2013

Online Status	Offline
Date Last Visited	‎06-05-2020 02:04 AM
