Re: Only include events that match a list of 2000 different users

johnjohnson2 · ‎09-03-2013

I have some logs that can include any one of 50,000+ users, but I only need to index and keep a subset of them: approximately 2,000 users. I'm looking for the most efficient way to index only the events associated with these users.

I thought of using transforms.conf with a ridiculously long regex that matches those users, but I'm looking for better ideas.

props.conf
[host::blah]
TRANSFORMS-null = setnull

transforms.conf
[setnull]
REGEX =
DEST_KEY = queue
FORMAT = nullQueue

lukejadamec · ‎09-03-2013

I have an automatic lookup table of all Oracle return codes and descriptions, which is a few times larger than what you're looking to do, and I see zero performance impact.

The Splunk docs (http://docs.splunk.com/Documentation/Splunk/5.0.4/Indexer/Indextimeversussearchtime) say there is a performance hit from index-time field extractions, so you should avoid them if you can; the reasoning is that indexed fields make the index larger, which makes all searches slower. However, it looks like you're routing events to the nullQueue rather than adding a new field, so it may work just fine.

If you really need to do this at index time, figure out a way to automate generating the regex, then just drop it into the config Kristian posted.

A CSV lookup table will be far easier to manage than a regex of that size.
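One way to get both: keep the 2,000 names in a CSV and generate the regex from it whenever the list changes. Below is a minimal Python sketch of that idea. The `username` column name, the case-insensitive match, and the boundary rule (a name must not be glued to a word character, `.` or `-`) are my assumptions, and `load_usernames`/`build_user_regex` are hypothetical helpers, not anything Splunk ships. Note that it matches the name anywhere in the raw line, so a URL like `/users/bob/` would also match; anchoring to the position of cs-username in your #Fields layout would be stricter.

```python
"""Hypothetical helper: build the [keepsome] REGEX line from a CSV user list.

Assumptions (not from the thread): the CSV has a header column "username",
usernames match case-insensitively, and a username must not be directly
preceded or followed by a word character, '.' or '-'.
"""
import csv
import io
import re


def load_usernames(csv_text: str, column: str = "username") -> list[str]:
    """Read usernames from CSV text, dropping blanks and duplicates."""
    names = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get(column) or "").strip()
        if name:
            names.add(name)
    return sorted(names)


def build_user_regex(usernames: list[str]) -> str:
    """Return one alternation matching any listed username.

    re.escape() only backslash-escapes punctuation, which PCRE (the regex
    flavor Splunk uses) also treats as literal characters.
    """
    alternation = "|".join(re.escape(name) for name in usernames)
    # Lookarounds stop "bob" from matching inside "bobby" or "jim.bob",
    # while still allowing a DOMAIN\bob prefix.
    return rf"(?i)(?<![\w.-])(?:{alternation})(?![\w.-])"


if __name__ == "__main__":
    users_csv = "username\nalice\nbob\ncrane\n"
    print(f"REGEX = {build_user_regex(load_usernames(users_csv))}")
    # prints: REGEX = (?i)(?<![\w.-])(?:alice|bob|crane)(?![\w.-])
```

The printed line can be pasted over the placeholder REGEX in a transforms.conf stanza, so the CSV stays the only thing anyone edits by hand.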

If you do end up filtering at index time with a regex, please post your results; I'm curious about the impact.

johnjohnson2 · ‎09-03-2013

These are iis logs that include usernames (cs_username)

kristian_kolb · ‎09-03-2013

Do these accounts have some sort of distinguishing pattern, like da_xxxxx, admxxxxx, sys-xxxxxx? Otherwise the regex would be awful to maintain.

Is there perhaps some other field in the events that could be used to filter on a broader scope?

Also, as per the docs on nullQueueing, you'll need an extra transform to keep the events you want: first send everything to the nullQueue, then route the matching events back to the indexQueue.

http://docs.splunk.com/Documentation/Splunk/5.0.1/Deploy/Routeandfilterdatad#Keep_specific_events_an...

props.conf

[your_host]
TRANSFORMS-blah = setnull, keepsome

transforms.conf

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keepsome]
REGEX = here is where you write your super regex
DEST_KEY = queue
FORMAT = indexQueue
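The order in TRANSFORMS-blah matters. As the routing docs describe it, the listed transforms are applied left to right and each one whose REGEX matches sets the queue, so a later match overrides an earlier one. Here is a toy Python model of that behavior (not Splunk code) to show why setnull has to come first; the user regex is just a placeholder for whatever ends up in [keepsome].

```python
"""Toy model (not Splunk code) of how a TRANSFORMS list routes one event.

Each transform whose REGEX matches writes its FORMAT into the queue key;
transforms run in the listed order, so a later match overwrites an earlier
one. The keepsome regex below is a placeholder user list.
"""
import re
from dataclasses import dataclass


@dataclass
class Transform:
    name: str
    regex: str
    fmt: str  # value written to DEST_KEY = queue


def route(event: str, transforms: list[Transform]) -> str:
    queue = "indexQueue"  # events are indexed unless something reroutes them
    for t in transforms:
        if re.search(t.regex, event):
            queue = t.fmt  # a later match overwrites an earlier one
    return queue


SETNULL = Transform("setnull", r".", "nullQueue")
KEEPSOME = Transform(
    "keepsome", r"(?i)(?<![\w.-])(?:alice|bob)(?![\w.-])", "indexQueue"
)

if __name__ == "__main__":
    for event in ["... 80 bob 10.0.0.1 200", "... 80 horse 10.0.0.2 200"]:
        # Correct order first, then the reversed (broken) order.
        print(route(event, [SETNULL, KEEPSOME]), route(event, [KEEPSOME, SETNULL]))
    # prints:
    # indexQueue nullQueue
    # nullQueue nullQueue
```

With the order reversed, setnull's catch-all `.` wins for every event and nothing gets indexed. The same model also shows that a single setnull stanza with the user regex, as in the original question, would drop exactly the users you want to keep.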

K

kristian_kolb · ‎09-03-2013

That pretty much answers the question I was asking. Are there any other distinguishing features that could be used for filtering, e.g. the c-ip, if the users you want to keep come from a certain IP range?

Are you constrained license-wise? If not, you could index more data than you need and use tags or automatic lookups to your advantage. I'm not sure it would consume fewer resources, but it would likely be more manageable.
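To illustrate the lookup idea, here is a rough Python model (not Splunk config) of what it amounts to: index everything, treat the CSV of 2,000 names as a lookup, and keep events by an exact match on the parsed username field rather than a regex over the raw line. The `users.csv` layout with a `username` column, the case-insensitive comparison, and stripping a `DOMAIN\` prefix are assumptions for illustration. It relies on IIS W3C logs naming the column `cs-username` in their `#Fields:` directive (Splunk exposes it as cs_username). The trade-off is that all 50,000+ users' events count against your license.

```python
"""Toy model (not Splunk code) of the search-time alternative: index all IIS
events, then keep only those whose username appears in a lookup CSV.

Assumes W3C extended logs with a '#Fields:' directive naming 'cs-username'.
"""
import csv
import io
from collections.abc import Iterator


def normalize(name: str) -> str:
    """Assumption: compare case-insensitively and ignore a DOMAIN\\ prefix."""
    return name.strip().rsplit("\\", 1)[-1].lower()


def load_lookup(csv_text: str) -> set[str]:
    """Usernames from a hypothetical users.csv with a 'username' column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {normalize(r.get("username") or "") for r in rows} - {""}


def iter_events(log_text: str) -> Iterator[dict[str, str]]:
    """Yield each data line of a W3C log as a dict keyed by #Fields names."""
    fields: list[str] = []
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split()))


def keep_listed_users(log_text: str, users: set[str]) -> list[dict[str, str]]:
    """Exact field match: no giant regex and no partial-name hits."""
    return [
        e for e in iter_events(log_text)
        if normalize(e.get("cs-username", "-")) in users
    ]
```

Because the match is on one parsed field, a name showing up in a URL or query string can't cause a false hit, which is the main maintainability win over the raw-line regex.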

/k

johnjohnson2 · ‎09-03-2013

These are AD usernames, so they are all different, if that answers what you're asking.

kristian_kolb · ‎09-03-2013

My question was rather, what differs between the usernames you want to keep, and those you want to throw out?

Are all the usernames just arbitrary strings, e.g. bob, apple, horse, crane, alice, with no pattern that can be used to filter out the unwanted ones? In that case you simply have to know that 'crane' and 'horse' are the ones to keep.


