This could easily become a very long discussion of the differences between Splunk, which is built to index time-series, machine-generated data, and Lucene, which was originally designed to index human-generated text documents. Let's begin with your questions.
Splunk has no notion of stop words. By default, Splunk indexes all keywords found in events, as defined by the segmentation rules.
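To give a rough sense of where those segmentation rules live, here is a sketch of the relevant configuration; the stanza names and breaker lists below are illustrative, not the shipped defaults:

    # segmenters.conf -- controls how raw event text is broken into indexed terms
    # (illustrative stanza; values are not the defaults Splunk ships with)
    [my_segmenter]
    MAJOR = [ ] < > ( ) { } | ! ; ,
    MINOR = / : = @ . - $ # %

    # props.conf -- apply that segmenter to a hypothetical sourcetype
    [my_sourcetype]
    SEGMENTATION = my_segmenter

Major breakers split the event into terms, and minor breakers split those terms further, so every resulting keyword is indexed rather than filtered against a stop-word list.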
Splunk provides wildcard searches and phrase searches, but the index doesn't provide native proximity searches or regex searches. For those, we rely on subsequent commands in the search processing pipeline.
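As a quick illustration, in a search like the one below the first line (a trailing wildcard plus a quoted phrase) is resolved against the index, while the regex filter runs as a pipeline command over the retrieved events; the terms and pattern here are made up:

    error* "connection reset by peer"
    | regex _raw="session_id=\d{8,}"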
Splunk aggressively compresses the rawdata we store, and we put a lot of effort into keeping the indexes as small as possible through explicit compression and other low-footprint data structures. Typically, you can expect the compressed rawdata to be about 10% of the size of the original data and the index files 20-40%, depending on the entropy of the data. Together, Splunk typically requires 30-50% of the original raw data's size in storage.
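A quick worked example of those ratios, using round numbers and assuming the typical percentages above:

    1 TB of original data
      compressed rawdata   ~10%      ->  ~100 GB
      index files          ~20-40%   ->  ~200-400 GB
      total on disk        ~30-50%   ->  ~300-500 GB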
The index itself doesn't provide synonym support, since that's fundamentally a problem for human text. We do provide an analogous concept, however, in eventtypes, which can be used to represent meaningful classes of queries, including synonym-like groupings.
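For example, here is a hedged sketch of an eventtype that groups several "synonymous" phrasings of the same condition; the stanza name and search terms are hypothetical:

    # eventtypes.conf -- one named class covering several equivalent phrasings
    [failed_login]
    search = "failed password" OR "authentication failure" OR "invalid user"

Once defined, a search such as eventtype=failed_login | stats count by host matches any of those phrasings without the user having to remember each variant.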