About tlagatta_splunk

tlagatta_splunk · ‎08-24-2017

Hi Jodros, check out the Security Essentials app, which walks through many statistical use cases like this one: https://splunkbase.splunk.com/app/3435/

For example, you can start from the "Sources Sending a High Volume of DNS Traffic" use case in the Security Essentials app. This query identifies hosts with very high traffic in the most recent hour (more than 3 standard deviations above both the organization-wide average and the host's own historical average). You should be able to adapt it to your use case:

    | inputlookup dns_data_anon.csv
    | convert mktime(_time) timeformat="%Y-%m-%dT%H:%M:%S.%3Q%z"
    | bucket _time span=1h
    | stats sum(bytes*) as bytes* by src_ip _time
    | eventstats max(_time) as maxtime avg(bytes_out) as avg_bytes_out stdev(bytes_out) as stdev_bytes_out
    | eventstats count as num_data_samples avg(eval(if(_time < relative_time(maxtime, "@h"),bytes_out,null))) as per_source_avg_bytes_out stdev(eval(if(_time < relative_time(maxtime, "@h"),bytes_out,null))) as per_source_stdev_bytes_out by src_ip
    | where num_data_samples >=4 AND bytes_out > avg_bytes_out + 3 * stdev_bytes_out AND bytes_out > per_source_avg_bytes_out + 3 * per_source_stdev_bytes_out AND _time >= relative_time(maxtime, "@h")
    | eval num_standard_deviations_away_from_org_average = round(abs(bytes_out - avg_bytes_out) / stdev_bytes_out,2), num_standard_deviations_away_from_per_source_average = round(abs(bytes_out - per_source_avg_bytes_out) / per_source_stdev_bytes_out,2)
    | fields - maxtime per_source* avg* stdev*
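
For readers who want to see the thresholding logic outside Splunk, here is a minimal Python sketch of the same idea. It is only an illustration, not the Security Essentials implementation: the function name `find_outliers`, the tuple layout, and the sample data are assumptions made for this example. It flags a source when its latest hourly value exceeds both the overall average and that source's own earlier average by more than three standard deviations.

```python
from collections import defaultdict
from statistics import mean, stdev


def find_outliers(samples, threshold=3.0, min_samples=4):
    """Flag sources whose latest hourly bytes_out is unusually high.

    samples: list of (src_ip, hour, bytes_out) tuples, one per source per hour.
    The tuple layout is an assumption made for this sketch.
    """
    latest = max(hour for _, hour, _ in samples)

    # Organization-wide baseline over every row (like the first eventstats).
    all_values = [b for _, _, b in samples]
    org_avg, org_sd = mean(all_values), stdev(all_values)

    by_src = defaultdict(list)
    for src, hour, b in samples:
        by_src[src].append((hour, b))

    flagged = []
    for src, rows in by_src.items():
        # Per-source baseline excludes the latest hour (like the second eventstats).
        history = [b for hour, b in rows if hour < latest]
        current = [b for hour, b in rows if hour == latest]
        if len(rows) < min_samples or len(history) < 2 or not current:
            continue
        src_avg, src_sd = mean(history), stdev(history)
        value = current[0]
        if (value > org_avg + threshold * org_sd
                and value > src_avg + threshold * src_sd):
            flagged.append(src)
    return flagged


if __name__ == "__main__":
    # Made-up data: five sources, ten hours each, one spike in the last hour.
    data = [(f"10.0.0.{s}", h, 100 + (h % 3)) for s in range(1, 6) for h in range(10)]
    data = [(s, h, 100000 if (s == "10.0.0.1" and h == 9) else b) for s, h, b in data]
    print(find_outliers(data))
```

Requiring both conditions keeps a source that is always noisy, or an organization that is busy overall, from triggering on its own.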

tlagatta_splunk · ‎06-09-2016

Hello @sansay! That's an elegant solution and I'm glad you were able to solve your problem. You are spot-on in your response: the key is to find problem users and educate them on best practices. Have you seen the Search Activity app? This provides a granular view of how users are using Splunk. There are a lot of searches there which you can adapt in a similar way as above. https://splunkbase.splunk.com/app/2632/

tlagatta_splunk · ‎06-10-2015

What's your use case? Maybe DB Connect does the trick? https://splunkbase.splunk.com/apps/#/page/1/search/db%2520connect/order/relevance

tlagatta_splunk · ‎06-07-2015

I had this problem tonight on my local machine. I solved it by increasing the admin user's srchDiskQuota from 10000 to 100000. To do this, I created the file /etc/system/local/authorize.conf and added the stanza:

    [role_admin]
    srchDiskQuota = 100000

Be careful about increasing this quota for non-admin users, as this can severely hamper performance.

tlagatta_splunk · ‎05-28-2015

This is a common problem when converting an Excel file to CSV. Try saving it as Windows Comma Separated Value (.csv); you should then be able to upload the lookup successfully. (Thanks to some friendly PS consultants for the help!)

tlagatta_splunk · ‎04-29-2015

@mmohiuddin, I encountered this same error last night when uploading some data. This morning, I tried uploading the data on a different network, and it worked without any problem. Were you able to resolve your issue?

tlagatta_splunk · ‎03-20-2015

@skawasaki_splunk provided a good answer to How to only display fields with values in a table, which I adapted to my situation. If your records have a unique Id field, then the following snippet removes null fields:

    | stats values(*) as * by Id

The reason is that "stats values won't show fields that don't have at least one non-null value".

If your records don't have a unique Id field, then you should create one first using streamstats:

    | streamstats count as Id
    | stats values(*) as * by Id

(Warning: if your data has multivalued lists, then stats values will remove duplicates and sort lexicographically.)
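
The effect described above can be sketched in plain Python. This is an analogy rather than how Splunk works internally: `drop_null_only_fields` and `values_like` are hypothetical helper names, and records are assumed to be dictionaries with `None` standing in for null.

```python
def drop_null_only_fields(records):
    """Remove every field that is null (None) in all records."""
    # A field survives if at least one record has a non-null value for it.
    keep = {key for rec in records for key, val in rec.items() if val is not None}
    return [{k: v for k, v in rec.items() if k in keep} for rec in records]


def values_like(items):
    """Mimic the warning in the post: duplicates removed, sorted as strings."""
    return sorted({str(item) for item in items})


if __name__ == "__main__":
    rows = [
        {"Id": 1, "host": "web01", "nosuchfield": None},
        {"Id": 2, "host": None, "nosuchfield": None},
    ]
    # "nosuchfield" is dropped; "host" stays because one row has a value.
    print(drop_null_only_fields(rows))
    # Shows why multivalued lists can come back reordered and deduplicated.
    print(values_like(["b", "a", "b"]))
```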

tlagatta_splunk · ‎03-10-2015

"so can i control how many past values it will predict for calibration? Is there a min default setting of the number of past values it will predict for calibration?"

By default, the predict command uses all past values to build a model of the timeseries (incl. best-fit curve and uncertainty envelope). The holdback argument allows you to leave recent points out of the training process. If you have enough data points (1 time span = 1 data point), then the best-fit curve and uncertainty envelope should both track closely to the past data. If not, then add more historical data or choose a finer span.

"From a visual point of view it would be good to be able to do the calibration and then have the option to remove it also. but hey :)"

I do not advise removing this, even for visualization purposes. If something in your data changes and the prediction loses its accuracy (e.g., some rare event occurs and severely changes the model), then you want to see that immediately. When you use the predict command to make decisions, you should do so based on both the past & future trendlines, rather than a mix of the raw data & the future trendline alone.

In terms of options, you can always use the search language to further manipulate the data. The following query will remove the prediction from rows where the count field is non-null. I can't prevent you from doing this, but I do strongly advise you against it 🙂

    | foreach prediction [eval <<FIELD>>=if(isnotnull(count), null(), '<<FIELD>>')]

tlagatta_splunk · ‎03-10-2015

Hi @HattrickNZ, glad it helped.

"it always seems to predict for values I already have"

This is a feature, not a bug! You should always predict the past values, to calibrate the prediction and make sure it's doing what you expect it to do. In many cases, the first attempt will do a poor job of predicting the past, which means you have to tweak it to make things work (e.g., add more historical data or use a finer timespan, such as changing span=1mon to span=1w). If you only predict the future, you won't know whether the prediction is bad until you have to make decisions based on it, which is usually too late.

"how would i do justa simple linear forecast?"

Unfortunately, simple linear regressions are not implemented in the core product right now. If you're looking for just linear trendlines, this community-wiki post on plotting a linear trendline might help. Keep in mind that the predict command implements a Kalman filter, so it's a pretty robust way to make temporal predictions.
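
To make the "simple linear forecast" idea concrete outside Splunk, here is a small least-squares sketch in Python. It is not the predict command and not a Kalman filter; it is a plain straight-line fit, and the function name `linear_forecast` is made up for this example. It returns fitted values for the past as well as the future, in line with the advice above to check the fit against data you already have.

```python
def linear_forecast(values, steps):
    """Fit y = intercept + slope * x by least squares, with x = 0, 1, 2, ...

    Returns (fitted, future): the trendline over the observed points and
    its extension `steps` points ahead.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    fitted = [intercept + slope * x for x in range(n)]
    future = [intercept + slope * x for x in range(n, n + steps)]
    return fitted, future


if __name__ == "__main__":
    past, ahead = linear_forecast([1, 3, 5, 7], steps=2)
    print(past)   # trendline over the observed points
    print(ahead)  # extrapolated points
```

If the fitted values sit far from the observed ones, the extrapolated values deserve the same skepticism.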

tlagatta_splunk · ‎03-09-2015

Hi @HattrickNZ. The default algorithm is LLP5 ("Uses the sum of the LLT and LLP models for its combined prediction."). The holdback option specifies the number of data points from the end that are NOT used to build the model. In your example, holdback=3 means "build a model from points 1-7 and predict the values for points 8-10". This is good for testing and validating the predict command.

The predict command uses a Kalman filter to make its prediction, which incorporates a noisy model of the real world. The yellow line is the "best guess" of the "true state" of the world. Since Splunk does a good job capturing your data, you should expect the blue timechart and the yellow best guess to line up pretty closely (as they do in your image). If they don't line up, then don't trust the prediction (you can add more data, choose a finer span, etc.).

Hope this helps.
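
The holdback idea (train on points 1-7, predict points 8-10, compare) can be sketched in a few lines of Python. The forecaster here is a deliberately naive "drift" model chosen for brevity, not the Kalman filter that predict uses, and the names `drift_forecast` and `holdback_check` are invented for this example.

```python
def drift_forecast(train, steps):
    """Naive stand-in model: continue the average step seen in training."""
    step = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + step * (i + 1) for i in range(steps)]


def holdback_check(values, holdback, forecast=drift_forecast):
    """Fit on all but the last `holdback` points, then predict those points.

    Returns (predicted, errors), where errors are absolute differences
    between the predictions and the held-back actual values.
    """
    if not 0 < holdback < len(values) - 1:
        raise ValueError("holdback must leave at least two training points")
    train, held = values[:-holdback], values[-holdback:]
    predicted = forecast(train, holdback)
    errors = [abs(p - a) for p, a in zip(predicted, held)]
    return predicted, errors


if __name__ == "__main__":
    # Ten points, holdback=3: the model sees points 1-7 and predicts 8-10.
    predicted, errors = holdback_check(list(range(1, 11)), holdback=3)
    print(predicted, errors)
```

Large errors on the held-back points are the signal not to trust the forecast yet.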

tlagatta_splunk · ‎02-11-2015

Using fields - nosuchfield is not satisfactory, since I might not know what the null field names are in advance.

tlagatta_splunk · ‎02-11-2015

Sometimes Splunk has extra null fields floating around (e.g., after fields nosuchfield * ). Is there a command which automatically removes fields which have only null values?

tlagatta_splunk · ‎12-04-2014

Note that in 6.2, splunkweb has been incorporated into splunkd: http://docs.splunk.com/Documentation/Splunk/6.2.0/Installation/Aboutupgradingto6.2READTHISFIRST#The_splunkweb_service_has_been_incorporated_into_the_splunkd_service

tlagatta_splunk · ‎12-01-2014

@karthikp1989, here are the docs pages for creating views & dashboards in Advanced XML: http://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/AdvancedIntro

tlagatta_splunk · ‎05-21-2014

Thanks, Lisa! That's an interesting workaround. However, it doesn't quite answer my "under the hood" question. Suppose that we have 500,000 products. I'd like to know whether Splunk is literally storing the median value 500,000 times during the piping (replicated across each product name), or whether the replication happens purely at the end of the pipeline (when the visual table is being generated).

tlagatta_splunk · ‎05-20-2014

I have a table with attributes ProductName and TotalSales, and I would like to extract the rows which are in the top 50% of total sales. Naively, I would pipe this into search TotalSales>=median(TotalSales). However, since search doesn't support the median function, Splunk returns no events. I can make this work via the following hack:

    | eventstats median(TotalSales) as MTS
    | where TotalSales>=MTS
    | fields - MTS

I'm worried about the efficiency of this hack. If I omit the fields - MTS command, then the output is a table with attribute MTS, with the median value replicated across all rows. If I have only 20 products, then this isn't that big of a deal, but if I have 500,000 products, then this is an enormous amount of redundancy in memory.

My question: what is Splunk doing under the hood? That is, is Splunk literally replicating the median value in memory dc(ProductName) times, then deleting it once I remove the MTS field? Or is Splunk smart enough to use the median value just once, and replicate it only if I insist on viewing the entire table with the MTS field?
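
For contrast, this is what the "use the median just once" behavior looks like in a general-purpose language. The Python sketch below says nothing about what Splunk actually does internally; it only illustrates the alternative the question describes, where the median is a single scalar compared against each row rather than a column copied onto every row. The function name `top_half` and the sample rows are assumptions.

```python
from statistics import median


def top_half(rows):
    """Keep rows whose TotalSales is at or above the median.

    The median is computed once and held as a single value; it is never
    attached to the individual rows.
    """
    mts = median(row["TotalSales"] for row in rows)
    return [row for row in rows if row["TotalSales"] >= mts]


if __name__ == "__main__":
    products = [
        {"ProductName": "A", "TotalSales": 10},
        {"ProductName": "B", "TotalSales": 20},
        {"ProductName": "C", "TotalSales": 30},
        {"ProductName": "D", "TotalSales": 40},
        {"ProductName": "E", "TotalSales": 50},
    ]
    print([p["ProductName"] for p in top_half(products)])
```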

tlagatta_splunk · ‎05-14-2014

Thanks Damien! That's good to know. Both should be mentioned on the documentation page.

tlagatta_splunk · ‎05-12-2014

Source: http://cran.r-project.org/doc/manuals/R-admin.html

tlagatta_splunk · ‎05-12-2014

FYI, the documentation for the R Project app states that "For Mac OS X and Linux or Unix, [the default installation path is] probably /usr/bin/R ." This is not correct. According to the official documentation at the CRAN R Project, "The default installation path for R.framework is /Library/Frameworks ".

Posts	19
Solutions	2
Karma Given	177
Karma Received	17
Member Since	‎05-12-2014

Online Status	Offline
Date Last Visited	‎07-21-2021 05:02 PM

Re: Create alert when average events greater than ...

Re: How to prevent users from querying all indexes...

Re: Can you use SQL functions with the Splunk ODBC...

Re: Your maximum disk usage quota has been reached...

Re: Why am I getting error "File has no line endin...

Re: Why am I getting error "Parameter name: Path m...

Re: How do I remove a null field?

Re: splunk 6.1.2 + predict Questions/Clarification...

Re: splunk 6.1.2 + predict Questions/Clarification...

Re: splunk 6.1.2 + predict Questions/Clarification...

Re: How do I remove a null field?

How do I remove a null field?

Re: Splunk (server/forwarder) programming language...

Re: Can you refresh a single module/chart without ...

Re: Efficiency of comparing against summary statis...

Efficiency of comparing against summary statistics

Re: Documentation: default installation path for R

Re: Documentation: default installation path for R

Documentation: default installation path for R