- How to create a search to predict license violatio...

JdeFalconr

Explorer

11-13-2014
04:44 PM

I'm trying to use commands like `predict` and `trendline` to write a search that will alert on a predicted license violation for the day. While http://answers.splunk.com/answers/39980/license-violation-prediction.html has some good information, its comments note that the search does not return accurate results in terms of data volume.

In writing this search I've realized that what I need to predict is the ever-growing sum of license volume during the day. In other words, say X is my total license volume for the day. Each data point in my search is going to be an ever-increasing value for X as more and more data is indexed (example of data points: 1GB at 1AM, 2GB at 2AM, 3GB at 3AM, 4GB at 4AM, etc). My goal, of course, is to predict what X will be at midnight.

While the search commands I know of (such as those in http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume) will provide a total indexed volume for a time span, I know of none that will plot a series of data points representing a running sum at different points in the day, as with index volume. In other words, right now I can run a search that spans from midnight today until the current time and sums up the total volume of indexed data. However, the search I would need in order to make a prediction would have to give me the summed volume of indexed data from midnight until 1am, from midnight until 2am, from midnight until 3am, and so on up to the current time.

How would I go about creating such a search that gives me the data points I need to make a prediction?

1 Solution

martin_mueller

SplunkTrust

11-15-2014
06:51 AM

First to answer the question at the bottom, assuming you have a search that gives you a figure for midnight to 1am, 1am to 2am, and so on - basically a timechart of indexed volume - you can turn that into an accumulated figure using `accum`:

```
earliest=@d latest=now ... | timechart sum(volume) as hourly_volume span=1h | accum hourly_volume as running_total
```
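
As a concrete sketch of the same idea against the license usage log, assuming the `b` field of `type=Usage` events holds the indexed bytes (as in the search further down) and using `hourly_b` and `running_total_b` as illustrative field names:

```
index=_internal source=*license_usage.log* type=Usage earliest=@d latest=@h
| timechart span=1h sum(b) AS hourly_b
| accum hourly_b AS running_total_b
```

Each row then carries both the volume for that hour and the total since midnight, which is the series of data points the question asks for.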

As for the actual use case of predicting today's volume, consider something like this:

```
index=_internal source=*license_usage.log* type=Usage
| timechart span=1h sum(b) AS volume_b | predict volume_b as prediction future_timespan=24
| addinfo | where _time>=relative_time(info_max_time, "@d") AND _time<relative_time(info_max_time, "+d@d") | fields - info*
| eval merged = coalesce(volume_b, prediction) | stats sum(merged) as predicted_volume sum(volume_b) as volume_so_far
```

If run over a reasonably long timerange this will use historical data to predict the volume for the remaining hours of the day and compute a sum of the actual data for today until now plus the predicted data for the remainder of today.

Make sure to use `@h` as latest to not count a partial hour as a whole hour, or decrease the bucket size. Also make sure the figure for `volume_so_far` lines up with the LURV figure.
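
To turn the prediction into an alert, one option is to compare `predicted_volume` against the daily license quota and trigger when the search returns a result. The following is only a sketch: the 100 GB quota and the `quota_b` field name are made-up placeholders, and the seven-day lookback is an assumption standing in for "a reasonably long timerange".

```
index=_internal source=*license_usage.log* type=Usage earliest=-7d@d latest=@h
| timechart span=1h sum(b) AS volume_b | predict volume_b as prediction future_timespan=24
| addinfo | where _time>=relative_time(info_max_time, "@d") AND _time<relative_time(info_max_time, "+d@d") | fields - info*
| eval merged = coalesce(volume_b, prediction) | stats sum(merged) as predicted_volume sum(volume_b) as volume_so_far
| eval quota_b = 100 * 1024 * 1024 * 1024
| where predicted_volume > quota_b
```

Saved as a scheduled alert that fires when the number of results is greater than zero, this would only produce a row on days where the predicted total exceeds the placeholder quota.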

martin_mueller

SplunkTrust

11-20-2014
03:40 PM

For added efficiency do the conversion to GB at the end so you only do the calc once rather than per event 😄

You should run this search over e.g. -7d@d to @h, so your prediction has some data to work with. After the prediction is run the where cuts off everything from before today and sums up the data so far and the data so far plus the prediction for the remainder of the day.
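
A sketch of what that could look like, with the byte-to-GB conversion moved after the final `stats` and the time range set inline; the `_gb` field names and the rounding to two decimals are illustrative choices:

```
index=_internal source=*license_usage.log* type=Usage earliest=-7d@d latest=@h
| timechart span=1h sum(b) AS volume_b | predict volume_b as prediction future_timespan=24
| addinfo | where _time>=relative_time(info_max_time, "@d") AND _time<relative_time(info_max_time, "+d@d") | fields - info*
| eval merged = coalesce(volume_b, prediction) | stats sum(merged) as predicted_volume sum(volume_b) as volume_so_far
| eval predicted_volume_gb = round(predicted_volume/1024/1024/1024, 2), volume_so_far_gb = round(volume_so_far/1024/1024/1024, 2)
```

This way the division runs once on the single summary row instead of on every license usage event.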

JdeFalconr

Explorer

11-20-2014
09:19 AM

Wow, that's fantastic! Much more than I had anticipated getting. That search worked great too. The one challenge I have with it is that I can't seem to get any results for "volume_so_far" when I set relative start/end times with Earliest and Latest to "@d" and "@h," respectively; I'm guessing that's conflicting with the "where" statements. But that's not a big deal; I've been able to work around it with the scheduled search settings, which don't seem to affect the results much.

Thanks again for the great help and for the complete response.

In case others see this and want the results in GB as opposed to raw bytes I modified the search ever so slightly to give results in terms of GB:

```
index=_internal source=*license_usage.log* type=Usage
| eval GB=((b/1024)/1024)/1024
| timechart span=1h sum(GB) AS volume_b | predict volume_b as prediction future_timespan=24
| addinfo | where _time>=relative_time(info_max_time, "@d") AND _time<relative_time(info_max_time, "+d@d") | fields - info*
| eval merged = coalesce(volume_b, prediction) | stats sum(merged) as predicted_volume sum(volume_b) as volume_so_far
```
