Solved: Re: OpenAI API add-on

Remigiusz · ‎07-18-2023

Hi,
I'd like to ask whether I can use generative AI to generate SPL based on my Splunk indices and the data models in those indices. The main use case is typing what you want from Splunk into an input field and getting back a usable SPL search.
Is this possible using the OpenAI API add-on? Is there any other recommended tool?

PickleRick · ‎07-18-2023

1. OpenAI produces things that are rarely usable.

Example:

[begin chatgpt]

Certainly! Here's an example of a Splunk SPL search that finds all network sessions initiated from a host with IP 172.16.0.4 (stored in the src_ip field) from the last two weeks and performs a timechart of the count over destination IP addresses (stored in the dest_ip field) aggregated to the /26 level:

index=<your_index> src_ip="172.16.0.4" earliest=-2w
| stats count by dest_ip
| iprange dest_ip
| eval dest_ip_prefix = cidrize(dest_ip, 26)
| stats sum(count) as count by dest_ip_prefix, _time
| timechart span=1d sum(count) by dest_ip_prefix

[end chatgpt]

At first glance it seems legit. The problem is that Splunk has no "iprange" command and no "cidrize" function (and the /26 aggregation was the point of the whole exercise!).

And even if it did, the final two lines are completely pointless. Running stats by _time without binning it first usually doesn't do anything useful, and here the first stats has already dropped _time anyway. It should have been done with just the timechart.

2. As partially shown above, automatically generated code is often highly suboptimal performance-wise, even when it returns the right results.
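
For comparison, a hand-written search for the request in point 1 might look roughly like the sketch below. It is only an illustration, not a tested or tuned solution: it assumes dest_ip always holds a dotted-quad IPv4 address, and the field names net24, last_octet and dest_net are made up for this example. The /26 network is derived by rounding the last octet down to a multiple of 64, and a single timechart does the time binning and counting.

```
index=<your_index> src_ip="172.16.0.4" earliest=-2w
| rex field=dest_ip "^(?<net24>\d+\.\d+\.\d+)\.(?<last_octet>\d+)$"
| eval dest_net = net24 . "." . tostring(floor(tonumber(last_octet) / 64) * 64) . "/26"
| timechart span=1d limit=0 count by dest_net
```

With limit=0, timechart keeps every dest_net series instead of folding the less frequent ones into OTHER; drop it if only the busiest networks matter.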

Remigiusz · ‎07-18-2023

I had similar problems with SPL generated on the ChatGPT site, so I'm curious whether the Splunk add-on will at least partially solve this problem. Did you use the add-on, or was that output from their regular website?

PickleRick · ‎07-18-2023

I would not count on any automatic solution to "fix" such stuff.

So-called "AI" is just a generator based on some huge corpus of already-seen solutions. It only correlates known patterns; it doesn't _understand_ what you're trying to do.
