Why is REST API removing a leading pipe before an "inputcsv" command?

kcnolan13
Communicator

It appears that my use of the REST API is somehow causing the leading pipe to be stripped from a search that begins with an inputcsv command. I have this search string in my Python script:

 "| inputcsv scale_med_validation_data | apply fastflux_model | where 'predicted(is_attack)' = 1 | eval t = now()+3600*1 | eval report_hour=strftime(t, "%H") | eval report_date=strftime(t, "%m/%d/%Y") | tail 50 | collect index=fastflux_summary"

This works as desired when entered manually through the web interface.

However, when submitted through the REST API, the jobs screen shows the search query missing the leading pipe:

"inputcsv scale_med_validation_data | apply fastflux_model | where 'predicted(is_attack)' = 1 | eval t = now()+3600*1 | eval report_hour=strftime(t, "%H") | eval report_date=strftime(t, "%m/%d/%Y") | tail 50 | collect index=fastflux_summary"

Naturally, this causes the inputcsv to fail, and so none of the REST API jobs succeed. Why might the leading pipe not be making it through here?

1 Solution

frobinson_splun
Splunk Employee

Hey @kcnolan13,
I just heard back from our engineering team, and there is an issue with the script as shown in the docs. Specifically, the script checks whether the query starts with 'search' and prepends 'search' if it does not, so a query that begins with a pipe ends up with 'search' in front of it. Here is an updated script that should fix the problem. Note this update:

# If the query doesn't already start with the 'search' operator or another
# generating command (e.g. "| inputcsv"), then prepend "search " to it.
if not (searchQuery.startswith('search') or searchQuery.startswith("|")):
    searchQuery = 'search ' + searchQuery

I will update the docs. Let me know how this works for you!

import urllib
import httplib2
from xml.dom import minidom

baseurl = 'https://re-latitude.sv.splunk.com:8089'
userName = 'guest'
password = 'guest'

searchQuery = '| inputcsv foo.csv | where sourcetype=access_common | head 5'

# Authenticate with server.
# Disable SSL cert validation. Splunk certs are self-signed.
serverContent = httplib2.Http(disable_ssl_certificate_validation=True).request(baseurl + '/services/auth/login',
    'POST', headers={}, body=urllib.urlencode({'username':userName, 'password':password}))[1]

sessionKey = minidom.parseString(serverContent).getElementsByTagName('sessionKey')[0].childNodes[0].nodeValue

# Remove leading and trailing whitespace from the search
searchQuery = searchQuery.strip()

# If the query doesn't already start with the 'search' operator or another 
# generating command (e.g. "| inputcsv"), then prepend "search " to it.
if not (searchQuery.startswith('search') or searchQuery.startswith("|")):
    searchQuery = 'search ' + searchQuery

print searchQuery

# Run the search.
# Again, disable SSL cert validation. 
print httplib2.Http(disable_ssl_certificate_validation=True).request(baseurl + '/services/search/jobs','POST',
    headers={'Authorization': 'Splunk %s' % sessionKey},body=urllib.urlencode({'search': searchQuery}))[1]
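
The script above is written for Python 2 (print statements, urllib.urlencode). As a minimal sketch of the same prefix check in Python 3, the logic can be pulled into a small helper. The name normalize_search is hypothetical and is not part of any Splunk library; the behavior mirrors the check in the script and nothing more.

```python
def normalize_search(query):
    """Return the query in the form the script above submits.

    Hypothetical helper for illustration only.
    """
    # Remove leading and trailing whitespace from the search.
    query = query.strip()
    # Leave the query alone if it already starts with 'search' or with a
    # pipe (a generating command such as "| inputcsv"); otherwise prepend
    # "search " to it.
    if not (query.startswith("search") or query.startswith("|")):
        query = "search " + query
    return query


if __name__ == "__main__":
    print(normalize_search("  | inputcsv foo.csv | head 5  "))
    print(normalize_search("sourcetype=access_common | head 5"))
```

Like the original check, this treats any query that begins with the letters "search" as already prefixed, so it is only as strict as the script it is based on.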

kcnolan13
Communicator

Nice catch, that was it!

frobinson_splun
Splunk Employee

Awesome! Glad to hear it.

GregZillgitt
Path Finder

Try this:

"search | inputcsv ..."

kcnolan13
Communicator

Believe it or not, that hasn't worked either. The "search |" is stripped off and all that shows up in the job viewer query window is still: "inputcsv scale_med_validation_data | apply fastflux_model | where 'predicted(is_attack)' = 1 | eval t = now()+3600*1 | eval report_hour=strftime(t, "%H") | eval report_date=strftime(t, "%m/%d/%Y") | tail 50 | collect index=fastflux_summary"

GregZillgitt
Path Finder

ok, how about

search index=* | head 1 | eval foo="deleteme" | inputcsv ... | blah blah blah| search NOT foo="deleteme"

(just out of curiosity)

kcnolan13
Communicator

Yeah, I've tried that kind of thing too, but you get this:

"Error in 'inputcsv' command: This command must be the first command of a search."

Good thought though.

GregZillgitt
Path Finder

Oh crap, that's right.

How about putting the "|inputcsv..." in a macro? Then...

search `foo` | blah blah...

kcnolan13
Communicator

Nice workaround. I'll give it a shot tomorrow and see if it takes.

GregZillgitt
Path Finder

Something else to try:

search * | head 1 | append [|inputcsv foo.csv | blah ] | blah

Might run into issues if your csv is large (e.g. >50K rows)

kcnolan13
Communicator

So, the macro option doesn't work because unfortunately you still get this:

"Error in 'inputcsv' command: This command must be the first command of a search."

And I'm working with some large CSV files, so the other suggestion isn't ideal for this use case.

Any other tricks up your sleeve?

GregZillgitt
Path Finder

Have you tried just using curl?

curl -ku 'admin':'changeme' https://myserver:8089/servicesNS/admin/search/search/jobs/export -d search="|inputcsv foo.csv | blah" -d output_mode=csv

frobinson_splun
Splunk Employee

Hi @kcnolan13,
What endpoint are you using to submit the search?

Have you tried escaping the pipe character?

kcnolan13
Communicator

My base URL is https://xx.xx.xx.xx:8089/

What method of escaping are you referring to? I tried sticking a "\" in front of the leading pipe, but only ended up with a parse error.
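
On the escaping question, a minimal sketch (Python 3, standard library only) of what form-encoding does to the search string: the pipe is percent-encoded in the POST body and decodes back to the same text, so no backslash is needed on the client side. The helper name encode_search is hypothetical, and this only shows the encoding step; it says nothing about how the server parses the query afterwards.

```python
from urllib.parse import urlencode, parse_qs


def encode_search(query):
    """Form-encode the 'search' parameter as it would appear in a POST body.

    Hypothetical helper for illustration only.
    """
    return urlencode({"search": query})


if __name__ == "__main__":
    query = "| inputcsv foo.csv | head 5"
    body = encode_search(query)
    # The pipe becomes %7C and each space becomes '+'.
    print(body)
    # Decoding the body recovers the original string, leading pipe included.
    print(parse_qs(body)["search"][0] == query)
```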

frobinson_splun
Splunk Employee

Ok, it looks like you are using the correct management port to submit the request. But what endpoint are you using to submit the search? Are you creating a saved search and then retrieving the results? Are you using an SDK or is there anything else about how you are submitting the search that might help troubleshoot?

It might be good to get more context before going further with escaping characters. That might not be the issue.

For extensive troubleshooting, it might also be helpful to contact support.

kcnolan13
Communicator

I'm using a nearly identical Python script to the example shown here:

kcnolan13
Communicator

The important part probably being:

sid = httplib2.Http(disable_ssl_certificate_validation=True).request(baseurl + '/services/search/jobs','POST',
    headers={'Authorization': 'Splunk %s' % sessionKey},body=urllib.urlencode({'search': searchQuery}))[1]

frobinson_splun
Splunk Employee

Thanks for the info. I have an active request in to our engineering team to review the Python example here and will add your question/issue to this.

In the meantime, in case it is possible to consider alternatives, there is a Python SDK for developers that might be helpful to you, with info on creating + running searches here:
http://dev.splunk.com/view/python-sdk/SP-CAAAEE5

kcnolan13
Communicator

Thanks @frobinson. I'm aware of the SDK, but hoped I could just bang out this small task with a modified version of the example Python script. I hope the developers fix this issue, if it is indeed on their end.

frobinson_splun
Splunk Employee

I understand. I've pinged some folks again about this, will post again here if I get an update. Sorry for the confusion!
