I stood up a test instance of Splunk that is an "all-in-one" system, that is, indexer and search head on the same host. I wrote an app that pulls data via a REST API, but I wasn't sure whether it needs a custom outputs.conf when it is "sending" to the same system.
Since the instance is acting as an indexer, wouldn't it index the data as soon as it is pulled, without needing a /local/outputs.conf? I couldn't find any clear documentation covering this specific scenario.
My script pulls data, but nothing is populating the main index. If I run the script manually, the data prints to stdout as expected.
Thank you very much for all the suggestions and troubleshooting. Since the test script worked and mine did not, I concluded that the issue was in my script. I broke it down to the bare minimum commands needed to replicate its functionality AND ran it with the splunk binary (rather than just from the command line as the splunk user), and that showed the script was failing when run under Splunk. My original script calls curl with the silent flag (-s), so the error messages were being suppressed.
With the bare-minimum version, I saw errors when running under Splunk because it was unable to access (or couldn't find for some reason) the SSL CA cert path. There are 2 solutions to this: either use the -k switch in the curl command, or provide the full path to the CA bundle using --cacert. I tested both and both work, but I ended up using solution 2.
Testing using the splunk binary
/opt/splunk/bin/splunk cmd ./json.sh
Solution 1: skip SSL certificate validation (-k); note that this disables verification of the server certificate
curl -sgk "https://api.website.com/rest_endpoint" -X GET -u "user:api_token" -H 'Accept: application/json'
Solution 2: provide the path to the standard CA certificate bundle (--cacert)
curl -sg --cacert /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem "https://api.website.com/rest_endpoint" -X GET -u "user:api_token" -H 'Accept: application/json'
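Since the silent flag was what hid the failure, here is a minimal sketch of how the call in solution 2 could be wrapped so that errors stay visible. The names fetch_json and CA_BUNDLE are hypothetical (they are not from my original script), and the URL and credentials are the same placeholders as above. It only relies on standard curl options: -S (show errors even with -s) and --fail (non-zero exit code on HTTP errors).

```shell
#!/bin/bash
# Sketch only: fetch_json and CA_BUNDLE are hypothetical names.
# The CA bundle path is the one from solution 2; override it if yours differs.
CA_BUNDLE="${CA_BUNDLE:-/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem}"

fetch_json() {
    local url="$1"
    # -s hides the progress meter, -S still prints error messages to stderr,
    # --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
    curl -sS --fail -g --cacert "$CA_BUNDLE" "$url" \
        -X GET -u "user:api_token" -H 'Accept: application/json'
}

# Usage (commented out so that sourcing this file makes no network call):
# fetch_json "https://api.website.com/rest_endpoint" \
#     || echo "curl failed with exit code $?" >&2
```

With this, a certificate problem shows up as a curl error message on stderr and a non-zero exit code instead of empty output.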
Hi
this should work.
inputs.conf
[script://$SPLUNK_HOME/etc/apps/<your app name>/bin/json.sh]
disabled = false
index = test
interval = 60.0
sourcetype = json_no_timestamp
Then the script:
#!/bin/bash
echo '{"a":"a","b":"b","c":1,"d":{"aa":1,"ab":"ba"}}'
Your own script should have its own sourcetype (h1:json), which follows best practices. Can you share that definition and a sample of your script's output so the Community can help verify that nothing odd is going on?
One hint: Don't use both INDEXED_EXTRACTIONS and KV_MODE at the same time or you will get duplicate events!
r. Ismo
Your example did work for me so I guess maybe it's my script. I adjusted my .conf regardless but still nothing. No idea what the issue may be.
---REVISED ---
inputs.conf
[script://$SPLUNK_HOME/etc/apps/TA-HackerOne/bin/hacker_one_pull.sh]
disabled = false
index = test
interval = 180.0
sourcetype = h1:json
props.conf
[h1:json]
DATETIME_CONFIG=CURRENT
INDEXED_EXTRACTIONS=json
KV_MODE=none
LINE_BREAKER=([\r\n]+)
data sample
-bash-4.2$ pwd
/opt/splunk/etc/apps/TA-HackerOne/bin
-bash-4.2$ ./hacker_one_pull.sh
{
"id": "49",
"type": "report",
"attributes": {
"name": "Lorem Ipsum",
"description": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.",
"external_id": "aaa-123",
"created_at": "2018-02-27T16:48:23.308Z"
}
}
{
"id": "20",
"type": "report",
"attributes": {
"name": "Finibus Bonorum",
"description": "Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium.",
"external_id": "aaa-726",
"created_at": "2019-09-11T08:26:14.625Z"
}
}
-bash-4.2$
This shows up in _internal (but only once, never again)
08-29-2020 20:01:04.689 +0000 INFO ExecProcessor - New scheduled exec process: /opt/splunk/etc/apps/TA-HackerOne/bin/hacker_one_pull.sh
Hi
can you try to change your props.conf to:
[h1:json]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
KV_MODE = none
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
TIMESTAMP_FIELDS = attributes.created_at
TIME_FORMAT = %FT%T.%3Q%Z
disabled = false
pulldown_type = true
Based on my test this should fix it (at least with your examples).
r. Ismo
Hi
Nice to hear that you solved your problem. As you said, you must test every script with "splunk cmd <your script>" or "splunk cmd python <your script>" to make sure it also works when it is called from inside Splunk! It's not unusual that e.g. Python scripts work on the command line and fail the first time they are tested with Splunk.
Happy splunking !
r. Ismo
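As a rough sketch of that kind of check, the helper below runs a script through "splunk cmd" and keeps stderr separate, so an error that is invisible in the indexed output still gets reported. The function name check_scripted_input is hypothetical, and the /opt/splunk default for SPLUNK_HOME is an assumption taken from the paths used in this thread.

```shell
#!/bin/bash
# Sketch only: check_scripted_input is a hypothetical helper name.
# SPLUNK_HOME defaults to /opt/splunk, the install path seen in this thread.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"

check_scripted_input() {
    local script="$1" err_file rc
    err_file="$(mktemp)"
    rc=0
    # Run the script with Splunk's environment; discard stdout, keep stderr.
    "$SPLUNK_HOME/bin/splunk" cmd "$script" >/dev/null 2>"$err_file" || rc=$?
    if [ "$rc" -ne 0 ] || [ -s "$err_file" ]; then
        echo "FAILED (exit code $rc):"
        cat "$err_file"
    else
        echo "OK"
    fi
    rm -f "$err_file"
    return "$rc"
}

# Usage (commented out; requires a Splunk installation):
# check_scripted_input "$SPLUNK_HOME/etc/apps/<your app name>/bin/json.sh"
```

It prints OK only when the script exits with 0 and writes nothing to stderr; anything else is shown together with the exit code.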
You should invoke the script through Splunk's inputs.conf so that the events it prints on execution are collected.
Can you share the inputs.conf that calls your script?
If you haven't created one, create one like below:
[script://./bin/yourscript.extension]
index=<indexname>
interval = <set frequency >
sourcetype = <set sourcetype>
https://docs.splunk.com/Documentation/Splunk/8.0.5/Admin/Inputsconf
I have an inputs, props, and my script. My script file is in /opt/splunk/etc/apps/<myapp>/bin/hacker_one_pull.sh
[script://./bin/hacker_one_pull.sh]
index = main
interval = 600
sourcetype = h1:json
source = api_hackerone
disabled = false
send_index_as_argument_for_path = false
Any reason for adding the line below?
send_index_as_argument_for_path
Can you remove that line, restart the Splunk service, and check?
I was reviewing the docs on inputs because I am not getting data in right now and came across that under the scripted inputs section:
send_index_as_argument_for_path = <boolean>
* Whether or not to pass the index as an argument when specified for
stanzas that begin with 'script://'
* When this setting is "true", the script passes the argument as
'-index <index name>'.
* To avoid passing the index as a command line argument, set this to "false".
* Default: true.
Anyway, it's commented out now and restarted splunk.
This shows up in _internal (and has been on every restart in my troubleshooting) but still no data is coming in. I ran the script manually to ensure it is working (as the splunk user) and JSON data is printing to screen so I know it does work.
08-29-2020 05:37:09.605 +0000 INFO ExecProcessor - New scheduled exec process: /opt/splunk/etc/apps/TA-HackerOne/bin/hacker_one_pull.sh
Can you set the time range to All time and search with the sourcetype and index given in inputs.conf?
Also, share your props.conf
and confirm whether there is a timestamp in the JSON logs printed by the script.
inputs.conf
[script://$SPLUNK_HOME/etc/apps/TA-HackerOne/bin/hacker_one_pull.sh]
index = main
interval = 180.0
sourcetype = h1:json
source = api_hackerone
disabled = false
# send_index_as_argument_for_path = false
props.conf (NOTE: CURRENT time is the intended and desired behavior)
[h1:json]
CHARSET=UTF-8
DATETIME_CONFIG=CURRENT
INDEXED_EXTRACTIONS=json
KV_MODE=none
SHOULD_LINEMERGE=false
category=Structured
description=HackerOne JSON data via REST API
disabled=false
pulldown_type=true
LINE_BREAKER=([\r\n]+)
Sample data (NOTE: attributes.created_at is not the desired timestamp for _time, hence using CURRENT)
-bash-4.2$ pwd
/opt/splunk/etc/apps/TA-HackerOne/bin
-bash-4.2$ ./hacker_one_pull.sh
{
"id": "49",
"type": "report",
"attributes": {
"name": "Lorem Ipsum",
"description": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.",
"external_id": "aaa-123",
"created_at": "2018-02-27T16:48:23.308Z"
}
}
{
"id": "20",
"type": "report",
"attributes": {
"name": "Finibus Bonorum",
"description": "Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium.",
"external_id": "aaa-726",
"created_at": "2019-09-11T08:26:14.625Z"
}
}
-bash-4.2$