Question 1: Is there a centralized place to search for all Splunk error messages? Searching answers.splunk.com I've not been able to find a reference to, or solution for,
"Error in 'rex' command: Invalid argument: '(' The search job has failed due to an error. You may be able view the job in the Job Inspector."
Question 2: Why does this rex query work fine in a search, but then fail when used in both a primary and a subsearch? I need to parse fields in both places. I built an initial query that worked fine alone, then created a subsearch and copied/pasted the rex into it. It now fails with
"Error in 'rex' command: Invalid argument: '(' The search job has failed due to an error. You may be able view the job in the Job Inspector."
What do you think is going on, and how do I fix it? The purpose is to find Devices with Tasks that failed at one time, but where a later Task succeeded. Thanks so much.
Here is the code, although for some reason the * asterisks after each dot (.) in the regexes don't seem to come through in the preview window:
source="File1.csv" index="inventory-legacy" | regex Notes="^Succ.*" | transaction Description | rex field=Description "^(?<TaskID>[^-]+).*" | rex field=Description "^[^-]+-(?<DeviceName>.*)" [ search source="File1.csv" index="inventory-legacy" | regex Notes="^Fail.*" | transaction Description | rex field=Description "^(?<TaskID>[^-]+).*" | rex field=Description "^[^-]+-(?<DeviceName>.*)" | dedup DeviceName, TaskID | fields DeviceName ] |sort -_time, +TaskID, +DeviceName | table _time, TaskID, DeviceName, Description, Notes
To search for error messages, you'll need access to the _internal index. Replace "splunkd" with the name of any other Splunk log file you wish to view.
index=_internal source="*/splunkd.log" | ...
Why not combine your searches into one? I don't know if it will solve the problem, but simpler is usually better.
source="File1.csv" index="inventory-legacy" (Notes="Succ*" OR Notes="Fail*") | transaction Description | rex field=Description "^(?<TaskID>[^-]+).*" | rex field=Description "^[^-]+-(?<DeviceName>.*)" | dedup DeviceName, TaskID | sort -_time, +TaskID, +DeviceName | table _time, TaskID, DeviceName, Description, Notes
Hi, Rich. Thanks so much for responding. I guess I could have been clearer with my first question regarding "all Splunk error messages", but I am asking for a listing from Splunk of all error messages that their code generates, what causes each to trigger, and possibly how to fix the underlying cause of the problem. I am not asking how to view errors that have been logged in my system, but rather the meaning of any error message I encounter. Thanks again, especially if you have an answer to that.
As for Question 2, I will try the code you suggested.
Thanks for clarifying, Mark. I don't work for Splunk, but I'm pretty sure what you're asking for doesn't exist. I've been part of a lot of software projects and few of them were documented to the extent you seek. It's not that it can't be done; it's just a difficult job in a large product if it hasn't been done and maintained since the early days. Perhaps @ppablo can get an official answer for us.
Rich, Thanks again for your input. We'll see what happens.
I can confirm that we do not have a comprehensive error message reference as you describe it. Aside from the difficulty of creating and maintaining such a reference, given the extent of the code base, there are also multiple, varied conditions that can produce any given error message. It is not a simple one-to-one relationship.
With that said, we do have some work underway to improve the content of the error messages themselves, to assist in recognizing the cause and recovering from the error condition. These improvements will be a gradual process.
Chris, thanks for your answer about error messages. I know some people are using Splunk to review source code, so I'm sure you are doing something similar internally as well. Perhaps that will help pull out those pieces of code that identify error messages for amplification. We'll be interested to see the improvements you speak of.
Do you have any feedback on my more important question? Why is the very same rex query failing when used in a primary/subsearch context, but works fine when used alone in a single query statement?
I tested the code you suggested and it is similar to what I started with originally. It does pull all records, both successes and failures, but it's not quite what I want. The subsearch is meant to first identify Devices associated with a particular TaskID that attempted an action and failed. Once we have that pool of devices, the primary search looks to see which of those devices subsequently ran with a new TaskID that did succeed. This will greatly reduce the events returned, and will provide the answer I need to the question: "Which TaskID (a set of tests run) succeeded after a previous TaskID (different tests) had failed?" Thanks again for your help.
I think this should be a separate posting.
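On the rex error itself, one likely explanation (a sketch, not tested against this data): a subsearch in square brackets is expanded in place, and by default its results come back as a parenthesized expression along the lines of ( ( DeviceName="..." ) OR ( DeviceName="..." ) ). In the original query the bracket sits directly after the second rex, so that text is handed to rex as extra arguments, which would produce "Invalid argument: '('". Assuming that is the cause, putting an explicit search command in front of the subsearch should let the rex commands run unchanged:

```
source="File1.csv" index="inventory-legacy"
| regex Notes="^Succ.*"
| transaction Description
| rex field=Description "^(?<TaskID>[^-]+).*"
| rex field=Description "^[^-]+-(?<DeviceName>.*)"
| search
    [ search source="File1.csv" index="inventory-legacy"
      | regex Notes="^Fail.*"
      | transaction Description
      | rex field=Description "^[^-]+-(?<DeviceName>.*)"
      | dedup DeviceName
      | fields DeviceName ]
| sort -_time, +TaskID, +DeviceName
| table _time, TaskID, DeviceName, Description, Notes
```

The subsearch only needs to return DeviceName, so the TaskID extraction is dropped from it here. Note that this filters successes down to devices that also failed at some point; it does not by itself check that the success came after the failure, so a time comparison would still need to be added for that.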