I need to see all events with fields that have "PROD*" in name, e.g. "PROD deploy", "PROD update", etc.
`index=myIndex sourcetype=mySourceType "PROD*"="*"` doesn't work
..and if event has "PROD*" in field name I need to get the value
How is it possible?
A bit of additional explanation on this (all other comments in this thread so far are valid).
1. If you search by field value, you have to explicitly name the field. There is no wildcard functionality as such. There simply isn't. So field=*whatever*wildcarded*value*you*can*think*of is a formally valid search condition (even though it might not be the best one from performance point of view), your idea of PROD*=* would (if asterisk was allowed in field name; I'm pretty sure it isn't) search for a field called literally PROD*.
2. Even if there was a way to do so, this type of search (as well as any search for non-indexed field value beginning with a with an asterisk) is the worst possible idea performance-wise. Since Splunk cannot limit the set of processed events to some values, it has to parse all events from a given search time range (possibly limited by other search conditions) to find out if there is such field present in your events at all. It might be very, very costly in terms of CPU time.
If your field(s) can be "anchored" to some static, delimited at least at the beginning with a breaker, text within the event, you can help Splunk by limiting your events by adding a proper search term in the initial search.
That's why it's best to combine @ITWhisperer 's solution with @bowesmana 's way of limiting the events you're searching from.
@ITWhisperer and @bowesmana gave some good ideas for obtaining streams of events. I want to note that " if event has "PROD*" in field name I need to get the value" can have different meanings depending on what you want to do with the keys and values. If all you want is to list all values of each key, it can be as simple as
index=myIndex sourcetype=mySourceType
| stats values(Prod*) as Prod*
What I need to do is: Select all events that have `PROD*` in field name and calculate number of events that have `SUCCESS` value in `RESULT` field
Also, if you have lots of events that do NOT have fields called PROD anything and your event data must have PROD as a term in the data, then you can help filter only those events that have PROD in the data with TERM(PROD)
index=myIndex sourcetype=mySourceType TERM(PROD)
Try something like this
| foreach PROD*
[| eval keep=if(isnull(keep) AND isnotnull('<<FIELD>>'), 1, keep)]
| where keep==1
@ITWhisperer Many thanks, it shows all events with `PROD*` filed names
How can I get all events without `PROD*` filed names please?
| fields - PROD*
`
index=myIndex sourcetype=mySourceType | fields - PROD*
` still displays events with and without `PROD*` fields.. Am I doing smtg wrong?
Your wording might be a bit confusing. You apparently wanted to search for all events that do not have any fields matching the PROD* pattern in them whereas @ITWhisperer showed you how to show all events without showing the PROD* fields.
You can do the opposite to previous @ITWhisperer 's solution
<your initial search>
| eval keep=1
| foreach PROD*
[ | eval keep=0 ]
| where keep=1
| fields - keep
This will filter out any event containing PROD* fields. But this is a very ineffective way of searching your data.
It is not clear what your use case is - how can you have events which don't have fields called PROD* and also don't have fields not called PROD* - please clarify what you are trying to do
@ITWhisperer sure, sorry if it wasn't clear
I have some events that have `PROD*` in field names (e.g. `PROD error`, `Production warning`, etc.)
Other events don't have `PROD*` in field names, e.g. they have `DEV error`, `Development warning`, etc.
The end goal is to build dashboard that will compare statistics across different envs, that's why I need data from other fields like `ERROR_CODE`
Hope it makes sense and thanks a lot for your help and patience, it's much appreciated!
I have some events that have `PROD*` in field names (e.g. `PROD error`, `Production warning`, etc.)Other events don't have `PROD*` in field names, e.g. they have `DEV error`, `Development warning`, etc.
The end goal is to build dashboard that will compare statistics across different envs, that's why I need data from other fields like `ERROR_CODE`
It really does not make sense because the above kind of contradicts what is asked in the OP.
A good way - in fact the very basic way to ask an answerable question is to illustrate your data with a table (you can give a few fields of relevance and value variants of importance), then illustrate desired output from the data with a table, then explain the logic to arrive at desired output from illustrated data as if you have no Splunk.
Sorry for the confusion - I'm new to this dataset and some topics became clear only after init search
Anyways I'm running
index="myIndex" sourcetype ="mySourceType" | foreach "*PROD*" [| eval keep=if(isnotnull('<<FIELD>>'), 1, keep)] | where keep==1
to get events with `*PROD*` in field names
..and
index="myIndex" sourcetype ="mySourceType" | foreach "*DEV*" [| eval keep=if(isnotnull('<<FIELD>>'), 1, keep)] | where keep==1
to get events with `*DEV*` in field names
..so no need to find events without `*PROD*` in filed name
But:
1. how can I make foreach condition case-insensitive, so events with both `Production` and `PRODUCTION` fields found?
2. how can I make foreach to search with OR condition, e.g. `foreach "*DEV*" OR "*UAT*"` ?
Many thanks in advance!
1. how can I make foreach condition case-insensitive, so events with both `Production` and `PRODUCTION` fields found?
2. how can I make foreach to search with OR condition, e.g. `foreach "*DEV*" OR "*UAT*"` ?
Back to my previous comment. My exact words are "depending on what you want to do with the keys and values." If you don't tell us what your end goal is, it is not really an answerable question. Like I exemplified, you can do a lot without having to resort to cumbersome foreach subsearches if you have some simple goals. For example,
| stats values(PROD*) as PROD* values(Prod*) as Prod*
| timechart dc(PROD*) as PROD* dc(Prod*) as Prod* by somekey
and so on. Like @PickleRick points out, you could be giving yourself a harder time than it should have been.
Because Splunk is optimized for time series, there are more row-oriented, or value-oriented manipulations than column names or keys. Instead of doing foreach for simple filter function - which is quite wasteful and offers no performance improvement like a simple filter in index search. If there are only a handful variants of these field names, it is perhaps more profitable to simply enumerate them in command line.
index=myindex sourcetype=mysourcetype
("PROD deploy"=* OR "PROD update"=* OR "PROD error"=* OR "Production warning"=*)
This is important for performance because it reduces the number of events.
Additionally, is this separation between Prod and Dev going somewhere or are they really just used in two different outputs?
The answer to the question about OR in foreach is you don't need any. Simply do, for example, foreach DEV* Dev* UAT*. Again, is there a need to put wildcard in front of Prod and Dev?
@yuanliu
1. My task is to calculate number of events with "FAILED" value in "RESULT" key, it looks like this and it works (thanks to you guys!) - `index="myIndex" sourcetype ="mySourceType" | foreach "*DEV*" "UAT*" [| eval keep=if(isnotnull('<<FIELD>>'), 1, keep)] | where keep==1 | stats count(eval('RESULT'=="FAILED")) as FAILS | stats values(FAILS)`
It would be superb to add case-insensitivity, so both `DEV` and `Develop` are included to the result
2. Many thanks, `foreach "*DEV*" "UAT*"` works as a charm!
1. My task is to calculate number of events with "FAILED" value in "RESULT" key, it looks like this and it works (thanks to you guys!) - `index="myIndex" sourcetype ="mySourceType" | foreach "*DEV*" "UAT*" [| eval keep=if(isnotnull('<<FIELD>>'), 1, keep)] | where keep==1 | stats count(eval('RESULT'=="FAILED")) as FAILS | stats values(FAILS)`
This gets even more confusing.
Most importantly still, as @PickleRick and I repeatedly point out, Splunk (and most programming languages) do not perform sophisticated calculations in name space, mostly because there is rarely need to do so. When there is a serious need for manipulating variable name space, it is usually because the upstream programmer made poor design. In Splunk's case, it is super flexible in handling data without preconceived field names. As @bowesmana suggested, if you can demonstrate your raw data containing those special keys, it is probably much easier (and more performant) to simply use TERM() filter to limit raw events rather than trying to apply semantics in extracted field names. (TERM is case insensitive by default.) If you find TERM() too limiting, you can also use Splunk's super flexible field extraction to extract environment groups "Prod" and "Dev" using regex. This way, all you need to do is
index="myIndex" sourcetype ="mySourceType" RESULT=FAILED environment=Dev
| stats count
You can even do something like
index="myIndex" sourcetype ="mySourceType" RESULT=FAILED
| stats count by environment
Any of these alternatives is better in clarity and efficiency.
You can't "add case-insensitivity". Field names are case-sensitive in Splunk by design (so Dev and DEV are two different fields and you can have both of them in your event).
You can try some ugly hacks to "normalize" field case like
| foreach *
[ | eval field=lower("<<FIELD>>")
| eval {field}=<<FIELD>>
| eval <<FIELD>>=null()
| eval field=null() ]
But this is really, really ugly. And if you have two similar but differently-cased field names, only one of them will be retained, other one(s) will be overwritten.
Of course you can "limit" this to just some pattern by doing conditional evals (but it gets even uglier than this because you have to add an if matching the field name to most of those evals so it's getting really spaghetti).
Honestly, you're digging deeper and deeper into something that seems that should be much better solved just by preparing the data correctly.
You either should make your developers log the same fields consistently and distinguish the source of the events by... source field? Or maybe some additional field if all events are aggregated into a single point of origin.
Alternatively as they apparently have different structure, they should have different sourcetypes so that each is parsed differently (and all can be normalized to a common set of fields).
IMHO you're unnecessarily trying to make the life harder than it has to be for yourself.
unfortunately dataset is old and can't be changed
I'd be happy if fields have decent naming pattern..