Below is the search which I'm trying:
index=p_data sourcetype="p_sourcetype" | xmlkv | where EventId!="" | table source,EventId | join type=outer max=0 source [search index=p_data | xmlkv | where ChannelId!="" | table source,ChannelId] | table source,EventId,ChannelId | where EventId = "PG0019283"
The result return with the source and the EventId, but ChannelId is empty, which it should contain some value.
When I conduct the searches separately, they both contain the identical source value for joining.
Pls assist.
Thank you.
What if we did one search like so:
index=p_data (EventId OR ChannelId) | xmlkv | where (sourcetype="p_sourcetype" AND EventId!="") OR ChannelId!="" | stats values(EventId) as EventId values(ChannelId) as ChannelId by source
As xmlkv is your first step in both queries, I'm guessing that the text EventId or ChannelId will appear in your event, we then parse and apply a where to get all events and use stats instead of join to combine results together. As source is usually the name of the monitored file in Splunk, it seems strange to me that it would be a unique identifier for event and channel events, but it could be.
What if we did one search like so:
index=p_data (EventId OR ChannelId) | xmlkv | where (sourcetype="p_sourcetype" AND EventId!="") OR ChannelId!="" | stats values(EventId) as EventId values(ChannelId) as ChannelId by source
As xmlkv is your first step in both queries, I'm guessing that the text EventId or ChannelId will appear in your event, we then parse and apply a where to get all events and use stats instead of join to combine results together. As source is usually the name of the monitored file in Splunk, it seems strange to me that it would be a unique identifier for event and channel events, but it could be.
index=p_data | rex field=_raw "language=\"eng\">\s+(?.*?)" | search index=p_data (EventId OR ChannelId OR Name) | xmlkv | where (sourcetype="p_sourcetype" AND EventId!="") OR ChannelId!="" | stats values(EventId) as EventId values(Name) as Name values(ChannelId) as ChannelId by source
The result came out as below
SOURCE EventId Name ChannelId
source1 PG0019283 Winter 174
PG0018294 Summer
PG0037926 Spring
source2 PG0073947 Spiderman 186
PG0089362 Superman
source3 PG0045281 Apple 902
source4 PG0019283 Winter 174
PG0037926 Spring
I need to flatten them to look like below:
SOURCE EventId Name ChannelId
source1 PG0019283 Winter 174
source1 PG0018294 Summer 174
source1 PG0037926 Spring 174
source2 PG0073947 Spiderman 186
source2 PG0089362 Superman 186
source3 PG0045281 Apple 902
source4 PG0019283 Winter 174
source4 PG0037926 Spring 174
Thanks
In the original events, what is the relationship between EventId and Name and ChannelId? Are they all part of a single event or are they separate events? The problem with stats values(field)
is that order is not preserved, just uniqueness of values, so there is not necessarily a correlation between the first EventId and the first Name. It seems like ChannelId correlates loosely to source, but that could also be a problem of sample size. There are other statistical functions that could be used like list
, but without knowing the makeup of your source data it's hard to make recommendations. If all are part of the same event, then stats
could go away and we could just use table.
Other side note, the contents of your search just before the xmlkv should likely be part of your base search as it seems like you're dealing only with tokens that would exist in the raw event, so why extract things from disk only to filter them 2 steps later? Additionally I'm not sure if it's the forum stripping out important characters or not but it's possible that your rex should actually be a regex ?
The search makes it clear that EventId is only in sourcetype="p_sourcetype", whereas ChannelId may exist in any sourcetype of the same index. This implies that these two fields exist in separate events from separate sourcetypes. "table" will not work, else join would have been unnecessary. "name" may exist in the same events as "ChannelId" and a technique of combining and separating may keep them together, if that is the case.
index=p_data (EventId OR ChannelId) | xmlkv | where (sourcetype="p_sourcetype" AND EventId!="") OR ChannelId!="" | eval channel_name=ChannelId." ".Name | stats values(EventId) as EventId values(channel_name) as channel_name by source | rex field=channel_name "(?<ChannelId>\S+)\s+(?<Name>.+)"
leonheart78, the forum maybe really messing up your regex so that it looks really wierd like it should not work. Try a blank line before it and four spaces.
Second, to split up the EventId-ChannelId pairs, you should try:
| mvexpand ChannelId
after the stats command.
Hi,
Thanks for your suggestion, it seems to have worked, but the issues is that within a ChannelId, it contains multiple EventId, how can I expand it so that each EventId is tagged with a ChannelId so to output to csv?
Could you explain more and/or provide some sample (anonymized) events? Do you mean you have a single event that has a ChannelId and multiple EventIds in it? Or is it just how my search does things?
The simple answer might be you want to just pipe to an mvexpand EventId command, but there might be an even more optimal search out there depending on what your data actually looks like.
Excellent approach. This will not only increase the speed and efficiency but also improve your ability to see variances in the data formatting, if any.
Hi, your suggestion works! However, the each ChannelId now is tied with multiple EventId. How can I tag the ChannelId to each EventId?
Join uses a subsearch, which comes with time and event limitations. 90s and 50k events max. The time limit is configurable. 50k events is an absolute max. You may also want to look at multi-value and null values. Check value formatting, including extra spaces, non-printable characters, etc. I never use max=0, so that might be causing problems. If you are expecting max=0 to remove the 50k limitation, it will not. .. I'll have to look into "max" more later, but it seems to allow search line customization of any max event limitations.
Instead, try:
index=p_data (EventId OR ChannelId) | xmlkv | where (sourcetype="p_sourcetype" AND EventId!="") OR ChannelId!="" | stats values(EventId) as EventId values(ChannelId) as ChannelId by source | mvexpand ChannelId