Solved: Issues with appendcols

netanelm7 · ‎12-11-2017

Hi everybody,

I have a problem with an "appendcols" command.
I have a query which needs to count how many times a field's value is above 450, and i need to do that for 4 fields.
It's usually a really easy query with "count" in the timechart command, But for some reason I see that every field is sampled more then once in a minute, so if im doing "count", it will display a much higher result (because it's counting the duplicates too).
So I've found a solution, i've added "dedup time", but i can do it for each field on its own, so it will dedup with it's time and not others, and i have to do it with appendcols to bring together all 4 fields with dedup time.
Overall, the query works fine, but i has a problem once in a while if it doesnt find any results in the first search (before the appendcols), which then it shows my the same result only with the field _time blank, and then i cant use any drilldown on it.
I hope it's understandable.

He's my main "count" query with the appendcols:

index=storage_18037 sourcetype=csvRotemA_JNL_SUMMARY NOT DATETIME host=RotemA JOURNAL_ID="014" | eval transfer_in_MB=M_JNL_ASYNC_XFER_RATE/1024,IDs="JNL".JOURNAL_ID | where transfer_in_MB>450 | dedup _time | timechart span=$interval2$ count(transfer_in_MB) by IDs | appendcols [search index=storage_18037 sourcetype=csvRotemA_JNL_SUMMARY NOT DATETIME host=RotemA JOURNAL_ID="00A" | eval transfer_in_MB=M_JNL_ASYNC_XFER_RATE/1024,IDs="JNL".JOURNAL_ID | where transfer_in_MB>450 | dedup _time | timechart span=$interval2$ count(transfer_in_MB) by IDs] | appendcols [search index=storage_18037 sourcetype=csvRotemA_JNL_SUMMARY NOT DATETIME host=RotemA JOURNAL_ID="000" | eval transfer_in_MB=M_JNL_ASYNC_XFER_RATE/1024,IDs="JNL".JOURNAL_ID | where transfer_in_MB>450 | dedup _time | timechart span=$interval2$ count(transfer_in_MB) by IDs] | appendcols [search index=storage_18037 sourcetype=csvRotemA_JNL_SUMMARY NOT DATETIME host=RotemA JOURNAL_ID="01E" | eval transfer_in_MB=M_JNL_ASYNC_XFER_RATE/1024,IDs="JNL".JOURNAL_ID | where transfer_in_MB>450 | dedup _time | timechart span=$interval2$ count(transfer_in_MB) by IDs] | fillnull value="0" JNL000 | fillnull value="0" JNL00A | fillnull value="0" JNL014 | fillnull value="0" JNL01E | eval _start_time=_time, _end_time=_time+_span

Thank you very much!!

DalJeanis · ‎12-11-2017

First, appendcols is useful in only a few very limited situations. It is not useful in any situation where the different return values might get out of sync. This is not that situation, so, don't use it here.

Second, you are manually breaking out searches for the different values, when timechart will do that for you automatically. You are working far too hard. Don't do that either.

Start with something like this...

index=storage_18037 sourcetype=csvRotemA_JNL_SUMMARY NOT DATETIME host=RotemA 
(JOURNAL_ID="014" OR  JOURNAL_ID="00A" OR JOURNAL_ID="000" OR JOURNAL_ID="01E")
| eval transfer_in_MB=M_JNL_ASYNC_XFER_RATE/1024
| eval IDs="JNL".JOURNAL_ID 
| where transfer_in_MB>450 
| dedup _time IDs
| timechart span=$interval2$ count(transfer_in_MB) by IDs

I'm not sure about the dedup, or the count in the timechart. Myself, I would check to make sure that the records being eliminated are actually duplicates, and that count() was really what you wanted to graph, rather than, for example max() of per_second values. You know your data, so I'll leave that determination to you.

View solution in original post

DalJeanis · ‎12-11-2017

First, appendcols is useful in only a few very limited situations. It is not useful in any situation where the different return values might get out of sync. This is not that situation, so, don't use it here.

Second, you are manually breaking out searches for the different values, when timechart will do that for you automatically. You are working far too hard. Don't do that either.

Start with something like this...

index=storage_18037 sourcetype=csvRotemA_JNL_SUMMARY NOT DATETIME host=RotemA 
(JOURNAL_ID="014" OR  JOURNAL_ID="00A" OR JOURNAL_ID="000" OR JOURNAL_ID="01E")
| eval transfer_in_MB=M_JNL_ASYNC_XFER_RATE/1024
| eval IDs="JNL".JOURNAL_ID 
| where transfer_in_MB>450 
| dedup _time IDs
| timechart span=$interval2$ count(transfer_in_MB) by IDs

I'm not sure about the dedup, or the count in the timechart. Myself, I would check to make sure that the records being eliminated are actually duplicates, and that count() was really what you wanted to graph, rather than, for example max() of per_second values. You know your data, so I'll leave that determination to you.

netanelm7 · ‎12-12-2017

Thank you!
That "dedup _time IDs" is great.
Didnt know you can give him multiple fields.

DalJeanis · ‎12-12-2017

@netanelm - Yes, you can. Also, there's a parameter keepempty=t that you can use to tell him to ignore any event that is missing any of the fields. That way, you can dedup one type of record while keeping all copies of another.... although there may be more efficient ways of accomplishing that objective, depending on the data.

https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/SearchReference/Dedup

Issues with appendcols

Prove Your Splunk Prowess at .conf25—No Prereqs Required!

Splunk Observability Cloud's AI Assistant in Action Series: Observability as Code

Splunk Answers Content Calendar, July Edition I

Are you a member of the Splunk Community?

Issues with appendcols

Prove Your Splunk Prowess at .conf25—No Prereqs Required!

Splunk Observability Cloud's AI Assistant in Action Series: Observability as Code

Splunk Answers Content Calendar, July Edition I