Issues with appendcols

netanelm7
Path Finder

Hi everybody,

I have a problem with the "appendcols" command.
I have a query that needs to count how many times a field's value is above 450, and I need to do that for 4 fields.
It's usually a really easy query with "count" in the timechart command, but for some reason every field is sampled more than once per minute, so if I use "count" it displays a much higher result (because it counts the duplicates too).
I found a workaround: I added "dedup _time", but I can only do that for each field on its own, so that each field is deduplicated against its own timestamps and not the others', and I have to use appendcols to bring all 4 fields back together after the dedup.
Overall, the query works fine, but once in a while, when the first search (the one before the appendcols) finds no results, it returns the same results with the _time field blank, and then I can't use any drilldown on it.
I hope that's understandable.

Here's my main "count" query with the appendcols:

index=storage_18037 sourcetype=csvRotemA_JNL_SUMMARY NOT DATETIME host=RotemA JOURNAL_ID="014" 
| eval transfer_in_MB=M_JNL_ASYNC_XFER_RATE/1024, IDs="JNL".JOURNAL_ID 
| where transfer_in_MB>450 
| dedup _time 
| timechart span=$interval2$ count(transfer_in_MB) by IDs 
| appendcols 
    [search index=storage_18037 sourcetype=csvRotemA_JNL_SUMMARY NOT DATETIME host=RotemA JOURNAL_ID="00A" 
    | eval transfer_in_MB=M_JNL_ASYNC_XFER_RATE/1024, IDs="JNL".JOURNAL_ID 
    | where transfer_in_MB>450 
    | dedup _time 
    | timechart span=$interval2$ count(transfer_in_MB) by IDs] 
| appendcols 
    [search index=storage_18037 sourcetype=csvRotemA_JNL_SUMMARY NOT DATETIME host=RotemA JOURNAL_ID="000" 
    | eval transfer_in_MB=M_JNL_ASYNC_XFER_RATE/1024, IDs="JNL".JOURNAL_ID 
    | where transfer_in_MB>450 
    | dedup _time 
    | timechart span=$interval2$ count(transfer_in_MB) by IDs] 
| appendcols 
    [search index=storage_18037 sourcetype=csvRotemA_JNL_SUMMARY NOT DATETIME host=RotemA JOURNAL_ID="01E" 
    | eval transfer_in_MB=M_JNL_ASYNC_XFER_RATE/1024, IDs="JNL".JOURNAL_ID 
    | where transfer_in_MB>450 
    | dedup _time 
    | timechart span=$interval2$ count(transfer_in_MB) by IDs] 
| fillnull value="0" JNL000 
| fillnull value="0" JNL00A 
| fillnull value="0" JNL014 
| fillnull value="0" JNL01E 
| eval _start_time=_time, _end_time=_time+_span

Thank you very much!!

DalJeanis
Legend

First, appendcols is useful in only a few very limited situations. It is not useful in any situation where the different return values might get out of sync. This is not one of those situations, so don't use it here.
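
To see why: appendcols pastes the subsearch's columns onto your results row by row, purely by position. Here's a toy illustration (a minimal self-contained sketch using makeresults, not your data) where the subsearch returns one row fewer, so the third row gets a null b and the columns stop describing the same rows:

| makeresults count=3 
| streamstats count AS row 
| eval a=row*10 
| fields - _time 
| appendcols 
    [| makeresults count=2 
    | streamstats count AS row 
    | eval b=row*100 
    | fields b]

Whenever your first search finds nothing for a window, the appended rows have no base rows to attach to, which is presumably why you see results with a blank _time.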

Second, you are manually breaking out searches for the different values, when timechart will do that for you automatically. You are working far too hard. Don't do that either.

Start with something like this...

index=storage_18037 sourcetype=csvRotemA_JNL_SUMMARY NOT DATETIME host=RotemA 
(JOURNAL_ID="014" OR  JOURNAL_ID="00A" OR JOURNAL_ID="000" OR JOURNAL_ID="01E")
| eval transfer_in_MB=M_JNL_ASYNC_XFER_RATE/1024
| eval IDs="JNL".JOURNAL_ID 
| where transfer_in_MB>450 
| dedup _time IDs
| timechart span=$interval2$ count(transfer_in_MB) by IDs 

I'm not sure about the dedup, or the count in the timechart. Myself, I would check to make sure that the records being eliminated are actually duplicates, and that count() was really what you wanted to graph rather than, for example, max() of per_second values. You know your data, so I'll leave that determination to you.
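
For example, if the goal were actually to chart the peak transfer rate per interval rather than the number of samples above 450, a variation like this might fit better (same fields as above, with max() swapped in; treat it as a sketch, since only you know which statistic is right):

index=storage_18037 sourcetype=csvRotemA_JNL_SUMMARY NOT DATETIME host=RotemA 
(JOURNAL_ID="014" OR JOURNAL_ID="00A" OR JOURNAL_ID="000" OR JOURNAL_ID="01E")
| eval transfer_in_MB=M_JNL_ASYNC_XFER_RATE/1024
| eval IDs="JNL".JOURNAL_ID 
| timechart span=$interval2$ max(transfer_in_MB) by IDs

Note that the where and dedup drop out here, because duplicate samples can inflate a count() but cannot change a max().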

netanelm7
Path Finder

Thank you!
That "dedup _time IDs" is great.
Didn't know you could give it multiple fields.

DalJeanis
Legend

@netanelm - Yes, you can. Also, there's a parameter keepempty=true that tells dedup to keep any event that is missing one of the specified fields, instead of dropping it (dropping is the default). That way, you can dedup one type of record while keeping all copies of another.... although there may be more efficient ways of accomplishing that objective, depending on the data.
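
Here's a runnable sketch of that pattern (made-up values via makeresults, just to show the behavior): the three events that share a txn_id collapse to one, while the fourth event, which has no txn_id at all, survives only because of keepempty:

| makeresults count=4 
| streamstats count AS row 
| eval txn_id=if(row<=3, "A1", null()) 
| dedup txn_id keepempty=true

With the default keepempty=false, that fourth event would be discarded along with the duplicates.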

https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/SearchReference/Dedup
