
How to populate nonexistent values in a table?

adnankhan5133
Communicator

My query below generates a table that appears as follows. The issue I'm trying to resolve is populating non-existent values with "No Data", as with the blank cells in the 2020-08-11 column. Can someone provide some assistance on how to do this? I have tried fillnull and filldown, but have not been successful.

| Service ID | Resource Name | Transaction Name | Priority | Service Area | Consumer | 2020-08-12 | 2020-08-11 | 2020-08-10 | 2020-08-09 |
|---|---|---|---|---|---|---|---|---|---|
| ID1 | GET | Transaction1 | 1 | Area1 | App1 | 3 | | 4 | 0 |
| ID2 | PUT | Transaction2 | 2 | Area2 | App2 | 8 | | 2 | 5 |

 

index=test_index_1 sourcetype=test_sourcetype_2
| eval epoch_Timestamp=strptime(Timestamp, "%Y-%m-%dT%H:%M:%S.%3QZ")-14400
| rename "Transaction Name" as trans_name, "Application Name" as application_name, "Status Code" as status_code
| eval service_id=case(Verb="GET" AND trans_name="Transaction1" AND application_name="APP1", "ID1", Verb="GET" AND trans_name="Transaction2" AND application_name="App2", "ID2", Verb="PUT" AND trans_name="Transaction2" AND application_name="App2", "ID3", 1=1, "Unqualified")
| where service_id!="Unqualified"
| eval Priority=case(Verb="GET" AND trans_name="Transaction1" AND application_name="APP1", "2", Verb="GET" AND trans_name="Transaction2" AND application_name="App2", "2", Verb="PUT" AND trans_name="Transaction2" AND application_name="App2", "1", 1=1, "Unqualified")
| where Priority!="Unqualified"
| eval service_area=case(Verb="GET" AND trans_name="Transaction1" AND application_name="APP1", "Area1", Verb="GET" AND trans_name="Transaction2" AND application_name="App2", "Area2", Verb="PUT" AND trans_name="Transaction2" AND application_name="App2", "Member", 1=1, "Unqualified")
| where service_area!="Unqualified"
| eval date_reference=strftime(epoch_Timestamp, "%Y-%m-%d")
| stats count(eval(status_code)) as count by service_id, Verb, trans_name, Priority, service_area, application_name, date_reference
| eval combined=service_id."@".Verb."@".trans_name."@".Priority."@".service_area."@".application_name."@"
| xyseries combined date_reference count
| rex field=combined "^(?<service_id>[^\@]+)\@(?<Verb>[^\@]+)\@(?<trans_name>[^\@]+)\@(?<Priority>[^\@]+)\@(?<service_area>[^\@]+)\@(?<application_name>[^\@]+)\@$"
| fillnull value="0"
| table service_id, Verb, trans_name, Priority, service_area, application_name
    [ makeresults
    | addinfo
    | eval time = mvappend(relative_time(info_min_time,"@d"),relative_time(info_max_time,"@d"))
    | fields time
    | mvexpand time
    | makecontinuous time span=1d
    | eval time=strftime(time,"%F")
    | reverse
    | stats list(time) as time
    | return $time
    ]
| rename service_id as "Service ID", Verb as "Resource Name", trans_name as "Transaction Name", Priority as "Priority", service_area as "Service Area", application_name as "Consumer"


niketn
Legend

@adnankhan5133 you are performing too much preprocessing and filtering of the data after pulling it from the index and before your stats command, which will hurt the performance of your query.

You are also adjusting a string time field by four hours at search time. If your data is being indexed with the wrong timestamp, you should handle timestamp extraction and timezone adjustment while indexing the data, not in every piece of search-time SPL, which impacts performance. Please try the following search and confirm whether it matches your expectation.
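For reference, a minimal props.conf sketch of that index-time fix (untested; the stanza name, TIME_PREFIX, and TZ are assumptions, and you should only set TZ if the events really are US Eastern local time despite the trailing Z, since genuine UTC timestamps need no fixed offset at all):

[test_sourcetype_2]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3NZ
MAX_TIMESTAMP_LOOKAHEAD = 30
# Assumption: events are Eastern local time; omit TZ if they are true UTC
TZ = US/Eastern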

1. I have _time calculated from the Timestamp string field; ideally, though, you should handle this while indexing the data.

2. I have filtered for only the three combinations of Verb, Transaction Name, and Application Name, so there is no need to pull unwanted data from the index and filter it out later.

3. I have used timechart to create empty time buckets, keyed by a calculated field built by combining Verb, Transaction Name, and Application Name.

4. Finally, I used transpose to turn the row data into columns and vice versa. (The epoch time is converted to a string of format YYYY-mm-dd before the transpose so that it shows up properly as a column header.)

Keep adding one pipe at a time to understand what each step is doing.
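For illustration, here is a tiny run-anywhere sketch of points 3 and 4 (the two makeresults events are placeholder data): timechart returns zero-count buckets for the days with no events, and transpose flips those dates into column headers.

| makeresults
| eval _time=relative_time(now(), "-3d@d")
| append [| makeresults ]
| timechart span=1d count
| eval Time=strftime(_time, "%Y-%m-%d")
| fields - _*
| transpose 0 header_field="Time"

The transposed output has one row per former column (here just count) and one column per date, including the zero-count days in between.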

 

index=test_index_1 sourcetype=test_sourcetype_2 (Verb="GET" AND "Transaction Name"="Transaction1" AND "Application Name"="APP1") OR (Verb="GET" AND "Transaction Name"="Transaction2" AND "Application Name"="App2") OR (Verb="PUT" AND "Transaction Name"="Transaction2" AND "Application Name"="App2")
| eval _time=strptime(Timestamp, "%Y-%m-%dT%H:%M:%S.%3QZ")-14400
| eval key=Verb."|".'Transaction Name'."|".'Application Name'
| timechart span=1d count('Status Code') as count by key 
| eval service_id=case(Verb="GET" AND 'Transaction Name'="Transaction1" AND 'Application Name'="APP1", "ID1", 
    Verb="GET" AND 'Transaction Name'="Transaction2" AND 'Application Name'="App2", "ID2", 
    Verb="PUT" AND 'Transaction Name'="Transaction2" AND 'Application Name'="App2", "ID3") 
| eval Priority=case(Verb="GET" AND 'Transaction Name'="Transaction1" AND 'Application Name'="APP1", "2",
    Verb="GET" AND 'Transaction Name'="Transaction2" AND 'Application Name'="App2", "2", 
    Verb="PUT" AND 'Transaction Name'="Transaction2" AND 'Application Name'="App2", "1") 
| eval service_area=case(Verb="GET" AND 'Transaction Name'="Transaction1" AND 'Application Name'="APP1", "Area1",
    Verb="GET" AND 'Transaction Name'="Transaction2" AND 'Application Name'="App2", "Area2",
    Verb="PUT" AND 'Transaction Name'="Transaction2" AND 'Application Name'="App2", "Member")
| eval Time=strftime(_time,"%Y-%m-%d")
| fields - _*
| transpose 0 header_field="Time" column_name=key
| eval key=split(key,"|"), Verb=mvindex(key,0), "Transaction Name"=mvindex(key,1), "Application Name"=mvindex(key,2)
| fields - key
| fields Verb "Transaction Name" "Application Name" *

 

The following is a run-anywhere example based on the sample data provided:

 

| makeresults 
| eval Timestamp=strftime(_time,"%Y-%m-%dT%H:%M:%S.%3QZ") 
| eval data="Verb=\"GET\",\"Transaction Name\"=\"Transaction1\",\"Application Name\"=\"APP1\";Verb=\"GET\",\"Transaction Name\"=\"Transaction2\",\"Application Name\"=\"App2\";Verb=\"PUT\",\"Transaction Name\"=\"Transaction2\",\"Application Name\"=\"App2\"" 
| makemv data delim=";" 
| mvexpand data 
| rename data as _raw 
| extract pairdelim="," kvdelim="=" 
| eval "Status Code"=random() 
| rename "Transaction_Name" as "Transaction Name","Application_Name" as "Application Name" 
| eval _time=strptime(Timestamp, "%Y-%m-%dT%H:%M:%S.%3QZ")-14400
| eval key=Verb."|".'Transaction Name'."|".'Application Name'
| timechart span=1d count('Status Code') as count by key 
| eval service_id=case(Verb="GET" AND 'Transaction Name'="Transaction1" AND 'Application Name'="APP1", "ID1", 
    Verb="GET" AND 'Transaction Name'="Transaction2" AND 'Application Name'="App2", "ID2", 
    Verb="PUT" AND 'Transaction Name'="Transaction2" AND 'Application Name'="App2", "ID3") 
| eval Priority=case(Verb="GET" AND 'Transaction Name'="Transaction1" AND 'Application Name'="APP1", "2",
    Verb="GET" AND 'Transaction Name'="Transaction2" AND 'Application Name'="App2", "2", 
    Verb="PUT" AND 'Transaction Name'="Transaction2" AND 'Application Name'="App2", "1") 
| eval service_area=case(Verb="GET" AND 'Transaction Name'="Transaction1" AND 'Application Name'="APP1", "Area1",
    Verb="GET" AND 'Transaction Name'="Transaction2" AND 'Application Name'="App2", "Area2",
    Verb="PUT" AND 'Transaction Name'="Transaction2" AND 'Application Name'="App2", "Member")
| eval Time=strftime(_time,"%Y-%m-%d")
| fields - _*
| transpose 0 header_field="Time" column_name=key
| eval key=split(key,"|"), Verb=mvindex(key,0), "Transaction Name"=mvindex(key,1), "Application Name"=mvindex(key,2)
| fields - key
| fields Verb "Transaction Name" "Application Name" *

 

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

adnankhan5133
Communicator

Thanks @niketn - this is definitely helpful. We ended up getting a lot of assistance from @to4kawa, @xpac, and @Richfez, who successfully resolved the issue. It did boil down to the timestamp logic.

Here is the final and working SPL:

index=hcnc_mulesoft_test_new sourcetype=api_events_legacy_new
| rename "API Name" as api_name, "Application Name" as application_name, "Status Code" as status_code 
| eval service_id=case(Verb="GET" AND api_name="Provider Data Management API" AND application_name="PHP-AMHC", "PRO-I-001-0001", Verb="GET" AND api_name="Member Data API" AND application_name="Enrollment Broker", "MEM-I-001-0001", Verb="PUT" AND api_name="Member Data API" AND application_name="Enrollment Broker", "MEM-I-002-0001", 1=1, "Unqualified") 
| where service_id!="Unqualified" 
| eval Priority=case(Verb="GET" AND api_name="Provider Data Management API" AND application_name="PHP-AMHC", "2", Verb="GET" AND api_name="Member Data API" AND application_name="Enrollment Broker", "2", Verb="PUT" AND api_name="Member Data API" AND application_name="Enrollment Broker", "1", 1=1, "Unqualified") 
| where Priority!="Unqualified" 
| eval service_area=case(Verb="GET" AND api_name="Provider Data Management API" AND application_name="PHP-AMHC", "Provider", Verb="GET" AND api_name="Member Data API" AND application_name="Enrollment Broker", "Member", Verb="PUT" AND api_name="Member Data API" AND application_name="Enrollment Broker", "Member", 1=1, "Unqualified") 
| where service_area!="Unqualified" 
| eval combined=service_id."@".Verb."@".api_name."@".Priority."@".service_area."@".application_name."@" 
| timechart span=1d count(eval(status_code)) as count by combined 
| untable _time combined count 
| eval date=strftime(_time, "%Y-%m-%d") 
| xyseries combined date count 
| rex field=combined "^(?<service_id>[^@]+)@(?<Verb>[^@]+)@(?<api_name>[^@]+)@(?<Priority>[^@]+)@(?<service_area>[^@]+)@(?<application_name>[^@]+)@$" 
| fields - combined 
| table service_id Verb api_name Priority service_area application_name 
    [ makeresults 
    | addinfo 
    | eval time = mvappend(relative_time(info_min_time,"@d"),relative_time(info_max_time,"@d")) 
    | fields time 
    | mvexpand time 
    | makecontinuous time span=1d 
    | eval time=strftime(time,"%F") 
    | reverse 
    | stats list(time) as time 
    | return $time
        ] 
| rename service_id as "Service ID", Verb as "Resource Name", api_name as "API Name", Priority as "Priority", service_area as "Service Area", application_name as "Consumer"
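For illustration, here is the timestamp trick from this SPL in isolation, as a run-anywhere sketch (the combined value is a placeholder): timechart fills in empty daily buckets, untable flattens them back into rows, and xyseries pivots them up again with human-readable dates as column headers.

| makeresults
| eval _time=relative_time(now(), "-2d@d")
| append [| makeresults ]
| eval combined="ID1@GET@Transaction1@"
| timechart span=1d count by combined
| untable _time combined count
| eval date=strftime(_time, "%Y-%m-%d")
| xyseries combined date count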

niketn
Legend

All my points about performance improvement are still valid. Do run my SPL as well and compare the performance of your SPL against mine using the Job Inspector. I have tried to apply some basic Splunk search best practices (obviously, SPL will still work without them):

https://docs.splunk.com/Documentation/Splunk/latest/Search/Quicktipsforoptimization

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

adnankhan5133
Communicator

Anyone? Hoping someone can help out here...


to4kawa
Ultra Champion

Add:

| foreach 2020* [ eval <<FIELD>>=if(isnull('<<FIELD>>') OR !match('<<FIELD>>',"\S+"),"No Data",'<<FIELD>>') ]
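For illustration, a run-anywhere sketch of that foreach pattern (the field names and values are made up, and quoting the left-hand <<FIELD>> is an assumption to cope with the dashes in the date column names):

| makeresults count=2
| streamstats count as row
| eval "2020-08-12"=if(row=1, 3, 8), "2020-08-11"=if(row=1, null(), 5)
| fields - _time row
| foreach 2020* [ eval "<<FIELD>>"=if(isnull('<<FIELD>>') OR !match('<<FIELD>>',"\S+"), "No Data", '<<FIELD>>') ]

Row 1 should come back with 2020-08-11 shown as "No Data", while row 2 keeps its values.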


adnankhan5133
Communicator

Hi @to4kawa - thanks for the reply. However, when I include the following statement after the makeresults subsearch (which returns the time columns), the query ends up creating a new column literally called "<<FIELD>>" (or whatever value I substitute in, such as '<<count>>' or 'count').

 

| foreach 2020* [ eval <<FIELD>> =if(isnull('<<FIELD>>') OR !match('<<FIELD>>',"\S+"),"No Data",'<<FIELD>>')]

 

Any suggestions on how to remedy this?


to4kawa
Ultra Champion
| makeresults 
| fillnull value=10 2020-08-01 2020-08-02 count
| foreach 2020* [ eval <<FIELD>>=random() % 10 + '<<FIELD>>']

This query works fine, so something extra must be going on in your query.

My query doesn't produce a literal "<<FIELD>>" column.


adnankhan5133
Communicator

Hey @to4kawa - I tried your updated query. It ends up populating the 2020-08-01 and 2020-08-02 columns with random numbers.

Is there a way to get the date columns associated with non-existent values to populate with "0"? For example, when the column with today's date (2020-08-14) is generated at 12 AM EST (US/Eastern), the column values are null/blank. They only populate later in the morning once certain transactions begin occurring. The requirement is to ensure that those null/blank values display as "0". The current query isn't meeting that requirement. I tried including both of your recommendations, but those aren't getting the null/blank values to populate with "0" either.


adnankhan5133
Communicator
index=test_index_1 sourcetype=test_sourcetype_2
| eval epoch_Timestamp=strptime(Timestamp, "%Y-%m-%dT%H:%M:%S.%3QZ")-14400
| rename "Transaction Name" as trans_name, "Application Name" as application_name, "Status Code" as status_code
| eval service_id=case(Verb="GET" AND trans_name="Transaction1" AND application_name="APP1", "ID1", Verb="GET" AND trans_name="Transaction2" AND application_name="App2", "ID2", Verb="PUT" AND trans_name="Transaction2" AND application_name="App2", "ID3", 1=1, "Unqualified")
| where service_id!="Unqualified"
| eval Priority=case(Verb="GET" AND trans_name="Transaction1" AND application_name="APP1", "2", Verb="GET" AND trans_name="Transaction2" AND application_name="App2", "2", Verb="PUT" AND trans_name="Transaction2" AND application_name="App2", "1", 1=1, "Unqualified")
| where Priority!="Unqualified"
| eval service_area=case(Verb="GET" AND trans_name="Transaction1" AND application_name="APP1", "Area1", Verb="GET" AND trans_name="Transaction2" AND application_name="App2", "Area2", Verb="PUT" AND trans_name="Transaction2" AND application_name="App2", "Member", 1=1, "Unqualified")
| where service_area!="Unqualified"
| eval date_reference=strftime(epoch_Timestamp, "%Y-%m-%d")
| stats count(eval(status_code)) as count by service_id, Verb, trans_name, Priority, service_area, application_name, date_reference
| eval combined=service_id."@".Verb."@".trans_name."@".Priority."@".service_area."@".application_name."@"
| xyseries combined date_reference count
| rex field=combined "^(?<service_id>[^\@]+)\@(?<Verb>[^\@]+)\@(?<trans_name>[^\@]+)\@(?<Priority>[^\@]+)\@(?<service_area>[^\@]+)\@(?<application_name>[^\@]+)\@$"
| fillnull value="0"
| table service_id, Verb, trans_name, Priority, service_area, application_name
    [ makeresults
    | addinfo
    | eval time = mvappend(relative_time(info_min_time,"@d"),relative_time(info_max_time,"@d"))
    | fields time
    | mvexpand time
    | makecontinuous time span=1d
    | eval time=strftime(time,"%F")
    | reverse
    | stats list(time) as time
    | return $time
    ]
| rename service_id as "Service ID", Verb as "Resource Name", trans_name as "Transaction Name", Priority as "Priority", service_area as "Service Area", application_name as "Consumer"

adnankhan5133
Communicator

I also tried the following, along with filldown, fillnull, and a couple of eval statements, but unfortunately I'm not making any progress.

| foreach notNULL
    [ eval time = if(isnull(time) OR len(time)==0, "0", time) ]