Getting Data In

lookup csv file contains multiple occurrences of items. Need to query these items in an index for each unique time stamp range in csv

ebele
New Member

lookup csv format where EVENT_ID can have multiple SiteID fields and SiteID can have multiple EVENT_IDs. Only SiteID is a field in the splunk index.

YEAR, SiteID, earliest_date, latest_date, EVENT_ID
2019, AB111, 1560988800, 1562112000, ABSE00350
2019, AB111, 1562198400, 1563321600, ABSE00351
2019, AB111, 1548892800, 1550016000, ABSE00352
2019, AB112, 1548892800, 1550016000, ABSE00352

I use the lookup to query an index, to calculate a KPI for each row.

Ideal query Output (KPI computed for unique combination of SiteID & EVENT_ID):

SiteID, KPI, EVENT_ID
AB111, 68.4, ABSE00350
AB111, 74.3, ABSE00351
AB111, 22.1, ABSE00352
AB112, 34.5, ABSE00352

This is the top of my code, where I do the inputlookup, before proceeding to calculate the KPI from data in the index. However it only gives me a result with a single aggregate of the SiteID, not unique per row as desired.

index=pm_busy_half_hour
[| inputlookup Test2.csv
| rename earliest_date as earliest, latest_date as latest
| table SiteID earliest latest
]
.....
.....
.....

Please advise

0 Karma

ololdach
Builder

Hi ebele,
I assume that only the SiteID field is in your indexed events, not the EVENT_ID. If the assumption is correct, then a possible solution could be:

index=pm_busy_half_hour | dedup SiteID | lookup Test2.csv SiteID as SiteID output EVENT_ID | mvexpand EVENT_ID | lookup Test2.csv EVENT_ID output earliest_date,latest_date  | eval KPI=round(random() % 1000/10,1)

First, select all SiteID's that you'd like the KPI on and make sure they are unique (dedup). Then lookup all EVENT_IDs for the SiteIDs that you are interested in. They will be returned as MV multivalue field. Use mvexpand to make every EVENT_ID into a unique event. Use the same lookup file again, using the EVENT_ID as key this time, to retrieve the dates for each EVENT_ID. From here on do whatever you do to calculate the KPI. The random is just to show the concept in a search that does somethin.

Hope it helps
Oliver

0 Karma

ebele
New Member

Thanks @ololdach for the feedback on using multivalue field. Yes, only the SiteID field is in the indexed events.

After I try this, it seems the search is not recognizing 'earliest_date' and 'latest_date' as time fields. It does not calculate the KPI for each row between these time windows. Instead uses the default time range picker. I believe that that is why I did this renaming in my earlier sample code:

| rename earliest_date as earliest, latest_date as latest

Also, I need to apply this to a lookup csv with same format, but where EVENT_ID can have multiple SiteID fields and SiteID can have multiple EVENT_IDs. (added fourth row to illustrate this):

YEAR, SiteID, earliest_date, latest_date, EVENT_ID
2019, AB111, 1560988800, 1562112000, ABSE00350
2019, AB111, 1562198400, 1563321600, ABSE00351
2019, AB111, 1548892800, 1550016000, ABSE00352
2019, AB112, 1548892800, 1550016000, ABSE00352

Ideal query Output: (KPI computed for unique combination of SiteID & EVENT_ID)
SiteID, KPI, EVENT_ID
AB111, 68.4, ABSE00350
AB111, 74.3, ABSE00351
AB111, 22.1, ABSE00352
AB112, 34.5, ABSE00352

0 Karma

ololdach
Builder

Hi ebele,

you have epoch timestamps in your sample data. If you want the the timestamps to display as human readable dates, use an

 

... | eval human_earliest=strftime(earliest_date,"%c") | ...

 

 If you have a dashboard with a time range picker that populates a token called "time_selection" and would like to  have your search to deliver only the events in the selected time range add the following to your search:

 

...| where earliest_date>=$time_selection.earliest$ AND latest_date<$time_selection.latest$ | ...

 

Last, but not least, if you would like to honor the timestamps in your standard search bar, using the standard time selector on the right of the splunk search field... add the following to your search:

...| addinfo | where earliest_time>=info_min_time AND latest_time<info_max_time | ...

I assume that you only want those KPIs that have both, the earliest and the latest time inside the time range. You may need to tweak that to meet your requirements.

Hope it helps,

Oliver

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.

Can’t make it to .conf25? Join us online!

Get Updates on the Splunk Community!

Can’t Make It to Boston? Stream .conf25 and Learn with Haya Husain

Boston may be buzzing this September with Splunk University and .conf25, but you don’t have to pack a bag to ...

Splunk Lantern’s Guide to The Most Popular .conf25 Sessions

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

Unlock What’s Next: The Splunk Cloud Platform at .conf25

In just a few days, Boston will be buzzing as the Splunk team and thousands of community members come together ...