Map command - How to make different searches in different indexes ?

bdfr49 · 06-04-2020

Hello,

I am currently trying to correlate "front" logs with "back" logs by sessionId and timestamp, in order to understand the errors I am getting (putting "front results" and "back results" side by side).

The logic "flow" is as follows :

1/ I look for service A logs that returned a 400 HTTP code ("front" logs, so I need to be in my first index : let's call it front_index).
2/ For each log (one log = one error that occurred), I want to extract its timestamp and its sessionId.
3/ For each row I get, I want to look for "back" logs (which means switching to my second index : let's call it back_index) matching the timestamp and the sessionId. Each "front" log can have several "back" logs.
4/ Finally, I want to print some details such as the timestamp, the sessionId, a detailed errorCode if present, a count if relevant, etc. But that's not the point 🙂

If I am doing it manually, here are the two searches I am running :

search 1 :

index=front_index sourcetype=access_combined "/url/of/my/service" http_response_code=400

results 1 : a list of logs from which I can manually extract the sessionId and the timestamp of each log I want to analyse

search 2 :

index="back_index" ** **

results 2 : I get different kinds of logs that I read manually in order to extract the information I am looking for

This works well, but on large amounts of data, it's just.. not the way it should be done 😛

So here is what I tried in order to go faster :

search 3 :

index=front_index sourcetype=access_combined "/url/of/my/service" http_response_code=400
| table hour, minute, sessionId
| map search="search index=back_index $hour$:$minute$ $sessionId$ | table _timestamp, session, errorCode"

expectation : I expect the first part to extract the hour, the minute and the sessionId of each log found in the front_index, and that part seems fine. I then want the second search to iterate over each row of the first one and find every log in the back_index that matches the timestamp and the sessionId (minute precision is enough, as the timestamps in my logs don't always match perfectly).

My issue seems to be that I can't change the index I am working on. All the data I retrieve comes from the front_index, even though I know the data I am looking for is there. A first step would be to get data from both indexes in the final list of events (or at least from the back_index, as that is where I will get the details I want).
And I can't figure out why I can't do that. I tried to use wildcards in order to search both indexes, as their names are partially the same, but it does not seem to work.

I looked through the various topics related to the map command, but I did not find anything that could help me (or I missed it) or, worse, I totally misunderstood something about the command.
The map command seems to be the right way to do what I am trying to do, but.. if there is a better/simpler way, I am of course interested too.

Thanks for your help,

b

richgalloway · 06-04-2020

I suspect this is not the best use of map. For one, map runs at most 10 searches by default (its maxsearches argument), which may not be enough. Worse, however, is that you may find yourself scanning a very large back_index once for every row the first search returns.
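
For reference, the cap on iterations is map's maxsearches argument. A minimal sketch of the original attempt with the cap raised, assuming hour, minute and sessionId are extracted fields on the front events; the value 200 is an arbitrary placeholder, not a recommendation:

```spl
index=front_index sourcetype=access_combined "/url/of/my/service" http_response_code=400
| table hour, minute, sessionId
| map maxsearches=200 search="search index=back_index \"$hour$:$minute$\" \"$sessionId$\" | table _time, _raw"
```

Each row of the first search launches one separate search of back_index, which is where the cost comes from.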

Try combining the two searches the Splunk way. Assuming you have "hour", "minute", and "sessionId" fields in both indexes, this should do it.

(index=front_index sourcetype=access_combined "/url/of/my/service" http_response_code=400) OR index=back_index 
| stats values(*) as * by hour, minute, sessionId
| table _timestamp, session, errorCode

---
If this reply helps you, Karma would be appreciated.

bdfr49 · 06-11-2020

Thanks for your answer !
I understand that the map command is not the best way to do this in terms of performance (I did not think about that while building my query, but I should have 😅)

But the query you gave me does not seem to return what I expect.

The search I make on the "front_index" is supposed to give me the ID and the timestamp of each iteration I need to check on the "back_index". Following this idea, I am expecting to get (in my final table) only the logs that matched the ID and the timestamp which should lead to one, two or maybe three logs for each iteration.
But here, I am getting far more logs than expected. Some of them don't even match what I can manually see on the front (so they are out of "scope").

I was expecting the following command to match front logs with back logs based on the fields that follow (hour, minute, sessionId), but it does not seem to work like that :

| stats values(*) as * by hour, minute, sessionId

Have I misunderstood the way this command works ?

Here, without using the table command at the end, I am getting a table which has unrelated logs (back logs from 22h35m** while I have no front logs for this timestamp and the http code I am looking for).

In short, I am trying to restrict my "back search" based on what my "front search" returned.

Again, thank you for your help.
I'll keep trying 😅

richgalloway · 06-11-2020

Let's see if we can reduce the number of excess events. This modified query will only consider events that have a sessionId field.

(index=front_index sourcetype=access_combined "/url/of/my/service" http_response_code=400 sessionId=*) OR (index=back_index sessionId=*)
| stats values(*) as * by hour, minute, sessionId
| table _timestamp, session, errorCode

If that's not enough, try adding 'hour=* minute=*' to each phrase in the base query.
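
One reason unrelated back logs can survive is that stats builds a group for every hour/minute/sessionId combination it sees, including combinations that only occur in back_index. A minimal sketch that keeps only the groups fed by both indexes, assuming hour, minute, sessionId and errorCode are extracted fields in both indexes (indexes is a hypothetical field name introduced here):

```spl
(index=front_index sourcetype=access_combined "/url/of/my/service" http_response_code=400 sessionId=*) OR (index=back_index sessionId=*)
| stats values(index) as indexes, values(errorCode) as errorCode by hour, minute, sessionId
| where mvcount(indexes) = 2
| table hour, minute, sessionId, errorCode
```

Because only two indexes are searched, a count of 2 means the group contains at least one front 400 event and at least one back event.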


bdfr49 · 06-15-2020

One of the issues I have (which I did not make explicit because I had not thought about it until now, sorry) is that my sessionId is cleanly indexed in the front logs (i.e. I get sessionId=xxxxxxxxx), whereas it is not in the back logs (there the sessionId appears concatenated with another id and is not indexed as a field; the format of the log is, let's say, "perfectible")

Currently, I have just made a dashboard to "help" me think :

1/ Extraction of the sessionIds related to the 400 errors (and their hour, minute, second, ...)

index="front_index" sourcetype=access_combined "/url/of/my/service" http_code=400 | table timestamp, hour, minute, second, sessionId

So the results are as follows :

timestamp	hour	minute	second	sessionId
2020-06-19 18:21:53	18	21	53	sessionIdOfMySecondError
2020-06-19 18:10:45	18	10	45	sessionIdOfMyFirstError

2/ When I click on one of the lines of my previous search, I extract and re-use some values to identify the back logs that I want.

index="back_index" *$sessionId$* *$hour$:$minute$:$second$* | where errorCode!="" | table timestamp, errorCode

As the errorCode I want is not always there and is not properly indexed (again - I had to extract it with a regex), I use the "where" clause to exclude the events where it is missing.

So, if I click on the first row of my first table :

timestamp	errorCode
2020-06-19 18:10:45	explicit_error_code

Here is an example of the back log event I am interested in (there are some others with a different template, but this is the first one) :

2020-06-19|18:21:53|ACH|someId_sessionId|ERROR|my.domain.error.mapper.GenericExceptionMapper|ServiceException contextualErrorCode=xxxx, reponse will be ErrorCode=explicit_error_code1

This works fine, but it forces me to "blindly" investigate each front log hoping for some helpful back logs.

So what I would like in the end would be as follows :

timestamp	sessionId	errorCode
2020-06-19 18:21:53	sessionIdOfMySecondError	explicit_error_code2
2020-06-19 18:10:45	sessionIdOfMyFirstError	explicit_error_code1

The map command looked ideal for getting this kind of result (leaving performance aside, although it must of course be taken into account), whereas using the "OR" clause with the stats command seemed to "just mix the events" (regardless of the timestamp or the sessionId), so it does not seem to let me build the table I want.

Don't get me wrong : I am not sure about that, maybe I am totally wrong 🙂

I thought about using :

subsearch
join

But they don't seem to be meant for the "loopy" logic I have in mind (which may not be the best way to look at it, but I am struggling to think about it differently right now)
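
For what it's worth, a subsearch can express "restrict the back search by what the front search returned" without an explicit loop. A minimal sketch, assuming the back log layout of the example event above (the fourth pipe-delimited field is someId_sessionId, with no underscore inside someId) and that ErrorCode=... is followed by a value without spaces; term is a hypothetical helper field and 1000 is an arbitrary cap on the number of sessions passed to the outer search:

```spl
index=back_index
    [ search index=front_index sourcetype=access_combined "/url/of/my/service" http_response_code=400
      | dedup sessionId
      | eval term = "*" . sessionId . "*"
      | return 1000 $term ]
| rex field=_raw "^[^|]*\|[^|]*\|[^|]*\|[^|_]*_(?<sessionId>[^|]+)\|"
| rex field=_raw "\bErrorCode=(?<errorCode>\S+)"
| where isnotnull(errorCode)
| table _time, sessionId, errorCode
```

The subsearch runs first and hands the outer search a list of wildcarded session terms joined with OR, so back_index is scanned once rather than once per front log. This matches on the session only; the time window is whatever range the outer search uses, and leading wildcards can be slow on a large index.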

Thanks again.
Hope I am not completely wrong about this (missing the point of your solutions :))
