
Why is the map command, in the jellyfisher app, neither returning all results nor erroring out?

teresachila
Path Finder

I am using the jellyfisher app and I want to calculate the Jaro-Winkler distance between a list of words (1150 unique values, comparing every combination), which means it should return 1150x1150=1,322,500 results. My query, using the map command, is as follows:

| inputlookup wordlist.csv | rename word as sw1  
| map [|inputlookup wordlist.csv  | rename word as sw2 | eval sw3="$sw1$" | jellyfisher jaro_winkler(sw2,sw3) | eval sid=$_serial_id$] maxsearches=1150

It is working, except that it only returns the first 50,600 results (1150x44). The sid field only shows values up to 44. There is no error in the search log; in fact, the search log shows the evaluation running all the way up to sid=1150, but those results never appear in the output. Is there some configuration that is restricting this?
Thanks!

1 Solution

DalJeanis
SplunkTrust

First, as a general case, subsearches are limited to 50K records. You can check the [subsearch] stanza in your limits.conf for the exact details.
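
For reference, the relevant settings look something like the following. Treat these values as illustrative only; verify the defaults shipped with your own Splunk version:

# limits.conf (illustrative values; check your instance)
[subsearch]
# maximum number of results a subsearch may return
maxout = 10000
# maximum number of seconds a subsearch may run
maxtime = 60

[searchresults]
# maximum number of result rows held for any search (commonly 50000)
maxresultrows = 50000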

In this case, since it sounds like you are getting results that then go away, you might be running into some other limitation, such as how many megabytes of results you are allowed to keep, or how long the system will run your search before timing out.

Consider sending each individual run to a separate file with something like this inside your map:

... | outputcsv myoutput.$_serial_id$.csv | where false()  ...

The outputcsv writes the records to a CSV file, and the where false() then drops the data stream so that no overall record limit is hit. It won't help you with time limits, but you can deal with those in any number of ways.
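
Applied to your search, that would look something like this (a sketch reusing the lookup and field names from your original query):

| inputlookup wordlist.csv | rename word as sw1
| map [| inputlookup wordlist.csv | rename word as sw2 | eval sw3="$sw1$" | jellyfisher jaro_winkler(sw2,sw3) | eval sid=$_serial_id$ | outputcsv myoutput.$_serial_id$.csv | where false()] maxsearches=1150

Each of the 1150 iterations then writes its rows to its own myoutput.<sid>.csv on the search head, regardless of how many rows the UI is willing to return.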

teresachila
Path Finder

Thanks! I can see all the files on the search head, and each file has the correct number of lines, so there is no time-limit issue. I still don't get the results in the UI, but at least this is a workaround. Thanks!
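
For anyone who needs the per-run files stitched back together inside a search, one possible (untested) sketch is a second map over the serial ids; be aware that pulling everything back into a single result set may hit the same row limits that caused the problem in the first place:

| makeresults count=1150
| streamstats count as sid
| map [| inputcsv myoutput.$sid$.csv] maxsearches=1150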
