
Why is the map command, in the jellyfisher app, neither returning all results nor erroring out?

teresachila
Path Finder

I am using the jellyfisher app and I want to calculate the Jaro-Winkler distance between a list of words (1150 unique values, comparing every combination), which means it should return 1150x1150=1,322,500 results. My query, using the map command, is as follows:

| inputlookup wordlist.csv | rename word as sw1  
| map [|inputlookup wordlist.csv  | rename word as sw2 | eval sw3="$sw1$" | jellyfisher jaro_winkler(sw2,sw3) | eval sid=$_serial_id$] maxsearches=1150

It is working, except that it only returns the first 50,600 results (1150x44). The sid field only shows values up to 44. There is no error in the search log; in fact, the search log shows the evaluation running all the way up to sid=1150, but those results never appear in the output. Is there some configuration that is restricting this?
Thanks!

1 Solution

DalJeanis
SplunkTrust

First, as a general case, subsearches are limited to 50K records. You can check the [subsearch] stanza in your limits.conf for the exact details.
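
For reference, the relevant settings look something like the following. Treat these values as illustrative only; verify the defaults shipped with your own Splunk version:

# limits.conf (illustrative values; check your instance)
[subsearch]
# maximum number of results a subsearch may return
maxout = 10000
# maximum number of seconds a subsearch may run
maxtime = 60

[searchresults]
# maximum number of result rows held for any search (commonly 50000)
maxresultrows = 50000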

In this case, since it sounds like you are getting results that then go away, you might be running into some other limitation, such as how many megabytes of results you are allowed to keep, or how long the system will run your search before timing out.

Consider sending each individual run to a separate file with something like this inside your map:

... | outputcsv myoutput.$_serial_id$.csv | where false()  ...

The outputcsv writes the records to a CSV file, and the where false() then drops the data stream so that no overall record limit is hit. It won't help you with time limits, but you can deal with those in any number of ways.
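
Applied to your search, that would look something like this (a sketch reusing the lookup and field names from your original query):

| inputlookup wordlist.csv | rename word as sw1
| map [| inputlookup wordlist.csv | rename word as sw2 | eval sw3="$sw1$" | jellyfisher jaro_winkler(sw2,sw3) | eval sid=$_serial_id$ | outputcsv myoutput.$_serial_id$.csv | where false()] maxsearches=1150

Each of the 1150 iterations then writes its rows to its own myoutput.<sid>.csv on the search head, regardless of how many rows the UI is willing to return.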

teresachila
Path Finder

Thanks! I can see all the files on the search head, and each file has the correct number of lines, so there is no time-limit issue. I still don't get the results in the UI, but at least this is a workaround. Thanks!
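
For anyone who needs the per-run files stitched back together inside a search, one possible (untested) sketch is a second map over the serial ids; be aware that pulling everything back into a single result set may hit the same row limits that caused the problem in the first place:

| makeresults count=1150
| streamstats count as sid
| map [| inputcsv myoutput.$sid$.csv] maxsearches=1150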
