Hello,
I have 2 CSV files created using outputcsv. Because of their size (500K+ records) and because they are really data sources and not true lookups (which would require bundle replication), outputcsv was used rather than outputlookup. The issue I am having is how to find which records are only in the larger of the CSV files. If these were generated with outputlookup, no problem.
| inputlookup csv1
| lookup csv2 field2 as field1 output field2a
| where isnull(field2a)
However, with outputcsv files, Splunk doesn't know that the CSV is NOT in the app/lookups folder. There should be a way to override the default location within the SPL.
Thanks and God bless,
Genesius
Edited: 500K+ records each, not 500 each. This is why I have to use the | outputcsv command.
The | outputlookup command would create very large knowledge bundles to replicate to the indexers.
If you are trying to use outputcsv to create LOOKUP files that can be used as lookups - you can't.
outputcsv creates files that are NOT available for lookups. inputcsv can be used to retrieve a file created with outputcsv, but it is still NOT a lookup file and you cannot use it as a lookup.
The issue around bundle replication is real and creating huge lookups does break replication for all apps on the search head, so you are sensibly trying to address the issue.
However, if you need to use one of your CSVs as a lookup and you don't want that huge file to be replicated, then you will need to configure settings in the app's distsearch.conf.
See the distsearch.conf spec and look at these two parameters:
concerningReplicatedFileSize = <integer>
excludeReplicatedLookupSize = <integer>
This will allow you to create lookups with outputlookup and use lookup against those lookups you have created.
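As a rough sketch of what that could look like, both parameters go in the [replicationSettings] stanza of distsearch.conf on the search head, and both are sizes in MB. The numbers below are illustrative assumptions only, not recommendations, so check them against the spec for your version.

```
# distsearch.conf on the search head (values are placeholders)
[replicationSettings]
# Log a warning in splunkd.log for any bundle file larger than this many MB.
concerningReplicatedFileSize = 50
# Leave lookup files larger than this many MB out of the knowledge bundle.
excludeReplicatedLookupSize = 100
```

A lookup file that is excluded this way exists only on the search head, so the lookup command may need local=true to run there rather than on the indexers.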
Note that outputcsv is also not supported in Splunk Cloud, so it is not a practical option if you are going to migrate to Cloud at any time.
Just as a follow up.
The reason for the large lookup files is that we have to run multiple dbxquery and dbxlookup commands to bring data in from multiple sources (hundreds of thousands of records). However, we are now working with our DBAs to write more efficient queries that join multiple tables/views, which will hopefully reduce the number and runtime of the queries.
Thanks @bowesmana and @richgalloway for your help.
God bless,
Genesius
Have you tried the inputcsv command?
A lookup file created using the | outputcsv command is not accessible to the | lookup command. Unless I missed it somewhere, when the | lookup command is run, Splunk looks for the whatever.csv file in the current application's lookups folder:
/opt/splunk/etc/apps/search/lookups
Files created by the | outputcsv command, however, are in the /opt/splunk/var/run/splunk/csv folder, and the | lookup command has no way of knowing this.
Thanks and God bless,
Genesius
I suppose that's all true, but it has nothing to do with my answer. I suggested the inputcsv command, the logical counterpart to outputcsv.
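For what it's worth, here is a minimal sketch of how inputcsv alone might find the records that are only in one file, without the lookup command at all. It assumes the file and field names from the original question (csv1 keyed on field1, csv2 keyed on field2), that neither file already has columns named key or src (both are names made up for this example), and that inputcsv's append=true option is available in your version.

```
| inputcsv csv1
| eval key=field1, src="csv1"
| inputcsv append=true csv2
| eval key=coalesce(key, field2), src=coalesce(src, "csv2")
| stats values(src) as src by key
| where mvcount(src)=1 AND src="csv1"
```

This returns only the key values that appear in csv1 and not in csv2, not the full records, and it avoids a subsearch, so the usual subsearch result limits should not come into play with 500K+ rows.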
I believe I might have confused the issue.
When a lookup file is created with the | outputlookup command, it is available to be used with the | lookup command.
When a lookup file is created with the | outputcsv command, it is not available to be used with the | lookup command.
Both of my lookup files, abc.csv and xyz.csv, were created with the | outputcsv command. Therefore, the | lookup command cannot be used.
This works.
| inputlookup abc.csv
| lookup xyz.csv field1 output field2
This does not.
| inputcsv abc.csv
| lookup xyz.csv field1 output field2
Thanks and God bless,
Genesius
I agree that the issue is confused. First, the inputlookup command doesn't work, then it does. I don't know where to go from here.
Have you considered using outputlookup rather than outputcsv?