Splunk Search

How do I find which records are in the larger of 2 CSV files created with outputcsv?

genesiusj
Builder

Hello,

I have 2 CSV files created using outputcsv. Because of their size (500K+ records each) and because they are really data sources rather than true lookups (which would require bundle replication), outputcsv was used instead of outputlookup. The issue I am having is how to find which records are only in the larger of the CSV files. If these were generated with outputlookup, no problem:

| inputlookup csv1
| lookup csv2 field2 as field1 output field2a
| where isnull(field2a)

However, with outputcsv files, the lookup command doesn't know that the CSV is NOT in the app's lookups folder. There should be a way to override the default location within the SPL.

Thanks and God bless,
Genesius

Edited: 500K+ records each, not 500 each. This is why I have to use the | outputcsv command.
The | outputlookup command would create very large bundles on the indexer.


bowesmana
SplunkTrust

If you are trying to use outputcsv to create LOOKUP files that can be used as lookups - you can't.

outputcsv creates files that are NOT available for lookups. inputcsv can be used to retrieve a file created with outputcsv, but it is still NOT a lookup file and you cannot use it as a lookup.

The issue around bundle replication is real and creating huge lookups does break replication for all apps on the search head, so you are sensibly trying to address the issue.

However, if you need to use one of your CSVs as a lookup and you don't want that huge file to be replicated, then you will need to configure settings in the app's distsearch.conf.

See the spec here

https://docs.splunk.com/Documentation/Splunk/9.0.4/Admin/Distsearchconf#.27classic.27_REPLICATION-SP...

and you should look at these two parameters

concerningReplicatedFileSize = <integer>
excludeReplicatedLookupSize = <integer>

This will allow you to create lookups with outputlookup and use lookup against those lookups you have created.
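As a rough illustration of where those two settings live: in the distsearch.conf spec they sit under the [replicationSettings] stanza and are expressed in MB. The numbers below are placeholders chosen for this sketch, not recommendations.

```conf
# distsearch.conf on the search head; values are illustrative assumptions
[replicationSettings]
# Log a warning in splunkd.log for any bundle file larger than this many MB
concerningReplicatedFileSize = 50
# Leave lookup files larger than this many MB out of the knowledge bundle
# (0 disables the size-based exclusion)
excludeReplicatedLookupSize = 100
```

A lookup excluded this way is not present on the indexers, so a search that uses it would likely need to run the lookup on the search head, for example with the lookup command's local=true option.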

Note that outputcsv is also not supported in Splunk Cloud, so it is not a practical option if you are going to migrate to Cloud at any time.

 


genesiusj
Builder

Just as a follow-up.

The reason for the large lookup files is that we have to run multiple dbxquery and dbxlookup commands to bring data in from multiple sources (hundreds of thousands of records). However, we are now working with our DBAs to write more efficient queries that join multiple tables/views, hopefully reducing the number and runtime of the queries.

Thanks @bowesmana and @richgalloway for your help.
God bless,
Genesius

richgalloway
SplunkTrust

Have you tried the inputcsv command?

---
If this reply helps you, Karma would be appreciated.

genesiusj
Builder

@richgalloway 

A lookup file created using the | outputcsv command is not accessible to the | lookup command. Unless I missed it somewhere, when the | lookup command is run, Splunk looks for the whatever.csv file in the current application's lookups folder.

/opt/splunk/etc/apps/search/lookups

Files created by the | outputcsv command, however, are in the /opt/splunk/var/run/splunk/csv folder, and the | lookup command has no way of knowing this.

Thanks and God bless,
Genesius


richgalloway
SplunkTrust

I suppose that's all true, but it has nothing to do with my answer.  I suggested the inputcsv command, the logical counterpart to outputcsv.
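One way inputcsv alone might answer the original question, without any lookup, is to read both files into a single result set and keep the keys that appear only in the first file. This is a sketch under assumptions: csv1.csv and csv2.csv stand in for the real file names, field1 and field2 are the matching key fields from the question, and source_file and key are helper field names made up for the example.

```spl
| inputcsv csv1.csv
| eval source_file="csv1"
| inputcsv append=true csv2.csv
| eval source_file=coalesce(source_file, "csv2")
| eval key=if(source_file="csv1", field1, field2)
| stats values(source_file) AS source_file BY key
| where mvcount(source_file)=1 AND source_file="csv1"
```

The first eval tags the rows from the first file, and the coalesce tags the appended rows, which arrive without a source_file value. The stats step collapses everything to one row per key, so a key whose only source is csv1 has no counterpart in the second file. Using append=true on inputcsv rather than an append subsearch should avoid the subsearch result limits, which matters with 500K+ rows per file. Swapping "csv1" for "csv2" in the final where clause would list the keys found only in the second file.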


genesiusj
Builder

@richgalloway 

I believe I might have confused the issue.

When a lookup file is created with the | outputlookup command, it is available to be used with the | lookup command.

When a lookup file is created with the | outputcsv command, it is not available to be used with the | lookup command.

Both of my lookup files, abc.csv and xyz.csv, were created with the | outputcsv command. Therefore, the | lookup command cannot be used.

This works.

| inputlookup abc.csv
| lookup xyz.csv field1 output field2

This does not.

| inputcsv abc.csv
| lookup xyz.csv field1 output field2

Thanks and God bless,
Genesius


richgalloway
SplunkTrust

I agree that the issue is confused.  First, the inputlookup command doesn't work, then it does. I don't know where to go from here.

Have you considered using outputlookup rather than outputcsv?
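If switching is an option, a file already written with outputcsv could in principle be converted rather than regenerated, by reading it with inputcsv and writing it back with outputlookup. The sketch below shows two separate searches, run one after the other. It assumes the abc.csv and xyz.csv names and the field1 key from earlier in the thread; xyz_field2 is a field name made up for the example so that an existing field2 in abc.csv is not overwritten.

```spl
| inputcsv xyz.csv
| outputlookup xyz.csv

| inputcsv abc.csv
| lookup xyz.csv field1 OUTPUT field2 AS xyz_field2
| where isnull(xyz_field2)
```

The first search copies xyz.csv into the lookups folder so the lookup command can find it. The second lists the abc.csv rows that have no match in xyz.csv. The bundle replication concern raised in the accepted solution would still apply to the new lookup file.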
