Index based on Raw Data?

Cuyose
Builder

I run a Python script to get data from MDB files into an indexer. It basically creates events with source, host, sourcetype, and raw data. We are almost always concerned only with reporting on the raw data. Millions of rows are generated in CSV format, and I have created custom fields within the raw data, which Splunk has no problem identifying.

The issue is that a search to find 2 distinct values in those 12 million+ rows takes forever: it parses all 12 million rows before returning the values.

Cuyose
Builder

index = perfdata | dedup LR_Run_Name

Here LR_Run_Name is in the raw data and we extracted the field value. Out of millions of rows there are only a handful of unique values in the indexed raw data.

I checked fields.conf and there are no entries in there for the fields we extracted. The existing stanzas all look like this:

[sourcetype]
INDEXED = True
INDEXED_VALUE = False

Would I add something like this?
[LR_Run_Name]
INDEXED = True
INDEXED_VALUE = False

sideview
SplunkTrust

Can you paste the exact search you're using?

In a nutshell, if Splunk is having to read all the data off disk, then the most likely reason is that your search terms are not in the initial search clause, i.e. you're doing something like:

`sourcetype=foo | <some other command(s)> | search <searchterms>`
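To make the difference concrete, here is a toy Python model. It is only a sketch and not how Splunk is actually implemented; the function names (`build_index`, `indexed_search`, `scan_search`) and the sample events are made up for illustration. It assumes that terms in the initial clause can be resolved through a term index, while a term applied after a pipe only filters events that have already been read.

```python
from collections import defaultdict

def build_index(events):
    """Map each whitespace-separated token to the positions of events containing it."""
    index = defaultdict(set)
    for pos, raw in enumerate(events):
        for token in raw.lower().split():
            index[token].add(pos)
    return index

def indexed_search(events, index, term):
    """Term in the initial clause: look it up first, then read only matching events."""
    positions = sorted(index.get(term.lower(), set()))
    return [events[p] for p in positions], len(positions)

def scan_search(events, term):
    """Term applied after a pipe: every event is read, then filtered."""
    term = term.lower()
    hits = [raw for raw in events if term in raw.lower().split()]
    return hits, len(events)

# Hypothetical sample data: one rare event among many common ones.
events = ["run_a 200 ok"] * 5 + ["run_b 500 error"] + ["run_a 200 ok"] * 4
index = build_index(events)

hits_fast, read_fast = indexed_search(events, index, "error")
hits_slow, read_slow = scan_search(events, "error")
print(read_fast, read_slow)  # events read by each approach
```

Both functions return the same hits, but the first one only touches the events the index points at, while the second touches all of them. With a rare term and millions of rows, that gap is roughly what you would feel as "takes forever".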

Then there are a lot of other strange possibilities. To take a random example, you could have a foo="bar" term, and you could have it in the initial search clause, but for some reason something could have set INDEXED_VALUE to false in fields.conf for that field.
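As a rough illustration of why that setting matters, here is another toy Python sketch. Again, this is not Splunk's code; `field_search`, `extract`, and the `indexed_value` flag are hypothetical names that only mimic the idea. The assumption is that when a field's value is expected to appear in the raw text, a foo="bar" search can first look up `bar` as a token and then check the extracted field, and that when the value is not expected in the raw text, every event has to be read and extracted.

```python
import re
from collections import defaultdict

def build_index(events):
    """Map each word-like token to the positions of events containing it."""
    index = defaultdict(set)
    for pos, raw in enumerate(events):
        for token in re.findall(r"\w+", raw.lower()):
            index[token].add(pos)
    return index

def extract(raw, field):
    """Toy search-time extraction of field=value from the raw text."""
    match = re.search(rf"\b{re.escape(field)}=(\w+)", raw)
    return match.group(1) if match else None

def field_search(events, index, field, value, indexed_value=True):
    """Return matching events and how many events had to be read."""
    if indexed_value:
        # Narrow down by token first, then verify the extracted field.
        candidates = sorted(index.get(value.lower(), set()))
    else:
        # No token shortcut: every event is a candidate.
        candidates = range(len(events))
    hits = [events[p] for p in candidates if extract(events[p], field) == value]
    return hits, len(candidates)

# Hypothetical sample data.
events = ["foo=baz"] * 9 + ["foo=bar"]
index = build_index(events)

hits_on, read_on = field_search(events, index, "foo", "bar", indexed_value=True)
hits_off, read_off = field_search(events, index, "foo", "bar", indexed_value=False)
print(read_on, read_off)  # events read with and without the token shortcut
```

The results are identical either way; only the number of events read changes, which is the symptom described in the question.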

In any event, without seeing the search it's hard to speculate on the answer, but there most likely is an answer, and it's probably fixable.

Supriya
Path Finder

Could you please help me with how to search for multiple words in the raw data?

Simeon
Splunk Employee

What keyword are you searching for, and what is the exact query? How often does it appear in the raw data?
