Getting Data In

Why can Splunk only read 10000 lines from my CSV? I need 9000000!

christianubeda
Path Finder

Hi team!

I have a problem.

I want to match two fields. The first is src_ip from an index (traffic events); the second is an IP from a CSV.

My CSV has 9,000,000 lines and inputlookup can only read the first 10,000 lines...

How can I do it?

0 Karma

woodcock
Esteemed Legend

It is hard to answer when you do not show us your search. Why would you not share your SPL?

0 Karma

woodcock
Esteemed Legend

DO NOT USE JOIN. It has limits. Try this:

index=cesa_paloalto sourcetype="pan:traffic" type=TRAFFIC vendor_action=allow
| lookup ipsmalware2.csv ip AS src_ip OUTPUT ip AS keepMeIfNonNull
| where isnotnull(keepMeIfNonNull)
| stats values(src_ip)
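
A variation on the search above, offered as a sketch rather than a tested answer: it collapses the events to one row per src_ip before calling lookup, so the lookup runs once per distinct source IP instead of once per event. It assumes the CSV column header is Ip (the spelling used in the original poster's query; field names are case-sensitive, so adjust it if the header is actually ip). The output names event_count and matched_ip are illustrative only. The lookup command is not a subsearch, so the subsearch result cap should not apply to it.

```spl
index=cesa_paloalto sourcetype="pan:traffic" type=TRAFFIC vendor_action=allow
| stats count AS event_count BY src_ip
| lookup ipsmalware2.csv Ip AS src_ip OUTPUT Ip AS matched_ip
| where isnotnull(matched_ip)
| fields src_ip event_count
```

Only source IPs that also appear in the CSV remain in the result, each with the number of matching traffic events.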

christianubeda
Path Finder

This is the query

index=cesa_paloalto sourcetype="pan:traffic" type=TRAFFIC vendor_action=allow | join src_ip [| inputlookup append=t ipsmalware2.csv | eval src_ip=Ip]
| stats values(src_ip)

I inserted 5 test IPs into my CSV, at these lines:

Line 1: OK
Line 9999: OK
Line 10004: FAIL
Line 12000: FAIL
Line 222333 (last one): FAIL

0 Karma

Vijeta
Influencer

Do you have 9000000 unique IPs in your lookup? If not, can you use dedup on the IP field:

index=cesa_paloalto sourcetype="pan:traffic" type=TRAFFIC vendor_action=allow | join src_ip [| inputlookup append=t ipsmalware2.csv | dedup ip|eval src_ip=Ip]
| stats values(src_ip)
0 Karma

christianubeda
Path Finder

Hi,

Yes, I have 9000000 unique IPs.

0 Karma

Vijeta
Influencer

How many unique IPs does your index return within the timeframe you are searching?

0 Karma

Vijeta
Influencer

If that is fewer than 10K, you can try the search below:

|inputlookup append=t ipsmalware2.csv | eval src_ip=Ip|join src_ip type=inner[|search index=cesa_paloalto sourcetype="pan:traffic" type=TRAFFIC vendor_action=allow ]
0 Karma

christianubeda
Path Finder

Hi Vijeta,

I only get matches if the IP from my CSV is in the first 10000 lines. My CSV actually has 9000000 lines, so it didn't work.

If I run |inputlookup append=t ipsmalware2.csv on its own, I see all the rows. The problem is when I try to match them...

0 Karma

Vijeta
Influencer

That is not because of the matching; it is a limitation of subsearches. The subsearch only returns 10K results to the outer search, which is why all the remaining rows appear as not matched.
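
One way to check this without involving a subsearch at all is to feed a single known IP through the lookup command. This is only a sketch: 203.0.113.10 is a placeholder from a documentation address range and should be replaced with one of the test IPs sitting beyond row 10000 in the CSV. The column name Ip is taken from the query posted in this thread, and matched_ip and found are illustrative names.

```spl
| makeresults
| eval src_ip="203.0.113.10"
| lookup ipsmalware2.csv Ip AS src_ip OUTPUT Ip AS matched_ip
| eval found=if(isnotnull(matched_ip), "yes", "no")
| table src_ip found
```

If found comes back as "yes" for an IP that the join-based search missed, that supports the idea that the subsearch cap, not the CSV itself, is dropping the rows.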

0 Karma

Vijeta
Influencer

Are you using the lookup in a subsearch? That is probably what limits it to 10K results, as there is a maximum on subsearch results. Can you use dedup on the IP field, or avoid the subsearch by some other means? It would also help if you could paste your query here.

0 Karma