Getting Data In

Why can Splunk only read 10000 lines from my CSV? I need 9000000!

christianubeda
Path Finder

Hi team!

I have a problem.

I want to match two fields. The first is src_ip from an index (traffic events); the second is an IP from a CSV.

My CSV has 9,000,000 lines, and inputlookup can only read the first 10,000 lines...

How can I do it?

0 Karma

woodcock
Esteemed Legend

It is hard to answer when you do not show us your search. Why would you not share your SPL?

0 Karma

woodcock
Esteemed Legend

DO NOT USE JOIN. It has limits. Try this:

index=cesa_paloalto sourcetype="pan:traffic" type=TRAFFIC vendor_action=allow
| lookup ipsmalware2.csv ip AS src_ip OUTPUT ip AS keepMeIfNonNull
| where isnotnull(keepMeIfNonNull)
| stats values(src_ip)

christianubeda
Path Finder

This is the query

index=cesa_paloalto sourcetype="pan:traffic" type=TRAFFIC vendor_action=allow | join src_ip [| inputlookup append=t ipsmalware2.csv | eval src_ip=Ip]
| stats values(src_ip)

I inserted 5 IPs into my CSV:

Line 1 OK
Line 9999 OK
Line 10004 FAIL
Line 12000 FAIL
Line 222333 (last one) FAIL

0 Karma

Vijeta
Influencer

Do you have 9000000 unique IPs in your lookup? If not, can you use dedup on the IP field:

index=cesa_paloalto sourcetype="pan:traffic" type=TRAFFIC vendor_action=allow | join src_ip [| inputlookup append=t ipsmalware2.csv | dedup ip|eval src_ip=Ip]
| stats values(src_ip)
0 Karma

christianubeda
Path Finder

Hi,

Yes, I have 9000000 unique IPs.

0 Karma

Vijeta
Influencer

How many unique IPs does your index return within the timeframe you are searching?

0 Karma

Vijeta
Influencer

If that is less than 10K, you can try the search below:

|inputlookup append=t ipsmalware2.csv | eval src_ip=Ip|join src_ip type=inner[|search index=cesa_paloalto sourcetype="pan:traffic" type=TRAFFIC vendor_action=allow ]
0 Karma

christianubeda
Path Finder

Hi Vijeta,

I only get matches if the IP from my CSV is in the first 10000 lines, and my CSV actually has 9000000 lines. So it didn't work.

If I run |inputlookup append=t ipsmalware2.csv on its own, I see all the rows. The problem is when I try to match them...

0 Karma

Vijeta
Influencer

That is not because of the matching; it is a limitation of subsearches. The subsearch returns only 10K results, which is why all the rest appear as not matched.
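
To picture why rows past 10K never match, here is a toy model in Python (it is not Splunk code). It assumes the 10,000-row cap mentioned in this thread and a made-up 50,000-row table; the function names are hypothetical. It compares a join against a truncated subsearch with a per-event lookup against the whole table:

```python
# Toy model of the behaviour described in this thread; NOT Splunk code.
# Assumption: the subsearch output is capped at 10,000 rows, as reported above.
SUBSEARCH_LIMIT = 10_000


def make_lookup_rows(n):
    """Generate n distinct fake IP strings standing in for the CSV rows."""
    return [f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}" for i in range(n)]


def join_with_subsearch(event_ips, lookup_rows, limit=SUBSEARCH_LIMIT):
    """Model of `join src_ip [| inputlookup ...]`.

    Only the first `limit` rows of the subsearch survive, so the join
    never sees anything beyond them.
    """
    truncated = set(lookup_rows[:limit])
    return sorted(ip for ip in set(event_ips) if ip in truncated)


def lookup_per_event(event_ips, lookup_rows):
    """Model of `| lookup ...`: every event is checked against the full table."""
    table = set(lookup_rows)
    return sorted(ip for ip in set(event_ips) if ip in table)


rows = make_lookup_rows(50_000)  # hypothetical table size, kept small for the demo

# Events whose src_ip sits at CSV rows 1, 9999, 10004, 12000 and the last row (1-based).
positions = [1, 9_999, 10_004, 12_000, 50_000]
events = [rows[p - 1] for p in positions]

joined = join_with_subsearch(events, rows)
looked_up = lookup_per_event(events, rows)

print(len(joined), len(looked_up))  # prints "2 5"
```

In this model only the IPs at rows 1 and 9999 survive the join, which matches the OK/FAIL pattern reported earlier, while the per-event lookup finds all five.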

0 Karma

Vijeta
Influencer

Are you using the lookup in a subsearch? That is probably what limits it to 10K results, since there is a maximum limit on subsearch results. Can you use dedup on the IP field or avoid the subsearch some other way? It would also help if you could paste your query here.

0 Karma