Have I run out of space with my lookup? What should I do?

I am running into trouble while trying to accumulate data into a CSV. I am collecting data on external IP addresses scanning my company's web sites. The query ran great for months, but suddenly a bunch of data disappeared from the CSV where I had been storing it. I tried rebuilding the data, but I can't make the total any larger.
The total query size is not large compared to other queries that I have run, but the CSV is up to 50K records, my largest CSV so far.
Is there some limit to what I can do with the CSV?
This is the query that I am running (with a few identifying markers removed):
index=web_servers (file=*.cfm OR file=*.jsp)
NOT [|inputlookup AuthorizedScanners.csv | fields + ip_address |rename ip_address as clientip]
| bin _time span=10s
| stats earliest(_time) as firstseen by clientip
| eval reason="request for unreasonable web resources"
| eval days=strftime(firstseen,"%F") | rename clientip as src_ip
| append [|inputlookup web_scanning_found.csv | makemv delim=" " days |mvexpand days]
| eval lastseen=coalesce(lastseen,firstseen)
| sort 0 +src_ip, firstseen, lastseen
| stats values(reason) as reason min(firstseen) as firstseen max(lastseen) as lastseen dc(days) as days_seen values(days) as days by src_ip
| eval first_seen=strftime(firstseen,"%F %T") | eval last_seen=strftime(lastseen, "%F %T")
| table src_ip reason, firstseen, lastseen, first_seen, last_seen, days_seen, days | outputlookup web_scanning_found.csv
The query gives me web_scanning_found.csv, which contains:
- the IP address of the scanner
- a reason for identifying the scanner
- the timestamp when the scanner first showed up
- the timestamp when the scanner was last seen
- formatted dates for the above two
- the number of days that the scanner has been seen in our environment
- an mv field of the days that the scanner has been seen in our environment
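For illustration, a single row might look like this (values entirely made up):
src_ip,reason,firstseen,lastseen,first_seen,last_seen,days_seen,days
203.0.113.7,request for unreasonable web resources,1499242422,1500572465,2017-07-05 08:13:42,2017-07-20 17:41:05,3,2017-07-05 2017-07-12 2017-07-20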
Running this query with dates of 7/1-7/15 gives me 49,667 records.
If I then run it for 7/16-7/26, the results drop to 48,618 records.
I can rerun 7/1-7/15 and get 49,667 records again.
Since I am just appending and doing stats by src_ip, the total number of src_ip rows should never drop, so I must be exceeding some limit.
What I have tried:
- Early on I ran into a similar problem because of the 10K limit on sort, so I added 0 to the sort command to remove that limit.
- To figure out whether I was limited in the size of a CSV, I tried removing the final outputlookup command, but I still lost records, so the limit is not in what the CSV can hold.
Anyone have an idea where I am going wrong? What else can I try?

I was just reading up on the table command this week, and with that suspicious number 50K, it might be your culprit. Check this first.
FROM THIS PAGE - http://docs.splunk.com/Documentation/SplunkCloud/6.6.0/SearchReference/Table
Truncated results
The table command truncates the number of results returned based on settings in the limits.conf file. In the [search] stanza, if the value for the truncate_report parameter is 1, the number of results returned is truncated. The number of results is controlled by the max_count parameter in the [search] stanza. If truncate_report is set to 0, the max_count parameter is not applied.
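If you can get someone to check the server, the stanza in limits.conf would look something like this (the values here are illustrative, not a recommendation):
[search]
# if set to 1, commands like table truncate results at max_count; 0 leaves them alone
truncate_report = 0
# cap on the number of results retained per search (I believe the default is 500000)
max_count = 500000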

Worth checking. I do not have access to the server to check config files, but I can try running a query that writes a table with more than 50K results:
|inputlookup web_scanning_found.csv | append [|inputlookup web_scanning_found.csv] | table *
Doing that gives me a table with 100,023 results.
So that is a little weird...
web_scanning_found.csv has 50,023 records in it. Appending it to itself should have resulted in 100,046 records, but it looks like only 50,000 were added.
The append docs show that 50K is the default maxout, so I'll see what changing that does.
http://docs.splunk.com/Documentation/SplunkCloud/6.6.1/SearchReference/Append
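The idea would be to raise maxout on the append itself, along these lines (200000 is just an arbitrary ceiling for testing):
| append maxout=200000 [|inputlookup web_scanning_found.csv | makemv delim=" " days |mvexpand days]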

Looks like append cannot exceed max_count from the limits.conf file.
https://answers.splunk.com/answers/30678/append-and-max-results-50000.html
Guess I could try changing the order of my query: start with the lookup and append the smaller query result set instead.
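Roughly like this (untested sketch; the lookup becomes the base search, so only the new search results, which are well under 50K per run, have to squeeze through append):
|inputlookup web_scanning_found.csv | makemv delim=" " days | mvexpand days
| append [search index=web_servers (file=*.cfm OR file=*.jsp)
    NOT [|inputlookup AuthorizedScanners.csv | fields + ip_address | rename ip_address as clientip]
    | bin _time span=10s
    | stats earliest(_time) as firstseen by clientip
    | eval reason="request for unreasonable web resources"
    | eval days=strftime(firstseen,"%F") | rename clientip as src_ip]
| eval lastseen=coalesce(lastseen,firstseen)
| sort 0 +src_ip, firstseen, lastseen
| stats values(reason) as reason min(firstseen) as firstseen max(lastseen) as lastseen dc(days) as days_seen values(days) as days by src_ip
| eval first_seen=strftime(firstseen,"%F %T") | eval last_seen=strftime(lastseen, "%F %T")
| table src_ip reason, firstseen, lastseen, first_seen, last_seen, days_seen, days | outputlookup web_scanning_found.csv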

53K results!
So that must be at least part of it. It cannot be the whole story, though: the CSV contained 50,023 unique IP addresses, so even if append only added 50,000 of them, the final result should have been at least 50,000 (since the stats are by IP address), whereas my original query returned 49,213.

Definitely something else going on. I ran the query for several periods. As previously noted, I would expect the total results to climb or stay the same with each run:
range       total
6/16-6/30:  59,079
6/1-6/15:   63,155
6/16-6/30:  63,155  (makes sense, since I had already run this range)
5/16-5/31:  66,724
5/1-5/25:   71,464
6/1-6/15:   71,882
I had already run this last range, so the total should not have grown past 71,464! The fact that it did means that, alongside the growth over the last couple of runs, the query must also have been losing records along the way.

@MonkeyK, have a look at the collect command to write to an index instead of a CSV. http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect
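For example, ending your search with collect instead of outputlookup (the index name is a placeholder, and the index would need to be created first):
index=web_servers (file=*.cfm OR file=*.jsp)
NOT [|inputlookup AuthorizedScanners.csv | fields + ip_address |rename ip_address as clientip]
| stats earliest(_time) as firstseen by clientip
| eval reason="request for unreasonable web resources"
| collect index=web_scanning_summary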
| makeresults | eval message= "Happy Splunking!!!"

Hmm... A couple of thoughts on the summary index:
1) My goal in accumulating data is to support taking action on the IP addresses found (blocking them temporarily). Even if I send them to an index, I will still need to track the blocked state for each IP address, which, I think, would require another lookup.
2) I was thinking in terms of what happens by IP address.
Records sent to an index would be immutable, no? So rather than a single record per scanning IP address, I would have a record every time that IP address showed up in a search. This is especially relevant if I plan to run the query frequently.
I suppose that I could just add a newer current-state event each time, so that might not be so bad... but then my query would have to find the latest event for each IP address (or maybe the latest event for each day) in order to track trends. I'd have to think about this some more to figure out how I would track scanning in an immutable dataset.
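If I went that route, pulling the current state per IP would presumably look something like this (the index name and the blocked field are hypothetical):
index=web_scanning_summary
| stats latest(blocked) as blocked max(_time) as lastseen by src_ip
| eval last_seen=strftime(lastseen,"%F %T")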

Oh, reading up on summary indices, maybe I am mistaken. I think what I am reading is that they summarize a whole result set...

See if this alleviates the problem a little bit.
index=web_servers (file=*.cfm OR file=*.jsp)
NOT [|inputlookup AuthorizedScanners.csv | fields + ip_address |rename ip_address as clientip]
| stats max(_time) as firstseen by clientip
| eval reason="request for unreasonable web resources"
| eval days=strftime(firstseen,"%F") | rename clientip as src_ip
| append [|inputlookup web_scanning_found.csv | makemv delim=" " days |mvexpand days]
| eval lastseen=coalesce(lastseen,firstseen)
| stats values(reason) as reason min(firstseen) as firstseen max(lastseen) as lastseen dc(days) as days_seen values(days) as days by src_ip
| eval first_seen=strftime(firstseen,"%F %T") | eval last_seen=strftime(lastseen, "%F %T")
| table src_ip reason, firstseen, lastseen, first_seen, last_seen, days_seen, days

Remove the bin, sort, and outputlookup lines? OK, I just tried that. The number of records still decreases.
To make it concrete:
|inputlookup web_scanning_found.csv
shows 50,023 records, but if I run the above query for 7/16-7/31, I get 49,213.
