I can see in the saved search [http session id lookup] from the Dell SonicWall Analytics App 1867 that the executed query looks every 60 minutes for the session IDs that have been logged, matches them against the ones already present in the lookup file sonicwall_http_session_id.csv, appends the new ones, and stores everything again.
index=sonicwall tid=257 (app_name="General HTTPS" OR app_name="General HTTP" OR app_name="HTTP") | inputlookup sonicwall_http_session_id.csv append=t | dedup session_id | fields session_id, src_ip, dest_ip, app_name | fields - _* | outputlookup sonicwall_http_session_id.csv
This has resulted in a very large lookup file, currently more than 4 GB in size with over 70 million records. Running the above search now takes more than 15 minutes, and memory consumption is significant during that time.
Is anyone else facing the same issue? Or is there a recommendation for keeping this file from growing without limit?
Yeah, this is a problem I've been putting off investigating until today. Are we the only two people using dsa?
I just set up a cron job to delete the file every day lest it grow too big and cause the distributed bundle replication manager to start choking.
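For reference, the cleanup amounts to a daily cron entry invoking something like the script below. The lookup path is an assumption; check where the app actually keeps the file on your search head.

```shell
#!/bin/sh
# Daily cleanup invoked from cron, e.g.: 0 3 * * * /opt/scripts/clean_dsa_lookup.sh
# The path below is an assumption -- verify the app's lookups directory on your search head.
LOOKUP="${LOOKUP:-/opt/splunk/etc/apps/dsa/lookups/sonicwall_http_session_id.csv}"
rm -f "$LOOKUP"   # -f: no error if the file is already gone
```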
Many thanks for the answer. Maintaining a Splunk app's files from outside Splunk doesn't seem like the right approach, since you then have to deal with documentation, handover, and other operational questions to keep everyone informed.
Instead, I tried to handle it within the DSA app itself: I wrote two little queries using inputlookup and outputlookup, saved them as reports, and configured a monthly schedule.
The first query takes the content of the sonicwall_http_session_id.csv file and writes it into sonicwall_http_session_id.bak.csv:
| inputlookup sonicwall_http_session_id.csv | outputlookup sonicwall_http_session_id.bak.csv
The second query then reads the file sonicwall_http_session_id.csv, keeps the first 15'000'000 entries (roughly one month's worth for us), and writes the content back into the same file:
| inputlookup sonicwall_http_session_id.csv | head 15000000 | outputlookup sonicwall_http_session_id.csv
With this setup, we retain this file's information in the system for two months. Optionally, you can replace the .bak in the first query with today's date and keep the rotated files on the filesystem for as long as you like.
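On the filesystem side, keeping dated copies can be sketched like this. It is a self-contained demo: a temporary directory stands in for the app's lookups directory, and the header row is a stand-in for the real file's contents.

```shell
#!/bin/sh
# Demo of keeping date-stamped copies of the lookup.
# A temp dir stands in for the app's real lookups directory (an assumption).
LOOKUP_DIR="$(mktemp -d)"
SRC="$LOOKUP_DIR/sonicwall_http_session_id.csv"
printf 'session_id,src_ip,dest_ip,app_name\n' > "$SRC"   # stand-in header row
# Copy to a dated file, e.g. sonicwall_http_session_id.2024-01-31.csv
BAK="$LOOKUP_DIR/sonicwall_http_session_id.$(date +%Y-%m-%d).csv"
cp "$SRC" "$BAK"
```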
Both reports are scheduled to run once a month, with the second set up to run 15 minutes after the first.
This setup has worked fine for us so far; any better idea for dealing with this issue is most welcome. Currently we apply the above rotation only to *session_id.csv because it's the fastest growing, but a more 'standardized' way of doing it would be preferable.
Yeah, this is a problem I've been putting off addressing until today.
I had set up a cron job to delete the file every day lest it grow too big and cause the distributed bundle replication manager to start choking.
Are we the only two people using dsa? I don't see how this app is workable with this csv file.
*sorry I "answered" but should have commented.