We have Cisco ESA logs, and searching for events with transaction takes a long time, so I created a summary index for them. What is the best way to ingest 30 days of logs into the summary index? I tried it for the last 24 hours using the search below, but the search broke after a certain time.
index=cisco_esa | transaction keepevicted=true icid mid | collect index="summary" marker="report=cisco_esa"
Cisco ESA logs don't scale well. Even the default search for Cisco's ESA app starts with the following, which is horrible if you have a large log volume.
* | transaction icid dcid mid | ...
With multiple subsearches and nested queries, you can build a much faster (but more complicated) search that ties everything together with the internal_message_id and the other internal IDs.
For a summary index, I recommend including only the information required for your reports. For any transactions, add constraints like endswith to reduce the number of open transactions.
I see a couple of long-term solutions:
I don't know why DalJeanis used the autoregress command; streamstats (with the current/window options) or transaction seems better suited for this.
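For illustration, a streamstats version of the same idea might look like the following. This is only a sketch, assuming the same MID/ICID/bytes extractions used in the sample search farther down the page; it relies on streamstats ignoring null values by default, so last(ICID) carries the most recent ICID forward onto the "bytes" records within each MID.

earliest=-7h@h latest=-1h@h index=cisco_esa "MID" ("ICID" OR "bytes") | rex field=_raw "MID (?<MID>\d+)" | rex field=_raw "ICID (?<ICID>\d+)" | rex field=_raw " (?<myBytes>\d+) bytes" | sort 0 MID _time | streamstats current=true last(ICID) as ICID by MID | stats sum(myBytes) as myBytes min(_time) as startTime by MID ICID | bin startTime as _time span=1h | stats count as hourCount sum(myBytes) as hourBytes by _time

This avoids the two autoregress passes and the prevMID/prevICID comparison entirely, at the cost of the fill-forward behavior being implicit rather than explicit.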
earliest=-7h@h latest=-1h@h `comment("Use the Cisco ESA TA for supported extractions")` sourcetype="cisco:esa:legacy" AND ( (mid=* AND icid=*) OR (mid=* AND message_size=*)) `comment("Look for MID/ICID mapping and message sizes")` | transaction icid, mid maxspan=1m startswith="Start MID" endswith=" bytes from " | bin _time span=1h | eval message_size=coalesce(message_size, 0) | stats count, min(message_size) AS min_size, mean(message_size) AS avg_size, max(message_size) AS max_size, sum(message_size) AS total_bytes by _time
And it is even faster if you don't need to include a size of 0 for messages without a size.
earliest=-7h@h latest=-1h@h `comment("Use the Cisco ESA TA for supported extractions")` sourcetype="cisco:esa:legacy" AND (mid=* AND message_size=*) `comment("Look for message sizes")` | bin _time span=1h | stats count, min(message_size) AS min_size, mean(message_size) AS avg_size, max(message_size) AS max_size, sum(message_size) AS total_bytes by _time
Dang, I lost a long response.
Briefly, then: your query would essentially duplicate all the information from the ESA logs into the summary index, yielding no time savings.
You need to pare down what you are trying to collect to what you actually need. Don't try to boil the ocean; just cook the fish you really want.
And... transaction is great, but 80% of the time it's not needed, and in this case it was the anchor slowing your query down in the first place.
So, list the metrics that you need to report, and determine what kind of event record gets you that. Select ONLY those records, and THEN use
stats if you can, or
transaction if you have to.
This is a code sample assuming you want the count of email messages and the number of bytes sent from the Cisco messages for each unit of time (say, each hour). It's based on the sample data posted farther down the page, which I got from a Cisco web site.
The only complicated part is copying the ICID onto the "bytes" record, which is missing it. Note that if you don't need to report anything about the individual messages, you wouldn't even need to do that; just add up all the bytes for each _time increment.
earliest=-7h@h latest=-1h@h index=cisco_esa "MID" ("ICID" OR "bytes") | rex field=_raw "MID (?<MID>\d+)" | rex field=_raw "ICID (?<ICID>\d+)" | rex field=_raw " (?<myBytes>\d+) bytes" | sort 0 MID _time | autoregress ICID as prevICID p=1 | autoregress MID as prevMID p=1 | eval ICID=coalesce(ICID,if(MID=prevMID,prevICID,null())) | stats min(_time) as startTime, max(_time) as endTime, sum(myBytes) as myBytes by MID ICID | bin startTime as _time span=1h | stats count as hourCount, sum(myBytes) as hourBytes by _time
Record formats were assumed based on this sample data posted at the Cisco URL http://www.cisco.com/c/en/us/support/docs/security/email-security-appliance/118232-technote-esa-00.h...
Mon Apr 17 19:56:22 2003 Info: New SMTP ICID 5 interface Management (10.1.1.1) address 10.1.1.209 reverse dns host remotehost.com verified yes
Mon Apr 17 19:57:20 2003 Info: Start MID 6 ICID 5
Mon Apr 17 19:57:20 2003 Info: MID 6 ICID 5 From: <email@example.com>
Mon Apr 17 19:58:06 2003 Info: MID 6 ICID 5 RID 0 To: <firstname.lastname@example.org>
Mon Apr 17 19:59:52 2003 Info: MID 6 ready 100 bytes from <email@example.com>
Mon Apr 17 19:59:59 2003 Info: ICID 5 close
Can you elaborate on the problem you are trying to solve?
What results are you anticipating? Looking at your search, it will still take a long time, because you use transaction before sending the results to the summary index.
Yes. Once I send the events to the summary index after the transaction, I will schedule the search every hour to add new results. I have a requirement to show metrics for the last 60 days, plus weekly reports.
How are you currently summarizing: using Splunk's built-in acceleration, or the collect command?
You can push historical data into a summary index using the collect command. However, you should run several tests to make sure the metadata for the historical data (especially _time) is accurate and consistent with the existing summary. The collect command has a testmode option you can enable during your test runs. Collect also lets you write to an index of your choice, so you can create a dummy index and validate your historical summary against the current summary.
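As a sketch of that workflow (the summarized fields and the dummy index name summary_test are assumptions, not from your environment), a backfill run for one historical day might look like:

index=cisco_esa earliest=-30d@d latest=-29d@d (mid=* AND message_size=*) | bin _time span=1h | stats count sum(message_size) as total_bytes by _time | collect index=summary_test testmode=true marker="report=cisco_esa"

With testmode=true, collect previews the events without writing them. Run one day at a time, walking the earliest/latest window back across the 30 days, verify the _time values land where you expect, and only then switch to testmode=false and index=summary.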