This answer came to me from support:

1. You cannot select only specific lines to be published to Log Analytics. The analytics agent reads the entire file configured under the log analytics source rule, with all its contents, and sends every line to ES as a log event. So how much disk space the log analytics data consumes depends entirely on how many log files you are monitoring and how big they are; this cannot be controlled manually.

2. Once the data is in ES, you can use regex and grok patterns for field extraction; however, every raw message line that the analytics agent reads is still published to the Events Service, as mentioned in point 1. To see only certain data, you can use ADQL filters and display only the events you are interested in.

3. All the data that was present in your log files is now in ES, stored in various shards based on timestamp and spacing. How much data ES keeps depends on your retention period. If your analytics retention is, say, 90 days (as determined by your license units and the retention configuration you have set in the controller), all the data will be stored for at least 90 days, and when those indices expire they will be deleted from the backend automatically. However, if that is too much data, because your ES is currently a single node and does not have enough resources to store the volume you are sending, you may choose to delete older data and keep data for a shorter duration, such as 30 days, 10 days, or 8 days, which is the minimum retention period. This does not mean you must either delete all 90 days of log analytics data or keep all of it; rather, you can choose a shorter retention period so that old data is deleted and does not occupy space unnecessarily.

Now regarding "Having so many resources allocated just for extracting errors from logs does not seem like the right way to me."
None of the recommendations was to fetch only ERROR data from the logs, since, as clearly stated, that cannot be done by product design. Rather, the recommendations addressed how, given that we cannot control what reaches ES from your log files, you can still manage your data and disk space so that you keep the useful data and discard the rest, without having to worry about using more disk space on this host.

Regarding "Alternatively, could you recommend me how to select only errors from the log files?": this is already answered in point 1.
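
To make the retention trade-off described by support a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is my own illustration, not something from AppDynamics: the function name `estimate_disk_gb` and the 5 GB/day figure are hypothetical, and the estimate ignores ES indexing overhead, compression, and replicas, so the real footprint will differ. It only captures the idea that every ingested line is kept for the whole retention period.

```python
def estimate_disk_gb(daily_log_gb: float, retention_days: int) -> float:
    """Rough raw-data footprint: everything ingested per day is kept
    until the retention period expires (overhead/compression ignored)."""
    # Support states that 8 days is the minimum retention period.
    if retention_days < 8:
        raise ValueError("retention must be at least 8 days")
    return daily_log_gb * retention_days


# Hypothetical daily volume of monitored log files; replace with your own.
DAILY_LOG_GB = 5.0

# Compare the retention periods mentioned in the support answer.
for days in (90, 30, 10, 8):
    print(f"{days:>2} days retention -> ~{estimate_disk_gb(DAILY_LOG_GB, days):.0f} GB")
```

Under these assumptions, going from 90 days to 8 days of retention shrinks the stored volume proportionally, which is the only lever available since the ingested lines themselves cannot be filtered.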