I have the ability to configure a search head but not the indexers. I am wondering if I can break multi-line netstat events into multiple events on the search head using props.conf. I realize that I can use multikv to break things up during the search, but I also need field extraction to happen after line breaking, and doing it all in the search creates really long searches.
How are you using your data? Are the results of those long searches going to be used in dashboards?
If yes, you are better off not extracting those fields during search time, but pre-fetching them into a summary index with a scheduled search and then building the dashboards off that summary index instead.
In addition to being much more efficient overall (you extract the new fields just once and reuse them as many times as necessary), this approach does not add to your license usage: you pay for the original data, but not for the data written by "collect" or by summary searches. The downside is that you cannot run the dashboards in real time; the last data point will be from the most recent run of the scheduled summary search.
An example of how I would use this netstat data would be to look for a new LISTENING port open on a host, so I might do something like (in rough pseudocode): search earliest=-2w latest=now netstat.data | stats min(_time) as first_seen by port, host | where first_seen > relative_time(now(), "-1d")
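To make the intended logic explicit, here is a minimal Python sketch of the same idea, run outside Splunk. The tuple layout, the function name new_listeners, and the default window sizes are assumptions for illustration only, not anything Splunk provides.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (epoch seconds, host, listening port)
Event = Tuple[float, str, int]


def new_listeners(events: Iterable[Event], now: float,
                  window: float = 14 * 86400,
                  recent: float = 86400) -> List[Tuple[str, int]]:
    """Return (host, port) pairs whose first sighting in the window is recent."""
    first_seen: Dict[Tuple[str, int], float] = {}
    for ts, host, port in events:
        # Ignore anything outside the two-week search window.
        if not (now - window <= ts <= now):
            continue
        key = (host, port)
        # Keep the earliest timestamp per (host, port); this is the role
        # the stats step plays in the search.
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts
    # A port counts as "new" if it was first seen within the last day.
    return sorted(key for key, ts in first_seen.items() if ts > now - recent)
```

A port that was already listening ten days ago is filtered out, while one that first appears within the last day is reported.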
Thank you for your suggestion. That was my initial approach to the problem. Then when an admin installed the Splunk_TA_nix and splunk_app_for_nix, other events like ps started getting broken up, so I figured that I could reverse engineer what it was doing and apply it to netstat. No luck.
The only concern with summary indexes is the additional space being used.
If your concern is the additional space needed for summary indexes, you can cap the original, unprocessed data at a very low size and keep the bulk of the data in the summary. We came to this eventually. We even have two types of dashboards, "static" and "live". The "live" ones do much simpler data processing, work off the raw data and have the real-time default timers (rt, for example). The "static" dashboards feed off the summary data, run historic searches and have more bells and whistles, at the expense of not being able to track data as it comes in.
Your use case seems tolerant of the fact that the data might not be available immediately but will be collected by a summary search shortly afterwards, unless the user of the dashboard wants to see new listening ports as soon as they are opened.
Update to my own post: I am able to get close by using kv_mode=multi in my props.conf. However I find that although this works for 99.8% of the hosts, there are some hosts that seem to not get parsed by multikv.
I am able to duplicate this by removing the kv_mode line from props.conf and running the netstat output manually through multikv. It works for almost all the hosts, but it works for 100% of them only if I run it with "noheader=true".
It's driving me nuts because the netstat output that parses and the netstat output that doesn't parse look exactly the same. I wish I could get a hex dump of _raw so I could look for differences, but I don't know of a way to do so.
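Outside Splunk, one way to compare two events byte for byte is a small Python helper like the sketch below. It assumes the raw text of a parsing and a non-parsing event can be exported to files or pasted in as bytes; the function names and the sample header lines are hypothetical.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return an offset / hex / ASCII dump of the given bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; tabs, CR, NUL and so on become '.'
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


def first_difference(a: bytes, b: bytes) -> int:
    """Return the index of the first differing byte, or -1 if a == b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch in the shared prefix: either identical or one is longer.
    return -1 if len(a) == len(b) else min(len(a), len(b))


if __name__ == "__main__":
    # Hypothetical samples: two header lines that look the same on screen.
    good = b"Proto Recv-Q Send-Q Local Address\n"
    bad = b"Proto Recv-Q Send-Q Local Address\r\n"
    print(hex_dump(bad))
    print("first difference at byte", first_difference(good, bad))
```

Invisible differences such as a trailing carriage return or a tab in place of spaces show up as 0d or 09 in the hex column.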
No, this is not possible. Talk to the guys that can configure the indexers and let them do the changes for you on the indexer.
If this is not possible for whatever reason, create another scripted input that returns events in the format you need and run this scripted input on the search head.
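A minimal sketch of such a scripted input in Python, assuming Linux-style `netstat -an` output (proto, Recv-Q, Send-Q, local address, foreign address, optional state). The key=value layout and the field names are assumptions, chosen so that Splunk's automatic key=value extraction can pick them up at search time; adjust them to whatever format you need.

```python
import subprocess
import sys
import time
from typing import List


def to_events(netstat_output: str, now: float) -> List[str]:
    """Turn `netstat -an` text into one key=value event line per socket."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    events = []
    for line in netstat_output.splitlines():
        parts = line.split()
        # Keep only TCP/UDP rows; this skips headers and UNIX-socket rows.
        if len(parts) < 5 or not parts[0].lower().startswith(("tcp", "udp")):
            continue
        proto, local, foreign = parts[0], parts[3], parts[4]
        # UDP rows have no state column (assumed placeholder value: NONE).
        state = parts[5] if len(parts) > 5 else "NONE"
        # rpartition splits on the last ':' so IPv6 forms like ":::22" work.
        local_addr, _, local_port = local.rpartition(":")
        events.append(
            f"{stamp} proto={proto} local_addr={local_addr} "
            f"local_port={local_port} foreign={foreign} state={state}"
        )
    return events


def main() -> None:
    try:
        result = subprocess.run(["netstat", "-an"], capture_output=True,
                                text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        # Report the failure on stderr and emit no events.
        print(f"netstat failed: {exc}", file=sys.stderr)
        return
    # One line per socket, so each line can be indexed as its own event.
    for event in to_events(result.stdout, time.time()):
        print(event)


if __name__ == "__main__":
    main()
```

Because every socket is written as its own line, no multikv step is needed at search time.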
Hope this helps ...