It seems that simply adding a props.conf on the search head cluster tier, in conjunction with my limits.conf changes, is allowing all fields to be automatically extracted at search time as I expect. I will need to test removing individual limits.conf stanza values to see whether any of those had an impact as well.

[<sourcetype>]
KV_MODE = json

But yes, JSON is very "verbose" logging since it calls out field names and such. This team, though, is using HEC, which in general prefers JSON (if you use the /event endpoint, and we don't want them using /raw and needing extractions there). HEC/JSON has allowed us to give the users some flexibility in choosing how and what they log. These events are also pre-processed, which cuts the event count down greatly; they come from another system which ingests metrics from many, many, many sources, and we are using these datasets for machine learning. So we don't have much of an option for bringing down event count or size. This is also the only dataset like this we work with; every other is ~20 fields from similar use cases. But more than anything we needed to show that we could technically do this, even if it is not ideal.
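For anyone finding this later, a minimal sketch of how that stanza could be laid out in a small custom app pushed from the SHC deployer; the app name here is a placeholder, and a copy in a local app directly on each member works the same way for testing:

$SPLUNK_HOME/etc/shcluster/apps/<my_json_props_app>/default/props.conf
[<sourcetype>]
KV_MODE = json

(The limits.conf side is just the [kv] settings already described in the original question.)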
I have an event ingesting to Splunk via HEC which is around 13k characters, with approx. 260 fields within the JSON of the event. Currently we do not see all the fields being extracted by automatic KV at search time, and I do not want to make these indexed fields because doing so would balloon the index size greatly.

In some other non-JSON events that are rather large, we have increased the limits.conf [kv] maxchars value up to 100000 so that key-value pairs are extracted as users expect in larger events. I figured this new JSON scenario was similar, and within the same stanza I have so far increased: limit = 0 (unlimited), maxcols = 1024, avg_extractor_time = 3000, max_extractor_time = 6000. After these updates I am still not seeing all fields extracted.

I also tried using spath on the entire _raw, which did not work, so I raised the limits.conf [spath] stanza to extraction_cutoff = 100000. Similarly, it did not extract when run against the whole raw event. I could call out a specific field with "spath path=<field_name>", but I do not want to do that for 50+ fields, especially if more are added or removed at a later date.

I have been trying to work out whether the issue is occurring on ingestion via HEC, but these are all search-time extractions, not indexed extractions. Does anyone know of any other configurations around automatic KV extraction that we should test with an increased limit? For the best user experience I want this all to continue happening automatically, without calling out many fields explicitly in an spath. (A summary of the limits.conf changes tried so far is sketched below.)
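For clarity, this is roughly what the limits.conf changes described above look like as stanzas (values copied from the attempts above, for illustration only), along with the spath variations tried:

limits.conf
[kv]
# raised earlier for large non-JSON events, kept in place here
maxchars = 100000
# 0 = unlimited number of extracted fields
limit = 0
maxcols = 1024
avg_extractor_time = 3000
max_extractor_time = 6000

[spath]
# raised so spath can walk the entire ~13k character _raw
extraction_cutoff = 100000

Search-time attempts:
... | spath
... | spath path=<field_name>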
In an attempt to bring in some additional Azure AD data we have begun using the Microsoft Azure Add-on for Splunk; however, we are not seeing any results actually come back to Splunk, and we are not seeing any errors in collection.
When enabling debug logging I can see that we are getting an HTTP status code of 200, but a content length of 'None':
2020-03-20 19:44:33,920 DEBUG pid=62692 tid=MainThread file=connectionpool.py:_make_request:400 | https://graph.microsoft.com:443 "GET /beta/auditLogs/directoryAudits?$orderby=activityDateTime&$filter=activityDateTime+gt+2020-03-13T19%3a29%3a45.102703Z+and+activityDateTime+le+2020-03-20T19%3a22%3a45.501341Z&$skiptoken=f207127ca72cc8e1dca1f7873280c23e_326040 HTTP/1.1" 200 None
And in the Python which generates that log line (connectionpool.py:_make_request:400), the content length is the final argument of the log message:
log.debug("%s://%s:%s \"%s %s %s\" %s %s", self.scheme, self.host, self.port,
method, url, http_version, httplib_response.status,
The service continues to run and get 200 responses, and it even finds the next link to request via @odata.nextLink within the JSON returned, even though the debug logging shows no content length.
2020-03-20 19:44:34,136 DEBUG pid=62692 tid=MainThread file=base_modinput.py:log_debug:286 | _Splunk_ nextLink URL (@odata.nextLink): https://graph.microsoft.com/beta/auditLogs/directoryAudits?$orderby=activityDateTime&$filter=activityDateTime+gt+2020-03-13T19%3a29%3a45.102703Z+and+activityDateTime+le+2020-03-20T19%3a22%3a45.501341Z&$skiptoken=a05e0f88b9ed79f47b8955baefdf862c_327085
I have been able to replicate the call the TA makes within Postman, using the Microsoft Graph collection and environment (https://docs.microsoft.com/en-us/graph/use-postman), where I can see plenty of data returned from each one of the URLs, and I am using the exact same Azure AD application with the same client ID and secret.
I am trying to dig through the actual Python, but I have not been able to find anything pointing to the cause of this issue yet.
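In case it helps anyone reproduce this outside both the add-on and Postman, below is a rough standalone sketch of the same Graph call in Python using the requests library. The tenant/client values are placeholders and this is not the add-on's own code, just an approximation of the request it builds. (Also worth noting: Graph often returns chunked responses with no Content-Length header, in which case the 'None' in the urllib3 debug line does not necessarily mean the body was empty.)

import requests

# Placeholders -- substitute the same Azure AD app registration the add-on uses
TENANT_ID = "<tenant_id>"
CLIENT_ID = "<client_id>"
CLIENT_SECRET = "<client_secret>"

# Client-credentials token for Microsoft Graph
token_resp = requests.post(
    "https://login.microsoftonline.com/%s/oauth2/v2.0/token" % TENANT_ID,
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "scope": "https://graph.microsoft.com/.default",
    },
)
token_resp.raise_for_status()
headers = {"Authorization": "Bearer " + token_resp.json()["access_token"]}

# Same style of query the add-on logs: audit logs ordered and filtered by activityDateTime
url = "https://graph.microsoft.com/beta/auditLogs/directoryAudits"
params = {
    "$orderby": "activityDateTime",
    "$filter": "activityDateTime gt 2020-03-13T19:29:45.102703Z "
               "and activityDateTime le 2020-03-20T19:22:45.501341Z",
}

total = 0
while url:
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    body = resp.json()
    events = body.get("value", [])
    total += len(events)
    print("%s content-length=%s events=%s"
          % (resp.status_code, resp.headers.get("Content-Length"), len(events)))
    # Follow pagination the same way the TA does, via @odata.nextLink
    url = body.get("@odata.nextLink")
    params = None  # the nextLink already carries the full query string

print("Total events returned: %s" % total)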
I have recently deployed the Splunk OVA for VMWare, which acts as the Data Collection Node, to use with the Splunk app and add-ons for VMWare. I followed the install and configuration instructions from the documentation, which was fairly simple, but now when I go to the collection configuration page within the VMWare app I can make a successful connection, yet I am left with "Username/Password are good but apps are not there" in Add-on Validation.
Since this was deployed from an OVA, it should have all the apps pre-installed. I have also confirmed that I can see all the add-ons listed as required in the link below in /opt/splunk/etc/apps on the Data Collection Node. (The link might not show up because of low karma, but it is the Installation Overview for the VMWare add-ons.)
Has anyone seen something like this before or have any suggestions on what to try next?
Edit: I recognized that my OVA, indexers, and search heads were running different versions of the apps and add-ons, so I upgraded everything to the latest 3.4.1 package, but unfortunately I am still seeing the same error. I had hoped the version mismatch would explain why it was not finding the apps, but I was not that lucky today.
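For anyone checking for the same kind of mismatch, a quick way to compare app versions across the search head and its search peers is a REST search along these lines (a sketch only; the title filter is illustrative, and the Data Collection Node itself is not a search peer, so it still has to be checked on the box):

| rest /services/apps/local splunk_server=*
| search title="*vmware*" OR title="SA-*"
| table splunk_server title version label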