Splunk Search

Suggestions for line breaking and field extraction of deeply-nested XML configuration files

Justin_Grant
Contributor

We're stumped on how to approach field extraction for XML configuration files for ASP.NET web applications. I want to enable use cases like:

  • when did this configuration file change last, and what changed?
  • what is the value of this config setting (e.g. an attribute in a particular element) across 20 servers?
  • send me email if any of my servers have more than 20 sub-elements under a particular element in their config files.

For log files in XML (as described in this question) you can define event boundaries and extract fields based on sub-elements or attributes. But XML configuration files typically have deeply nested elements without the same elements being repeated over and over. And it's also not clear where to set "event boundaries" amid deeply nested XML.

For example, here's simplified XML which might be in a config file:

<configuration>
  <someParent>
    <uri>
      <idn enabled="All" someReallyLongAttributeGoesHere="SomeReallylongAttribute.Value.Goes.Here"
               anotherLineAttribute="goes here"/>
      <iriParsing enabled="true"/>
    </uri>
    ....... more XML goes here
  </someParent>
  <someOtherParent>
    ....... more XML goes here
  </someOtherParent>
</configuration>

Line-breaking. XML elements can span multiple lines, and multiple elements can be stuffed into one line. Should we index each file as one giant event and use xpath at search time? Or should we "line-break" on every XML element? (And if so, how would we search related elements, like child elements within a parent element?)

Field Extraction. A few questions:

  • What field names should I use? Should they include the XPath of how to get to the element or attribute?
  • Can fields overlap, so (using the example above) I'd have one field called "uri" with the InnerXML as its value, as well as uri/idn/enabled and uri/iriParsing/enabled fields? Or do I need to choose a granularity ahead of time?
  • Can the above be made automatic?

Here's a more realistic configuration file. It's not exactly the one I'm planning to use, but close.

<?xml version="1.0"?>
<configuration>
    <configSections>
        <section name="uri" type="System.Configuration.UriSection, System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"/>
        <section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler" requirePermission="false"/>
        <section name="dotNetOpenAuth" type="DotNetOpenAuth.Configuration.DotNetOpenAuthSection" requirePermission="false" allowLocation="true"/>
        <sectionGroup name="system.web.extensions" type="System.Web.Configuration.SystemWebExtensionsSectionGroup, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35">
            <sectionGroup name="scripting" type="System.Web.Configuration.ScriptingSectionGroup, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35">
                <section name="scriptResourceHandler" type="System.Web.Configuration.ScriptingScriptResourceHandlerSection, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35" requirePermission="false" allowDefinition="MachineToApplication"/>
                <sectionGroup name="webServices" type="System.Web.Configuration.ScriptingWebServicesSectionGroup, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35">
                    <section name="jsonSerialization" type="System.Web.Configuration.ScriptingJsonSerializationSection, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35" requirePermission="false" allowDefinition="Everywhere"/>
                    <section name="profileService" type="System.Web.Configuration.ScriptingProfileServiceSection, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35" requirePermission="false" allowDefinition="MachineToApplication"/>
                    <section name="authenticationService" type="System.Web.Configuration.ScriptingAuthenticationServiceSection, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35" requirePermission="false" allowDefinition="MachineToApplication"/>
                    <section name="roleService" type="System.Web.Configuration.ScriptingRoleServiceSection, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35" requirePermission="false" allowDefinition="MachineToApplication"/>
                </sectionGroup>
            </sectionGroup>
        </sectionGroup>
    </configSections>

    <!-- The uri section is necessary to turn on .NET 3.5 support for IDN (international domain names),
         which is necessary for OpenID urls with unicode characters in the domain/host name. 
         It is also required to put the Uri class into RFC 3986 escaping mode, which OpenID and OAuth require. -->
    <uri>
        <idn enabled="All"/>
        <iriParsing enabled="true"/>
    </uri>

    <system.net>
        <defaultProxy enabled="true" />
        <settings>
            <!-- This setting causes .NET to check certificate revocation lists (CRL) 
                 before trusting HTTPS certificates.  But this setting tends to not 
                 be allowed in shared hosting environments. -->
            <!--<servicePointManager checkCertificateRevocationList="true"/>-->
        </settings>
    </system.net>

    <!-- this is an optional configuration section where aspects of DotNetOpenAuth can be customized -->
    <dotNetOpenAuth>
        <openid>
            <provider>
                <security requireSsl="false" />
                <behaviors>
                    <!-- Behaviors activate themselves automatically for individual matching requests. 
                         The first one in this list to match an incoming request "owns" the request.  If no
                         profile matches, the default behavior is assumed. -->
                    <!--<add type="DotNetOpenAuth.OpenId.Behaviors.PpidGeneration, DotNetOpenAuth" />-->
                </behaviors>
                <!-- Uncomment the following to activate the sample custom store.  -->
                <!--<store type="OpenIdProviderWebForms.Code.CustomStore, OpenIdProviderWebForms" />-->
            </provider>
        </openid>
        <messaging>
            <untrustedWebRequest>
                <whitelistHosts>
                    <!-- since this is a sample, and will often be used with localhost -->
                    <add name="localhost"/>
                </whitelistHosts>
            </untrustedWebRequest>
        </messaging>
        <!-- Allow DotNetOpenAuth to publish usage statistics to library authors to improve the library. -->
        <reporting enabled="true" />
    </dotNetOpenAuth>

    <system.web>
        <!-- 
            Set compilation debug="true" to insert debugging 
            symbols into the compiled page. Because this 
            affects performance, set this value to true only 
            during development.
        -->
        <compilation debug="true">
            <assemblies>
                <add assembly="System.Core, Version=3.5.0.0, Culture=neutral, PublicKeyToken=B77A5C561934E089"/>
                <add assembly="System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/>
                <add assembly="System.Xml.Linq, Version=3.5.0.0, Culture=neutral, PublicKeyToken=B77A5C561934E089"/>
                <add assembly="System.Data.DataSetExtensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=B77A5C561934E089"/>
            </assemblies>
        </compilation>
        <sessionState mode="InProc" cookieless="false"/>
        <membership defaultProvider="AspNetReadOnlyXmlMembershipProvider">
            <providers>
                <clear/>
                <add name="AspNetReadOnlyXmlMembershipProvider" type="OpenIdProviderWebForms.Code.ReadOnlyXmlMembershipProvider" description="Read-only XML membership provider" xmlFileName="~/App_Data/Users.xml"/>
            </providers>
        </membership>
        <authentication mode="Forms">
            <!-- named cookie prevents conflicts with other samples -->
            <forms name="OpenIdProviderWebForms"/>
        </authentication>
        <customErrors mode="RemoteOnly"/>
        <!-- Trust level discussion:
        Full: everything works (this is required for Google Apps for Domains support)
        High: TRACE compilation symbol must NOT be defined
        Medium: doesn't work unless originUrl=".*" or WebPermission.Connect is extended, and Google Apps doesn't work.
        Low: doesn't work because WebPermission.Connect is denied.
        -->
        <trust level="Medium" originUrl=".*"/>
        <pages>
            <controls>
                <add tagPrefix="asp" namespace="System.Web.UI" assembly="System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/>
                <add tagPrefix="asp" namespace="System.Web.UI.WebControls" assembly="System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/>
            </controls>
        </pages>
        <httpHandlers>
            <remove verb="*" path="*.asmx"/>
            <add verb="*" path="*.asmx" validate="false" type="System.Web.Script.Services.ScriptHandlerFactory, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/>
            <add verb="*" path="*_AppService.axd" validate="false" type="System.Web.Script.Services.ScriptHandlerFactory, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/>
            <add verb="GET,HEAD" path="ScriptResource.axd" validate="false" type="System.Web.Handlers.ScriptResourceHandler, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/>
        </httpHandlers>
        <httpModules>
            <add name="ScriptModule" type="System.Web.Handlers.ScriptModule, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/>
        </httpModules>
    </system.web>
    <location path="decide.aspx">
        <system.web>
            <authorization>
                <deny users="?"/>
            </authorization>
        </system.web>
    </location>
    <!-- log4net is a 3rd party (free) logger library that DotNetOpenAuth will use if present but does not require. -->
    <log4net>
        <appender name="RollingFileAppender" type="log4net.Appender.RollingFileAppender">
            <file value="Provider.log"/>
            <appendToFile value="true"/>
            <rollingStyle value="Size"/>
            <maxSizeRollBackups value="10"/>
            <maximumFileSize value="100KB"/>
            <staticLogFileName value="true"/>
            <layout type="log4net.Layout.PatternLayout">
                <conversionPattern value="%date (GMT%date{%z}) [%thread] %-5level %logger - %message%newline"/>
            </layout>
        </appender>
        <appender name="TracePageAppender" type="OpenIdProviderWebForms.Code.TracePageAppender, OpenIdProviderWebForms">
            <layout type="log4net.Layout.PatternLayout">
                <conversionPattern value="%date (GMT%date{%z}) [%thread] %-5level %logger - %message%newline"/>
            </layout>
        </appender>
        <!-- Setup the root category, add the appenders and set the default level -->
        <root>
            <level value="INFO" />
            <!--<appender-ref ref="RollingFileAppender" />-->
            <appender-ref ref="TracePageAppender" />
        </root>
        <!-- Specify the level for some specific categories -->
        <logger name="DotNetOpenAuth">
            <level value="INFO" />
        </logger>
    </log4net>
    <system.codedom>
        <compilers>
            <compiler language="c#;cs;csharp" extension=".cs" type="Microsoft.CSharp.CSharpCodeProvider,System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" warningLevel="4">
                <providerOption name="CompilerVersion" value="v3.5"/>
                <providerOption name="WarnAsError" value="false"/>
            </compiler>
            <compiler language="vb;vbs;visualbasic;vbscript" extension=".vb" type="Microsoft.VisualBasic.VBCodeProvider, System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" warningLevel="4">
                <providerOption name="CompilerVersion" value="v3.5"/>
                <providerOption name="OptionInfer" value="true"/>
                <providerOption name="WarnAsError" value="false"/>
            </compiler>
        </compilers>
    </system.codedom>
    <system.webServer>
        <validation validateIntegratedModeConfiguration="false"/>
        <modules>
            <remove name="ScriptModule"/>
            <add name="ScriptModule" preCondition="managedHandler" type="System.Web.Handlers.ScriptModule, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/>
        </modules>
        <handlers>
            <remove name="WebServiceHandlerFactory-Integrated"/>
            <remove name="ScriptHandlerFactory"/>
            <remove name="ScriptHandlerFactoryAppServices"/>
            <remove name="ScriptResource"/>
            <add name="ScriptHandlerFactory" verb="*" path="*.asmx" preCondition="integratedMode" type="System.Web.Script.Services.ScriptHandlerFactory, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/>
            <add name="ScriptHandlerFactoryAppServices" verb="*" path="*_AppService.axd" preCondition="integratedMode" type="System.Web.Script.Services.ScriptHandlerFactory, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/>
            <add name="ScriptResource" verb="GET,HEAD" path="ScriptResource.axd" preCondition="integratedMode" type="System.Web.Handlers.ScriptResourceHandler, System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/>
        </handlers>
    </system.webServer>
    <runtime>
        <legacyHMACWarning enabled="0" />
        <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
            <dependentAssembly>
                <assemblyIdentity name="System.Web.Extensions" publicKeyToken="31bf3856ad364e35"/>
                <bindingRedirect oldVersion="1.0.0.0-1.1.0.0" newVersion="3.5.0.0"/>
            </dependentAssembly>
            <dependentAssembly>
                <assemblyIdentity name="System.Web.Extensions.Design" publicKeyToken="31bf3856ad364e35"/>
                <bindingRedirect oldVersion="1.0.0.0-1.1.0.0" newVersion="3.5.0.0"/>
            </dependentAssembly>
        </assemblyBinding>
    </runtime>
</configuration>
1 Solution

sideview
SplunkTrust

I believe that it's possible to set up search-time extractions, and even to set up the multivalue tokenization correctly to get multivalue fields. The problem is that the same node or attribute can often appear in different places and mean different things, so this kind of flat tokenization can come up short.

Therefore, the best thing is probably just to index the files in their entirety as single events and use the xpath command at search time.
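
With each version of a file indexed as a single event, the "what changed?" question from the original post could be approached by comparing the two most recent events for one file. Here is a rough sketch using the diff command, which compares two results by position and compares the _raw text by default. The host and source values are hypothetical, and this assumes the input is set up so that an edited file is re-indexed as a whole new event:

```
sourcetype="xml_file" host="web01" source="*web.config"
| head 2
| diff position1=1 position2=2
```

The _time of the most recent event then also gives a rough answer to "when did this file change last?", again only if each edit produces a new event.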

I believe the xpath command is new in 4.1; it was added to provide more useful and more fine-grained searching and reporting over XML files.

First Example: say you want to find all the files that have a <section> node with name="jsonSerialization"

sourcetype="xml_file" | xpath "//section/@name" outfield=sectionName | search sectionName="jsonSerialization"

(Note that in this example the sectionName field generated by the xpath command will be a multivalue field. )
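
Because the extracted field is multivalue, mvcount can report how many nodes matched, which is one way to approach the "more than 20 sub-elements" alert from the question. Here is a sketch assuming the httpHandlers element from the sample file; the handlerPath and handlerCount field names are illustrative, and dedup is used only to keep the most recent event per file:

```
sourcetype="xml_file"
| dedup host source
| xpath outfield=handlerPath "/configuration/system.web/httpHandlers/add/@path"
| eval handlerCount=mvcount(handlerPath)
| where handlerCount > 20
| table host, source, handlerCount
```

Saved as a scheduled search that sends email when it returns results, this would flag any server whose file crosses the threshold.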

Second Example: The more fine-grained and complex the matching you want, the more you may have to break out the xpath book. For example, if you want to find all the XML files where there's a <section> node that has name="jsonSerialization" and that also has requirePermission="false":

sourcetype="xml_file" | xpath "//section[@name=\"jsonSerialization\"]/@requirePermission" outfield=requirePermission | search requirePermission="false"
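
The same pattern may cover the "value of this config setting across 20 servers" case from the question: extract one attribute, then keep the most recent value per host. Here is a sketch assuming the sessionState element from the sample file; the sessionMode field name is illustrative:

```
sourcetype="xml_file"
| xpath outfield=sessionMode "/configuration/system.web/sessionState/@mode"
| stats latest(sessionMode) AS sessionMode by host, source
```

Appending | stats values(host) AS hosts by sessionMode would group the servers by value, which makes outliers easier to spot.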

Justin_Grant
Contributor

great answer! thanks!
