<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Complex data manipulation before indexing in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Complex-data-manipulation-before-indexing/m-p/438987#M76532</link>
    <description>&lt;P&gt;Hi @DavidHourani, thanks for your advice! I did read about scripted inputs, but unfortunately that is not what I am looking for (as mentioned at the end of my question). In my specific case I see these disadvantages:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;it introduces lag (data is ingested only once per interval), whereas the monitor:// method reacts virtually immediately&lt;/LI&gt;
&lt;LI&gt;it introduces unnecessary load (the script is executed periodically even when no perftests are running and no data is produced)&lt;/LI&gt;
&lt;LI&gt;it requires additional measures to prevent re-reading the same data: the source simply appends new data to the end of the existing log file and my script works in a streaming manner, so running the current script periodically would read the whole file over and over again; I would have to implement some kind of file pointer similar to what the monitor:// method already does, or try to tweak SmartMeter's logging behaviour&lt;/LI&gt;
&lt;/UL&gt;</description>
    <pubDate>Mon, 24 Jun 2019 14:22:43 GMT</pubDate>
    <dc:creator>eregon</dc:creator>
    <dc:date>2019-06-24T14:22:43Z</dc:date>
    <item>
      <title>Complex data manipulation before indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Complex-data-manipulation-before-indexing/m-p/438985#M76530</link>
      <description>&lt;P&gt;Dear fellow Splunkthusiasts, is there a way to put my own data-manipulating script between the forwarder and the indexer?&lt;/P&gt;

&lt;P&gt;To be specific: I have XML logs from SmartMeter/jMeter looking like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;
&amp;lt;testResults version="1.2"&amp;gt;
&amp;lt;httpSample t="86" it="0" lt="37" ts="1553000000000" s="true" lb="openLoginPage" rc="200" rm="OK #subresults:3" tn="10.20.30.40:1234_TestCases 1-2" dt="text" de="ISO-8859-1" by="3999" sc="1" ec="0" ng="2" na="2" hn="sm-generator2"&amp;gt;
  &amp;lt;httpSample t="37" it="0" lt="37" ts="1553000000000" s="true" lb="openLoginPage-0" rc="200" rm="OK" tn="10.20.30.40:1234_TestCases 1-2" dt="text" de="ISO-8859-1" by="1578" sc="1" ec="0" ng="2" na="2" hn="sm-generator2"&amp;gt;
    &amp;lt;responseHeader class="java.lang.String"&amp;gt;&amp;lt;/responseHeader&amp;gt;
    &amp;lt;requestHeader class="java.lang.String"&amp;gt;&amp;lt;/requestHeader&amp;gt;
    &amp;lt;responseData class="java.lang.String"&amp;gt;&amp;lt;/responseData&amp;gt;
    &amp;lt;responseFile class="java.lang.String"&amp;gt;&amp;lt;/responseFile&amp;gt;
    &amp;lt;cookies class="java.lang.String"&amp;gt;&amp;lt;/cookies&amp;gt;
    &amp;lt;method class="java.lang.String"&amp;gt;GET&amp;lt;/method&amp;gt;
    &amp;lt;queryString class="java.lang.String"&amp;gt;&amp;lt;/queryString&amp;gt;
    &amp;lt;java.net.URL&amp;gt;https://some.host/path/&amp;lt;/java.net.URL&amp;gt;
  &amp;lt;/httpSample&amp;gt;
  &amp;lt;httpSample t="17" it="0" lt="17" ts="1553000000001" s="true" lb="openLoginPage-1" rc="200" rm="OK" tn="10.20.30.40:1234_TestCases 1-2" dt="text" de="" by="578" sc="1" ec="0" ng="2" na="2" hn="sm-generator2"&amp;gt;
    &amp;lt;responseHeader class="java.lang.String"&amp;gt;&amp;lt;/responseHeader&amp;gt;
    &amp;lt;requestHeader class="java.lang.String"&amp;gt;&amp;lt;/requestHeader&amp;gt;
    &amp;lt;responseData class="java.lang.String"&amp;gt;&amp;lt;/responseData&amp;gt;
    &amp;lt;responseFile class="java.lang.String"&amp;gt;&amp;lt;/responseFile&amp;gt;
    &amp;lt;cookies class="java.lang.String"&amp;gt;some_cookie_name=some_cookie_value&amp;lt;/cookies&amp;gt;
    &amp;lt;method class="java.lang.String"&amp;gt;GET&amp;lt;/method&amp;gt;
    &amp;lt;queryString class="java.lang.String"&amp;gt;&amp;lt;/queryString&amp;gt;
    &amp;lt;java.net.URL&amp;gt;https://some.host/path/&amp;lt;/java.net.URL&amp;gt;
  &amp;lt;/httpSample&amp;gt;
&amp;lt;/httpSample&amp;gt;
...
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;That is way too verbose for my needs, so I wrote a script transforming the XML to the following:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;httpSession sessionId="123" t="86" it="0" lt="37" ts="1553000000000" s="true" lb="openLoginPage" rc="200" rm="OK #subresults:3" tn="10.20.30.40:1234_TestCases 1-2" dt="text" de="ISO-8859-1" by="3999" sc="1" ec="0" ng="2" na="2" hn="sm-generator2"
httpRequest sessionId="123" t="37" it="0" lt="37" ts="1553000000000" s="true" lb="openLoginPage-0" rc="200" rm="OK" tn="10.20.30.40:1234_TestCases 1-2" dt="text" de="ISO-8859-1" by="1578" sc="1" ec="0" ng="2" na="2" hn="sm-generator2" method="GET" url="https://some.host/path/"
httpRequest sessionId="123" t="17" it="0" lt="17" ts="1553000000001" s="true" lb="openLoginPage-1" rc="200" rm="OK" tn="10.20.30.40:1234_TestCases 1-2" dt="text" de="" by="578" sc="1" ec="0" ng="2" na="2" hn="sm-generator2" cookies="some_cookie_name=some_cookie_value" method="GET" url="https://some.host/path/"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Please note that the output is enriched with a sessionId field that captures the relationship between a session and its requests, which can't easily be done with sed.&lt;/P&gt;
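
&lt;P&gt;For illustration only, a minimal Python sketch of this kind of transformation might look as follows. It is not the actual script: the names flatten, format_fields and CHILD_FIELDS and the sequential sessionId counter are assumptions inferred from the sample output above, and it assumes a complete, well-formed file rather than one that is still being appended to.&lt;/P&gt;

```python
import itertools
import xml.etree.ElementTree as ET

# Child elements copied onto the request line, mapped to output field names.
# Hypothetical mapping, inferred from the sample output shown above.
CHILD_FIELDS = {"cookies": "cookies", "method": "method", "java.net.URL": "url"}


def format_fields(pairs):
    """Render (name, value) pairs as space-separated name="value" tokens.

    Values are written as-is; embedded double quotes are not escaped.
    """
    return " ".join('{}="{}"'.format(name, value) for name, value in pairs)


def flatten(root, first_session_id=1):
    """Yield one httpSession line per top-level sample and one httpRequest
    line per nested sample, all carrying the same generated sessionId."""
    session_ids = itertools.count(first_session_id)
    for session in root.findall("httpSample"):
        session_field = [("sessionId", str(next(session_ids)))]
        yield "httpSession " + format_fields(
            session_field + list(session.attrib.items())
        )
        for request in session.findall("httpSample"):
            # Keep only the mapped child elements that have non-empty text.
            extras = [
                (CHILD_FIELDS[child.tag], child.text)
                for child in request
                if child.tag in CHILD_FIELDS and child.text
            ]
            yield "httpRequest " + format_fields(
                session_field + list(request.attrib.items()) + extras
            )
```

&lt;P&gt;Given a parsed document, for example root = ET.parse(path).getroot(), iterating over flatten(root) yields the simplified lines one by one. The point of the sketch is that the sessionId is state carried from the parent element to its children, which is exactly what a per-line sed expression cannot do.&lt;/P&gt;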

&lt;P&gt;I would like to collect the original XML log with a universal forwarder, have it processed by my script (possibly on a heavy forwarder?) and finally index the simplified output. Is something like that possible?&lt;/P&gt;

&lt;P&gt;Scripted inputs are not exactly what I am looking for, as this method would introduce data lag and a need to prevent re-reading the same data (both are already solved by the monitor:// input method).&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jun 2019 13:08:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Complex-data-manipulation-before-indexing/m-p/438985#M76530</guid>
      <dc:creator>eregon</dc:creator>
      <dc:date>2019-06-24T13:08:27Z</dc:date>
    </item>
    <item>
      <title>Re: Complex data manipulation before indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Complex-data-manipulation-before-indexing/m-p/438986#M76531</link>
      <description>&lt;P&gt;Hi @eregon,&lt;/P&gt;

&lt;P&gt;You can use the script you wrote as the input script via &lt;CODE&gt;scripted inputs&lt;/CODE&gt;. Whatever your script outputs goes straight into Splunk.&lt;/P&gt;

&lt;P&gt;It's quite straightforward: all you have to do is add the script to the bin folder of an app and then create the input that goes with it.&lt;/P&gt;

&lt;P&gt;You can find the details of how to set this up here:&lt;BR /&gt;
&lt;A href="https://docs.splunk.com/Documentation/Splunk/7.2.6/AdvancedDev/ScriptSetup"&gt;https://docs.splunk.com/Documentation/Splunk/7.2.6/AdvancedDev/ScriptSetup&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Let me know if that helps and if you need further help.&lt;/P&gt;

&lt;P&gt;Cheers,&lt;BR /&gt;
David&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jun 2019 13:55:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Complex-data-manipulation-before-indexing/m-p/438986#M76531</guid>
      <dc:creator>DavidHourani</dc:creator>
      <dc:date>2019-06-24T13:55:46Z</dc:date>
    </item>
    <item>
      <title>Re: Complex data manipulation before indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Complex-data-manipulation-before-indexing/m-p/438987#M76532</link>
      <description>&lt;P&gt;Hi @DavidHourani, thanks for your advice! I did read about scripted inputs, but unfortunately that is not what I am looking for (as mentioned at the end of my question). In my specific case I see these disadvantages:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;it introduces lag (data is ingested only once per interval), whereas the monitor:// method reacts virtually immediately&lt;/LI&gt;
&lt;LI&gt;it introduces unnecessary load (the script is executed periodically even when no perftests are running and no data is produced)&lt;/LI&gt;
&lt;LI&gt;it requires additional measures to prevent re-reading the same data: the source simply appends new data to the end of the existing log file and my script works in a streaming manner, so running the current script periodically would read the whole file over and over again; I would have to implement some kind of file pointer similar to what the monitor:// method already does, or try to tweak SmartMeter's logging behaviour&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Mon, 24 Jun 2019 14:22:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Complex-data-manipulation-before-indexing/m-p/438987#M76532</guid>
      <dc:creator>eregon</dc:creator>
      <dc:date>2019-06-24T14:22:43Z</dc:date>
    </item>
    <item>
      <title>Re: Complex data manipulation before indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Complex-data-manipulation-before-indexing/m-p/438988#M76533</link>
      <description>&lt;P&gt;The closest Splunk feature I could find is the SEDCMD option in props.conf. It could possibly solve my problem, provided I am able to read a value from the parent-level httpSample tag and then insert it into the subsequent lines.&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jun 2019 14:26:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Complex-data-manipulation-before-indexing/m-p/438988#M76533</guid>
      <dc:creator>eregon</dc:creator>
      <dc:date>2019-06-24T14:26:22Z</dc:date>
    </item>
    <item>
      <title>Re: Complex data manipulation before indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Complex-data-manipulation-before-indexing/m-p/438989#M76534</link>
      <description>&lt;P&gt;You're right, it does add latency and isn't as efficient as &lt;CODE&gt;monitor&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;There are three points at which you can apply that data cleansing:&lt;BR /&gt;
1- Before reading the data: this is the easiest way. Run your script on the data, store the results in files, then have your UF read those files directly.&lt;BR /&gt;
2- On read: scripted inputs.&lt;BR /&gt;
3- After reading, at index time: SEDCMD could be an option, but you would need to write a complex sed command to get the cleansing done. It might have the same impact as the scripted inputs.&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jun 2019 14:30:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Complex-data-manipulation-before-indexing/m-p/438989#M76534</guid>
      <dc:creator>DavidHourani</dc:creator>
      <dc:date>2019-06-24T14:30:54Z</dc:date>
    </item>
  </channel>
</rss>

