Dear fellow Splunkthusiasts, is there a way to put my own data-manipulating script between the forwarder and the indexer?
To be specific: I have XML logs from SmartMeter/JMeter that look like this:
<?xml version="1.0" encoding="UTF-8"?>
<testResults version="1.2">
<httpSample t="86" it="0" lt="37" ts="1553000000000" s="true" lb="openLoginPage" rc="200" rm="OK #subresults:3" tn="10.20.30.40:1234_TestCases 1-2" dt="text" de="ISO-8859-1" by="3999" sc="1" ec="0" ng="2" na="2" hn="sm-generator2">
<httpSample t="37" it="0" lt="37" ts="1553000000000" s="true" lb="openLoginPage-0" rc="200" rm="OK" tn="10.20.30.40:1234_TestCases 1-2" dt="text" de="ISO-8859-1" by="1578" sc="1" ec="0" ng="2" na="2" hn="sm-generator2">
<responseHeader class="java.lang.String"></responseHeader>
<requestHeader class="java.lang.String"></requestHeader>
<responseData class="java.lang.String"></responseData>
<responseFile class="java.lang.String"></responseFile>
<cookies class="java.lang.String"></cookies>
<method class="java.lang.String">GET</method>
<queryString class="java.lang.String"></queryString>
<java.net.URL>https://some.host/path/</java.net.URL>
</httpSample>
<httpSample t="17" it="0" lt="17" ts="1553000000001" s="true" lb="openLoginPage-1" rc="200" rm="OK" tn="10.20.30.40:1234_TestCases 1-2" dt="text" de="" by="578" sc="1" ec="0" ng="2" na="2" hn="sm-generator2">
<responseHeader class="java.lang.String"></responseHeader>
<requestHeader class="java.lang.String"></requestHeader>
<responseData class="java.lang.String"></responseData>
<responseFile class="java.lang.String"></responseFile>
<cookies class="java.lang.String">some_cookie_name=some_cookie_value</cookies>
<method class="java.lang.String">GET</method>
<queryString class="java.lang.String"></queryString>
<java.net.URL>https://some.host/path/</java.net.URL>
</httpSample>
</httpSample>
...
That is way too verbose for my needs, so I wrote a script that transforms the XML into the following:
httpSession sessionId="123" t="86" it="0" lt="37" ts="1553000000000" s="true" lb="openLoginPage" rc="200" rm="OK #subresults:3" tn="10.20.30.40:1234_TestCases 1-2" dt="text" de="ISO-8859-1" by="3999" sc="1" ec="0" ng="2" na="2" hn="sm-generator2"
httpRequest sessionId="123" t="37" it="0" lt="37" ts="1553000000000" s="true" lb="openLoginPage-0" rc="200" rm="OK" tn="10.20.30.40:1234_TestCases 1-2" dt="text" de="ISO-8859-1" by="1578" sc="1" ec="0" ng="2" na="2" hn="sm-generator2" method="GET" url="https://some.host/path/"
httpRequest sessionId="123" t="17" it="0" lt="17" ts="1553000000001" s="true" lb="openLoginPage-1" rc="200" rm="OK" tn="10.20.30.40:1234_TestCases 1-2" dt="text" de="" by="578" sc="1" ec="0" ng="2" na="2" hn="sm-generator2" cookies="some_cookie_name=some_cookie_value" method="GET" url="https://some.host/path/"
Please note that the output is enriched with a sessionId field linking each session to its requests, which can't simply be done with sed.
I would like to collect the original XML log with the universal forwarder, have it processed by my script (possibly on a heavy forwarder?) and finally index the simplified output. Is something like that possible?
Scripted inputs are not exactly what I am looking for, as this method would introduce data lag and the need to prevent re-reading the same data (both are solved by the monitor:// input method).
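The original script isn't shown, so here is a minimal Python sketch of the same transformation, assuming standard JMeter XML results in which each top-level `<httpSample>` is a session and its nested `<httpSample>` elements are that session's requests. Where the real `sessionId` value comes from isn't stated; this sketch simply numbers sessions with a counter (`convert`, `kv` and `first_session_id` are hypothetical names). That counter is exactly the kind of state carried from one event to the next that a per-line `sed` substitution cannot hold. Only `cookies`, `method` and the URL are copied from child elements, and only when non-empty, to match the sample output.

```python
import itertools
import xml.etree.ElementTree as ET

# Child elements copied onto request lines when non-empty, in this order.
# "java.net.URL" is renamed to "url" to match the desired output.
CHILD_FIELDS = (("cookies", "cookies"), ("method", "method"), ("java.net.URL", "url"))


def kv(name, value):
    """Render one name="value" pair, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{name}="{escaped}"'


def convert(xml_text, first_session_id=1):
    """Flatten JMeter XML results into one key=value line per sample.

    Each top-level <httpSample> becomes an httpSession line; each nested
    <httpSample> becomes an httpRequest line with the same sessionId.
    The sessionId is a plain counter (an assumption of this sketch).
    """
    root = ET.fromstring(xml_text)
    lines = []
    session_ids = itertools.count(first_session_id)
    for session in root:
        if session.tag != "httpSample":
            continue  # other result elements are ignored in this sketch
        sid = str(next(session_ids))
        fields = [kv("sessionId", sid)]
        fields += [kv(k, v) for k, v in session.attrib.items()]
        lines.append("httpSession " + " ".join(fields))
        for request in session:
            if request.tag != "httpSample":
                continue  # skip responseHeader, cookies, etc. of the parent
            fields = [kv("sessionId", sid)]
            fields += [kv(k, v) for k, v in request.attrib.items()]
            # Map child tag -> text; empty elements such as <cookies/> give "".
            children = {child.tag: (child.text or "") for child in request}
            for tag, name in CHILD_FIELDS:
                if children.get(tag):
                    fields.append(kv(name, children[tag]))
            lines.append("httpRequest " + " ".join(fields))
    return lines
```

To use it as a filter, pass it the raw file content (for example `convert(sys.stdin.buffer.read())`) and print each returned line. ElementTree keeps attributes in document order on Python 3.8+, so the fields come out in the same order as in the XML.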
Hi @eregon,
You can use the script you made as an input script, using scripted inputs. Whatever your script writes to standard output goes straight into Splunk.
It's quite straightforward: all you have to do is add the script to the bin folder of an app and then create the matching input in inputs.conf.
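With a scripted input, the script itself has to remember what it has already emitted, which is the re-reading concern raised in the question. The following is a minimal Python sketch under several assumptions: results land as `*.jtl` files in one directory, a file counts as finished once it has not been modified for `SETTLE_SECONDS`, and the checkpoint is a small JSON file. All paths and names are hypothetical, `convert_file` is only a placeholder for the actual conversion, and a Python 3 interpreter must be available where the input runs (the universal forwarder may not bundle one). Splunk would run it through an `inputs.conf` stanza such as `[script://$SPLUNK_HOME/etc/apps/<your_app>/bin/jmeter_input.py]` with `interval` and `sourcetype` set; that `interval` is where the extra latency comes from.

```python
"""Sketch of a Splunk scripted input that emits each finished JMeter file once."""
import json
import os
import sys
import time
from pathlib import Path

# Hypothetical locations and settle time; adjust for your environment.
RESULTS_DIR = Path("/opt/jmeter/results")
CHECKPOINT = Path("/opt/jmeter/results_checkpoint.json")
SETTLE_SECONDS = 60  # treat a file as finished once unmodified this long


def load_checkpoint(path):
    """Return {file path: mtime} for files already emitted."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}


def save_checkpoint(path, state):
    """Write the checkpoint via temp file + rename so it is never half-written."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state), encoding="utf-8")
    os.replace(tmp, path)


def convert_file(path):
    """Placeholder for your own XML-to-key=value conversion."""
    text = path.read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]


def run(results_dir, checkpoint, out, now=None, settle_seconds=SETTLE_SECONDS):
    """Emit converted lines for finished files not yet in the checkpoint."""
    now = time.time() if now is None else now
    state = load_checkpoint(checkpoint)
    for path in sorted(results_dir.glob("*.jtl")):
        mtime = path.stat().st_mtime
        if str(path) in state or now - mtime < settle_seconds:
            continue  # already emitted, or JMeter may still be writing it
        for line in convert_file(path):
            out.write(line + "\n")
        out.flush()
        # Record the file only after its lines were written: a crash in between
        # re-sends the file on the next run (duplicates) rather than losing it.
        state[str(path)] = mtime
        save_checkpoint(checkpoint, state)


if __name__ == "__main__":
    run(RESULTS_DIR, CHECKPOINT, sys.stdout)
```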
You can find the details on how to set this up here:
https://docs.splunk.com/Documentation/Splunk/7.2.6/AdvancedDev/ScriptSetup
Let me know if that helps or if you need further help.
Cheers,
David
Hi @DavidHourani, thanks for your advice! I did read about scripted inputs, and unfortunately they are not what I am looking for (as mentioned at the end of my question). In my specific case I see the disadvantages mentioned there: data lag and the need to prevent re-reading the same data.
The closest Splunk feature I could find is the SEDCMD option in props.conf, and it could possibly solve my problem if I were able to read a value from the parent-level httpSample tag and then insert it into the subsequent lines.
It does add latency and isn't as efficient as monitor, you're right.
There are three points at which you can apply that data cleansing:
1- Before reading the data: the easiest way. Run your script on your data, store the results in files, then read those files directly with your UF.
2- On read: scripted inputs.
3- After reading, at index time: SEDCMD could be an option, but you would need to write a complex sed expression to get the cleansing done, and it might have the same impact as scripted inputs.
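For option 1, one detail worth handling is that the UF's `monitor://` input may pick up an output file while the conversion script is still writing it. A common approach, sketched here in Python under the assumption that the output directory sits on a single local filesystem, is to write to a temporary file in the same directory and rename it into place once complete. The function name `write_atomically`, the `.tmp` suffix, and excluding those files with `blacklist = \.tmp$` in the monitor stanza are all assumptions of this sketch.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(lines, dest):
    """Write lines to dest so a monitoring forwarder never sees a partial file.

    The data goes to a temporary file in the same directory (a rename is
    only atomic within one filesystem), is flushed to disk, and is then
    renamed to dest in a single step.
    """
    dest = Path(dest)
    fd, tmp_name = tempfile.mkstemp(
        dir=dest.parent, prefix=dest.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            for line in lines:
                fh.write(line + "\n")
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, dest)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

Feed it the lines produced by your conversion script, and give each converted results file its own output name (for example one file per JMeter run) rather than rewriting the same file, so the monitor only ever sees complete, new files.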