<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Are there performance benefits to placing the timestamp at the start of input event data? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Are-there-performance-benefits-to-placing-the-timestamp-at-the/m-p/458579#M79252</link>
    <description>&lt;P&gt;That's a good approach.  The docs team is great about chasing down answers to questions raised by the docs.&lt;/P&gt;</description>
    <pubDate>Wed, 07 Nov 2018 12:25:10 GMT</pubDate>
    <dc:creator>richgalloway</dc:creator>
    <dc:date>2018-11-07T12:25:10Z</dc:date>
    <item>
      <title>Are there performance benefits to placing the timestamp at the start of input event data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Are-there-performance-benefits-to-placing-the-timestamp-at-the/m-p/458576#M79249</link>
      <description>&lt;H2&gt;Background&lt;/H2&gt;

&lt;P&gt;I forward data to Splunk in JSON Lines format with the event timestamp as the first field of each line:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{"time":"2018-11-02T23:59:30.123456Z","type":"xyz", ...
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Here is the corresponding &lt;CODE&gt;props.conf&lt;/CODE&gt; stanza:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[myapp]
SHOULD_LINEMERGE = false
KV_MODE = json
TIME_PREFIX = {\"time\":\"
# Time stamp:
# - ISO 8601 extended format
# - Seconds to a maximum precision of 6 decimal places
# - With zone designator
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6N%:z
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This works.&lt;/P&gt;

&lt;P&gt;Recently, a colleague who is designing the JSON Lines output for a new project, where the data will also be forwarded to Splunk, queried two aspects of what I have just described:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;The position of the timestamp at the start of each line. The colleague proposed outputting fields in alphabetical order; &lt;CODE&gt;"time"&lt;/CODE&gt; would be towards the end of each line.&lt;/LI&gt;
&lt;LI&gt;The lack of a "start of line" anchor (^) at the start of the &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt; value.&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;These queries from my colleague prompted me to revisit the corresponding settings in Splunk and to ask the following questions here...&lt;/P&gt;

&lt;P&gt;(Note: I am &lt;EM&gt;not&lt;/EM&gt; talking about the order in which JSON parsers process properties. I don't believe that issue is relevant here, in the context of Splunk timestamp recognition.)&lt;/P&gt;

&lt;H2&gt;Questions&lt;/H2&gt;

&lt;H3&gt;Are there performance benefits to placing the timestamp at the start of input event data?&lt;/H3&gt;

&lt;P&gt;(As opposed to placing &lt;CODE&gt;"time"&lt;/CODE&gt; later in each line.)&lt;/P&gt;

&lt;P&gt;I thought the answer was "yes", but, after carefully re-reading the related Splunk docs, I'm no longer sure.&lt;/P&gt;

&lt;P&gt;I had previously thought that Splunk scanned each line from left to right for the first match for the &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt; regex.&lt;/P&gt;

&lt;P&gt;However, based on what my colleague tells me about regex processing in environments outside of Splunk, I suspect I've been naive about that strict "left to right" assumption. Which leads to my next question...&lt;/P&gt;

&lt;H3&gt;Would adding a start of line anchor (^) to my regex improve performance?&lt;/H3&gt;

&lt;P&gt;Like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;TIME_PREFIX = ^{\"time\":\"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I'm asking because I previously thought—perhaps naively—that this anchor would be redundant, because Splunk searched the input line from left to right anyway.&lt;/P&gt;

&lt;H3&gt;Would setting &lt;CODE&gt;MAX_TIMESTAMP_LOOKAHEAD&lt;/CODE&gt; offer any performance benefits?&lt;/H3&gt;

&lt;P&gt;If so, how, exactly? (To "abort" reading malformed/garbled input lines sooner rather than later?)&lt;/P&gt;

&lt;P&gt;The default value of 128 characters exceeds the length of the longest possible timestamp value; I could reduce it to match that length.&lt;/P&gt;
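
&lt;P&gt;For illustration, here is a sketch of what I have in mind. It assumes that the lookahead is counted from the end of the &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt; match (my reading of the &lt;CODE&gt;props.conf&lt;/CODE&gt; spec), and that the longest timestamp my application emits uses a numeric offset such as &lt;CODE&gt;+00:00&lt;/CODE&gt; rather than &lt;CODE&gt;Z&lt;/CODE&gt;. The value 32 is my own character count, not a Splunk recommendation:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[myapp]
TIME_PREFIX = {\"time\":\"
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6N%:z
# Assumed longest value: 2018-11-02T23:59:30.123456+00:00 (32 characters)
MAX_TIMESTAMP_LOOKAHEAD = 32
&lt;/CODE&gt;&lt;/PRE&gt;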

&lt;H3&gt;What are the optimal &lt;CODE&gt;props.conf&lt;/CODE&gt; settings for timestamp recognition in this case?&lt;/H3&gt;

&lt;P&gt;This is really just a "catch-all" in case I missed any issues in my earlier, more specific questions.&lt;/P&gt;

&lt;P&gt;For example, given that, in this case, the timestamp is only a few characters into the line, would it be more performant to &lt;EM&gt;not&lt;/EM&gt; specify &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt;, and instead let Splunk scan through those first few characters without any &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt;-related regex processing? (And also specify &lt;CODE&gt;MAX_TIMESTAMP_LOOKAHEAD&lt;/CODE&gt;.)&lt;/P&gt;</description>
      <pubDate>Fri, 02 Nov 2018 06:30:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Are-there-performance-benefits-to-placing-the-timestamp-at-the/m-p/458576#M79249</guid>
      <dc:creator>Graham_Hanningt</dc:creator>
      <dc:date>2018-11-02T06:30:03Z</dc:date>
    </item>
    <item>
      <title>Re: Are there performance benefits to placing the timestamp at the start of input event data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Are-there-performance-benefits-to-placing-the-timestamp-at-the/m-p/458577#M79250</link>
      <description>&lt;P&gt;Last question first: one should always specify &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt; and &lt;CODE&gt;TIME_FORMAT&lt;/CODE&gt;. This keeps Splunk from guessing about your data and is slightly more performant.&lt;BR /&gt;
Use of the &lt;CODE&gt;^&lt;/CODE&gt; character does not improve performance, AFAIK. I tend to use it only if the timestamp is the first character of a line. I would suggest removing &lt;CODE&gt;{&lt;/CODE&gt; from your &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt; setting, just in case the timestamp is not the first field.&lt;BR /&gt;
I have no information to prove putting the timestamp at the beginning of a line performs better, just a hunch that it does.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Nov 2018 11:19:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Are-there-performance-benefits-to-placing-the-timestamp-at-the/m-p/458577#M79250</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2018-11-02T11:19:30Z</dc:date>
    </item>
    <item>
      <title>Re: Are there performance benefits to placing the timestamp at the start of input event data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Are-there-performance-benefits-to-placing-the-timestamp-at-the/m-p/458578#M79251</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;I have no information to prove putting the timestamp at the beginning of a line performs better, just a hunch that it does.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;I appreciate your answer (thanks!), but I will admit I was hoping for more than a hunch. I have that same hunch.&lt;/P&gt;

&lt;P&gt;I'm hoping that the Splunk devs will step in and answer. They have "inside information"—they know the code path—whereas I must rely on information I can gather externally, performing tests and measuring the results. I don't really have the time to do that properly, but it's looking like I'll need to make time if I want a fact-based answer.&lt;/P&gt;

&lt;P&gt;I've submitted feedback on the Splunk docs topic "&lt;A href="https://docs.splunk.com/Documentation/Splunk/latest/Data/Tunetimestampextractionforbetterindexingperformance"&gt;Tune timestamp recognition for better indexing performance&lt;/A&gt;", which you'd think might answer these questions, but doesn't. From that topic:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;To speed up indexing, you can use props.conf to adjust how far ahead into events the Splunk timestamp processor looks&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;The topic goes on to mention &lt;CODE&gt;MAX_TIMESTAMP_LOOKAHEAD&lt;/CODE&gt;, but not &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;Perhaps my feedback on that topic might prompt the Splunk devs or writers to address this question.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Nov 2018 03:37:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Are-there-performance-benefits-to-placing-the-timestamp-at-the/m-p/458578#M79251</guid>
      <dc:creator>Graham_Hanningt</dc:creator>
      <dc:date>2018-11-07T03:37:37Z</dc:date>
    </item>
    <item>
      <title>Re: Are there performance benefits to placing the timestamp at the start of input event data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Are-there-performance-benefits-to-placing-the-timestamp-at-the/m-p/458579#M79252</link>
      <description>&lt;P&gt;That's a good approach.  The docs team is great about chasing down answers to questions raised by the docs.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Nov 2018 12:25:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Are-there-performance-benefits-to-placing-the-timestamp-at-the/m-p/458579#M79252</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2018-11-07T12:25:10Z</dc:date>
    </item>
  </channel>
</rss>

