Search progress bar disappeared

kdavis
Engager

I am searching through postfix email logs and trying to put all the relevant log events together for each email. I am also setting up the search in a view so that our email admin can just type in the search string and find an email.

The first search I came up with is as follows. It worked well but was very slow for searches spanning 24 hours or more (we log about 500,000 emails a day).

  <row>
    <chart>
      <title>Number of Messages over Time</title>
      <searchTemplate>sourcetype=postfix_syslog | transaction keepevicted=true  message_pid | search to=*$Username$* | timechart count by host</searchTemplate>
      <option name="charting.chart">column</option>
      <option name="charting.primaryAxisTitle.text">Timeline</option>
      <option name="charting.secondaryAxisTitle.text">Messages</option>
      <option name="charting.legend.placement">right</option>
    </chart>
  </row>


  <row>
    <event>
      <title>Message Logs</title>
      <searchTemplate>sourcetype=postfix_syslog | transaction keepevicted=true  message_pid | search to=*$Username$* OR orig_to=*$Username$*</searchTemplate>
      <option name="count">20</option>
      <option name="showPager">true</option>
    </event>
  </row>

I then changed the search to the following. It runs a lot faster but no longer displays a progress bar, so our email admins keep clicking, thinking it has locked up.

  <row>
    <chart>
      <title>Number of Messages over Time</title>
      <searchTemplate>sourcetype=postfix_syslog [ search sourcetype=postfix_syslog *$Username$* | dedup message_pid | fields message_pid ] | transaction keepevicted=true  fields=message_pid maxspan=3m maxpause=1m | timechart count by host</searchTemplate>
      <option name="charting.chart">column</option>
      <option name="charting.primaryAxisTitle.text">Timeline</option>
      <option name="charting.secondaryAxisTitle.text">Messages</option>
      <option name="charting.legend.placement">right</option>
    </chart>
  </row>

  <row>
    <event>
      <title>Message Logs</title>
      <searchTemplate>sourcetype=postfix_syslog [ search sourcetype=postfix_syslog *$Username$* | dedup message_pid | fields message_pid ] | transaction keepevicted=true  fields=message_pid maxspan=3m maxpause=1m</searchTemplate>
      <option name="count">20</option>
      <option name="showPager">true</option>
    </event>
  </row>

How do I get a progress bar back for the last search, and why did I lose it?
Is anyone else working on postfix email logs?

---- Kirk

sideview
SplunkTrust

The progress bar went away because it only shows progress for the main search pipeline.

In the rewritten version, the subsearch does most of the work and the outer search is comparatively zippy, so the JobProgressIndicator appears only very briefly at the end.

You can probably confirm this by running them separately in the charting view, i.e., run

sourcetype=postfix_syslog *$Username$*

vs

sourcetype=postfix_syslog (message_pid=<pidA> OR message_pid=<pidB> OR message_pid=<pidC> ...) | transaction keepevicted=true

(It's the leading and trailing wildcards around Username that make it expensive, because Splunk has to get all of the events off disk and then scan them in memory.)

I don't think there's any way to get the main job to reflect the progress of the subsearch job, and the JobProgressIndicator definitely only responds to the main job.

One quite different solution you might try:

a) extract the username field if it isn't already.

b) create a summary index search that runs every 10 minutes or so and maps usernames to pids.

sourcetype=postfix_syslog | stats count by username, message_pid

c) then you can search for this

sourcetype=postfix_syslog [ search index=summary username="*$Username$*" | dedup message_pid | fields message_pid ] | transaction keepevicted=true fields=message_pid maxspan=3m maxpause=1m | timechart count by host
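
For step (b), the scheduled search that fills the summary index might look roughly like the sketch below. It assumes the recipient address is in the `to` or `orig_to` field (as in the original searches), that the local part of the address is what you want as `username`, and that the search is scheduled every 10 minutes over the previous 10 minutes. The `recipient` field name and the regex are only illustrative, so adjust them to however your postfix events are actually extracted.

```
sourcetype=postfix_syslog earliest=-10m@m latest=@m
| eval recipient=coalesce(orig_to, to)
| rex field=recipient "^<?(?<username>[^@>]+)"
| stats count by username, message_pid
| collect index=summary
```

Because `stats` outputs `username` and `message_pid`, those are the field names the summary events carry when you later search index=summary. If your admins also search by domain, keeping the full address as `username` instead of just the local part may be the safer choice.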

Of course, removing the asterisks around Username will probably make this problem go away as well...

sideview
SplunkTrust

Can you tell us more about why you're using transaction at all? Do the message_pid values repeat a lot? Seems like "sourcetype=postfix_syslog | dedup message_pid | timechart count by host" or just "sourcetype=postfix_syslog | timechart dc(message_pid) by host" might work and they'd be a lot simpler...
