Monitor CPU, RAM, DISK, INBOUND and OUTBOUND NETWORK TRAFFIC of forwarders

israbenbr
Explorer

Hello,

I am posting here to ask whether any of you know which searches I should save in order to build a single dashboard that monitors my forwarders.

I need queries to:

- show the maximum CPU usage (in percent) for each monitored machine, and the maximum CPU usage (in percent) across all of these machines

- show the same thing for the average CPU usage (in percent)

- show the same thing for RAM instead of CPU (again in percent)

- show the same thing for disk usage (in percent)

- show the inbound and the outbound network traffic, one query each (in percent, relative to 1 Gbps)

 

The data is collected from the monitored machines by the Splunk Add-on for Unix and Linux and stored in an index called linux.

Thank you !


johnhuang
Motivator

Here are a couple of searches I've saved for indexers, forwarders, etc.

-- Indexer and Search Head System and Hardware Utilization
| rest /services/server/status/resource-usage/hostwide
| table splunk_server cpu_arch cpu_count virtual_cpu_count cpu_idle_pct cpu_system_pct cpu_user_pct mem mem_used pg_paged_out pg_swapped_out normalized_load_avg_1min runnable_process_count os_name os_name_ext os_version splunk_version
| append 
    [| rest /services/server/status/partitions-space
| rename available AS disk_available capacity AS disk_capacity free AS disk_free
| table splunk_server disk_available disk_capacity disk_free fs_type mount_point]
| append 
    [| rest /services/server/info
| eval server_role=mvindex(server_roles, 0)
| table splunk_server, server_role, cluster_label]
| stats values(*) AS * by splunk_server
| table splunk_server server_role cpu_arch cpu_count virtual_cpu_count cpu_idle_pct cpu_system_pct cpu_user_pct mem mem_used pg_paged_out pg_swapped_out  disk_available disk_capacity disk_free fs_type mount_point os_name os_version splunk_version cluster_label


-- TCP Input Stats to Indexer by Forwarder
index=_internal sourcetype=splunkd group=tcpin_connections (connectionType=cooked OR connectionType=cookedSSL) fwdType=full guid=* 
| rename fwdType AS forwarder_type version AS splunk_ver arch AS os_arch os AS os_type
| stats max(_time) as _time, sum(kb) as tcp_kb_total, sparkline(avg(tcp_KBps), 1m) as tcp_kbps_avg_sparkline, avg(tcp_KBps) as tcp_kbps_avg, avg(tcp_eps) as tcp_eps_avg, max(tcp_eps) as tcp_eps_max by hostname forwarder_type splunk_ver os_arch os_type
| foreach tcp_kbps_avg tcp_kb_total tcp_eps_avg tcp_eps_max  [| eval <<FIELD>>=ROUND(<<FIELD>>, 0)]
| eval hostname=UPPER(hostname)
| table _time hostname forwarder_type splunk_ver os_arch os_type tcp_kbps_avg_sparkline tcp_kbps_avg tcp_kb_total tcp_eps_avg tcp_eps_max


-- Average 24 Hourly Event Throughput in MB (Forwarder)
index=_internal source=*metrics.log group=per_sourcetype_thruput earliest=-2d@d 
    [search index=_internal sourcetype=splunkd group=tcpin_connections fwdType=full | dedup hostname | rename hostname AS host | table host]
| bucket _time span=1h
| eval series=host
| stats sum(kb) AS size_kb BY _time series
| eval size_mb=size_kb/1024
| eval event_hour=strftime(_time, "%H:%M")
| rename series AS data_source
| chart limit=24 avg(size_mb) AS size_mb by data_source event_hour
| fillnull value="0.00" 
| addtotals fieldname="hourly_avg"
| eval hourly_avg=ROUND(hourly_avg/24, 2)
| foreach *:* hourly_avg [| eval <<FIELD>>=ROUND('<<FIELD>>', 2)]

 


gcusello
SplunkTrust

Hi @israbenbr,

the easiest way is to look at the Splunk App for Unix and Linux (https://splunkbase.splunk.com/app/273/), where you can find all the requested searches, and also at the Splunk Monitoring Console.

Ciao.

Giuseppe


israbenbr
Explorer

Hello,

The problem is that Splunk Support told me to avoid that solution because it will soon no longer be supported by Splunk.

 

 


gcusello
SplunkTrust

Hi @israbenbr,

I don't know why Splunk Support told you this, but in any case you could use that app as a guide to find the searches to put in your own app.

I also developed this dashboard; see if it helps you.

<form>
  <label>Hardware and Software Details: Linux Servers</label>
  <fieldset submitButton="false">
    <input type="dropdown" token="host">
      <label>Server</label>
      <prefix>host="</prefix>
      <suffix>"</suffix>
      <fieldForLabel>host</fieldForLabel>
      <fieldForValue>host</fieldForValue>
      <search>
        <query>index=os sourcetype=hardware [ | inputlookup Server | fields host ] 
          | eval host=upper(host) 
          | dedup host 
          | sort host 
          | table host</query>
      </search>
    </input>
  </fieldset>
  <row>
    <panel>
      <title>HostName</title>
      <html>
      <h3 align="center">
        <strong> <font size="10">Server<img src="/static/app/infrastructure_monitoring/Linux_logo.png" style="height:100px;border:0;"/>
            </font>
          </strong>
        </h3>
    </html>
      <single>
        <search>
          <query>index=os sourcetype=hardware $host$ 
            | dedup host 
            | table host</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
      </single>
    </panel>
  </row>
  <row>
    <panel>
      <title>Hardware</title>
      <table>
        <search>
          <query>index=os sourcetype=hardware $host$
            | dedup host 
            | eval MEMORY_REAL=MEMORY_REAL/1024/1024, MEMORY_SWAP=MEMORY_SWAP/1024/1024, host=upper(host)
            | lookup Server host OUTPUT IP Tipologia
            | table IP Tipologia CPU_TYPE CPU_COUNT CPU_CACHE MEMORY_REAL MEMORY_SWAP fd0 hdc sda 
            | rename CPU_TYPE AS CPU CPU_COUNT AS "Number of CPUs" CPU_CACHE AS Cache MEMORY_REAL As RAM MEMORY_SWAP AS Swap HARD_DRIVES AS "Hard Disks" fd0 AS "Floppy Disk" hdc AS "Hard Disk" sda AS "Virtual disk"</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">100</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
        <format type="number" field="Floppy Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Hard Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Virtual disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="RAM">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Swap">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Cache">
          <option name="unit">kB</option>
        </format>
      </table>
    </panel>
    <panel>
      <title>df</title>
      <table>
        <search>
          <query>index=os  sourcetype=df $host$ 
            | dedup host 
            | multikv 
            | table Filesystem Type Size Used Avail UsePct MountedOn</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">100</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
        <format type="number" field="Floppy Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Hard Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Virtual disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="RAM">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Swap">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Cache">
          <option name="unit">kB</option>
        </format>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>Processes</title>
      <table>
        <search>
          <query>index=os sourcetype=ps $host$ 
            | multikv 
            | table USER PID PSR pctCPU CPUTIME pctMEM RSZ_KB VSZ_KB TTY S ELAPSED COMMAND ARGS</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
        <format type="number" field="Floppy Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Hard Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Virtual disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="RAM">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Swap">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Cache">
          <option name="unit">kB</option>
        </format>
      </table>
    </panel>
    <panel>
      <title>top command</title>
      <table>
        <search>
          <query>index=os sourcetype=top $host$ 
            | dedup host 
            | multikv 
            | table PID USER PR NI VIRT RES SHR S pctCPU pctMEM cpuTIME COMMAND</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
        <format type="number" field="Floppy Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Hard Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Virtual disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="RAM">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Swap">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Cache">
          <option name="unit">kB</option>
        </format>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>netstat</title>
      <table>
        <search>
          <query>index=os sourcetype=netstat $host$ 
            | dedup host 
            | multikv 
            | table Proto Recv-Q Send-Q LocalAddress ForeignAddress State</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
        <format type="number" field="Floppy Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Hard Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Virtual disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="RAM">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Swap">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Cache">
          <option name="unit">kB</option>
        </format>
      </table>
    </panel>
    <panel>
      <title>packages</title>
      <table>
        <search>
          <query>index=os sourcetype=package $host$ 
            | multikv 
            | dedup host NAME 
            | table NAME VERSION RELEASE ARCH VENDOR GROUP 
            | sort NAME</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
        <format type="number" field="Floppy Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Hard Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Virtual disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="RAM">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Swap">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Cache">
          <option name="unit">kB</option>
        </format>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>openPorts</title>
      <table>
        <search>
          <query>index=os sourcetype=openPorts $host$ 
            | dedup host 
            | multikv 
            | table Proto Port</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
        <format type="number" field="Floppy Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Hard Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Virtual disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="RAM">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Swap">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Cache">
          <option name="unit">kB</option>
        </format>
      </table>
    </panel>
    <panel>
      <title>protocol</title>
      <table>
        <search>
          <query>index=os sourcetype=protocol $host$ 
            | dedup host 
            | multikv 
            | table IPdropped TCPrexmits TCPreorder TCPpktRecv TCPpktSent UDPpktLost UDPunkPort UDPpktRecv UDPpktSent</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
        <format type="number" field="Floppy Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Hard Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Virtual disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="RAM">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Swap">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Cache">
          <option name="unit">kB</option>
        </format>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>Users with private logins</title>
      <table>
        <search>
          <query>index=os sourcetype=usersWithLoginPrivs $host$ 
            | dedup host 
            | multikv 
            | table USERNAME HOME_DIR USER_INFO</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">100</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
        <format type="number" field="Floppy Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Hard Disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Virtual disk">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="RAM">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Swap">
          <option name="unit">GB</option>
        </format>
        <format type="number" field="Cache">
          <option name="unit">kB</option>
        </format>
      </table>
    </panel>
  </row>
  <row>
    <panel ref="Footer" app="infrastructure_monitoring"></panel>
  </row>
</form>

Ciao.

Giuseppe


israbenbr
Explorer

Thank you for sharing your code.

I am editing it to create my own customized dashboard, but I am really struggling.

This is the query I wrote:

index=linux sourcetype=df host="the_host_name" Filesystem=/dev/s* earliest=-7d
| dedup host Filesystem
| stats avg(UsePct) AS Utilisation_Moyenne, max(UsePct) AS Utilisation_Maximale
| table host Filesystem UsePct Utilisation_Moyenne Utilisation_Maximale

 

It doesn't work: it shows only the field "Utilisation_Maximale", in a single row.

For a given host, I want it to show the maximum and the average usage (in percent) of its 2 disks over the last week.

I thought it failed because max and avg need numeric values, but that's strange, because the field "Utilisation_Maximale" does show one.

 

Any ideas ? 

 

Thanks.


gcusello
SplunkTrust

Hi @israbenbr,

after a stats command, only the fields produced by stats remain (the aggregations and the BY fields), not all the original fields, so please try this:

index=linux sourcetype=df host="the_host_name" Filesystem=/dev/s* earliest=-7d
| stats avg(UsePct) AS Utilisation_Moyenne, max(UsePct) AS Utilisation_Maximale BY host Filesystem
| table host Filesystem Utilisation_Moyenne Utilisation_Maximale
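
Because stats keeps only the aggregations and the BY fields, one way to also show a single value for the whole host next to the per-filesystem rows is to chain eventstats after stats. This is a minimal sketch, assuming UsePct is already numeric; the field name Max_Host is made up for illustration.

```
index=linux sourcetype=df host="the_host_name" Filesystem=/dev/s* earliest=-7d
| stats avg(UsePct) AS Utilisation_Moyenne, max(UsePct) AS Utilisation_Maximale BY host Filesystem
| eventstats max(Utilisation_Maximale) AS Max_Host BY host
| table host Filesystem Utilisation_Moyenne Utilisation_Maximale Max_Host
```

Unlike stats, eventstats adds its result as a new column on every existing row instead of collapsing the rows, so the per-filesystem values survive.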

Ciao.

Giuseppe


israbenbr
Explorer

Oh, thank you.

But it worked only for the max field; the avg field is still empty.

That's very strange.


gcusello
SplunkTrust

Hi @israbenbr,

check whether your "UsePct" values include the percent sign. If they do, avg cannot be calculated on them, so you have to extract the numeric part using a regex.
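
One possible way to do this, sketched under the assumption that UsePct arrives as a string such as 45%: strip the percent sign and convert the rest to a number before the stats. The field name UsePct_num is hypothetical.

```
index=linux sourcetype=df host="the_host_name" Filesystem=/dev/s* earliest=-7d
| eval UsePct_num=tonumber(replace(UsePct, "%", ""))
| stats avg(UsePct_num) AS Utilisation_Moyenne, max(UsePct_num) AS Utilisation_Maximale BY host Filesystem
| eval Utilisation_Moyenne=round(Utilisation_Moyenne, 1)
```

The same extraction could be done with rex, for example | rex field=UsePct "(?<UsePct_num>\d+)". This would also explain why max returned something while avg stayed empty: on non-numeric values max falls back to lexicographical ordering, whereas avg needs numbers.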

Ciao.

Giuseppe


israbenbr
Explorer

Hey,

It finally worked.

Now I am struggling with the query that shows the RAM utilisation.

I followed this tutorial: https://www.youtube.com/watch?v=nsC4YytjRCY&ab_channel=SplunkHow-To

The problem is that none of my searches finds any data from vmstat.sh.

Any ideas?

 

Many thanks
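
A possible first check, offered as a sketch rather than a diagnosis: list which sourcetypes and sources actually reach the index for each host. As far as I know, the add-on ships the vmstat.sh scripted input disabled until it is enabled in a local inputs.conf, and its events are tagged with sourcetype=vmstat rather than with the script name, so searching for the text vmstat.sh may return nothing even when the data is there. Both points are assumptions to verify against your own installation.

```
index=linux earliest=-24h
| stats count, latest(_time) AS last_seen BY host sourcetype source
| eval last_seen=strftime(last_seen, "%Y-%m-%d %H:%M:%S")
| sort host sourcetype
```

If no vmstat sourcetype appears for a host, the input is probably not enabled on that forwarder. If it does appear, the RAM query can follow the same stats pattern as the disk query, using whichever memory percentage field your vmstat events contain (I believe the add-on extracts one named memUsedPct, but check the field list first).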
