All,
I am working on a project to "predict" how much Splunk license I may need in order to onboard a new customer. We usually ingest the same information for all customers; the main difference is the number of entries in the logs.
The problem I am having is that I cannot rely on the _internal metrics.log of my indexers. It looks like it does not contain all the information. For example, if I run:
index=ssn host=*xey* earliest="01/31/2019:1:29:00" latest="01/31/2019:2:29:00" | stats count by host
1   lasxeypr01dem01.las.ssnsgs.net  107
2   lasxeypr01slv01.las.ssnsgs.net  120
3   lasxeypr01vmw01.las.ssnsgs.net  28865
4   lasxeypr01vmw02.las.ssnsgs.net  12242 
At the same time:
index="_internal" source="*metrics.log" group="per_host_thruput" earliest="01/31/2019:1:29:00" latest="01/31/2019:2:29:00" series=*xey* | chart sum(kb) by series | sort - sum(kb)
No results found. 
Some data is there:
index="_internal" source="*metrics.log" group="per_host_thruput" earliest="01/31/2019:1:29:00" latest="01/31/2019:2:29:00" | stats count by host
1   arnvtnpr01spl01.arn.ssnsgs.net  117
2   iadphite01spl01.iad.ssnsgs.net  116
3   janentpr01spl01.jan.ssnsgs.net  116
4   lascocpr01mys01.las.ssnsgs.net  116
5   lascocpr01mys02.las.ssnsgs.net  117
6   lascocpr01mys03.las.ssnsgs.net  116
7   lashrmpr01kaf05     117
8   lashrmpr01wor02     116
9   lasssnpr01spl01.las.ssnsgs.net  1160
10  lasssnpr01spl02.las.ssnsgs.net  1160
11  lasssnpr01spl03.las.ssnsgs.net  1160
12  lasssnpr01spl04.las.ssnsgs.net  679
13  lasssnpr01spl05.las.ssnsgs.net  170
14  lasssnpr01spl06.las.ssnsgs.net  116
15  lasssnpr01spl07.las.ssnsgs.net  116
16  lasssnpr01spl08.las.ssnsgs.net  213
17  lasssnspl01app01.las.ssnsgs.net     188
18  lcxfplpr02spl01.fpl.ssnsgs.net  1160
19  litentpr02spl01.lit.ssnsgs.net  117
20  okcogepr02spl01.okc.ssnsgs.net  116
21  pdxpcfte01spl01.pdx.ssnsgs.net  152
22  phlphipr01spl01.phl.ssnsgs.net  117
23  sanssnpoc02slv01.san.ssnsgs.net     116
24  sanssnpr01spl01.san.ssnsgs.net  1160
25  sanssnpr01spl02.san.ssnsgs.net  1160
26  sanssnpr01spl03.san.ssnsgs.net  1160
27  sanssnpr01spl04.san.ssnsgs.net  125
28  sanssnpr01spl05.san.ssnsgs.net  160
29  sanssnpr01spl06.san.ssnsgs.net  195
30  sanssnpr01spl10     1160 
I know for sure that data is being ingested for these hosts.
So, how can I get the exact amount of data that is indexed? Is there some rotation on the _internal index that I am missing?
Thank you,
Gerson
 
Hi @GersonGarcia
The metrics.log can squash or summarise the metrics for a source, sourcetype, or host if there are too many series. If you need exact numbers and you don't mind the query being slow, you can do this:
<search> | eval len = len(_raw) | stats sum(len) as bytes
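For example, with the index and host pattern from your question (just a sketch, reusing your original one-hour window):
index=ssn host=*xey* earliest="01/31/2019:1:29:00" latest="01/31/2019:2:29:00" | eval len = len(_raw) | stats sum(len) as bytes by host
This counts the raw bytes of every event actually sitting in the index, so nothing gets squashed.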
 
@chrisyoungerjds I believe I can find it in a different log:
index=_internal sourcetype=splunkd group=tcpout_connections host=*xey* earliest="01/31/2019:1:29:00" latest="01/31/2019:2:29:00" | chart sum(kb) by host | sort - sum(kb)
1   lasxeypr01vmw01.las.ssnsgs.net  19532.55
2   lasxeypr01vmw02.las.ssnsgs.net  17314.92
3   lasxeypr01nan01.las.ssnsgs.net  1520.58
4   lasxeypr01sla01.las.ssnsgs.net  1393.90
5   lasxeypr01gpl01.las.ssnsgs.net  1360.50
6   lasxeypr01dem01.las.ssnsgs.net  1283.92
7   lasxeypr01vmw03.las.ssnsgs.net  1269.57
8   sanxeyte01dem01.san.ssnsgs.net  1233.25 
 
The problem here is that if I have any transformation before indexing, these numbers will not reflect what actually gets indexed...
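One way I could at least see the size of that gap (a sketch joining the two searches above over the same window; indexed_kb, forwarded_kb, and delta_kb are just names I picked):
index=ssn host=*xey* earliest="01/31/2019:1:29:00" latest="01/31/2019:2:29:00" | eval kb = len(_raw) / 1024 | stats sum(kb) as indexed_kb by host | join type=outer host [ search index=_internal sourcetype=splunkd group=tcpout_connections host=*xey* earliest="01/31/2019:1:29:00" latest="01/31/2019:2:29:00" | stats sum(kb) as forwarded_kb by host ] | eval delta_kb = round(forwarded_kb - indexed_kb, 2)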
 
Hmm, the problem with the len(_raw) approach is that it will take forever to complete the search for all hosts over the past day:
host=*xey* earliest=-1d@d latest=@d | eval len = len(_raw) | stats sum(len) as bytes by index host
1   main    lasxeypr01slv01.las.ssnsgs.net  24430
2   os  lasxeypr01dem01.las.ssnsgs.net  10702044
3   os  lasxeypr01gpl01.las.ssnsgs.net  11615854
4   os  lasxeypr01nan01.las.ssnsgs.net  19561100
5   os  lasxeypr01sla01.las.ssnsgs.net  14134946
6   os  lasxeypr01vmw01.las.ssnsgs.net  111012962
7   os  lasxeypr01vmw02.las.ssnsgs.net  56708985
8   os  lasxeypr01vmw03.las.ssnsgs.net  9954705
9   os  sanxeyte01dem01.san.ssnsgs.net  9743627
10  ssn     lasxeypr01dem01.las.ssnsgs.net  569558
11  ssn     lasxeypr01slv01.las.ssnsgs.net  3102610
12  ssn     lasxeypr01vmw01.las.ssnsgs.net  135302275
13  ssn     lasxeypr01vmw02.las.ssnsgs.net  51478532 
This search has completed and has returned 13 results by scanning 1,724,992 events in 86.817 seconds
 
Yes, that is the downside. The only real solution I can offer is estimation. Basically, run this query over a smaller time range to find out how large events typically are:
host=*xey* earliest=-1h@h latest=@h | eval len = len(_raw) | stats avg(len) as avg_bytes by index host
Then you can run a super-fast tstats command to get the count of events per index and host:
| tstats count where index=_internal sourcetype=splunkd by host index
Then you can multiply the two numbers together to determine approximately how much data each host used.
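Putting the two steps together might look like this (a sketch, not tested; I pointed tstats at the indexes and host pattern from your earlier results, used a one-hour sample for the average, and est_mb is just a name I picked):
| tstats count where (index=ssn OR index=os) host=*xey* earliest=-1d@d latest=@d by index host | join type=left index host [ search (index=ssn OR index=os) host=*xey* earliest=-1h@h latest=@h | eval len = len(_raw) | stats avg(len) as avg_bytes by index host ] | eval est_mb = round(count * avg_bytes / 1024 / 1024, 2)
The tstats part stays fast because it only reads the index metadata; only the one-hour sample touches raw events.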
 
Yeah, I guess I could, but the problem is that log size depends on many factors, and it is never the same on two hosts...
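Maybe I can first check how much event sizes spread per host before trusting the average, something like (sd_bytes is just a name I picked):
(index=ssn OR index=os) host=*xey* earliest=-1h@h latest=@h | eval len = len(_raw) | stats avg(len) as avg_bytes, stdev(len) as sd_bytes by index host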
Thank you for your help.
