Getting Data In

Events-to-Metrics Conversion: Does data cloning and routing cause overhead on Splunk?

Poojitha
Communicator

Hi All,

I have a requirement where I have to write metrics data to a metrics index from an existing events index as soon as the data is ingested. The most important consideration is that this must not disturb the existing events index.

The props.conf and transforms.conf of the app created for this are as below:

Props.conf :

[test:applogs]

TRANSFORMS-clone = clone_only_metrics
NO_BINARY_CHECK = true
DATETIME_CONFIG = NONE

[applogs_clone]

TRANSFORMS-extract = extract_metric_values
TRANSFORMS-routing = route_to_metrics
METRIC-SCHEMA-TRANSFORMS = metric-schema:log_to_metrics
NO_BINARY_CHECK = true
DATETIME_CONFIG = NONE


Transforms.conf : 

[extract_metric_values]
REGEX = <my regex, which is working fine>
FORMAT = l_time::$1 Id::$2 metric_value::$3 metric_name::$4 pod::$5 namespace::$6 podid::$7 k8s::$8 c_name::$9 docker::$10 c_hash::$11 c_image::$12 extracted_host::$13
WRITE_META = true

[clone_only_metrics]
REGEX = "metricName"
CLONE_SOURCETYPE = applogs_clone
WRITE_META = true

[route_to_metrics]
REGEX = .*
DEST_KEY = _MetaData:Index
FORMAT = new_metrics_index

[metric-schema:log_to_metrics]
METRIC-SCHEMA-MEASURES = metric_value
METRIC-SCHEMA-WHITELIST-DIMS = Id, pod, namespace, podid, c_name, c_hash, c_image, docker, extracted_host

Workflow:
1) test:applogs is the existing sourcetype that has event logs.
2) This sourcetype is evaluated for cloning, and logs matching metricName are cloned.
3) The cloned sourcetype's pipeline then does field extraction, index routing, and metric schema enforcement, and the metric is stored.

The volume of application logs ingested is very large (events arrive every second), so I want to be sure this implementation does not create overhead on Splunk. So I have the below questions:
1) Does CLONE_SOURCETYPE = applogs_clone create a second copy of the data?
2) When I run | mpreview index=new_metrics_index, I see this sourcetype (applogs_clone). When I just run sourcetype=applogs_clone, there are no results. Does this mean that, after the metrics schema is applied, the original log events in the cloned sourcetype are discarded (so there is no duplicate data) and I should be good with this implementation?

Please share your thoughts and help me on this.




Poojitha
Communicator

@livehybrid : From your response, I understand that there is no duplication of the original logs (I mean the events cloned from the original sourcetype test:applogs) and I should be good with this implementation. Or should I use nullQueue to drop this cloned sourcetype? Would that work?


livehybrid
SplunkTrust
SplunkTrust

Hi @Poojitha 

That's right, you won't get a duplication of the entire event; the only 'duplicate' part is the metrics you have extracted.

You do not need to do anything with nullQueue - if you did, you'd probably end up nullQueueing the metric version of your event, which you wouldn't want to do.
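For reference, this is roughly what a nullQueue drop looks like in transforms.conf (the stanza name and regex below are illustrative only - do not actually add this for applogs_clone, since it would run in the same pipeline that produces the metric and drop it):

```
# transforms.conf -- illustrative sketch only; do NOT apply to applogs_clone
[drop_cloned_events]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
```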

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing


livehybrid
SplunkTrust
SplunkTrust

Hi @Poojitha 

You are correct: the reason you don't see the applogs_clone sourcetype with a regular event search is that it has been saved as metrics, so you need to use the metric-specific commands. The rest of the event from the cloned sourcetype will be dropped and only the extracted metrics will be saved. The original event will be untouched.
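To confirm what was stored, you can inspect the metrics index with the metric-aware commands. A couple of hedged examples, assuming the resulting metric is named metric_value (the measure declared in the transforms above) and the pod dimension survived the conversion:

```
| mcatalog values(metric_name) WHERE index=new_metrics_index

| mstats avg(metric_value) WHERE index=new_metrics_index span=1m BY pod
```

The first search lists which metric names actually landed in the index; the second aggregates them the way a regular search would use stats on events.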

If you are on an ingest-based license then you will pay 150 bytes per metric ingested (see https://help.splunk.com/ja-jp/data-management/splunk-enterprise-admin-manual/9.3/configure-splunk-li...).
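As a quick back-of-the-envelope check of what that fixed 150-byte charge means at high volume (the event rate and metrics-per-event below are hypothetical placeholders, not figures from this thread):

```python
# Rough ingest-license cost of log-to-metrics at Splunk's fixed 150 bytes per metric.
# The event rate and metrics-per-event values are hypothetical placeholders.
BYTES_PER_METRIC = 150          # fixed license charge per metric data point
events_per_second = 1_000       # assumed rate of events matching "metricName"
metrics_per_event = 1           # one measure (metric_value) per cloned event

bytes_per_day = BYTES_PER_METRIC * events_per_second * metrics_per_event * 86_400
print(f"{bytes_per_day / 1e9:.2f} GB/day")   # ~12.96 GB/day at these assumed rates
```

Swapping in your real event rate gives a quick sense of whether the metric charge is material next to your existing event ingest.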

It's hard for us to determine the load on the Splunk environment when carrying out this 'metricification', because it is largely the regex computation that could cause the biggest impact - a badly written regex can cause unnecessary processing, for example.

