Solved: dynamically assigning index based on event size

mihenn · ‎03-21-2018

Hi everyone,

I would like to send events to different indexes based on their size.
I'm currently using props.conf and transforms.conf for this. Unfortunately, it doesn't work as it should.

My goal is that all events with a length of 2048 characters or less go into the normal index, and all larger events are stored in the big index. But this does not work. I suspect that my REGEX is wrong.

This is my props.conf:

[big_events]
TRUNCATE = 0
SHOULD_LINEMERGE = TRUE
TRANSFORMS-1 = big_index
TRANSFORMS-2 = normal_index

And my transforms.conf:

[big_index]
REGEX = ^.{2049,}$
SOURCE_KEY = _raw
DEST_KEY = _MetaData:Index
FORMAT = big

[normal_index]
SOURCE_KEY = _raw
REGEX = ^.{1,2048}$
DEST_KEY = _MetaData:Index
FORMAT = normal
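
As a quick sanity check outside Splunk, the two patterns above can be tried against synthetic strings. This is only a sketch: it uses Python's re module as a stand-in for Splunk's PCRE engine, and the route() helper and its default of "main" are hypothetical names made up for illustration, not anything Splunk provides. It also assumes the patterns run without a (?s) flag, so "." does not match a newline.

```python
import re

# The two patterns from transforms.conf, copied unchanged.
# Assumption: Python's re behaves like PCRE for these simple constructs.
BIG = re.compile(r"^.{2049,}$")
NORMAL = re.compile(r"^.{1,2048}$")


def route(raw: str, default: str = "main") -> str:
    """Hypothetical helper: pick an index name the way the two transforms would.

    Both transforms are tried in order; a match overwrites the index,
    otherwise the default index is kept.
    """
    index = default
    if BIG.search(raw):
        index = "big"
    if NORMAL.search(raw):
        index = "normal"
    return index


print(route("x" * 2048))                       # normal
print(route("x" * 2049))                       # big
print(route("x" * 1500 + "\n" + "x" * 1500))   # main
```

On single-line input the two patterns split cleanly at 2048 characters. The third call shows a case worth keeping in mind with SHOULD_LINEMERGE enabled: if an event contains a newline and "." does not match it, neither pattern matches and the event stays in the default index.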

Currently, all events are placed either in the index that is queried first or in main. So if I call big_index first, the events are distributed seemingly at random between big and main. If I call normal_index first, the events are divided between normal and main.

Does anyone know why?

Thank you very much.

P.S. The TRUNCATE=0 is necessary because the events are very large (>5MB)

mihenn · ‎03-21-2018

I don't know what happened, but everything's working now. Splunk showed me the wrong index for some events. After about 5 minutes all events were where they should be. It could be that the pipeline was full because the data is very large. Therefore, my changes may have been implemented with a considerable delay.

Sometimes you just have to be patient.

skoelpin · ‎03-21-2018

What's your purpose for wanting large events in different indexes?

Here are the reasons for using different indexes:
1) Access Controls
2) Retention rates
3) Speed (in some cases)

You're using more disk if you create lots of indexes since each index needs additional tsidx files. Why not just put them in the same index and separate out by sourcetypes/eventtypes?

mihenn · ‎03-21-2018

The large events slow down the performance of Splunk Web. It is nearly unusable with events >1MB, so I want to separate them. A scheduled search copies summaries of these events to the normal index. Each summary contains a link to the full event, so the user can decide to open it if necessary, and the overall performance stays fine.

skoelpin · ‎03-21-2018

I can see that. Let's talk about your event sizes. Why do you have such large events?

Having events that large somewhat defeats the purpose of ingesting them into Splunk. The purpose of Splunk is to break down log samples into small events so you can run analytics on those events. Perhaps you can paste a sample event?

mihenn · ‎03-21-2018

The events come from the log of an application that processes large JSON objects. If an error occurs during processing, the corresponding JSON object is transferred to Splunk for further analysis.

Unfortunately, I cannot provide such an event here, as it is confidential data.

skoelpin · ‎03-21-2018

Your approach is less than ideal for many reasons, including TRUNCATE=0 and SHOULD_LINEMERGE=true in addition to these large events. You will run into issues down the road, but ignorance is bliss, I guess.

mihenn · ‎03-21-2018

I know. TRUNCATE=0 is very dangerous. But you cannot set TRUNCATE higher than 999999, and that is too small. The props.conf is not final yet. Line breaking and time formats are missing. This was just a sandbox test. Now I have to implement it on the real events.
