Dynamically assigning an index based on event size

Path Finder

Hi everyone,

I would like to send events to different indexes based on their size.
I'm currently using props.conf and transforms.conf. Unfortunately, this doesn't work as it should.

My goal is that all events with a length of 2048 characters or less go into the index normal. All larger events should be stored in the index big. But this does not work. I suspect that my REGEX is wrong.

This is my props.conf:

[big_events]
TRUNCATE = 0
SHOULD_LINEMERGE = TRUE
TRANSFORMS-1 = big_index
TRANSFORMS-2 = normal_index

And my transforms.conf:

[big_index]
REGEX = ^.{2049,}$
SOURCE_KEY = _raw
DEST_KEY = _MetaData:Index
FORMAT = big

[normal_index]
SOURCE_KEY = _raw
REGEX = ^.{1,2048}$
DEST_KEY = _MetaData:Index
FORMAT = normal

Currently, all events end up either in the index of whichever transform is applied first or in main. If big_index is applied first, the events are distributed seemingly at random between big and main. If normal_index is applied first, the events are divided between normal and main.

Does anyone know why?

Thank you very much.

P.S. The TRUNCATE=0 is necessary because the events are very large (>5MB)

1 Solution

Path Finder

I don't know what happened, but everything's working now. Splunk showed me the wrong index for some events. After about 5 minutes all events were where they should be. It could be that the pipeline was full because the data is very large. Therefore, my changes may have been implemented with a considerable delay.

Sometimes you just have to be patient.
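
For anyone who sees the same symptom without it clearing up on its own, one thing that may be worth checking: in PCRE, "." does not match newlines by default, so for a multi-line event neither ^.{2049,}$ nor ^.{1,2048}$ can match, and the event stays in its default index. Below is a minimal sketch of a variant with a single transform. It assumes the input already sends everything to the index normal (for example via index = normal in inputs.conf, which is not shown in the original post); the class name route_by_size is made up for illustration.

```ini
# props.conf (sketch; stanza name taken from the original post)
[big_events]
TRUNCATE = 0
SHOULD_LINEMERGE = TRUE
# Single transform class: events that do not match keep the index set by the input
TRANSFORMS-route_by_size = big_index

# transforms.conf (separate file)
[big_index]
SOURCE_KEY = _raw
# (?s) lets "." match newlines, so a multi-line event is measured as a whole.
# No trailing "$" is needed: 2049 matched characters already prove the event is large.
REGEX = (?s)^.{2049,}
DEST_KEY = _MetaData:Index
FORMAT = big
# LOOKAHEAD defaults to 4096 characters, which is enough for a 2049-character test.
```

Index-time transforms like this only affect events indexed after the change is in place, typically after a restart. Events that are already indexed stay where they are.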

SplunkTrust

What's your purpose for wanting large events in different indexes?

Here are the reasons for using different indexes:
1) Access Controls
2) Retention rates
3) Speed (in some cases)

You're using more disk if you create lots of indexes since each index needs additional tsidx files. Why not just put them in the same index and separate out by sourcetypes/eventtypes?

Path Finder

The large events slow down Splunk Web. It is nearly unusable with events >1MB, so I want to separate them. A scheduled search copies summaries of these events to the index normal. These summaries contain a link to the full event, so the user can decide to open it if necessary, while the overall performance stays fine.
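
A rough sketch of what such a scheduled search could look like in savedsearches.conf. The stanza name, schedule, preview length and field names (event_chars, preview) are hypothetical, big_events is assumed to be the sourcetype, and building the link to the full event is left out because it depends on the environment.

```ini
# savedsearches.conf (sketch; names and schedule are assumptions)
[summarize_big_events]
enableSched = 1
# Run every 15 minutes over the previous 15 minutes
cron_schedule = */15 * * * *
dispatch.earliest_time = -15m@m
dispatch.latest_time = @m
# Keep only the size and a short preview, then write the result to the index normal
search = index=big sourcetype=big_events | eval event_chars=len(_raw), preview=substr(_raw, 1, 500) | table _time host source event_chars preview | collect index=normal
```

Because the table command drops _raw, collect writes the remaining fields as key=value pairs, which keeps the summary events small.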

SplunkTrust

I can see that. Let's talk about your event sizes. Why do you have such large events?

Having events that large somewhat defeats the purpose of ingesting them in Splunk. The purpose of Splunk is to break down log samples into small events so you can run analytics on those events. Perhaps you can paste a sample event?

Path Finder

The events come from the log of an application that processes large JSON objects. If an error occurs during processing, the corresponding JSON object is transferred to Splunk for further analysis.

Unfortunately, I cannot provide such an event here, as it is confidential data.

SplunkTrust

Your approach is less than ideal for many reasons, including TRUNCATE=0 and SHOULD_LINEMERGE=true in addition to these large events. You will run into issues down the road, but ignorance is bliss, I guess.

Path Finder

I know. TRUNCATE=0 is very dangerous. But you cannot set TRUNCATE higher than 999999, and that is too small. The props.conf is not final yet. Line breaking and time formats are missing. This was just sandbox testing. Now I have to implement it on the real events.
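
A sketch of how the missing line breaking and timestamp settings might look. It assumes every event starts with a timestamp in the form YYYY-MM-DD HH:MM:SS at the beginning of a line. That layout is an assumption for illustration only, since the real log format is not shown in this thread.

```ini
# props.conf (sketch; the timestamp layout is an assumption, not taken from the real logs)
[big_events]
TRUNCATE = 0
# Break only before a line that starts with a timestamp, instead of merging lines afterwards
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})
# The timestamp sits at the very start of the event and is 19 characters long
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
TRANSFORMS-1 = big_index
TRANSFORMS-2 = normal_index
```

With an explicit LINE_BREAKER and SHOULD_LINEMERGE = false, Splunk can skip the line-merging step, which tends to matter for events of this size.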
