Getting Data In

Do not index file based on content

Random_Walk
Engager

Greetings All,

I'm indexing a set of metrics files that are written every 10 minutes. Just after midnight I get a file with metrics in the same format, but each value is the sum for the previous day. I want to ignore this totals file, because it breaks all sorts of use cases for the metric data. The only reliable way to identify a totals file is its third line, which holds a timestamp that is all zeros. Any other file has a normal ISO timestamp at this point. This regex matches the flag line:

REGEX = ^TimeStamp\s+:\s+0000-00-00\s00.00.00.000

Is there a way to block that file's ingestion based on the content of a single line? 

 

Thanks,

R.

0 Karma

Vardhan
Contributor

Hi @Random_Walk ,

In that case, use a script to write those kinds of events to a separate file. If you are ingesting these files through a Universal Forwarder (UF), use the blacklist option so that the forwarder ignores those files without reading them.
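
A minimal sketch of the pre-filter script suggested above, assuming the metrics files land in one directory and the forwarder monitors a second one. The directory arguments and the function names `is_totals_file` and `stage_files` are hypothetical; the regex is copied from the question. It is one possible way to keep totals files out of the monitored path, not a Splunk feature.

```python
import re
import shutil
from itertools import islice
from pathlib import Path

# Pattern from the original question: an all-zero timestamp marks a totals file.
TOTALS_RE = re.compile(r"^TimeStamp\s+:\s+0000-00-00\s00.00.00.000")


def is_totals_file(path):
    """Return True if the third line of the file carries the all-zero timestamp."""
    with open(path, "r", errors="replace") as handle:
        # islice(handle, 2, 3) yields only the third line; "" if the file is shorter.
        third = next(islice(handle, 2, 3), "")
    return bool(TOTALS_RE.match(third))


def stage_files(incoming_dir, monitored_dir):
    """Copy every non-totals file from incoming_dir into monitored_dir.

    Both directories are hypothetical; only monitored_dir would be listed
    in the forwarder's monitor input. Returns the file names copied.
    """
    incoming, monitored = Path(incoming_dir), Path(monitored_dir)
    monitored.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(incoming.iterdir()):
        if not path.is_file() or is_totals_file(path):
            continue
        target = monitored / path.name
        if not target.exists():  # skip files staged on an earlier run
            shutil.copy2(path, target)
            copied.append(path.name)
    return copied
```

Run from cron or a similar scheduler, `stage_files` would be called with the two directories each time new files arrive. It does not check whether a file is still being written, so a real deployment would need to handle that case.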

0 Karma

Vardhan
Contributor

Hi,

You can drop the events before indexing with the settings below.

props.conf (in the stanza for your sourcetype or source)

TRANSFORMS-information = eventsDrop

transforms.conf

[eventsDrop]
REGEX = ^TimeStamp\s+:\s+0000-00-00\s00.00.00.000
DEST_KEY = queue
FORMAT = nullQueue

0 Karma

Random_Walk
Engager

Hi Vardhan,

Thanks for the hint, but unfortunately this only drops the line with the timestamp. I need to discard the entire file whenever it contains this 'flag' timestamp.

I'm thinking it may need to be a scripted input, but I'm wondering if there are any other clever tricks.

Thanks,

R.
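
For reference, the scripted-input idea mentioned above could look roughly like the sketch below. Splunk indexes whatever a scripted input writes to stdout, so the script prints the content of each new file and skips any file whose third line matches the all-zero timestamp. The function name `emit_new_files`, the state file that remembers processed file names, and the command-line arguments are assumptions made for illustration.

```python
import re
import sys
from pathlib import Path

# Pattern from the original question: an all-zero timestamp marks a totals file.
TOTALS_RE = re.compile(r"^TimeStamp\s+:\s+0000-00-00\s00.00.00.000")


def emit_new_files(metrics_dir, state_file, out=sys.stdout):
    """Write unseen, non-totals files to `out`; return the names emitted."""
    state = Path(state_file)
    seen = set(state.read_text().splitlines()) if state.exists() else set()
    emitted = []
    for path in sorted(Path(metrics_dir).iterdir()):
        if not path.is_file() or path.name in seen:
            continue
        lines = path.read_text(errors="replace").splitlines(keepends=True)
        seen.add(path.name)  # remember totals files too, so they are not re-read
        # The third line (index 2) holds the timestamp that flags a totals file.
        if len(lines) > 2 and TOTALS_RE.match(lines[2]):
            continue
        out.writelines(lines)
        emitted.append(path.name)
    state.write_text("\n".join(sorted(seen)))
    return emitted


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: script.py <metrics_dir> <state_file>
    emit_new_files(sys.argv[1], sys.argv[2])
```

The state file should live outside the metrics directory so it is never treated as input. Files that are still being written when the script runs are not handled here.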

0 Karma