Getting Data In

With a Splunk forwarder, what are the best ways to validate the format of the data?

oxthon
New Member

Hello everyone,

I hope you are fine.

I have a question about indexing data in Splunk, and specifically about validating that data.

My setup is a distributed deployment: an indexer with a forwarder.

I receive data from a remote mount. These are CSV files.

An example of structure:
date, host, ipv4, ipv6, dns, nb_packet, size, ....

line 125: ipv4=12.32.45.255 => right
line 356: ipv4= 42.hello!.84.125 => wrong so go in index=error please and hurry up 🙂

I would like to validate the content of the data, for example check that the ipv4 value is well formed.
Is it possible to validate the format of each field's value in transforms.conf or props.conf?

Today, I run my CSV files through a Python (pandas) script to validate them.

Is Splunk able to do it?

If you have an example with a CSV with two or three fields, I'm interested.

I thank you a thousand times.

Oxthon.

MuS
SplunkTrust

Hi oxthon,

Well, from a purely technical point of view it is of course possible to do this. Does it make sense to do it? ¯\_(ツ)_/¯ It is most likely better to follow @ddrillic's advice and make sure those events do not come into Splunk in the first place.

But back to your question: you can use a props & transforms setup to check whether an event contains a valid IP and put it into index=right; anything with an invalid IP will go into index=error. This setup can be based on source, sourcetype or host.

Try something like this:

props.conf

[your sourcetype name here]
TRANSFORMS-000-sourcetypeName-routing-based-on-ip = 001-sourcetypeName-routing, 000-default-errorRouting

transforms.conf

[000-default-errorRouting]
REGEX = =\s[\d\w\.!]+\s
DEST_KEY =  _MetaData:index
FORMAT = error

[001-sourcetypeName-routing]
REGEX = (?:\d{1,3}\.){3}\d{1,3}
DEST_KEY =  _MetaData:index
FORMAT = right

These settings must go on the parsing Splunk instance (heavy forwarder or indexer), and you need to restart that instance.
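
To get a feel for what the two REGEX patterns above would do before deploying them, here is a minimal Python sketch that runs them against the sample lines from the question. It is only an approximation of Splunk's behaviour, under two assumptions: the transforms are applied in the order listed in props.conf with a later match overwriting the index, and unmatched events stay in a default index (called `main` here). The `route` function is a hypothetical helper, not part of Splunk.

```python
import re

# Patterns copied from the transforms.conf stanzas above.
VALID_RE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")
ERROR_RE = re.compile(r"=\s[\d\w\.!]+\s")

# Same order as in props.conf. Assumption: a later match overwrites the index.
TRANSFORMS = [(VALID_RE, "right"), (ERROR_RE, "error")]


def route(raw_event, default_index="main"):
    """Return the index an event would land in under the assumptions above."""
    index = default_index
    for pattern, target in TRANSFORMS:
        # Unanchored search over the raw event text.
        if pattern.search(raw_event):
            index = target
    return index


samples = [
    "ipv4=12.32.45.255 => right",
    "ipv4= 42.hello!.84.125 => wrong",
    # A plain CSV row without "ipv4=": neither pattern matches.
    "2024-01-01,web01,42.hello!.84.125,::1,example.org,10,512",
]
for line in samples:
    print(route(line), "<-", line)
```

Note that the third sample, a plain CSV row with a bad address, matches neither pattern and would stay in the default index, so the error pattern probably needs adapting to the real CSV layout. The valid-IP pattern also accepts out-of-range values such as 999.999.999.999.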

Hope this helps ...

cheers, MuS

ddrillic
Ultra Champion

-- control the content of the data.

It's not part of the product. Actually, the product prides itself on allowing any data through. Therefore, I would place safeguards before the data is ingested, meaning before it reaches Splunk.
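
A safeguard of that kind could look roughly like the sketch below, which uses only the Python standard library (`csv` and `ipaddress`) rather than pandas. The column names follow the example structure in the question; the helper names `is_ipv4` and `split_rows` and the in-memory sample data are made up for illustration. In practice the good and bad rows would be written to separate files, with only the good file monitored by the forwarder.

```python
import csv
import io
import ipaddress


def is_ipv4(value):
    """True if value parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(value.strip())
        return True
    except ValueError:
        return False


def split_rows(reader, field="ipv4"):
    """Split CSV rows into (good, bad) lists based on one field."""
    good, bad = [], []
    for row in reader:
        target = good if is_ipv4(row.get(field) or "") else bad
        target.append(row)
    return good, bad


# Hypothetical sample data standing in for a file on the remote mount.
sample = io.StringIO(
    "date,host,ipv4\n"
    "2024-01-01,web01,12.32.45.255\n"
    "2024-01-01,web02,42.hello!.84.125\n"
)
good, bad = split_rows(csv.DictReader(sample))
print("good:", good)
print("bad:", bad)
```

The same pattern extends to other columns by adding one small check function per field.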
