Remove 99% of Data from a file with Transforms.conf

robertlynch2020
Influencer

I have data coming into Splunk under [service], but I only need the file name, not the data inside the file.

The data is getting in, but I need to reduce it.

So I am trying to reduce the data with a REGEX before it hits the index. For example, I think I would have to keep one character from each file so that the file still registers and I can use the file name. I have the REGEX below, but it's not working. Any ideas?

transforms.conf
[NoInfo_100]
REGEX = .$
DEST_KEY = queue
FORMAT = nullQueue

props.conf
[service]
TRANSFORMS-filter = NoInfo_100

Thanks in Advance
Robert Lynch

micahkemp
Champion

Your configuration above routes every matching line to nullQueue, so nothing gets indexed at all. That's not what you described. The transform below should instead rewrite each log line to just its first character.

[NoInfo_100]
REGEX = (.)
DEST_KEY = _raw
FORMAT = $1
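
To see what this transform does to each event, here is a rough Python model of the rewrite. It is only a sketch, not Splunk's actual implementation: the helper name `rewrite_raw` and the sample lines are made up for illustration, and it assumes Splunk takes the first regex match and writes capture group `$1` back into `_raw`, leaving events that do not match unchanged.

```python
import re

# Rough model of [NoInfo_100]: REGEX = (.), DEST_KEY = _raw, FORMAT = $1.
# rewrite_raw is a hypothetical helper, not a Splunk API.
NOINFO_REGEX = re.compile(r"(.)")


def rewrite_raw(raw):
    """Return the event text as it would look after the transform."""
    match = NOINFO_REGEX.search(raw)
    if match is None:
        # No match (for example an empty line): the event is left as-is.
        return raw
    # FORMAT = $1 replaces _raw with the first capture group.
    return match.group(1)


if __name__ == "__main__":
    # Made-up sample lines, one event per line.
    for line in ["2018-02-01 12:00:00 service started", "ERROR disk full"]:
        print(repr(rewrite_raw(line)))
```

Each event keeps its metadata (including the source file name) in this model; only the event text shrinks to one character.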

DalJeanis
Legend

If there is a header, or any other record that will ALWAYS be present in small but nonzero numbers, then use @mayurr98's solution to route everything but that record to the nullQueue.

If no such record type exists, then one character per record might be the best you can do.

mayurr98
Super Champion

This is done by defining a regex that matches the events you need and sending everything else to nullQueue.

Here is a basic example that drops everything except events containing the string login:

In props.conf

[source::/var/log/foo]
# Transforms must be applied in this order
# to make sure events are dropped on the
# floor prior to making their way to the
# index processor
TRANSFORMS-set = setnull, setparsing

In transforms.conf

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = login
DEST_KEY = queue
FORMAT = indexQueue
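
Why the order of setnull and setparsing matters can be sketched with a small Python model. This is an illustration only, not Splunk code: `route_event` and `TRANSFORMS` are hypothetical names, the sample events are invented, and the model assumes each matching transform overwrites the queue key so that the last match in the list decides where the event goes.

```python
import re

# Rough model of queue routing; route_event and TRANSFORMS are
# hypothetical names, not Splunk APIs.
# Order mirrors: TRANSFORMS-set = setnull, setparsing
TRANSFORMS = [
    ("setnull", re.compile(r"."), "nullQueue"),
    ("setparsing", re.compile(r"login"), "indexQueue"),
]


def route_event(raw, transforms=TRANSFORMS):
    """Return the queue an event ends up in after all transforms run."""
    queue = "indexQueue"  # assumed default destination
    for _name, regex, destination in transforms:
        if regex.search(raw):
            # Each matching transform overwrites the queue key,
            # so the last match in the list wins.
            queue = destination
    return queue


if __name__ == "__main__":
    print(route_event("user alice login succeeded"))  # kept
    print(route_event("heartbeat ok"))                # dropped
    # Reversed order: setnull runs last and drops everything.
    print(route_event("user alice login succeeded", TRANSFORMS[::-1]))
```

In this model, listing setparsing before setnull would send even the login events to nullQueue, which is what the comment in props.conf warns about.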

Let me know if this helps!

robertlynch2020
Influencer

This is really great, guys, thanks.

Do you think it is possible to take in only one character per file rather than per line?

Ideally I just want to look at the data in the file name, so the data inside the file is not useful.

robertlynch2020
Influencer

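I got this working by adding the following to props.conf. The BREAK_ONLY_BEFORE string never appears in the data, so the whole file is treated as one event and the transform keeps a single character per file: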

[service]
BREAK_ONLY_BEFORE=ererererererer
TRANSFORMS-filter = NoInfo_100
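
The effect of a break pattern that never matches can be sketched in Python. This is a simplified model under stated assumptions, not Splunk's behavior in full: `first_char_per_event` and the sample text are hypothetical, line merging is assumed to be enabled, and any limits Splunk places on merged events (such as `MAX_EVENTS`) are ignored.

```python
import re

# Hypothetical model: first_char_per_event is not a Splunk API.
NOINFO_REGEX = re.compile(r"(.)")


def first_char_per_event(events):
    """Apply the NoInfo_100 rewrite (keep capture group 1) to each event."""
    out = []
    for raw in events:
        match = NOINFO_REGEX.search(raw)
        out.append(match.group(1) if match else raw)
    return out


# Made-up file content with three lines.
file_text = "line one\nline two\nline three"

# One event per line: the rewrite leaves one character per line.
per_line = first_char_per_event(file_text.splitlines())

# Break pattern never matches: the whole file stays a single event,
# so the rewrite leaves one character per file.
per_file = first_char_per_event([file_text])

if __name__ == "__main__":
    print(len(per_line), "events when breaking on every line")
    print(len(per_file), "event when the break pattern never matches")
```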

jkat54
SplunkTrust

Would it be easier to just run a script that prints out the file names in this directory?

Your current regex matches the last character of each line, and because the transform sends matching events to nullQueue, those lines are dropped rather than indexed.

I really think you should use a script to list the names instead of using the indexing pipeline to transform the data.
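
A minimal sketch of such a script, in Python, might look like the following. The function name `list_file_names` is made up for this example, and the directory to scan is an assumption you would replace with your own path. Splunk can index the standard output of a script configured as a scripted input, so each printed name would become indexable data.

```python
import os
import sys


def list_file_names(directory):
    """Return the sorted names of regular files directly inside directory."""
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())


if __name__ == "__main__":
    # The directory comes from the command line; "." is only a placeholder default.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in list_file_names(target):
        # One file name per line on stdout.
        print(name)
```

This avoids reading the file contents at all, so nothing inside the files counts against the index.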
