In my environment, I use SEDCMD to drop columns that are not needed from the CSV data being ingested.
For example, the following configuration excludes the third column, "description".
Example
Data
time,ipaddress,description
YYYY/mm/dd HH:MM:SS,192.x.x.x,this is ...
YYYY/mm/dd HH:MM:SS,172.x.x.x,this is ...
YYYY/mm/dd HH:MM:SS,10.x.x.x,this is ...
props.conf
SEDCMD-test = s/([^,]*),([^,]*),([^,]*)/\1,\2/g
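To check what that expression does to a single line outside Splunk, it can be replayed in a few lines of Python. This is only a sketch: Python's `re.sub` stands in for the sed-style replacement (it replaces every match by default, like the `g` flag), and the function name `strip_third_column` is made up for illustration.

```python
import re

# Same pattern and replacement as the SEDCMD-test line above.
PATTERN = re.compile(r"([^,]*),([^,]*),([^,]*)")

def strip_third_column(line: str) -> str:
    """Keep the first two comma-separated columns and drop the third."""
    return PATTERN.sub(r"\1,\2", line)

# Sample line taken from the Data block above (placeholders left as-is).
sample = "YYYY/mm/dd HH:MM:SS,192.x.x.x,this is ..."
stripped = strip_third_column(sample)

print(stripped)
print(len(sample), "->", len(stripped), "characters")
```

Note that the pattern assumes exactly three columns and no commas inside the description text.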
When I search, the third column "description" is indeed gone from the displayed raw event.
But "description" still exists in the field list, and the field values for each event are still there as data.
As for the order of processing, I believe SEDCMD runs before license usage is calculated.
However, since the data of the excluded column still appeared to have been captured at search time, I suspected that license usage would not change.
Can I reduce license usage by excluding columns with SEDCMD?
I just tested this, and yes, it does reduce the license usage.
How I tested: Take one log file and ingest it twice, once to a normal sourcetype and again to a sourcetype called "sed_yes2". Props for sed_yes2 are as follows:
[sed_yes2]
SEDCMD-yes = s/[^0-9]//g
This removes all characters except digits. This way, I can see there is actually content, but the lines are much smaller.
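The effect of that digits-only rule on one line can be sketched in Python as follows. The sample line and the name `keep_digits` are hypothetical, chosen only to illustrate how much smaller the stored event becomes; it is not the log file used in the test.

```python
import re

def keep_digits(line: str) -> str:
    """Mimic s/[^0-9]//g: delete every character that is not a digit."""
    return re.sub(r"[^0-9]", "", line)

# Hypothetical event line, made up for this illustration.
sample = "2024/01/02 03:04:05,10.0.0.1,this is a test"
reduced = keep_digits(sample)

print(reduced)
print(len(sample), "->", len(reduced), "characters")
```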
Here is the output from checking Splunk's license usage. (Ignore the sed_yes sourcetype; on my first try, I typo'ed the SEDCMD.)
The description field probably still shows up because you apply indexed CSV extractions (INDEXED_EXTRACTIONS = csv) to the data, and that takes place before the SEDCMD is applied.
Interesting question whether those indexed extractions count against your license, or whether it is just the raw event that counts...
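The ordering guessed at above can be pictured with a toy model in Python. This is not Splunk's actual pipeline, only an illustration of the hypothesis: fields are extracted from the original line first, and the sed-style rewrite then touches only the raw text. The function name `index_event` and the dictionary layout are invented for this sketch.

```python
import csv
import re

HEADER = "time,ipaddress,description"
RAW = "YYYY/mm/dd HH:MM:SS,192.x.x.x,this is ..."

def index_event(header: str, raw: str) -> dict:
    # Step 1 (assumed to run first): structured extraction from the original line.
    names = next(csv.reader([header]))
    values = next(csv.reader([raw]))
    fields = dict(zip(names, values))
    # Step 2 (assumed to run afterwards): sed-style rewrite of the raw text only.
    new_raw = re.sub(r"([^,]*),([^,]*),([^,]*)", r"\1,\2", raw)
    return {"_raw": new_raw, "fields": fields}

event = index_event(HEADER, RAW)
print(event["_raw"])                   # raw text without the third column
print(event["fields"]["description"])  # the field value is still present
```

If the real processing order matches this model, it would explain why the raw event loses the column while the "description" field stays searchable.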
Splunk support also said that the amount of license usage changes depending on SEDCMD.
Thank you for answering.
Thank you for the answer, supersleepwalker!
Yeah!
I tested it the same way, and the license usage shown in the log did seem to decrease!
I would like someone at Splunk to confirm which is right, if possible.
Hi Yutaka
One way to filter the events before they hit the index is to route them through a heavy forwarder.
Your normal forwarder sends the CSV to this heavy forwarder, which strips the unwanted fields and then forwards the reduced data to the indexer.
Since I have a few garbage generators here, this is a good way to cut the garbage down to a manageable amount and save some volume on the daily license.
SEDCMD modifies the contents of the event before it hits the index, and therefore before license usage is counted. Are you saying that, for the events from which description was stripped, the field is still present at search time? If so, what is the source type of the data - "csv" by any chance? If it is, there is extra index-time processing done for those formats (csv and json). I'm not sure what counts against the license in that case; maybe someone from Splunk support can clarify?
Thank you for the answer.
Yes, I meant that the "description" column excluded by SEDCMD still appeared in the field list when I searched.
And the data format is "csv".