I have XML input logs in Splunk.
I have already extracted the required fields, totaling 10 fields.
I need to ensure any other fields that are extracted are ignored and not indexed in Splunk.
Can I set it so that if a field is not in the extracted list, it is automatically ignored?
Is this possible?
Okay, got it.
One last thing: is there any regex to check if any field not in the extracted list can be ignored from indexing?
Hi @SalahKhattab ,
no, it's the opposite: you define regex extractions only for the fields you want, and the other fields will not be extracted (provided you didn't define INDEXED_EXTRACTIONS=XML).
Let me know if I can help you more, or, please, accept one answer for the benefit of the other people in the Community.
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated 😉
Sorry, I didn’t quite get your point. Let me clarify.
For example, if this is my data:
I want to extract only ActionDate and RecordNotes and ignore all other fields during ingestion. This way, the data will be cleared of unnecessary fields. In transforms.conf, I aim to create a regex pattern for ActionDate and RecordNotes to filter out other fields, making the resulting data look like this:
How can I achieve this?
Manipulating structured data with regexes is not a very good idea. It would be better to use an external tool to clean up your data before ingesting.
Hi @SalahKhattab ,
read the above link about anonymizing data; there you'll find how to use SEDCMD in props.conf to remove parts of your logs:
SEDCMD-reduce_fields = s/<Interceptor>(.*)\<ActionDate\>2-04-24\<\/ActionDate\>(.*)\<RecordNotes\>test\<\/RecordNotes\>(.*)\<\/Interceptor\>/<Interceptor\>\<ActionDate\>2-04-24\<\/ActionDate\>\<RecordNotes\>test\<\/RecordNotes\>\<\/Interceptor\>/g
that you can test at https://regex101.com/r/fIpO23/1
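The expression above matches the literal values from the sample event. A more general variant captures the two wanted elements and re-emits them with backreferences. The Python sketch below is only a way to try that idea offline: `PATTERN`, `REPLACEMENT`, `reduce_fields`, and the sample event are hypothetical, and it assumes `ActionDate` always appears before `RecordNotes` inside one `Interceptor` element.

```python
import re

# Capture the two wanted elements and lazily skip everything else.
# Assumption: ActionDate always comes before RecordNotes in the event.
PATTERN = re.compile(
    r"<Interceptor>.*?(<ActionDate>[^<]*</ActionDate>)"
    r".*?(<RecordNotes>[^<]*</RecordNotes>).*?</Interceptor>",
    re.DOTALL,
)
REPLACEMENT = r"<Interceptor>\1\2</Interceptor>"


def reduce_fields(event):
    """Rewrite an event so that only the captured elements remain."""
    return PATTERN.sub(REPLACEMENT, event)


# Made-up sample event; the real log layout is an assumption.
event = (
    "<Interceptor><Id>7</Id>"
    "<ActionDate>2-04-24</ActionDate>"
    "<ActionType>update</ActionType>"
    "<RecordNotes>test</RecordNotes>"
    "<UserName>someone</UserName></Interceptor>"
)

reduced = reduce_fields(event)
print(reduced)
```

An event that does not match the pattern is returned unchanged. Since SEDCMD also accepts `\1`-style backreferences in the replacement, a similar pattern could probably be carried over to props.conf, but it should be checked against real events first.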
Ciao.
Giuseppe
Hello Giuseppe,
In my case, the goal is to ensure that the data is cleaned before indexing.
For instance, if the data is:
I only need the data for the <test> field, and I don’t want the <test2> field to appear. Additionally, there are many fields that I don’t require, so creating a regex for each unwanted field to remove it with SEDCMD or a blacklist would be challenging.
Is there a way to delete fields that aren’t extracted from the log before indexing?
Hi @SalahKhattab ,
if you want to avoid indexing part of the data, the job is more complicated, because the only way is the data anonymization approach (https://docs.splunk.com/Documentation/Splunk/9.3.1/Data/Anonymizedata).
In other words, you have to delete some parts of your logs before indexing.
Why do you want to do this: to save some license costs, or to keep some data from being visible?
If you don't have one of the above requirements, I suggest indexing all the data, because the removed data could be useful to you.
Ciao.
Giuseppe
Hi @SalahKhattab ,
unless you extracted your fields at index time, fields are extracted at search time, so all the fields that you configured will be extracted.
I suppose that you extracted the fields using INDEXED_EXTRACTIONS=XML; in this case all the fields you have are extracted at search time, and this doesn't consume storage or memory.
It's different if you use regex extractions and not INDEXED_EXTRACTIONS=XML: in this case, only the configured fields are extracted.
Why is it so important for you that the other fields aren't extracted?
Ciao.
Giuseppe