Filter Logs

SalahKhattab · ‎10-27-2024

I have XML input logs in Splunk.

I have already extracted the required fields, totaling 10 fields.

I need to ensure any other fields that are extracted are ignored and not indexed in Splunk.

Can I set it so that if a field is not in the extracted list, it is automatically ignored?

Is this possible?

SalahKhattab · ‎10-27-2024

Okay, got it.

One last thing: is there any regex to check if any field not in the extracted list can be ignored from indexing?

gcusello · ‎10-27-2024

Hi @SalahKhattab ,

No, it's the opposite: you define regex extractions only for the fields you want, and the others will not be extracted (provided you didn't set INDEXED_EXTRACTIONS=XML).

Let me know if I can help further; otherwise, please accept an answer for the benefit of other Community members.

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated 😉

SalahKhattab · ‎10-27-2024

Sorry, I didn’t quite get your point. Let me clarify.

For example, if this is my data:

I want to extract only ActionDate and RecordNotes and ignore all other fields during ingestion. This way, the data will be cleared of unnecessary fields. In transforms.conf, I aim to create a regex pattern for ActionDate and RecordNotes to filter out other fields, making the resulting data look like this:

How can I achieve this?

PickleRick · ‎10-27-2024

Manipulating structured data with regexes is not a very good idea. It would be better to use an external tool to clean up your data before ingesting.
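
As an illustration of the external clean-up suggested above, here is a minimal Python sketch that parses each event as XML and drops every child element not on an allow-list. The element names Interceptor, ActionDate and RecordNotes come from this thread; the `Other` element, the `KEEP` set and the `keep_only` function are hypothetical names used only for the example, and the sketch assumes each event is a single well-formed XML element with flat children.

```python
import xml.etree.ElementTree as ET

# Hypothetical allow-list: the two fields named in this thread.
KEEP = {"ActionDate", "RecordNotes"}

def keep_only(xml_text, keep=KEEP):
    """Return the event with every direct child not in `keep` removed."""
    root = ET.fromstring(xml_text)
    for child in list(root):  # iterate over a copy, since root is mutated
        if child.tag not in keep:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # <Other> is a made-up unwanted field for demonstration.
    sample = (
        "<Interceptor><ActionDate>2-04-24</ActionDate>"
        "<Other>x</Other><RecordNotes>test</RecordNotes></Interceptor>"
    )
    print(keep_only(sample))
```

A script like this would run before the data reaches Splunk (for example, on the file a forwarder monitors), so only the reduced events are ever indexed.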

gcusello · ‎10-27-2024

Hi @SalahKhattab ,

Read the link above on anonymizing data; there you'll find how to use SEDCMD in props.conf to remove parts of your logs:

SEDCMD-reduce_fields = s/<Interceptor>(.*)\<ActionDate\>2-04-24\<\/ActionDate\>(.*)\<RecordNotes\>test\<\/RecordNotes\>(.*)\<\/Interceptor\>/<Interceptor\>\<ActionDate\>2-04-24\<\/ActionDate\>\<RecordNotes\>test\<\/RecordNotes\>\<\/Interceptor\>/g

that you can test at https://regex101.com/r/fIpO23/1

Ciao.

Giuseppe
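
The substitution above matches one specific ActionDate and RecordNotes value. A variant that captures whatever values are present can be sketched in Python to check the idea before adapting it. This is only a sketch: `PATTERN` and `reduce_fields` are hypothetical names, it assumes ActionDate always appears before RecordNotes inside Interceptor, and whether the same pattern and `\1\2` replacement behave identically in a SEDCMD setting is an assumption to verify on test data.

```python
import re

# Capture the two wanted elements regardless of their values;
# everything else inside <Interceptor> is matched lazily and discarded.
PATTERN = re.compile(
    r"<Interceptor>.*?(<ActionDate>[^<]*</ActionDate>)"
    r".*?(<RecordNotes>[^<]*</RecordNotes>).*?</Interceptor>",
    re.DOTALL,
)

def reduce_fields(event):
    """Rebuild each Interceptor element from the two captured fields only."""
    return PATTERN.sub(r"<Interceptor>\1\2</Interceptor>", event)

if __name__ == "__main__":
    # <A>, <B> and <C> are made-up unwanted fields for demonstration.
    sample = (
        "<Interceptor><A>1</A><ActionDate>2-04-24</ActionDate><B>2</B>"
        "<RecordNotes>test</RecordNotes><C>3</C></Interceptor>"
    )
    print(reduce_fields(sample))
```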

SalahKhattab · ‎10-27-2024

Hello Giuseppe,

In my case, the goal is to ensure that the data is cleaned before indexing.

For instance, if the data is:

<test>dasdada</test><test2>asdasda</test2>

I only need the data for the <test> field, and I don’t want the <test2> field to appear. Additionally, there are many fields that I don’t require, so creating a regex for each unwanted field to remove it with SEDCMD or a blacklist would be challenging.

Is there a way to delete fields that aren’t extracted from the log before indexing?

gcusello · ‎10-27-2024

Hi @SalahKhattab ,

If you want to avoid indexing part of the data, the job is more complicated, because the only way is the approach used to anonymize data (https://docs.splunk.com/Documentation/Splunk/9.3.1/Data/Anonymizedata).

In other words, you should delete some parts of your logs before indexing.

Why do you want to do this: to save on license costs, or to keep some data from being visible?

If you don't have either of those requirements, I suggest indexing all the data, because the removed data could be useful to you.

Ciao.

Giuseppe

gcusello · ‎10-27-2024

Hi @SalahKhattab ,

unless you extracted your fields at index time, fields are extracted at search time, so all the fields that you configured will be extracted.

I suppose that you extracted the fields using INDEXED_EXTRACTIONS=XML; in this case, all the fields you have are extracted at search time, and this doesn't consume storage or memory.

It's different if you use regex extractions and not INDEXED_EXTRACTIONS=XML: in this case, only the configured fields are extracted.

Why is it so important to you that the other fields aren't extracted?

Ciao.

Giuseppe
