Filter Logs

SalahKhattab
Explorer

I have XML input logs in Splunk.

I have already extracted the 10 fields I need.

I need to make sure that any other fields are ignored and not indexed in Splunk.

Can I configure it so that any field not in my extracted list is automatically ignored?

Is this possible?

SalahKhattab
Explorer

Okay, got it.

One last thing: is there a regex I can use so that any field not in the extracted list is excluded from indexing?

gcusello
SplunkTrust

Hi @SalahKhattab,

No, it's the opposite: you define regex extractions only for the fields you want, and the others will not be extracted (provided you haven't enabled automatic XML extraction with KV_MODE=xml).
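As a rough illustration of this point, here is a minimal Python sketch (not Splunk code) in which a hypothetical WANTED_FIELDS table plays the role of props.conf EXTRACT-<class> regexes: only fields that have a configured regex are extracted, and everything else in the event is ignored. Keep in mind that with search-time extraction the raw event is still indexed in full; this only controls which fields you get when you search.

```python
from __future__ import annotations

import re

# Hypothetical table standing in for props.conf EXTRACT-<class> settings:
# one regex per wanted field, each with a named group of the same name.
WANTED_FIELDS = {
    "ActionDate": re.compile(r"<ActionDate>(?P<ActionDate>[^<]*)</ActionDate>"),
    "RecordNotes": re.compile(r"<RecordNotes>(?P<RecordNotes>[^<]*)</RecordNotes>"),
}


def extract_fields(event: str) -> dict[str, str]:
    """Return only the fields that have a configured regex and matched the event."""
    fields: dict[str, str] = {}
    for name, pattern in WANTED_FIELDS.items():
        match = pattern.search(event)
        if match:
            fields[name] = match.group(name)
    return fields


event = (
    "<Interceptor> <Outcome>2</Outcome> <ActionDate>2-04-24</ActionDate> "
    "<RecordNotes>test</RecordNotes> <AttackVessel>111</AttackVessel> </Interceptor>"
)
print(extract_fields(event))  # {'ActionDate': '2-04-24', 'RecordNotes': 'test'}
```

Outcome and AttackVessel are still in the event text, but no field is produced for them because nothing is configured to extract them.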

Let me know if I can help you further, or please accept an answer for the benefit of other Community members.

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated 😉

SalahKhattab
Explorer

Sorry, I didn’t quite get your point. Let me clarify.

For example, if this is my data: 

<Interceptor> <AttackCoords>-423423445345345.10742916222947</AttackCoords> <Outcome>2</Outcome> <Infiltrators>20</Infiltrators> <Enforcer>2</Enforcer> <ActionDate>2-04-24</ActionDate> <ActionTime>00:2:00</ActionTime> <RecordNotes>test</RecordNotes> <NumEscaped>0</NumEscaped> <LaunchCoords>-222222</LaunchCoords> <AttackVessel>111</AttackVessel> </Interceptor>

I want to extract only ActionDate and RecordNotes and drop all other fields during ingestion, so that the indexed data no longer contains the unnecessary fields. In transforms.conf, I'd like to create a regex pattern for ActionDate and RecordNotes that filters out the other fields, so that the resulting data looks like this:

<Interceptor> <ActionDate>2-04-24</ActionDate> <RecordNotes>test</RecordNotes> </Interceptor>

How can I achieve this?

PickleRick
SplunkTrust

Manipulating structured data with regexes is not a very good idea. It would be better to use an external tool to clean up your data before ingesting.
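A minimal sketch of such an external clean-up step, assuming each event is a single well-formed <Interceptor> element whose wanted fields are direct children. The KEEP set and the keep_only function are illustrative names, and how you run it in front of the Splunk input (for example, rewriting the file before the forwarder reads it) is left open. It uses Python's standard xml.etree.ElementTree, so the XML is parsed rather than pattern-matched.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Illustrative whitelist of element names to keep.
KEEP = {"ActionDate", "RecordNotes"}


def keep_only(event_xml: str, keep: set[str]) -> str:
    """Parse one event and remove every direct child whose tag is not in `keep`."""
    root = ET.fromstring(event_xml)
    for child in list(root):  # iterate over a copy because we remove from root
        if child.tag not in keep:
            root.remove(child)  # the child's trailing whitespace (tail) goes with it
    return ET.tostring(root, encoding="unicode")


EVENT = (
    "<Interceptor> <AttackCoords>-423423445345345.10742916222947</AttackCoords> "
    "<Outcome>2</Outcome> <Infiltrators>20</Infiltrators> <Enforcer>2</Enforcer> "
    "<ActionDate>2-04-24</ActionDate> <ActionTime>00:2:00</ActionTime> "
    "<RecordNotes>test</RecordNotes> <NumEscaped>0</NumEscaped> "
    "<LaunchCoords>-222222</LaunchCoords> <AttackVessel>111</AttackVessel> </Interceptor>"
)

print(keep_only(EVENT, KEEP))
# <Interceptor> <ActionDate>2-04-24</ActionDate> <RecordNotes>test</RecordNotes> </Interceptor>
```

Because the parser understands the document structure, the list of fields to keep is the only thing you maintain; new unwanted elements are dropped without any extra rule.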

gcusello
SplunkTrust

Hi @SalahKhattab,

Read the link I shared about anonymizing data: there you'll find how to use SEDCMD in props.conf to remove part of your logs:

SEDCMD-reduce_fields = s/<Interceptor>(.*)\<ActionDate\>2-04-24\<\/ActionDate\>(.*)\<RecordNotes\>test\<\/RecordNotes\>(.*)\<\/Interceptor\>/<Interceptor\>\<ActionDate\>2-04-24\<\/ActionDate\>\<RecordNotes\>test\<\/RecordNotes\>\<\/Interceptor\>/g

You can test it at https://regex101.com/r/fIpO23/1
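The expression above hard-codes the values 2-04-24 and test, so it only matches that one event. A more general variant captures the two elements and writes them back with back-references, which the props.conf specification allows in SEDCMD replacements (\1, \2, ...; worth double-checking for your version). The sketch below tries the idea with Python's re module, which should behave like Splunk's regex engine for this simple pattern. The SEDCMD line in the comment is an untested assumption, and the pattern expects ActionDate to appear before RecordNotes on a single line.

```python
import re

# Untested props.conf equivalent of the same idea, using back-references:
#   SEDCMD-reduce_fields = s/<Interceptor>.*(<ActionDate>[^<]*<\/ActionDate>).*(<RecordNotes>[^<]*<\/RecordNotes>).*<\/Interceptor>/<Interceptor>\1\2<\/Interceptor>/g
REDUCE = re.compile(
    r"<Interceptor>.*(<ActionDate>[^<]*</ActionDate>)"
    r".*(<RecordNotes>[^<]*</RecordNotes>).*</Interceptor>"
)


def reduce_fields(event: str) -> str:
    """Keep only the ActionDate and RecordNotes elements inside <Interceptor>."""
    return REDUCE.sub(r"<Interceptor>\1\2</Interceptor>", event)


EVENT = (
    "<Interceptor> <AttackCoords>-423423445345345.10742916222947</AttackCoords> "
    "<Outcome>2</Outcome> <Infiltrators>20</Infiltrators> <Enforcer>2</Enforcer> "
    "<ActionDate>2-04-24</ActionDate> <ActionTime>00:2:00</ActionTime> "
    "<RecordNotes>test</RecordNotes> <NumEscaped>0</NumEscaped> "
    "<LaunchCoords>-222222</LaunchCoords> <AttackVessel>111</AttackVessel> </Interceptor>"
)

print(reduce_fields(EVENT))
# <Interceptor><ActionDate>2-04-24</ActionDate><RecordNotes>test</RecordNotes></Interceptor>
```

Events that don't contain both elements are left unchanged, since the substitution simply doesn't match.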

Ciao.

Giuseppe

SalahKhattab
Explorer

Hello Giuseppe,

In my case, the goal is to ensure that the data is cleaned before indexing.

For instance, if the data is:

<test>dasdada</test><test2>asdasda</test2>

I only need the data for the <test> field, and I don’t want the <test2> field to appear. Additionally, there are many fields that I don’t require, so creating a regex for each unwanted field to remove it with SEDCMD or a blacklist would be challenging.

Is there a way to delete, before indexing, the fields that aren't in my extracted list?

gcusello
SplunkTrust

Hi @SalahKhattab,

If you want to avoid indexing part of the data, the job is more complicated, because the only way is the approach used to anonymize data (https://docs.splunk.com/Documentation/Splunk/9.3.1/Data/Anonymizedata).

In other words, you have to delete those parts of your logs before they are indexed.

Why do you want to do this: to save license costs, or to keep some data from being visible?

If neither of these requirements applies, I suggest indexing all the data, because the removed data could be useful to you.

Ciao.

Giuseppe

gcusello
SplunkTrust

Hi @SalahKhattab,

Unless you extracted your fields at index time, fields are extracted at search time, so all the fields you configured are extracted when you search.

I suppose that you extracted the fields using KV_MODE=xml; in this case all the fields you have are extracted at search time, and this doesn't consume extra storage.

It's different if you use regex extractions instead of KV_MODE=xml: in this case, only the configured fields are extracted.

Why is it so important to you that the other fields aren't extracted?

Ciao.

Giuseppe
