Getting Data In

Filter Logs

SalahKhattab
Loves-to-Learn Lots

I have XML input logs in Splunk.

I have already extracted the required fields, totaling 10 fields.

I need to ensure any other fields that are extracted are ignored and not indexed in Splunk.

Can I set it so that if a field is not in the extracted list, it is automatically ignored?

Is this possible? 


SalahKhattab
Loves-to-Learn Lots

Okay, got it.

One last thing: is there any regex to check if any field not in the extracted list can be ignored from indexing?


gcusello
SplunkTrust

Hi @SalahKhattab ,

No, it's the opposite: you define regex extractions only for the fields you want, and the others will not be extracted (provided you didn't set INDEXED_EXTRACTIONS=XML).

Let me know if I can help further, or please accept an answer for the benefit of other Community members.

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated 😉


SalahKhattab
Loves-to-Learn Lots

Sorry, I didn’t quite get your point. Let me clarify.

For example, if this is my data: 

<Interceptor> <AttackCoords>-423423445345345.10742916222947</AttackCoords> <Outcome>2</Outcome> <Infiltrators>20</Infiltrators> <Enforcer>2</Enforcer> <ActionDate>2-04-24</ActionDate> <ActionTime>00:2:00</ActionTime> <RecordNotes>test</RecordNotes> <NumEscaped>0</NumEscaped> <LaunchCoords>-222222</LaunchCoords> <AttackVessel>111</AttackVessel> </Interceptor>

I want to extract only ActionDate and RecordNotes and ignore all other fields during ingestion. This way, the data will be cleared of unnecessary fields. In transforms.conf, I aim to create a regex pattern for ActionDate and RecordNotes to filter out other fields, making the resulting data look like this:

<Interceptor> <ActionDate>2-04-24</ActionDate> <RecordNotes>test</RecordNotes> </Interceptor>

How can I achieve this?


PickleRick
SplunkTrust

Manipulating structured data with regexes is not a very good idea. It would be better to use an external tool to clean up your data before ingesting.
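
A minimal sketch of what such an external clean-up step might look like, assuming each event is one well-formed XML document and that a Python script runs before the data reaches Splunk. The allow-list `KEEP` and the function name `keep_only` are illustrative assumptions, not something from this thread; the sample event is a shortened version of the one posted in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical allow-list: the two fields named in the question.
KEEP = {"ActionDate", "RecordNotes"}

def keep_only(xml_text: str, keep=KEEP) -> str:
    """Return the event with every direct child element not in `keep` removed."""
    root = ET.fromstring(xml_text)
    for child in list(root):  # iterate over a copy because root is mutated
        if child.tag not in keep:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")

event = (
    "<Interceptor><Outcome>2</Outcome>"
    "<ActionDate>2-04-24</ActionDate>"
    "<RecordNotes>test</RecordNotes>"
    "<NumEscaped>0</NumEscaped></Interceptor>"
)
print(keep_only(event))
# <Interceptor><ActionDate>2-04-24</ActionDate><RecordNotes>test</RecordNotes></Interceptor>
```

Because the parser works on the element tree rather than on text, new unwanted fields are dropped without any change to the script.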


gcusello
SplunkTrust

Hi @SalahKhattab ,

Read the documentation on anonymizing data (https://docs.splunk.com/Documentation/Splunk/9.3.1/Data/Anonymizedata); it describes using SEDCMD in props.conf to remove parts of your logs:

SEDCMD-reduce_fields = s/<Interceptor>(.*)\<ActionDate\>2-04-24\<\/ActionDate\>(.*)\<RecordNotes\>test\<\/RecordNotes\>(.*)\<\/Interceptor\>/<Interceptor\>\<ActionDate\>2-04-24\<\/ActionDate\>\<RecordNotes\>test\<\/RecordNotes\>\<\/Interceptor\>/g

that you can test at https://regex101.com/r/fIpO23/1 
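
The expression above hardcodes the values 2-04-24 and test, so it only matches that one event. A hedged sketch of a more general variant uses capture groups so that whatever value is present gets carried over. It is shown here with Python's `re` module as a stand-in for trying out the pattern; Splunk's sed-style replacement is assumed to accept the same `\1` and `\2` back-references, and this has not been verified on a Splunk instance. The name `reduce_fields` is hypothetical, and the pattern assumes ActionDate comes before RecordNotes within a single-line event.

```python
import re

# Capture the two wanted elements with any value; skip everything else lazily.
PATTERN = re.compile(
    r"<Interceptor>.*?(<ActionDate>[^<]*</ActionDate>)"
    r".*?(<RecordNotes>[^<]*</RecordNotes>).*?</Interceptor>"
)

def reduce_fields(raw: str) -> str:
    """Rebuild each <Interceptor> event from only the two captured elements."""
    return PATTERN.sub(r"<Interceptor>\1\2</Interceptor>", raw)

sample = (
    "<Interceptor> <Outcome>2</Outcome> <ActionDate>2-04-24</ActionDate> "
    "<ActionTime>00:2:00</ActionTime> <RecordNotes>test</RecordNotes> "
    "<NumEscaped>0</NumEscaped> </Interceptor>"
)
print(reduce_fields(sample))
# <Interceptor><ActionDate>2-04-24</ActionDate><RecordNotes>test</RecordNotes></Interceptor>
```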

Ciao.

Giuseppe


SalahKhattab
Loves-to-Learn Lots

Hello Giuseppe,

In my case, the goal is to ensure that the data is cleaned before indexing.

For instance, if the data is:

<test>dasdada</test><test2>asdasda</test2>

I only need the data for the <test> field, and I don’t want the <test2> field to appear. Additionally, there are many fields that I don’t require, so creating a regex for each unwanted field to remove it with SEDCMD or a blacklist would be challenging.

Is there a way to delete fields that aren’t extracted from the log before indexing?


gcusello
SplunkTrust

Hi @SalahKhattab ,

If you want to avoid indexing part of the data, the job is more complicated, because the only way is the approach used to anonymize data (https://docs.splunk.com/Documentation/Splunk/9.3.1/Data/Anonymizedata).

In other words, you should delete some parts of your logs before indexing.

Why do you want to do this: to save some license costs, or to prevent some data from being visible?

If you don't have one of the above requirements, I suggest indexing all the data, because the removed data could turn out to be useful to you.

Ciao.

Giuseppe


gcusello
SplunkTrust

Hi @SalahKhattab ,

Unless you extracted your fields at index time, fields are extracted at search time, so all the fields that you configured will be extracted.

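I suppose that you extracted the fields automatically from the XML structure. If that is done at search time (for example with KV_MODE=xml), all the fields are extracted when you search, and this doesn't consume storage. Note that INDEXED_EXTRACTIONS=XML works at index time instead, so the fields it creates are written to the index.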

It's different if you use regex extractions rather than INDEXED_EXTRACTIONS=XML: in that case, only the configured fields are extracted.

Why is it so important for you that the other fields aren't extracted?

Ciao.

Giuseppe
