SEDCMD not actually replacing data during indexing

DaClyde
Contributor

In the past, I have used SEDCMD statements in my props.conf to remove text and whole lines from events so they would not clog our indexes and clutter the search results with extraneous text.

Today, however, I tried to set up a SEDCMD string to remove part of a line in our IIS logs. It appeared to work, but the actual effect is that the text is only removed from the regular search result. If you expand the event, where you can see the 'Event Actions' button and all the fields, the string we were trying to remove is still available and searchable (it still shows in the Interesting Fields).

Here is my script:

SEDCMD-Auth = s/Basic+.*//

I am trying to remove the content of the Authorization field that has been added to our IIS logs, along with everything else to the end of the line after that field.

This is a script I used earlier to remove a line from a Windows Event Log, and it is working fine:

SEDCMD-EventType = s/EventType=4\r\n//

What is the difference (aside from not removing a CR/LF)? What did I do wrong in the first one to cause Splunk to not actually prevent the data from being indexed? These are both in the etc/system/local/props.conf on our indexer, and we only use light forwarders.

1 Solution

woodcock
Esteemed Legend

You probably have an inputs.conf on your Universal Forwarder like this:

[monitor://C:\inetpub\logs\LogFiles\W3SVC1]
sourcetype=iis

This works because IIS is a pre-trained sourcetype and if you go look in $SPLUNK_HOME/etc/system/default/props.conf you will see something that looks like this:

[iis]
...
INDEXED_EXTRACTIONS = w3c

Previous versions of Splunk used CHECK_FOR_HEADER = True, but because the order and number of fields in IIS logs can be changed at any time, Splunk replaced this with the setting INDEXED_EXTRACTIONS = w3c. However, this new feature works very differently and, unfortunately for you, this "easy button" creates index-time fields BEFORE the SEDCMD code runs. Your only options are to pre-process the logs before the forwarder gets them, or to revert to the older method of handling sourcetype iis, which happens after SEDCMD runs. Just download a version 6.* Splunk and do it that way (I showed you which file to check earlier in this answer).
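
The ordering problem described above can be illustrated with a toy model. This is only a sketch in Python, not Splunk's actual pipeline: the column names, the sample log line, and the variable names are all made up for illustration. It shows why a substitution applied to the raw text after structured fields have been extracted leaves the extracted value intact.

```python
import re

# Hypothetical column header and IIS-style line (values invented for illustration).
FIELDS = ["date", "time", "cs_method", "cs_uri_stem", "Authorization"]
raw = "2019-05-01 12:00:00 GET /index.html Basic+dXNlcjpwYXNz"

# Step 1: structured parsing copies every column into its own stored field.
indexed_fields = dict(zip(FIELDS, raw.split(" ")))

# Step 2: a sed-style substitution then rewrites only the raw text.
raw_after_sed = re.sub(r"Basic\+.*", "", raw)

# The raw text no longer contains the credential string...
print(repr(raw_after_sed))
# ...but the field captured in step 1 still holds it.
print(indexed_fields["Authorization"])
```

In this model the only ways to keep the value out of the stored fields are to clean the line before step 1 or to skip step 1 altogether, which mirrors the two options given above.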

DaClyde
Contributor

That is indeed how we are set up. I will test the old way and see what happens.

woodcock
Esteemed Legend

Thanks for coming back to Accept. How did you end up handling it? There is another new feature called INGEST_EVAL that might be able to help you here. I believe that it can modify index-time values in the manner that you require. Be aware of the difference between INGEST_EVAL = and INGEST_EVAL :=.

DaClyde
Contributor

On my test system, we went with the older method, which worked, but ultimately I was able to convince my superiors that including the extra data in the log in the first place was a security risk, and we were able to just stop sending the information to the log (it was literally a hashed version of the credentials, and was only being logged because someone felt they had to check a box in a security checklist). As a result, the problem is no more.

But now I am curious to see what INGEST_EVAL can do for us.

DavidHourani
Super Champion

Hi @DaClyde,

Have you found an answer to your question yet? Are you using index-time extractions? That could be why the text no longer shows in your raw events but the fields are still saved.

DaClyde
Contributor

I have not, and yes, we are using index time extractions. Do you have any suggestions? My confusion is that one SED script is working fine, but the other is not, and they are in the same props.conf on the indexer.

My short-term, security-by-obscurity solution was to create a bogus field alias for the Authorization field. That at least prevents the field from being searchable, but the data still appears in the raw event in the index.

somesoni2
Revered Legend

Are you looking for the literal string "Basic+" (that is, the plus sign is part of the text to match)? If so, try this:

 SEDCMD-Auth = s/Basic\+.*//
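
To see what escaping the plus sign changes, here is a small sketch that uses Python's re module as a stand-in for the regex engine behind SEDCMD (an assumption; for these simple patterns the `+` quantifier and the `\+` literal behave the same way in both). The sample lines are hypothetical.

```python
import re

# Hypothetical log lines, invented for illustration.
with_plus = "GET /index.html 200 Basic+dXNlcjpwYXNz Mozilla/5.0"
without_plus = "note: Basic realm"

# Unescaped: "c+" means "one or more c", then ".*" consumes the rest of the line.
unescaped_a = re.sub(r"Basic+.*", "", with_plus)
unescaped_b = re.sub(r"Basic+.*", "", without_plus)

# Escaped: the match must start at the literal text "Basic+".
escaped_a = re.sub(r"Basic\+.*", "", with_plus)
escaped_b = re.sub(r"Basic\+.*", "", without_plus)

print(repr(unescaped_a), repr(escaped_a))
print(repr(unescaped_b), repr(escaped_b))
```

On a line that really contains "Basic+", both patterns strip the same text, because `.*` swallows the plus sign either way. The escaped form is still the safer choice: it will not truncate a line that merely contains the word "Basic".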

DaClyde
Contributor

Ah, that could be it. There is a literal plus sign in the string. I will add that and see what happens.
