Getting Data In

Parse IIS logs (structured data) on Universal Forwarder

montgomeryam
Path Finder

We are trying to parse or drop a number of fields on IIS Logs from our Exchange environment. I have done as much digging as I could and have found a forum post that tried to answer this exact question, but it is unfortunately not working. The forum post I found is:
https://answers.splunk.com/answers/118668/filter-iis-logs-before-indexing.html
The issue I am running into is that when I try to send some IIS fields to either the nullQueue or an empty string, it simply doesn't work. It should be easy enough to send the SOURCE_KEY to either of those, but for whatever reason I can't get it to work.

Our environment is up to date on all components running 6.4.3.

*SAMPLE IIS log file from MS TechNet (is identical to our log format)*

#Software: Microsoft Internet Information Services 8.5
#Version: 1.0
#Date: 2002-05-02 17:42:15
#Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)
2002-05-02 17:42:15 172.22.255.255 - 172.30.255.255 80 GET /images/picture.jpg - 200 Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)

Here is what I have set up on the UF where the IIS logs originate:
On the UF in $Splunk\etc\system\local
props.conf

[iis_extraction]
TRANSFORMS-throw_some_away=throw_some_away

I want to throw the cs-method field away, i.e. not index that field. So I want to either make it an empty string so Splunk will drop it, or I want to send it to the nullQueue.
On the UF in $Splunk\etc\system\local
transforms.conf

[throw_some_away]
SOURCE_KEY=field:cs-method
REGEX=.
FORMAT=

As a backup plan, we have some options, but it would be nice to be able to do it either here on the Universal Forwarder, or on a Heavy Forwarder before hitting the Indexers. As I understand it, according to the Splunk documentation, all transforms should occur for structured data before getting to the Indexer as INDEXED_EXTRACTIONS bypass the parsing, merging, and typing queues. I can't get that to work. I appreciate all the advice I have received getting to this point, so if you see this and have offered help, I thank you.

If anyone has any advice or clues as to why I am not seeing the parsing of this field, I would greatly appreciate any and all help!

0 Karma
1 Solution

lukejadamec
Super Champion

This must be done on the forwarder, because the parsing will be complete when it leaves the forwarder (the indexer cannot change it).

You need to be sure that field extractions are not done by a Splunk easy button (i.e. sourcetype=iis) or by field delimiters in props. Field extractions must be done by REPORT-, or possibly EXTRACT-. This is because when field extractions are done by field delimiters in props or by Splunk code, they are done at index time and will conflict with the anonymization functions of SEDCMD in props and REGEX in transforms.

FYI, for anyone doing anonymization this is a major problem: while _raw and the events in the search results show the 'anonymized' value, the event information and Interesting Fields both show the non-anonymized data. In other words, the index contains both the anonymized and the non-anonymized data, and both are easily found by even a novice user.

So, for iis logs you cannot use the Splunk sourcetype=iis. This sourcetype invokes the INDEXED_EXTRACTIONS = w3c in system/default/props.conf, which is described very nicely in this Splunk blog post: http://blogs.splunk.com/2013/10/18/iis-logs-and-splunk-6/. The details from the blog post give insight into the problems of indexing iis logs, and the details of the props.conf stanza required to duplicate the automated sourcetype=iis feature.

These are the config files used for testing:
NOTE: The configs for anonymizing sourcetype=iis data are not included because they do not work (see above), but you can modify the stanzas below to see for yourself.

For sourcetype=iis License Volume Testing:
inputs.conf

[monitor://C:\temp\Splunk\test\FilterFields\FFtest.log]
disabled = false
host = Test
index = test
sourcetype = iis

For Custom IIS License Volume Testing:
inputs.conf

[monitor://C:\temp\Splunk\test\FilterFields\LicenseTest5.log]
disabled = false
host = Test
index = test2
sourcetype = iistest2

props.conf

[iistest2]
FIELD_HEADER_REGEX = ^#Fields:\s*(.*)
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TZ = GMT
SEDCMD-dropcsmethod = s/(.*\s\d+\s\w+\s)(\/.*)(\s.*\s\d\d\d\s.*)/\1F\3/g
REPORT-iisFields = REPORT-iisFields2

transforms.conf

[REPORT-iisFields2]
DELIMS = " "
FIELDS = "date","time","c_ip","cs_username","s_ip","s_port","cs_method","cs_uri_stem","cs_uri_query","sc_status","cs_userAgent"
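
To make the DELIMS/FIELDS pairing above concrete, here is a minimal Python sketch of the idea (not Splunk's implementation): split the event on the delimiter and pair each value with the field name in the same position. The field names are an assumption that follows the #Fields header from the question with hyphens replaced by underscores, and the helper name extract_fields is hypothetical.

```python
# Illustration only: mimics a space-delimited DELIMS/FIELDS extraction in plain Python.
# Field names are assumed from the #Fields header in the question (hyphens -> underscores).
FIELDS = ["date", "time", "c_ip", "cs_username", "s_ip", "s_port",
          "cs_method", "cs_uri_stem", "cs_uri_query", "sc_status", "cs_userAgent"]

# Sample event copied from the question.
EVENT = ("2002-05-02 17:42:15 172.22.255.255 - 172.30.255.255 80 GET "
         "/images/picture.jpg - 200 "
         "Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)")


def extract_fields(raw_event, delimiter=" "):
    """Pair each delimited value with the field name at the same position."""
    values = raw_event.split(delimiter)
    return dict(zip(FIELDS, values))


fields = extract_fields(EVENT)
print(fields["cs_method"], fields["cs_uri_stem"], fields["sc_status"])
```

Because the pairing is purely positional, it only holds while every event has the same number of space-separated values as the FIELDS list.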

The test file was built by repeating the event from the IIS log example posted in the question, with the cs_uri_stem field increased to 120 characters. Total size = 12MB.

I checked the license volume with this search:

  index=_internal source=*license_usage.log type="Usage" splunk_server=* earliest=-1w@d | eval Date=strftime(_time, "%Y/%m/%d") | eventstats sum(b) as volume by idx, Date | eval MB=round(volume/1024/1024,5)| timechart first(MB) AS volume by idx 

The results were:
License Volume with custom sourcetype and anonymization of cs-uri-stem (120 characters replaced with 1 F character) = 5.813 MB
License Volume with sourcetype=iis = 11.683 MB
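
As a quick check on what those two numbers mean, this small Python sketch works out the saving. It uses only the two figures reported above; the variable names are my own.

```python
# License volume figures reported above, in MB.
custom_sourcetype_mb = 5.813   # custom sourcetype with cs-uri-stem replaced by "F"
builtin_iis_mb = 11.683        # sourcetype=iis (INDEXED_EXTRACTIONS = w3c)

saved_mb = builtin_iis_mb - custom_sourcetype_mb
reduction_pct = 100 * saved_mb / builtin_iis_mb

print(f"saved {saved_mb:.3f} MB ({reduction_pct:.1f}% less license volume)")
```

In this test the license volume dropped by roughly half, which is expected given that the 120-character cs_uri_stem value made up a large share of each event.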

Regarding the SEDCMD: the example included above isolates the cs_uri_stem field with capture groups and changes its value to F (even though the class is named dropcsmethod). This example isolates the cs_method field and changes its value to F:

SEDCMD-dropcsmethod = s/(.*\s\d+\s+)(\w+)(\s+\/.*)/\1F\3/g
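
If you want to try these expressions before deploying them, a rough way is to replay them outside Splunk. The sketch below applies both substitutions to the sample event from the question using Python's re module. This is an assumption-laden stand-in: Splunk uses PCRE rather than Python's engine, although these particular patterns use only syntax common to both, and the helper name apply_sed is hypothetical.

```python
import re

# Sample event copied from the question.
EVENT = ("2002-05-02 17:42:15 172.22.255.255 - 172.30.255.255 80 GET "
         "/images/picture.jpg - 200 "
         "Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)")

# (pattern, replacement) pairs taken from the two SEDCMD lines in this answer.
# \g<1> and \g<3> are Python's unambiguous spelling of sed's \1 and \3.
URI_STEM_RULE = (r"(.*\s\d+\s\w+\s)(\/.*)(\s.*\s\d\d\d\s.*)", r"\g<1>F\g<3>")
CS_METHOD_RULE = (r"(.*\s\d+\s+)(\w+)(\s+\/.*)", r"\g<1>F\g<3>")


def apply_sed(rule, raw_event):
    """Apply one s/pattern/replacement/g style rule to a raw event string."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, raw_event)


print(apply_sed(URI_STEM_RULE, EVENT))   # cs_uri_stem value becomes F
print(apply_sed(CS_METHOD_RULE, EVENT))  # cs_method value becomes F
```

Both patterns rely on the URI stem starting with a slash and on the position of the port number, so they are worth replaying against a few real events from your own logs before trusting them.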


0 Karma

montgomeryam
Path Finder

Ok... I have put this through the wringer and this is definitely the way to do this on the Universal Forwarders. Huge thanks to @lukejadamec for the time and effort to show me the light!

I think the point to take away here is what @lukejadamec identified: all of this has to happen independently of the Splunk magic buttons. Once those magic buttons are used, no further parsing or filtering can happen to those events. They seem to skip every queue, even on the originating Universal Forwarder. For that reason, relying on the old way of defining and extracting log fields is what allows one to filter and/or drop the desired fields to save on licensing.

Thanks again for everyone's help! I hope to pay it forward in kind!

0 Karma

lukejadamec
Super Champion

Ha. Your question moved an outstanding requirement of mine up in the queue so to speak. Thanks for validating the results!

0 Karma

yannK
Splunk Employee

As the IIS logs are using INDEXED_EXTRACTIONS = w3c in props.conf on the forwarder, you cannot reparse them on the indexers.
Also, many fields are parsed as index-time fields, so if you modify the raw events the fields may still be indexed (to be tested), and the size saving on the indexes may be only partial.
Are you trying to remove an event, remove a field, or edit the raw data?

You can put your parsing on the forwarder directly using props/transforms:
- nullQueue is good for deleting a full event (see other answers)
- SEDCMD allows you to rewrite the _raw event, so you can do a special replacement to remove one string or a location.
See https://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf
(You can test at search time with | eval before=_raw | rex mode=sed "s/removeme//g" | table before _raw)

However, as mentioned above, the indexed extractions may have already generated the indexed fields, so they may still exist in the index even if you modified the _raw data.

0 Karma

montgomeryam
Path Finder

Good info so thank you for your time!

To answer your question, we are trying to remove all data that would be extracted to a specific IIS log field. So in this example, I would want all cs-method data that could ever exist to simply be dropped. We don't care how: format it to blank, send it to the nullQueue, or any other easy solution. @amrit had proposed the solution I originally posted in response to someone asking the same question. He alluded to the Universal Forwarder being able to simply format the desired IIS field to blank, which in turn would be dropped by Splunk as a blank value. That would save us unnecessary license usage. The link to that post by @amrit is here. (It won't let me post links yet... https://answers.splunk.com/answers/118668/filter-iis-logs-before-indexing.html)

Testing the sedcmd that you posted, I am able to drop specific values in the _raw table and that is cool, but I don't want to have to hard code each value as I won't possibly ever know what we will see when it comes to the other iis fields like EASCmd. To hardcode those would be way too difficult. Is there a way with that sedcmd to do an extracted field? I tried this, but no love...
index=metrics_test_iis | eval before=_raw | rex mode=sed "s/cs-method//g" | table before _raw
as well as
index=metrics_test_iis | eval before=_raw | rex mode=sed "s/field:cs-method//g" | table before _raw
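
For anyone reading along, here is a small Python sketch of why those two attempts come back unchanged. A sed-style replacement works on the raw text of the event, and the raw event line contains only values; the name cs-method appears in the #Fields header line, not in the events. Python's re stands in for the sed replacement here, and the helper name sed_delete is hypothetical.

```python
import re

# Header and event copied from the sample log in the question.
HEADER = ("#Fields: date time c-ip cs-username s-ip s-port cs-method "
          "cs-uri-stem cs-uri-query sc-status cs(User-Agent)")
EVENT = ("2002-05-02 17:42:15 172.22.255.255 - 172.30.255.255 80 GET "
         "/images/picture.jpg - 200 "
         "Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)")


def sed_delete(literal, text):
    """Rough equivalent of s/literal//g: delete every literal occurrence."""
    return re.sub(re.escape(literal), "", text)


# The field NAME is not part of the event text, so nothing is removed.
print(sed_delete("cs-method", EVENT) == EVENT)         # True
print(sed_delete("field:cs-method", EVENT) == EVENT)   # True
# The name does occur in the header line, which is the only place it matches.
print(sed_delete("cs-method", HEADER) == HEADER)       # False
```

So a replacement aimed at a field has to describe where the value sits in the raw line (its position or surrounding text) rather than the field's name.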

Do you think it would be better or easier to drop these on an intermediate Heavy Forwarder instead? If I don't get anywhere on the UF, I will point my attention towards a sedcmd workflow on an intermediate HF.

Thanks again!

0 Karma

montgomeryam
Path Finder

Found another interesting Splunk Answers post looking to do the same.
https://answers.splunk.com/answers/386805/filter-client-ip-from-iis-75-logs.html

Down towards the bottom, @lguinn from Splunk points out that at this point of the parsing process we can't use the field names. She then said that she was going to look up a way to do this, but then never responded. This all happened back in April so I doubt she will respond to my post there.

If someone has any idea what we could use to filter by at this stage of the parsing process, it would be of immense help!

0 Karma

lukejadamec
Super Champion

I've got the solution for cs_method and any other field that can be singled out by sedcmd with capture groups. I've tested both license usage and anonymization quality for iis logs. I'm writing the answer now, stand by....

0 Karma

lukejadamec
Super Champion

I'm confused. I'm pretty sure nullQueue is for events, not fields. You use the field values to decide which events to send to the nullQueue.

Have you tried:

 [throw_some_away]
 SOURCE_KEY=field:cs-method
 REGEX=.
 DEST_KEY=queue
 FORMAT=nullQueue

This should drop all events that have a cs-method field that has any value.

0 Karma

montgomeryam
Path Finder

Unfortunately this won't work for us either, because all of our logs have a cs-method value. Sending them to the nullQueue in this manner would match every event, so we would effectively be throwing away every log entry.

We simply want to format or ensure that certain fields that are being extracted in IIS logs don't get sent to the indexers. The reason being, in some of our Exchange logs we have gigantic field values such as EASCmd which reports back what command was issued. That log entry is cool, but we don't want it to be indexed in order that we can save on license. I simply chose cs-method as it's an easy field to try and test exactly how to parse or drop it.

0 Karma

JDukeSplunk
Builder

We basically use the methods described here

http://blogs.splunk.com/2013/10/18/iis-logs-and-splunk-6/

Might you be able to simply blacklist cs-method in the inputs, once the field is identified and named?

blacklist1=cs-method=*

0 Karma

JDukeSplunk
Builder

I remember now. I believe that blacklist works for the WinEvent log as well as it does due to the nature of the TA_Windows APP. I think that generally in inputs.conf blacklist only applies to filenames. Might be worth a look at that app to see how it enables event-level blacklisting.

So your method of using props and transforms would be something like this. Since cs-method is generally GET, POST, HEAD, PUT or DELETE, something like this might work.

HOWEVER. I think doing this will nullQueue the entire event because of the match.

[iis*]
TRANSFORMS-set = dropcsmethod

Then I have /etc/system/local/transforms.conf like this:

[dropcsmethod]
REGEX = (GET|POST|HEAD|PUT|DELETE)
DEST_KEY = queue
FORMAT = nullQueue

I used something similar to this to nullQueue traffic with loadbalancer noise in it, and it /dev/null's the whole line.
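
Here is a rough Python sketch of that behaviour, under the assumption that a nullQueue transform with no SOURCE_KEY tests its REGEX against the whole raw event and discards any event that matches. The second sample event and the function name route_events are made up for illustration.

```python
import re

# Same alternation as the REGEX in the [dropcsmethod] stanza above.
DROP_REGEX = re.compile(r"(GET|POST|HEAD|PUT|DELETE)")

# First event is the sample from the question; the second is a hypothetical variation.
events = [
    "2002-05-02 17:42:15 172.22.255.255 - 172.30.255.255 80 GET "
    "/images/picture.jpg - 200 "
    "Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)",
    "2002-05-02 17:42:16 172.22.255.255 - 172.30.255.255 80 POST "
    "/forms/submit - 200 "
    "Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)",
]


def route_events(raw_events):
    """Keep only events that do NOT match; matching events are dropped whole."""
    return [event for event in raw_events if not DROP_REGEX.search(event)]


kept = route_events(events)
print(len(kept))  # every event carries an HTTP method, so nothing survives
```

The match decides the fate of the whole event, not of the matched text, which is why this approach cannot remove just the cs-method value.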

So..I am converting my answer to a comment.. and watching for someone smarter to come along and bang out a 1 line answer for you.

0 Karma

montgomeryam
Path Finder

I see where you are going here, but by using this method, you would be dropping the entire log entry. I only want the field to be dropped, or at least set to a blank or no value format so that it won't be indexed.

0 Karma

lukejadamec
Super Champion

Your last comment brought to mind the Splunk ability to 'Anonymize' incoming data so that it is not indexed (like passwords). I think this is what you're looking for.

http://docs.splunk.com/Documentation/Splunk/6.4.3/Data/Anonymizedata

0 Karma

montgomeryam
Path Finder

I did think about that for a hot minute, but thought the complexity would be too much of a barrier. I think you are right that I might have to figure out a way to format the data through this type of process.

I really do appreciate the time and effort by the way! This is a cumbersome use case, and the support from y'all has been awesome.

I will report back after I try this technique out.

0 Karma

montgomeryam
Path Finder

Appreciate the help!

Regarding the blog you posted, we are using that same method to extract the different fields, but I didn't see anything in that post about how to parse or drop the extracted fields.

Unfortunately no love from the blacklist. I tried a few different stanzas like:
blacklist1=cs-method=*
blacklist1=cs-method=.
blacklist1=cs-method=(.)

I have a feeling that I am missing something due to the formatting that is happening in the indexing queue. I honestly don't know how to look into what's happening in those different queues to see whether we are just missing the correct formatting.

0 Karma