Hi. I'm brand new to using Splunk and just downloaded the Splunk Light trial.
I've followed the tutorial video for setting up Data Inputs, and then set up one of my own. While I can see that the data is getting into Splunk, the search results don't appear to be parsed the way I expect. There's probably a very simple answer that experienced users will know, but to me it's a mystery. Can someone please point me in the right direction?
Here's what's happening:
I'm bringing in IIS 7.5 logs in the default W3C format. Due to the way our product works, I am sending these logs to Splunk over a TCP Data Input whose sourcetype I have explicitly set to "iis". Note that this is just the "raw" IIS data, without any syslog headers or other wrappers. The logs are received by Splunk correctly, but when I search, the only "Interesting Fields" I see are generic ones like dates, and none of the IIS-specific fields. Otherwise, the search results look fine (no weird parsing problems, other than the fact that the IIS-specific fields aren't there).
As a test, I imported the exact same IIS log files into Splunk using the "Files & directories" input, for which I also explicitly set the sourcetype to "iis". This time the search shows exactly what I would expect: the "Interesting Fields" include IIS-specific ones like "cs_uri_stem", "s_ip", etc.
Is there some step I'm missing so that the data I send over the TCP data input is recognized as IIS data when I search for it? I would have thought that setting the sourcetype to "iis" would be enough, but maybe not?
(For what it's worth, I have to get the TCP method working. Using the "Files & directories" isn't an option I can use in this situation.)
Thanks for your help.
Ah, I think I just realized something important. IIS logs, much like CSV files, start with a header that names the fields, in this format:
# something microsoft iis
# version 1
# fields: date time level keyA keyB
2015-05-05 10:10:10 INFO valueA valueB
This works fine with Splunk as long as you index the file, because Splunk then learns the field names from the file's header. Obviously, this doesn't work as easily with a TCP stream, as there's no way to determine the keys from the data alone (i.e. how would you know what INFO, valueA and valueB are?). See here for a little background on IIS and Splunk.
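To make the point about headers concrete, here is a minimal Python sketch (not Splunk's own parser) that names each value using the most recent "fields:" header line, matched case-insensitively so it accepts both the example above and the usual IIS spelling "#Fields:". The function name parse_w3c is hypothetical. A data line that arrives before any header raises an error, which is essentially the situation a bare TCP stream is in.

```python
def parse_w3c(lines):
    """Map each data line to names from the most recent '#Fields:' directive."""
    field_names = None
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            directive = line[1:].strip()
            # Accept "#Fields:" as well as "# fields:" (case-insensitive)
            if directive.lower().startswith("fields:"):
                field_names = directive[len("fields:"):].split()
            continue  # other directives (software, version, ...) are ignored
        if field_names is None:
            # No header seen yet: nothing tells us what the values mean
            raise ValueError(f"data line without a fields header: {line!r}")
        # Values are whitespace-separated; zip() stops at the shorter sequence
        records.append(dict(zip(field_names, line.split())))
    return records


sample = [
    "# something microsoft iis",
    "# version 1",
    "# fields: date time level keyA keyB",
    "2015-05-05 10:10:10 INFO valueA valueB",
]
print(parse_w3c(sample))
# [{'date': '2015-05-05', 'time': '10:10:10', 'level': 'INFO', 'keyA': 'valueA', 'keyB': 'valueB'}]
```

Feeding it only the last line of the sample, as a stream without the header would, raises ValueError instead of guessing names.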
Therefore, to solve your problem, you need to explicitly tell Splunk which keys to use for the incoming TCP data stream. Go to your %SPLUNK_HOME%/etc/system/local folder, check for a props.conf file (create it if it isn't there already), and add the following lines to it:
[iis-stream]
FIELD_NAMES = list-of-your-headers-here
where the list of your headers is the one after # fields in your sample IIS file, with commas in between, such as date,time,level,keyA,keyB for the example above. This explicitly tells Splunk which fields the incoming stream contains, and in which order. Then all you have to do is assign the new sourcetype iis-stream to the incoming TCP stream and restart Splunk.
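If you have the IIS header line at hand, the comma-separated list can be generated rather than typed. This is a small Python sketch; the helper name props_stanza is hypothetical, and it only builds the two-line stanza described above as a string (it does not touch props.conf).

```python
def props_stanza(sourcetype, fields_line):
    """Build a props.conf stanza whose FIELD_NAMES come from an IIS fields header."""
    # Accept "#Fields: ..." as well as "# fields: ..."
    directive = fields_line.strip().lstrip("#").strip()
    if not directive.lower().startswith("fields:"):
        raise ValueError("expected an IIS '#Fields:' header line")
    # The header is space-separated; FIELD_NAMES wants the same names comma-separated
    names = directive[len("fields:"):].split()
    return f"[{sourcetype}]\nFIELD_NAMES = {','.join(names)}"


print(props_stanza("iis-stream", "# fields: date time level keyA keyB"))
# [iis-stream]
# FIELD_NAMES = date,time,level,keyA,keyB
```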
I'm not sure this is all you need to do, though; you may need further settings for the new sourcetype, such as:
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TIMESTAMP_FIELDS = date,time
Those would also go into props.conf, under your [iis-stream] stanza.
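To sanity-check a TIME_FORMAT string before restarting Splunk, Python's datetime.strptime accepts the same %Y, %m, %d, %H, %M and %S directives. This sketch assumes that, with TIMESTAMP_FIELDS = date,time, the two values are joined with a single space before parsing, and that IIS wrote the timestamps in UTC; both are assumptions for illustration, not behaviour confirmed in this thread. The function name event_time is hypothetical.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # the value suggested for props.conf


def event_time(record):
    """Parse the 'date' and 'time' fields of one IIS record into an aware datetime."""
    # Assumption: the TIMESTAMP_FIELDS values are joined with one space
    text = f"{record['date']} {record['time']}"
    # Assumption: the timestamp is in UTC
    return datetime.strptime(text, TIME_FORMAT).replace(tzinfo=timezone.utc)


print(event_time({"date": "2015-05-05", "time": "10:10:10"}).isoformat())
# 2015-05-05T10:10:10+00:00
```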
As you can see, this should be possible, but it's not as easy as reading a file. If at all possible, I would advise switching to ingesting this data from files. For one thing, the settings are simpler. The files on disk also act as a buffer, so your logs don't get lost while Splunk is down. Furthermore, this way you can more easily read several files with different headers.
You mentioned that "Due to the way our product works, I am sending these logs to Splunk over a TCP Data Input" - I'm curious to know why that is.
Thanks for the detailed reply. I'll give this a try just as soon as I can and let you know how it goes.
The reason for sending the data via TCP is because we're already collecting the IIS log files from multiple different machines, doing some processing, and then forwarding them on to various destinations, such as ArcSight or syslog consumers. I'm adapting the component that was originally developed to do that task, just using Splunk as the destination instead. If it's particularly problematic to do that, then I realize that we could use the file system, but I was hoping to avoid it and just make the necessary tweaks to the TCP component.
Thanks again for your help and I'll let you know how it goes.
So I tried creating a props.conf file and re-inputting the data via the TCP connection, but it's still not parsing the IIS data apparently.
The first try was exactly as you suggested, with props.conf looking like:
[iis-stream]
FIELD_NAMES = date,time,s-ip,cs-method,cs-uri-stem,cs-uri-query,s-port,cs-username,c-ip,cs(User-Agent),sc-status,sc-substatus,sc-win32-status,time-taken
After running a search, here are the "Interesting Fields". Notice that the IIS-specific stuff is still missing:
# date_hour 2
# date_mday 1
# date_minute 2
a date_month 1
# date_second 2
a date_wday 1
# date_year 1
a date_zone 1
a index 1
# linecount 3
a punct 8
a splunk_server 1
# timeendpos 2
# timestartpos 2
I then tried various additions to props.conf, such as time format and timestamp fields, but that didn't work either. Eventually, mostly out of desperation, I ended up with props.conf looking like this (based on this reference: http://blogs.splunk.com/2013/10/18/iis-logs-and-splunk-6/):
[iis-stream]
FIELD_DELIMITER = whitespace
FIELD_HEADER_REGEX = ^#Fields:\\s*(.*)
MISSING_VALUE_REGEX = -
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TZ = GMT
TIMESTAMP_FIELDS = date,time
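For reference, here is roughly what FIELD_HEADER_REGEX and MISSING_VALUE_REGEX express, mimicked with Python's re module. Splunk's regex engine is not Python's, so treat this only as an illustration; the function names are hypothetical. The capture group picks up the field list from a header line, while a plain data line does not match and so yields no field names on its own. Values consisting only of "-" are treated as missing.

```python
import re

# The header pattern from the stanza above, as a Python raw string
FIELD_HEADER_REGEX = re.compile(r"^#Fields:\s*(.*)")
# MISSING_VALUE_REGEX = -  (a value that is exactly "-" means "no value")
MISSING_VALUE_REGEX = re.compile(r"-")


def header_fields(line):
    """Return the field names from a '#Fields:' header line, or None for other lines."""
    match = FIELD_HEADER_REGEX.match(line)
    # FIELD_DELIMITER = whitespace: split the captured list on runs of whitespace
    return match.group(1).split() if match else None


def parse_values(field_names, line):
    """Pair whitespace-separated values with names, mapping '-' to None."""
    return {
        name: None if MISSING_VALUE_REGEX.fullmatch(value) else value
        for name, value in zip(field_names, line.split())
    }


fields = header_fields("#Fields: date time cs-uri-query sc-status")
print(fields)  # ['date', 'time', 'cs-uri-query', 'sc-status']
print(header_fields("2015-05-05 10:10:10 - 200"))  # None
print(parse_values(fields, "2015-05-05 10:10:10 - 200"))
# {'date': '2015-05-05', 'time': '10:10:10', 'cs-uri-query': None, 'sc-status': '200'}
```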
And the resulting "Interesting Fields" were even worse...
a index 1
# linecount 3
a punct 7
a splunk_server 1
a timestamp 1
These are the "Interesting Fields" I'm trying to get. (This is from when I import the data via the "Files & directories" input.)
a c_ip 1
a cs_method 1
a cs_uri_stem 3
a cs_User_Agent 1
a date 2
# date_hour 1
# date_mday 2
# date_minute 3
a date_month 1
# date_second 7
a date_wday 2
# date_year 1
# date_zone 1
a index 1
# linecount 1
a punct 3
a s_ip 1
# s_port 1
# sc_status 4
# sc_substatus 2
# sc_win32_status 2
a splunk_server 1
a time 7
# time_taken 6
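Comparing this list with the FIELD_NAMES list above, the IIS header names show up with each run of non-alphanumeric characters turned into an underscore (cs-uri-stem becomes cs_uri_stem, cs(User-Agent) becomes cs_User_Agent). The Python sketch below reproduces that renaming so you know which names to search for. It is an approximation inferred from these two lists, not Splunk's documented rule, and the function name is hypothetical.

```python
import re


def splunk_style_name(header_name):
    """Approximate the renaming seen above: 'cs(User-Agent)' -> 'cs_User_Agent'."""
    # Replace each run of characters other than letters and digits with "_"
    name = re.sub(r"[^A-Za-z0-9]+", "_", header_name)
    # Drop underscores left at either end, e.g. from the closing parenthesis
    return name.strip("_")


iis_fields = (
    "date,time,s-ip,cs-method,cs-uri-stem,cs-uri-query,s-port,cs-username,"
    "c-ip,cs(User-Agent),sc-status,sc-substatus,sc-win32-status,time-taken"
)
print([splunk_style_name(f) for f in iis_fields.split(",")])
```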
So, it looks like I still don't have the right configuration in place for Splunk to recognize the IIS data coming across the TCP connection. Hmm...
Ah, there was one subtle thing missing from the idea above: Splunk doesn't treat whitespace as the default field delimiter, so it should work with the following settings:
[iis-stream]
FIELD_NAMES = date,time,s-ip,cs-method,cs-uri-stem,cs-uri-query,s-port,cs-username,c-ip,cs(User-Agent),sc-status,sc-substatus,sc-win32-status,time-taken
FIELD_DELIMITER = whitespace
TIMESTAMP_FIELDS = date,time
This should tell Splunk which fields there are (FIELD_NAMES) and how to separate them (FIELD_DELIMITER).
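Here is a small Python illustration of why the delimiter matters, using the FIELD_NAMES list above and an invented sample line (the values are made up for illustration). With a delimiter that never occurs in the line, such as a comma, the line stays in one piece and only the first name gets a value; splitting on whitespace lines every value up with its name. The helper name is hypothetical, and this only mimics the idea rather than Splunk's implementation.

```python
FIELD_NAMES = (
    "date,time,s-ip,cs-method,cs-uri-stem,cs-uri-query,s-port,cs-username,"
    "c-ip,cs(User-Agent),sc-status,sc-substatus,sc-win32-status,time-taken"
).split(",")

# Invented sample event in the layout described by FIELD_NAMES
SAMPLE = "2015-05-05 10:10:10 10.0.0.1 GET /index.htm - 80 - 10.0.0.2 Mozilla/5.0 200 0 0 15"


def extract(line, delimiter=None):
    """Pair values with FIELD_NAMES; delimiter=None splits on runs of whitespace."""
    return dict(zip(FIELD_NAMES, line.split(delimiter)))


print(len(extract(SAMPLE, ",")))  # 1: the whole line lands in 'date'
print(len(extract(SAMPLE)))  # 14: one value per field
print(extract(SAMPLE)["cs-uri-stem"])  # /index.htm
```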
I tried the most recent [iis-stream] stanza you suggested, but still had no luck at all using it with a TCP stream. All I get for "Interesting Fields" is
a index 1
# linecount 2
a punct 4
a splunk_server 1
a timestamp 1
I think I'm going to abandon the idea of getting the TCP approach to work for now. You're clearly right that the "Files & directories" input handles this much better, and for the time being it's not worth pursuing the TCP approach any further. I can see why the [iis-stream] properties you recommended SHOULD work, but I'm not sure why they don't.
Thanks for your valuable input.
Oh, before you do that: have you only looked at the "Interesting Fields"? Maybe the fields are there, just not in that category. Try clicking either the "All Fields" button above the field list or the "X more fields" link below it, and then make sure to select "All fields" at the top, to the left of the filter field.
Thanks for pointing out the "All Fields" option. I hadn't looked there. Unfortunately, the IIS-specific fields don't appear there either. There are just 8 fields total there, the 5 I mentioned before (index, linecount, etc.), plus host, source, and sourcetype. I'll keep the "All Fields" in mind though from now on whenever I think something may be missing.
By the way, I assume you have checked that your data coming in over tcp is exactly in the format you specified your headers in and doesn't change? Because you can't diverge from them once you set
As a different solution, you can always use search-time field extractions to get your fields. If you haven't already switched to file reads, you might want to have a look at how that works here and see if that's an option for you.
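The idea behind a regex-based search-time extraction is that the pattern itself carries the field names, so no header is needed and each event is matched on its own. As a rough Python analogue (Splunk uses its own regex syntax and configuration, which this does not reproduce), the sketch below builds one named group per field from a fixed list and applies it to single events. The field list, sample lines and function names are assumptions for illustration; the group names use underscores because Python group names cannot contain hyphens or parentheses.

```python
import re

# Hypothetical field list; use whatever your IIS sites actually log
FIELDS = ["date", "time", "s_ip", "cs_method", "cs_uri_stem", "sc_status"]


def build_pattern(fields):
    """One named group of non-whitespace characters per field, separated by whitespace."""
    return re.compile(r"\s+".join(rf"(?P<{name}>\S+)" for name in fields))


def extract_fields(pattern, event):
    """Return the named groups for one event, or {} if the event does not fit."""
    match = pattern.fullmatch(event.strip())
    return match.groupdict() if match else {}


pattern = build_pattern(FIELDS)
print(extract_fields(pattern, "2015-05-05 10:10:10 10.0.0.1 GET /index.htm 200"))
print(extract_fields(pattern, "2015-05-05 10:10:10 GET"))  # {}: too few values
```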
Thanks for the suggestions. During this particular test the TCP data was in exactly the format I specified in FIELD_NAMES, but it still wasn't working. However, your point is well taken: in our particular application, the TCP data could contain records from IIS sites with heterogeneous W3C logging fields, so this solution wouldn't work anyway, since any given record may not match the FIELD_NAMES spec.
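The mismatch problem can be checked mechanically: a record only fits a fixed FIELD_NAMES list if it has exactly one value per name. This hypothetical Python helper flags records that would otherwise be mislabelled (a plain zip() would silently drop or leave out values instead of complaining). The shorter sample line stands in for a site that logs a different set of fields.

```python
FIELD_NAMES = ("date", "time", "s-ip", "cs-method", "cs-uri-stem", "sc-status")


def fits_field_names(line, field_names=FIELD_NAMES):
    """True if the whitespace-separated values match the field list one-to-one."""
    return len(line.split()) == len(field_names)


records = [
    "2015-05-05 10:10:10 10.0.0.1 GET /index.htm 200",  # same layout as FIELD_NAMES
    "2015-05-05 10:10:11 GET /index.htm",  # a site logging fewer fields
]
for record in records:
    print(fits_field_names(record), record)
```

Counting values only catches layouts with a different number of fields; two sites logging different fields in the same count would still pass, which is another reason a single FIELD_NAMES list is fragile for mixed sources.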
What I've decided on for now is using the Splunk CEF add-on for Splunk Enterprise and formatting the data we send over TCP (or write to a file) in CEF format. I have successfully tested this both with and without CEF headers, using both TCP and "Files & directories" inputs. Of course, the "Interesting Fields" you get with this method are the CEF fields with the IIS data mapped to them, rather than the "raw" IIS fields you see when you specify "iis" as the sourcetype, but I think this is a reasonable solution for our use case.
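For anyone taking the same route, here is a minimal Python sketch of turning one parsed IIS record into a CEF line. It follows the general CEF layout (CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension), escaping "|" and "\" in header values and "=" and "\" in extension values. The vendor, product, signature ID, name, severity and the IIS-to-CEF key mapping are all assumptions chosen for illustration, not what the Splunk CEF add-on requires or what the component described here actually sends.

```python
# Hypothetical mapping from IIS W3C field names to CEF extension keys
IIS_TO_CEF = {
    "c-ip": "src",
    "s-ip": "dst",
    "s-port": "dpt",
    "cs-method": "requestMethod",
    "cs-uri-stem": "request",
    "cs-username": "suser",
    "cs(User-Agent)": "requestClientApplication",
}


def escape_header(value):
    # Backslash first, so the escapes added for "|" are not doubled
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def escape_extension(value):
    text = str(value).replace("\\", "\\\\").replace("=", "\\=")
    return text.replace("\r", "\\r").replace("\n", "\\n")


def to_cef(record, vendor="ExampleVendor", product="IIS", version="7.5",
           signature_id="iis-request", name="IIS request", severity=3):
    """Format one IIS record (field name -> value) as a CEF line."""
    header = [vendor, product, version, signature_id, name, severity]
    extension = " ".join(
        f"{cef_key}={escape_extension(record[iis_key])}"
        for iis_key, cef_key in IIS_TO_CEF.items()
        # Skip fields the record lacks and IIS's "-" placeholder for "no value"
        if record.get(iis_key, "-") != "-"
    )
    return "CEF:0|" + "|".join(escape_header(v) for v in header) + "|" + extension


print(to_cef({"c-ip": "10.0.0.2", "cs-method": "GET", "cs-uri-stem": "/a=b", "cs-username": "-"}))
# CEF:0|ExampleVendor|IIS|7.5|iis-request|IIS request|3|src=10.0.0.2 requestMethod=GET request=/a\=b
```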
Thanks again for your help.
The sourcetype is set to "iis" for the TCP data input. Specifically, in the Splunk Light web interface, under the properties for my TCP data input, I have these settings:
Set sourcetype: Manual
Source type: iis
Ah, ok...thanks for the clarification.
So, when I run a search of the data that was input from my TCP input (i.e., the data which is not parsing as expected), and I look under "Selected Fields" at "sourcetype", there is a single value for all of the records and it is "iis". I guess that's because that's what I assigned as the sourcetype on the TCP input's properties. However, it still doesn't seem to parse as if it were IIS data when I view the search results (i.e., the "Interesting Fields" doesn't show any IIS specific fields).
When I do a search of the same data that was input from the "Files & directories" input, the "sourcetype" under "Selected Fields" is also "iis" (exactly like it is for the TCP input data). However, these search results DO show the IIS specific fields.
Any clues why this may be?
Thanks again for your help.