Getting Data In

Possible to define a sub-sourcetype?

Motivator

We are ingesting IIS logs in JSON format because we are adding some additional fields to the log file that contain information we need to pull. However, IIS uses the W3C format, in which the fields are pre-defined as follows:

Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken

These W3C values reside in the 'Event' key of each JSON record:

{"EventReceivedTime":"2017-02-21 08:00:20","SourceModuleName":"EWIPRD","SourceModuleType":"im_file","FileName":"L:\\Logs\\W3SVC1\\u_ex170221.log","SiteId":"1","WebServer":"<servername>","Event":"2017-02-21 13:00:00 x.x.x.x POST /autodiscover/autodiscover.xml - 443 - x.x.x.x Microsoft+Office/16.0+(Windows+NT+6.2;+Microsoft+Outlook+16.0.7571;+Pro) - 301 0 0 0"}

Is it possible to define a transforms.conf stanza against the 'Event' key, something like the below?

[auto_kv_for_iis_default]
DELIMS = " "
FIELDS = date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status

Thx

SplunkTrust

Okay, so let's level-set here. You have IIS logs arriving into HDFS via some mechanism, which you are then searching via Hunk. The IIS events are basic W3C-formatted strings inside a JSON wrapper. Your goal is to extract the "sub-fields" from within the larger field.

Let's talk about some things that won't work.

First, INDEXED_EXTRACTIONS won't work, because Splunk isn't actually indexing your data. Data that is picked up via virtual indexes in Hunk never goes through INDEXED_EXTRACTIONS (because it's never actually indexed by Splunk).

Second, CLONE_SOURCETYPE won't work either, for the same reason: you're not indexing the data.

Third, while the vendor:product:logtype naming convention is common, it is only a naming convention. There is no hierarchical configuration support behind it.

Fourth, sourcetype renaming is a useful feature, but I don't think it actually does what you want to do here.


I've got an idea that may work, watch this space. UPDATE!

Thanks @dshpritz for solving one final bugaboo for me. It turns out you can do DELIMS extraction inside existing fields, but those fields have to have been brought to life by a REPORT or an EXTRACT; JSON and other auto-kv extractions don't bring the field to life in time.

So assume we make a new sourcetype for this, since the data doesn't match any of the existing sourcetypes well. In props.conf:

[foo]
EXTRACT-foo = "Event":"(?<Event>[^"]+)"
REPORT-foo = foobarbaz

So our new foo sourcetype uses a regex-based extraction to pull Event out of the JSON data, treating it as a quoted string. Then our REPORT uses DELIMS on that new field to extract what needs to be extracted from it. In transforms.conf:

[foobarbaz]
SOURCE_KEY = Event
DELIMS = " "
FIELDS = date,time,s-sitename,s-ip,cs-method,cs-uri-stem,cs-uri-query,s-port,cs-username,c-ip,cs(User-Agent),sc-status,sc-substatus,sc-win32-status

Note this is terrible. I'm glad it works, but I'm made sad by its necessity. In an environment where a forwarder is collecting IIS logs, you'd be far better off letting INDEXED_EXTRACTIONS handle the IIS sourcetype natively. But in your situation, where collection is happening elsewhere and everything you can do is based wholly on search time via Hunk, this may be the best you can get.
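With both stanzas in place, a quick sanity check in search would be something like the sketch below (the foo sourcetype name is just from the example above; adjust to whatever you actually call it):

```
sourcetype=foo
| table date time c-ip cs-method cs-uri-stem sc-status
```

If the REPORT fired, the delimited W3C fields should populate in the table; if they come back empty, the Event field probably wasn't created by the EXTRACT first.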

Motivator

Thx for the reply and info.

Correct - IIS events are basic W3C-formatted strings inside a JSON wrapper, as we're using nxlog on the servers to forward the IIS logs (and, as I said, to add some additional k/v pairs for extra info).

Duly noted on INDEXED_EXTRACTIONS as I'll remove that line from the stanza.

Looking forward to your idea.

Thx


SplunkTrust

My first question would be "how did you collect W3C inside of JSON?" Normally, we'd let a forwarder pick up the W3C files directly and use INDEXED_EXTRACTIONS=w3c, and that'd be the end of it.

Esteemed Legend

Cisco, Palo Alto (and others) do this by using colons in the sourcetype, such as cisco:esa:textmail, cisco:wsa:squid, pan:wildfire_report, pan:newapps, pan:logs, etc.

Then when searching, you can do stuff like sourcetype=pan:* or sourcetype=cisco:*mail, etc.

You can use rename to update the sourcetype at search time to something new when taking a second pass at parsing your data.

As far as the other part of your question (adding extra fields): anything can be done SO LONG AS the values you are adding are inside the raw event. If the data is not inside the raw event, you will need to add it using SEDCMD so that it is. Fields (at index time) always point to data inside _raw.
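As a sketch of that SEDCMD approach (the stanza name, class name, and value are hypothetical, not from this thread), something like this in props.conf would splice a literal key/value onto the front of _raw at index time, so that an indexed field extraction can then point at it:

```
# props.conf -- hypothetical sketch
[ms:iis:default]
# Prepend a literal site=... pair onto every raw event at index time
SEDCMD-add_site = s/^/site=EWI_PRD /
```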

So I think that all the nuts-and-bolts are there for you and if I understood you better, I might be able to assemble them for you.

It is a bad idea to allow Splunk to sourcetype your stuff for you; you should always explicitly set the sourcetype.

Motivator

Thx for the reply and information.

Technically, I can rename the sourcetype. We're ingesting these logs into Hunk, so every time we start to ingest a new log source I have to define the sourcetype in /opt/splunk/etc/apps/search/local/props.conf.

Here is the stanza for our IIS logs:

[source::/LogCentral/IIS/EWI_PRD/...]
sourcetype = _json
INDEXED_EXTRACTIONS = JSON

Is it possible to rename the sourcetype from _json to something else, like ms:iis:default, while still keeping the automatic field extractions via JSON?

My worry is that renaming the sourcetype from _json to ms:iis:default will wipe out the automatic field extractions, unless INDEXED_EXTRACTIONS = JSON is what keeps them in place even when the sourcetype is set to something other than _json.

Thx


Esteemed Legend

Yes, the rename is a search-time alias. Read about it here:
http://docs.splunk.com/Documentation/Splunk/latest/admin/propsconf

In particular:

# The following attribute/value pairs can only be set for a stanza that
# begins with [<sourcetype>]:

rename = <string>
* Renames [<sourcetype>] as <string> at search time.
* With renaming, you can search for the [<sourcetype>] with sourcetype=<string>.
* To search for the original source type without renaming it, use the field _sourcetype.
* Data from a renamed sourcetype will only use the search-time configuration for the target sourcetype. Field extractions (REPORT/EXTRACT) for this stanza sourcetype will be ignored.
* Defaults to empty.
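Applied to this thread's example, the rename goes in the stanza for the original sourcetype (the ms:iis:default target name is hypothetical; per the doc excerpt, the renamed data will then use only the target sourcetype's search-time configuration, and note this would affect all _json data, not just the IIS feed):

```
# props.conf -- hypothetical sketch
[_json]
rename = ms:iis:default
```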

Contributor

Not sure I understand what you're trying to do exactly. The subject of your post mentions creating a sub-sourcetype (which I don't think you can do), but I see you mention extracting additional fields from a key. Are you trying to extract fields from the values of that other field? Or are you trying to separate such events into another area (like an eventtype)?


Motivator

I'm trying to extract fields from the 'Event' key in an automated way; I've already created a regex that extracts the IIS fields at search time.

So "Event" contains the following information in my example:

"2017-02-21 13:00:00 x.x.x.x POST /autodiscover/autodiscover.xml - 443 - x.x.x.x Microsoft+Office/16.0+(Windows+NT+6.2;+Microsoft+Outlook+16.0.7571;+Pro) - 301 0 0 0"}`

which break down into the IIS key/value pairs:
date - 2017-02-21
time - 13:00:00
s-ip - x.x.x.x
cs-method - POST
cs-uri-stem - /autodiscover/autodiscover.xml
cs-uri-query -
s-port - 443
cs-username -
c-ip - x.x.x.x
cs(User-Agent) - Microsoft+Office/16.0+(Windows+NT+6.2;+Microsoft+Outlook+16.0.7571;+Pro)
cs(Referer) -
sc-status - 301
sc-substatus - 0
sc-win32-status - 0
time-taken - 0

If we weren't ingesting these files as JSON, I'd simply add the following to transforms.conf:

[auto_kv_for_iis_default]
DELIMS = " "
FIELDS = date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status

I wasn't sure if it is possible to do another field extraction from within an event whose sourcetype has already been defined. Perhaps the regex I have extracting field names at search time is the way to go.
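For reference, an inline search-time version of that extraction might look like the sketch below (the regex is truncated to the first few fields and is not the poster's actual one; rex capture-group names can't contain hyphens, so underscores stand in for the W3C names):

```
sourcetype=_json
| rex field=Event "^(?<date>\S+)\s+(?<time>\S+)\s+(?<s_ip>\S+)\s+(?<cs_method>\S+)\s+(?<cs_uri_stem>\S+)"
| table date time s_ip cs_method cs_uri_stem
```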

Thx


Contributor

Oh, OK, so you were trying to create field extractions without specifying them in a search string? If so, you can always add them via the GUI at Settings > Fields > Field extractions (specifying your sourcetype there), or you can do it manually in props.conf for the sourcetype of the events. Is that what you were asking?


Motivator

Kind of.

The events are already set to the _json sourcetype, so before I created the regex to extract fields at search time from the key/multivalue field "Event", some fields were already being extracted automatically, such as "SourceModuleName", "SourceModuleType", "FileName", "SiteId", "WebServer", and "Event".

What I thought might be possible would be to define the key/multivalue field "Event" along the lines of
http://docs.splunk.com/Documentation/AddOns/released/MSIIS/Setupaddon ("Perform additional steps for search-time field extraction") by modifying transforms.conf, but I wasn't sure how to extract the fields/values from the already-defined "Event" field.

Thx


Contributor

It shouldn't matter whether the "Event" field is parsed at index time or search time; there's no issue creating additional field extractions on the search head for any sourcetype, or even for the same events, even if the fields were already extracted. New search-time field extractions can be created from raw events even where existing configuration stanzas (on the indexer or search head) are already parsing the same or similar values from the same events.

If you want to create new key/multivalue fields from the field "Event" within the same sourcetype without specifying the regex in the search string, you can do that by going to Settings > Fields > Field extractions and specifying your sourcetype there, or you can do it manually in props.conf in the app. Your fields would then be extracted "automatically" instead of you having to specify the extraction at search time.

If you want to create new key/multivalue fields from the events (in the _json sourcetype) and have those specific events sent to another sourcetype, you may want to explore cloning the events to a different sourcetype of your choosing, so you can run your own custom field extractions for it. This can be accomplished using CLONE_SOURCETYPE in transforms.conf. Basically, it clones the data from that sourcetype into another sourcetype for you to play with. You would need to do this on your indexers and make sure to specify the appropriate stanzas in both props.conf and transforms.conf.
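A minimal sketch of that wiring (the stanza names and the target sourcetype are hypothetical; and per the discussion earlier in the thread, CLONE_SOURCETYPE is an index-time operation, so it only applies where Splunk is actually indexing the data):

```
# transforms.conf -- hypothetical sketch
[clone_iis_events]
# REGEX = . matches every event, so everything gets cloned
REGEX = .
CLONE_SOURCETYPE = ms:iis:default

# props.conf
[_json]
TRANSFORMS-clone_iis = clone_iis_events
```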

http://docs.splunk.com/Documentation/Splunk/latest/admin/propsconf
http://docs.splunk.com/Documentation/Splunk/latest/admin/transformsconf

If I misunderstood you a third time, please let me know.
