I am monitoring files that land in the same directory that I wish to be considered as different source types. The way
I want to distinguish them is with their names. There will be three different source types and they will be csv files.
The naming conventions will be
time_*.csv, pulse_*.csv, and flow_*.csv.
I actually have this working using the following in inputs.conf:
[monitor://C:\tpg\leamcsv\dualgamma_logs\...\pulse_*.csv]
sourcetype = DGC_PULSE
index=main
host_segment = 4
crcSalt = <SOURCE>

[monitor://C:\tpg\leamcsv\dualgamma_logs\...\flow_*.csv]
sourcetype = DGC_FLOW
index=main
host_segment = 4
crcSalt = <SOURCE>

[monitor://C:\tpg\leamcsv\dualgamma_logs\...\time_*.csv]
sourcetype = DGC_TIME
index=main
host_segment = 4
crcSalt = <SOURCE>
This works exactly as I want. The use of crcSalt turns out to be necessary as many of the files have meta information that
is identical and this forces the indexer to consider them all.
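To illustrate why the salt matters: Splunk decides whether it has already seen a file by checksumming the beginning of the file (the first 256 bytes by default), and crcSalt = <SOURCE> mixes the full path into that check. The sketch below is only an illustration of the idea, not Splunk's actual algorithm; zlib.crc32, the function name, and the two file paths are assumptions made up for the example.

```python
import zlib

def file_signature(head: bytes, source: str, salt_with_source: bool = False) -> int:
    """Checksum the first 256 bytes of a file, optionally salted with its path.

    Illustrative stand-in only: Splunk's real CRC computation differs.
    """
    data = head[:256]
    if salt_with_source:
        # Analogous to crcSalt = <SOURCE>: the path becomes part of the input.
        data = source.encode("utf-8") + data
    return zlib.crc32(data)

# Two hypothetical files that start with the same header block.
shared_header = b"timestamp,channel,value\n" * 20
path_a = r"C:\logs\run_a\pulse_1.csv"
path_b = r"C:\logs\run_b\pulse_1.csv"

unsalted_a = file_signature(shared_header, path_a)
unsalted_b = file_signature(shared_header, path_b)
salted_a = file_signature(shared_header, path_a, salt_with_source=True)
salted_b = file_signature(shared_header, path_b, salt_with_source=True)

print("unsalted signatures equal:", unsalted_a == unsalted_b)
print("salted signatures equal:  ", salted_a == salted_b)
```

Without the salt the two files look like the same file, so the second one would be skipped; with the path mixed in they are tracked separately.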
As I said, the above works fine as long as the files to be monitored are landed as .csv files. My requirements have changed
and I will now be landing *.zip files containing the desired .csv files.
It is not clear to me why, but Splunk is not indexing the zip files using the above configuration. Everything I have read seems
to indicate that it should index the zip files. Perhaps the monitor stanza is excluding the zip files; I haven't been able to figure
that one out.
I can say that if the monitor stanza is left open
([monitor://C:\tpg\leamcsv\dualgamma_logs\...\]), it will index the contents of the zip files, but that leaves me unable to distinguish
the different sourcetypes (at least not in the way that I was doing).
After doing some research, I read that attempting to index multiple sourcetypes from a common directory could lead to inconsistent
results (I don't have that link handy at the moment). At any rate, the suggestion was to use a more open qualification, as I mentioned
in the previous paragraph, and assign the sourcetype on a per-event basis or in props.conf. I chose to do this in props.conf. I
am using the following configuration:
[monitor://C:\tpg\leamcsv\dualgamma_logs\...\]
index=main
host_segment = 4
crcSalt = <SOURCE>

[source::...\pulse_*\.csv]
sourcetype=DGC_PULSE

[source::...\flow_*\.csv]
sourcetype=DGC_FLOW

[source::...\time_*\.csv]
sourcetype=DGC_TIME
The problem I see now is that none of my expected sourcetypes are assigned. Instead, I get csv, csv1, csv2, etc... for sourcetypes.
I suspect the issue is with the regular expressions I have used in props.conf. From everything I have read, these look like they
are correct, but I haven't been able to figure out what I am missing.
Does anyone have any suggestions about my approach, and/or what might be wrong with my regular expressions?
[monitor://C:\tpg\leamcsv\dualgamma_logs\...\pulse_*]
sourcetype = DGC_PULSE
index=main
host_segment = 4
crcSalt = <SOURCE>
That would work regardless of whether they are .zip or .csv.
Are they being bundled inside of a single .zip?
[monitor://C:\tpg\leamcsv\dualgamma_logs\...\]
sourcetype = DGC_TIME
index=main
host_segment = 4
crcSalt = <SOURCE>

[transform_name1]
SOURCE_KEY = MetaData:Source
REGEX = pulse_*\.csv
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::DGC_PULSE

[transform_name2]
SOURCE_KEY = MetaData:Source
REGEX = flow_*\.csv
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::DGC_FLOW

[DGC_TIME]
TRANSFORMS-transform_name = transform_name1, transform_name2
TIME_FORMAT = timeformat
SHOULD_LINEMERGE = false|true
After you do this, you will need to go to yoursplunkurl:8000/info and click reload EAI Objects wherever these configs are deployed: the UF (which will need an instance restart), the indexer, etc.
You may even want to restart the instance just for good measure.
Thanks for the input. I tried this and am still getting csv, csv_1, etc for sourcetype. I did splunk clean all on both my splunk instance and my universal forwarder.
I think I understand what you have suggested and it looks very similar to what I was initially trying. Is it substantially different?
I am guessing that it is still failing on the regexes being used.
This is what I am using in transforms.conf:
SOURCE_KEY = MetaData:Source
REGEX = pulse_*.csv
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::DGC_PULSE
and this is what I am using in props.conf:
TRANSFORMS-transformname = transform_name1
I am not sure about this one - not sure about the mapping of the stanza name to sourcetype, although I must admit I haven't looked at the docs on this yet...
Btw, you can replace transform_name1 and transform_name2 with anything you want; I was just using them as filler names. Just make sure the names get put into the props.conf.
Ah, also the last bit goes in the props.conf.
What we are doing is saying that, by default, all data from the inputs path is to be known as source type DGC_TIME. Then in the props.conf (by way of the transforms.conf) we say that if the source matches pulse_*.csv its source type should be DGC_PULSE, and if it matches flow_*.csv it should be source type DGC_FLOW.
And I just noticed I did not escape the dot. So, replace _*.csv in the regex with _.*\.csv
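The default-then-override idea described above can be sketched in a few lines of Python. This is only a model of the logic, not how Splunk implements it: the TRANSFORMS list stands in for the transforms.conf stanzas, the regexes are assumed to be of the form pulse_.*\.csv and flow_.*\.csv, and the file paths are hypothetical.

```python
import re

# Hypothetical stand-ins for the transforms.conf stanzas: (REGEX, new sourcetype).
TRANSFORMS = [
    (re.compile(r"pulse_.*\.csv"), "DGC_PULSE"),
    (re.compile(r"flow_.*\.csv"), "DGC_FLOW"),
]

def assign_sourcetype(source: str, default: str = "DGC_TIME") -> str:
    """Start from the default sourcetype, then let any matching transform override it."""
    sourcetype = default
    for pattern, override in TRANSFORMS:
        # Unanchored search, so the pattern may match anywhere in the source path.
        if pattern.search(source):
            sourcetype = override
    return sourcetype

# Hypothetical source paths, one per naming convention.
for src in (r"C:\data\pulse_001.csv", r"C:\data\flow_001.csv", r"C:\data\time_001.csv"):
    print(src, "->", assign_sourcetype(src))
```

Anything that matches neither pattern keeps the default, which is why the time_ files need no transform of their own.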
the source names look something like this:
so are you saying the regex ought to look something like this:
pulse.*csv or pulse..csv or pulse_.*\csv? None of those seem obvious to me.
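One way to settle which pattern is right is to try the candidates against a sample source name. The sketch below uses Python's re module as a stand-in for Splunk's PCRE engine (for patterns this simple the two behave the same); the sample paths are hypothetical, since the real source names are not shown in the thread.

```python
import re

def matches(pattern: str, source: str) -> bool:
    """Return True if the pattern is found anywhere in the source string."""
    return re.search(pattern, source) is not None

# Hypothetical source name following the pulse_*.csv convention.
source = r"C:\logs\pulse_20120101.csv"

candidates = [
    r"pulse_*\.csv",   # "_*" means zero or more underscores, not "underscore, then anything"
    r"pulse.*csv",     # matches, but the dot before "csv" is not required
    r"pulse_.*\.csv",  # "pulse_", then anything, then a literal ".csv"
]

for pattern in candidates:
    print(f"{pattern!r:20} matches: {matches(pattern, source)}")

# The looser pattern also accepts names that are not .csv files at all.
decoy = "pulse_notes_csv.txt"
print("loose pattern on decoy: ", matches(r"pulse.*csv", decoy))
print("strict pattern on decoy:", matches(r"pulse_.*\.csv", decoy))
```

The key point is that in a regex `*` repeats the preceding character, so the glob `pulse_*.csv` translates to `pulse_.*\.csv`: `.*` for "anything" and `\.` for the literal dot.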