Basic problem: in smart mode my fields are not being extracted. Everything works in verbose mode. Time-range searching also works, so I know the way I specify the time field is correct.
Search that fails: index=foo | stats count by hii (or any field that isn't partitioned)
I have looked at the previous questions on Hunk extractions and smart mode (e.g. https://answers.splunk.com/answers/147879/why-hunks-field-extractor-behaves-differently-in-smart-mod...) but I cannot get mine to work.
indexes.conf
vix.input.1.splitter.hive.fileformat = orc
vix.input.1.splitter.hive.columnnames = cqtq, ttms, chi, crc, pssc, psql, cqhm, cquc, caun, phr, psct, cquuc, cqtr, cqssl, cqssr, pitag, sstc, psqql, ttsfb,ttrq, cqbl, pttsfb, tfstoc, sscl, UA, tsso, sscc, phi, chp, Carpcqh, sssc, cqssv, cqssc, hii
vix.input.1.splitter.hive.columntypes = string:int:string:string:int:bigint:string:string:string:string:string,string,int:int:int:string:int:int:int:int:bigint:int:bigint:bigint:string,int:int:string:int:string:string:string:string:string
vix.input.1.required_fields = cqtq,ttms,UA,hii
# Completely made up values to satisfy Splunk
vix.input.1.splitter.hive.tablename = transfered
vix.input.1.splitter.hive.dbname = default
vix.splunk.search.splitter = HiveSplitGenerator
props.conf
[source::/projects/flickr/flopsa/ycpi_spark/orc/...]
priority = 202
sourcetype = foo
NO_BINARY_CHECK = true
[foo]
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
TIME_PREFIX = cqtq\":
TIME_FORMAT = %s.%3N
(Note: I also tried the following two settings, each of which also makes the time search work, but fields still are not extracted.)
eval-_time=strptime('cqtq',"%s.%3N")
EXTRACT-_time=strptime('cqtq',"%s.%3N")
I got help from Splunk (thanks, Raanan!) and am posting my solution so others will know.
My indexes.conf
In my long columntypes list I had some commas where there should have been colons as separators, so the columnnames did not align with the columntypes. Above, where I have string, int
etc., it should be string:int.
What I learned from Raanan was to pull out just the first few columns and, once that works, add in the other fields.
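A quick way to catch this kind of mismatch is to count the entries on both sides before pasting them into indexes.conf. A minimal sketch in Python (the field names are abbreviated from the config above; the script itself is just an illustration, not part of Splunk):

```python
# Sanity-check that a Hive splitter columnnames list (comma-separated)
# lines up with the columntypes list (colon-separated).
names = [n.strip() for n in "cqtq, ttms, chi, crc".split(",")]
types = "string:int:string:string".split(":")

assert len(names) == len(types), "columnnames and columntypes are misaligned"

# A stray comma in the types list (my original mistake) shows up immediately:
bad_types = "string:int:string,string".split(":")
print(len(names), len(bad_types))  # → 4 3
```

Run this whenever you extend the lists; a count mismatch means a wrong separator or a missing entry.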
I shouldn't need to specify a dummy database and table name. We are filing a bug report.
In the index stanza (not the provider) I specified the following. This way you don't need several different providers; you can reuse one:
vix.input.1.splitter.hive.fileformat = orc
vix.input.1.splitter = HiveSplitGenerator
So, altogether, this is what worked (I shortened the columnnames and the list to make things clearer):
[foo]
vix.provider = bt
vix.input.1.path = /my/path/...
vix.input.1.accept = \.orc$
vix.input.1.ignore = .+SUCCESS
vix.input.1.et.regex = /my/path/regex...
vix.input.1.et.format = yyyyMMddHH
vix.input.1.et.offset = 0
vix.input.1.lt.regex = /my/path/regex...
vix.input.1.lt.format = yyyyMMddHH
vix.input.1.lt.offset = 3600
vix.input.1.splitter.hive.fileformat = orc
vix.input.1.splitter = HiveSplitGenerator
vix.input.1.required_fields = cqtq,b
vix.input.1.splitter.hive.columnnames = cqtq,b,c,d,e,f,g,h,i
vix.input.1.splitter.hive.columntypes = string:int:string:string:int:bigint:string:string:string.. etc
# Completely made up values to satisfy Splunk bug
vix.input.1.splitter.hive.tablename = default
vix.input.1.splitter.hive.dbname = default
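As a side note on the et/lt settings above: the earliest/latest-time regexes pull a timestamp out of each file's path, et.format/lt.format tell Hunk how to parse it (Java-style patterns, so yyyyMMddHH), and the offsets widen the time window. A sketch of that logic in Python, using a hypothetical hour-partitioned path layout (the real regex paths are elided above):

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical layout: one directory per hour, named yyyyMMddHH.
path = "/my/path/2014111118/part-00000.orc"

m = re.search(r"/(\d{10})/", path)  # plays the role of et.regex / lt.regex
# Java's yyyyMMddHH corresponds to Python's %Y%m%d%H.
stamp = datetime.strptime(m.group(1), "%Y%m%d%H").replace(tzinfo=timezone.utc)

et = stamp + timedelta(seconds=0)      # et.offset = 0
lt = stamp + timedelta(seconds=3600)   # lt.offset = 3600: one hour later
print(et.isoformat(), "->", lt.isoformat())
# → 2014-11-11T18:00:00+00:00 -> 2014-11-11T19:00:00+00:00
```

With this window, Hunk can skip files whose path timestamp falls entirely outside the search's time range.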
props.conf
To get search-by-time working properly, I used the following in my props.conf. My time field is called cqtq: a 10-digit Unix timestamp, followed by a period and 3 digits of milliseconds, at the beginning of each record.
eval-_time = strptime('cqtq',"%s.%3N")
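For reference, %s.%3N means epoch seconds plus three fractional digits. A hedged sketch of the equivalent parsing in plain Python (Python's strptime has no %s/%3N, so the value is split manually; the timestamp is a made-up example):

```python
from datetime import datetime, timezone

# A cqtq-style value: 10-digit epoch seconds, a period, then 3 millisecond digits.
raw = "1415731200.123"  # hypothetical example value

secs, _, millis = raw.partition(".")
dt = datetime.fromtimestamp(int(secs), tz=timezone.utc)
print(dt.strftime("%Y-%m-%d %H:%M:%S"), f"+ {millis} ms")
# → 2014-11-11 18:40:00 + 123 ms
```

This is also a handy way to spot-check that the values landing in cqtq really are what TIME_FORMAT claims before blaming the extraction config.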