Getting Data In

How to define a sourcetype based on a TSV file with a long list of fields?

dominiquevocat
SplunkTrust

I have datasets in TSV format where there is no header row in the file. I tried to use the wizard to import the data, base it on TSV, define the header, and set the (long) list of header fields. For some reason the custom headers were not accepted. Does someone have a sample props.conf for a TSV file with a custom header that works? 😞

Maybe the header is too long, don't know.
Here are the header fields as comma separated list:
accept_language,browser,browser_height,browser_width,c_color,campaign,channel,click_action,click_action_type,click_context,click_context_type,click_sourceid,click_tag,code_ver,color,connection_type,cookies,country,ct_connect_type,curr_factor,curr_rate,currency,cust_hit_time_gmt,cust_visid,daily_visitor,date_time,domain,duplicate_events,duplicate_purchase,duplicated_from,evar1-250,event_list,exclude_hit,first_hit_page_url,first_hit_pagename,first_hit_referrer,first_hit_time_gmt,geo_city,geo_country,geo_dma,geo_region,geo_zip,hier1-5,hier3,hier4,hier5,hit_source,hit_time_gmt,hitid_high,hitid_low,homepage,hourly_visitor,ip,ip2,j_jscript,java_enabled,javascript,language,last_hit_time_gmt,last_purchase_num,last_purchase_time_gmt,mcvisid,mobile*,post_mobile*,mobile_id,monthly_visitor,mvvar1-3,namespace,new_visit,os,p_plugins,page_event,page_event_var1,page_event_var2,page_event_var3,page_type,page_url,pagename,paid_search,partner_plugins,persistent_cookie,plugins,post_page_event,post_page_type,post_browser_height,post_browser_width,post_campaign,post_channel,post_cookies,post_currency,post_cust_hit_time_gmt,post_cust_visid,post_evar1-75,post_event_list,post_hier1-5,post_java_enabled,post_keywords,post_mvvar1-3,post_page_event_var1,post_page_event_var2,post_page_event_var3,post_page_url,post_pagename,post_pagename_no_url,post_partner_plugins,post_persistent_cookie,post_product_list,post_prop1-75,post_purchaseid,post_referrer,post_search_engine,post_state,post_survey,post_t_time_info,post_tnt,post_tnt_action,post_transactionid,post_visid_high,post_visid_low,post_visid_type,post_zip,prev_page,product_list,product_merchandising,prop1-75,purchaseid,quarterly_visitor,ref_domain,ref_type,referrer,resolution,s_resolution,sampled_hit,search_engine,search_page_num,secondary_hit,service,social*,post_social*,sourceid,state,stats_server,t_time_info,tnt,tnt_action,tnt_post_vista,transactionid,truncated_hit,ua_color,ua_os,ua_pixels,user_agent,user_hash,user_server,userid,username,va_closer_detail,va_closer_id,va_finder_detail,va_finder_id,va_instance_event,va_new_engagement,video*,post_video*,visid_high,visid_low,visid_new,visid_timestamp,visid_type,visit_keywords,visit_num,visit_page_num,visit_referrer,visit_search_engine,visit_start_page_url,visit_start_pagename,visit_start_time_gmt,weekly_visitor,yearly_visitor,zip

1 Solution

dominiquevocat
SplunkTrust

There were several things off:

  • There are differences between the onboarding flows, and in this case the one in "Data inputs » Files & directories" worked better than the one available from the "Data inputs" dialog.
  • The extra "," in the header delimiter stems from trying to get the headers to match (while looking in the wrong place, i.e. suspecting an issue with the long list of headers) and is unnecessary.
  • The headers as documented online do not match; they change from time to time and are delivered in a separate .tsv file (rejoice, rejoice).
  • The preview of the data import fails to reflect how the data looks after import (the indexed data is in fact correct).
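Since the real header row is delivered in a separate .tsv file, one quick way to turn that row into a comma-separated FIELD_NAMES value is a shell one-liner (a sketch; headers.tsv is a hypothetical name for the delivered header file):

```shell
# headers.tsv is assumed to hold the single tab-separated header row
# shipped alongside the data; adjust the name to match your delivery.
printf 'accept_language\tbrowser\tbrowser_height\n' > headers.tsv  # sample row
# Translate tabs to commas to get a ready-made FIELD_NAMES value.
head -1 headers.tsv | tr '\t' ','
# → accept_language,browser,browser_height
```

Because the headers change from time to time, regenerating FIELD_NAMES from the delivered file beats maintaining the list by hand.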



woodcock
Esteemed Legend

Your configuration files look fine but I would keep only the FIELD_NAMES and INDEXED_EXTRACTIONS = TSV lines (change tsv to TSV) and remove everything else. Then double-check this list:

  • The sourcetype matches mysourcetype exactly (casing, punctuation, etc.).
  • The props.conf and transforms.conf configuration files are deployed to the Indexers or Heavy Forwarders (or Universal Forwarders in some cases, such as INDEXED_EXTRACTIONS = TSV).
  • The inputs.conf configuration file is deployed to the Forwarder.
  • You must restart/bounce all Splunk instances on the servers where you deploy these files.
  • There are no configuration errors during restart (watch the response text during startup on one server of each type).
  • You are verifying proper current function by looking at NEW data (post-deploy/post-bounce), not previously indexed data (which is immutable).
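A trimmed props.conf along those lines might look like the following (a sketch only; [mysourcetype] is the stanza name from this thread, and the FIELD_NAMES value is truncated with "..." — use the full list from the question):

```ini
[mysourcetype]
# FIELD_NAMES supplies the header for the headerless file, and TSV
# extraction is tab-delimited, so no HEADER_FIELD_DELIMITER or
# FIELD_DELIMITER settings are needed.
INDEXED_EXTRACTIONS = TSV
FIELD_NAMES = accept_language,browser,browser_height,browser_width,...
```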

dominiquevocat
SplunkTrust

yeah, there were several things off:

  • There are differences between the onboarding flows, and in this case the one in "Data inputs » Files & directories" worked better than the one available from the "Data inputs" dialog.
  • The extra "," in the header delimiter stems from trying to get the headers to match (while looking in the wrong place, i.e. suspecting an issue with the long list of headers) and is unnecessary.
  • The headers as documented online do not match; they change from time to time and are delivered in a separate .tsv file (rejoice, rejoice).
  • The preview of the data import fails to reflect how the data looks after import (the indexed data is in fact correct).

It seems to work nicely with these settings... how do I close this issue when no reply is quite correct 🙂


woodcock
Esteemed Legend

Answer your own question and then click "Accept" on it.


woodcock
Esteemed Legend

What is in your configuration files right now?


dominiquevocat
SplunkTrust
[mysourcetype]
FIELD_DELIMITER = tab
FIELD_NAMES = accept_language,browser,browser_height,browser_width,c_color,campaign,channel,click_action,click_action_type,click_context,click_context_type,click_sourceid,click_tag,code_ver,color,connection_type,cookies,country,ct_connect_type,curr_factor,curr_rate,currency,cust_hit_time_gmt,cust_visid,daily_visitor,date_time,domain,duplicate_events,duplicate_purchase,duplicated_from,evar1-250,event_list,exclude_hit,first_hit_page_url,first_hit_pagename,first_hit_referrer,first_hit_time_gmt,geo_city,geo_country,geo_dma,geo_region,geo_zip,hier1-5,hier3,hier4,hier5,hit_source,hit_time_gmt,hitid_high,hitid_low,homepage,hourly_visitor,ip,ip2,j_jscript,java_enabled,javascript,language,last_hit_time_gmt,last_purchase_num,last_purchase_time_gmt,mcvisid,mobile*,post_mobile*,mobile_id,monthly_visitor,mvvar1-3,namespace,new_visit,os,p_plugins,page_event,page_event_var1,page_event_var2,page_event_var3,page_type,page_url,pagename,paid_search,partner_plugins,persistent_cookie,plugins,post_page_event,post_page_type,post_browser_height,post_browser_width,post_campaign,post_channel,post_cookies,post_currency,post_cust_hit_time_gmt,post_cust_visid,post_evar1-75,post_event_list,post_hier1-5,post_java_enabled,post_keywords,post_mvvar1-3,post_page_event_var1,post_page_event_var2,post_page_event_var3,post_page_url,post_pagename,post_pagename_no_url,post_partner_plugins,post_persistent_cookie,post_product_list,post_prop1-75,post_purchaseid,post_referrer,post_search_engine,post_state,post_survey,post_t_time_info,post_tnt,post_tnt_action,post_transactionid,post_visid_high,post_visid_low,post_visid_type,post_zip,prev_page,product_list,product_merchandising,prop1-75,purchaseid,quarterly_visitor,ref_domain,ref_type,referrer,resolution,s_resolution,sampled_hit,search_engine,search_page_num,secondary_hit,service,social*,post_social*,sourceid,state,stats_server,t_time_info,tnt,tnt_action,tnt_post_vista,transactionid,truncated_hit,ua_color,ua_os,ua_pixels,user_agent,user_hash,user_server,userid,username,va_closer_detail,va_closer_id,va_finder_detail,va_finder_id,va_instance_event,va_new_engagement,video*,post_video*,visid_high,visid_low,visid_new,visid_timestamp,visid_type,visit_keywords,visit_num,visit_page_num,visit_referrer,visit_search_engine,visit_start_page_url,visit_start_pagename,visit_start_time_gmt,weekly_visitor,yearly_visitor,zip
HEADER_FIELD_DELIMITER = tab
INDEXED_EXTRACTIONS = tsv
disabled = false

dominiquevocat
SplunkTrust

hm, the preview and guidance in the older import wizard seem to fail me; the indexed data looks fine 😕


MuS
Legend

Why do you set HEADER_FIELD_DELIMITER if there is no header in your file?
