Getting Data In

How to define a sourcetype based on a TSV file with a long list of fields?

dominiquevocat
SplunkTrust

I have datasets in TSV format where there is no header row in the file. I tried to use the wizard to import the data, base it on TSV, define the header, and set the (long) list of header fields. For some reason the custom headers were not accepted. Does someone have a sample props.conf for a TSV file with a custom header that works? 😞

Maybe the header is too long, don't know.
Here are the header fields as comma separated list:
accept_language,browser,browser_height,browser_width,c_color,campaign,channel,click_action,click_action_type,click_context,click_context_type,click_sourceid,click_tag,code_ver,color,connection_type,cookies,country,ct_connect_type,curr_factor,curr_rate,currency,cust_hit_time_gmt,cust_visid,daily_visitor,date_time,domain,duplicate_events,duplicate_purchase,duplicated_from,evar1-250,event_list,exclude_hit,first_hit_page_url,first_hit_pagename,first_hit_referrer,first_hit_time_gmt,geo_city,geo_country,geo_dma,geo_region,geo_zip,hier1-5,hier3,hier4,hier5,hit_source,hit_time_gmt,hitid_high,hitid_low,homepage,hourly_visitor,ip,ip2,j_jscript,java_enabled,javascript,language,last_hit_time_gmt,last_purchase_num,last_purchase_time_gmt,mcvisid,mobile*,post_mobile*,mobile_id,monthly_visitor,mvvar1-3,namespace,new_visit,os,p_plugins,page_event,page_event_var1,page_event_var2,page_event_var3,page_type,page_url,pagename,paid_search,partner_plugins,persistent_cookie,plugins,post_page_event,post_page_type,post_browser_height,post_browser_width,post_campaign,post_channel,post_cookies,post_currency,post_cust_hit_time_gmt,post_cust_visid,post_evar1-75,post_event_list,post_hier1-5,post_java_enabled,post_keywords,post_mvvar1-3,post_page_event_var1,post_page_event_var2,post_page_event_var3,post_page_url,post_pagename,post_pagename_no_url,post_partner_plugins,post_persistent_cookie,post_product_list,post_prop1-75,post_purchaseid,post_referrer,post_search_engine,post_state,post_survey,post_t_time_info,post_tnt,post_tnt_action,post_transactionid,post_visid_high,post_visid_low,post_visid_type,post_zip,prev_page,product_list,product_merchandising,prop1-75,purchaseid,quarterly_visitor,ref_domain,ref_type,referrer,resolution,s_resolution,sampled_hit,search_engine,search_page_num,secondary_hit,service,social*,post_social*,sourceid,state,stats_server,t_time_info,tnt,tnt_action,tnt_post_vista,transactionid,truncated_hit,ua_color,ua_os,ua_pixels,user_agent,user_hash,user_server,userid,username,va_closer_detail,va_closer_id,va_finder_detail,va_finder_id,va_instance_event,va_new_engagement,video*,post_video*,visid_high,visid_low,visid_new,visid_timestamp,visid_type,visit_keywords,visit_num,visit_page_num,visit_referrer,visit_search_engine,visit_start_page_url,visit_start_pagename,visit_start_time_gmt,weekly_visitor,yearly_visitor,zip

1 Solution

dominiquevocat
SplunkTrust

There were several things off:

  • There are differences between the onboarding flows, and in this case the one in "Data inputs » Files & directories" worked better than the one available from the "Data inputs" dialog.
  • The extra "," in the header delimiter stems from trying to get the headers to match (while looking in the wrong place, i.e. suspecting an issue with the long list of headers) and is unnecessary.
  • The headers as documented online do not match; they change from time to time and are delivered in a separate .tsv file (rejoice, rejoice).
  • The preview of the data import fails to reflect how the data looks after import (the indexed data is in fact correct).
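Since the real header row is delivered in a separate .tsv file, one quick way to turn that row into a comma-separated FIELD_NAMES value is a shell one-liner (a sketch; headers.tsv is a hypothetical name for the delivered header file):

```shell
# headers.tsv is assumed to hold the single tab-separated header row
# shipped alongside the data; adjust the name to match your delivery.
printf 'accept_language\tbrowser\tbrowser_height\n' > headers.tsv  # sample row
# Translate tabs to commas to get a ready-made FIELD_NAMES value.
head -1 headers.tsv | tr '\t' ','
# → accept_language,browser,browser_height
```

Because the headers change from time to time, regenerating FIELD_NAMES from the delivered file beats maintaining the list by hand.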



woodcock
Esteemed Legend

Your configuration files look fine but I would keep only the FIELD_NAMES and INDEXED_EXTRACTIONS = TSV lines (change tsv to TSV) and remove everything else. Then double-check this list:

  • The sourcetype matches mysourcetype exactly (casing, punctuation, etc.).
  • The props.conf and transforms.conf configuration files are deployed to the Indexers or Heavy Forwarders (or Universal Forwarders in some cases, such as INDEXED_EXTRACTIONS = TSV).
  • The inputs.conf configuration file is deployed to the Forwarder.
  • You must restart/bounce all Splunk instances on the servers where you deploy these files.
  • There are no configuration errors during restart (watch the response text during startup on one server of each type).
  • You are verifying proper current function by looking at NEW data (post-deploy/post-bounce), not previously indexed data (which is immutable).
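A trimmed props.conf along those lines might look like the following (a sketch only; [mysourcetype] is the stanza name from this thread, and the FIELD_NAMES value is truncated with "..." — use the full list from the question):

```ini
[mysourcetype]
# FIELD_NAMES supplies the header for the headerless file, and TSV
# extraction is tab-delimited, so no HEADER_FIELD_DELIMITER or
# FIELD_DELIMITER settings are needed.
INDEXED_EXTRACTIONS = TSV
FIELD_NAMES = accept_language,browser,browser_height,browser_width,...
```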

dominiquevocat
SplunkTrust

yeah, there were several things off:

  • There are differences between the onboarding flows, and in this case the one in "Data inputs » Files & directories" worked better than the one available from the "Data inputs" dialog.
  • The extra "," in the header delimiter stems from trying to get the headers to match (while looking in the wrong place, i.e. suspecting an issue with the long list of headers) and is unnecessary.
  • The headers as documented online do not match; they change from time to time and are delivered in a separate .tsv file (rejoice, rejoice).
  • The preview of the data import fails to reflect how the data looks after import (the indexed data is in fact correct).

It seems to work nicely with these settings... how do I close this issue when no reply is quite correct 🙂


woodcock
Esteemed Legend

Answer your own question and then click "Accept" on it.


woodcock
Esteemed Legend

What is in your configuration files right now?


dominiquevocat
SplunkTrust
[mysourcetype]
FIELD_DELIMITER = tab
FIELD_NAMES = accept_language,browser,browser_height,browser_width,c_color,campaign,channel,click_action,click_action_type,click_context,click_context_type,click_sourceid,click_tag,code_ver,color,connection_type,cookies,country,ct_connect_type,curr_factor,curr_rate,currency,cust_hit_time_gmt,cust_visid,daily_visitor,date_time,domain,duplicate_events,duplicate_purchase,duplicated_from,evar1-250,event_list,exclude_hit,first_hit_page_url,first_hit_pagename,first_hit_referrer,first_hit_time_gmt,geo_city,geo_country,geo_dma,geo_region,geo_zip,hier1-5,hier3,hier4,hier5,hit_source,hit_time_gmt,hitid_high,hitid_low,homepage,hourly_visitor,ip,ip2,j_jscript,java_enabled,javascript,language,last_hit_time_gmt,last_purchase_num,last_purchase_time_gmt,mcvisid,mobile*,post_mobile*,mobile_id,monthly_visitor,mvvar1-3,namespace,new_visit,os,p_plugins,page_event,page_event_var1,page_event_var2,page_event_var3,page_type,page_url,pagename,paid_search,partner_plugins,persistent_cookie,plugins,post_page_event,post_page_type,post_browser_height,post_browser_width,post_campaign,post_channel,post_cookies,post_currency,post_cust_hit_time_gmt,post_cust_visid,post_evar1-75,post_event_list,post_hier1-5,post_java_enabled,post_keywords,post_mvvar1-3,post_page_event_var1,post_page_event_var2,post_page_event_var3,post_page_url,post_pagename,post_pagename_no_url,post_partner_plugins,post_persistent_cookie,post_product_list,post_prop1-75,post_purchaseid,post_referrer,post_search_engine,post_state,post_survey,post_t_time_info,post_tnt,post_tnt_action,post_transactionid,post_visid_high,post_visid_low,post_visid_type,post_zip,prev_page,product_list,product_merchandising,prop1-75,purchaseid,quarterly_visitor,ref_domain,ref_type,referrer,resolution,s_resolution,sampled_hit,search_engine,search_page_num,secondary_hit,service,social*,post_social*,sourceid,state,stats_server,t_time_info,tnt,tnt_action,tnt_post_vista,transactionid,truncated_hit,ua_color,ua_os,ua_pixels,user_agent,user_hash,user_server,userid,username,va_closer_detail,va_closer_id,va_finder_detail,va_finder_id,va_instance_event,va_new_engagement,video*,post_video*,visid_high,visid_low,visid_new,visid_timestamp,visid_type,visit_keywords,visit_num,visit_page_num,visit_referrer,visit_search_engine,visit_start_page_url,visit_start_pagename,visit_start_time_gmt,weekly_visitor,yearly_visitor,zip
HEADER_FIELD_DELIMITER = tab
INDEXED_EXTRACTIONS = tsv
disabled = false

dominiquevocat
SplunkTrust

hm, the preview and guidance in the older import wizard seem to fail me; the indexed data looks fine 😕


MuS
Legend

Why do you set HEADER_FIELD_DELIMITER if there is no header in your file?
