How to define a sourcetype based on a TSV file with a long list of fields?

Motivator

I have datasets in TSV format with no header row in the file. I tried to import the data with the wizard, basing the sourcetype on TSV and entering the (long) list of field names as a custom header. For some reason the custom headers were not accepted. Does anyone have a working sample props.conf for a TSV file with a custom header? 😞

Maybe the header is too long; I don't know.
Here are the header fields as a comma-separated list:
acceptlanguage,browser,browserheight,browserwidth,ccolor,campaign,channel,clickaction,clickactiontype,clickcontext,clickcontexttype,clicksourceid,clicktag,codever,color,connectiontype,cookies,country,ctconnecttype,currfactor,currrate,currency,custhittimegmt,custvisid,dailyvisitor,datetime,domain,duplicateevents,duplicatepurchase,duplicatedfrom,evar1-250,eventlist,excludehit,firsthitpageurl,firsthitpagename,firsthitreferrer,firsthittimegmt,geocity,geocountry,geodma,georegion,geozip,hier1-5,hier3,hier4,hier5,hitsource,hittimegmt,hitidhigh,hitidlow,homepage,hourlyvisitor,ip,ip2,jjscript,javaenabled,javascript,language,lasthittimegmt,lastpurchasenum,lastpurchasetimegmt,mcvisid,mobile* postmobile*,mobileid,monthlyvisitor,mvvar1-3,namespace,newvisit,os,pplugins,pageevent,pageeventvar1,pageeventvar2,pageeventvar3,pagetype,pageurl,pagename,paidsearch,partnerplugins,persistentcookie,plugins,post pageevent,post pagetype,postbrowserheight,postbrowserwidth,postcampaign,postchannel,postcookies,postcurrency,postcusthittimegmt,postcustvisid,postevar1-75,posteventlist,posthier1-5,postjavaenabled,postkeywords,postmvvar1-3,postpageeventvar1,postpageeventvar2,postpageeventvar3,postpageurl,postpagename,postpagenamenourl,postpartnerplugins,postpersistentcookie,postproductlist,postprop1-75,postpurchaseid,postreferrer,postsearchengine,poststate,postsurvey,postttimeinfo,posttnt,posttntaction,posttransactionid,postvisidhigh,postvisidlow,postvisidtype,postzip,prevpage,productlist,productmerchandising,prop1-75,purchaseid,quarterlyvisitor,refdomain,reftype,referrer,resolution,sresolution,sampledhit,searchengine,searchpagenum,secondaryhit,service,social*,postsocial,sourceid,state,statsserver,ttimeinfo,tnt,tntaction,tntpostvista,transactionid,truncatedhit,uacolor,uaos,uapixels,useragent,userhash,userserver,userid,username,vacloserdetail,vacloserid,vafinderdetail,vafinderid,vainstanceevent,vanew_engagement,video,postvideo*,visidhigh,visidlow,visidnew,visidtimestamp,visidtype,visitkeywords,visitnum,visitpagenum,visitreferrer,visitsearchengine,visitstartpageurl,visitstartpagename,visitstarttimegmt,weeklyvisitor,yearly_visitor,zip


Re: How to define a sourcetype based on a TSV file with a long list of fields?

Esteemed Legend

What is in your configuration files right now?


Re: How to define a sourcetype based on a TSV file with a long list of fields?

Motivator
[mysourcetype]
FIELD_DELIMITER = tab
FIELD_NAMES = accept_language,browser,browser_height,browser_width,c_color,campaign,channel,click_action,click_action_type,click_context,click_context_type,click_sourceid,click_tag,code_ver,color,connection_type,cookies,country,ct_connect_type,curr_factor,curr_rate,currency,cust_hit_time_gmt,cust_visid,daily_visitor,date_time,domain,duplicate_events,duplicate_purchase,duplicated_from,evar1-250,event_list,exclude_hit,first_hit_page_url,first_hit_pagename,first_hit_referrer,first_hit_time_gmt,geo_city,geo_country,geo_dma,geo_region,geo_zip,hier1-5,hier3,hier4,hier5,hit_source,hit_time_gmt,hitid_high,hitid_low,homepage,hourly_visitor,ip,ip2,j_jscript,java_enabled,javascript,language,last_hit_time_gmt,last_purchase_num,last_purchase_time_gmt,mcvisid,mobile* post_mobile*,mobile_id,monthly_visitor,mvvar1-3,namespace,new_visit,os,p_plugins,page_event,page_event_var1,page_event_var2,page_event_var3,page_type,page_url,pagename,paid_search,partner_plugins,persistent_cookie,plugins,post_ page_event,post_ page_type,post_browser_height,post_browser_width,post_campaign,post_channel,post_cookies,post_currency,post_cust_hit_time_gmt,post_cust_visid,post_evar1-75,post_event_list,post_hier1-5,post_java_enabled,post_keywords,post_mvvar1-3,post_page_event_var1,post_page_event_var2,post_page_event_var3,post_page_url,post_pagename,post_pagename_no_url,post_partner_plugins,post_persistent_cookie,post_product_list,post_prop1-75,post_purchaseid,post_referrer,post_search_engine,post_state,post_survey,post_t_time_info,post_tnt,post_tnt_action,post_transactionid,post_visid_high,post_visid_low,post_visid_type,post_zip,prev_page,product_list,product_merchandising,prop1-75,purchaseid,quarterly_visitor,ref_domain,ref_type,referrer,resolution,s_resolution,sampled_hit,search_engine,search_page_num,secondary_hit,service,social*,post_social*,sourceid,state,stats_server,t_time_info,tnt,tnt_action,tnt_post_vista,transactionid,truncated_hit,ua_color,ua_os,ua_pixels,user_agent,user_hash,user_server,userid,username,va_closer_detail,va_closer_id,va_finder_detail,va_finder_id,va_instance_event,va_new_engagement,video*,post_video*,visid_high,visid_low,visid_new,visid_timestamp,visid_type,visit_keywords,visit_num,visit_page_num,visit_referrer,visit_search_engine,visit_start_page_url,visit_start_pagename,visit_start_time_gmt,weekly_visitor,yearly_visitor,zip
HEADER_FIELD_DELIMITER = tab
INDEXED_EXTRACTIONS = tsv
disabled = false

Re: How to define a sourcetype based on a TSV file with a long list of fields?

SplunkTrust

Why do you set HEADER_FIELD_DELIMITER if there is no header in your file?


Re: How to define a sourcetype based on a TSV file with a long list of fields?

Motivator

Hm, the preview and guidance in the older import wizard seem to have failed me; the indexed data looks fine 😕


Re: How to define a sourcetype based on a TSV file with a long list of fields?

Esteemed Legend

Your configuration files look fine, but I would keep only the FIELD_NAMES and INDEXED_EXTRACTIONS = TSV lines (change tsv to TSV) and remove everything else. Then double-check this list:

  • The sourcetype matches mysourcetype exactly (casing, punctuation, etc.).
  • The props.conf and transforms.conf configuration files are deployed to the Indexers or Heavy Forwarders (or Universal Forwarders in some cases, such as INDEXED_EXTRACTIONS = TSV).
  • The inputs.conf configuration file is deployed to the Forwarder.
  • You must restart/bounce all Splunk instances on the servers where you deploy these files.
  • There are no configuration errors during restart (watch the response text during startup on one server of each type).
  • You are verifying proper current function by looking at NEW data (post-deploy/post-bounce), not previously indexed data (which is immutable).
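
As a minimal sketch of the trimmed-down setup described above, assuming the stanza name mysourcetype from this thread and a hypothetical monitor path (/path/to/exports is a placeholder, not something from the original posts), the two files might look roughly like this. FIELD_NAMES is shortened to its first three entries for readability; in practice it would list every column in file order.

```
# props.conf
# Per the checklist above, with INDEXED_EXTRACTIONS this file may also
# need to be on the forwarder that reads the data.
[mysourcetype]
INDEXED_EXTRACTIONS = TSV
# Shortened for illustration: list all column names, in file order.
FIELD_NAMES = accept_language,browser,browser_height

# inputs.conf (on the forwarder)
# The monitor path below is a hypothetical placeholder.
[monitor:///path/to/exports/*.tsv]
sourcetype = mysourcetype
disabled = false
```

Comments are kept on their own lines because Splunk .conf files do not treat a trailing "#" on a setting line as a comment.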

Re: How to define a sourcetype based on a TSV file with a long list of fields?

Motivator

Yeah, there were several things off:

  • The onboarding wizards differ; in this case the one under "Data inputs » Files & directories" worked better than the one available from the "Data inputs" dialog.
  • The extra "," in the header delimiter came from trying to get the headers to match (and from looking in the wrong place, i.e. suspecting an issue with long lists of headers) and is unnecessary.
  • The headers as documented online do not match the data; they change from time to time and are delivered in a separate .tsv file (rejoice, rejoice).
  • The preview of the data import fails to reflect how the data looks after import (i.e. correct ... ).

It seems to work nicely with the settings... How do I close this issue when no reply is quite correct? 🙂


Re: How to define a sourcetype based on a TSV file with a long list of fields?

Esteemed Legend

Answer your own question and then click "Accept" on it.


Re: How to define a sourcetype based on a TSV file with a long list of fields?

Motivator

There were several things off:

  • The onboarding wizards differ; in this case the one under "Data inputs » Files & directories" worked better than the one available from the "Data inputs" dialog.
  • The extra "," in the header delimiter came from trying to get the headers to match (and from looking in the wrong place, i.e. suspecting an issue with long lists of headers) and is unnecessary.
  • The headers as documented online do not match the data; they change from time to time and are delivered in a separate .tsv file (rejoice, rejoice).
  • The preview of the data import fails to reflect how the data looks after import (i.e. correct ... ).
