Splunk Search

How to associate a schema with data in Hunk

anupkpurushu
New Member

The schema file and data file both reside on hdfs.

Hunk is able to read the data file and show the raw data, but it doesn't associate it with the schema. That means I would have to extract each field manually.
Is there a way to point Hunk to a schema so that it can understand the raw data better?

0 Karma
1 Solution

Ledion_Bitincka
Splunk Employee

So, it seems like the file is a headerless CSV file, correct? If so, you can use delimiter-based key-value (KV) extraction, with something like this:

etc/apps/search/local/props.conf
[source::/path/to/your/file]
REPORT-kv = my-delim-kv


etc/apps/search/local/transforms.conf
[my-delim-kv]
FIELDS = <comma delimited list of field names>
DELIMS = ","

You can look at this answer for a similar issue


0 Karma

anupkpurushu
New Member

I tried out the transforms.conf configuration and it seems to work properly. I guess I will have to manually create similar confs for all required files/schemas. Not the best way but at least it works with manual configuration.

0 Karma

rdagan_splunk
Splunk Employee

Hunk supports the following schema options: Hive schemas, structured files (Parquet, JSON, Avro, ORC, RC, Seq, TSV, CSV, etc.), and many different types of log files (just use one of the known sourcetypes).

0 Karma

anupkpurushu
New Member

If you can elaborate a bit I can give it a shot.

In my case the schema looks like the following, and it is in a separate file on HDFS:

column_1_name partition_key - - - - - long - "ends '\054'"
column_1_name partition_key - - - - - integer - "ends '\054'"
...
....

The data is in another directory containing multiple bzip files.

0 Karma

Ledion_Bitincka
Splunk Employee

What type of schema are you referring to? Are the data files plain text (compressed) files? Can you post a sample data record?

0 Karma

anupkpurushu
New Member

The data file has multiple lines, each with 114 fields/columns, something like:

1,2,2,9,0,14781,6394,29742,141962,65134,4,1,10,1510,301,0,0,76,726,4162,-1,2,59,-1,-1,-1,-1,1,0,0,2,,website1.com,,2,QwgAAAAEAsssssAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACrtt0J58VMjIAAAAAAssssAAAAA==,2014-12-29T15:00:00.000Z,0.000000000,0.001110000,0.002247780,1,0,0,0,0,0,0,0,0,0,,0,1419866452476,0,0,0,0,,,,0,website1.com/m,,,1,47215,1,-1,29227,,,,,,,,1,0,,2557,19,20,0.0025962923,0.0060005000,0,2,0,20,153,0.006,0.002962923,0.00224778,0.00111,1,0,9,,0,1,2,0,0,0,,,,,0,,,,-1,0,

The definition of each column/field will be in the schema file. The schema file will have 114 lines; each defining the specific column/field:

column_1_name partition_key - - - - - long - "ends '\054'"
column_1_name partition_key - - - - - integer - "ends '\054'"
...
....
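
Since each schema line starts with the column name, the FIELDS list for the delimiter-based extraction could be generated from the schema instead of typed by hand. Below is a minimal Python sketch, not a Hunk feature. It assumes the schema text has already been copied off HDFS into a string, that the first whitespace-separated token on each line is the field name, and that lines made up only of dots are elisions to skip. The function names `field_names` and `render_conf` and the sample column names `col_a` and `col_b` are hypothetical. The quoted-list form used for DELIMS and FIELDS follows my reading of the transforms.conf spec.

```python
def field_names(schema_text):
    """Return the first whitespace-separated token of each schema line.

    Assumption: that token is the column name, as in the sample schema.
    """
    names = []
    for line in schema_text.splitlines():
        line = line.strip()
        # Skip blank lines and elision lines such as "..." or "....".
        if not line or set(line) == {"."}:
            continue
        names.append(line.split()[0])
    return names


def render_conf(source_path, stanza, names):
    """Build the text of the props.conf and transforms.conf stanzas."""
    props = f"[source::{source_path}]\nREPORT-kv = {stanza}\n"
    # Quote every field name so the list is unambiguous.
    fields = ", ".join(f'"{name}"' for name in names)
    transforms = f'[{stanza}]\nDELIMS = ","\nFIELDS = {fields}\n'
    return props, transforms


if __name__ == "__main__":
    # Hypothetical two-line schema in the same layout as the post.
    sample = """\
col_a partition_key - - - - - long - "ends '\\054'"
col_b partition_key - - - - - integer - "ends '\\054'"
"""
    props, transforms = render_conf(
        "/path/to/your/file", "my-delim-kv", field_names(sample)
    )
    print(props)
    print(transforms)
```

The two returned strings can be pasted into etc/apps/search/local/props.conf and etc/apps/search/local/transforms.conf. Running it once per schema/data pair, with a different stanza name each time, avoids writing every 114-field list manually.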

0 Karma