Getting Data In

Hunk - Conditional Record Format

tt1
Explorer

Hi,

I have an input file in the format as follows;

1|{json_data}

1|{more_json_data}

2|aa|bb|cc

3|11|aa|bb|dd

The 1's would always be the JSON, and the 2's and 3's would always be the csv (2 format being slightly different to 3).

I would probably use the 1's separately from the others.

How can you handle different formats?

Any thoughts appreciated?

Tags (4)
0 Karma
1 Solution

Ledion_Bitincka
Splunk Employee
Splunk Employee

Having different kinds of formatted data in the same file is pretty unusual, but there are a couple of ways you can go about parsing this:

  1. Use props/transforms.conf to parse the data (see examples below)
  2. Write a custom data preprocessor to parse the data

.

system/local/props.conf
[source::/path/to/source]
KV_MODE = JSON
SHOULD_LINEMERGE = false
# uncomment line below if your data has no timestamps
#DATETIME_CONFIG = NONE

REPORT-recs = handle-record-2, handle-record-3
SEDCMD-json = s/^1\|(.*)/\1/g

system/local/transforms.conf
[handle-record-2]
REGEX = ^2\|(?<field1>[^\|]+)\|

[handle-record-3]
REGEX = ^3\|(?<field1>[^\|]+)\|

Here's an link that shows how you can anonymize data in Splunk which you might find useful.

View solution in original post

Ledion_Bitincka
Splunk Employee
Splunk Employee

Having different kinds of formatted data in the same file is pretty unusual, but there are a couple of ways you can go about parsing this:

  1. Use props/transforms.conf to parse the data (see examples below)
  2. Write a custom data preprocessor to parse the data

.

system/local/props.conf
[source::/path/to/source]
KV_MODE = JSON
SHOULD_LINEMERGE = false
# uncomment line below if your data has no timestamps
#DATETIME_CONFIG = NONE

REPORT-recs = handle-record-2, handle-record-3
SEDCMD-json = s/^1\|(.*)/\1/g

system/local/transforms.conf
[handle-record-2]
REGEX = ^2\|(?<field1>[^\|]+)\|

[handle-record-3]
REGEX = ^3\|(?<field1>[^\|]+)\|

Here's an link that shows how you can anonymize data in Splunk which you might find useful.

tt1
Explorer

Many thanks for answering.

I will work through these solutions, but overall I think you are right in that this data is pretty unusual. Splitting the data prior to HDFS might well be the best idea.

0 Karma
Get Updates on the Splunk Community!

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...