Getting Data In

Hunk - Conditional Record Format

tt1
Explorer

Hi,

I have an input file in the format as follows;

1|{json_data}

1|{more_json_data}

2|aa|bb|cc

3|11|aa|bb|dd

The 1's would always be the JSON, and the 2's and 3's would always be the csv (2 format being slightly different to 3).

I would probably use the 1's separately from the others.

How can you handle different formats?

Any thoughts appreciated?

Tags (4)
0 Karma
1 Solution

Ledion_Bitincka
Splunk Employee
Splunk Employee

Having different kinds of formatted data in the same file is pretty unusual, but there are a couple of ways you can go about parsing this:

  1. Use props/transforms.conf to parse the data (see examples below)
  2. Write a custom data preprocessor to parse the data

.

system/local/props.conf
[source::/path/to/source]
KV_MODE = JSON
SHOULD_LINEMERGE = false
# uncomment line below if your data has no timestamps
#DATETIME_CONFIG = NONE

REPORT-recs = handle-record-2, handle-record-3
SEDCMD-json = s/^1\|(.*)/\1/g

system/local/transforms.conf
[handle-record-2]
REGEX = ^2\|(?<field1>[^\|]+)\|

[handle-record-3]
REGEX = ^3\|(?<field1>[^\|]+)\|

Here's an link that shows how you can anonymize data in Splunk which you might find useful.

View solution in original post

Ledion_Bitincka
Splunk Employee
Splunk Employee

Having different kinds of formatted data in the same file is pretty unusual, but there are a couple of ways you can go about parsing this:

  1. Use props/transforms.conf to parse the data (see examples below)
  2. Write a custom data preprocessor to parse the data

.

system/local/props.conf
[source::/path/to/source]
KV_MODE = JSON
SHOULD_LINEMERGE = false
# uncomment line below if your data has no timestamps
#DATETIME_CONFIG = NONE

REPORT-recs = handle-record-2, handle-record-3
SEDCMD-json = s/^1\|(.*)/\1/g

system/local/transforms.conf
[handle-record-2]
REGEX = ^2\|(?<field1>[^\|]+)\|

[handle-record-3]
REGEX = ^3\|(?<field1>[^\|]+)\|

Here's an link that shows how you can anonymize data in Splunk which you might find useful.

tt1
Explorer

Many thanks for answering.

I will work through these solutions, but overall I think you are right in that this data is pretty unusual. Splitting the data prior to HDFS might well be the best idea.

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...