Getting Data In

Hunk - Conditional Record Format

tt1
Explorer

Hi,

I have an input file in the format as follows;

1|{json_data}

1|{more_json_data}

2|aa|bb|cc

3|11|aa|bb|dd

The 1's would always be the JSON, and the 2's and 3's would always be the csv (2 format being slightly different to 3).

I would probably use the 1's separately from the others.

How can you handle different formats?

Any thoughts appreciated?

Tags (4)
0 Karma
1 Solution

Ledion_Bitincka
Splunk Employee
Splunk Employee

Having different kinds of formatted data in the same file is pretty unusual, but there are a couple of ways you can go about parsing this:

  1. Use props/transforms.conf to parse the data (see examples below)
  2. Write a custom data preprocessor to parse the data

.

system/local/props.conf
[source::/path/to/source]
KV_MODE = JSON
SHOULD_LINEMERGE = false
# uncomment line below if your data has no timestamps
#DATETIME_CONFIG = NONE

REPORT-recs = handle-record-2, handle-record-3
SEDCMD-json = s/^1\|(.*)/\1/g

system/local/transforms.conf
[handle-record-2]
REGEX = ^2\|(?<field1>[^\|]+)\|

[handle-record-3]
REGEX = ^3\|(?<field1>[^\|]+)\|

Here's an link that shows how you can anonymize data in Splunk which you might find useful.

View solution in original post

Ledion_Bitincka
Splunk Employee
Splunk Employee

Having different kinds of formatted data in the same file is pretty unusual, but there are a couple of ways you can go about parsing this:

  1. Use props/transforms.conf to parse the data (see examples below)
  2. Write a custom data preprocessor to parse the data

.

system/local/props.conf
[source::/path/to/source]
KV_MODE = JSON
SHOULD_LINEMERGE = false
# uncomment line below if your data has no timestamps
#DATETIME_CONFIG = NONE

REPORT-recs = handle-record-2, handle-record-3
SEDCMD-json = s/^1\|(.*)/\1/g

system/local/transforms.conf
[handle-record-2]
REGEX = ^2\|(?<field1>[^\|]+)\|

[handle-record-3]
REGEX = ^3\|(?<field1>[^\|]+)\|

Here's an link that shows how you can anonymize data in Splunk which you might find useful.

tt1
Explorer

Many thanks for answering.

I will work through these solutions, but overall I think you are right in that this data is pretty unusual. Splitting the data prior to HDFS might well be the best idea.

0 Karma
Get Updates on the Splunk Community!

Introducing the Splunk Community Dashboard Challenge!

Welcome to Splunk Community Dashboard Challenge! This is your chance to showcase your skills in creating ...

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Wednesday, May 29, 2024  |  11AM PST / 2PM ESTRegister now and join us to learn more about how you can ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer Certification at ...

We’re excited to announce a new Splunk certification exam being released at .conf24! If you’re headed to Vegas ...