Getting Data In

Hunk - Conditional Record Format

tt1
Explorer

Hi,

I have an input file in the format as follows;

1|{json_data}

1|{more_json_data}

2|aa|bb|cc

3|11|aa|bb|dd

The 1's would always be the JSON, and the 2's and 3's would always be the csv (2 format being slightly different to 3).

I would probably use the 1's separately from the others.

How can you handle different formats?

Any thoughts appreciated?

Tags (4)
0 Karma
1 Solution

Ledion_Bitincka
Splunk Employee
Splunk Employee

Having different kinds of formatted data in the same file is pretty unusual, but there are a couple of ways you can go about parsing this:

  1. Use props/transforms.conf to parse the data (see examples below)
  2. Write a custom data preprocessor to parse the data

.

system/local/props.conf
[source::/path/to/source]
KV_MODE = JSON
SHOULD_LINEMERGE = false
# uncomment line below if your data has no timestamps
#DATETIME_CONFIG = NONE

REPORT-recs = handle-record-2, handle-record-3
SEDCMD-json = s/^1\|(.*)/\1/g

system/local/transforms.conf
[handle-record-2]
REGEX = ^2\|(?<field1>[^\|]+)\|

[handle-record-3]
REGEX = ^3\|(?<field1>[^\|]+)\|

Here's an link that shows how you can anonymize data in Splunk which you might find useful.

View solution in original post

Ledion_Bitincka
Splunk Employee
Splunk Employee

Having different kinds of formatted data in the same file is pretty unusual, but there are a couple of ways you can go about parsing this:

  1. Use props/transforms.conf to parse the data (see examples below)
  2. Write a custom data preprocessor to parse the data

.

system/local/props.conf
[source::/path/to/source]
KV_MODE = JSON
SHOULD_LINEMERGE = false
# uncomment line below if your data has no timestamps
#DATETIME_CONFIG = NONE

REPORT-recs = handle-record-2, handle-record-3
SEDCMD-json = s/^1\|(.*)/\1/g

system/local/transforms.conf
[handle-record-2]
REGEX = ^2\|(?<field1>[^\|]+)\|

[handle-record-3]
REGEX = ^3\|(?<field1>[^\|]+)\|

Here's an link that shows how you can anonymize data in Splunk which you might find useful.

tt1
Explorer

Many thanks for answering.

I will work through these solutions, but overall I think you are right in that this data is pretty unusual. Splitting the data prior to HDFS might well be the best idea.

0 Karma
Get Updates on the Splunk Community!

Splunk Enterprise Security: Your Command Center for PCI DSS Compliance

Every security professional knows the drill. The PCI DSS audit is approaching, and suddenly everyone's asking ...

Developer Spotlight with Guilhem Marchand

From Splunk Engineer to Founder: The Journey Behind TrackMe    After spending over 12 years working full time ...

Cisco Catalyst Center Meets Splunk ITSI: From 'Payments Are Down' to Root Cause in ...

The Problem: When Networks and Services Don't Talk Payment systems fail at a retail location. Customers are ...