Getting Data In

Hunk - Conditional Record Format

tt1
Explorer

Hi,

I have an input file in the following format:

1|{json_data}

1|{more_json_data}

2|aa|bb|cc

3|11|aa|bb|dd

The 1s would always be JSON, and the 2s and 3s would always be CSV (format 2 being slightly different from format 3).

I would probably use the 1s separately from the others.

How can you handle different formats?

Any thoughts appreciated.


Ledion_Bitincka
Splunk Employee

Having differently formatted records in the same file is pretty unusual, but there are a couple of ways you can go about parsing this:

  1. Use props.conf/transforms.conf to parse the data (see the examples below)
  2. Write a custom data preprocessor to parse the data (a standalone sketch of the idea follows the config examples)


system/local/props.conf
[source::/path/to/source]
# parse the JSON records at search time
KV_MODE = JSON
SHOULD_LINEMERGE = false
# uncomment the line below if your data has no timestamps
#DATETIME_CONFIG = NONE

# search-time field extractions for the CSV-style records
REPORT-recs = handle-record-2, handle-record-3
# strip the leading "1|" from the JSON records at index time
SEDCMD-json = s/^1\|(.*)/\1/g

system/local/transforms.conf
# extract the first field from the "2|" records
[handle-record-2]
REGEX = ^2\|(?<field1>[^\|]+)\|

# extract the first field from the "3|" records
[handle-record-3]
REGEX = ^3\|(?<field1>[^\|]+)\|
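
If you go with option 2 instead, the idea is simply to normalize the records before Splunk/Hunk ever reads the file. Below is a minimal standalone sketch of that idea in Python; it is only an illustration under assumed stdin/stdout handling, not the Hunk preprocessor interface itself. It strips the leading "1|" from JSON records and passes the CSV-style records through unchanged.

#!/usr/bin/env python
# Minimal sketch of option 2: normalize the mixed file before it is indexed.
# This is a standalone illustration, not the Hunk preprocessor API; reading
# from stdin and writing to stdout is an assumption for the example.

import sys

def preprocess(line):
    # Strip the "1|" prefix so the bare JSON is left for KV_MODE = JSON;
    # "2|..." and "3|..." records pass through unchanged.
    if line.startswith("1|"):
        return line[2:]
    return line

if __name__ == "__main__":
    for raw in sys.stdin:
        sys.stdout.write(preprocess(raw))

You could run it as python preprocess.py < mixed.log > normalized.log before the file is indexed or copied into HDFS; with the prefix stripped up front, the SEDCMD above would no longer be needed.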

Here's a link showing how you can anonymize data in Splunk, which you might find useful.

tt1
Explorer

Many thanks for answering.

I will work through these solutions, but overall I think you are right that this data is pretty unusual. Splitting the data before it goes into HDFS might well be the best idea.
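
For reference, that pre-split could be as simple as routing each record to its own file by its leading record type. A rough sketch in Python, assuming the record types from the sample above and purely illustrative output file names:

#!/usr/bin/env python
# Rough sketch: split the mixed file into one output file per record type
# before copying anything into HDFS. Output file names are illustrative assumptions.

import sys

OUTPUTS = {
    "1": "records_json.log",   # JSON records
    "2": "records_fmt2.csv",   # first CSV-style format
    "3": "records_fmt3.csv",   # second CSV-style format
}

def split_file(path):
    handles = {key: open(name, "w") for key, name in OUTPUTS.items()}
    try:
        with open(path) as src:
            for line in src:
                if not line.strip():
                    continue                      # skip blank lines
                record_type = line.split("|", 1)[0]
                if record_type in handles:
                    handles[record_type].write(line)
    finally:
        for handle in handles.values():
            handle.close()

if __name__ == "__main__":
    split_file(sys.argv[1])

Each output file then has a single, consistent format, which keeps the Splunk/Hunk configuration per source much simpler.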
