Hi,
I have an input file in the following format:
1|{json_data}
1|{more_json_data}
2|aa|bb|cc
3|11|aa|bb|dd
Lines starting with 1 are always JSON, and lines starting with 2 or 3 are always pipe-delimited CSV (format 2 being slightly different from format 3).
I would probably use the type 1 (JSON) records separately from the others.
How can I handle the different formats in one file?
Any thoughts appreciated.
Having differently formatted data in the same file is pretty unusual, but there are a couple of ways you can go about parsing it:
system/local/props.conf
[source::/path/to/source]
KV_MODE = JSON
SHOULD_LINEMERGE = false
# uncomment the line below if your data has no timestamps
#DATETIME_CONFIG = NONE
REPORT-recs = handle-record-2, handle-record-3
SEDCMD-json = s/^1\|(.*)/\1/g
system/local/transforms.conf
[handle-record-2]
REGEX = ^2\|(?<field1>[^\|]+)\|
[handle-record-3]
REGEX = ^3\|(?<field1>[^\|]+)\|
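If you want to sanity-check what the config above should pull out of each line before indexing anything, here is a rough Python emulation of it. This is only an approximation of the intended behaviour (strip the "1|" prefix, parse the rest as JSON, otherwise apply the two REPORT regexes), not how Splunk actually processes events; the `extract` function and pattern names are hypothetical. One gotcha: Python's `re` module spells named groups `(?P<name>...)`, whereas the Splunk REGEX uses the PCRE form `(?<name>...)`.

```python
import json
import re

# Same patterns as transforms.conf, rewritten with Python's (?P<name>...) syntax.
RECORD_2 = re.compile(r"^2\|(?P<field1>[^|]+)\|")
RECORD_3 = re.compile(r"^3\|(?P<field1>[^|]+)\|")
# Same pattern as SEDCMD-json: capture everything after a leading "1|".
JSON_PREFIX = re.compile(r"^1\|(.*)")


def extract(line):
    """Approximate SEDCMD-json + KV_MODE = JSON + REPORT-recs for one event (hypothetical helper)."""
    # SEDCMD-json: drop the "1|" prefix so the remainder is plain JSON.
    stripped = JSON_PREFIX.sub(r"\1", line)
    if stripped != line:
        # KV_MODE = JSON: the event body is now a JSON object.
        return json.loads(stripped)
    # REPORT-recs: apply each transform in turn and collect named-group matches.
    fields = {}
    for pattern in (RECORD_2, RECORD_3):
        match = pattern.match(line)
        if match:
            fields.update(match.groupdict())
    return fields
```

For example, `extract("2|aa|bb|cc")` gives `{'field1': 'aa'}`. To pull out more columns you would extend each regex with further named groups, which is also where the 2 and 3 formats would start to differ.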
Here's a link that shows how you can anonymize data in Splunk, which you might find useful.
Many thanks for answering.
I will work through these solutions, but overall I think you are right that this data is pretty unusual. Splitting the data before it goes into HDFS may well be the best idea.
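If you do split the data up front, a minimal Python sketch of that step might look like the following. It assumes every line starts with its record type followed by `|`, that each type 1 JSON payload fits on a single line, and that types 2 and 3 are plain pipe-delimited fields; `split_records` is a hypothetical name, and the grouped output would still need to be written to separate files before loading.

```python
import csv
import io
import json
from collections import defaultdict


def split_records(lines):
    """Group lines by record type: "1" -> parsed JSON, "2"/"3" -> lists of fields (hypothetical helper)."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Everything before the first "|" is the record type.
        record_type, _, rest = line.partition("|")
        if record_type == "1":
            groups["1"].append(json.loads(rest))
        elif record_type in ("2", "3"):
            # csv.reader copes with quoted fields that contain a literal "|".
            row = next(csv.reader(io.StringIO(rest), delimiter="|"), [])
            groups[record_type].append(row)
        else:
            # Keep anything unexpected rather than silently dropping it.
            groups["unknown"].append(line)
    return dict(groups)
```

Because it accepts any iterable of lines, you can pass an open file handle directly, e.g. `split_records(open("input.txt"))`, and then write `groups["1"]`, `groups["2"]` and `groups["3"]` out to their own files.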