Multiple sourcetypes on SQS based S3

johnansett

Hello Splunkers,

I'm having an issue onboarding AWS canary output from S3. We have deployed the SQS/SNS and S3 pieces and the files are coming in fine. However, each canary writes 4 files:

  1. TXT - detailed report -  I want this
  2. JSON - summary report - I want this
  3. PNG - image - I don't want this
  4. HTML - I don't want this

Because they all arrive on a single queue, they all get the same sourcetype, which doesn't work because their structures are very different. So I've built the following props:

[aws:canaries:summary]
DATETIME_CONFIG = 
INDEXED_EXTRACTIONS = json
KV_MODE = none
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Structured
description = AWS Canary JSON Summary File
disabled = false
pulldown_type = true

[aws:canaries:detailed]
BREAK_ONLY_BEFORE_DATE = 
DATETIME_CONFIG = 
LINE_BREAKER = (Start Canary)
MAX_TIMESTAMP_LOOKAHEAD = 50
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIME_PREFIX = timestamp\:\s
TRUNCATE = 0
category = Application
description = AWS Canary Detailed TXT file
disabled = false
pulldown_type = true

This is on the HF and the Indexer cluster.

In order to separate and drop the files the following is also added to the props/transforms:

Props:

[aws:canaries]
TRANSFORMS-aws_canaries = set_aws_canary_json, set_aws_canary_txt, drop_aws_canary_png, drop_aws_canary_html

Transforms:

##### AWS Canaries - change sourcetype for SQS based S3 files and drop PNG files #####
[set_aws_canary_json]
SOURCE_KEY = MetaData:Source
REGEX = ^source::.*json
FORMAT = sourcetype::aws:canaries:summary
DEST_KEY = MetaData:Sourcetype

[set_aws_canary_txt]
SOURCE_KEY = MetaData:Source
REGEX = ^source::.*txt
FORMAT = sourcetype::aws:canaries:detailed
DEST_KEY = MetaData:Sourcetype

[drop_aws_canary_png]
SOURCE_KEY = MetaData:Source
REGEX = ^source::.*png
FORMAT = nullQueue
DEST_KEY = queue

[drop_aws_canary_html]
SOURCE_KEY = MetaData:Source
REGEX = ^source::.*html
FORMAT = nullQueue
DEST_KEY = queue

Again, both applied to HF and Indexers.
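
A side note on the transforms above: a pattern such as ^source::.*json matches any source that contains "json" anywhere after the prefix, not only sources that end in .json. A tighter variant, shown here as a sketch for two of the four stanzas, anchors on the file extension at the end of the source. It assumes the S3 object key ends with the extension, which the post does not confirm.

```
[set_aws_canary_json]
SOURCE_KEY = MetaData:Source
# Anchored at the end, so only sources ending in ".json" match
REGEX = \.json$
FORMAT = sourcetype::aws:canaries:summary
DEST_KEY = MetaData:Sourcetype

[drop_aws_canary_png]
SOURCE_KEY = MetaData:Source
# Same idea for the files to discard
REGEX = \.png$
FORMAT = nullQueue
DEST_KEY = queue
```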

On the inputs I have the following:

[aws_sqs_based_s3://canaries]
aws_account = SplunkForwarderRole
aws_iam_role = canaries
index = canaries
interval = 300
s3_file_decoder = CustomLogs
sourcetype = aws:canaries
sqs_batch_size = 10
sqs_queue_region = us-east-1
sqs_queue_url = https://queue.amazonaws.com/1111111111/canaries
disabled = 1

 

The idea is that the input receives the data and sourcetypes it as aws:canaries, then the parsing-time transforms alter the sourcetype.

On the SH I can see the files are being sourcetyped correctly (and dropped). However, event breaking is not working: the JSON is broken at every line, and it looks like the TXT file breaks at the date.

Anyone configured something similar?

I suspect the event breaking is happening as part of the aws:canaries sourcetype, but I'm not sure.
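
That suspicion is consistent with how index-time processing is ordered. Line breaking, line merging and timestamp extraction run before the TRANSFORMS- rules that rewrite the sourcetype, so events are broken using whatever applies to aws:canaries, and the LINE_BREAKER settings in the renamed stanzas are not applied afterwards. One possible workaround, sketched here and not tested against this add-on, is to key the breaking rules on the source rather than the sourcetype. The source patterns assume the S3 object key ends in .json or .txt, and treating each JSON file as a single event is also an assumption.

```
# props.conf on the HF (the first full Splunk instance that parses the data)

# Assumption: source ends in ".json" and each file holds one JSON document
[source::.../*.json]
SHOULD_LINEMERGE = false
# A pattern that can never match, so the file is not split into lines
LINE_BREAKER = ((?!))
TRUNCATE = 0

# Assumption: source ends in ".txt" and each run begins with "Start Canary"
[source::.../*.txt]
SHOULD_LINEMERGE = false
# Only the first capture group (the newlines) is discarded,
# so "Start Canary" stays at the start of the next event
LINE_BREAKER = ([\r\n]+)Start Canary
TIME_PREFIX = timestamp\:\s
MAX_TIMESTAMP_LOOKAHEAD = 50
TRUNCATE = 0
```

Note that the original LINE_BREAKER = (Start Canary) would discard the text "Start Canary" itself, because whatever the first capture group matches is treated as the delimiter and dropped.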

Any help appreciated!!

1 Solution

johnansett

In the end, we created two queues and used filters to route the data.
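
For anyone landing here later, a rough sketch of what the two-queue layout could look like in inputs.conf follows. The stanza names and queue URLs are hypothetical, and it assumes the filters route .json objects to one queue and .txt objects to the other (for example with suffix filters on the bucket notifications), so the PNG and HTML files never reach Splunk. Everything else is copied from the original input.

```
# One input per queue, each assigning the final sourcetype directly,
# so no sourcetype rewriting is needed at parse time

[aws_sqs_based_s3://canaries_summary]
aws_account = SplunkForwarderRole
aws_iam_role = canaries
index = canaries
interval = 300
s3_file_decoder = CustomLogs
sourcetype = aws:canaries:summary
sqs_batch_size = 10
sqs_queue_region = us-east-1
# Hypothetical queue receiving only the .json objects
sqs_queue_url = https://queue.amazonaws.com/1111111111/canaries-summary

[aws_sqs_based_s3://canaries_detailed]
aws_account = SplunkForwarderRole
aws_iam_role = canaries
index = canaries
interval = 300
s3_file_decoder = CustomLogs
sourcetype = aws:canaries:detailed
sqs_batch_size = 10
sqs_queue_region = us-east-1
# Hypothetical queue receiving only the .txt objects
sqs_queue_url = https://queue.amazonaws.com/1111111111/canaries-detailed
```

Because each input sets its sourcetype up front, the event-breaking settings in the [aws:canaries:summary] and [aws:canaries:detailed] stanzas apply from the start of parsing.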