Getting Data In

Splunk doesn't format Sysmon JSON correctly

Vyber90
Explorer

That's the problem. I have a Sysmon JSON to examine but, although in the "Add Data" section everything looks OK, once I get to the search time Events are joined and split without pattern.

After I thought about it a second, I noticed that the document did not have timestamps and Splunk was complaining about it, so I solved it but the issue was still there.

The problem looks like this:

Vyber90_0-1624704356997.png

Sometimes the Event begins OK but melts with other events before its end, and others (like the upper example) don't have a heading.

I'm not using any plug-ins or apps. Just clicked on Add Data, selected the document as a non-timestamped JSON, and started searching. How would you solve this? the JSON to be analyzed is downloadable from Blue Team Labs Online.

 
Labels (3)
0 Karma
1 Solution

verbal_666
Contributor

You have to check:

1) the inputs.conf, which sourcetype are you using?
2) the size of events... i have many json which are larger than 10.000 bytes per event, so Splunk truncate them (by default, to lighten the aggregation/parsing process)!
3) if, in point 2, events are more than 10.000 bytes each, you have to aggregate them in props.conf with a TRUNCATE = 0 in the [searchtype_used] stanza,

TRUNCATE = <non-negative integer>
* The default maximum line length, in bytes.
* Although this is in bytes, line length is rounded down when this would
  otherwise land mid-character for multi-byte characters.
* Set to 0 if you never want truncation (very long lines are, however, often
  a sign of garbage data).
* Default: 10000

You can watch truncated events in splunkd.log of the Indexer(s). If you have an HF, could be it to truncate before Indexer(s),

WARN LineBreakingProcessor - Truncating line because limit of 10000 bytes has been exceeded with a line length >= 65536 - data_source="/xxxxxxxxxx/xxxxxxxxxx", data_host="xxxxxxxxxx", data_sourcetype="xxxxxxxxxx"

 4) if the problem is not the truncate, maybe you could use the default "_json" sourcetype or the

INDEXED_EXTRACTIONS = <CSV|TSV|PSV|W3C|JSON|HEC>

method to parse the correct events.

 

You have to try which method is better.

View solution in original post

0 Karma

verbal_666
Contributor

You have to check:

1) the inputs.conf, which sourcetype are you using?
2) the size of events... i have many json which are larger than 10.000 bytes per event, so Splunk truncate them (by default, to lighten the aggregation/parsing process)!
3) if, in point 2, events are more than 10.000 bytes each, you have to aggregate them in props.conf with a TRUNCATE = 0 in the [searchtype_used] stanza,

TRUNCATE = <non-negative integer>
* The default maximum line length, in bytes.
* Although this is in bytes, line length is rounded down when this would
  otherwise land mid-character for multi-byte characters.
* Set to 0 if you never want truncation (very long lines are, however, often
  a sign of garbage data).
* Default: 10000

You can watch truncated events in splunkd.log of the Indexer(s). If you have an HF, could be it to truncate before Indexer(s),

WARN LineBreakingProcessor - Truncating line because limit of 10000 bytes has been exceeded with a line length >= 65536 - data_source="/xxxxxxxxxx/xxxxxxxxxx", data_host="xxxxxxxxxx", data_sourcetype="xxxxxxxxxx"

 4) if the problem is not the truncate, maybe you could use the default "_json" sourcetype or the

INDEXED_EXTRACTIONS = <CSV|TSV|PSV|W3C|JSON|HEC>

method to parse the correct events.

 

You have to try which method is better.

0 Karma
Get Updates on the Splunk Community!

Understanding Generative AI Techniques and Their Application in Cybersecurity

Watch On-Demand Artificial intelligence is the talk of the town nowadays, with industries of all kinds ...

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Using the Splunk Threat Research Team’s Latest Security Content

REGISTER HERE Tech Talk | Security Edition Did you know the Splunk Threat Research Team regularly releases ...