Getting Data In

Index only Specific Lines from a Structured Log File

shocko
Contributor

I’m using Splunk Enterprise 9 with Universal Forwarder 9 on Windows. I'd like to monitor several structured log files but only ingest specific lines from these files (each line begins with a well-defined string, so it is easy to write a regular expression or simple match against it). I’m wondering where this can be achieved?

Q: Can the UF do this natively or do I need to monitor the file as a whole then drop certain lines at the indexer?


tscroggins
Influencer

Hi @shocko,

The typical approach discards lines at an intermediate heavy forwarder or indexer by sending them to nullQueue:

# props.conf

[my_sourcetype]
# add line and event-breaking and timestamp extraction here
TRANSFORMS-my_sourcetype_send_to_nullqueue = my_sourcetype_send_to_nullqueue

# transforms.conf

[my_sourcetype_send_to_nullqueue]
# replace foo with a string or expression matching "keep" events
REGEX = ^(?!foo).
DEST_KEY = queue
FORMAT = nullQueue

As with @PickleRick, I've not seen a common use case for force_local_processing. I often say I don't want my application servers turning into Splunk servers, so I prioritize a lightweight forwarder configuration over data transfer. If CPU cores (fast growing files) and memory (large numbers of files) cost you less than network I/O, you may prefer the force_local_processing option; you won't save on disk I/O either way.

If you need a refresher on the functions performed by the utf8, linebreaker, aggregator, and regexreplacement processors, see https://community.splunk.com/t5/Getting-Data-In/Diagrams-of-how-indexing-works-in-the-Splunk-platfor....

shocko
Contributor

@tscroggins thanks for the steer. I'm close to getting this working, but when I implement the transform it drops my event. The event line looks as follows:

SOMEDATA NO_CLIENT_SITE: MYSYSTEM 10.15.37.48

My props.conf is as follows:

[netlogon]
DATETIME_CONFIG =
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Custom
pulldown_type = 1
TRANSFORMS-netlogon_send_to_nullqueue = netlogon_send_to_nullqueue
                                                                     

My transforms.conf is as follows:

[netlogon_send_to_nullqueue]
REGEX = ^(?!NO_CLIENT_SITE).
DEST_KEY = queue
FORMAT = nullQueue

Is the regex at fault here? I have been testing it at regex101 but I cannot see the issue.


tscroggins
Influencer

As configured, the transform will match and discard all events that do not start with NO_CLIENT_SITE. An event starting with SOMEDATA (any string that isn't NO_CLIENT_SITE) would be discarded. Was that your intent?
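
To make that concrete, here is a rough trace of the configured regex against the sample event, followed by one possible variant that tests for the string anywhere in the line. This is an untested sketch: the second stanza name is hypothetical, and it assumes each event is a single line (as SHOULD_LINEMERGE = false suggests).

```ini
# transforms.conf

# As configured: "^" anchors the negative lookahead at position 0.
# Sample event: SOMEDATA NO_CLIENT_SITE: MYSYSTEM 10.15.37.48
#   The event starts with SOMEDATA, not NO_CLIENT_SITE, so the lookahead
#   succeeds, "." matches "S", and the event goes to nullQueue (dropped).
[netlogon_send_to_nullqueue]
REGEX = ^(?!NO_CLIENT_SITE).
DEST_KEY = queue
FORMAT = nullQueue

# Possible variant (hypothetical stanza name): ".*" inside the lookahead
# lets it scan the whole line, so only events with no NO_CLIENT_SITE
# anywhere in them match and are dropped.
[netlogon_drop_without_no_client_site]
REGEX = ^(?!.*NO_CLIENT_SITE).
DEST_KEY = queue
FORMAT = nullQueue
```

Only one of the two stanzas would be referenced from the TRANSFORMS- line in props.conf.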


shocko
Contributor

My intent is that any event message without the string NO_CLIENT_SITE anywhere in it is discarded. 


PickleRick
SplunkTrust

It doesn't work that way.

You should do

TRANSFORMS-netlogon_send_to_nullqueue = netlogon_send_all_to_nullqueue, netlogon_keep_some

And have the netlogon_send_all_to_nullqueue transform send completely _everything_ to nullQueue

[netlogon_send_all_to_nullqueue]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

And then keep only some of them - matching the string you want

[netlogon_keep_some]
REGEX = NO_CLIENT_SITE
DEST_KEY = queue
FORMAT = indexQueue
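
As a rough illustration of why the order on the TRANSFORMS- line matters, the trace below walks two events through both transforms. The second event is a made-up example, and the trace assumes transforms in one class are applied left to right, with the last one that matches deciding the queue.

```ini
# props.conf
[netlogon]
TRANSFORMS-netlogon_send_to_nullqueue = netlogon_send_all_to_nullqueue, netlogon_keep_some

# Event: SOMEDATA NO_CLIENT_SITE: MYSYSTEM 10.15.37.48
#   1. netlogon_send_all_to_nullqueue: "." matches            -> queue = nullQueue
#   2. netlogon_keep_some: "NO_CLIENT_SITE" matches           -> queue = indexQueue (kept)
#
# Event: SOMEDATA SOME_OTHER_MESSAGE (hypothetical line)
#   1. netlogon_send_all_to_nullqueue: "." matches            -> queue = nullQueue
#   2. netlogon_keep_some: "NO_CLIENT_SITE" does not match    -> queue stays nullQueue (dropped)
```

Reversing the two names on the TRANSFORMS- line would send every event to nullQueue, because the catch-all would run last.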

shocko
Contributor

OK got it so basically:

  • UF gathers lines and sends them to the heavy forwarder/indexer
  • Indexer drops all lines except those matched by the regex.

I'll give it a whirl! Thanks @PickleRick  and @tscroggins 
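
For the first step in that summary, a minimal monitor stanza on the UF might look like the sketch below. The file path is an assumption for illustration only; the sourcetype name matches the [netlogon] stanza used in props.conf above.

```ini
# inputs.conf on the universal forwarder
# The path is an assumption for illustration; point it at your own log file.
[monitor://C:\Windows\debug\netlogon.log]
sourcetype = netlogon
disabled = false
```

The props.conf and transforms.conf stanzas that do the filtering would then live on the heavy forwarder or indexer, not on the UF.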

PickleRick
SplunkTrust

Firstly, what do you mean by "structured" here? If you mean INDEXED_EXTRACTIONS, the situation gets complicated because the UF does the parsing and the event is not touched after that (except by ingest actions).

If you just mean well-known and well-formed events, you could try enabling force_local_processing on your UF:

force_local_processing = <boolean>
* Forces a universal forwarder to process all data tagged with this sourcetype
  locally before forwarding it to the indexers.
* Data with this sourcetype is processed by the linebreaker,
  aggregator, and the regexreplacement processors in addition to the existing
  utf8 processor.
* Note that switching this property potentially increases the cpu
  and memory consumption of the forwarder.
* Applicable only on a universal forwarder.
* Default: false

It's worth noting, though, that this is not a recommended setting and is not widely used, so you may have trouble finding support if anything goes wrong.
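
If you do want to try it, a minimal sketch of the UF-side configuration could look like this. It is untested and assumes the netlogon sourcetype and the two transform names used elsewhere in this thread; the transforms.conf stanzas would need to be deployed to the UF as well.

```ini
# props.conf on the universal forwarder
[netlogon]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
# Run the linebreaker, aggregator, and regexreplacement processors on the UF
force_local_processing = true
# Same filtering chain as on an indexer: drop everything, then keep matches
TRANSFORMS-netlogon_send_to_nullqueue = netlogon_send_all_to_nullqueue, netlogon_keep_some
```

With this in place the filtering would happen before forwarding, at the cost of the extra CPU and memory use on the forwarder that the setting's description warns about.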

shocko
Contributor

I mean structured in terms of each line in the log following a defined structure (space delimited fields) that lends itself to easy parsing.
