Getting Data In

Index Only Specific Lines from a Structured Log File

shocko
Contributor

I’m using Splunk Enterprise 9 with Universal Forwarder 9 on Windows. I'd like to monitor several structured log files but ingest only specific lines from them (each line begins with a well-defined string, so it's easy to build a matching regular expression or a simple string match against it). I’m wondering how this can be achieved.

Q: Can the UF do this natively, or do I need to monitor the whole file and then drop certain lines at the indexer?

tscroggins
Influencer

Hi @shocko,

The typical approach discards lines at an intermediate heavy forwarder or indexer by sending them to nullQueue:

# props.conf

[my_sourcetype]
# add line and event-breaking and timestamp extraction here
TRANSFORMS-my_sourcetype_send_to_nullqueue = my_sourcetype_send_to_nullqueue

# transforms.conf

[my_sourcetype_send_to_nullqueue]
# replace foo with a string or expression matching "keep" events
REGEX = ^(?!foo).
DEST_KEY = queue
FORMAT = nullQueue
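
Note that these settings belong on the parsing tier (heavy forwarder or indexer), not on the UF, and splunkd needs a restart (or a configuration bundle push in a clustered setup) to pick them up. A sketch of the layout, assuming a dedicated app (app name is a placeholder):

# on the heavy forwarder or indexer
# $SPLUNK_HOME/etc/apps/<your_app>/local/props.conf
# $SPLUNK_HOME/etc/apps/<your_app>/local/transforms.conf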

As with @PickleRick, I've not seen a common use case for force_local_processing. I often say I don't want my application servers turning into Splunk servers, so I prioritize a lightweight forwarder configuration over reduced data transfer. If CPU cores (fast-growing files) and memory (large numbers of files) cost you less than network I/O, you may prefer the force_local_processing option; you won't save on disk I/O either way.

If you need a refresher on the functions performed by the utf8, linebreaker, aggregator, and regexreplacement processors, see https://community.splunk.com/t5/Getting-Data-In/Diagrams-of-how-indexing-works-in-the-Splunk-platfor....

shocko
Contributor

@tscroggins thanks for the steer. I'm close to getting this working, but when I implement the transform it drops my event. The event line looks as follows:

SOMEDATA NO_CLIENT_SITE: MYSYSTEM 10.15.37.48

My props.conf is as follows:

[netlogon]
DATETIME_CONFIG =
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Custom
pulldown_type = 1
TRANSFORMS-netlogon_send_to_nullqueue = netlogon_send_to_nullqueue

My transforms.conf is as follows:

[netlogon_send_to_nullqueue]
REGEX = ^(?!NO_CLIENT_SITE).
DEST_KEY = queue
FORMAT = nullQueue

Is it the regex at fault here? I have been playing with it at regex101 but I cannot see the issue.

tscroggins
Influencer

As configured, the transform will match and discard all events that do not start with NO_CLIENT_SITE. An event starting with SOMEDATA (any string that isn't NO_CLIENT_SITE) would be discarded. Was that your intent?
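
To make that concrete, here's how the lookahead plays out against your posted sample (a sketch using your values):

# REGEX = ^(?!NO_CLIENT_SITE).
# "NO_CLIENT_SITE: MYSYSTEM 10.15.37.48"          -> lookahead fails at position 0,
#                                                    no match, event is kept
# "SOMEDATA NO_CLIENT_SITE: MYSYSTEM 10.15.37.48" -> line does not start with NO_CLIENT_SITE,
#                                                    lookahead succeeds, transform matches,
#                                                    event goes to nullQueue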

shocko
Contributor

My intent is that any event message without the string NO_CLIENT_SITE anywhere in it is discarded. 

PickleRick
SplunkTrust

It doesn't work that way.

You should do

TRANSFORMS-netlogon_send_to_nullqueue = netlogon_send_all_to_nullqueue, netlogon_keep_some

And have the netlogon_send_all_to_nullqueue transform send completely _everything_ to nullQueue

[netlogon_send_all_to_nullqueue]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

And then keep only some of them - matching the string you want

[netlogon_keep_some]
REGEX = NO_CLIENT_SITE
DEST_KEY = queue
FORMAT = indexQueue
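
The order matters here: transforms in a single TRANSFORMS- list run left to right, and the last matching transform wins the queue assignment. Roughly:

# netlogon_send_all_to_nullqueue: REGEX = . matches every event      -> queue = nullQueue
# netlogon_keep_some: matches only events containing NO_CLIENT_SITE -> queue = indexQueue
# net effect: events containing NO_CLIENT_SITE are indexed; everything else is dropped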

shocko
Contributor

OK got it, so basically:

  • UF gathers the lines and sends them to the heavy forwarder/indexer
  • Indexer drops all lines except those matched by the keep regex (the first transform sends everything to nullQueue; the second sends matching lines back to the index queue)

I'll give it a whirl! Thanks @PickleRick  and @tscroggins 
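
If it helps anyone following along, a quick way to sanity-check that the parsing tier actually sees both stanzas after a restart (a sketch using btool with the stanza names above):

# run from $SPLUNK_HOME/bin on the heavy forwarder or indexer
splunk btool props list netlogon --debug
splunk btool transforms list netlogon_send_all_to_nullqueue --debug
splunk btool transforms list netlogon_keep_some --debug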

PickleRick
SplunkTrust

Firstly - what do you mean by "structured" here? If you mean INDEXED_EXTRACTIONS, the situation gets complicated, because the UF does the parsing and the event is not touched after that (except for ingest actions).

If you just mean well-known and well-formed events, you could try enabling force_local_processing on your UF:

force_local_processing = <boolean>
* Forces a universal forwarder to process all data tagged with this sourcetype
  locally before forwarding it to the indexers.
* Data with this sourcetype is processed by the linebreaker,
  aggregator, and the regexreplacement processors in addition to the existing
  utf8 processor.
* Note that switching this property potentially increases the cpu
  and memory consumption of the forwarder.
* Applicable only on a universal forwarder.
* Default: false

It's worth noting, though, that it's not a recommended setting and it's not widely used, so you may have problems finding support if anything goes wrong.
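
If you did want to experiment with it regardless, a minimal sketch of the UF-side config (untested; it reuses the netlogon sourcetype from this thread, and the transforms.conf stanzas would then also need to be deployed to the UF):

# props.conf on the universal forwarder
[netlogon]
force_local_processing = true
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
TRANSFORMS-netlogon_send_to_nullqueue = netlogon_send_all_to_nullqueue, netlogon_keep_some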

shocko
Contributor

I mean structured in the sense that each line in the log follows a defined structure (space-delimited fields) that lends itself to easy parsing.
