Field Aliasing and any performance impact


Good morning! I am parsing Bro log files, and with the help of the forum I have been more than successful so far. Here is another question, though. There are many Bro source types, but they all share the same first 7 fields. I am simply naming the rest f8 - f21. These are the fields that vary, so my idea was to create field aliases for them in each specific source type (bro_conn, bro_dns, etc.). Here is my props.conf:

[(?::){0}bro_*]
REPORT-format = BaseBroFields, TrashComments

[bro_dns]
FIELDALIAS-f8 = f8 AS dns_field8
FIELDALIAS-f9 = f9 AS dns_field9

[bro_conn]
FIELDALIAS-f8 = f8 AS conn_field8
FIELDALIAS-f9 = f9 AS conn_field9

And my transforms.conf:

[TrashComments]
REGEX = ^\s*#
DEST_KEY = queue
FORMAT = nullQueue

[BaseBroFields]
DELIMS = "\t"
FIELDS = "","conn_id","src_ip","src_port","dest_ip","dest_port","protocol","f8","f9","f10","f11","f12","f13","f14","f15","f16","f17","f18","f19","f20","f21","f22","f23","f24"

So you see, in props.conf I am aliasing the remaining fields that are specific to each source type. (The field names are just for testing; I will eventually make them CIM compliant.) This works just fine. What I am wondering is what performance implications, if any, there will be at scale. We are in a distributed environment, and I would put this config only on the search heads so that nothing is indexed this way.

Any thoughts, ideas, comments are more than welcome.

PS - For those reading this and wondering why I am not using the Bro App for IDS: for one, it only supports versions <=2.4 and we are running 2.5. Beyond that, we aggregate the Bro sensor logs, which strips the header the app needs in order to name the Bro fields. If anyone has any thoughts on this, I am more than happy to hear them.




Some generic pointers:

  • [(?::){0}bro_*] is technically not supported
  • I'd never set TRUNCATE = 0; it leaves the door open for broken things on the input side to also break Splunk. Pick a large enough number, then add a zero or two... that'll at least allow Splunk to chop up utterly broken events instead of trying to cram infinite chars into an event
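
As a rough illustration of the TRUNCATE point, a props.conf stanza could look something like this. The stanza name bro_conn and the value 100000 are assumptions picked for the example, not recommendations for this data; size the limit from the longest legitimate event you expect, plus headroom.

```
# props.conf (sketch; stanza name and limit are assumed values)
[bro_conn]
# Finite limit in bytes, so a malformed event gets cut off
# instead of growing without bound as it would with TRUNCATE = 0
TRUNCATE = 100000
```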

On the topic of field aliases: when you search this data in isolation from other data, the approach will work fine. If you're in a CIM-heavy environment (you are), each TA affects every other TA. Whether that's a problem or not depends on your entire environment and the overall use of CIM fields, mostly in eventtypes.
If the only motivation for this approach is laziness / "I don't have to list out the default fields a dozen times" then I'd skip the aliases and list out a dozen sets of per-sourcetype fields. The one-time investment of writing them out a dozen times will be worth it.
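
A minimal sketch of what "listing them out" could look like, assuming one delimiter-based transform per sourcetype. The transform names (bro_dns_fields, bro_conn_fields) are hypothetical, and the names after "protocol" are just the test names from the question; replace them with the real per-log field names.

```
# transforms.conf (sketch; transform names and trailing field names are placeholders)
[bro_dns_fields]
DELIMS = "\t"
FIELDS = "","conn_id","src_ip","src_port","dest_ip","dest_port","protocol","dns_field8","dns_field9"

[bro_conn_fields]
DELIMS = "\t"
FIELDS = "","conn_id","src_ip","src_port","dest_ip","dest_port","protocol","conn_field8","conn_field9"

# props.conf (explicit sourcetype stanzas, no wildcard and no FIELDALIAS)
[bro_dns]
REPORT-format = bro_dns_fields

[bro_conn]
REPORT-format = bro_conn_fields
```

The shared fields are repeated in every transform, but each sourcetype gets its final field names directly at extraction time, so there is no f8/f9 intermediate layer left to alias.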



Not a lot of bro, no... here's a pointer re wildcard sourcetypes: http://blogs.splunk.com/2014/07/31/quick-tip-wildcard-sourcetypes-in-props-conf/
TL;DR: It mostly works, shouldn't be relied on, and large customers should feature request a supported documented sourcetype wildcarding mechanism hint 😄



Thank you for this guidance. I will swallow my pride and list them over and over :). I am also getting rid of [(?::){0}bro_*] and will reference each source type explicitly. Your info has also led me to realize that the Bro app is not supported, since I took [(?::){0}bro_*] from that app.

Have you ever had to parse bro logs?
