About jrodman

jrodman · ‎11-04-2016

Search has long lacked a good way to distinguish between a field that is not present and a field that is present but has an empty-string value for the current record. What I've generally had to do is (modelName="*" OR modelName=""). This isn't a full answer, because it doesn't tell you how to handle this in a view. Sorry, I'm not sure about that part.

jrodman · ‎11-02-2016

If you're writing custom search commands, the update is that the Python SDK now offers significant support for doing so, which should enable you to work with the model without a lot of difficulty. The fundamental behavior of the interaction hasn't changed, to my knowledge. Splunk did some work on long-running Python processes a few releases ago, but I don't think we "leveraged" it for search commands.

jrodman · ‎10-31-2016

It sure would be great if someone could communicate what behavior is known to correlate with this message. That is, is it a nuisance? Is this a performance-affecting situation? Do you get incomplete search results?

jrodman · ‎10-29-2016

My viewpoint is that in situations where a workaround can solve a problem with monitoring, Splunk should probably try to auto-solve that problem so workarounds are not needed. Not to set unrealistic expectations: there could be types of problems that would require redesigns of the Splunk file monitoring component, and those could be very expensive and take a long time to become available. However, as a rule, we have continuously added fixes and improvements to handle edge cases like this over the years. The linked answer is about one such specific situation. I do not know, at the moment, whether we have shipped improvements to better handle that case since 2014.

jrodman · ‎10-29-2016

For now, I wrote about this a little more over here: https://answers.splunk.com/answers/49663/log-rotation-best-practices.html#answer-468630 However, the aim of writing these was to build content that I hope to hoover into the main Splunk web documentation soon.

jrodman · ‎10-29-2016

Splunk has had scalability problems with large numbers of source values, and in the future could have them again as accounts may grow larger. For files named by day, however, we've never seen such problems. If you generate hundreds or thousands of files a minute, named by subsecond, for example, you're more likely to run into problems. There is also the manageability concern. How do you zero in on the data you want if source isn't a useful key? For many situations, host and sourcetype are sufficient, but for others you might need to rename, or use wildcards, or extract parts of the path from the source as the main informative element. It's annoying, but you'd really rather solve those problems in Splunk, typically, than struggle with a logging app that crashes because it gets an error when it tries to open its logfile for creation when the file was already deleted but is still open for writing. (Yes, Windows remembers the NAME of files that are open but deleted, and blocks reuse of the NAME.) Though I agree, on at least UNIX, stable name + rotate is often the most convenient option for all parties.
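As a rough illustration of "extract parts of the path from the source", here is a minimal Python sketch. The file layout (<app>-YYYY-MM-DD.log) and the name split_source are assumptions made up for the example, not anything Splunk defines; inside Splunk the equivalent would be an extraction applied to the source field.

```python
import re

# Hypothetical layout: <dir>/<app>-YYYY-MM-DD.log
DATED = re.compile(r"(?P<app>[^/]+)-(?P<date>\d{4}-\d{2}-\d{2})\.log$")


def split_source(source):
    """Split a date-named log path into (stable app name, date string).

    Returns None when the path does not follow the assumed layout.
    """
    match = DATED.search(source)
    if match is None:
        return None
    return match.group("app"), match.group("date")


print(split_source("/var/log/frog-decorator-2016-05-03.log"))
# ('frog-decorator', '2016-05-03')
```

The stable part gives you one key that covers every daily file, which is what the fixed name would have given you under rotation.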

jrodman · ‎10-29-2016

On this specific topic, there's no real reason to use copy-truncate for apache. It can be told to reopen its logs to avoid the races. Ironically, the recommended strategy is listed immediately above the topic linked to: http://httpd.apache.org/docs/2.4/logs.html#rotation They are using the rename + recreate pattern.

jrodman · ‎10-29-2016

FWIW, I feel the answer by chris is the far more on-target one for the specific situation. This just felt like a good place to talk about these issues.

jrodman · ‎10-29-2016

The copy-truncate pattern has some quality issues because there is no way to ensure all data is retained. There is an inherent race condition between the logging application and the program performing the copy & truncate. Data can be written to the file after the copy and before the truncate. This data will be lost.
Additionally, copy-truncate requires two extra I/Os for every log-write. Every log-write will need to be later read back, and written out again by the copy operation. Therefore, this pattern will exhaust I/O resources more readily.
With Splunk specifically, copy-truncate requires handling a large number of additional edge-cases, such as encountering the copy in the process of being built (you would want us to recognize this as an already-handled file), and reading from an open logfile during truncation. The latter problem is potentially not solvable in complex situations. For example, Splunk could be in a situation where it reads the first half of an event (more likely for large events), and then the file is truncated (reduced to zero length) before we can read the second half of an event. Should we send it on as is, potentially delivering a broken half-event to the index? Should we drop it, potentially losing the only half of the data we will ever gain access to?
In general, of course, Splunk should be well-behaved to the extent possible in the face of copy-truncate. Also, for applications which log to stderr or other applications which have no support for ever reopening their logfile, there may be no other option for file management than copy-truncate.
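To make the race concrete, here is a small Python sketch that forces a write into the window between the copy and the truncate. It assumes a POSIX-style filesystem; the between hook and the file names are hypothetical devices for the demonstration, not how any real rotation tool is structured.

```python
import os
import shutil
import tempfile


def copy_truncate(log_path, backup_path, between=None):
    """Rotate by copying the live log, then truncating it in place.

    `between` is a hypothetical hook that runs after the copy and before
    the truncate, standing in for the application logging at the wrong moment.
    """
    shutil.copyfile(log_path, backup_path)
    if between is not None:
        between()
    # Truncate to zero length; anything written since the copy is discarded.
    with open(log_path, "r+") as f:
        f.truncate(0)


def demo():
    workdir = tempfile.mkdtemp()
    log_path = os.path.join(workdir, "frog-decorator.log")
    backup_path = log_path + ".1"

    # The application holds the log open in append mode for its whole life.
    app_log = open(log_path, "a")
    app_log.write("event 1\n")
    app_log.write("event 2\n")
    app_log.flush()

    def late_write():
        app_log.write("event 3\n")
        app_log.flush()

    copy_truncate(log_path, backup_path, between=late_write)

    app_log.write("event 4\n")
    app_log.flush()
    app_log.close()

    with open(backup_path) as f:
        backup = f.read().splitlines()
    with open(log_path) as f:
        live = f.read().splitlines()
    shutil.rmtree(workdir)
    return backup, live


backup, live = demo()
print(backup)  # ['event 1', 'event 2']
print(live)    # ['event 4']
```

"event 3" is in neither file. In real life the window is narrow rather than forced, but nothing the rotation program can do will close it.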

jrodman · ‎10-29-2016

Log rotation best practice is to use rename-and-recreate, or time-based logfile names (which in truth is not rotation at all, but still a good practice in some cases).
Rename-and-recreate is the most common pattern, and is used often by both external rotation programs and by programs that self-rotate. In this pattern, the logging application typically writes to a file with a fixed name. For example, if the application is called Frog Decorator, the output might go to /var/log/frog-decorator.log or /opt/frog-decorator/logs/frog-decorator.log, etc. If the application is self-rotating, it will at some point (daily, perhaps, or based on a size policy) rename frog-decorator.log to another name, e.g. frog-decorator.log.1, and then it will open a new frog-decorator.log file and write there. It is possible for self-rotating programs to cause all writes to always land in the stable name, but in multithreaded applications it is common that some writes will land in the renamed file during the rotation, because continued operation of those threads is valued over neat and clean write patterns to log files.
External rename & recreate is essentially the same, but requires a method of co-ordination between the rotation program and the logging application. After the rotation program has renamed the original log file, it must tell the original application that this has occurred, so the logging application knows it is time to re-open the 'frog-decorator.log', causing the application to create the new log, and land its log writes in that location. The typical communication path on unix is to send a signal to the application, often via the Hangup signal (SIGHUP), but to some degree it varies. Some programs provide a tool to tell them to reopen their logfile. For the external rename and recreate it is unavoidable that some log writes will land in the renamed logfile.1, but this is not particularly harmful.
However, it is essential that the application be notified that it needs to recreate the original file BEFORE you compress logfile.1. Remember that "compressing a log file" is actually multiple steps:
1 - Read the contents of the original file
2 - Compress those contents in memory
3 - Write out the compressed contents to a new file
4 - Delete the original file
If you "compress" logfile.1 too early, you run the risk of hitting step 4, deleting the original, while the application is still writing to the file. This will cause data loss. In general, compressing the first logfile can introduce races with the original app, as well as some races with Splunk. I strongly suggest waiting until the second backup, logfile.2, to begin compression.
The other high-quality pattern, with its own tradeoffs for management, is to simply name the logfiles uniquely on original creation. For example, if the application "rotates" daily, it could create filenames by the date, such as frog-decorator-2016-05-03.log. When the day changes, it can simply create a new logfile and redirect its output to the new file. Time-based filenames can be valuable especially on Microsoft Windows platforms, where renaming open files is a very complicated and difficult business. It is also well-behaved, like rename-and-recreate, in that the data persistence story is very simple.
The copy-truncate pattern is lower quality, because there is no way to ensure all data is retained. There is an inherent race condition between the logging application and the program performing the copy & truncate. Data can be written to the file after the copy and before the truncate. This data will be lost. Additionally, copy-truncate requires two extra I/Os for every log-write. Every log-write will need to be later read back, and written out again by the copy operation. Therefore, this pattern will exhaust I/O resources more readily.
With Splunk specifically, copy-truncate requires handling a large number of additional edge-cases, such as encountering the copy in the process of being built (you would want us to recognize this as an already-handled file), and reading from an open logfile during truncation. The latter problem is potentially not solvable in complex situations. For example, Splunk could be in a situation where it reads the first half of an event (more likely for large events), and then the file is truncated (reduced to zero length) before we can read the second half of an event. Should we send it on as is, potentially delivering a broken half-event to the index? Should we drop it, potentially losing the only half of the data we will ever gain access to? In general, of course, Splunk should be well-behaved to the extent possible in the face of copy-truncate. The above specific situation was a case where we needed to improve and did so. Also, for applications which log to stderr or other applications which have no support for ever reopening their logfile, there may be no other option for file management than copy-truncate.
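For anyone who wants to see the rename-and-recreate ordering spelled out, here is a minimal Python sketch, assuming a POSIX filesystem where an open file can be renamed. ReopenableLog, rotate, and shift_archives are hypothetical names for this example; a real daemon would usually trigger the reopen from a SIGHUP handler, which is replaced here by a plain callback so the sequence is easy to follow.

```python
import gzip
import os
import shutil
import tempfile


class ReopenableLog:
    """Application side: appends to a stable name and can reopen it."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a")

    def write(self, line):
        self._fh.write(line + "\n")
        self._fh.flush()

    def reopen(self):
        # A daemon would typically call this from its SIGHUP handler.
        self._fh.close()
        self._fh = open(self.path, "a")

    def close(self):
        self._fh.close()


def shift_archives(path):
    """Rename path.N.gz to path.(N+1).gz, oldest first, freeing path.2.gz."""
    n = 2
    while os.path.exists(f"{path}.{n}.gz"):
        n += 1
    for i in range(n - 1, 1, -1):
        os.rename(f"{path}.{i}.gz", f"{path}.{i + 1}.gz")


def rotate(path, notify):
    """Rotator side: rename, notify, then compress only the second backup."""
    first, second = path + ".1", path + ".2"
    if os.path.exists(first):
        os.rename(first, second)
    os.rename(path, first)
    notify()  # the application recreates `path` before anything is compressed
    if os.path.exists(second):
        shift_archives(path)
        # Read, compress, and write the new file, then delete the original.
        with open(second, "rb") as src, gzip.open(second + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(second)


def demo():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "frog-decorator.log")
    log = ReopenableLog(path)

    def notify():
        # A write that races the rotation lands in the renamed file, which
        # is harmless because that file is not compressed or deleted yet.
        log.write("late write")
        log.reopen()

    for day in (1, 2, 3):
        log.write(f"day {day}")
        rotate(path, notify)
    log.write("day 4")
    log.close()

    result = {}
    for name in sorted(os.listdir(workdir)):
        opener = gzip.open if name.endswith(".gz") else open
        with opener(os.path.join(workdir, name), "rt") as f:
            result[name] = f.read().splitlines()
    shutil.rmtree(workdir)
    return result
```

Calling demo() leaves "day 4" in frog-decorator.log, "day 3" plus its late write in frog-decorator.log.1, and the two older days in .2.gz and .3.gz. Every late write survives because the file it landed in is left alone for a full rotation cycle before it is compressed.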

jrodman · ‎10-29-2016

I've been told that the copy-truncate pattern is a poor choice for log rotation, and that it should only be used when there is no other choice. Why is this?

jrodman · ‎10-27-2016

It's a possibility, assuming we're talking about "indexed fields" here, because the INDEXED_EXTRACTIONS function doesn't have the necessary flexibility to handle this dataset. It moves the "problem" to parsing time, which could be better or worse depending upon many factors. If this data is a large portion of the incoming datastream, the rate of indexing could slow. It's harder to troubleshoot index-time transforms, and even harder to correct errors because the data is already produced. It will make the events in the journal significantly larger, which will slow search somewhat in its own way -- larger events mean more I/O and more decompression for the same events. The larger-events problem can sometimes be offset by retrieving fewer events. If there's a significant collision in the values present in these fields (i.e., many fields can have values like 0 and 1), then making them indexed will allow Splunk to retrieve a much smaller event set, so the performance could be significantly better.
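A toy model of the value-collision point, in Python. This is only a sketch of the idea, not Splunk's index format: the events, the field names, and the name::value token form used by build_index are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical events in which several fields share the values "0" and "1".
EVENTS = [
    {"retries": "0", "errors": "1", "cached": "0"},
    {"retries": "1", "errors": "0", "cached": "0"},
    {"retries": "0", "errors": "0", "cached": "1"},
    {"retries": "1", "errors": "1", "cached": "0"},
    {"retries": "0", "errors": "0", "cached": "0"},
]


def build_index(events):
    """Toy inverted index mapping each token to the ids of events holding it."""
    index = defaultdict(set)
    for event_id, event in enumerate(events):
        for name, value in event.items():
            index[value].add(event_id)               # bare value token
            index[f"{name}::{value}"].add(event_id)  # name::value token
    return index


def candidates(index, name, value, indexed):
    """Event ids that must be read and checked to answer name=value."""
    token = f"{name}::{value}" if indexed else value
    return sorted(index.get(token, set()))


index = build_index(EVENTS)
print(candidates(index, "errors", "1", indexed=False))  # [0, 1, 2, 3]
print(candidates(index, "errors", "1", indexed=True))   # [0, 3]
```

Looking up the bare value "1" pulls back nearly every event, and each one then has to be read and checked; the combined token narrows the set before any event is read. When values are distinctive on their own, the two lookups return about the same set and the indexed field buys little.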

jrodman · ‎10-27-2016

Got it; many types of events with different extractions, all of which are syntactically similar. Splunk's support for handling a lot of implicit information like this is less than stellar. If I had my way, we'd have shipped some programmatic extensions to the parsing pipeline that you could use to cause indexed extractions to fully handle this dataset. I'm not getting my way though. Splunk does a good job of telling you if any particular extraction is very slow, but it won't do a good job of telling you exactly what the costs are for a large set of extractions, none of which are that slow. Despite this, you can get an idea with a large dataset and the job inspector. If you're willing to pay some cost per event for this data, that's maybe fine. It gets a little messy if you're triggering those costs when you don't want to, for example when the data cohabitates in an index with other commonly used data, and users aren't categorically excluding these events from their searches. The cost in 'fast mode', where none of the fields for this data are wanted, should not be significant, but for verbose mode it could become unfortunate. Again, possibly all obvious. Usually for this type of complex scenario, I challenge whether it might be worth changing or preprocessing to a self-describing format, but you've probably already rejected that, based on what you've said so far.

jrodman · ‎10-26-2016

I can't find any limit in the codebase that would prevent this from working. It doesn't seem very manageable, and sounds like it could have a pretty significant performance impact. Usually, a need for this many extractions arises when an event stream has a large number of similar fields. If so, have you considered the repeat-match functions, where a single regex can sometimes extract many fields? It's also a little awkward to have all of these in one REPORT line. You can do it, but are they all for one single purpose? The performance of multiple REPORT lines will be identical to one giant line, and it's hard to imagine a 280-step dependency graph of extractions.
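To show what "a single regex can extract many fields" means in general, here is a short Python sketch of the repeated-match idea. It is not Splunk configuration; the key=value event format, the pattern, and the name extract_fields are assumptions for the example.

```python
import re

# Hypothetical pattern: key=value pairs with bare or double-quoted values.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def extract_fields(event):
    """Apply one pattern repeatedly and collect every key/value pair found."""
    fields = {}
    for key, value in PAIR.findall(event):
        fields[key] = value.strip('"')
    return fields


event = 'ts=2016-10-26T12:00:00 level=WARN user="frog decorator" bytes=512'
print(extract_fields(event))
# {'ts': '2016-10-26T12:00:00', 'level': 'WARN', 'user': 'frog decorator', 'bytes': '512'}
```

One pattern replaces four separate extractions here, and it keeps working when the stream grows a fifth field, provided the fields really are syntactically similar.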

jrodman · ‎10-25-2016

The search-optimizer runs at the initial comprehension of the search string, so it should only be relevant to, or used on, a search head. As always, it may be useful to bring search config to indexers just for the case of troubleshooting by launching searches directly on an indexer, but that's very much an edge-case scenario.

jrodman · ‎10-22-2016

This is a difficult one to answer 😜 I should mention that if for some reason you want to disable it on a search-by-search basis, you can do so by adding a command to the search: |noop search_optimization=false

jrodman · ‎10-13-2016

Splunk the software may currently permit this configuration, but it should not be used. Among other possibilities, scheduled search time-to-run expressions are not timezone-independent.

jrodman · ‎10-13-2016

Okay, following up to myself: One reason that a cluster should have agreeing timezones is that search scheduling is distributed across the cluster. However, the time to execute searches is expressed in the local timezone of the search head. This means you can have problems like a user scheduling a search at one time, but the scheduler evaluating it in a different timezone. Also, when the captaincy of the cluster moves, it could change timezones, which I cannot even imagine a strategy to handle.
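The arithmetic behind that problem can be sketched in a few lines of Python. This is not Splunk's scheduler; next_run_utc and the two member offsets are hypothetical, and fixed UTC offsets stand in for real timezones to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone


def next_run_utc(hour, minute, member_tz, now_utc):
    """Next UTC instant at which a daily hour:minute schedule fires when the
    schedule is interpreted in the evaluating member's local timezone."""
    local_now = now_utc.astimezone(member_tz)
    candidate = local_now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= local_now:
        candidate += timedelta(days=1)
    return candidate.astimezone(timezone.utc)


now = datetime(2016, 10, 13, 0, 0, tzinfo=timezone.utc)
member_a = timezone(timedelta(hours=-7))  # hypothetical member at UTC-7
member_b = timezone(timedelta(hours=1))   # hypothetical member at UTC+1

run_a = next_run_utc(2, 0, member_a, now)
run_b = next_run_utc(2, 0, member_b, now)
print(run_a.isoformat())  # 2016-10-13T09:00:00+00:00
print(run_b.isoformat())  # 2016-10-13T01:00:00+00:00
print(run_a - run_b)      # 8:00:00
```

The same "run at 02:00" schedule fires eight hours apart depending on which member evaluates it, which is the mismatch a user would see if the evaluating member changed.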

jrodman · ‎10-13-2016

This is a bit odd. I'm not aware of any requirement that search heads have timezones that agree with indexers. I'd prefer it for administration sanity however.

jrodman · ‎10-11-2016

For what it's worth, a lot of people I trust are now quite bullish on ext4 with lvm2 snapshots as a completely legit solution. I don't have direct experience.

jrodman · ‎10-03-2016

This should really be an independent question, though probably this answer should link to that information. When interacting at the API level, the client has an explicit choice of the first command, and can select rtsearch instead of search. However, you'll typically have to select different values for et / lt, such as the above-discussed rt-5m.

jrodman · ‎07-26-2016

It's a common misconception that indexed fields have notably different performance characteristics from text tokens. They don't. We look them up the same way. Indexed fields only behave notably differently when the field name and value together are drastically less common than the value alone. However, the fields source, sourcetype, and host in Splunk are afforded a fairly special place and afford much more powerful abilities to apply implicit processing by data category, among other things. sourcetype is best thought of as "a type of data", such as the kind of data produced by a particular application, or for complex applications one type of datastream it produces. It is something that you can create a rich configuration around, to automatically extract further data based on its format and structure.

jrodman · ‎07-26-2016

Post cleanup, you might want to try 'splunk validate files' to be sure the files on disk now match the files in the provided manifest.

jrodman · ‎06-23-2016

The limitation was that pulling the set of all sources, for example, from all search peers, and then merging all that data on the search head led to Bad Situations. The total amount of information that's being processed here is certainly a data quantity that could be handled in Splunk, but |metadata was built to be a quick preview tool in the UI, and (used to) operate largely in-memory, so these large datasets were operating in an in-memory fashion inside main splunkd, which would cause it to explode and fall over. At some point between then and now, I believe metadata was banished to a proper search process, which causes some of the concerns to go away. However, there was also a sort of unreasonable overhead in maintaining complete index-level information on all sources/sourcetypes/etc., so at some point (~5.0 or so?) we stopped maintaining complete index-level records of all sources, for example, which meant we couldn't efficiently answer the query for "what are all the sources". At that time, the intent was to build an intentionally incomplete index-level dataset with low maintenance overhead, and answer queries from that. I'm not sure of the current status.

jrodman · ‎06-21-2016

Is the claim here that for usability reasons you should not set up members with different timezones? It seems like you could definitely have a case where particular users are directed to particular sets of search heads, and you could also configure all users with per-user timezones. So I don't understand why this is needed.

Posts	949
Solutions	172
Karma Given	397
Karma Received	987
Member Since	‎01-15-2010