So let me make sure I understand this ... You're ingesting some data, and then running a search that is just:
index=akamai
and then comparing the speed in smart versus fast mode? And you're looking at the total system CPU usage and only seeing a small amount of CPU in use on your search head? I think several fundamental concepts need to be reinforced.
First, a search of the form index=xxx is one of the densest searches you can possibly do. You are asking Splunk to bring back ALL of the events in the index for the time range, without any statistics or reporting commands being run. This is guaranteed to saturate indexer CPU core(s) with decompression. And, because you're asking Splunk to return events in a table view, most of the batch mode optimizations cannot take effect (which I will try to cover briefly below).
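To make the "dense" point concrete, here's a rough sketch (the akamai index is from your post; the status field and value are purely hypothetical):

    index=akamai               every event in the time range has to be decompressed and returned
    index=akamai status=404    sparser - the tsidx lexicon can rule events out before decompression

The first search gives Splunk nothing to filter on, so every event is paid for in full.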
Second, what are you going to do with these 9+ million records? Scan through them by hand, using eyeballs? Even if you could scan a page of 100 events per second, Splunk would still be outrunning you.
Third, you should not expect to see heavy CPU utilization for field extraction at your search head, but rather at your indexer(s).
Fourth, I don't see why you would expect INDEXED_EXTRACTIONS to speed up a super dense search that does not have any search terms built into it.
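For what it's worth, indexed extractions only really pay off when you search the indexed field directly with the field::value syntax, because that match can be resolved in the tsidx lexicon before any raw data is decompressed. A hedged sketch, with a sourcetype and field name that are purely examples:

    # props.conf on the forwarder doing the structured ingestion
    [akamai:json]
    INDEXED_EXTRACTIONS = json

    benefits from indexed fields:    index=akamai status::404 | stats count
    does not benefit:                index=akamai

A bare index=akamai never consults those indexed fields, so you would not expect any speedup there.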
So let's try to build some concepts and work from there.
Performance of a search depends on the type of search you're trying to do (dense versus sparse, "reporting" versus not, and your fields of interest). A super dense search without a reporting command in smart mode is going to perform very differently than the same search would if you simply appended a reporting command like ... | stats count .
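Roughly speaking, smart mode behaves like fast mode when the search contains a transforming/reporting command and like verbose mode when it doesn't, so these two land on very different code paths even though they scan the same data (the second is just your search with a reporting command appended):

    index=akamai                   no reporting command - raw events streamed back in time order, field discovery on
    index=akamai | stats count     reporting command - batch-mode eligible, only an aggregated count comes back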
When you write a search and ask Splunk to execute it, the search head dispatches the job in parallel to all of your indexers. Each indexer allocates a minimum of one single-threaded search process for your search.
In batch mode - where possible, up to your search parallelization limit - additional single-threaded search processes will be started. The first key here is that batch mode is not guaranteed - Splunk only uses it when it knows that, for the type of search you're running, the order the events come back in does not matter. For example, a reporting search of | stats count by field1, field2, field3 has no strict ordering requirement, because the stats command can count things without requiring a strict time ordering. But if you leave off the stats command (or another reporting command), then Splunk realizes you're piping a table of raw events to a user who expects them sorted in time order ... which reduces the effectiveness of batch mode.
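If you want to check whether batch mode actually kicked in, and what the parallelization limit is, this is roughly where to look. Treat the setting name and value as a sketch from memory and verify against the limits.conf spec for your version:

    # limits.conf on the indexers
    [search]
    batch_search_max_pipeline = 2    # allow a second search pipeline per batch-mode search (default is 1)

In the Job Inspector, the search job properties include isBatchModeSearch, which tells you whether a given job ran in batch mode.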
Field extractions happen on the indexer. No amount of field extraction will affect CPU usage on the search head dramatically. The search head sends its configuration bundle to the indexers via the bundle replication process, and the indexers use that bundle as the configuration for the search processes they launch on behalf of that search head.
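In other words, the knowledge objects you define on the search head travel to the indexers, and the CPU cost lands there. A hedged example, where the sourcetype and field name are made up:

    # props.conf in an app on the search head
    [akamai:access]
    EXTRACT-client_ip = ^(?<client_ip>\d{1,3}(?:\.\d{1,3}){3})

This stanza rides along in the replicated bundle, and the regex work happens in the search processes on each indexer at search time, not on the search head.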
All searches begin by converting your specified search into a "literal search" (see litsearch in the Job Inspector), along with a remoteSearch (the search to be run at the indexers) and a reportingSearch (if applicable). The search at the indexers generally begins by picking out "in scope" buckets, looking first at your selected indexes and time range. Any buckets that fall outside the selected indexes or do not overlap the search time range are considered out of scope. In-scope buckets are then checked via the tsidx lexicon against the LISPY expression Splunk generates from everything up to the first | character of the remoteSearch. (I'm ignoring some push-left activity in 6.5 that attempts to push terms as far left as possible.) Events that match the LISPY expression are then seeked to in the raw data, decompressed, and passed along for field extraction.
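You can watch most of this in the Job Inspector: the job properties include litsearch and remoteSearch, and the attached search.log normally shows the LISPY that was handed to the indexers. A rough example of what I would expect for a search that actually has a term in it (the 404 is hypothetical):

    search:      index=akamai 404 | stats count
    litsearch:   something like  litsearch index=akamai 404 | addinfo ... | prestats count
    search.log:  search it for "lispy" - you should see something along the lines of  base lispy: [ AND 404 index::akamai ]

The exact formatting varies by version, but the point is that 404 becomes a lexicon term the tsidx scan can use, while a bare index=akamai gives the lexicon nothing to filter on.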
There's more to consider here, and part of your overall perceived problem may be the amount of time required to do field extraction. But I think you need a more realistic search, one that does what your user is actually trying to do. Don't automatically assume that "simpler searches can be used to performance-model more complex ones".