About vliggio

vliggio · 11-08-2022

Ok, I know this is very late, but it's been a long time since I was involved with this tool. My team (and genius programmer Kal) built it, and we would love for someone to take ownership of it. Reach out to Kal on GitHub and maybe he can spare some cycles: https://github.com/kalpatel01/appetite is his fork of our original open-source project.

vliggio · 05-15-2020

The key to Splunk performance is reducing a search to the minimum number of results as early as possible, i.e., at the indexer. If you have an index with all Windows events and are searching for just authentication events, you'll be better off doing as you said: adding a sourcetype that separates those auth events from all the other Windows events. Splitting the events into a different index won't make much of a difference in performance unless it's very sparse data that you search regularly, and having a large number of indexes won't help your sanity. What matters is that the data you want to search carries specific, searchable information that eliminates all other unneeded data as early as possible in the search process. 2 TB/day is not a lot of data for Splunk if it's tuned properly, and your search optimization will matter more than anything else. Read up on Splunk search optimization here: https://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutoptimization. Make sure you're well tuned for large data ingestion (auto_high_volume buckets, etc.), and that your searches are designed to limit the data being looked at: tight time ranges, sourcetypes, explicit indexes, etc. Don't do index=* sourcetype=win*. Run the data quality reports in the Monitoring Console and make sure your events are clean and that the time spans of your data are current. Add tags to data on ingest if necessary. If you run a search that returns huge volumes of data to the search head, you can quickly make a mess of your environment. For example, say you had one billion events with sourcetype "data" that were not well delimited, so Splunk could not extract fields from them (say, a syslog-formatted message with a JSON object inside), and you wanted to search for a key with value "pizza" that occurs only once in those billion events. You'd be hosed.
With JSON data inside a syslog message field, the individual keys in the JSON will not be extracted by default at index time (you will have a timestamp and a message field, and no fields from within the JSON object). You would have to run spath to extract the fields (something like "index=blah sourcetype=data | spath input=message | search fieldname=pizza"), and Splunk can't use the index to narrow that down: it has to retrieve all billion events in the initial search and run spath over every one of them before the filter can be applied. If your data is fixed so the fields are properly extracted at index time, your search would be "index=blah sourcetype=data fieldname=pizza", and Splunk would find the one event with that key on the indexer super quickly.
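To make that cost concrete, here is a minimal Python sketch (not Splunk code) of the work the "spath input=message | search fieldname=pizza" approach implies: every raw event has to be parsed as JSON before it can be kept or thrown away. The sample events, the extract_message helper and the assumption that the JSON payload starts at the first "{" in the line are all hypothetical.

```python
import json

# Hypothetical raw events: a syslog header followed by a JSON object.
RAW_EVENTS = [
    '<134>May 15 10:00:01 web01 app: {"user": "alice", "fieldname": "salad"}',
    '<134>May 15 10:00:02 web02 app: {"user": "bob", "fieldname": "pizza"}',
    '<134>May 15 10:00:03 web03 app: {"user": "carol", "fieldname": "soup"}',
]

def extract_message(raw):
    """Return the JSON payload, assumed to start at the first '{' in the line."""
    return raw[raw.index("{"):]

def spath_then_search(raw_events, key, value):
    """Parse every event's JSON, then filter: nothing can be discarded
    until it has been parsed."""
    parsed = 0
    matches = []
    for raw in raw_events:
        fields = json.loads(extract_message(raw))  # per-event JSON parse
        parsed += 1
        if fields.get(key) == value:
            matches.append(raw)
    return matches, parsed

matches, parsed = spath_then_search(RAW_EVENTS, "fieldname", "pizza")
print(len(matches), parsed)  # 1 match, but all 3 events were parsed
```

Scale the three events up to a billion and the difference from an index-time field, where the filter is answered from the index instead of by parsing, becomes obvious.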

vliggio · 05-04-2020

The only thing you need to monitor is splunkd. It is the parent process for everything the Splunk UF does. There are of course subprocesses that might run, such as Python, but without splunkd the UF won't be working.
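If you want to script that check yourself, a minimal Linux-only Python sketch is below. It assumes the /proc layout (one numeric directory per PID, each with a "comm" file holding the process name); in practice you would more likely point your existing monitoring tool's process check at splunkd.

```python
from pathlib import Path

def is_process_running(name, proc_root="/proc"):
    """Return True if any process under proc_root has the given name.

    Linux-only: every numeric directory in /proc is a PID, and its 'comm'
    file holds the process name (truncated to 15 characters by the kernel).
    """
    root = Path(proc_root)
    if not root.is_dir():
        return False
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited, or we can't read it
        if comm == name:
            return True
    return False

if __name__ == "__main__":
    print("splunkd running:", is_process_running("splunkd"))
```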

vliggio · 04-16-2020

Ah, it wasn't clear what you meant. In this case, you are correct for the app in question: a metrics index is being used, and the metrics are sent by collectd to an HTTP Event Collector (so the flow is Metrics -> collectd -> HEC). But if someone is still using the Splunk Add-on for Unix and Linux, then the UF collects metrics via scripted inputs (for non-Linux systems and older versions of Splunk that don't have metrics index capabilities). For the OP, note that collectd would have to be installed to collect metrics on any older versions of Linux where it doesn't come installed natively.

vliggio · 04-16-2020

What do you mean by "Splunk UF only forwards logs for Linux machines"? The UF on Windows certainly collects logs.

vliggio · 04-16-2020

Yes. Splunk combines all the files in all app directories (following the precedence rules I linked you to). That's why btool is so important: you can put multiple inputs.conf files in multiple places with conflicting settings, and Splunk has specific rules to determine which one it uses. You can use btool to look at any Splunk configuration; just substitute the config file name (i.e., outputs, inputs, indexes, etc.). As for this App/Add-on combo (I haven't installed this specific release), I agree with gcusello: look at the documentation. It isn't set up like most Splunk apps, which are configured through inputs.conf. Read the following page on how to enable data inputs: https://docs.splunk.com/Documentation/InfraApp/2.0.3/Admin/AddData. Also, one minor correction: the Add-on should also be installed on the indexers (in conjunction with the App); both are needed for the App to function correctly.

vliggio · 04-16-2020

The App for Infrastructure goes on the indexers, and the Add-on for Infrastructure goes on both the indexers and the UFs. The most important command for debugging Splunk is btool. Learn it early and it will be your friend. Since Splunk combines many different config files, btool lets you see what Splunk is actually using as its final config. Try this:

/opt/splunkforwarder/bin/splunk btool inputs list --debug

That will show your actual inputs configuration on a universal forwarder on a Linux box (substitute the install location as necessary on indexers, or if you're using Windows). Unfortunately, the configs you posted here don't tell us much, because Splunk might be getting configs from other directories that override your settings. Play with the command a bit and you'll see (also read up on Splunk config file precedence here: https://docs.splunk.com/Documentation/Splunk/8.0.3/Admin/Wheretofindtheconfigurationfiles). You do not want to put your inputs.conf into your search app directory; it'll get very confusing very fast. You should have the add-on directory in your splunkforwarder/etc/apps directory, and inside it you'll see a default directory with an inputs.conf file. Create a local directory alongside the default directory, copy the inputs.conf from default into local, and edit it there.
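To illustrate why btool is needed, here is a simplified Python sketch of layered config merging: later (higher-precedence) files override earlier ones setting by setting, and each winning value remembers which file it came from, much like the file paths that "btool ... --debug" prints. It treats .conf files as INI, takes the precedence order as given rather than implementing Splunk's real precedence rules, and uses a hypothetical add-on name (my_addon).

```python
import configparser

def load_conf(text):
    """Parse one .conf file's text, treating it as INI (a simplification:
    settings outside any [stanza] are not supported here)."""
    parser = configparser.ConfigParser(interpolation=None, strict=False,
                                       delimiters=("=",))
    parser.optionxform = str  # keep setting names case-sensitive
    parser.read_string(text)
    return parser

def merge_layers(layers):
    """Merge (path, text) pairs listed from lowest to highest precedence.

    Returns {stanza: {setting: (value, path_it_came_from)}}.
    """
    merged = {}
    for path, text in layers:
        conf = load_conf(text)
        for stanza in conf.sections():
            for key, value in conf.items(stanza, raw=True):
                merged.setdefault(stanza, {})[key] = (value, path)
    return merged

# Hypothetical add-on with one setting overridden in local/.
default_text = "[monitor:///var/log/messages]\ndisabled = 1\nindex = main\n"
local_text = "[monitor:///var/log/messages]\ndisabled = 0\n"
merged = merge_layers([
    ("etc/apps/my_addon/default/inputs.conf", default_text),
    ("etc/apps/my_addon/local/inputs.conf", local_text),
])
for key, (value, path) in merged["monitor:///var/log/messages"].items():
    print(f"{path:45} {key} = {value}")
```

Looking only at the local file would suggest the input has no index setting; the merged view shows it still inherits index = main from default.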

vliggio · 04-10-2020

Retention is separate from TSIDX reduction. A time-based retention policy applies to the bucket itself (per index or via a global setting) and is set by frozenTimePeriodInSecs in your indexes.conf. To set up TSIDX reduction, you enable it with the following two settings:

enableTsidxReduction = true
timePeriodInSecBeforeTsidxReduction = <number of seconds>

As long as timePeriodInSecBeforeTsidxReduction is less than frozenTimePeriodInSecs, the reduction will delete the full TSIDX files once a bucket is older than timePeriodInSecBeforeTsidxReduction, keeping the raw data and the mini TSIDX files. When the buckets age past frozenTimePeriodInSecs, the data itself will be deleted. The data remains searchable until then; it will just be slower to search.
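A small Python sketch of that timeline, assuming a bucket's age is measured from its newest event and using made-up values of 30 days for the reduction period and 365 days for freezing:

```python
DAY = 86_400  # seconds

def bucket_state(age_secs, reduction_secs, frozen_secs):
    """Classify a bucket by age under the two indexes.conf timers.

    reduction_secs plays the role of timePeriodInSecBeforeTsidxReduction,
    frozen_secs the role of frozenTimePeriodInSecs.
    """
    if reduction_secs >= frozen_secs:
        raise ValueError("reduction period should be shorter than the frozen period")
    if age_secs >= frozen_secs:
        return "frozen"   # rolled out of the index (deleted by default)
    if age_secs >= reduction_secs:
        return "reduced"  # raw data + mini tsidx: searchable, but slower
    return "full"         # raw data + full tsidx files

for age_days in (10, 45, 400):
    print(age_days, bucket_state(age_days * DAY, 30 * DAY, 365 * DAY))
```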

vliggio · 04-09-2020

Take a look here for more info about TSIDX reduction: https://docs.splunk.com/Documentation/Splunk/8.0.3/Indexer/Reducetsidxdiskusage. The problem is reduced search performance (possibly significantly reduced) if you force removal of the TSIDX files. You didn't mention how much storage or ingestion you're talking about, or anything about the type of searches you run. I wouldn't recommend doing it long term, because reduced search performance also means reduced ingestion performance (the indexer will spend more disk cycles on searching instead of indexing). If the buckets aren't used anymore (i.e., you know buckets older than a month are not searched), it wouldn't be too big an issue, but if a lot of searches run across all time, your performance will suffer. If you need more space for indexing, you really should consider expanding your storage if possible (or adding another indexer if you're in a cluster). Storage performance can also drop significantly (depending on your file system and OS) as you approach the upper limits of your file system's size.

vliggio · 03-27-2020

Clients should definitely use the universal forwarder (UF). There are a few cases where you need a heavy forwarder, but in general you would use the UF on the normal clients you're collecting logs and metrics from. A single properly configured indexer can easily handle 120 GB/day, depending on the storage type and your search volume. You might need more horsepower if you are running premium apps such as Enterprise Security, or if you have a large number of end users searching (especially users who aren't well versed in Splunk searching and will write bad searches that consume a lot of resources). Also, depending on how important your data is to you, you might want several indexers so you can replicate it.
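As a back-of-the-envelope Python sketch (using the 120 GB/day-per-indexer figure above as an assumed default), an indexer count is the larger of what the daily volume needs and the replication factor, since an indexer cluster needs at least that many peers to hold all copies. It deliberately ignores search load, premium apps and replication overhead, all of which push the number up.

```python
import math

def estimate_indexers(daily_gb, gb_per_indexer=120.0, replication_factor=1):
    """Rough indexer count for a given daily ingest volume."""
    by_volume = math.ceil(daily_gb / gb_per_indexer)
    # An indexer cluster needs at least replication_factor peers.
    return max(1, by_volume, replication_factor)

print(estimate_indexers(100))                        # 1
print(estimate_indexers(300))                        # 3
print(estimate_indexers(100, replication_factor=3))  # 3
```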

vliggio · 02-17-2020

One thing you can do is list the second indexer in your outputs twice and the first indexer once (if you are using round-robin DNS, put it in your DNS entry twice; otherwise, just list it twice in outputs.conf). I.e., in your outputs.conf, put:

server = server1:9997, server2:9997, server2:9997

(substituting your indexers' receiving port if it isn't 9997).
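The effect of listing a server twice can be checked with a few lines of Python. This sketch assumes, as the tip does, that the forwarder chooses uniformly among the listed entries and treats the duplicate as a separate entry; the host names and port are placeholders.

```python
from collections import Counter
from fractions import Fraction

def expected_share(servers):
    """Share of load each host gets if every list entry is equally likely."""
    counts = Counter(servers)
    return {host: Fraction(n, len(servers)) for host, n in counts.items()}

shares = expected_share(["server1:9997", "server2:9997", "server2:9997"])
print(shares)  # server1 gets 1/3, server2 gets 2/3
```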

vliggio · 02-07-2020

True, though if someone is smart enough to know about HISTFILESIZE=0, they are probably smart enough to change their prompt and shell options :). It's always a challenge to stay ahead of the nefarious! It all goes back to the original poster's question: is this for general command tracking, or for actual security concerns? Relying on bash history for security purposes is risky. That's why running screen or sudo's I/O logging is preferable in security situations, as the end user cannot override them.

vliggio · 02-07-2020

Capturing bash history isn't quite the right answer. Any user can set their HISTFILESIZE environment variable to 0, and then their history will not be saved. If you want to guarantee from a security standpoint that you're capturing all their input, you need to force user sessions to run within something like the "screen" application so their session is captured. If you only want to see who runs privileged commands via sudo, you can use sudo's I/O logging to capture all commands issued via sudo (either individually, or even if someone uses sudo to get a shell). Note: if you are capturing the history file (a good thing to do irrespective of my comment above), set the following environment variable in your system defaults, which will put timestamps in your history file (otherwise Splunk won't have accurate times for when the commands were executed):

export HISTTIMEFORMAT="%h/%d -- %H:%M:%S "
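Once HISTTIMEFORMAT is set, bash stores each command's time in the history file as a "#<epoch seconds>" comment line just before the command. A minimal Python sketch of pairing those up (ignoring multi-line commands) looks like this; the sample history is made up.

```python
from datetime import datetime, timezone

def parse_bash_history(text):
    """Return (timestamp or None, command) pairs from a bash history file."""
    entries = []
    ts = None
    for line in text.splitlines():
        if line.startswith("#") and line[1:].isdigit():
            ts = datetime.fromtimestamp(int(line[1:]), tz=timezone.utc)
        elif line:
            entries.append((ts, line))
            ts = None  # a timestamp applies only to the next command
    return entries

sample = "#1588838400\nsudo su -\n#1588838460\nls -la /root\n"
for ts, cmd in parse_bash_history(sample):
    print(ts.isoformat(), cmd)
```

Without HISTTIMEFORMAT the comment lines are simply absent, and every entry comes back with a timestamp of None, which is exactly the gap Splunk would otherwise have to guess at.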

vliggio · 01-14-2020

Legally, no: you have to pay for the amount of data ingested. If you ingest it twice (on the same indexer or on different indexers), you pay for it twice. If you blacklist an index, you won't forward that data and won't be charged for it twice, but I don't think that's what you're looking for (correct me if I'm misinterpreting your request). If it were me, I would check whether having the central SHC reach out to the remote indexers delivers reasonable performance (remember, the slowest indexer to respond will affect how long the search takes), and what data you really need forwarded. Beware of things that might consume bandwidth unnecessarily (do you need _internal data forwarded? It's free, but it can be big in a large environment). You also haven't mentioned what you use the data for: if you're running Enterprise Security, the performance of your data summarization might be terrible over a WAN (and you can't use the summaries generated at the remote site, as each set of accelerated data is accessible by only one search head cluster). You haven't said much about the quantity of data either, or whether you want all the data searchable at both sites.

vliggio · 01-10-2020

It depends on your search types. If your searches are mostly transforming searches, remote searches over the WAN will not be a big issue, because not much data is returned from the indexers (relative to the total data set). If they're primarily raw-event searches, then you're right, you'd want that data local (see https://docs.splunk.com/Documentation/Splunk/8.0.1/Search/Writebettersearches). Good Splunk users will write efficient searches; inexperienced ones will do things like "index=*" over all time. I've seen users try to search 9 billion events using a simple "stats count" query and complain that their search memory quota didn't allow it. I explained that the quota existed precisely to stop people from running terrible searches like that, and then taught them how to write a tstats search... Unfortunately, there's really no way around the licensing with indexAndForward (depending on your license, of course; with an unlimited license it wouldn't matter :)).
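A simplified Python model of why transforming searches are WAN-friendly: each indexer aggregates its own events and ships back only a small partial result, and the search head merges the partials. This illustrates the general map/reduce idea rather than Splunk's actual implementation; the events and the "action" field are invented.

```python
from collections import Counter

def indexer_partial(events, field):
    """Per-indexer step for something like 'stats count by <field>'."""
    return Counter(e[field] for e in events if field in e)

def search_head_merge(partials):
    """Search-head step: add up the partial counts from every indexer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

site_a = [{"action": "login"}] * 5000 + [{"action": "logout"}] * 3000
site_b = [{"action": "login"}] * 2000
partials = [indexer_partial(site_a, "action"), indexer_partial(site_b, "action")]
result = search_head_merge(partials)
rows_sent = sum(len(p) for p in partials)
print(result, "rows over the WAN:", rows_sent, "instead of", len(site_a) + len(site_b))
```

A raw-event search has no such reduction step, so every matching event crosses the WAN.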

vliggio · 11-29-2019

This is a fairly easy process, since Splunk does not rely on items that would be patched (i.e., Splunk bundles its own version of Python, etc.). You can patch all your servers first without taking down Splunk. Then just reboot in the following order:

1. Reboot the Cluster Master.
2. Reboot the Search Head.
3. Put the Cluster Master into maintenance mode.
4. Reboot the Indexers one by one, waiting for the buckets to be re-registered and the cluster to be searchable after each one.
5. Take the Cluster Master out of maintenance mode.

If you don't care about Splunk being searchable during the reboot, you can be a bit more aggressive and just reboot all the indexers at once. The forwarders will hold the data while waiting for the indexers to come back online.
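If you script this, the ordering can be captured in a small Python helper that only produces the step list and doesn't reboot anything. The host names are hypothetical; "splunk enable maintenance-mode" and "splunk disable maintenance-mode" are run on the cluster master, and "splunk show cluster-status" there is one way to confirm the cluster has recovered before moving to the next indexer.

```python
def reboot_plan(cluster_master, search_heads, indexers):
    """Return the ordered steps for an OS-patch reboot of an indexer cluster."""
    steps = [f"reboot {cluster_master}"]
    steps += [f"reboot {sh}" for sh in search_heads]
    steps.append(f"on {cluster_master}: splunk enable maintenance-mode")
    for idx in indexers:
        steps.append(f"reboot {idx}")
        # Check e.g. 'splunk show cluster-status' on the master before moving on.
        steps.append(f"wait until {idx} has rejoined and the cluster is searchable")
    steps.append(f"on {cluster_master}: splunk disable maintenance-mode")
    return steps

for number, step in enumerate(reboot_plan("cm01", ["sh01"], ["idx01", "idx02"]), 1):
    print(number, step)
```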

vliggio · 11-13-2019

I edited the original comment, so you can accept this answer. Glad it worked out for you!

vliggio · 11-12-2019

Oops, try this:

/opt/splunkforwarder/bin/splunk --accept-license --no-prompt --answer-yes enable boot-start -user serviceaccount

(order in that one matters)

vliggio · 11-07-2019

You have to add user, not edit user. You can't edit what doesn't exist.

vliggio · 11-07-2019

Ok, I tried this on one of my hosts. This works:

/opt/splunkforwarder/bin/splunk --accept-license --no-prompt --answer-yes enable boot-start -user serviceaccount
/opt/splunkforwarder/bin/splunk add user admin -password NEWPASSWD -role admin
/opt/splunkforwarder/bin/splunk set deploy-poll "172.16.182.76:8089"

You can also do it by creating the passwd file manually. If you create it BEFORE you run any splunk commands, Splunk will start up without asking you to create an admin user. (Post edited with the correction from below.)

vliggio · 11-07-2019

The user will not be created until you first start the forwarder (the password file is not part of the tar), so you can't edit the user as the first command (older versions of Splunk created an admin user in the passwd file by default; now Splunk prompts for a user name when it's first started). You can run enable boot-start first instead of editing the user, which will create the password file, or you can create the password file with the admin user in it yourself, with either a real password hash or something like "disabled" in the hash field if you don't plan on using the admin user. Will check when I get to a computer.

vliggio · 11-01-2019

Yup, simple as that. Since it's cold data (not being written to), you can copy the files while the indexer(s) are up, make the config changes, and then restart Splunk. Downtime will be minimal, just a normal restart.
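For the copy step, here is a minimal Python sketch that copies an index's cold directory to its new location and checks that every file arrived, before you point coldPath at it in indexes.conf and restart. The paths and bucket names are hypothetical; a plain "cp -a" or rsync does the same job.

```python
import shutil
from pathlib import Path

def copy_cold_path(src, dst):
    """Copy an index's cold bucket directory to a new location, verify that
    every file arrived, and return the number of files copied."""
    src_p, dst_p = Path(src), Path(dst)
    shutil.copytree(src_p, dst_p, dirs_exist_ok=True)  # needs Python 3.8+
    src_files = {p.relative_to(src_p) for p in src_p.rglob("*") if p.is_file()}
    dst_files = {p.relative_to(dst_p) for p in dst_p.rglob("*") if p.is_file()}
    missing = src_files - dst_files
    if missing:
        raise RuntimeError(f"{len(missing)} file(s) missing after copy")
    return len(src_files)
```

Usage would be something like copy_cold_path("/opt/splunk/var/lib/splunk/myindex/colddb", "/mnt/newcold/myindex/colddb"), followed by the coldPath change and a restart.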

vliggio · 09-25-2019

You should be able to have tens of thousands of hosts (or more) in one index with no issue; search speed should not be affected by the number of hosts in this case. To debug what's slow in your search, look at the Search Job Inspector, which will show where your search is spending most of its time. Most of the time, the culprit is a poorly written search. The key is to reduce the data as much as possible, as early as possible in the search, to reduce the amount of data that needs to be pulled off disk and processed. You don't say how many indexers you have, or whether your indexer is separate from your search head. If you post your search here, I'm sure you'll get lots of replies on how to optimize it (include a sample of the data too, with anything sensitive redacted of course).

vliggio · 09-02-2019

The src and dest field extractions take place at search time, so you have to put in a ticket and request that Splunk install the add-ons on your Splunk Cloud environment. If you look in Splunk_TA_nix, you'll see its props.conf has a bunch of FIELDALIAS settings, and if you refer to https://wiki.splunk.com/Where_do_I_configure_my_Splunk_settings, you will see that FIELDALIAS is a search-time configuration.
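To show what "search time" means here, a simplified Python model of a props.conf FIELDALIAS setting: nothing is stored in the index; the alias is added to each event as results are returned, which is why the add-on has to be installed where the searches run. It handles only the plain "original AS alias" form (no quoting, no ASNEW), and the field names are hypothetical rather than copied from Splunk_TA_nix.

```python
import re

def parse_fieldalias(value):
    """Split a FIELDALIAS value such as 'src_ip AS src dest_ip AS dest'
    into (original_field, alias) pairs. Only the plain 'AS' form is handled."""
    return re.findall(r"(\S+)\s+AS\s+(\S+)", value)

def apply_fieldaliases(event, pairs):
    """Return a copy of the event with each alias added next to its original."""
    out = dict(event)
    for original, alias in pairs:
        if original in event:
            out[alias] = event[original]
    return out

pairs = parse_fieldalias("src_ip AS src dest_ip AS dest")
event = {"src_ip": "10.0.0.5", "dest_ip": "10.0.0.9"}
print(apply_fieldaliases(event, pairs))
```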

vliggio · 08-16-2019

Go.Splunk is just a collection of community provided Splunk queries. Why is that more supported than something from apps.splunk.com? The suggested app is written by Doug Brown who works for RedHat - if anyone can write an app to interpret Linux logs, I'd expect it to be a RedHat employee!

Posts	69
Solutions	12
Karma Given	3
Karma Received	52
Member Since	12-05-2013

Date Last Visited	02-07-2026 11:35 AM

Re: Release Management with Splunk?

Re: Questions about data models, data architecture...

Re: Splunk Daemon and Solarwinds

Re: Using a splunk add-on for infrastucture for a ...

Re: Using a splunk add-on for infrastucture for a ...

Re: Using a splunk add-on for infrastucture for a ...

Re: Using a splunk add-on for infrastucture for a ...

Re: We have tested TSIDX reduction on one index an...

Re: We have tested TSIDX reduction on one index an...

Re: How to size Splunk deployment for 1800 client...

Re: How to specify the storage ratio?

Re: How can i monitor linux commands in splunk

Re: How can i monitor linux commands in splunk

Re: Multi-site Architecture - How to index and for...

Re: Multi-site Architecture - How to index and for...

Re: Splunk server need to be taken care before upd...

Re: Linux deployment of Universal Forwarder issue ...

Re: Linux deployment of Universal Forwarder issue ...

Re: Linux deployment of Universal Forwarder issue ...

Re: Linux deployment of Universal Forwarder issue ...

Re: Linux deployment of Universal Forwarder issue ...

Re: Changing cold storage location for a single in...

Re: Number of Index's to Host or Events

Re: Fields in Splunk Cloud from Heavy Forwarder (A...

Re: Dashboard with /var/log/sudo.log, /var/log/sec...
