Not sure if anyone has interest in this, but if anyone is trying to set up something similar, I've got a working model for some of it I can share. I'm still working through the details, but I'll post what I have so far if it can help anyone else. Thanks.
Will do, thanks. And for the record, if I up-vote your answer, will that close out my inquiry? I definitely want to keep this open since we're still actively working through it. Your responses have definitely helped pave the way to the solution we're coming up with, but I'm still learning how this works, so forgive any clumsiness as I work out the kinks. Thanks again, I'll have updates posted soon!
Great stuff, David, thanks again. I'm looking into the first part now. As for the second part, the initial problem was figuring out how to "diff" the two files, which is where I'm leaning heavily on my consultant for help via Splunk.
I could try using vlookup to get the diff of each file, but if I script capturing the output and then script doing the vlookup, at that point I may as well just script email notifications when certain changes are triggered (which I might end up doing anyway). This is why I want to get a good POC done: I'm guessing the best solution will bubble to the top after playing with a few test cases once we have the data. I'll keep you posted with updates as I clumsily hack through these things. Thanks again, great stuff!
Hey, that's a great idea. You're thinking ahead while I'm still in the POC stage, but that's great stuff. I'll mark this as "Answered" once I get the working model and share the info, but I want to keep this open in case there's more to solve and anyone else has other thoughts. That really puts this in the right direction. Thanks again, more to come.
Will do, thanks. And as I was thinking about this, I realized I might run into the same problem with Sysmon or the event logs, which is: how would I index those details in a way that lets me compare two systems? Hopefully the consultant has some ideas, but after your comment I started looking at manually outputting some data for my baseline. Let me know your thoughts on this:
I script an output of the dir /s and reg export to a text file for my baseline.
Ideally I ingest that output into Splunk. It's ~400MB, but that's better than the ~400GB of our system drive, which it sounds like the "monitor" function would try to ingest.
The next time the script runs, the new output will still be ~400MB, but this should then allow a lookup or diff of the two outputs to highlight what changed between reports.
Assuming we can't find the data already buried somewhere in the event logs or Sysmon, this seems like it might serve the same purpose: dir /s, ls -la, etc. output the file, date, and size fields, so if something was overwritten, corrupted, etc., it would at least be called out and we could investigate what triggered the change.
That leaves my final question, which is: is there an ideal way to ingest semi-structured text files into Splunk? Thanks again, I hope to have some new updates for ya today 🙂
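For what it's worth, when I get to the ingest step I'm picturing something like the props.conf sketch below. The sourcetype name and the extraction regex are just my guesses for a US-locale dir /s dump, so they'd need adjusting to match the real output:

```
# props.conf -- hypothetical sourcetype for a "dir /s" text dump
[dir_listing]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# dir detail lines start like "01/15/2024  09:30 PM"
TIME_FORMAT = %m/%d/%Y %I:%M %p
MAX_TIMESTAMP_LOOKAHEAD = 25
# pull size and file name out of each detail line (regex is a sketch)
EXTRACT-dir_fields = ^\S+\s+\S+\s(?:AM|PM)\s+(?<size>[\d,]+)\s+(?<file_name>.+)$
```

If that works, each line of the listing becomes an event with size and file_name fields we could compare across runs.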
Thanks, David. One of our Splunk guys was quick to point that out too, although not quite as constructively, so thanks for the heads up. I'm curious how the consultant responds and what his plan is to minimize the impact for these use cases.
And using existing data was my original plan from the beginning, but the event logs only tracked actual installs like MSIs. Sysmon event ID 2 looks way more promising and is something I'm going to bring up to see if we can leverage it. Really, it sounds like he's got the rest figured out for how we'd look up the changes, do comparisons, and trigger alerts, so getting the data in seems to be the hardest part. And technically, I can create the output I need manually by dumping the registry and filesystem to a text file, which is about 400MB; not great, but better than the alternative.
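In case anyone wants to sanity-check the Sysmon angle with me, this is the kind of search I'm picturing. I'm guessing at the index name and the source value (ours come from the standard Sysmon TA), so treat it as a sketch:

```
index=sysmon source="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" EventCode=2
| stats count latest(_time) as last_seen by host, Image, TargetFilename
| convert ctime(last_seen)
| sort - count
```

If that pans out, the TargetFilename field would be the "what changed" list without me having to dump the filesystem at all.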
Thanks again for the help. I'm sure there are a dozen guys out there trying to learn this between our real jobs with no training, and it's replies from someone with nothing to gain, like you, that give me a little more hope that not everyone out there is a huge d0uc4e when training or the trial-and-error method isn't cutting it for issues you need to solve yesterday 🙂
Thanks, David. Once I figure this out, I'd imagine everyone here would want to use something very similar for the "what changed" outages. And it was long, so I don't blame ya for missing the part towards the end where I suggest the solution is hopefully part of Sysmon, since we already ingest that. Good point on auditd, though, since we do have some Linux boxes here; once I figure out how to do this in Windows, I'm for sure going to copy the process over to Linux.
I'm also working with a rockstar consultant who's given me some ideas, but if you're up to the task, here are my suggested next steps. Feel free to dream up a solution and share your findings and I'll do the same. His suggestion is to create an index called "Directory" and use the monitoring function "monitor://C:/*" to watch the data in question, then use a lookup (vlookup, maybe?) to check for differences and generate a report that triggers when something changes. In theory this looks easy, but I don't have the Splunk background to know the exact steps or commands needed to make it happen, so again, if you have real examples and syntax to help with, I'm happy to try anything you can dream up. Thanks again!
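To make his suggestion concrete for anyone following along, I think the monitor input would look roughly like the inputs.conf sketch below. The index name comes from his suggestion; the path is just an example, since monitoring all of C:/* would try to index the whole drive:

```
# inputs.conf -- sketch of the suggested monitor input, scoped to one
# folder instead of monitor://C:/* (which would be enormous)
[monitor://C:\Windows\System32\drivers\etc]
index = directory
sourcetype = file_baseline
disabled = false
```

Scoping it to the handful of folders we actually care about (hosts files, app configs) seems like the only way to keep the volume sane.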
Sorry in advance that this is such a long post, so I'll try describing it in a sentence or two in case it's so easy you don't need to read the short novel I wrote below to figure it out.
Q. I need Splunk to help me figure out "what changed" by ingesting all Windows files, directories, and registry keys, then comparing that indexed data the next time it's ingested and triggering notifications, whether in a query, dashboard, ITSI service, or ???? An example would be: someone changed a hosts file and "broke the internet," but of course there are no change-control records stating any work was done the previous night.
This may go a bit beyond what Splunk was designed for, but I've also learned that Splunk can do about anything you can dream up. Here are a few use cases why I need this (and so do you).
Use-case 1: I need to track changes and differences between large clusters of application servers, since change control is something my organization doesn't believe in. Here are some differences I've found that caused outages: differences in hosts files, missing hosts files, differences in application or OS patching, registry hives, configuration files, .DLL/file versions, file sizes; you get the idea.
Use-case 2: Once we figure out use-case 1, I'd like to set up triggers so that when specific changes we define are detected, Splunk sends an alert/email to notify some poor guy in our NOC. For example, someone changes a hosts file or patches one of our Prod servers: notify someone.
</gr-replace>
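To make use-case 2 concrete, I'm imagining a saved search along these lines backing the alert. Every name in it (index, sourcetype, fields) is a placeholder I made up, so it's a sketch of the shape, not working syntax for our environment:

```
index=directory sourcetype=file_baseline file_name="*hosts*"
| stats latest(size) as size by host, file_name
| join type=left host, file_name
    [ search index=directory sourcetype=file_baseline earliest=-48h latest=-24h
      | stats latest(size) as prev_size by host, file_name ]
| where isnull(prev_size) OR size != prev_size
```

Saved as a scheduled alert with an email action, that would cover the notify-the-NOC part whenever a watched file appears or changes size.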
What I've Tried
In the past I've used PowerShell DSC to automate "fixing" when changes happen, but that's a slightly larger hammer than I want, since often, as in the case of patching, we want those changes. But when something is broken and the inevitable "what changed" question comes up, nobody knows, or they don't know all the files, registry updates, etc. involved.
I've also looked at the Security event logs, which will list what files were changed or updated and who made the changes, but if you install a patch, for example, they won't tell you exactly which files were changed, which reg keys were updated, etc.
I used SMS and ZENworks back in the '90s, which worked awesomely to identify "what changed," but those tools don't really work on modern systems: just sitting idle, your system makes about a million updates and changes, which wasn't always the case.
SysInternals Process Explorer is a great tool, but if you use the snapshot feature to see "what changed," even without installing or changing a single file, thousands of files and reg keys get updated, and the longer you wait, the more changes take place.
What I'm Doing Today
So what I currently do is run some basic "tree" and "dir" commands and pipe the output to a .txt file, with variables that name the file after the server it was run on, timestamp it, and copy the output to a NAS share with all the other files. Then, when there's an issue, I load 2 or 3 files from suspicious systems into WinMerge, which automatically highlights the differences in my output files, and very slowly and manually I find "what changed" from the night before. I can usually solve the mystery much faster than anyone not using this approach.
What I'm Hoping is Possible
If you're still following my ranting question, here's what I'm hoping is possible with Splunk to make this way easier, less manual, or hopefully plain old automated in a query, dashboard, or ITSI thingy.
I'd like to think that since we're capturing perfmon, IIS logs, and the normal event logs (and whatever comes with your standard Windows TA), our Splunk indexes somehow already have a complete listing of system files and directories (registry files would be a bonus). Then, with that data, run some sort of sysdiff each morning, notify me that something changed, and output the differences (or at least the file/directory that changed) to some location we can use for further investigation.
Then, once that's working, set triggers on certain files so that proactively, when a certain file, app, or reg key is changed, I get an email before it's even reported as a problem.
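The morning "sysdiff" step is the part I keep sketching in my head; Splunk's set diff command looks built for it, though the index, sourcetype, and fields here are invented for the sketch:

```
| set diff
    [ search index=directory sourcetype=file_baseline earliest=-24h
      | fields host, file_name, size ]
    [ search index=directory sourcetype=file_baseline earliest=-48h latest=-24h
      | fields host, file_name, size ]
```

As I understand it, set diff returns rows that appear in one search but not the other, which is exactly the "what changed" list; it does have default result limits, so scoping the searches would matter.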
I know that's a lot, but I also know it's possible, or at the very least scriptable; I'm just too new to Splunk and too dim to have the skillset needed to make this happen. And if there's some totally better way of doing this outside of Splunk, I'm fine buying or learning another product and having Splunk call/index that other "thing" to get our notifications out, but I'm really trying to keep Splunk as my central hub for all our notifications one way or another. Thanks for the time!
I'm very new to Splunk. This seems pretty easy in Kibana, but in Splunk I'm having a hard time finding anyone who has done it yet. Initially I'd like to pull X-Forwarded-For (XFF) data from my IIS logs. I was hoping I could use the query below, or something similar, to pull the XFF IPs for geolocation and create one heat map showing how many connections come from each geolocation, and possibly a second heat map that uses the XFF IPs and the time_taken field to show where users with slow connections are popping up.
Here's an example of a query to get me started, which I can graph, but clearly just changing the visualization type to "Cluster Map" or "Choropleth Map" isn't enough:
index=web host=J00Podyp* status=200 | stats max(time_taken) as time_taken by x_forwarded_for, UserID | where time_taken > 2000
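In case someone else lands here with the same question, this is the direction I'm experimenting with. iplocation and geostats are real commands, but the index, host, and field values are just carried over from my example above, so treat it as a sketch:

```
index=web host=J00Podyp* status=200
| iplocation x_forwarded_for
| geostats count
```

Rendered as a Cluster Map, that should plot connection counts by geolocation. I'm guessing the slow-connection variant would be a where time_taken > 2000 followed by geostats avg(time_taken), and a Choropleth Map would want something like stats count by Country piped into geom geo_countries featureIdField=Country instead.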
I've read through a lot of other suggestions, but none seem to use it the way I've described, and all have been way over my head, so if anyone has any suggestions, please type slowly and provide steps as if you're explaining it to someone at the bar, as that may be where my problem brings me.
One last thing, I guess: we do have ITSI, and that's definitely an option, but I'd rather use whatever's easiest, assuming this is even possible. Thanks in advance for any thoughts or ideas...