I need to go over every item in our syslogs, so I was wondering: how would I do the equivalent of a "select distinct *" that ignores anything unique to each event and gives me only 1 instance of each actual logged item, know what I mean? I basically want to exclude all duplicates of any entry (unique device and event combo). Of the 3M records we have, I estimate there are a hundred or so uniques.
I am guessing that there is something obvious I have completely missed and I apologize if this is a dumb question.
Second, purging items (can you say flapping interfaces?) seems to work by searching on sourcetype=, but all our entries have the same sourcetype= value - how can I purge/perma-hide (pipe delete) based on a unique text string?
Thanks in advance
You guys are great - not to mention patient and not giving me crap for being deaf dumb and blind - I found the export option (right in front of me) and am pretty confident that the punct field gives me a significant head start on uniques and if not I can just do whole chunks at a time. Thanks very much for your help!! -Senor Nooblet
No problem, @bagojunk. If my answer got close enough, could you click the check mark below the voting buttons to accept it? Thanks.
I got the delete thing wrong, I just misinterpreted (thanks tho!)
The problem with dedup is that with syslog it doesn't recognize the "meat" of the entry as a field - it recognizes most of the syslog as separate fields but not the event detail for some reason.
May 10 17:47:16 device12345 242504: May 10 17:47:15.795 EDT: %LINK-3-UPDOWN: Interface FastEthernet9/99, changed state to down
So I want all hostname+"meat" uniques, right?
The hostname in this example is "device12345" and the meat is "Interface FastEthernet9/99, changed state to down". However, that section is not identified as a field by Splunk - only the timestamp, event type, etc. preceding it. If these were all loaded in SQL or Excel, I could do a RIGHT 100 or something just to get the distinct text from the end of each entry, but Splunk uses a proprietary database so I can't do T-SQL or anything, and I want to know the equivalent.
Also, it's not like MSSQL isn't a proprietary database 🙂
In that case, create a field out of the "meat" and dedup on that.
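For the sample event posted above, a minimal sketch of that idea might look like the search below. The assumptions here are mine, not from the thread: sourcetype=syslog stands in for whatever base search you already use, the field name "meat" is made up for illustration, the device name is assumed to already be in the host field, and the regex assumes every event carries a Cisco-style %FACILITY-SEVERITY-MNEMONIC: tag right before the detail text.

```
sourcetype=syslog
| rex "%\S+: (?<meat>.+)$"
| dedup host meat
| table host meat
```

The rex step captures everything after the "%LINK-3-UPDOWN: " style tag into meat, and dedup then keeps one event per host+meat combination. Events that don't match the pattern never get a meat field, and dedup drops those by default, so it is worth checking what falls through before trusting the list. If host turns out to be your syslog server rather than the device, you would need to extract the device name with a second capture group as well.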
index=prod source="/usr/share/whp-tomcat-7/logs/catalina.out" " > user-agent:" | rex "> user-agent: (?
Got the unique IP addresses from it!!! Thanks to "dedup"...