Getting Data In

How to configure props.conf to parse events on the correct date and re-index data after removing incorrectly parsed data?

stellgod
Engager

Hey guys,
I'm a new Splunk user, and my events are not sorting correctly.

I have data coming from a UF that looks like this:

11/22/2005 17:28pm ****
                Connecting to ports, Please wait....


2005/11/22 17:29:07.789 TUE.   JOURNAL FILE RECORD ID 16341

2005/11/22 17:29:19.091 TUE.   JOURNAL FILE RECORD ID 16342

2005/11/22 17:29:28.334 TUE.   JOURNAL FILE RECORD ID 16343



                 Logging out12/13/2005 10:00am ****

I want Splunk to break the events on the date in mm/dd/yyyy format (for example 11/22/2005). Instead, Splunk automatically split my events on the yyyy/mm/dd dates.

First off, since this is a UF, do I need to add a props.conf on the UF and create a line like this? BREAK_ONLY_BEFORE = [0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]

Secondly, when I add new data through the Splunk Add Data wizard, I can separate my events correctly there, but that doesn't affect the previously indexed events. If some amount of data is already indexed, and I change the props.conf will it not go back and reindex based on the updated props.conf or is it forever stuck parsed incorrectly?

Thirdly, this is a sandbox server, and I "cleaned" out one of my indexes. I want to re-add the data, and the data is still sitting in the same spot as before, but Splunk doesn't recognize it. Splunk is already monitoring these files but doesn't pick it back up. How do I get the same data reindexed into Splunk after I've cleaned the index? Is it possible?

Thanks for the help.


tom_frotscher
Builder

Hi,

On your UF, you only configure the name of the sourcetype, in inputs.conf. Then you want to make sure the sourcetype itself is correctly configured. You can do that with the Add Data wizard and its preview on your indexer/search head; doing it this way is best practice. For example, you can copy a sample file from your UF and use it with the wizard/preview. You already mentioned that you can configure the sourcetype correctly in the wizard and that your events are then broken correctly.
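
As a rough sketch of the UF side, assuming a hypothetical file path, sourcetype name, and index (none of these come from your post, so replace them with your own), the inputs.conf stanza could look something like this:

```ini
# inputs.conf on the universal forwarder
# Path, sourcetype name, and index are hypothetical placeholders.
[monitor:///var/log/journal/session.log]
# Only the sourcetype NAME is assigned here; its parsing rules live in
# props.conf on the indexer or heavy forwarder.
sourcetype = journal_log
index = sandbox
```

The name given in `sourcetype` has to match the stanza name of the sourcetype you build with the wizard/preview.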

And you are also right with your other statements:

If some amount of data is already indexed, and I change the props.conf will it not go back and reindex based on the updated props.conf or is it forever stuck parsed incorrectly?
This is why you normally should use a sample file with the preview on your search head to make sure the sourcetype is correct. You can then create a temp index and add the sample file as a snapshot, not as a monitor (monitor means the file is watched continuously; snapshot means it is added once and not watched afterwards). If you made a mistake, you can delete the temp index and start over. If the data was added by a monitor, it will remain parsed incorrectly in Splunk. You then have three options:
1) Wait until the index's size or age limit is reached and the events are aged out
2) Use the delete command (this masks the data from searches rather than actually deleting it)
3) Clean the index and clear the fishbucket so the files are read again

Splunk is already monitoring these files but doesn't pick it back up.
This happens because Splunk wants to make sure it does not read the same events twice. For example, if you rename a file, you do not want Splunk to read it again; this is useful when your log files are rolled. You can make Splunk read the files again by clearing the corresponding index and the fishbucket. Or you can use the crcSalt option. You can find information about crcSalt in the link I posted above about the fishbucket.
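
A minimal sketch of the crcSalt variant, again with a hypothetical path, sourcetype name, and index that you would replace with your own:

```ini
# inputs.conf on the universal forwarder
# Path, sourcetype name, and index are hypothetical placeholders.
[monitor:///var/log/journal/session.log]
sourcetype = journal_log
index = sandbox
# The literal string <SOURCE> tells Splunk to mix the full file path into
# the CRC it uses to decide whether a file has been seen before.
crcSalt = <SOURCE>
```

Be careful with this on rolling log files: because the path becomes part of the check, a renamed file looks new and is likely to be indexed a second time.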

Pretty complex answer; best is to avoid this situation by using the wizard and the preview with a sample file as a snapshot.

Greetings

Tom

kristian_kolb
Ultra Champion

First) Splunk sometimes picks the 'wrong' place to split the stream of data into separate events. And no, don't do this on the UF. It will not make a difference. You'll have to do it on an indexer or heavy forwarder, since that is where the parsing phase takes place.
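
A sketch of what that could look like in props.conf on the indexer or heavy forwarder. The sourcetype name is a hypothetical placeholder, and the time format is my guess from the sample lines (month/day/year, 24-hour clock, each header starting its own line), so verify it in the preview before relying on it:

```ini
# props.conf on the indexer or heavy forwarder (not on the UF)
# "journal_log" is a hypothetical sourcetype name; use the one set in inputs.conf.
[journal_log]
# Merge lines into multi-line events.
SHOULD_LINEMERGE = true
# Do not start a new event at every line that contains a date
# (the yyyy/mm/dd journal lines would otherwise each become an event).
BREAK_ONLY_BEFORE_DATE = false
# Start a new event only at a line that begins with mm/dd/yyyy.
BREAK_ONLY_BEFORE = ^\d{2}/\d{2}/\d{4}\s
# The timestamp sits at the very start of the event.
TIME_PREFIX = ^
# Assumed format for headers such as "11/22/2005 17:28pm".
TIME_FORMAT = %m/%d/%Y %H:%M
MAX_TIMESTAMP_LOOKAHEAD = 20
```

If a single session between two header lines can run to several hundred lines, you may also need to raise MAX_EVENTS, which caps the number of lines merged into one event.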

Second) There are a few pieces of metadata that have to be correct at index-time, as opposed to at search-time. Almost all other field extractions/manipulations can (and should) be done at search time. These are the ones you need to get right:

  • index
  • timestamp
  • event breaks
  • host
  • source
  • sourcetype

So, once an event has been indexed with the wrong information for any of these parameters, you're stuck, so to speak.

Third) Splunk keeps track of which files it has already seen in a special index called the "fishbucket". Cleaning out this index will let Splunk re-index data.

/K
