For my application the progress bar is for impatient humans waiting for something to happen. Ending at 96 or 99 or even 101 is irrelevant. The instances in my code actually do execute very fast; there are just a lot of them (half a million or more). This whole discussion reminds me of conversations at Cray about the virtues of parallel vs. vector computing. The programming of massively parallel computers is still an ongoing problem with no elegant solution.
Ideally I would like to be able to specify that the counter be updated from only one instance of the parallelized loop. Since that control isn't available, altenbach's solution works just fine.
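
To make the "update from only one instance" idea concrete, here is a rough sketch in Python, used purely for illustration. The names `run_parallel`, `work`, and `report` are hypothetical, and this is not altenbach's solution. It assumes the iterations are dealt out evenly and that all workers run at about the same pace, so worker 0's own count scaled by the number of workers is a good enough estimate of overall progress.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(n_items, n_workers, work, report):
    """Run work(i) for i in range(n_items) across n_workers threads.

    Only worker 0 calls report(percent), so the progress display is never
    touched by more than one worker.
    """
    def worker(worker_id):
        done = 0
        # Strided split: worker k handles k, k + n_workers, k + 2*n_workers, ...
        for i in range(worker_id, n_items, n_workers):
            work(i)
            done += 1
            if worker_id == 0:
                # Assumption: the other workers keep pace, so scale the
                # local count by the worker count to estimate the total.
                report(100 * done * n_workers // n_items)
        return done

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Sum of per-worker counts equals the number of items processed.
        return sum(pool.map(worker, range(n_workers)))


if __name__ == "__main__":
    shown = []
    total = run_parallel(500_000, 8, lambda i: None, shown.append)
    print(total, "items processed; last value shown:", shown[-1])
```

The estimate is only approximate when the workers drift apart or the item count does not divide evenly, which is fine for a bar whose only job is to reassure someone who is waiting.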