A related blog post (https://news.ycombinator.com/item?id=47842021) discussed this and termed it "flinching". I wonder if this flinching could also be "mediated by a single direction" or if it can only be fixed by finetuning on a more extensive text corpus.
If you are going to prevent some things we "know" are bad, and your method is itself "known" to belong on that list, the best you can hope for is a Pyrrhic victory.
If we anticipate the worst-case scenario on both ends, the conclusion must be that we are terrible at such predictions.
But hey, if we let money guide us, at least some will be happy with the result.