Sadly, the simplicity benefits it touts over mutexes simply don't pan out. You still need some structure around your critical section, and it still forces you to contend with partially-updated intermediate values, which mutexes spare you from.
Likewise, STM doesn't solve the composition issue any better than mutexes do. Implementations that wish to support re-entrant transactions still have overhead over those that don't, similar to the overhead a reentrant_mutex has over a regular mutex.
My favourite solution to the many-reader few-writer scenario is the rarely-supported "Upgradeable Shared Mutex", like parking_lot's RwLock (https://docs.rs/lock_api/0.4.14/lock_api/struct.RwLock.html#...)
With an Upgradeable Shared Mutex one can take a simple shared read lock, an exclusive write lock, or an upgradeable read lock. The upgradeable read lock can then, without unlocking, be upgraded to an exclusive lock once you've prepared the writes you wish to do, keeping the exclusive section as short as possible. To avoid deadlock, the upgradeable read lock can exist concurrently with simple read locks, but not with other upgradeable locks.
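
To make the compatibility rules concrete, here is a rough sketch of the bookkeeping in plain Rust. It is a toy model I made up for illustration, not parking_lot's implementation: `LockState` and its `try_*` methods are hypothetical names, nothing blocks or is thread-safe, and a real lock would wait where this returns `false`. It also shows why only one upgradeable reader is allowed: two of them trying to upgrade would each wait forever for the other to go away.

```rust
/// Toy model of an upgradeable shared mutex's state (hypothetical, std only).
/// It only tracks who holds what; it does not block and is not thread-safe.
#[derive(Debug, Default)]
struct LockState {
    readers: usize,   // plain shared read locks currently held
    upgradable: bool, // at most one upgradeable read lock at a time
    writer: bool,     // exclusive write lock held
}

impl LockState {
    /// Plain reads coexist with other reads and with the upgradeable reader.
    fn try_read(&mut self) -> bool {
        if self.writer {
            return false;
        }
        self.readers += 1;
        true
    }

    /// Only one upgradeable reader may exist; plain readers don't block it.
    fn try_upgradable_read(&mut self) -> bool {
        if self.writer || self.upgradable {
            return false;
        }
        self.upgradable = true;
        true
    }

    /// A fresh write lock needs the lock to be completely free.
    fn try_write(&mut self) -> bool {
        if self.writer || self.upgradable || self.readers > 0 {
            return false;
        }
        self.writer = true;
        true
    }

    /// Upgrading succeeds once the plain readers have drained. The lock is
    /// never released in between, so no other writer can sneak in.
    fn try_upgrade(&mut self) -> bool {
        if !self.upgradable || self.readers > 0 {
            return false;
        }
        self.upgradable = false;
        self.writer = true;
        true
    }

    fn release_read(&mut self) {
        self.readers = self.readers.saturating_sub(1);
    }

    fn release_write(&mut self) {
        self.writer = false;
    }
}
```

With parking_lot itself, the equivalent calls are `RwLock::upgradable_read()` to take the upgradeable lock and `RwLockUpgradableReadGuard::upgrade(guard)` to turn it into a write guard.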