But every time I suggest to a team I’m working on that we should try modelling the problem with it… people are less than enthusiastic.
These are all great tips. Especially starting with the most simple, abstract model and extending it with refinement if you need to.
That basically means that you write your first spec as abstractly and simply as possible with as little detail about the actual implementation as you can.
If details matter, you use the first spec as a module and write a new spec with the added details. The new spec will use implication that if this spec holds then the first one holds too.
Anyway, I was kind of mad when I learned TLA+ that it and systems like it aren’t part of the standard curriculum and common practices.
“We built this distributed database that guarantees quorum if you have at least 3 nodes!” someone will excitedly claim. And when you ask how you get an informal paper and 100k lines of C++ code. Wouldn’t it be nicer if you just had 1k lines of plain old pure maths?
And pure maths that can be exhaustively checked by the computer? It’s not as perfect as a proof but it’s great for a lot of practical applications.
For me it’s nicer I guess.
[1] https://www.youtube.com/watch?v=qs_mmezrOWs
[2] https://speakerdeck.com/swlaschin/tla-plus-for-programmers
- Have a clear notion of what part of the specs represents the system under your control (the "machine"), and what part represents the broader world it interacts with. The world can do more than the machine, and the properties on the world are much more serious.
- Make lots of helpers. You need them more than you think.
- Add way more comments than you normally would. Specs are for analyzing very high-level ideas, and you should be explaining your high-level ideas.
- Make the spec assumptions clear. What has to be true about the operating environment for the spec to be sensible in the first place?
- Use lots of model values, use lots of constants, use lots of ASSUME statements. Constrain your fairness clauses as narrowly as possible.
- Understand the difference between the semantics of TLA+ as an abstract notation and the semantics of TLA+ as something concretely model-checked. For example, TLA+ is untyped, but the model checker is typed. Also, a lot of TLA+ features are not available in different model checkers. OTOH, you can break TLA+ semantics with use of TLCSet and TLCGet.
The last tip applies to whatever modeling language you use: most have the same distinction.
Take something like Google Docs. I'm sure they have some websocket protocol to handle collaborative text editing:
- establish connection
- do a few retries if connection is lost
- always accept edits even when there is no connection
- send queued edits when connection is back
- etc
But what properties can you imagine here?
We never enter an error state? Not really. Error states are unavoidable / outside the control of the system.
We always recover from errors? Not really. Connection might never come back. Or the error requires action from the user (e.g. authentication issues).