MOVE architectures may work best in digital signal processors, because the data flow is almost constant in such processors.
I invented my own version of the move-only architecture (around 1992), but focused on speed. Here is the idea.
1. The CPU only moves data within the CPU, for example from one register to another. So all moves are extremely fast.
2. The CPU is divided into separate units that can work independently. Each unit has its own input and output ports. The ports and registers are connected via a bus.
3. The CPU can have more than one bus and thus do several moves at the same time. If the data on an output port is not ready, the instruction waits.
Example instruction: OUT1 -> IN1, OUT2 -> IN2. With 32 bits, this would give 8 units with 32 ports each.
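One way to arrive at "8 units with 32 ports each" from a 32-bit word is sketched below. The layout is my assumption, not something stated above: two moves per word, each move holding an 8-bit source and an 8-bit destination address, and each address split into 3 bits of unit number and 5 bits of port number. The names `port_addr`, `encode` and `decode` are made up for illustration.

```python
# Hypothetical layout (an assumption, not stated in the comment):
# a 32-bit word holds two moves; each move has an 8-bit source and an
# 8-bit destination; each 8-bit address is 3 bits of unit + 5 bits of port.

def port_addr(unit, port):
    """Pack a unit number (0-7) and a port number (0-31) into 8 bits."""
    assert 0 <= unit < 8 and 0 <= port < 32
    return (unit << 5) | port

def encode(move_a, move_b):
    """Each move is a (source_addr, dest_addr) pair of 8-bit addresses."""
    (src_a, dst_a), (src_b, dst_b) = move_a, move_b
    return (src_a << 24) | (dst_a << 16) | (src_b << 8) | dst_b

def decode(word):
    """Split a 32-bit word back into two moves of (unit, port) pairs."""
    fields = [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    moves = [(fields[0], fields[1]), (fields[2], fields[3])]
    return [tuple((a >> 5, a & 0x1F) for a in move) for move in moves]

# "OUT1 -> IN1, OUT2 -> IN2" with arbitrary example unit/port numbers.
word = encode((port_addr(4, 0), port_addr(2, 1)),
              (port_addr(4, 1), port_addr(2, 0)))
print(hex(word))
print(decode(word))
```

With this split, adding a third bus would mean a wider instruction word or narrower addresses, which is the kind of trade-off the 8-bit variant would also have to make.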
Example of a set of units and their ports:
Control unit: (JUMP_to_address, CALL_to_address, RETURN_with_value, +conditionals)
Memory unit: (STORE_Address, STORE_Value, READ_Address, READ_Value)
Computation unit: (Start_Value, ADD_Value, SUB_Value, MUL_Value, DIV_Value, Result_Value)
Value unit: (Value_from_next_instruction, ZERO, ONE)
Register unit: (R0 ... R31)
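To make the port idea concrete, here is a toy simulator for part of the unit list above. It is only a sketch under my own assumptions: the computation unit is modelled as an accumulator where writing to a port triggers the operation, moves run one at a time, and the control unit, memory unit, DIV_Value, multiple buses and waiting are all left out. The `Machine` class and its methods are hypothetical.

```python
class Machine:
    """Toy model: a program is a list of (source_port, dest_port) moves."""

    def __init__(self):
        self.regs = {f"R{i}": 0 for i in range(32)}  # register unit
        self.acc = 0  # assumed internal state of the computation unit

    def read(self, port):
        if port == "ZERO":
            return 0
        if port == "ONE":
            return 1
        if port == "Result_Value":
            return self.acc
        if port in self.regs:
            return self.regs[port]
        raise KeyError(f"{port} is not a readable port")

    def write(self, port, value):
        # Writing to a computation port is what triggers the operation.
        if port == "Start_Value":
            self.acc = value
        elif port == "ADD_Value":
            self.acc += value
        elif port == "SUB_Value":
            self.acc -= value
        elif port == "MUL_Value":
            self.acc *= value
        elif port in self.regs:
            self.regs[port] = value
        else:
            raise KeyError(f"{port} is not a writable port")

    def run(self, program):
        for source, dest in program:
            self.write(dest, self.read(source))

program = [
    ("ONE", "Start_Value"),     # acc = 1
    ("ONE", "ADD_Value"),       # acc = 1 + 1
    ("Result_Value", "R0"),     # R0 = 2
    ("R0", "MUL_Value"),        # acc = 2 * 2
    ("ONE", "SUB_Value"),       # acc = 4 - 1
    ("Result_Value", "R1"),     # R1 = 3
]
m = Machine()
m.run(program)
print(m.regs["R0"], m.regs["R1"])
```

Adding a "special instruction" in this model is just one more branch in `write` or `read`, which is roughly what plugging in a new port would amount to.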
It is extremely flexible. I also came up with a minimalist 8-bit version. One could even "plug in" different units for different systems. Certain problems could be solved by adding special ports, which would work like a special instruction.
I did not continue the project because people did not understand the bus architecture (like a PCI bus). If you try to present it as a logic-gate architecture (as in the article), the units make the architecture look more complicated than it actually is.
The tough part, though, is that memory is usually slow: you have to wait an undetermined number of cycles for data to come back from DRAM, and while one operation is blocked, all the other operations are blocked too.
I guess you could have something like this with a fancy memory controller that could be programmed explicitly to start fetching ahead of time, so data is available when it is needed, at least most of the time.
It's possible to have a one instruction machine where the one instruction does a subtract, store, and branch if negative. But it's not very useful. This register-oriented thing is something someone might put inside an FPGA.
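For reference, a minimal sketch of such a one-instruction machine in Python. The operand order, the three-cell instruction format and the "negative program counter halts" convention are my assumptions, and `run_sbn` is a made-up name; published variants of this idea differ in these details.

```python
def run_sbn(mem, pc=0, max_steps=10_000):
    """Run a subtract-and-branch-if-negative program held in `mem`.

    Each instruction is three cells a, b, c:
        mem[b] -= mem[a]; jump to c if the result is negative,
        otherwise continue with the next instruction.
    A negative pc (or running off the end) halts the machine.
    """
    steps = 0
    while 0 <= pc <= len(mem) - 3 and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] < 0 else pc + 3
        steps += 1
    return mem

# Adds X into Y using a scratch cell Z: Z = -X, then Y = Y - Z.
# Data cells: X at 9, Y at 10, Z at 11, ONE at 12, H at 13.
mem = [
    9, 11, 3,      # Z -= X         (continue at 3 either way)
    11, 10, 6,     # Y -= Z         (continue at 6 either way)
    12, 13, -1,    # H -= ONE -> -1, negative, so jump to -1 and halt
    7, 5, 0, 1, 0  # X, Y, Z, ONE, H
]
run_sbn(mem)
print(mem[10])
```

Even a plain addition takes two instructions and a scratch cell here, which gives a feel for why this is more of a curiosity than a practical design.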
This is the device register mindset, where you do everything by storing into device registers, applied as a CPU architecture.
Going for a transport-triggered architecture for additional features seems like a fairly easy win. I kind of started designing one before I realised that's what the design was. The Gigatron has to do some unreasonably hard work for a few operations, like shift right, which is an operation that can fundamentally be done with just wires once you have a mechanism to provide the input and fetch the output.
Definitely not knocking the Gigatron though. Every limitation it has is because it saved a chip. When it comes to something minimal to build upon, it's pretty cool.
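A rough illustration of the "just wires" point, modelled in Python rather than hardware. The `ShiftRightUnit` class, its `inp` and `out` port names and the 8-bit width are assumptions chosen for the example: the unit has no logic of its own, and output bit i is simply connected to input bit i + 1.

```python
class ShiftRightUnit:
    """Hypothetical 8-bit function unit: write to `inp`, read from `out`."""

    WIDTH = 8

    def __init__(self):
        self.inp = 0

    @property
    def out(self):
        # Split the input into bits, least significant first.
        bits = [(self.inp >> i) & 1 for i in range(self.WIDTH)]
        # Pure rewiring: drop bit 0, tie the new top bit to 0.
        wired = bits[1:] + [0]
        return sum(bit << i for i, bit in enumerate(wired))

unit = ShiftRightUnit()
unit.inp = 0b10110100       # move a value to the input port
print(bin(unit.out))        # move the result from the output port
```

In a move-only design the cost of this operation is two moves, one in and one out, instead of a multi-instruction software routine.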
I'm having trouble running the file though, it's missing a chip, "74181.dig". Can you point me to where to download that or add it to the repo?
Just when my night was going through a meditative sleep about basing ontological models on change as the fundamental building block. Identity is such a brittle choice as a foundation, even if it's a great tool in many other situations.