A predictor works like this: you look at the pixel above and the pixel to the left, and instead of coding the raw pixel value, you rank the possible values by how likely they are given those "above" and "left" values. The most likely value becomes 0, the second most likely becomes 1, and so on. That turns the image into one where 0 is by far the most common value, so you can encode it as runs of zeros, like RLE, but with the run length stored in a variable-length way.
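Here's a rough sketch of that transform in C (the 16-color assumption, the adaptive per-context frequency table, and all the names are mine; the comment above only describes the general idea):

    /* Context-rank transform: replace each pixel with its likelihood rank
     * given the (above, left) neighbours.  Assumes 16-color pixels and an
     * adaptive frequency table -- illustrative only. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define COLORS 16

    /* freq[above][left][v]: how often value v has followed context (above, left). */
    static uint32_t freq[COLORS][COLORS][COLORS];

    /* Rank of v among all values in this context, most frequent first.
     * Rank 0 = most likely value, so a good predictor emits mostly zeros. */
    static uint8_t rank_of(uint8_t above, uint8_t left, uint8_t v)
    {
        uint8_t rank = 0;
        for (int u = 0; u < COLORS; u++) {
            if (u == v) continue;
            uint32_t fu = freq[above][left][u], fv = freq[above][left][v];
            if (fu > fv || (fu == fv && u < v))  /* break ties by value */
                rank++;
        }
        return rank;
    }

    /* Rewrite the image as ranks; out-of-bounds neighbours read as 0. */
    static void rank_transform(const uint8_t *img, uint8_t *out, int w, int h)
    {
        memset(freq, 0, sizeof freq);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                uint8_t v     = img[y * w + x];
                uint8_t above = y ? img[(y - 1) * w + x] : 0;
                uint8_t left  = x ? img[y * w + x - 1]   : 0;
                out[y * w + x] = rank_of(above, left, v);  /* mostly 0s */
                freq[above][left][v]++;        /* update model after coding */
            }
        }
    }

    int main(void)
    {
        /* Tiny 4x4 test image: mostly one color, so ranks come out mostly 0. */
        uint8_t img[16] = { 3,3,3,3, 3,3,3,3, 3,3,5,3, 3,3,3,3 };
        uint8_t out[16];
        rank_transform(img, out, 4, 4);
        for (int i = 0; i < 16; i++)
            printf("%u%c", out[i], (i % 4 == 3) ? '\n' : ' ');
        return 0;
    }

The stream of mostly-zero ranks is what you'd then run-length code, with the zero-run lengths stored as variable-length integers. (A real Apple II decoder obviously couldn't afford a 16 KB frequency table; this is just to show the shape of the transform.)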
That's from an external storage perspective (fitting more images onto a floppy, which is, IIRC, around 140 KB).
You're typically only going to be loading one image at a time, so if saving an extra 100 bytes of image data costs more than 100 bytes of extra decoder code, it's not a win from a RAM point of view.
The reasoning behind the oddball framebuffer layout is handwaved away with "you can probably blame Woz for this" and "possibly to save a few chips on the motherboard".
Well, yes, to both. And the Apple II scan hardware is an absolute masterpiece of the era (surpassed only, IMHO, by the Disk ][ card he invented a year later). That's what we should be talking about.
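For anyone who hasn't run into it, the "oddball layout" is the HGR row-address interleave. A quick sketch of the address math (the constants are the standard HGR values; the function name is mine):

    /* Apple II hi-res (HGR) row addressing: 192 rows of 40 bytes in the
     * 8 KB page at $2000, interleaved rather than stored sequentially. */
    #include <stdio.h>
    #include <stdint.h>

    static uint16_t hgr_row_addr(int y)   /* y = 0..191 */
    {
        return 0x2000
             + 0x0400 * (y & 7)           /* line within its group of 8 ($400 apart) */
             + 0x0080 * ((y >> 3) & 7)    /* group of 8 within a third   ($80 apart) */
             + 0x0028 * (y >> 6);         /* which third of the screen   ($28 apart) */
    }

    int main(void)
    {
        for (int y = 0; y < 10; y++)
            printf("row %3d -> $%04X\n", y, hgr_row_addr(y));
        return 0;
    }

As I understand it, that layout falls straight out of wiring the video counters to the address lines with minimal logic (the same scanning also refreshes the DRAM), which is exactly the "save a few chips" point.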