First the simpler version of this problem. After 50 full moves without a capture or pawn move, a draw MAY be claimed. After 75 moves, a draw MUST be claimed. This requires a count to be kept that may require up to 7 more bits.
The bigger problem is draw by repetition. If a position repeats exactly (same castling and en passant options) for a third time, then a draw MAY be claimed. If it repeats exactly for a fifth time, then a draw MUST be claimed. (Usually it is claimed on the third time, but you don't have to.) Applying this rule correctly requires not just knowing the current position, but what positions have occurred previously, and how often. Back to the last pawn move, capture, or change in potential castling status. This may require (per the first rule) knowing what up to 75 different past positions were.
The best way to store this history is almost certainly not as a list of positions, but as a history of moves. But, even if done efficiently, we will need more bytes for that history than we needed for the position.
0: https://lichess.org/@/revoof/blog/adapting-nnue-pytorchs-bin...
But if we store castling state, I think we are already trying to store the whole game state, and this is representable only by full history because of move repeating rules.
So, I think storing board state as a sequence of moves is more interesting. I would estimate that a number of possible actions on average is closer to 8 than to 16, so it would give as 3 bits for half-move and 6 bits for full move. 24-move game could be represented with 18 bytes, which is considerably lower than 26 bytes!
You can get close to average bit per move, if you reuse "spare" places for the next move. So, for instance, first move have 20 possibilities, which is representable by 5 bits, but you can reuse "spare" 32-20=12 possibilities as a bit for the next move.
This is a representation assuming you use only "move validator" thing that returns a list of possible moves. I think that if you use a chess engine that would output you a probability distribution of possible moves, you can compress noticeably better on average, but decoding would be slow.
Edit: Here are the Notes
0 Empty
10 Pawn
1100 Knight
1101 Rook
1110 Bishop
1111 Queen
32 + 32 + 472
2 times 6 bits: position of the kings
30 bits: color mask
120 + 2*6 + 30 = 162 bits
We can store the rest using the methods from the blog post and add 18 bits for promotion, giving 180 bits.
I'm sure this isn't the most efficient way, and I think I had other methods and considered things like the bishops being able to occupy 32 squares, though special casing doesn't make sense because of promotions.
Technically if you got 8 bishops/queens/knights/rooks You would occupy another 16 bits, giving 196 bits
I think the upper limit can be reduced at the cost of increasing the lower limit
EDIT2: I think I made the assumption at the time that to promote one piece you needed to capture at least one enemy pawn, giving the space for the two bits, which means the upper bound is actually 180 bits
Would love to see other people try in the comment section
[edit] This made me look for articles estimating this and I found this one [1] which confirms the above is in the right ballpark. Actual study (according to the article) says 4.822 x10^44 is their upper bounds
[1] https://chess-grandmaster.com/how-many-possible-chess-positi...
But each position can only be used once, so you really only have 64*63*62*61*...*33 possibilities = 64!/32! = ~2^53, so you could encode this with less than 7 bytes, and then use the basic 12-byte encoding for captures, castling, en-passant, and promotions, and you are below 19 bytes in total. (Did I miscalculate this?)
Also pawns can never be on the 1st or 8th rank so you can subtract those possibilities from each of the numbers that represents a pawn.
You also don't care about the order of bishops, rooks, knights, and especially pawns - so you should be able to do even better.
BTW 495 can be computed as a binomial coefficient C(8+5-1,5-1), the number of combinations of 8 elements chosen with repetitions from 5 elements.
I think that you need one extra bit, that can contextually encode "rook has moved" or "en passant available".
How can we eliminate two bytes?
We can more cleverly encode the mask so that two bytes are used to express the all-pieces-present, and certain other cases. A fully decoded mask is only used for boards that have too few pieces to break over 26 bytes.
For instance suppose we have a two-bit header encoding several cases:
00 - no piece are missing (32 six-bit coordinates follow)
01 - one piece is missing, followed by 5 bit ID of that piece
10 - two pieces are missing, followed by two 5 bit IDs of the two pieces
11 - three or more pieces missing, full mask follows.
In case 11, we need a 32 bit mask, followed by as many as 29 coordinate pairs.
So 2 + 32 + 6 * 29 = 208 bits.
And hey look, by dumb luck, 208 / 8 = 26.
I will check the article later.
Method 1: Lichess method; 64 bit header, 1 bit per square indicating occupied squares, then (up to) 32 4 bit codes for the 32 occupied squares. So 24 bytes worst case!
Method 2: My own method a list of 2 bit codes, one of the 4 codes indicates empty square, the other three codes are prefixes for a second 2 bit code. Three prefixes applied to one of 4 code values gives a total of 12 possibilities corresponding to any possible chess piece. Worst case 32x2 bits plus 32x4 bits = 24 bytes.
In each case there is sufficient entropy to create tricks to add the supplementary information (who to move etc.), similar tricks are in the original article.
I mention my own method from my Tarrasch Chess GUI https://github.com/billforsternz/tarrasch-chess-gui only for completeness. If I had known about method 1 I would have used that, it is simpler and better and there is much more entropy available making the tricks easier.
I would urge commentators to keep clear the difference between compressing a chess position (this challenge), a chess move and a chess game. A chess move needs far less bits of course. A complete chess game is always encoded as a list of moves, not positions, for this reason.
Edit: I should have mentioned that the chief advantage of method 1 over method 2 is average bits required. An empty board is 64x1 bits = 8 bytes for method 1 and 64x2 bits = 16 bytes for method 2.
Edit 2: I am going to describe my tricks, just because they are fun. Two kings the same colour means Black to move. Two White kings means the first king is white, two black kings means the first king is black. Otherwise White to move. A friendly pawn on the first rank means an enpassant vulnerable pawn. Swap it with the 4th rank square to get the actual contents of both squares. A hostile pawn on the first rank is an unmoved rook or king, establishing all castling rights. The castling and en-passant tricks can be combined in a pleasant and harmonious way.
TLDR: You stupi...lovely folks, need learn what a bit is. What is a byte, and why integers are NOT BITS.
And no one called this post out from years ago? Crazy.
Someone ask, and I will show you how to do it in 24 BYTES IN BINARY ENCODING FOR REAL - no fake "integers are bits". And can we use compression techniques? We can get this to 10-12 bytes maybe.
So it beats the fake "26 bytes" and my version is actually real binary bits.
Ask.
With a few bytes more more you can create an implementation that is a lot easier to understand. Bytes are cheap, developer time isn't.