Mode 3 is the square wave generator, and there are two ways to describe how it counts. In the subtract-1 description, the counter goes down by 1 on each clock, and over n counts the output is high for n/2 clocks and low for n/2 clocks, which is one complete square wave. In the subtract-2 description, the square wave is split into two halves: the output starts high and the counter goes down by 2 on each CLK; when it reaches 0 the output goes low, the count starts over, and so on. That also gives n/2 clocks high and n/2 clocks low. The two descriptions are equivalent in essence (for an even n); different textbooks just explain it differently.
For your question: the counter reaches 0 after 625 clocks, but the output is high during the first pass and low during the next pass, so one full cycle takes 2*625 = 1250 clocks.
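
If it helps, here is a rough Python sketch of the two descriptions, only to show that they give the same waveform. It is a simplified model, not a register-accurate simulation of the chip: it assumes an even count, ignores the initial load cycle and the GATE input, and the function names are made up for illustration. The value n = 1250 is my assumption, inferred from the counter reaching 0 after 625 clocks when it goes down by 2.

```python
def mode3_subtract_1(n, clocks):
    """Subtract-1 description: count down by 1 per clock.
    Output is high for the first n/2 counts and low for the rest (even n assumed)."""
    out = []
    count = n
    for _ in range(clocks):
        out.append(1 if count > n // 2 else 0)
        count -= 1
        if count == 0:
            count = n  # start the next square wave
    return out


def mode3_subtract_2(n, clocks):
    """Subtract-2 description: count down by 2 per clock.
    Each time the count reaches 0, the output toggles and the count starts over."""
    out = []
    count = n
    level = 1  # output starts high
    for _ in range(clocks):
        out.append(level)
        count -= 2
        if count == 0:
            level ^= 1
            count = n
    return out


def high_time_and_period(wave):
    """Return (clocks spent high, clocks in one full cycle) for the first cycle."""
    high = wave.index(0)          # clock index where the output first goes low
    period = wave.index(1, high)  # clock index where it goes high again
    return high, period


if __name__ == "__main__":
    n = 1250  # assumed count value, see note above
    a = mode3_subtract_1(n, 2 * n)
    b = mode3_subtract_2(n, 2 * n)
    print(a == b)                   # True
    print(high_time_and_period(b))  # (625, 1250)
```

Both models give 625 clocks high and 625 clocks low, so the period is 1250 clocks whichever textbook description you follow.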