Language Models over Canonical Byte-Pair Encodings