Information Locality as an Inductive Bias for Neural Language Models