Fundamentals of Machine Learning

Cover and Thomas, Elements of Information Theory, Wiley, 2006.
The “standard” textbook. It’s well-written, fairly accessible, and provides a very rigorous foundation.

Pierce, An Introduction to Information Theory: Symbols, Signals and Noise, Dover, 1980.
A classic introduction to information theory. Well-written and less technical than Cover & Thomas, it also covers a broader range of applications. Keep in mind that the "Information theory and physics" chapter is somewhat dated.

Gleick, The Information: A History, A Theory, A Flood, Vintage, 2012.
A popular science book that traces the historical development of the idea of information.

Frigg and Werndl, Entropy - A Guide for the Perplexed, 2010.
A great, slightly technical overview of entropy as it is used in information theory, statistical physics, and other scientific fields.