How is perplexity calculated in language models?

Could you explain how perplexity is calculated for language models? I'd like to understand how the metric is derived, what its mathematical formulation is, and why it is used to evaluate model performance. Insights into practical applications would also be appreciated. Thank you for your help.
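For reference, perplexity is commonly defined as the exponential of the average negative log-likelihood per token: `PPL = exp(-(1/N) * sum_i log p(w_i | w_1..w_{i-1}))`. The sketch below is a minimal illustration, not any particular library's implementation. It assumes the model's conditional probability for each observed token is already available as a plain list of floats. The function name `perplexity` is hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Compute perplexity from per-token conditional probabilities.

    Hypothetical helper: token_probs[i] is assumed to be
    p(w_i | w_1..w_{i-1}), the probability the model assigned to the
    i-th observed token given the tokens before it.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    # Average negative log-likelihood (cross-entropy in nats per token).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Perplexity is the exponential of that average.
    return math.exp(avg_nll)


if __name__ == "__main__":
    # A model that spreads probability evenly over 4 choices at every step.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))  # approximately 4.0
```

Lower perplexity means the model assigns higher probability to the observed text. A model that is uniformly uncertain among k choices at every step has perplexity k, which is why perplexity is often read as an effective branching factor. Using base-2 logarithms with `2 **` instead of `exp` gives the same value.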
