Multinomial Logistic Regression

Category: Natural Language Processing

Multinomial Logistic Regression

Classification

Input can be anything (document, image, etc.) and output is a class label from the finite set $\mathcal{L}$.

$$ classify : \mathcal{V}* \rightarrow \mathcal{L} $$

$\mathcal{V}$ is the set of words in our vocabulary.

$X$ is a random variable representing the input, in a given instance taking values from $\mathcal{V}*$.

$Y$ is a random variable representing the output, taking values from $\mathcal{L}$.

$p(X, Y)$ is the "true" distribution of labeled texts. $p(Y)$ is the distribution of labels. We don't normally know this without looking at the data.