Italian Word Embeddings

Human Language Technologies (HLT), Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo", Consiglio Nazionale delle Ricerche - Pisa, Italy


Overview

We generated word embeddings with two popular word representation models, word2vec and GloVe, both trained on the Italian Wikipedia.

Models

The word vectors trained with word2vec (skip-gram) are available here (1.5 GB), and the word vectors trained with GloVe are available here (790 MB).

The tar.gz files contain pickled models that, once decompressed, can be used directly with the Gensim framework.
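A minimal standard-library sketch of the decompression step. The helper name `extract_model_archive` is hypothetical, and because the archive layout is not documented on this page, the helper simply returns the names of the extracted files so you can see which one to load.

```python
import os
import tarfile


def extract_model_archive(archive_path, dest_dir):
    """Extract a .tar.gz model archive into dest_dir and return the extracted file names."""
    dest_root = os.path.realpath(dest_dir)
    os.makedirs(dest_root, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse entries that would be written outside dest_dir (path traversal).
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Recent Python releases provide a "data" extraction filter; use it when present.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_root, filter="data")
        else:
            tar.extractall(dest_root)
    return [m.name for m in members if m.isfile()]
```

For example, `extract_model_archive("word2vec.tar.gz", "models")` would unpack an archive into `models/`; the archive name here is only a placeholder for whichever file you downloaded.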

The pickled word2vec files contain the entire model, so it can also be retrained with new data. The pickled GloVe files contain only the word vectors.

Italian Word Analogy Questions

Once you have loaded a model, you can evaluate it on the Italian word analogy test, downloadable here.
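Gensim can score an analogy file directly with `model.wv.evaluate_word_analogies(path)`; the plain-Python sketch below spells out what such an evaluation computes. It assumes the Italian file follows the layout of the original English word2vec analogy set (`: section` header lines followed by lines of four words `a b c d`, read as "a is to b as c is to d"), which this page does not confirm. It uses the 3CosAdd rule: predict the vocabulary word, other than a, b and c, whose unit vector is most similar to b - a + c. Questions containing out-of-vocabulary words are skipped. `parse_analogies` and `evaluate_analogies` are hypothetical helper names.

```python
import math


def parse_analogies(lines):
    """Group analogy questions (a, b, c, d) by the ': section' header preceding them."""
    sections = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(":"):
            current = line[1:].strip()
            sections.setdefault(current, [])
        elif current is not None:
            words = line.split()
            if len(words) == 4:  # ignore malformed lines
                sections[current].append(tuple(words))
    return sections


def _unit(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def evaluate_analogies(vectors, sections):
    """Return {section: (correct, answered)} using 3CosAdd over the whole vocabulary."""
    unit = {word: _unit(vec) for word, vec in vectors.items()}
    results = {}
    for name, questions in sections.items():
        correct = answered = 0
        for a, b, c, d in questions:
            if any(w not in unit for w in (a, b, c, d)):
                continue  # skip questions with out-of-vocabulary words
            answered += 1
            target = [vb - va + vc for va, vb, vc in zip(unit[a], unit[b], unit[c])]
            # Ranking unit vectors by dot product with target is equivalent to
            # ranking by cosine similarity, since |target| is fixed per question.
            best_word, best_score = None, -math.inf
            for word, vec in unit.items():
                if word in (a, b, c):
                    continue
                score = sum(t * v for t, v in zip(target, vec))
                if score > best_score:
                    best_word, best_score = word, score
            correct += int(best_word == d)
        results[name] = (correct, answered)
    return results
```

With the real files, the lines of the analogy test (opened as UTF-8, an assumption) can be passed to `parse_analogies`, and `vectors` can be any mapping from word to vector. The brute-force loop is far too slow for a full Wikipedia vocabulary, which is why Gensim's vectorised `evaluate_word_analogies` is the practical choice there.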

References

word2vec

GloVe