On the measure and the estimation of the evenness and diversity of vocabulary
Ginebra Molins, Josep; Puig Oriol, Xavier
Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa; Universitat Politècnica de Catalunya. GRESA - Grup de recerca en estadística aplicada
Modelling word or species frequency count data through zero-truncated Poisson mixture models allows one to interpret the mixing distribution of the model as the distribution of the word or species frequencies in the vocabulary or population. As a consequence, estimates of the mixing density can be used as a fingerprint of the style of an author in his or her texts, or of an ecosystem in its samples. Definitions of a measure of evenness and of a measure of diversity within a vocabulary or population are given, and the novelty of these definitions is explained. It is then proposed that the evenness and the diversity of a vocabulary or population be approximated through the expectation of these measures under the word or species frequency distribution. This leads to assessing the lack of diversity through measures of the variability of the estimated mixing distribution described above.
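As a rough illustration of the model class named in the abstract, the sketch below evaluates the probabilities of a zero-truncated Poisson mixture with a finite (discrete) mixing distribution. It assumes the truncation is applied to the mixture as a whole, and the function name, rates and weights are hypothetical values chosen for the example; none of them come from the paper, which works with a mixing density.

```python
import math


def truncated_poisson_mixture_pmf(r, rates, weights):
    """Return P(X = r | X >= 1) for a finite Poisson mixture.

    rates   -- positive Poisson rates (hypothetical example values)
    weights -- mixing probabilities, assumed to sum to 1
    """
    if r < 1:
        return 0.0
    # Untruncated mixture probability of observing the count r,
    # computed on the log scale to avoid overflow for large r.
    p_r = sum(
        w * math.exp(-lam + r * math.log(lam) - math.lgamma(r + 1))
        for lam, w in zip(rates, weights)
    )
    # Probability of a zero count: a word or species that is never observed.
    p_0 = sum(w * math.exp(-lam) for lam, w in zip(rates, weights))
    # Condition on the count being at least 1.
    return p_r / (1.0 - p_0)


# Hypothetical mixing distribution: many rare words, a few frequent ones.
rates = [0.5, 2.0, 10.0]
weights = [0.7, 0.2, 0.1]

probs = [truncated_poisson_mixture_pmf(r, rates, weights) for r in range(1, 200)]
total = sum(probs)
```

The zero class is removed because words or species that never appear in the sample cannot be counted, so only counts of one or more are observable.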
Peer Reviewed
UPC subject areas::Mathematics and statistics::Operations research
Poisson distribution
Vocabulary -- Statistics
Applied statistics
Lexicology -- Statistics
