Comments on Ferrer-i-Cancho & Solé (2003)

A serious problem of Ferrer-i-Cancho & Solé (2003) is that it lacks a rigorous evaluation of whether Zipf's law for word frequencies holds at the critical point. Indeed, even the visual evidence for Zipf's law (defined as a power law) is dubious. A rigorous evaluation raises three questions:
  1. What is the actual distribution at the critical point in this model?
  2. What is the best distribution to model Zipf's law for word frequencies? 
  3. What is the best random variable to define Zipf's law? One could use either word rank or word frequency.
Question 1

The global minima of the energy function do not show a power law with a realistic exponent, even for values of the parameter lambda close to the critical point (Ferrer-i-Cancho & Díaz-Guilera 2007, Prokopenko et al 2010, Dickman et al 2012). In the vicinity of the phase transition, the global minima are dominated by an inverse-factorial (sub-logarithmic) law (Prokopenko et al 2010). A distribution that looks more like a power law is obtained near the phase transition in local minima (Ferrer-i-Cancho & Díaz-Guilera 2007, Prokopenko et al 2010). That distribution of local minima should be investigated further.
Interestingly, a variant of the model presented independently by Ferrer-i-Cancho (2005) has at least two virtues: it provides stronger visual evidence of a power law at the critical point, and the exponent of the power law can easily be predicted as a function of the size of the system.
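The difference between global and local minima can be explored with a small simulation of the original 2003 model (not the 2005 variant). The Python sketch below is a hedged reading of that model, not the authors' code: it assumes a binary signal-object matrix, equiprobable objects whose probability is split evenly among their signals, and the energy Omega(lambda) = lambda H(R|S) + (1 - lambda) H(S), with both entropies normalised to [0, 1]. The descent only reaches a local minimum with respect to flipping a single matrix entry, so by itself it says nothing about the global minima. The function names, sizes and the value of lambda are illustrative assumptions.

```python
import math
import random


def energy(a, lam):
    """Omega(lam) = lam * H(R|S) + (1 - lam) * H(S) for a binary n x m matrix `a`
    (rows = signals, columns = objects). Entropies are normalised by log n and
    log m so that both terms lie in [0, 1] (an assumption of this sketch)."""
    n, m = len(a), len(a[0])
    omega = [sum(a[i][j] for i in range(n)) for j in range(m)]  # signals per object
    if min(omega) == 0:
        return math.inf  # disallow objects that no signal refers to
    # Equiprobable objects, p(r_j) = 1/m, split evenly among their signals.
    joint = [[a[i][j] / (m * omega[j]) for j in range(m)] for i in range(n)]
    p_s = [sum(row) for row in joint]
    h_s = -sum(p * math.log(p) for p in p_s if p > 0)
    h_joint = -sum(p * math.log(p) for row in joint for p in row if p > 0)
    h_r_given_s = h_joint - h_s  # chain rule: H(R|S) = H(S,R) - H(S)
    return lam * h_r_given_s / math.log(m) + (1 - lam) * h_s / math.log(n)


def local_minimum(n, m, lam, density=0.2, seed=0, tol=1e-12):
    """Single-flip descent from a random matrix; stops when no flip of one
    entry lowers the energy, i.e. at a local (not necessarily global) minimum."""
    rng = random.Random(seed)
    a = [[1 if rng.random() < density else 0 for _ in range(m)] for _ in range(n)]
    for j in range(m):  # make sure every object starts with at least one signal
        if not any(a[i][j] for i in range(n)):
            a[rng.randrange(n)][j] = 1
    e = energy(a, lam)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(m):
                a[i][j] ^= 1
                e_new = energy(a, lam)
                if e_new < e - tol:
                    e, improved = e_new, True  # keep the flip
                else:
                    a[i][j] ^= 1  # undo the flip
    return a, e


def signal_probabilities(a):
    """p(s_i) sorted in decreasing order (the rank-frequency profile of signals)."""
    n, m = len(a), len(a[0])
    omega = [sum(a[i][j] for i in range(n)) for j in range(m)]
    p_s = [sum(a[i][j] / (m * omega[j]) for j in range(m)) for i in range(n)]
    return sorted(p_s, reverse=True)


if __name__ == "__main__":
    a, e = local_minimum(n=12, m=12, lam=0.41, seed=1)
    print(f"local minimum energy: {e:.4f}")
    print([round(p, 3) for p in signal_probabilities(a)])
```

Matrices this small cannot exhibit a power law; larger systems would need an incremental energy update instead of recomputing Omega from scratch after every flip, and many random starts to sample the distribution of local minima.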

Ferrer-i-Cancho is very concerned about the need for rigorous methods to investigate Zipf's law for word frequencies and power laws in general. He has been exploring and applying more rigorous methods for power-law fitting (Moreno-Sánchez et al 2015, Corral et al 2012). He has also investigated dubious claims about the presence of power laws (Ferrer-i-Cancho et al 2013, Ferrer-i-Cancho et al 2014, Baixeries et al 2013, Hernández-Fernández et al 2011).
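As a minimal illustration of likelihood-based fitting, as opposed to reading a slope off a log-log plot, the sketch below uses the approximate estimator for discrete data given by Clauset et al (2009), alpha ≈ 1 + n [sum_i ln(x_i / (x_min - 1/2))]^(-1), together with an approximate sampler (a continuous power law rounded to the nearest integer) to check it. It assumes x_min is known; a full analysis would also choose x_min, test goodness of fit and compare alternative distributions, as described by Clauset et al (2009) and Corral et al (2012). The function names are hypothetical.

```python
import math
import random


def sample_discrete_power_law(alpha, x_min, size, seed=0):
    """Approximate discrete power-law sample: inverse-transform draws from a
    continuous power law with lower bound x_min - 1/2, rounded to integers."""
    rng = random.Random(seed)
    scale = x_min - 0.5
    exponent = -1.0 / (alpha - 1.0)
    # 1 - rng.random() lies in (0, 1], so the power is always finite.
    return [int(math.floor(scale * (1.0 - rng.random()) ** exponent + 0.5)) for _ in range(size)]


def fit_alpha_approx(xs, x_min):
    """Approximate discrete MLE of the exponent for the tail x >= x_min:
    alpha = 1 + n / sum(ln(x_i / (x_min - 1/2))), with standard error (alpha - 1) / sqrt(n)."""
    tail = [x for x in xs if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    total = sum(math.log(x / (x_min - 0.5)) for x in tail)
    alpha = 1.0 + len(tail) / total
    return alpha, (alpha - 1.0) / math.sqrt(len(tail))


if __name__ == "__main__":
    data = sample_discrete_power_law(alpha=2.5, x_min=10, size=50_000, seed=42)
    alpha, se = fit_alpha_approx(data, x_min=10)
    print(f"alpha = {alpha:.3f} +/- {se:.3f}")
```

The closed-form approximation is chosen here only to keep the sketch short; it degrades for small x_min, where numerically maximising the exact discrete likelihood (which involves the Hurwitz zeta function) is preferable.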

Question 2
This question has been investigated by several researchers, e.g., Clauset et al (2009), Li et al (2010) and Moreno-Sánchez et al (2015).
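For ranked data, Li et al (2010) compare several two-parameter functions. The sketch below fits one such form, assumed here for illustration to be f(r) = C (N + 1 - r)^b / r^a, where N is the number of ranks and C is a normalisation constant, by ordinary least squares on ln f(r). Setting b = 0 recovers a pure power law in rank, so comparing the two fits is a crude way of asking whether the extra parameter is needed. Least squares on logarithms is only a quick diagnostic, not a substitute for the likelihood-based methods of Clauset et al (2009) or Moreno-Sánchez et al (2015); the function names are hypothetical.

```python
import math


def solve_linear(mat, rhs):
    """Solve a small dense system mat @ x = rhs by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    aug = [list(row) + [v] for row, v in zip(mat, rhs)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (aug[r][k] - sum(aug[r][c] * x[c] for c in range(r + 1, k))) / aug[r][r]
    return x


def fit_beta_rank(freqs):
    """Least-squares fit of ln f(r) = ln C - a ln r + b ln(N + 1 - r), r = 1..N,
    to positive frequencies (N >= 3); returns (C, a, b)."""
    f = sorted(freqs, reverse=True)
    n_ranks = len(f)
    design = [(1.0, -math.log(r), math.log(n_ranks + 1 - r)) for r in range(1, n_ranks + 1)]
    y = [math.log(v) for v in f]
    # Normal equations (X^T X) beta = X^T y with beta = (ln C, a, b).
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(3)]
    ln_c, a, b = solve_linear(xtx, xty)
    return math.exp(ln_c), a, b
```

On exact data generated from the assumed form, the fit recovers C, a and b; with real word counts, the residuals of this fit can be compared against those of a fit with b fixed at 0.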

Question 3

This question has been addressed by several researchers, e.g., Ferrer-i-Cancho & Gavaldà (2009) and Piantadosi (2014).
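The two candidate random variables can be derived from the same counts. In the Python sketch below (tokenisation by whitespace is an assumption made for brevity), the rank-frequency list gives f(r), the frequency of the r-th most frequent word, while the frequency spectrum gives n(f), the number of words occurring exactly f times. Both carry the same counts, but a power law in one variable implies a different exponent in the other: asymptotically, f(r) ~ r^(-alpha) corresponds to n(f) ~ f^(-beta) with beta = 1 + 1/alpha, so fitted exponents are only comparable when the variable is stated. The function names are hypothetical.

```python
from collections import Counter


def rank_frequency(tokens):
    """f(r): word-type frequencies in decreasing order, so f(1) is the top frequency."""
    return sorted(Counter(tokens).values(), reverse=True)


def frequency_spectrum(tokens):
    """n(f): how many word types occur exactly f times, keyed by f."""
    return dict(sorted(Counter(Counter(tokens).values()).items()))


def rank_frequency_from_spectrum(spectrum):
    """Rebuild f(r) from n(f): both views hold the same counts (ties share a frequency)."""
    return [f for f in sorted(spectrum, reverse=True) for _ in range(spectrum[f])]


if __name__ == "__main__":
    tokens = "the cat and the dog and the bird".split()  # whitespace tokenisation
    print(rank_frequency(tokens))      # [3, 2, 1, 1, 1]
    print(frequency_spectrum(tokens))  # {1: 3, 2: 1, 3: 1}
```

Because words with equal frequency share a block of ranks, the rank variable carries arbitrary tie-breaking that the frequency spectrum avoids; this is one reason the choice of variable matters for fitting.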


Baixeries, J., Hernández-Fernández, A., Forns, N. & Ferrer-i-Cancho, R. (2013). The parameters of Menzerath-Altmann law in genomes. Journal of Quantitative Linguistics 20 (2), 94–104.
[ doi: 10.1080/09296174.2013.773141 ]

Baixeries, J., Hernández-Fernández, A. & Ferrer-i-Cancho, R. (2012). Random models of Menzerath-Altmann law in genomes.  BioSystems 107 (3), 167–173.
[ doi: 10.1016/j.biosystems.2011.11.010 ]

Clauset, A., Shalizi, C. R. & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review 51, 661-703.
[ doi: 10.1137/070710111 ]

Corral, A., Deluca, A. & Ferrer-i-Cancho, R. (2012). A practical recipe to fit discrete power-law distributions.
[ e-print ]

Corral, A., Boleda, G. & Ferrer-i-Cancho, R. (2015). Zipf's law for word frequencies: word forms versus lemmas in long texts. PLoS ONE 10 (7), e0129031.
[ doi: 10.1371/journal.pone.0129031 ]

Dickman, R., Moloney, N. R. & Altmann, E. G. (2012). Analysis of an information-theoretic model for communication. Journal of Statistical Mechanics: Theory and Experiment, P12022.
[ doi: 10.1088/1742-5468/2012/12/P12022 ]

Ferrer i Cancho, R. & Solé, R. V. (2003). Least effort and the origins of scaling in human language. Proceedings of the National Academy of Sciences USA 100, 788-791.
[ doi: 10.1073/pnas.0335980100 ]

Ferrer-i-Cancho, R. (2005). Zipf's law from a communicative phase transition. European Physical Journal B 47, 449-457.
[ doi: 10.1140/epjb/e2005-00340-y ]

Ferrer-i-Cancho, R. & Díaz-Guilera, A. (2007). The global minima of the communicative energy of natural communication systems. Journal of Statistical Mechanics, P06009.
[ doi: 10.1088/1742-5468/2007/06/P06009 ]

Ferrer-i-Cancho, R. & Gavaldà, R. (2009). The frequency spectrum of finite samples from the intermittent silence process. Journal of the American Society for Information Science and Technology 60 (4), 837-843.
[ doi: 10.1002/asi.21033 ]

Ferrer-i-Cancho, R., Forns, N., Hernández-Fernández, A., Bel-Enguix, G. & Baixeries, J. (2013). The challenges of statistical patterns of language: the case of Menzerath’s law in genomes. Complexity 18 (3), 11–17.
[ doi: 10.1002/cplx.21429 ]

Ferrer-i-Cancho, R., Hernández-Fernández, A., Baixeries, J., Dębowski, Ł. & Macutek, J. (2014). When is Menzerath-Altmann law mathematically trivial? A new approach. Statistical Applications in Genetics and Molecular Biology 13 (6), 633–644.
[ doi: 10.1515/sagmb-2013-0034 ]

Hernández-Fernández, A., Baixeries, J., Forns, N. & Ferrer-i-Cancho, R. (2011). Size of the whole versus number of parts in genomes. Entropy 13 (8), 1465-1480.
[ doi: 10.3390/e13081465 ]

Li, W., Miramontes, P. & Cocho, G. (2010). Fitting ranked linguistic data with two-parameter functions. Entropy 12 (7), 1743-1764.
[ doi: 10.3390/e12071743 ]

Moreno-Sánchez, I., Font-Clos, F. & Corral, A. (2015). Large-scale analysis of Zipf's law in English texts.
[ e-print ]

Piantadosi, S. (2014). Zipf’s law in natural language: a critical review and future directions. Psychonomic Bulletin & Review 21 (5), 1112-1130.
[ doi: 10.3758/s13423-014-0585-6 ]

Prokopenko, M., Ay, N., Obst, O. & Polani, D. (2010). Phase transitions in least-effort communications. Journal of Statistical Mechanics: Theory and Experiment, P11025.
[ doi: 10.1088/1742-5468/2010/11/P11025 ]