Possible origin of power-law behavior in n-tuple Zipf analysis

András Czirók, H. Eugene Stanley, Tamás Vicsek

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

In n-tuple Zipf analysis, "words" are defined as strings of n digits, and their normalized frequency of occurrence ω is measured for a given "text" (sequence of digits). In the case of various non-Markovian sequences, the probability density of the frequencies P(ω) has a power-law tail. Here we argue that a broad class of unbiased binary texts exhibiting a nonexponential distribution of cluster sizes can indeed yield a power-law behavior of P(ω), where we define clusters to be strings of identical digits. We support this result by numerical studies of long-range correlated sequences generated by three different methods that result in nonexponential cluster-size distribution: inverse Fourier transformation, Lévy walks, and the expansion-modification system. Our calculations shed light on the possible connection between the Zipf plot and the non-Markovian nature of the text: as the long-range correlations become dominant, the probability of the appearance of long clusters is increased, leading to the observed "scaling" in the Zipf plot.
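
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that words are read with overlapping windows (the abstract does not say whether windows overlap), and the function names ntuple_frequencies, zipf_ranking, and cluster_sizes are hypothetical. The demo text is an uncorrelated random binary sequence used only as a stand-in; it is not one of the three long-range correlated generators studied in the paper.

```python
from collections import Counter
from itertools import groupby
import random


def ntuple_frequencies(text, n):
    """Normalized frequency of each n-digit word.

    Assumption: words are taken from overlapping windows of length n.
    """
    total = len(text) - n + 1
    if total <= 0:
        raise ValueError("text must contain at least n digits")
    counts = Counter(text[i:i + n] for i in range(total))
    return {word: count / total for word, count in counts.items()}


def zipf_ranking(freqs):
    """Words sorted by decreasing frequency, so rank 1 comes first."""
    return sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))


def cluster_sizes(text):
    """Lengths of maximal runs of identical digits (the 'clusters')."""
    return [len(list(run)) for _, run in groupby(text)]


if __name__ == "__main__":
    # Stand-in text: uncorrelated, unbiased binary digits.
    rng = random.Random(0)
    text = "".join(rng.choice("01") for _ in range(20_000))

    freqs = ntuple_frequencies(text, n=4)
    for rank, (word, freq) in enumerate(zipf_ranking(freqs)[:5], start=1):
        print(rank, word, round(freq, 4))

    # Histogram of cluster sizes, most common sizes first.
    print(Counter(cluster_sizes(text)).most_common(3))
```

For an uncorrelated text like this stand-in, the word frequencies should be nearly uniform and the cluster sizes should decay roughly exponentially. The abstract's argument concerns texts where the cluster-size distribution is nonexponential, which this sketch does not generate.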

Original language: English
Pages (from-to): 6371-6375
Number of pages: 5
Journal: Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics
Volume: 53
Issue number: 6
Publication status: Published - Jan 1 1996

ASJC Scopus subject areas

  • Statistical and Nonlinear Physics
  • Statistics and Probability
  • Condensed Matter Physics
