BlogSchmog

Power Laws, Pareto Distributions and Zipf’s Law

Mark Newman wrote an article in 2005 for Contemporary Physics that examines these systems from a mathematical perspective, relying on manipulations of power-law equations. As with his two talks at NetSci 2006, Mark was pretty effective in explaining the nuances of power laws … even if this article also crystallized for me my desire not to dwell in this part of the domain any more than I have to.

Unlike a normal (Gaussian) distribution, whose values cluster around a typical mean, a power-law distribution has a long tail that appears as a straight, downward-sloping line when plotted on logarithmic (log-log) axes. Its tail falls off far more slowly than an exponential's, and it has the special property of being scale-free: it describes a system in the same way at every level of examination.
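For readers who want the formula behind that description, the standard form of a power-law density is written below. The notation (p(x), C, α, x_min, and the rescaling factor b) is assumed here from common usage; the post itself gives no equation. Taking logarithms shows why the plot is a straight line, and rescaling x shows what "scale-free" means.

```latex
\begin{aligned}
p(x) &= C\,x^{-\alpha}, \qquad x \ge x_{\min},\ \alpha > 1, \qquad C = (\alpha - 1)\,x_{\min}^{\alpha - 1} \\
\ln p(x) &= \ln C - \alpha \ln x \\
p(bx) &= C\,(bx)^{-\alpha} = b^{-\alpha}\,p(x)
\end{aligned}
```

The second line is a straight line of slope -α in ln x. The third says that zooming in or out by any factor b only rescales the height of the curve and leaves its shape unchanged. An exponential does not behave this way: e^{-λbx} / e^{-λx} = e^{-λ(b-1)x} depends on x, so an exponential has a built-in scale, 1/λ.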

Estimating the exponent directly from the data with an equation is more reliable than fitting a line through a plot of every data point, and Newman explains ways to estimate the value of the exponent for a given system. Two plotting techniques, logarithmic binning and cumulative distributions, address the noise at the end of the long tail, where there are few samples. The mathematics also provides tools for special nuances of power-law systems, such as discrete values (integers rather than real numbers) and the absence of species death (agents live on, even if nothing new is added to the general population).
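The estimation and plotting ideas above can be sketched in standard-library Python. This is a minimal sketch assuming continuous data above a known lower bound x_min; the function names (`sample_power_law`, `mle_alpha`, `log_binned_density`, `cumulative_distribution`) are hypothetical. The exponent estimate uses the maximum-likelihood formula α = 1 + n / Σ ln(x_i / x_min), which is one common choice and not necessarily every method Newman describes. Synthetic test data come from inverse-transform sampling.

```python
import bisect
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Hypothetical helper: draw n continuous samples from p(x) = C x^-alpha, x >= x_min.

    Inverse-transform sampling: if r is uniform on (0, 1], then
    x = x_min * r ** (-1 / (alpha - 1)) follows the power law.
    """
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so we never raise 0 to a negative power
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def mle_alpha(xs, x_min):
    """Maximum-likelihood exponent for continuous data:
    alpha = 1 + n / sum(ln(x_i / x_min)), over the n values with x_i >= x_min."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def log_binned_density(xs, x_min, bins_per_decade=5):
    """Logarithmic binning: each bin is a fixed multiple wider than the last.

    Counts are divided by bin width and by the number of samples so each value
    estimates p(x). Returns (geometric bin centre, density) pairs; empty bins
    are skipped because log(0) cannot be plotted.
    """
    tail = [x for x in xs if x >= x_min]
    factor = 10.0 ** (1.0 / bins_per_decade)
    x_max = max(tail)
    edges = [x_min]
    while edges[-1] <= x_max:  # stop once the last edge lies above every value
        edges.append(edges[-1] * factor)
    counts = [0] * (len(edges) - 1)
    for x in tail:
        # bisect_right finds the first edge > x, so x belongs to the bin just before it
        counts[bisect.bisect_right(edges, x) - 1] += 1
    result = []
    for i, c in enumerate(counts):
        if c:
            width = edges[i + 1] - edges[i]
            result.append((math.sqrt(edges[i] * edges[i + 1]), c / (len(tail) * width)))
    return result


def cumulative_distribution(xs):
    """Cumulative distribution P(X >= x) at each observed value.

    After sorting ascending, the i-th value (0-based) has n - i values at or
    above it (ties are ignored, which is harmless for continuous data).
    No binning is needed, so the sparse tail is not split into noisy bins.
    """
    s = sorted(xs)
    n = len(s)
    return [(x, (n - i) / n) for i, x in enumerate(s)]


if __name__ == "__main__":
    data = sample_power_law(100_000, alpha=2.5, x_min=1.0, seed=42)
    print("estimated alpha:", round(mle_alpha(data, 1.0), 3))
    for centre, density in log_binned_density(data, 1.0)[:5]:
        print(f"x ~ {centre:8.2f}  p(x) ~ {density:.4g}")
```

Plotting either the log-binned densities or the cumulative values on log-log axes should give a roughly straight line. For the cumulative form the slope is -(α - 1) rather than -α, because integrating x^{-α} raises the exponent by one.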

Newman also discusses several examples of power law distributions in nature and society:

  • Word use frequency
  • Academic paper citations
  • Web hits
  • Book sales
  • Telephone calls
  • Earthquake magnitude
  • Size of moon craters
  • Intensity of solar flares
  • War deaths
  • American wealth
  • Family names
  • Population of cities

In some systems, Newman considered the power-law property to be “unconfirmed” because extracting a value for the exponent required a subjective judgment. Indeed, other kinds of distributions also occur. Couple relationships (exponentially distributed), populations of North American birds (log-normal), email addresses in an address book (stretched exponential), and forest fire acreage (a power law with an exponential cut-off) are all systems whose distributions are not pure power laws.
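For comparison, the usual textbook forms of these alternatives are listed below, each written up to a normalizing constant. The parameters λ, β, μ and σ are assumed notation, not taken from the post.

```latex
\begin{aligned}
\text{exponential:} &\quad p(x) \propto e^{-\lambda x} \\
\text{log-normal:} &\quad p(x) \propto \frac{1}{x}\exp\!\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right] \\
\text{stretched exponential:} &\quad p(x) \propto x^{\beta - 1} e^{-\lambda x^{\beta}} \\
\text{power law with cut-off:} &\quad p(x) \propto x^{-\alpha} e^{-\lambda x}
\end{aligned}
```

On log-log axes only the pure power law is a straight line. The log-normal traces a downward-curving parabola in ln x, and the cut-off form follows a line of slope -α until x approaches 1/λ, after which it bends sharply downward. That is why a short, nearly straight stretch of one of these curves can be mistaken for a power law.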

I’m not done with this article. One of the projects I am volunteering for is trimming this long article down to a 5-8 page version that is more accessible to the introductory class, which is filled with non-mathematicians like me who, despite Newman’s best efforts, will find the mathematical manipulation a burden.

By Kevin Makice

A Ph.D. student in informatics at Indiana University, Kevin is rich in spirit. He wrestles and reads with his kids, does a hilarious Christian Slater imitation and lights up his wife's days. He thinks deeply about many things, including but not limited to basketball, politics, microblogging, parenting, online communities, complex systems and design theory. He didn't, however, think up this profile.