Blogging by the numbers: The Big Number

One number to rule them all

How many blogs are there? This is one of those questions of incredible interest without any reliable answer. A few companies have been nice enough to provide information that gives us some estimate of how large the blogging world is. Unfortunately, most people don’t take the time to decide how reasonable (or not) those numbers are.

A side note… My policy with this website is not to include external links unless those sites are seriously something you should check out. In other words, all these other sites I refer to but don’t link to do exist; I just hope you don’t stumble upon them because they’re useless crap and I don’t want to increase their Google rank inadvertently. Also, for those links I do have, there are probably other places I should have citations for data but I don’t include them because I don’t add multiple links to the same page. Just look around and the data is linked to somewhere else.

Some great data comes from the Pew Internet and American Life Project. One poll that has great blogging results was taken between March and May 2003 with about 1,500 people, making for pretty accurate data. Quick statistics review… The poll had a 3% margin of error at a 95% confidence level, meaning that if you ran the same poll over and over, about 95% of the time the result would land within 3 percentage points of the real number. It does not guarantee that any given result is within +/-3% of its actual value. Furthermore, the poll was conducted by phone using random phone numbers, meaning the distribution of people who answered the question probably was close to that of the U.S. overall. (Don’t worry about bias from people without phones; they probably don’t have Internet access either.)
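For the curious, here is a rough sketch of where a margin of error like that comes from. It uses the textbook formula for a simple random sample and assumes the worst case of a 50/50 split. The function name and defaults are mine, not Pew's. The result comes out a bit under 3%, and the published figure presumably includes adjustments for how the sample was weighted.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of a confidence interval for a proportion.

    Assumes a simple random sample. z=1.96 corresponds to a 95%
    confidence level; proportion=0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# About 1,500 respondents, as in the poll described above.
moe = margin_of_error(1500)
print(f"{moe:.1%}")  # prints 2.5%
```

Note that the margin shrinks with the square root of the sample size, so quadrupling the number of respondents only halves it.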

Given this fairly accurate poll, Pew reported that about 2% of American Internet users said they have a blog or web diary and about 11% of ‘net users read blogs and web diaries (the data is summarized in their report Content Creation Online). Some people ran with these numbers and found amazing statistics of their own. For example, did you know that 110 million people worldwide read blogs? Well, I didn’t and neither did those 110 million people. I’ll explain…

Let’s say 11% of American Internet users really do read web logs or diaries. And other polls suggest about 200 million Americans (about 2/3 of us) have Internet access from home, work, ‘net cafes, whatever. Do the math and BAM — 22 million Americans read blogs. But then some people take this way too far. Somewhere around 800 million to a billion people worldwide have Internet access, so you multiply 11% by 1 billion ‘net users and 110 million people read blogs. You can do the same math with the number of bloggers and you get about 20 million of them.
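The arithmetic people are doing is nothing more than the sketch below. The percentages and user counts are the ones quoted above; the function and variable names are made up for illustration. The point is how little work it takes to produce an impressive-sounding number, not that the output is right.

```python
# Figures quoted in the text above (rounded estimates, not hard data).
US_NET_USERS = 200_000_000
WORLD_NET_USERS = 1_000_000_000
READER_PERCENT = 11   # share of American 'net users who read blogs
BLOGGER_PERCENT = 2   # share of American 'net users who have a blog


def extrapolate(percent, population):
    """Naively apply a poll percentage to a whole population."""
    return population * percent // 100


us_readers = extrapolate(READER_PERCENT, US_NET_USERS)
world_readers = extrapolate(READER_PERCENT, WORLD_NET_USERS)
world_bloggers = extrapolate(BLOGGER_PERCENT, WORLD_NET_USERS)

print(us_readers)      # 22000000
print(world_readers)   # 110000000
print(world_bloggers)  # 20000000
```

The last two lines are the leap: they assume a percentage measured on American adults applies unchanged to every Internet user on the planet.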

110 million blog readers, 20 million blogs, right? Nope, nothing is that simple. Look at LiveJournal’s statistics. By their numbers, more than 35% of LJ’ers are under 18 years old (graph below). Pew missed a significant part of the blogging population by polling only 18-and-ups, and everyone citing that Pew number as absolute truth missed what I think is the most important blogging group there is.

[Figure: ljage]

LiveJournal users shown by age. Taken from http://www.livejournal.com/stats/stats.txt. Nov. 12, 2004. The peak of this graph is at 18 years old, and about 36% of users are under 18.

Pew missed the mark by a bit. So how much did they miss it by? I’ll make a few guesses. That 35% of under 18 LJ’ers is probably an underestimate. Half of the people using LJ didn’t bother answering the age part of their profile, and I believe most of those non-responses are under 18. Furthermore, there are liars. You can see a little peak around 104 years old; those people put down their age as being born in 1900. And of course, if you’re going to have a publicly articulated persona, you want to make yourself look as cool as possible, and older is cooler. I’m surprised there aren’t more kids who are 69 years old, but then again how many of them can do the math and figure out they should be born in 1935 for that.

Remember that the Pew study was only about Americans and their Internet use, not the world. If 11% of Americans are doing it, you can be certain that number will not hold for the entire world. While Americans make up about 1/4 of the Internet population, they’re probably a majority of the blogging population. LiveJournal records use by country, and a little under 80% of its blog writers are from the U.S. So even if 25% of ‘net users are American, we dominate the blog world. According to the NITLE Blog Census (which I use with caution as I’ll explain later), about 80% of their crawled blogs are written in English, but about 1/3 of all web sites are English and 2/3 of English web sites are in the U.S. Anyone who simply extrapolates global ‘net use to blogging behavior will be off big time.

Graphs of each are below, and the key to the rankings is at the end of this page. Though it would be comparing apples and oranges, the NITLE distribution is almost exactly the same as the LiveJournal data (above). Someday it might be fun to delve deeper into the language/country blog differences, but this is as far as I’ll take it for now.

[Figure: ljcountry]

The top ten countries of total Internet users worldwide (Global Reach) versus LiveJournal users as a percent of the total. (rankings are at the end)

[Figure: nitlelanguage]

The top ten ranked languages from Internet language data versus NITLE BlogCensus language data as a percent of the total. (rankings are at the end)

One other significant issue is that of definitions. The Pew poll asked if people had read, written, or contributed to a web log, blog, or web diary. I don’t want to digress on such a philosophical problem, but people have different understandings of what a blog or web diary is. Did people say yes about writing a web diary when it was just their family’s web page? Does Slashdot count or not? Who knows. Problems like this always happen, so we’ll have to trust that people know a blog when they see one.

I had a chat with Mary Hodder of Technorati back in October about the big number. (FYI, Technorati deals with all things blog and that’s about it.) She said Technorati estimated the number of blogs to be about 12 million, and that they have over 4 million blogs indexed. BBC News recently had an article where they cited Technorati as saying there are 4.5 million or so blogs in existence. Funny, the BBC number is a lot like the number Mary cited for Technorati’s crawled blogs. I guess you can’t even trust reputable news sources for accurate blogging information.

It gets worse. Other “authorities” for blogging size are cited too often without reflecting on how they got those numbers. NITLE’s Blog Census currently has around 2 million indexed pages. I hope nobody is using their numbers yet. They’ve only gone through less than 5% of the over 5 million LiveJournal blogs (their <5% is less than the number of journals active in the last week). And like I said before, some parts of the sample, such as under-18 bloggers, are only evident in certain domains. Even if NITLE is using a 95% confidence interval, it’s meaningless if they aren’t sampling from the entire population.

Then there are the “other” polls… I question the methodologies of most of these. A few had open online polls, so you have no idea how representative the poll results are of the entire population (like the poll I cited in my last post). Others look only at LiveJournal or similar blog hosting sites without taking into account the non-blog-service-using people. Even with these complaints, at least those polls had enough intelligence to mention these facts along with their poll results.

What does Technorati do that the rest don’t? Their numbers are based on a few things. First, they have web crawlers made specifically for blogs, that use the links in those blogs to find other blogs and add the new ones to their search. They let people submit their blogs to the engine if it’s not already there. They also get “pings” whenever a new blog is created on certain blog hosting sites or when new blog software is installed on an individual’s site. With all this information, the data and estimates they give are probably the most accurate if any are to be trusted.

So how many blogs are there? I have no idea. If I had to wager, I would put my faith in Technorati’s numbers since I trust their methodology the most and since they have much to lose if they’ve got it wrong. Technorati also said (in that same BBC article) that the number of blogs is doubling every 5 1/2 months — 10,000 or so a day, a trend maintained for the last 18 months. That 12 million I cited earlier is probably closer to 15 million blogs now.
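Here is a quick sketch of the doubling math behind that guess. It assumes smooth exponential growth at the 5 1/2 month doubling time Technorati quoted. The function names are mine, and the real growth is surely lumpier than this.

```python
import math


def project(initial, months_elapsed, doubling_months=5.5):
    """Project a count forward assuming it doubles every doubling_months."""
    return initial * 2 ** (months_elapsed / doubling_months)


def months_to_reach(initial, target, doubling_months=5.5):
    """How many months of steady doubling it takes to grow from initial to target."""
    return doubling_months * math.log2(target / initial)


# Starting from the 12 million estimate, how long until 15 million?
print(round(months_to_reach(12_000_000, 15_000_000), 2))  # 1.77

# One full doubling period later, the count has doubled.
print(round(project(12_000_000, 5.5)))  # 24000000
```

In other words, under that growth rate it takes a little under two months to go from 12 million to 15 million, which is roughly the gap the guess above relies on.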

I’ll ignore other counting issues like private blogs (as in not publicly accessible), abandoned blogs, and fringe blogs (without incoming links so they can’t be crawled) except to say these make defining what is and is not a blog even more difficult. This stuff is pretty hard to do even with all the data that’s already out there.

Finally, Pew just today came out with some new poll results. 27% of American net users now read blogs — an increase of 58% from February (from political blogs?). Also, Pew says 7% of people now have blogs which is in line with Technorati’s prediction of blogs doubling every 5 1/2 months (comparing Pew’s Mar-May 2003 poll to the new data). That memo is pretty brief though; I’ll wait until there are some methodology details or a full report of their survey data before believing them. Unfortunately, this will not stop people from committing all the same errors I described above and reporting that now 70 million blogs exist and 270 million people read them worldwide. Just wait for it…

That’s enough for now. From here, I’ll get into more detail about blog readers and writers.


Top 10 rankings for the Global Reach country data.

  1. United States
  2. China
  3. Japan
  4. Germany
  5. United Kingdom
  6. Korea
  7. Italy
  8. France
  9. Canada
  10. Brazil

Top 10 rankings for the LiveJournal country data.

  1. United States
  2. Canada
  3. United Kingdom
  4. Russian Federation
  5. Australia
  6. Germany
  7. Philippines
  8. Singapore
  9. Netherlands
  10. Japan

Top 10 rankings for the Global Reach language data.

  1. English
  2. Chinese
  3. Spanish
  4. Japanese
  5. German
  6. French
  7. Korean
  8. Italian
  9. Portuguese
  10. Malay

Top 10 rankings for the NITLE language data.

  1. English
  2. French
  3. Portuguese
  4. Farsi
  5. Polish
  6. German
  7. Spanish
  8. Italian
  9. Dutch
  10. Chinese (big5)

Blogging by the numbers: An introduction

Damn you statistics!!!

I did some research this semester about blogging. The inspiration for this was my economics of information class. As a requirement for that class, we had a final paper and presentation about any economic/information topic of our choosing, so I decided to study the economics of blogging.

At this point, I want to make sure we all understand what economics is. The best definition I heard for economics is that it’s the study of the distribution of scarce goods. Most people confuse economics with money. Certainly money has a lot to do with economics, but economics is much more than that. Economics has a lot to say about utility and usefulness, distribution, production, and more.

In other words, I sought to find information that wasn’t simply about money and the business of blogging. Someone out there coined the word “blogonomics” as a bastardization of both words “blogging” and “economics.” Thankfully that word hasn’t been adopted; the person who came up with it (or at least was credited with it) was pandering for donations. This just shows you how little most people really understand about economics.

So I divided my efforts into a few parts. First, I needed to learn more about bloggers. Who is blogging? How many blogs are there? How fast is this growing? This alone is worthy of massive amounts of research, to say nothing of the two months I had to produce my final paper. I’ll start discussing that in a bit, but suffice it to say the quality of blogging statistics is miserable at best.

Next, I wanted to get a little deeper into the nature of the blogging realm. Why are people blogging? How are companies using blogging as part of their strategies? Who reads which blogs, how is traffic distributed, and why? Most people just appeal to the Zipf curve and avoid the deeper implications of why these traffic patterns emerge and what they mean for the blogging world.

I also did some of my own research into the function of the blogging realm. I wanted to look at linking patterns between blogs and the rest of the Internet. Do blogs exist in their own little realm or do they anchor themselves among the rest of the Internet or what? By looking at link structures, maybe I could get some sense about what blogs really do.

At the end, I was left with more questions than I started with. This was the most disappointing aspect of my work; you would like to hope that weeks of work would turn out some revelation but instead I was wondering where I could find other people to help me out. No matter though, I’ll offer you the same questions I asked myself hoping that maybe someone out there will get a clue and do this much needed work.

Somewhat related to the research I did for my economics of information class, I worked with a friend on a project for another class using this blog research as some of the basis. Before we began our research, I was showing him some polls I found on the Internet with details about bloggers. In particular, there was one poll that had… um… interesting results.

This poll used a methodology that made it completely useless. It was done by a search engine site, and most of the fewer than 1,000 responses came from people who had registered with that site. In other words, this was a self-selected population. When you deal with polling and statistics, the most important aspect is to ensure that your sample accurately reflects whatever it is you’re studying. When your poll takers decide for themselves to do it or not, you will never know how or if it’s biased. My guess is that only the most interested bloggers will take the poll, biasing it towards more participation and more frequent posting when the opposite is more likely.
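To see how badly self-selection can skew things, here is a toy simulation. Every number in it is invented for illustration: a made-up population where 30% of bloggers post weekly, and made-up response rates where frequent posters are far more likely to bother answering. It is not a model of the actual poll, just a sketch of the mechanism.

```python
import random


def simulate_poll(population_size=100_000, weekly_share=0.30, seed=42):
    """Compare a random sample with a self-selected one (all rates hypothetical)."""
    rng = random.Random(seed)

    # True means "posts at least once a week".
    population = [rng.random() < weekly_share for _ in range(population_size)]
    true_share = sum(population) / population_size

    # A proper poll: 1,000 bloggers picked at random.
    random_sample = rng.sample(population, 1000)
    random_estimate = sum(random_sample) / len(random_sample)

    # An open poll: weekly posters answer 10% of the time,
    # everyone else only 0.5% of the time.
    respondents = [
        posts_weekly
        for posts_weekly in population
        if rng.random() < (0.10 if posts_weekly else 0.005)
    ]
    self_selected_estimate = sum(respondents) / len(respondents)

    return true_share, random_estimate, self_selected_estimate


true_share, random_estimate, self_selected_estimate = simulate_poll()
print(f"true share of weekly posters: {true_share:.0%}")
print(f"random sample estimate:       {random_estimate:.0%}")
print(f"self-selected estimate:       {self_selected_estimate:.0%}")
```

With these invented response rates the open poll reports that the large majority of bloggers post weekly, even though fewer than a third of the simulated population does. The random sample stays close to the truth.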

This was lost on most readers however. Commenters loved it and said it was great. I can only feel bad for the people who use it to prove anything about the blogging world. There is NO WAY that 95% of bloggers post at least once a week. That’s why I won’t offer you a link to the poll. It’s crap. The only use it might have is for the site that did the poll, to get insight into the type of people who use the site. Given that number above, you should have no trouble identifying it, then closing your browser as soon as you encounter it.

Unfortunately, this poll is typical of blogging statistics. They’re loaded with hidden biases, skewed samples, and gaping holes that most people don’t care to look for before reporting them to the masses. So let me get this caveat out of the way. Quite possibly the numbers and information I’m going to provide in the coming weeks will be slightly off or outright wrong. This should not detract from the points I will make along the way. If there’s anything you should take away from these writings, it’s that you should think deeper about the numbers before accepting them as truth.

There are lots of great quotes about statistics I could use as a conclusion here, but I won’t. In fact, it’s quite possible that I’m here to mislead you with numbers and prove to you that I’m right and everyone else is wrong. But it’s also quite possible that I’m onto something, and if so, then I promise you I’ll be the most surprised one in the end. Next time, we’ll get into the numbers.

Targeting Toolmakers

Marvel Comics jumps the shark.

Marvel Comics just announced they’re suing Cryptic Studios, Inc. and NCsoft Corporation, the makers of the game City of Heroes. For those of you out of the loop, City of Heroes is an online role playing game that allows you to play as superheroes complete with superpowers and costumes befitting their super-ness. You can then take your newly created super hero along with thousands of other players and kick thug and villain butt in Paragon City for about $15 a month.

So why is Marvel suing them? City of Heroes comes with a well designed character creation system that lets you tweak nearly every aspect of your superhero: size, colors, uniforms, you name it and you can change it. Of course, this means that you can conceivably make a character that looks like an existing comic book hero, say The Hulk, give him powers just like The Hulk, and call him The Hulk, then let him loose on the virtual streets of Paragon City.

City of Heroes is a pastiche of every superhero thing the makers thought they could put into a video game. If there’s anything Marvel can rightfully be unhappy about, it would be people using their character names (for which they do have valid trademarks) in game. They’ve invested lots of time and money in Wolverine, and if a CoH character tried to play Wolverine like the comic book Wolverine, I would hope other players kick his butt for not being original using the superhero creation kit and absolutely give him the smackdown if he deviates even the slightest from character. However, I wouldn’t have an issue with a CoH character Wolverfellow that was a lot like Wolverine, that everyone knew was based on Wolverine, but that everyone knew wasn’t Wolverine but merely an homage to the character. For you CoH players, how many other players have you seen that look a lot like existing Marvel or other comic book characters? In general, how many comic book super heroes have you seen that have similar powers or seem almost exactly alike?

With respect to designing characters like ones that already exist, I say there’s a limited set of superpowers (ice, fire, energy, telekinesis, etc.) and so characters will repeat or at least seem a lot alike after the 10,001st one is created. Super strength and flying are common among superpowers, but that doesn’t mean that every superstrong flying character is modeled after Superman. And yes, DC Comics did sue Fawcett Comics in the 1940’s over the character Captain Marvel because his powers were too much like Superman’s. I’m certain that since then Marvel and DC have made other characters that were very similar but decided a lawsuit wasn’t worth the effort.

(An aside: Marvel and DC already claim they own the trademark “super heroes,” but I don’t know if that will hold up much longer. A quick search of the City of Heroes website turned up many references to the term “super heroes” but only on the user forums and in review quotes — none made by the game producers. You gotta wonder if they’re trying to avoid the term altogether so they won’t get sued.)

I guess this means that Pixar is Marvel’s next target. The powers of the characters in The Incredibles resemble those of the Fantastic Four (no spoilers: Elastigirl = Mr. Fantastic, their daughter Violet = Invisible Woman, and Mr. Incredible = The Thing) so Marvel should sue the shit out of them, right? Those characters and the other references, like the X-Men movie and characters, and, well, I don’t want to ruin the references, but there are many. They’re all homages, not bait for infringement lawsuits.

But my greatest worry is that this is the tip of an iceberg of intellectual property lawsuits. Should Izzy Stradlin be able to sue Fender Guitars for making instruments that other people can use to learn and imitate his riffs? Maybe the makers of City of Heroes can sue Microsoft for making tools that let other people make games like City of Heroes. Or anyone can sue makers of CAD software because you can design nearly anything with those tools.

They’re blaming the toolmakers, not themselves. We can’t have a system where we always point the finger at the toolmakers when the blame lies with the tool users. This doesn’t absolve the toolmakers entirely — they still have to act responsibly and reasonably when issues like this come up. So for everyone from Google to gun manufacturers, peer-to-peer application and video game makers, we need to distinguish between people who make tools, people who use tools, and the tools themselves. Analyze the situation and blame the dumbasses appropriately.

Marvel is the dumbass in this affair, not the toolmaker or tool users. Marvel was dying until their movies resurrected them (check how their stock rebounded after the X-Men movie was released in 2000). Get a clue Marvel. Be happy with the royalties you’re getting from the movies and leave it at that. If anything, you can get some accounts in City of Heroes to steal new character ideas from existing characters in the game. And then the game users can sue you for improperly appropriating their creations. We all know you haven’t created any good characters since the new X-Men in 1975 and you ostracized all your good artists in the 90’s who reacted by ditching your lame ass company. Stupid, stupid, stupid.

Just for that, I’m not going to pay to see the next Marvel character based movie. Then again, I didn’t pay to see Daredevil, Punisher, and Hulk either, but they all sucked ass, so maybe Marvel’s problems run deeper than just a video game.

Distributed Desktop Searching

Google’s mistake means everyone wins! And a few more people lose.

One of the biggest problems plaguing the business world is the unwillingness of people to contribute to group knowledge systems. Plenty of products are built to share information between coworkers, like Lotus Notes, where users must add their documents to the system or nobody will ever know about them. But users are lazy, so they either never add those documents, or put them in the wrong place, or don’t follow all the processes, making the document inaccessible or unusable. These systems often cost lots of money, don’t work as well as they claim, and require non-intuitive methods for getting their so-called benefits.

And then Google came out with their desktop search tool. Before you could say “hack this,” someone found a way to make those searches remotely. Now as irresponsible as I think Google is for throwing technology like that to the world, there could be an upside.

Most interesting documents never leave people’s PCs, and most groupware solutions have crappy search interfaces. So here’s my idea: distributed desktop searching. Install Google’s search tool on everyone’s PC, then install the remote search tool. Make an application that will take a search string, send the query to the PCs with the remote Google search tool, then assemble the results on a nice page. With the results, you can go to that person for the document or, even easier, click a resulting link and get the document yourself. This could put all those crappy groupware companies out of business and actually get you the files you need from your coworkers’ PCs.
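A bare-bones sketch of that fan-out might look like the following. To be clear, this does not talk to Google’s tool at all. The `search_peer` callable, the host names, and the fake index are all hypothetical stand-ins for whatever the remote search hack would actually expose. The only real thing here is the pattern: send one query to many machines at once, tolerate the ones that don’t answer, and merge what comes back.

```python
from concurrent.futures import ThreadPoolExecutor


def distributed_search(query, peers, search_peer, max_workers=8):
    """Send one query to every peer and merge the hits.

    peers is a list of host names; search_peer(host, query) is a
    caller-supplied (hypothetical) function returning a list of
    document names found on that host.
    """
    def safe_search(host):
        try:
            return host, search_peer(host, query)
        except Exception:
            # A PC that is off or unreachable just contributes nothing.
            return host, []

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_host = list(pool.map(safe_search, peers))

    return [(host, doc) for host, docs in per_host for doc in docs]


# Stand-in for real machines: an in-memory index keyed by host name.
FAKE_INDEX = {
    "alice-pc": ["q3 budget.xls", "budget notes.doc"],
    "bob-pc": ["vacation.jpg"],
}


def fake_search_peer(host, query):
    if host == "carol-pc":
        raise ConnectionError("carol-pc is offline")
    return [doc for doc in FAKE_INDEX.get(host, []) if query.lower() in doc.lower()]


hits = distributed_search("budget", ["alice-pc", "bob-pc", "carol-pc"], fake_search_peer)
for host, doc in hits:
    print(f"{host}: {doc}")
```

Swapping `fake_search_peer` for something that makes a real network call is the hard part, and also the part where the security problems live.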

Since Google’s and presumably Yahoo’s and Microsoft’s desktop search tools will all be free, what will this mean for companies who depend on disorganization and the inability to find information for their business? I’m talking about groupware companies, consultants, IT managers, SIMS Masters graduates…

It’s just an idea… the first of many I’ll put here. Feel free to ignore it.

Google Bashing

“Google hacking” gets a new connotation

I tried to install Google’s new desktop search tool, but the installer didn’t work. It said I didn’t have enough hard drive space to install it, despite the fact that I had more than enough hard drive space to install it. After submitting the bug report, I got a response a week or so later that essentially said too bad.

Their tool captures everything you do on your computer, including emails sent and received, browser history, and all textual information on your hard drive (except WordPerfect documents apparently). It can take that information and let you run searches on your PC just like searching on Google’s web site, then combine those results with a search of the Internet, reporting your query to Google as well.

Google is going to be deluged with individual search habit information to a degree that they’ve never seen before. They (probably) know how people search the Internet, but now they know how people search their own computers. And the resulting information and popularity of the tool will put Google years ahead of any of their closest search engine competitors.

I don’t want a search engine on my computer, regardless of whether Google gets my search information or not, so I guess I’m happy that the tool didn’t work. But faster than you can say “script kiddie,” there are hacks for providing remote access to your computer’s Google desktop searches. One of the sites that described the trick warned that you shouldn’t use it for malicious purposes. Like that’s going to keep the hackers from using this.

Let me explain my fear. Google releases a tool that lets you search (almost) every document on your computer including, say, your Excel spreadsheet that contains password lists, your cached browser page that has your social security number on it, or the email that you got with your username and password for a shopping website. Just Google your machine for “password” or “username” or “SSN” or “credit card number” or “billing address” and see what comes up.
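If you want a feel for how little effort this takes even without a desktop search tool, here is a small sketch that checks plain text files in a folder for those same search terms. It has nothing to do with how Google’s tool works internally. The function name, the restriction to `.txt` files, and the demo files are all my own assumptions, and it only looks at a throwaway temporary folder.

```python
import tempfile
from pathlib import Path

# The search terms suggested above, lowercased for matching.
SENSITIVE_TERMS = ("password", "username", "ssn", "credit card number", "billing address")


def find_sensitive_files(root, terms=SENSITIVE_TERMS):
    """Return {file path: [terms found]} for .txt files under root."""
    hits = {}
    for path in sorted(Path(root).rglob("*.txt")):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
        except OSError:
            continue  # unreadable entry, skip it
        found = [term for term in terms if term in text]
        if found:
            hits[str(path)] = found
    return hits


# Demo on a temporary folder with two made-up files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "logins.txt").write_text(
        "site: shop.example\nUsername: me\nPassword: hunter2\n", encoding="utf-8"
    )
    Path(tmp, "recipe.txt").write_text("two eggs, one cup of flour\n", encoding="utf-8")
    demo_hits = find_sensitive_files(tmp)
    demo_names = {Path(p).name: terms for p, terms in demo_hits.items()}

print(demo_names)
```

A real desktop index covers email, browser cache, and office documents too, which is exactly why handing remote access to it is so much worse than this toy.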

And now there’s an exploit that lets other people remotely query your machine using Google’s tool. People worry about getting a virus that turns their computer into a spam spewing zombie. Now you can worry that you’ll get a virus which will allow someone to search away on your PC for any information about you. I can’t wait for the first viruses that install Google’s new tool after infecting your machine. Just think of the rise in identity theft, stolen credit card numbers, cases of blackmail, and so on scaling in proportion to the rise of desktop search tools.

(Note: I’m calling this an exploit even if Google doesn’t (actually, I don’t know what they call it). If this was Microsoft, that’s what it would be called. As I see it, Google’s good name is the only thing keeping this off the radar.)

I think this could be the first of a series of similar tools that threatens privacy, security, and more. Well, maybe it’s not the first either. Gmail and other web-based email tools have a great exploit too — using search engines to answer the “security question” like “What’s your mother’s maiden name?” or “What’s your dog’s name?” when some of that information is easily searchable on the net. I know I’m not the first person to suggest that exploit, but what you should realize is that while the migration of search to the desktop gives you better access to your information, it also gives others better access to your information, your search habits, and, if used for bad purposes, your private information. Compared to Google’s desktop tool, RFID is just a UPC code.

Google scares me. Not because they’re evil, but because they’re throwing tools onto the ‘net without any regard for, well, without regard for anything as far as I can tell. The word “irresponsible” comes to mind. They’re like kids playing chemistry with the chemicals under the kitchen sink. Maybe there’s value in using the Internet as a research or marketing setting on a mass scale. But “beta testing” with anybody who wants to play with their tools means we can find the bad parts of their technology before they can fix them.

Now everyone is speculating on where Google is going next. Rumors include a Google branded browser or instant messenger. Google doesn’t want a browser. There’s enough competition in that market without Google; their toolbar is as involved as they want to get with the browsers. What Google does want is to be your portal to all the information on the Internet, your computer, and everything. They have two extraordinarily valuable assets besides their name – search technology and storage capacity. These assets stick out in all of their tools — the search engine, Orkut, Gmail, Froogle, image searching, etc.

If they are creating a “browser,” it’s not in any traditional sense of the word. I hate fortune telling, but I have a vision of something with IM and chat (based on Jabber that remembers and makes searchable all your conversations), community and social networking services (Orkut but using community information tied to their search engine info), email (Gmail), location based services (my sleeper prediction for their next avenue, eventually tied into community and general searches), and brute force searching power (including the not mentioned yet desktop and Internet searches) all built into a single (web?) application like Gmail. IBM had a prototype of parts of this in their Remail tool. Unlike IBM, if there’s anyone who can pull this off, it’s Google. And if Google can’t pull that off all at once, just watch the next few applications they release and you’ll see where they’re headed. Yahoo will be kicking themselves in the pants if (more accurately, when) Google gets to it first.

But if Google seems intent on throwing a new application to the world without some due diligence on their part, they’re only deluding themselves. And so I want to repeat my earlier comment. Google, the Internet is not your beta testing environment. You deserved more flak than you got after you released Gmail for the privacy concerns in that software, and I can only hope that your future technologies are put under even more scrutiny. Your glory days will not last forever, so you had better start thinking of new markets to wind your way into not based on your search or storage technologies. And Google, start thinking about social responsibility before you unleash these beasts into the wild.

Finally, when you get around to it, could you please fix that bug in the desktop tool installer? I’ve got some friends that I want to send it to so I can keep an eye on them…

The Law of Diminishing Interest

Sometimes, one opinion is enough for everybody.

Time for the much anticipated corollary of the Law of Diminishing Opinions.

The Law of Diminishing Interest

After opinions/statements/stories have proliferated about a topic, it will eventually be beaten to death such that no one will care about it any more.

Obviously this doesn’t apply to everything. Specifically, a few issues are so polarized, so passion invoking, that no amount of time will let it slide: abortion, hard-core Republicans vs. hard-core Democrats, are Bert and Ernie gay, stem cell research, is “Shiny Happy People” the worst song ever written. And personal blogs are an exception I’ll get to later.

So let’s take John Kerry and the swift boats mess. It was hardly a month ago that every news program dedicated at least one story to the newest revelation of this story. Everyone had an opinion on it, the opinions proliferated, and the story broke under its own weight. We all got so sick of hearing “turned chicken and ran from the fight this” and “shot some gook in the back that” and now we would rather live the rest of our lives without knowing this ever happened.

Music provides a better example. Why does it seem like there are only ten bands that play on MTV or pop radio stations all the time? Probably because there are only ten bands playing all the time. Popularity in music has limits; only so many bands can be popular at once, so once a new one hit wonder comes along, an old one has to go away. We’re no longer interested in that old song. Our interests have moved on to the next big thing.

Let’s move to blogging. First, you can already only keep up with a few blogs at once. You’re reading 30 or 50 RSS feeds and that already takes up hours a day, but at least it’s more manageable than checking 30 to 50 web sites a day. You can hardly keep up with those posts, so you skim most and only read a couple that matter most. Adding another blog is right out unless there’s an old one you can remove to make way for the new one.

Furthermore, you don’t want to read two blogs that cover the same thing, offering the same opinion. Just like the news, you can get all your news information needs sated from one, maybe two sources that you trust. Any further sources just rehash what you already knew. That’s why blogs have to differentiate themselves with witty opinions or pointless pining or random digressions. If you’re the same as everyone else, why should other people read what you write? Personal blogs are obviously an exception to this since they are unique as are their authors, but people with a penchant for daily posting can quickly become overwhelming…

And even within blog posts, bloggers’ laziness is evident in their behavior. The ultimate props you can give is a link to, and maybe a quote from, someone else’s blog. This results in a single story propagating throughout the blogging realm with lightning speed and with dulling repetitiveness. I can’t count how many times I’ve now read the story about Bush’s so-called wireless earpiece strapped to his back during the first debate. Actually, it wasn’t an earpiece. The battery pack that keeps Robot Bush animated slipped out a bit, and the operators couldn’t come on stage during the debate to slip it back in place.

The blogging echo chamber is the worst manifestation of the Law of Diminishing Interest. Here we have a medium that prides itself on interconnectivity and information proliferation. The result of this is repetition, unoriginal commentaries, and shameless self promotion. While the Law of Diminishing Opinions tells us that fewer and fewer new thoughts will proliferate the longer a topic languishes, the Law of Diminishing Interest tells us that we will care less and less about those opinions as time goes by. That’s not to say none of those tail opinions is interesting, just that they’re lost in the noise as a result of bad timing. This reinforces the value of breaking news stories and being quick with responses to current events, both of which the blogging world is very good at. My point is that as opinions start flowing out about an event, we dilute the value of any one of them because there are so many opinions written and we can’t spread our attention that thin.

Surprisingly, these problems are partly solved because of our limited attention span — the fact that we can only keep track of so many (or few) sources of information at once. With so many information sources available, we ratchet our own information filters very high so that we don’t become overwhelmed by keeping up with two hundred web sites a day. The end result is a Zipf distribution (a power law; on a log-log plot it’s a line with a slope of -1) of traffic for the Internet as a whole and (as I hope to experiment with soon) for blogs. Just like Bush’s tax cuts, the top 0.01% of sites get nearly one-third of all traffic. (Porn is different. A small number of sites (say, 1, 3, 5) is enough for most Internet search needs, but we need many more pages of porn to fill our, um, needs. Geoff Nunberg made this observation in one of my classes.)
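For the numerically inclined, here is a rough sanity check of that traffic claim. It is only a sketch under loud assumptions: traffic to the site ranked r is taken to be exactly proportional to 1/r (a pure Zipf curve), and the figure of one million sites is a number I made up for illustration, not a measurement. The function name is mine too.

```python
def zipf_top_share(num_sites, top_fraction):
    """Share of all traffic captured by the top `top_fraction` of sites,
    assuming traffic to the site of rank r is proportional to 1/r."""
    # Number of sites in the "top" group (at least one).
    top_count = max(1, round(num_sites * top_fraction))
    # Zipf weights with exponent 1: rank 1 gets 1, rank 2 gets 1/2, and so on.
    weights = [1.0 / rank for rank in range(1, num_sites + 1)]
    return sum(weights[:top_count]) / sum(weights)

# Hypothetical web of one million sites; the top 0.01% is then 100 sites.
share = zipf_top_share(1_000_000, 0.0001)
print(f"Top 0.01% of sites get about {share:.0%} of traffic")
```

With these invented inputs the share lands a little over a third, which is at least in the neighborhood of the figure quoted above. Change the number of sites and the answer moves, so treat it as a plausibility check rather than proof.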

But this distribution of blogging traffic means you put your blinders on. We like news sources that reflect our own ethics and political views so we congregate to those sites. So those few sites you do read are ones that reinforce your world view rather than expand it as the Internet idealists would prefer.

And thus I offer you a challenge. Start reading something that completely appals you. Democrats — try the National Review. Republicans — how about The Nation? Undecided or Independents, read The Week or The Economist depending on how short or long your attention span is, respectively. LaRouche or Nader people, I assign you to read The Constitution. I can happily admit that there is no greater educational experience than understanding your enemies. Not only will it reinforce your beliefs but hopefully it will also make you realize why you believe the things you do.

Simply put, the Internet has too much information to be useful unless you narrow your eyes a bit. This could be the ten pages you read most, the five search results you check after you’ve entered your query, or the several dozen pages you go to for your porn needs. Blogs are even more guilty of this than most, primarily due to their explosive growth (more on blog growth in future posts). Certainly we need better tools and methods for filtering what’s out there to a usable level. But for now, I think we would all be best served by having better porn search tools. Regardless of how interests diminish for most of the web, I think I can safely say that interest in pornography is something you can count on far into the future.

Logic flaws and gullibility

When does 1/2 and 1/2 equal 1/4?

Watching the MacNeil/Lehrer NewsHour a while back, I saw two members of the Senate Intelligence Committee talking about the report they completed regarding intelligence failures relating to the war in Iraq. They claimed part of the culprit was a groupthink mentality where everyone viewed the evidence with a predisposed conclusion that weapons of mass destruction must exist in Iraq. My immediate reaction was how 1984-ish the term “groupthink” is and whether or not I should just tune out the report altogether.

But then I listened a little more and was surprised at what was said. Apparently all the reports about unmanned vehicles spraying deadly chemicals or reconstituted nuclear arms programs or mobile biological weapons factories were tagged with caveats that were ignored to reach the conclusion that Iraq must be doing something bad. In other words, there was a possibility — or better yet a probability — that there were no unmanned vehicles or nuclear bombs and the like.

And finally the reasoning and logical part of my brain kicked in. If there were warnings that these reports could be false, then the probability of them all being true is less than the probability of any one of them being true. Remember probability? Take two bins, each with half red and half blue socks. You take a sock from one, then a sock from the other. The probability of picking two red socks is… 25%: the two draws are independent, so it’s 50% for each bin and you multiply the two together. In other words, it’s less likely you’ll draw two red socks than one red sock from a single bin.

Replace “red” with “true” and “blue” with “false” in my example above. If the reports of Iraq’s weapon stockpiles were each possibly true or possibly false, then the likelihood that all of the reports were true is less than the probability of any one being true. This triviality of mathematics didn’t stop the government from presenting all of this evidence to the United Nations as fact and reason for war. Nor did I see any other nation call the U.S. on this. I suppose there weren’t any math majors as analysts in either the U.S. or any other countries.
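If you want to see the sock arithmetic spelled out, here is a minimal sketch. It assumes the claims are independent of one another (which real intelligence reports may well not be), and the 50% figure for each report is invented purely to mirror the sock bins. The function name is mine.

```python
from fractions import Fraction

def prob_all_true(probabilities):
    """Probability that every claim is true, assuming the claims are
    independent: multiply the individual probabilities together."""
    result = Fraction(1)
    for p in probabilities:
        result *= p
    return result

# Two bins, each half red: the chance of drawing two red socks.
two_red = prob_all_true([Fraction(1, 2), Fraction(1, 2)])
print(two_red)  # 1/4

# Hypothetical: three caveated reports, each given a made-up 50% chance.
three_reports = prob_all_true([Fraction(1, 2)] * 3)
print(three_reports)  # 1/8
```

Every extra uncertain report you stack on shrinks the chance that the whole pile is true, which is the point of the paragraph above.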

All you conspiracy theorists need to take some math lessons too. Recent scares are building up to the moment when a world government will form and stamp out all freedoms? Chemicals in the water and subliminal messages on TV keep us subdued and pacified? Tin foil hats can reflect electromagnetic waves that aliens send down to brainwash us? That’s about as likely as frozen Walt Disney driving around America with Spuds MacKenzie and zombie Ronald Reagan running people down in their ’68 Cadillac Eldorado convertible.

We as humans somehow buy into these conspiracies. Magic bullets, the Illuminati, Santa Claus — they’re exciting compared to the dullness of reality. We’re willing to suspend our disbelief even if a situation is completely improbable. We’re creatures of suspicion; the simpler an explanation is for a situation, the less likely we are to believe it’s the real answer.

Rather than appeal to simple reason, we argue from fear, misunderstanding, and complexity. We’ve been doing this for ages, holding on to ideas that we laugh at today — that the sun revolved around the earth; that UFOs crashed in the desert and were taken to Area 51; that Bert and Ernie are gay.

It’s harder to accept a simpler explanation due to pressures at the time keeping those explanations as the “truth” — because the earth was created by God and therefore must be at the center of the universe; that the government would obviously cover up the alien landings with a blanket of lies; two men who share a bed for decades (yet miraculously stay the same age) must be gay.

The reality of the situation may be boring, but at least we’re more sure of this than the previous theories — the earth revolves around the sun; a high altitude weather balloon crashed in the desert; Bert is a figment of Ernie’s imagination like that character Brad Pitt played in Fight Club.

Besides the human instinct to believe the unbelievable, two other related culprits are at work here. The first is an error of selective judgement where, given a set of facts and observations, you come to a conclusion that isn’t supported by those facts and observations. You can omit parts of your observations when coming to this conclusion, but the error is entirely in your reasoning about those observations rather than the observations themselves. You know, like how O.J. Simpson got off for murdering his wife. Of course he did it. All the evidence pointed to him. The gloves were “too small?” Yeah, right. The only person who didn’t know that O.J. did it was… well… who didn’t believe that O.J. did it? I rest my case.

The second and far worse error is selective observation. Rather than coming to the wrong conclusion given a set of facts, the result of selective observation is a set of facts and observations that can lead only to a specific conclusion. Often this information is skewed, removes any observations that don’t support the conclusion, or even has fabricated information inserted when the results didn’t come out as expected. How about the Kennedy assassination? We all know there had to be multiple gunmen, but the government only believes in magic bullets so that’s what they concluded. Maybe once all the people involved die we’ll find out the truth.

The reason selective observation is worse than selective judgement is that when you make the wrong judgement, you can always go back to the facts and draw a new conclusion. When your observations are skewed, then there’s no way to guarantee a correct (or at least a better) judgement from those observations. In other words, the information that you chose not to include (or to include incorrectly) means all results based on that information are flawed.

Taking this back to the case of war in Iraq, certainly there was selective judgement on the part of the Bush administration to take and present the intelligence as fact without caveat. From the perspective of everyone else, we cope with the observations given to us by Bush et al.; if we’re to believe what the government tells us, we have no other conclusion to draw except that Iraq has weapons of mass destruction. Unless, of course, we believe the U.N. weapons inspectors.

Maybe given more time, the U.N. inspectors would have turned up something. So here’s the final flaw that felled Bush’s arguments. Given this premise — Iraq has weapons of mass destruction — proving the negative is much more difficult than proving the affirmative. A word-bender for you: we can never be certain that Iraq does not have weapons of mass destruction. Think that over a few times. Said differently, we can never be certain that Saddam was right when he said Iraq destroyed all its WMD. However, we can easily prove they did have WMD simply by finding them. Some might claim that’s impossible too, but I say proving Saddam right is the more difficult proposal of the two. We can keep searching Iraq forever for WMD and never find them. And we will never be sure whether there were ever any WMD unless we find them.

Most people probably won’t think this deeply about Iraq, probability, logic errors and the like (nor work themselves into confusion like I did in that last paragraph). However, I would desperately hope that our government is doing this kind of thinking. I like to believe that the steady decline in Bush’s job approval and agreement (or rather disagreement) on whether the country is headed in the right direction is the result of the American people grading him on his logic and coming to a new conclusion of their own.

My political persuasions aside, the lesson here is to please take the time and become more math and logic literate. There is no better pleasure than laughing in the face of a person who can’t make a coherent argument or understand facts, statistics, and probabilities. Or make up words. Like “Kosovians” or “resignate” and “subsidation” or “subliminabable.” Because even if George W. Bush doesn’t excel at logic or statistics or forming cohesive statements, at least we know he’s creative.

Politics and technology

When blogging plays Hardball

It’s that time of year again. Elections are gearing up. Mudslinging ads are the same but come with the tagline “I’m (so and so) and I approve this message.” Political strategists and analysts get raging hard-ons big enough to knock out hanging chads.

In these times of debate and billion-plus dollar spending on campaigns, I find myself hooked on political coverage. Seriously — this stuff is the pure crack of our governmental process. When politics and technology mix, well, let’s just say it’s a mindblowing experience.

So feel free to check out the MSNBC Hardblogging web site at your own leisure. In short, it’s NBC correspondents blogging about the upcoming election and political party conventions. Sure, plenty of people are blogging about politics right now. However, most TV audiences and certainly these NBC folks are not acquainted with blogging. You can read some of the comments by authors and emailers as evidence of this. I suppose that’s why it feels more like a diary-type blog than any other.

I see Hardblogging as evidence of a change in the media’s understanding of their own business and the Internet. Without digressing about the politics of Fox News, their opinionated hosts no doubt are a large reason why people are attracted to their shows. I would love to see evidence of how the news world reacted to Fox News by inspecting the programming of CNN, MSNBC, and others before and after Fox went on the air.

What I mean is that news propagates at an insanely fast speed thanks to the Internet, cable news stations, cell phones, news choppers, and telegraphs; every news station will report on breaking news at about the same time with about the same information. The result is that news stations have to find other ways to differentiate themselves from their competition and increase their ratings. Fox obviously has its own way of drawing viewers. ABC News has promised their “ABC News Now” or something like that, trying to blitz people across all mediums — Internet, cell phone, TV, semaphore. CNN got the aid of Technorati in their blogging creation and monitoring efforts, but CNN’s blogs appear to be written by web site staffers rather than their TV personalities. BBC News is blogging from Boston. There was even a bloggers’ breakfast at the convention.

Is blogging the answer to the news media’s uniqueness problem? No. Besides the diary blog entries, the others are already very much like the “daily emails” you can get if you sign on to these news personalities’ web sites, or maybe an op-ed piece in a newspaper. Blogging doesn’t draw viewers the same way that opinionated news personalities do.

The question then is this — what viewers are the media trying to get by blogging? Younger audiences (think 18-35 yr. olds) are now using the Internet as a primary news source rather than TV and print. Certainly they’re the ones most familiar with blogging. Even though news organizations are playing catch-up, they still can recapture those people if they wisely pay attention to the news consumption habits of that demographic. Older audiences seem intrigued by blogging as well. Could this be the indoctrination they need to become part of the blogging culture? Remember that blogging and creative uses of the Internet were a large part of Howard Dean’s formula for success in the Democratic primaries.

Speaking of demographics, I would love to find out more about the kind of people accessing the Hardblogger site. Average age? Previous experience with blogs? What drew them to the site? How does the blogging experience differ from getting news via plain old broadcasting or newspaper? Are there TV watchers who would like to see the web site but can’t?

Regardless, the problem here isn’t with the news media trying to break into new formats or get new audiences. Rather it’s their lack of understanding about how people consume news and other information and, more important, how people want to consume their news. There are times we want it hard and fast, and others deep and long. However, we aren’t given that choice by most news outlets; they present it superficially and at a 7th grade level except on extreme occasions (like election coverage or September 11th).

This perhaps explains the rise and popularity of news blogging. We’re tired of the dried-up, half hour versions we get at various times during the day. We’re also tired of all news, all the time — which is really just the same news over and over again every hour. And in both cases, they still talk down to you as if you’re a baby.

In better news, I think we’re finally past the point of expecting our news to be unbiased and impartial. Viewers perceive the media as generally liberal, whether it’s true or not. Maybe there are journalists out there who try to be objective, but in our post-modern times we’re aware of and try to see through the spin. Even the media themselves, in a recent Pew Institute study, labeled themselves largely independent but with more liberal than conservative reporters (emphasis on labeled themselves). I would go further to say people are actually interested in getting opinions as part of a deeper analysis of their news (see Fox News above). How many times have you turned on the O’Reilly Factor just to get angry or enjoy what he says? The popularity of opinionated news blogs could also be evidence of this.

To all of you blog-watching types out there, keep your eyes on this example. This will not be the first break-out of blogging into a new audience, and there certainly will be bigger experiments to come, but election time blogging is unique enough to warrant special attention.

Also, keep an eye on how the news outlets adopt the Internet and its related technologies. They’ve been very conservative so far in their approach to the Internet; a news web site reads similarly to a newspaper. With blogging, faster speeds (think video and audio streaming), messaging and forums, and more advents to come, it’s about time they realized that the Internet can be more than a reproduction of the TV (or newspaper or whatever) arm of a news network. There’s still room for a traditional Internet face, but they will flourish once they realize the value of their archives, backstage activity, and opinions made available through the Internet. The real revolution will come when the news media can activate their valuable audience — getting them involved in presenting, discussing, and debating the current events of the day. What better way is there to capture an audience than to make them part of the show? And maybe that best of all explains the media’s interest in blogging.

For now, take joy in the electoral process since it is the culmination of our democracy. Oh yeah, give Hardblogger a few hits and watch it unfold. I only wish they would wise up and offer an RSS…

The Cost of Privacy

It’s about five dollars

This saga starts at the San Francisco Farmer’s Market. It’s held at the Ferry Building, at the north end of Market Street in downtown. Every weekend hundreds (thousands?) of people and dozens of booths make this a nice place to do your shopping for fresh foodstuffs if you live in the city. There was no lack of tasty treats for whatever your appetite desires from what I saw.

As I was walking over to the building, I passed by some artists showing their wares and a randomly parked BMW Z3. The BMW was being offered in a raffle. The only requirement for entry is writing down your name, phone number, address, and a couple of other random pieces of information (like age, email) on a little sheet of paper then putting that paper in the appropriate box.

That’s all? Just a little information about myself? Hm… Well, I’ll probably get some telemarketing calls and maybe some junk mail, but for the chance to win a nice, new BMW…

But wait a minute. They’re giving away a BMW for nothing. I mean, I’m not stapling a $5 bill to the sheet I drop in the box. And the car costs a lot of money. I don’t know exactly how much, but I’m sure it far exceeds what’s in my bank account right now.

So, the BMW-giver-awayers must be getting something of value to cover the cost of the car, right? In other words, if all I’m giving them is my name and such, then that must be worth something. Something as in dollars.

Like five dollars.

Random uninformed numbers to make my argument seem logical: Say they’re giving away a $25,000 car. 5,000 people enter. That means the value of each name is about $5, and certainly more because the people giving away the car must be making a profit on our names and information or they wouldn’t have much incentive to invest the $25,000 in the car in the first place.
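In case the back-of-the-envelope math needs spelling out, here is a tiny sketch using the same made-up figures. The function name is mine, and the result is only the break-even value of a name, the floor below which the raffle would lose money.

```python
def break_even_value_per_entry(prize_cost, entries):
    """Minimum each entry must be worth for the raffle to pay for the prize."""
    return prize_cost / entries

# The invented numbers from above: a $25,000 car and 5,000 entries.
value = break_even_value_per_entry(25_000, 5_000)
print(f"Each name has to be worth at least ${value:.2f}")
```

Fewer entries or a pricier car pushes the per-name value up, which only strengthens the argument.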

Your personal information has value. Whole industries are built on this — collecting information about you, what you buy, your demographics, your friends, and what raffles you enter where you give out your name and address.

Another example: Some guy was out on the street giving away free Domino’s personal pizzas to whoever would fill out his little form, presumably for a credit application or something similar. Again, that’s about a $5 investment, and because we students are high risk for credit card companies, the company is sure to make back that $5 in interest payments. And I was hungry too…

While those are examples of giving out information, most “invasions” of privacy are much more subtle: associating your credit card number with your grocery purchases to build a profile of your shopping; using cookies or spyware to track your web surfing habits; the cameras the government planted in my glasses to keep track of everything I see and do.

Now that I think about it, those “savers cards” you get at the grocery store that they use to track you usually save me about $5 when I use them…

Legally, you don’t have a reasonable expectation of privacy in public spaces. While I agree, I think that anytime it takes more than an individual human’s effort to track what you do, it should be illegal. In other words, computers have made “invasions” into privacy much easier — storing your purchases in a big database available at the click of a mouse. If a person was following you around, writing down by hand everything you did, I would be OK with that. But if it took two or more people to do it, or one person plus a computer (scanning your groceries at the register), then no — that’s too much. Anyway, I think it would be funny if you went to the grocery store, picked up a basket or cart, and then picked up your person to follow you around and record all your purchases.

But again, we value our privacy, and our privacy has a value. Therefore, that grocery store guy can follow me around and write whatever he wants, but I get a free pound of tuna steak. That’s right — tuna steak. I can’t afford the good eats on my income (or lack thereof). My grocery store privacy is worth at least $15 of tuna.

My Internet privacy is up for sale too. You can have it at the small cost of a high-speed dedicated connection (DSL or cable modem, your choice) plus $1 per hour of surfing payable to my PayPal account or in Amazon.com gift certificates. And I promise you it’s good stuff too.

Anyone else interested in buying other aspects of my privacy can inquire via email using the email link at the main menu. Other suggestions include: friends and associates, music listening, TV watching, sleeping, and eating habits (including restaurants, alcohol, and snacks — it’s a bargain!).

What? They’re already tracking that information? Shit… In that case, live streaming videos from my apartment are available at the low, low cost of $5 a month. Though I promise you nothing worth even that much is going on here…

Random reactions from CFP

You knew this was coming…

My fear, noted in my previous rant, has been realized. In short, the choir was in attendance at the 2004 Computers, Freedom and Privacy Conference and was summarily preached at. Not even Slashdot, stalwart of (libertarian?) technology news, had a story on CFP. I suppose conferences are not the proper venue to invite the general public to learn more about these issues.

A friend of mine would quickly add that the troops need time to discuss strategy among themselves, to be made aware of the goings-on in their individual camps. While I agree, this raises the question of when the focus should change from rallying the troops to stirring passion on the home front.

The realization that my fears had come true occurred as I was having a conversation with a non-technical non-lawyer after the conference proceedings. He noted the lack of a primer for people interested in these issues such as himself; even if you’re interested in the issues, most of the conference will go over your head if you don’t understand the vocabulary we use or if you’re not aware of the current events or if you don’t know the laws and policies involved.

So several of us will take it upon ourselves to find a solution to this problem. This gets back to that previous rant, namely coming up with ways to get other people to care. Education is a necessary part of that. The lack of pedagogy is alarming to me, and (of course) I defer voicing my opinion on pedagogy until some future rant.

But here are the major themes as I saw them that were presented at the conference as well as questions I was left with, including extra cynicism (cynicism you’ve all grown to love and cherish by now I hope…).

  • The lack of coordination between law, policy, and technological efforts

    I think someone at every session suggested or outright said that we need more interaction between the different camps (lawyers, policy makers, technologists, industry, etc) to reach better solutions. But wasn’t this the point of CFP in the first place — to foster exactly this communication? If so, then why is this communication not happening? My guess is that we’re too busy in our own little worlds to find time to do this large scale coordination… Maybe they’ll fix this by the next CFP (yeah, right).

  • “National security” as the new Catch-22

    Any time that something seemed questionable, like collecting databases with info about you and using them together to determine if you’re exhibiting terrorist behavior, the “national security” exception was invoked. You can’t question this without being unpatriotic, and no patriotic person would question the need for greater national security, right? “National security” is also like that newsgroup law about Hitler — as soon as you mention Hitler, the conversation is over because there’s nothing you can say to come back against Hitler (just like from Office Space — “You know, the Nazis had pieces of flair that they made the Jews wear”). Once someone cites “national security” as the justification for action, all other arguments lose merit.

  • Lack of research funding, interest in pursuing such research, and research in the wrong areas

    Doug Tygar, U.C. Berkeley professor, hit the nail on the head with this one. Nearly all of the problems presented at CFP would benefit from deeper research. Not only do we need to find money and people to pursue these topics, but we should also trim other less fruitful but related areas of research (trusted computing comes to mind).

  • “Clarifying” versus “disagree”

    The conference was remarkably civil, despite the very brave representatives from Microsoft, DirecTV, the Bush Administration (the Department of Justice), and more willing to play in the lions’ den. Only rarely did the civility break down and people get a little angry. But the essence of this comes down to one word: “clarify.” If I disagree with you, I start a comment with “I want you to clarify…” rather than “I don’t agree…” because I assume some people have to try really hard not to start a Jerry Springer-like moment while on a conference panel, no matter how funny such an event would be.

  • Definitions

    For a conference called “Computers, Freedom, and Privacy,” I only know what computers are. Freedom and privacy are too broad and relative to discuss without having definitions for them. This is especially important as those definitions change in different contexts — privacy in email is much different than privacy in web surfing habits. I’ve got a rant around here somewhere that deals with the definition of freedom — I’ll get around to it some other time…

  • Unsophisticated users, sophisticated systems and laws

    Users are stupid. Technologies are too complicated and arrive too rapidly for individuals to learn to use them successfully with respect to laws and privacy and the like. Also, most users don’t understand laws, licenses, copyrights, and legal issues surrounding these technologies. Are we somehow responsible for teaching people these issues? Or should we aim for the lowest common denominator and dumb down technologies and laws? I am truly at a loss for how to solve this, but get me a bottle of whiskey and a computer and I’ll gladly provide words on the subject… or just wait a few more weeks until I formulate a legitimate opinion and put it up here.

  • The Internet and computers have a long and deep memory

    Exemplified by Gmail (1 GB of email storage) and the Internet Archive (storing the web since 1997), people don’t realize that pretty much everything that we do on the Internet or computer networks is stored somewhere. Even if this is mostly limited to the web now, it will very quickly expand (if not there already) to email, instant messaging, voice-over-IP (Internet telephony), and all other present and future net communications, and even things we don’t usually associate with computers like your purchasing records, travel plans, health information, financial statements, and so on. The scariest example of how information like this can be used against you at the conference was how your credit information can be used to deny you insurance or even jack up your rates if your credit history makes you seem like an at-risk individual or how that credit info can be used to discriminate against groups. Even though nobody explicitly stated the problem as I did here, this is the sleeper issue that worries me most from the conference. Information can be used against us just as it can be used to help, but what if information is permanent and can be collected from disparate sources? Think if Orkut and Gmail shared information and fear Google. I think I’ll revisit the information permanence problem later…

  • Can or should digital technologies reflect analog systems?

    This is the big question I left with after the conference. Many of the presentations implied that while we want the benefit of digital systems, we also want all the capabilities of physical world systems. For example, electronic voting systems should have some human verifiable or auditing method for performing recounts just like recounting paper ballots. Can we really have it both ways in every case? If we can do it, that doesn’t mean we must do it…

  • The philosophers, social scientists, economists, average joes, etc…

    Much of the conference focused on three things: law, policy, and technology. This ignored many important other parts like social repercussions of technology, law, and policy change, economic aspects that affect the development of such things or measurable results of changes, philosophies that underlie our beliefs, or even the beliefs of everyday people (not in attendance at the conference). Many important opinions and points of view were missing from the conference as a result. Hopefully future CFPs will take this into account when inviting panel participants. On a similar note, I don’t think I heard the word “ethics” mentioned once, even if that was the subject of nearly every discussion.

I leave you with the eternal question that plagues my mind: If nobody cares about these issues, should we do anything about them? Most people scoff at the question — “Well of course we should do something” in the kind of way that implies we know what’s better for them than they do for themselves. The deeper implication of the question is whether or not everyone should care about these issues. If so, what can we do to achieve that?

Answers, of course, are left as an exercise to the reader.