Categories
BlogSchmog

Freebasing Info

I finally received my Freebase invite over the weekend.

Tim O’Reilly blogged about this last March, and I’ve been patiently waiting for my invite in the interim. It came (along with one from Spock) while I was under the dark side of the Internet moon the past few days. After viewing the initial tutorial video to get my bearings, I’m ready to jump into this communal database project as a fall side project.

Freebase (“Free + Database”) follows the basic wiki model of a communal editorial staff of writers feeding the site. Unlike a wiki, though, the content being captured is encapsulated in a structured database. Members don’t compose open prose about a place, event or person; they enter small cells of data that are structured to ease retrieval and re-use. One of the big motivations for this project is not consumption of data at the Freebase site but rather application of that data through third-party development using the Freebase API.

As with Wikipedia, contributing members spew forth information into the community site about films, sports, politics, music, science and anything peripherally connected to that knowledge. The structured data not only allows for easy retrieval but also is a nice way to explore and improve the site content. Since everything is cross-linked by content and data type, a navigable network forms as you edit. All Freebase data is licensed under Creative Commons, so its use only costs an attribution link.
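To make the idea of typed, cross-linked data a little more concrete, here is a minimal sketch in Python. It is only an illustration of the concept: the `Topic` class, the type names, the sample records and the `linked_topics` helper are all hypothetical and are not Freebase's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A hypothetical structured record: a name, a type, and typed properties."""
    name: str
    type: str
    properties: dict = field(default_factory=dict)


def linked_topics(topic, topics):
    """Return the topics that this topic's property values refer to by name."""
    by_name = {t.name: t for t in topics}
    found = []
    for value in topic.properties.values():
        # Treat single values and lists of values the same way.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, str) and v in by_name:
                found.append(by_name[v])
    return found


# Illustrative data only; none of it is taken from Freebase.
town = Topic("Bloomington, Indiana", "town", {"state": "Indiana"})
person = Topic("Example Resident", "person", {"hometown": "Bloomington, Indiana"})
film = Topic(
    "Example Film",
    "film",
    {"filmed_in": ["Bloomington, Indiana"], "cast": ["Example Resident"]},
)

topics = [town, person, film]

# Following the links from the film leads to the town and the person.
print([t.name for t in linked_topics(film, topics)])
```

Because each value sits in a named property rather than in a paragraph of prose, following links from one topic to the next is a simple lookup, which is roughly what makes the network navigable.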

At the moment, the young project boasts data profiles on 356K people, 60K towns, 23K films, 9K books, 305K musical acts and just 600 websites. This is likely the result of targeted data scrapes and dataset uploads, as well as community “data mob” projects, like the current Hometown Pride effort to expand the depth of member locales. But it is impressive that there are already 902 types of data and some 2.4 million topics created.

The Freebase project is an extension of Metaweb, a San Francisco company dedicated to building a better infrastructure for the Web. The Metaweb team includes people with experience working on Netscape, Alexa, Intel and Broderbund. More information is available in their FAQ.

On the surface, Freebase seems like a latecomer challenging Wikipedia to become the open repository of knowledge for the globe. It might also be viewed as an effort redundant with the Open Directory Project, a comprehensive human-edited directory of the Web, or even AboutUs, a one-year-old wiki of websites. However, Freebase is in many ways an extension of these tools or perhaps a bridge between them. It would be wonderful if duplication of data in Metaweb could allow for more and better hooks into these other tools by connecting the things that do overlap.

What will be most interesting to observe as Freebase grows is how its content ultimately reflects use of the Metaweb data through the third-party API. If a car enthusiast site, for example, suddenly started leveraging Metaweb, then the Freebase site might become a leading resource for information about both cars and the people excited about taking care of them. Locally, if Bloomingpedia (celebrating its third year of existence detailing our community history) were to find a way to port its data into Metaweb, the view of the world as seen through Freebase could have Bloomington, Indiana as the center of the universe. The inevitable attraction of spammers and legitimate business marketers will also be a sign of growth and acceptance. As an informaticist, I think a study connecting use of the API with the body of data might be an interesting one.

As with most alpha tools, there are glitches and aggravations. I wasn’t able to successfully upload an avatar image for my profile. Search completed with no results, but a “Search in progress…” message remained on screen. The link to Woodstock, Illinois worked (though it took a long time to load) but the one to Bloomington, Indiana did not. The feedback tool also generated a “transient server” error, so I couldn’t even use the site to let Metaweb know about these things. Growing pains, all. It doesn’t detract from the promise this new community might have in organizing Internet content.

Related news: There was apparently an explosion in downtown San Francisco yesterday that knocked out power, affecting Freebase a little. No data was lost, except whatever was in the Sandbox at the time. Yet another reason I am glad I could fly directly to San Jose.

By Kevin Makice

A Ph.D. student in informatics at Indiana University, Kevin is rich in spirit. He wrestles and reads with his kids, does a hilarious Christian Slater imitation and lights up his wife's days. He thinks deeply about many things, including but not limited to basketball, politics, microblogging, parenting, online communities, complex systems and design theory. He didn't, however, think up this profile.

5 replies on “Freebasing Info”

Thanks for the great write-up, Kevin. Freebase has a lot of facets, and you really did a nice job of covering the big concepts.

Two thoughts: 1) If you have contacts at Bloomingpedia, and they’re interested in getting their data into Freebase, shoot me a note and I’ll see if we can help make that happen. 2) Although we are in alpha, and errors do happen, you encountered far more trouble than we expect at this stage. We’re looking into it! Meantime, if you hit any additional problems, please don’t hesitate to let us know.

Finally, as I’m sure you know, you’ve linked to a handful of pages in Freebase that, at this point, people can see only if they have accounts. If any readers here would like access, drop me a note: sarah at metaweb dot com.

Best,
Sarah

Thanks for the feedback. I’ll definitely ping the Bloomingpedia folk about interacting with Freebase. I have a feeling this year at the School of Informatics will have more focus on implementation by ad-hoc student groups, and helping connect the local content and the Metaweb API might be a great group project to put on portfolios.

Maybe the restricted links will encourage people to register and join the community. If so, then I meant to do that.

Interesting. Actually, I had thought about having Bloomingpedia be more structured originally, but in the end simplicity won out and a wiki made the most sense. This idea really took off [1] back in 2004 as something called the EverythingDB that I was going to make for Kevin Beachamp’s Cybrcaf. But I realized while making the schema that trying to make a database for everything was needlessly complex. I came across SeattleWiki and thought it works much better because it doesn’t require people to learn much. And that gives people the freest access to knowledge and ease of contributing more knowledge. It’s enough to ask someone in any community to spend some time contributing content; if they become obstructed by needing to learn how we’ve structured the site, it makes them that much more likely not to contribute.

I decided in the end to make Bloomingpedia a wiki like Wikipedia. I actually chose Bloomingpedia over a name like Bloomington Wiki or something like that because I didn’t want to tie it to any one technology. So using something like Freebase to start a repository of more structured information is fine; maybe we could have something like db.bloomingpedia.org. The point is that Bloomingpedia can grow beyond just being a wiki. Like I said in my presentation on Saturday, there is room for ancillary projects that enhance Bloomingpedia.

The thing is, what is Freebase really? Are there any examples? Are people using it yet? Wiki is here already and proven and took about 12 years to become mainstream. Surprisingly, it existed before the first web explosion, but it took until the second one to become mature and useful.

In the end, I think what visitors get out of the site is more important than what goes into it and how it’s structured. Will something like Freebase be more or less useful to people, or does the normal structure of a Wikipedia/Bloomingpedia article satisfy 95% of the visitors? I don’t know that yet, but I’d be inclined to believe that effort put into Freebase will be useful to only that 5% or so. Plus, will they even allow the software to be used independently of their site like the Wikimedia Foundation does?

Also, for a site that quotes Stewart Brand, it sure is hard to get at the information that is there now. Sounds more like vaporware. Plus, from reading about it on Wikipedia, they don’t use a relational database for the backend; they use a graph database. Other database structures are OK, but I find that they are often chosen to avoid designing the database well. You don’t need other database types to do what you can do with relational databases. You just have to think about what you want to do. When I made the relational schema for EverythingDB, there were some challenges to overcome, but nothing that wasn’t solvable with planned design. I think what it really took was trial and error. I can show you what I did sometime if you want.

[1] – Actually, I’ve thought about the concept behind Bloomingpedia, making more detailed documentation about a locality, since I was a teenager. But the web wasn’t popular back then. Bloomingpedia seemed like such a natural thing to do, though, when I came back to the idea three years ago. I enjoy it beyond just managing it.

Hey, Mark. I’m the community manager for Metaweb, and I can answer a couple of your questions about Freebase.

First, I’ll note that this isn’t a sales pitch! If Bloomingpedia’s current software covers the users’ needs, that’s great. If at some point, you or the community wanted a more structured adjunct to what you’re doing, or one connected to other data sources, Freebase would be an option.

We do let you use the data in the system for applications hosted elsewhere. In fact, supporting those efforts is a core focus of ours, and we have an API and a templating language to help people build sites with Freebase data. In addition, all the data in the system is available under a Creative Commons by Attribution license, which means it’s open to anyone to use. The Fb software itself isn’t currently licensable by anyone the way wiki software is, but with our system, the data is the interesting part, not the input mechanisms. (Of course, if a lot of people tell us they want to use the front end on their own sites, we’ll look at that.)

There are a handful of Fb-powered applications in the wild, but I’ll admit that they’re not really open yet: to use them, you need an account on Freebase, and we’re still in an invite-only alpha. That’s going to change very shortly, however, and I think (and hope!) that we’ll see a big uptick in apps once anyone can view the data.

On the graph vs relational db question, I’m not a database expert, but we have chosen this route for the flexibility it offers in modeling data of many kinds (and later querying it). I’m reinvigorating our blog this week, and I’ll see if we can post something useful on this issue.

Best,
Sarah
