Demonstrated today in our Media Ventures class: Autom. The speaker was Cory Kidd, formerly of the Media Lab and now an entrepreneur based in Hong Kong, where software talent is cheaper and the local government offers incentives (including free rent) to set up his business.
Other posts about my MIT Sloan Fellows experience:
You may remember a brief preview, at the beginning of the fall semester, of my Linked Data Ventures class, taught by Tim Berners-Lee. In the months since that post, we really rolled up our sleeves and dug into the concepts and languages that support the Semantic Web — and also created real applications and business ideas based on Semantic Web/Linked Data.
TBL taught some of the classes, but we also had some great technical sessions with Lalana Kagal and Ian Jacobi from MIT’s CSAIL as well as business sessions with Reed Sturtevant and Katie Rae. Another organizer for the class was K. Krasnow Waterman, a 2006 MIT Sloan Fellow who told me about the history of the Linked Data Ventures Class when I met her at an alumni reception in New York earlier this month.
In addition, nearly every week we had guest speakers who work with these technologies or build companies on Linked Data, including people from OpenCalais, Jim Hendler (an RPI faculty member who has worked on the federal government's linked data initiatives), and numerous startup founders.
But what I wanted to show in this post was a summary of what we learned, from the point of view of someone who started the class with only a vague understanding of what the Semantic Web was. Here are some examples from my homework assignments for 6.898 in the early part of the semester (Note: There may be mistakes!). At the end of the post, I offer some concluding thoughts about the class and the broader SemWeb ecosystem.
My circles-and-arrows diagram for assignment 2. The goal was to get us to think about relationships described in a paragraph of text in terms of subject-predicate-object “triples”. Here’s the assigned text:
Joe Lambda, a 25-year-old man, has a FOAF file. Joe has an AIM account “jlambda”, and a Jabber account “firstname.lastname@example.org”, which is also his e-mail address. Joe is a graduate student at Foobar University, a university in Cambridge, Massachusetts (42.373611°N, 71.110556°W), the homepage of which is located at “http://foobar.example.org/”.
Joe Lambda has two friends, Bill Foo and G. Baz. Normally, Joe lives in Somerville, Massachusetts (42.3875°N, 71.1°W), a city that borders Cambridge, with Bill. G. Baz is their neighbor. Joe, Bill, and G. have a number of different interests, but are all interested in Linked Data. Joe is also interested in Astronomy and Cricket; Bill also enjoys American Literature and Baseball; and G. is interested in the TV show Arrested Development and Hockey.
And here’s the diagram:
Then, we moved on to the languages, starting with turtle/n3, which identifies SPO relationships in a more human-readable format than the XML-based RDF. A brief, imperfect sample, based on the text from assignment 2, above:
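A sketch in that spirit, using the real FOAF and W3C geo vocabularies (the ex: namespace and the specific property choices are illustrative assumptions, not the graded homework), might look like:

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix ex:   <http://example.org/people#> .

ex:joe a foaf:Person ;
    foaf:name "Joe Lambda" ;
    foaf:age 25 ;
    foaf:aimChatID "jlambda" ;
    foaf:jabberID "firstname.lastname@example.org" ;
    foaf:mbox <mailto:firstname.lastname@example.org> ;
    foaf:interest ex:LinkedData , ex:Astronomy , ex:Cricket ;
    foaf:knows ex:bill , ex:gbaz ;
    foaf:based_near ex:somerville .

ex:somerville a ex:City ;
    foaf:name "Somerville, Massachusetts" ;
    geo:lat "42.3875" ;
    geo:long "-71.1" .
```

Each semicolon-separated line is one triple about the same subject — exactly the SPO relationships from the circles-and-arrows diagram, written down as text.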
We also designed our own ontologies, which define words, relationships, and other Semantic Web concepts relating to various topic areas. RDF and turtle/n3 graphs can then reuse ontologies for specific graphs (this is what the @prefix code refers to in the previous example). In the following example for assignment #4, we had to create an ontology for top-level biology definitions. Mine looked like this:
Fungi a owl:Class, [ a owl:Restriction ; owl:minCardinality "1"^^xsd:nonNegativeInteger ; owl:onProperty cell ] . Fungi owl:intersectionOf ( Eukaryote Species ) .
Plants a owl:Class, [ a owl:Restriction ; owl:minCardinality "1"^^xsd:nonNegativeInteger ; owl:onProperty cell ] . Plants owl:intersectionOf ( Eukaryote Species ) .
Bacteria a owl:Class, [ a owl:Restriction ; owl:cardinality "1"^^xsd:nonNegativeInteger ; owl:onProperty cell ] . Bacteria owl:intersectionOf ( NonEukaryote Species ) .
Archaea a owl:Class, [ a owl:Restriction ; owl:cardinality "1"^^xsd:nonNegativeInteger ; owl:onProperty cell ] . Archaea owl:intersectionOf ( NonEukaryote Species ) .
Protists a owl:Class, [ a owl:Restriction ; owl:cardinality "1"^^xsd:nonNegativeInteger ; owl:onProperty cell ] . Protists owl:intersectionOf ( Eukaryote Species ) .
… But unfortunately it did not map well to the ideal solution we were shown after handing it in. Creating a model of these relationships depends heavily on logic as well as an understanding of the capabilities of OWL, the language that ontologies are written in.
Finally, we learned the Semantic Web query language, SPARQL. I had taken a SQL class years ago at the Boston College Woods College of Advancing Studies, and this experience was a good introduction to SPARQL, which basically involves generating new graphs of data from existing triples in a very SQL-like manner.
One SPARQL example shown to us in the class lab generated a list of countries from a triplestore based on the CIA World Factbook, restricted to countries above a certain area and population.
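A query in that spirit — against an assumed factbook: vocabulary with name, land-area, and population properties, and with made-up thresholds, so not the lab's exact query — might read:

```sparql
PREFIX factbook: <http://example.org/factbook/ns#>

SELECT ?name ?area ?population
WHERE {
  ?country factbook:name ?name ;
           factbook:landArea ?area ;
           factbook:population ?population .
  FILTER (?area > 500000 && ?population > 20000000)
}
ORDER BY DESC(?population)
```

The WHERE clause matches triple patterns instead of table rows, but the SELECT/FILTER/ORDER BY structure will look familiar to anyone who has written SQL.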
But the class wasn’t just about learning these languages and concepts. For the second half, we were tasked with forming teams and developing an actual application and business model built on Linked Data. The instructors for this segment were Reed Sturtevant and Katie Rae, but we got a lot of feedback from Tim Berners-Lee, Lalana Kagal and Ian Jacobi during the practice demo in late November. Startup founders and angels gave us some additional feedback on demo/pitch day on December 7. Our team consisted of two Sloan Fellows and an undergrad Computer Science/Media Lab student. We ended up creating a neat little educational app that teaches kids about different countries. You can see a brief demo in the following video (scroll ahead about two or three minutes to see it):
The winner of the demo contest was a neat restaurant review/location service. The people on the team seemed pretty serious about taking it to the next level, so we’ll see how that progresses over the spring.
There is also the question of the future of the wider Semantic Web/Linked Data world. For ten years people have been talking about the potential of the technology, and there have certainly been a slew of tools, projects, apps, and datasets made available. But there are also some limitations to the Semantic Web/Linked Data, as our study group found out when we were designing our mobile educational application. Performing live queries to the Web was a no-go, owing to the slow response time, and many of the datasets (including the widely used DBPedia graph) were inconsistent or had other flaws.
Yod, Mads and I went to TBL after our December 7 demo to discuss the “curation problem,” and he offered some interesting suggestions. For instance, in choosing the best photos from flickrwrapper for the “places” part of the geography app, we could add some geocoded logic to find the best light/positioning (300 meters west of the object at a certain time of the day) and employ some to-be-determined algorithm or AI to “make sure Aunt Jenny isn’t in the frame”. He also suggested leveraging Google to programmatically derive the semantic meaning of certain terms that have additional definitions beyond geography. But the idea of using existing Linked Data, standard queries and ontologies without extensive programmed/human curation is just a dream … at least for the time being.
Beyond the technical issues, there is also the lingering question of what sorts of killer apps might be derived from the Semantic Web. I think a key reason the 6.898 class exists is to help launch more Semantic Web-based startups, open-source tools, and new datasets, in the hope that one or more of these efforts will spark a truly innovative or ground-breaking app that moves LD and the Semantic Web into the mainstream in a highly visible way. I don’t know if our educational app or the others from the class will move beyond the prototype phase, but there has been a lot of serious talk in our class about using these and other ideas as the basis of new ventures once we finish. I’ve been thinking about how the Semantic Web could vastly improve many common data-driven genealogy or history applications (areas which I have written about for years — see “Google/Ancestry.com followup: Using outsourced Chinese labor to overcome OCR limits” and “Making a case for quantitative research in the study of modern Chinese history: The Xinhua News Agency and Chinese policy views of Vietnam, 1977–1993”), and over the next few months will do some additional research and reach out to people at MIT and elsewhere to evaluate the viability of such a venture (feel free to contact me at ian dot lamont -at- sloan dot mit dot edu if you want to discuss).
Lastly, I would like to offer my profuse thanks to K. Krasnow, Reed, Katie, Ian, Lalana and TBL for not only offering Linked Data Ventures this year, but also for making it a truly challenging and eye-opening experience. It really is one of the best classes I’ve had at MIT.
I received an email from a reader in Europe about my Sloan Fellows experience. One of his questions asked about the “soft” side of the program. He was under the impression that it’s all about homework exercises and “hard” topics, and it’s not difficult to see why — it’s practically all I talk about on this blog.
While the Sloan Fellows curriculum during the summer indeed contains the quantitative core (Microeconomics, Accounting, DMD, Finance), the fall semester covers more “soft” topics — leadership, strategy, managing innovation. Further, the program is not just about reading cases and completing exercises. There is a tremendous focus on the SF community, as well as on discovering your own strengths and weaknesses and finding your career path. Everyone is going through this right now, and I am reluctant to discuss my own journey on the blog, as it’s quite personal. Suffice it to say, it is a central part of what we are doing, and the coursework and exercises for these classes are very helpful in letting us determine what we want to do after we finish. I am much clearer on this point now than I was in the summer — but more self-discovery remains to be done!
Additionally, some of my classmates are also concentrating their electives in leadership and management coursework. It’s where they see additional value in the Sloan Fellows program. My own preference tends more toward the technical and business experiences that I might otherwise never have a chance to learn about and may help me in my own career post-graduation. For this reason, I am taking two “action learning” classes — G-Lab (a group consulting project overseas) and Linked Data Ventures, a class that studies the Semantic Web and involves designing an actual software product and business plan. It may not be soft, but it really is a rich learning experience. It’s hard, but I love it.
Learning about a robot opera was the highlight of a second trip to the MIT Media Lab (see the other robot videos and interviews I shot on the first trip in the link at the bottom of the page). During my visit, I talked to some of the students and staff who are working on the production of “Death and the Powers,” including my old friend and bandmate Bob Hsiung (pictured). In my interview with Bob, you’ll get a glimpse of the storyline, as well as a close-up look at the technologies that powered the show, such as OLPCs running Linux and a joystick:
Here is a brief, professionally produced snippet from the opera itself:
One thing I’ve wanted to do since arriving at MIT at the beginning of the summer is to see the MIT Media Lab. I mean, I’m a media guy interested in geeky technologies, so it should be one of the first stops, right?
Well, I never got around to going into the Media Lab building, until today. I heard from two non-Sloan friends of mine — a current PhD candidate studying new ways to do gaming AIs, and a staff member who helped make the world’s first robot opera — that there was a Media Lab open house, in which most of the current projects (and a few past ones, too) would be on display with researchers on hand to explain them.
Firehose time. I spent a few hours visiting the offices and labs, drinking in projects that ranged from anthropomorphic robots (such as Lexi, inset) to a kids-oriented programming language to “tangible media.”
Unfortunately, there was no way to see it all. But I was able to shoot a few highlights and conduct some interviews using my iPod touch. There are three videos:
MIT Media Lab Open House Video #1: Scratch + Legos = Cool (interview with researcher Sayamindu Dasgupta from 00:00 to 08:00; in the final two minutes I look at other Media Lab projects)
MIT Media Lab Open House Video #3: An army of robots? (I couldn’t find the PhD student’s name on the Media Lab/Personal Robotics Group website; I wonder if that’s because of the military sponsor.)
The students file into an ordinary, medium-sized classroom in building 4, near the center of campus. Outside, it’s a beautiful afternoon, a few days before the autumnal equinox. The room is brightly lit, thanks to its tall windows. Muffled sounds of trumpets and horns can be heard nearby — there is an active music community at MIT, and some students take classes in music and the performing arts in building 4.
After everyone has settled into their seats, the professor gets up in front of the class. He is thin, has gray hair, and wears the standard faculty attire — khakis and a long-sleeved, light blue button-down shirt without a tie. Seeing him walking down the corridor, most would have no idea who he is, but to a few he’s given away by the large MacBook Pro tucked under one arm, covered with stickers, including one from the W3C — the World Wide Web Consortium.
The man is actually the director of the W3C and has played a remarkable role in the history of computing, and, indeed, the course of human history. He’s Tim Berners-Lee, the inventor of the World Wide Web — arguably the most important communications invention since Gutenberg used movable type to create the first printed bible.
Everyone reading this post has been touched by the Web in untold ways. For some people, including me, the Web has changed their lives. Now I am about to hear about another Internet technology that Berners-Lee hopes will make as big an impact: the Semantic Web.
Berners-Lee starts talking. He has an English accent, I’m guessing from somewhere in the Southeast. In front of this new audience he talks quickly, the thoughts sometimes tumbling out faster than he can speak them.
The first thing he writes on the chalkboard is http:// and a domain name — two of the fundamental elements of the World Wide Web. He adds an anchor tag.
“To a certain extent, when you go to the Semantic Web, you’ll have to leave that all behind,” he says.
As a muffled horn ensemble begins to warm up in the next room, he gives a primer on the Semantic Web, how it’s different from the World Wide Web, and some of the basic concepts that make it work — URIs (not URLs), XML, RDF (see my post from earlier in the week), triples, ontologies. These technologies can turn the World Wide Web into a linked, queryable database, and give relationships and meaning to otherwise unstructured data on the Web.
Berners-Lee likes to draw diagrams of the RDF graphs, and sometimes uses the circle/arrow notation that’s used to model Linked Data relationships (I am using “Semantic Web” and “Linked Data” interchangeably, per the usage employed by one of the other instructors later in the class). He shows the standard “Subject-Predicate-Object” (aka subject-verb-object) format used for triples, and describes how they might be used to describe certain relations:
Tim Berners-Lee (subject) has an assistant (predicate) Amy (object) .
Each of the elements in these relationships is a link. For unique entities, like a person, there should be a document that describes all of the properties of that individual. As described above, Tim Berners-Lee’s is http://www.w3.org/People/Berners-Lee/card#i, and contains information such as his public home page, photographs, projects he’s participated in, and even the people he knows. Everything in the list is a link. For common verbs or relations, definitions already exist that can also be referenced by a link, so new definitions need not be created from scratch. The idea of the Semantic Web is that these machine-readable entities, relationships, and descriptions can be used for queries or specialized applications — for instance, “Who is Tim Berners-Lee’s current assistant?” or “What is TBL’s assistant’s email address?” or “Return a list of all of the email addresses of current MIT faculty assistants.” The beauty of the Semantic Web is that the data is (ideally) readily available on the Web, instead of locked in a proprietary database somewhere, and can be manipulated by software agents.
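The assistant-email question can be sketched in SPARQL. Here foaf:mbox is the real FOAF mailbox property, but ex:assistant is a made-up stand-in for whatever relation the card actually uses:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex:   <http://example.org/relations#>

SELECT ?email
WHERE {
  <http://www.w3.org/People/Berners-Lee/card#i> ex:assistant ?assistant .
  ?assistant foaf:mbox ?email .
}
```

The query follows one link (person to assistant), then another (assistant to mailbox) — traversing the graph rather than joining tables.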
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Students in the class ask questions. They vary in complexity. The audience is a mixed bunch of Computer Science graduate students, Sloan MBAs, and the odd LGO and Sloan Fellow. Some of the CS students already get this. To others with non-technical backgrounds, it’s completely new. I fall somewhere in-between — I can code HTML and am familiar with XML, but other Semantic Web technologies were unknown to me before I registered for the course.
An MBA asks: What happens when inconsistencies arise in linked data? For instance, what if Amy leaves her job, but only one of the reciprocal links above is adjusted to reflect that?
“This is the Web!” Berners-Lee declares. “It’s not consistent!”
This leads to a discussion of the value of having links in both directions from RDF graphs talking about the same thing, and then his “five-star” system of rating sites (or organizations?) on their ability to post data openly on the Web, especially machine-readable data.
I want to ask a lot of questions, but I hesitate. My background is online media, and the creator of the Web is standing in front of the class. It’s like being able to ask Gutenberg a question about his next generation of printing presses.
“Can you talk a little bit about trust?” I finally ask. I’m thinking about the reliability of the relationships identified in triples, and the potential for the linked data system to be abused, much as earlier Internet platforms such as email and the Web have been overrun by spam and malware.
Berners-Lee pauses, expressionless. A few people laugh. Have I really asked that stupid a question, or does everyone think I am talking about the broader concept of trust?
He interrupts me, and gestures toward the blackboard. “I can talk about it, but I am afraid it would take hours,” he says. The long and short of it: It’s a complex area, and the subject of much of the current Semantic Web research. “There’s a big social element,” he concludes, and leaves the discussion at that.
Today’s theme is screens: Laptop, projectors, calculators, and the Web. It’s Friday night, and where is this group of fellows? Not at home, and not at the Muddy. We’re at E51 going through our Data, Models and Decisions group assignment on Casterbridge Bank. Tomorrow, we’ll be working via Skype on the Supply Chain Management group assignment on Li & Fung.
On Tuesday, about 50 of us from the Sloan Fellows “A” section participated in a fascinating — and depressing — climate change simulation. The exercise was based on international negotiations to reverse several global warming trends, as well as a tool called C-ROADS (“Climate Rapid Overview and Decision-support Simulator”). After arriving in a large conference room, we split into groups representing various nations (I was in the India group) and attempted to negotiate a climate treaty that balanced our national interests with the global imperative to reverse the amount of carbon entering the atmosphere. The C-ROADS tool, which was developed by MIT and several partner organizations, has actually been used by governments and NGOs to model policy actions and their impact on long-term greenhouse gas emissions and sea level changes (I’ve embedded a video below that explains how it works.)
If you paid attention to the horse trading and bickering that took place at Copenhagen last year, you are probably aware that it was impossible to get everyone on board. In the India group, we had several obstacles which prevented us from willingly signing on to aggressive reduction targets. Our briefing document instructed us to preserve economic growth at all costs, and we also had to be cognizant of the fact that India’s population will actually surpass China’s at some point mid-century, according to the projections we were given. We unsuccessfully attempted to tie the negotiations into demands for technology transfer and an end to European agricultural barriers. In the end, with a little arm-twisting and urging from the facilitator (Sloan professor John Sterman), we gave up these demands and signed on to more aggressive targets. In the C-ROADS simulation, this resulted in a reversal of several trends and a much smaller degree of global warming. The earth was saved!
Not so fast, Sterman said. He gave us an uncomfortable reality check about international climate negotiations: There is little chance all of these agreements would be ratified by the legislatures of the democratic countries that took part in the negotiations. This is because an important stakeholder has not been brought on board: The public. Many people are either not convinced of the need for drastic action, fear the negative economic impacts, or may resent the fact that some countries are receiving unequal treatment, even though they have contributed more to global warming in the past.
After the four-hour session ended, everyone in the room was amazed at what we had just participated in, but depressed about the implications. Is there nothing that can be done? A lot of fellows thought hard about this as we left the building. I actually had a few ideas that targeted the public awareness issue, and sent the following email to Sterman:
Your presentation yesterday afternoon was quite powerful and left a big impression on me and many of the other fellows.
I just wanted to add a few observations about getting through to the public, which you touched upon toward the end of your presentation. I come from an online news background and have some observations with how traditional mass media as well as new media tools can be used to reach the public.
The first is that, perversely, the visuals from the Deepwater Horizon disaster and its aftermath have probably done more to turn people toward sustainability-related causes than any other event in the past ten years. The live video from the well location has been particularly disturbing for the many millions of people who have seen it. It illustrates everything that is wrong with offshore drilling and drives many people to ask the question: What are the alternatives?
The second observation is that visuals like these are often far more effective than data or prose in terms of bringing home the message about something like environmental change. As one of the other fellows told me as we left the Marriott, “instead of showing a simulation involving bar charts or maps, why not show images of people drowning?” It may seem like a strange idea, but creating a simulation of real areas being inundated or ruined by a breach would be very effective, even if no actual deaths were depicted. There are some graphics technologies related to video game design which could actually help do this — imagine a sim of New York City partially under water, or a massive breach overtaking water control features in Holland, Venice, Sacto, etc. This is the type of thing that has a large potential to either A) be picked up by major broadcast news outlets (especially in locales that are depicted) or B) “go viral” on YouTube and Facebook.
The third observation is that C-ROADS is an excellent tool for getting a global view of the problem, but there needs to be a simulation tool or tools for people to see how they will be personally affected by global warming. For instance, how about a simulation that lets people type in their address, and spits out an estimation of whether or not their home is under water, the estimated decline in value or increase in insurance costs, the impact on their community (refugee resettlement, difference in snow/rain/drought days) and even what sort of plants they can expect to see disappear from their garden, as well as new plants that will take their place?
Sterman’s response was interesting. While visuals have been helpful in informing the public (he mentioned scientist-turned-filmmaker Randy Olson), he also sent a paper that he authored for Science magazine that noted the following contradiction in public attitudes toward global warming:
“Majorities in the United States and other nations have heard of climate change and say they support action to address it, yet climate change ranks far behind the economy, war, and terrorism among people’s greatest concerns, and large majorities oppose policies that would cut greenhouse gas (GHG) emissions by raising fossil fuel prices.”
According to Sterman, another important constituency that needs to be handled with care is scientists themselves. He didn’t get too far into this issue, but it’s not hard to see why this is so: Different stakeholders respond to different messages and data points, and dramatizations intended for the general public will not work with people who are used to dealing with complex data and peer-reviewed research.
Video: John Sterman on C-ROADS Science and Confidence Building