Mining the blogosphere
September 10, 2012

Parsing dependency relations for the sentence: “The movie was genuinely funny.” (Credit: Shamima Mithun)
Can a computer “read” an online blog and understand it? Several Concordia University computer scientists believe so, and are helping get closer to that goal.
Leila Kosseim, project lead and associate professor in Concordia’s Computational Linguistics Laboratory, and recently graduated doctoral student Shamima Mithun have developed a natural-language-processing system called BlogSum that allows an organization to pose a question and then find out how a large number of people talking online would respond.
The system is capable of gauging things like consumer preferences and voter intentions by sorting through websites, examining real-life self-expression and conversation, and producing summaries that focus exclusively on the original question.
Making sense of blog text
“Huge quantities of electronic texts have become easily available on the Internet, but people can be overwhelmed, and they need help to find the real content hiding in the mass of information,” explains Kosseim of the Computational Linguistics Laboratory (CLaC lab).
Analyzing informally written language poses unique challenges compared to analyzing, for example, a news article. Blogs, forums, and the like contain opinions, emotions and speculations, not to mention spelling errors and poor grammar. So a summarization tool must address question irrelevance (sentences that are not relevant to the main question) and discourse incoherence (sentences in which the intent of the writer is unclear).
The researchers developed and tested BlogSum by examining a set of blogs and review sites. To crunch the data, BlogSum used “discourse relations,” which are ways of filtering and ordering sentences into coherent summaries. BlogSum was measured against prior computational rankings and achieved mostly superior results.
In addition, it was evaluated by actual human subjects, who also found it to be superior. Summaries produced by BlogSum reduced question irrelevance and discourse incoherence, successfully distilling large amounts of text into highly readable summaries.
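To make the idea of filtering for question relevance concrete, here is a minimal Python sketch. It is not BlogSum's actual method: it swaps in a simple word-overlap score for the discourse relations the researchers used, and it keeps sentences in their original order as a crude stand-in for coherent ordering. The names `STOPWORDS`, `tokenize`, `relevance`, and `summarize`, and the `threshold` and `max_sentences` parameters, are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "was", "is", "were", "i", "for", "had", "of", "to"}


def tokenize(text):
    """Lowercase the text and return its content words (stopwords removed)."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def relevance(sentence, question):
    """Fraction of the question's content words that also appear in the sentence."""
    q_words = set(tokenize(question))
    s_words = set(tokenize(sentence))
    if not q_words or not s_words:
        return 0.0
    return len(q_words & s_words) / len(q_words)


def summarize(sentences, question, threshold=0.3, max_sentences=3):
    """Drop sentences that look irrelevant to the question, keep the best few.

    Word overlap is only a stand-in for the discourse-relation analysis
    described in the article; real systems use far richer signals.
    """
    scored = [(relevance(s, question), i, s) for i, s in enumerate(sentences)]
    # Step 1: filter out sentences that fall below the relevance threshold.
    kept = [item for item in scored if item[0] >= threshold]
    # Step 2: keep the highest-scoring sentences (ties broken by position).
    top = sorted(kept, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Step 3: restore the original order so the summary reads naturally.
    return [s for _, _, s in sorted(top, key=lambda item: item[1])]


posts = [
    "The movie was genuinely funny.",
    "I had pizza for lunch.",
    "Some people found the movie too long.",
]
summary = summarize(posts, "Was the movie funny?")
```

With these three sample sentences, the off-topic remark about lunch shares no content words with the question and is filtered out, while the two sentences about the movie remain. A toy score like this cannot tell whether a writer's intent is clear, which is the harder problem of discourse incoherence.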
Detailed Architecture of BlogSum (credit: Shamima Mithun)

Comments (9)
by Robot
Puny humans. I’ve been analyzing you for years. Soon it’s time for rock’n’roll. Hold on to your hats.
by Rainer von Ammon
Actually, there are even software systems that automatically generate articles or papers for scientific and reviewed (!) conferences, no joke.
Automatic abstracting from free or unformatted text, information or fact retrieval, etc. were already started in the sixties or seventies; today we have nearly unlimited computational power.
by Ian Clarke
Is the editor of kurzweilai real? I have my suspicions. :D
by Editor
Ian, you nailed me. I’m a bot and I passed your silly Turing test. Deal with it. :)
by Bri
GORTI!! Is that you? Everybody stay calm or we’re toast!!!!
by Ian Clarke
In the future, I wonder if a robot will be asking the same question about us?
We have so much information out there that we need to find ways of distilling it. Finding answers still sometimes takes a lot of hunting around. What I’d like to be able to do is ask a question and have answers spat back at me in order of confidence, with the varying levels of confidence stated for each one. Even then, I’ll continue to dream of the day when a straightforward question receives a straightforward reply.
by Ian Clarke
Meant as reply to Dan Robinson.
by Dan Robinson
So how soon will a robot be able to post an intelligent comment?
by Bri
It’s my view that when we dream, it’s the brain processing the input of the day. The article about a self-aware robot said that they were achieving this by making the robot figure out the relationships between objects. I think that’s what is happening during sleep. We may be unaware of it, but the brain is very busy. Researchers have noticed intense brain activity. If you read Carl Jung’s work, he talks about archetypes. My belief is that the new information from the day is mapped out onto this structure. It is the interconnectivity that makes human intelligence and the intelligence of all species. If that is so and researchers apply the relationships principle further, it won’t be long before robots can comprehend. So the short answer is we are very close and it might happen in just a few years. As noted in other posts, there is more than enough computational power. Once that relationship loop starts, they will become amazingly smart in an incredibly short time.