I've been thinking about ways to avoid needing a large commonsense database in order to parse text (I'm working on producing a narrow A.I.)... and it's slow going! Anyway, it occurred to me just how much of our commonsense understanding of the world comes from the physicality of how objects interact. That got me wondering whether there might be a very rudimentary way to represent knowledge about physical objects (as opposed to mental objects), simulating that physicality just enough to start making useful deductions for parsing text, but not so much that one has to simulate much of the whole earth (including all the people, animals, water, etc.). Think Minecraft or Second Life, and then imagine it being even more rudimentary than that.
Stepping back from that for the moment, consider this: imagine you have a search engine that, when you ask it things, sends out physical robots to find answers to your questions. For example, if you asked it "how tall is the Empire State Building?", rather than looking up the data in a database (which would require doing lots of parsing and language understanding), it would send a quadcopter out to physically measure the Empire State Building and then report the answer; if you asked it whether it was possible to see Mexico from Chicago, it would again send out a quadcopter to test it; if you asked it whether it was possible to walk from New York to Paris, France, it would send out a walking robot from New York to see whether it could walk to Paris; and if you were to ask it whether a human would likely survive if dropped from a skyscraper, well...
Now, of course, we can answer these questions just using our common sense; but it should be clear that we could also answer them by going out and physically testing them (in principle, at least). Since computers don't have common sense, and since it is difficult to teach them all the rules they need to understand our language, it's a good idea to look for an easier -- if computationally more intensive -- solution. I think that, fortunately, even an extremely crude spatio-temporal model of the world should suffice in many circumstances: a model where people, buildings, animals, plants, etc. are built crudely out of cylinders, ellipsoids, spheres, amorphous blobs, planes, and so on, each with certain basic properties like color, flexibility, transparency/opaqueness, softness/hardness, etc.
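To make that a little more concrete, here is a minimal sketch in Python of what such crude building blocks might look like. Everything in it is an assumption made up for illustration: the names `Shape`, `Part`, `Thing`, `fits_through` and `is_see_through`, the property scales, and the rough measurements of the stick-figure person. The point is only that a question like "can a person walk through that doorway?" becomes a check on the model rather than a hand-coded rule.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


class Shape(Enum):
    SPHERE = "sphere"
    ELLIPSOID = "ellipsoid"
    CYLINDER = "cylinder"
    BOX = "box"
    BLOB = "blob"
    PLANE = "plane"


@dataclass
class Part:
    """One crude primitive; all property scales are hypothetical."""
    name: str
    shape: Shape
    center: Vec3               # (x, y, z) in metres, z points up
    extent: Vec3               # full size along x, y, z in metres
    color: str = "grey"
    flexibility: float = 0.0   # 0 = rigid, 1 = floppy
    opacity: float = 1.0       # 0 = transparent, 1 = opaque
    hardness: float = 0.5      # 0 = soft, 1 = hard


@dataclass
class Thing:
    """An object assembled from crude parts."""
    name: str
    parts: List[Part] = field(default_factory=list)

    def bounding_size(self) -> Vec3:
        # Axis-aligned bounding box over all parts.
        lows = [min(p.center[i] - p.extent[i] / 2 for p in self.parts) for i in range(3)]
        highs = [max(p.center[i] + p.extent[i] / 2 for p in self.parts) for i in range(3)]
        return (highs[0] - lows[0], highs[1] - lows[1], highs[2] - lows[2])

    def fits_through(self, width: float, height: float) -> bool:
        # Very rough: allow turning sideways, ignore flexibility entirely.
        w, d, h = self.bounding_size()
        return min(w, d) <= width and h <= height

    def is_see_through(self) -> bool:
        return all(p.opacity < 0.5 for p in self.parts)


# A stick-figure person with made-up, roughly human proportions.
person = Thing("person", [
    Part("head", Shape.SPHERE, (0.0, 0.0, 1.65), (0.2, 0.2, 0.25)),
    Part("body", Shape.BOX, (0.0, 0.0, 1.1), (0.5, 0.3, 0.8)),
    Part("legs", Shape.CYLINDER, (0.0, 0.0, 0.35), (0.4, 0.25, 0.7)),
])

window = Thing("window pane", [
    Part("glass", Shape.PLANE, (0.0, 0.0, 1.5), (1.0, 0.01, 1.0), opacity=0.1),
])

print(person.fits_through(width=0.8, height=2.0))   # ordinary doorway
print(person.fits_through(width=0.8, height=1.0))   # low crawl space
print(window.is_see_through())
```

A real version would need joints, posture, and some notion of time, but even a bounding box and a few scalar properties already support simple deductions.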
For example, if you were to tell the computer that Shakespeare wrote such-and-such a play in such-and-such a year, perhaps what it could do with that information is create a very, very crude model of England in that year, and place Shakespeare (sphere for a head; box for a body; cylinders for arms and legs and bones; etc.) at Stratford-upon-Avon in a box house, with a copy of his play at a box desk, and with his hand connected spatially and temporally to it. There is SO much information implied by that simple physical representation that I couldn't even list it all. For example, if you were to ask the computer whether people in England at the time could read the play on the internet, all the computer would have to do is look to see whether this model of the world has an internet, and/or whether the citizens of England had computers in their homes at the time.
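Here is a rough sketch of how that lookup might go, again in Python and again purely illustrative. The names `Entity`, `Contact` and `WorldModel` are hypothetical, the year 1600 simply stands in for "such-and-such a year", and the start date given for the internet is a rough placeholder rather than a researched fact. Each thing in the model carries the span of years in which it exists, so "was there an internet?" is just a filter on the snapshot for that year.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    """Something that occupies a place for a span of years (hypothetical schema)."""
    name: str
    kind: str
    region: str
    site: str = ""
    start_year: int = 0
    end_year: Optional[int] = None   # None means "still exists"

    def exists_in(self, year: int) -> bool:
        return self.start_year <= year and (self.end_year is None or year <= self.end_year)


@dataclass
class Contact:
    """Two entities physically touching during a given year."""
    a: str
    b: str
    year: int


class WorldModel:
    def __init__(self) -> None:
        self.entities: List[Entity] = []
        self.contacts: List[Contact] = []

    def snapshot(self, year: int, region: str) -> List[Entity]:
        # Everything present in the region during that year.
        return [e for e in self.entities if e.region == region and e.exists_in(year)]

    def has_kind(self, kind: str, year: int, region: str) -> bool:
        return any(e.kind == kind for e in self.snapshot(year, region))

    def touching(self, name: str, year: int) -> List[str]:
        # Names of everything in contact with `name` during that year.
        return [c.b if c.a == name else c.a
                for c in self.contacts
                if c.year == year and name in (c.a, c.b)]


world = WorldModel()
world.entities += [
    Entity("Shakespeare", "person", "England", "Stratford-upon-Avon", 1564, 1616),
    Entity("box house", "building", "England", "Stratford-upon-Avon", 1550, 1700),  # dates invented
    Entity("play manuscript", "document", "England", "Stratford-upon-Avon", 1600),  # year assumed
    Entity("internet", "internet", "England", start_year=1983),  # rough placeholder date
]
world.contacts.append(Contact("Shakespeare", "play manuscript", 1600))

print(world.has_kind("internet", 1600, "England"))   # could they read it online?
print(world.touching("Shakespeare", 1600))           # what was his hand on?
```

The answer to the internet question falls out of the model without any rule that mentions Shakespeare, plays, or reading.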
I know it sounds strange... and kludgey... but the only other probable alternative is to wait until several hundred million (or billion? or 10 billion?) commonsense rules are coded up into logical formulas by hand (or maybe there's a way to find those rules without doing all the work; I think so for some of them... but not all); don't hold your breath on that coming anytime soon!
Actually, looked at another way, it isn't so kludgey after all: as I said, I think it likely that physical models would be a far more efficient representation of many kinds of knowledge than words -- "a picture is worth a thousand words", as the saying goes.