In Fall of 1974 I entered MIT’s Graduate School. This was the next step in my plan for a lifetime of research in AI/machine learning. I was officially admitted through the Math Department, which served as my “host” department. In fact, I was enrolled in an interdisciplinary PhD program through DSRE (the Division for Study and Research in Education). The mechanics of this involved setting up an interdisciplinary committee to oversee my studies. Actual requirements were then negotiated with my committee. My committee included Seymour Papert (AI/ML, education, and developmental psychology), Susan Carey (cognitive psychology), and Dan Kleitman (mathematics, combinatorics). My plan was to work directly for my PhD (skipping a Masters degree).
This setup seemed ideal! In many ways it was like ESG at the graduate level. I had tremendous freedom to define and pursue my own interests. I was encouraged to explore multi-disciplinary interactions. DSRE itself was set up as an interdisciplinary entity at MIT – including among its luminaries: Seymour Papert (who was my graduate advisor), Ben Snyder (psychiatrist), Don Schon (urban studies & creativity), and Jeanne Bamberger (music, education). The unifying interest shared by all of these (and by me, too!) is in learning in all its forms, and how a deeper understanding of learning can inform how we design education (create rich learning environments!). I chose Seymour Papert as my advisor and mentor because we shared so many interests: understanding thinking and learning, math, computers, puzzles, and learning new skills. As part of the LOGO Learning Lab, Seymour encouraged everyone (both children and adults) to engage in novel and fun learning activities. For example, circus arts such as juggling, pole-balancing, and bongo-board balancing were widely shared and explored. We would not only learn these skills, we would analyze them and explore ways to teach them! The same was true of puzzle-solving. Seymour, like me, was a puzzle enthusiast and collector. We enjoyed sharing puzzles and discussing how we solved them. One of Seymour’s memorable quotes is “You can’t think about thinking without thinking about thinking about something!”. So basically anything at all that we spent any time thinking about became source material for thinking about how thinking worked. I loved this kind of self-reflection.
Machine Learning was perhaps my central academic interest in pursuing my graduate studies. It seemed clear to me that any true artificial intelligence must be able to learn, in order to adapt to new situations as well as to extend its knowledge and skills. Much of AI at the time worked in the paradigm of building a system to demonstrate a competence that was considered part of intelligence. Examples included playing chess (Greenblatt), debugging programs (Sussman et al.), and language understanding (Winograd). The focus seemed to be on direct programming of skills and knowledge. This approach, while certainly worthwhile for initial exploration, seemed too long and arduous a path to true machine intelligence, and if the resulting systems lacked learning capability, they would always be limited and brittle. One exception was the thesis work by Patrick Winston on machine concept learning (the classic “Arch Program”). This work was very influential on the direction of my research, and I ultimately added Winston as a co-Thesis Advisor (with Papert).
A Research Maverick
As I mentioned, pursuing machine learning ran counter to the dominant AI paradigm at the time. Many people (faculty and fellow grad students) argued that it was “too difficult”. Maybe it was difficult, but I was strongly convinced that it was the key to building AI. If we could just build a general learning system, then we could educate it – let it learn the skills and knowledge we wanted it to have! Of course, to make progress, it would be necessary to start with simple learning, which initially would not result in impressive performance. Because most AI research was funded by DARPA (Defense Advanced Research Projects Agency), there was quite strong pressure for researchers to generate impressive results. I felt at the time (and still do in the present day!) that developing powerful AI systems required more of a basic research approach. My thoughts on this were likely influenced by my mathematical training — I approached things from an abstract direction, wanted to understand basic core principles (Ross’s dictum: Think deeply about simple things!). My ultimate intellectual goal was to develop a general and abstract theory of intelligence which would subsume both human and machine intelligence. It occurs to me that my commitment to a learning approach to AI is analogous to the technique of mathematical induction (prove the assertion for n=1, and prove that if it holds for an arbitrary n then it also holds for n+1). The learning approach, admittedly challenging, seemed like a high-risk high-reward direction to pursue. If successful, AI researchers would no longer have to work arduously to encode specific skills and competencies – the system could simply learn them!
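To spell out the analogy in symbols (my own formalization, not something I wrote down at the time): if P(n) stands for “the system has reached competence level n”, induction says you never need to construct each level directly — a base case plus a rule carrying each level to the next suffices:

```latex
% The principle of mathematical induction, as invoked in the analogy:
% a base case, plus a step carrying any level n to level n+1,
% yields every level. (Building a learner = building the step.)
\[
\bigl[\, P(1) \;\wedge\; \forall n\,\bigl(P(n) \Rightarrow P(n+1)\bigr) \,\bigr]
\;\Longrightarrow\; \forall n\; P(n)
\]
```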
Another dominant aspect of the prevailing AI paradigm was working on individual pieces of intelligence, for example planning, language understanding, game-playing, problem-solving, robotics, and even concept-learning. These were all studied in relative isolation. I heard little or no discussion regarding the overall architecture of intelligent systems. The research approach was essentially bottom-up – build the pieces, and then figure out how to put them together. I recall being struck by the research approach in nuclear physics to developing controlled fusion. Yes, they focused on specific problems (attaining high temperatures, plasma containment, plasma density), but these sub-problems were studied in the context of a set of alternative working models (e.g., the Tokamak and laser implosion). AI didn’t have any working models for how an artificial intelligent system would be organized! It struck me that there was tremendous heuristic value in having at least one working model — specifically to help focus research attention onto critical sub-problems, and at the same time help define the sub-problems by suggesting how the sub-problem solutions needed to interact with other (yet to be created) pieces. One of the worst examples (to my mind) of the piecemeal approach was the work in Knowledge Representation, where there were numerous KRL (Knowledge Representation Language) proposals, but little attention to or work on the ways in which these systems would be used. The CYC project also seems to favor this paradigm — let’s just encode lots of facts, and worry later about how to use them. In knowledge representation work, a deep philosophical truth was (imho) overlooked — representation is a process! Static symbols and data structures are not endowed with inherent meanings or representations. It is the processes that interpret and work with those structures that are the key element in representation! I sum this up in one of my favorite original slogans:
No representation without interpretation!
My observation is that many philosophers don’t fully appreciate this. I cringe when I hear discussions of meaning totally lacking any appreciation for all the processes (perception, interpretation) necessarily involved for meaning to exist at all. It is seductive to imagine that words, for example, have inherent meaning, but the meaning cannot reside in the words themselves. To have any real appreciation of meaning requires examining the social, cultural, perceptual, psychological, and learning processes that in effect attach meaning to particular words and symbols. But I’m straying from my topic (I plan to write at greater length on my philosophical thoughts at a future time). Back to research strategies — whenever I suggested a top-down research approach (building integrated working models), the typical reaction I received was that “it’s just too hard and we don’t know enough at this point”. I still think top-down is the “right way to proceed”, and I’m encouraged by the evolving sub-discipline of cognitive architectures (examples include Langley’s ICARUS, and Laird and Rosenbloom’s SOAR), but those weren’t developed until the 1980s and later, and I think they still suffer a bit from “results pressure” from funding agencies [I wish there were more appreciation of and financial support for basic research].
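A toy sketch (my own construction here, not any actual KRL of the period) makes the slogan concrete: the very same static structure acquires different “meanings” depending on which process reads it. The structure below is just a tuple; the meaning lives entirely in the interpreting functions.

```python
# "No representation without interpretation!" -- illustrative sketch only.
# One static structure, two interpreting processes, two different meanings.

triple = ("block-a", "on", "block-b")  # inert data: no inherent meaning

def as_assertion(t):
    """Interpret the triple as a fact being asserted about the world."""
    subj, rel, obj = t
    return f"It is asserted that {subj} is {rel} {obj}."

def as_goal(t):
    """Interpret the very same triple as a goal handed to a planner."""
    subj, rel, obj = t
    return f"Achieve a state in which {subj} is {rel} {obj}."

print(as_assertion(triple))
print(as_goal(triple))
```

Nothing in the tuple itself distinguishes a fact from a goal; the distinction exists only in the interpreting process, which is the point of the slogan.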
One central personal learning goal for my graduate years was to develop my skills as a researcher. It seemed essential to learn how to define a research problem. So when it came time to select a thesis topic, I used this as an opportunity to begin learning this skill. I was not content to work on an “externally defined” problem — there were plenty of such problems that already had funding, and choosing one of those would have been the easy path. Instead I generated a series of proposals, and the initial ones were overly ambitious, and naturally I didn’t get very far with them. One of my first ideas was to take Winograd’s SHRDLU (one of the great successes of early AI, which demonstrated rudimentary language understanding), and work on a learning version of it. This had the potential for a more integrated approach – it would integrate several sensorimotor modalities (hand-eye in manipulating a simulated blocks world, and language generation and understanding). I even thought about having the system learn motor skills in the blocks world. The problem was that it was way too difficult, and worse, tried to solve too many problems at once — it lacked focus. It might serve well as a lifetime research project, but was not manageable as a thesis (I hoped to finish my PhD before I retired or died).
I came to realize that I suffered from a serious “grandiosity” bug — I wanted whatever I did to be big, amazing, and spectacular, maybe even revolutionizing the field 🙂 What I needed was to simplify and focus on smaller, more manageable problems. I think I also lacked the skill of working on large projects. My training in mathematics and computer science had mostly consisted of working on smaller problems and projects. The biggest project I had worked on was my Summer UROP research, but even that didn’t seem to scale up to a multi-year thesis project. The thesis topic I finally settled on was “Extensions of Winston’s ARCH Concept Learner”. I chose this because it was one of very few pieces of AI work that was centrally about learning, and also because I really liked the work itself (the way it used semantic nets to represent concepts, and the training paradigm of positive and negative examples).
A thesis is born
So I started out by writing a (friendly) critique (from an admirer’s point of view) of Winston’s concept learner. I recall coming up with something like 6 directions in which the work could be extended, and my initial proposal was to work individually on each of these, and collect the results into my final thesis. This had the heuristic advantage of dividing the “problem” into subproblems. To further simplify, I selected just 3 of these extensions to work on:
1. Learning disjunctive concepts (Winston’s only learned conjunctive concepts)
2. Learning relational concepts (Winston had relational primitives, like ON & TOUCHES, but didn’t learn new ones)
3. Learning macro concepts (allowing any learned concept to be used as a primitive to represent and learn more complex concepts) (Winston’s work included some of this already, but I wanted to generalize it to cover disjunctive and relational concepts as well).
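To make the first extension concrete, here is a drastically simplified sketch — entirely my own illustration, not Winston’s program or my thesis code. A disjunctive concept is held as a set of conjunctive descriptions (each just a set of required features), and the learner adds a new disjunct whenever a positive training example is not already covered:

```python
# Toy sketch of learning a disjunctive concept from examples.
# A concept is a disjunction (list) of conjunctive descriptions,
# each a frozenset of features that must all be present.
# Illustrative reconstruction only; not the actual thesis algorithm.

def covers(description, example):
    """A conjunctive description covers an example if every
    required feature appears in the example."""
    return description <= example

def learn(training):
    """training: list of (features, is_positive) pairs.
    Add a new disjunct for each uncovered positive example;
    negatives are simply checked against afterwards."""
    disjuncts = []
    for features, positive in training:
        if positive and not any(covers(d, features) for d in disjuncts):
            disjuncts.append(frozenset(features))
    return disjuncts

training = [
    ({"brick", "standing"}, True),   # first kind of positive example
    ({"wedge", "lying"}, True),      # a second, disjoint kind
    ({"brick", "lying"}, False),     # a negative example
]
concept = learn(training)
print(any(covers(d, {"wedge", "lying", "red"}) for d in concept))  # → True
```

A purely conjunctive learner (like Winston’s) would be forced to over-generalize when shown those two unrelated positives; keeping a disjunction of descriptions is the simplest way around that, at the cost of deciding when to add a disjunct versus generalize an existing one.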
It was natural to have Winston as my (co)Thesis Advisor for this work, and I thank him for his patience, attention, and advice!
Masters thesis as “Consolation Prize”
By the end of my 5th year of grad school, I had only completed the 1st item (with a little preliminary work on item 2 as well). It looked like another 2 or 3 years would be required for me to finish my planned thesis. I was feeling frustrated, since my progress was much slower than I expected of myself, and I was losing self-confidence. At the same time, my funding was running out. To continue, I would have needed to start taking out loans. I was very nervous about accumulating significant debt, and feared that even after a few more years I might still be unsuccessful at finishing.
So I decided to wrap up my work thus far as a Masters Thesis, collect my SM degree, and graduate and look for a job. My S.M. thesis was titled “Learning Disjunctive Concepts from Examples” and I think it was a very solid piece of work. I collected my SM degree in August 1979, and withdrew from my graduate program. I had the intent of returning at some point to complete my PhD, but alas, that was not to be.
Non-academic threads of my life during the graduate years
During most of my graduate years I served as one of 3 graduate resident tutors in Bexley Hall (the undergraduate dorm I lived in when I was an undergrad myself). I very much enjoyed both the social and mentoring aspects of this position, and have developed a number of lifelong friendships with students from those Bexley years!
I did not own a car during grad school, and don’t know where I could have parked it if I could have afforded one. I did, however, purchase a motorcycle (a used Honda 350) which I learned to ride, and parked in the Bexley courtyard. I had many interesting adventures riding to PA (to visit family) and New Hampshire, and also to Danbury CT to visit my first serious girlfriend when she moved back there. I remember reading Zen and the Art of Motorcycle Maintenance and trying to apply some of its ideas to working on my bike.
I also purchased a used electric piano, which I enjoyed playing for many years. Although I had written 1 or 2 songs in high school, I didn’t try more serious song-writing until I had my own piano. I think I had fantasies of being in a rock band, and even auditioned at one point, but was turned down because the group felt my grad studies would prevent a full commitment to the band – I’m sure they were right. I still, to the present day, enjoy playing keyboards and writing songs.
My passion for puzzles continued unabated. I added to my puzzle collection – my favorite puzzle store was (and still is) Games People Play, located in Harvard Square. I recall that it was around November 1979 when my wife got me perhaps the best puzzle gift I’ve ever received. She went into Games People Play by herself and asked for “the hardest puzzle you have”! Carol, the owner, showed her a Rubik’s Cube, saying “we just got these in from Hungary, and it’s so hard he won’t be able to solve it”. Of course, I couldn’t resist that kind of challenge, and after nearly a week of intensive work (I’d say roughly 15-20 hours over 5 days), I finally had developed my own solution method. This was before Rubik’s Cube hit mega-popularity, and if anyone had suggested I should write a book on “How to Solve Rubik’s Cube” I would have laughed out loud at them! This puzzle was so hard that it would only appeal to a very small number of hard-core puzzle solvers (so I figured), and they are not the type to want to hear about anyone else’s solution (at least not until they had solved it themselves). So I failed to cash in on the boom in Rubik’s Cube solution books — there were roughly 4 or 5, I think, all simultaneously on the NYTimes top-10 non-fiction best-seller list for a time (1981?). Just goes to show I’m a terrible market analyst!
I also had a number of relationships with women during these years, from which I learned a lot, and have mostly very positive memories! In June of 1977 I met the woman I was to later marry. We had a 2-year relationship which led to marriage in June 1979. There was a lot going on in 1979 — In addition to getting married, I was writing up my SM thesis, applying for jobs, accepting my first job, and moving to Pittsburgh — I’ll tell you more in the next installments on marriage and career.
… to be continued