How to Write a Bilingual Dictionary
We're going deep on dictionary lore this week! Listen in for an interview with editor Peter Sokolowski on how we wrote our French and Spanish bilingual dictionaries.
Emily Brewster: Coming up on Word Matters: bilingual dictionaries. I'm Emily Brewster, and Word Matters is produced by Merriam-Webster in collaboration with New England Public Media. On each episode, Merriam-Webster editors explore some aspect of the English language from the dictionary's vantage point. While the dictionary at Merriam-Webster.com and the bullseye-bedecked hardcover Collegiate Dictionary it's based on are the best known of Merriam-Webster's dictionaries, the company in fact publishes a significant number of other dictionaries, among them bilingual dictionaries. One of the co-hosts of Word Matters was hired expressly to work on one of these. Up next, Peter Sokolowski answers all my questions about Merriam-Webster's bilingual lexicography. Merriam-Webster gets the second part of its name, of course, from Noah Webster, the father of American lexicography. And for many, many years, Merriam-Webster only produced dictionaries that would have been familiar to Noah Webster: monolingual dictionaries of the English language as it's used primarily in the United States. But at some point Merriam-Webster started making lots of other kinds of dictionaries too, including bilingual dictionaries. Now, Peter, you came to the company as a bilingual lexicographer, is that right?
Peter Sokolowski: Yeah, that was the first project I worked on. It was the reason I was hired. They hired a little staff to make bilingual dictionaries right in the office, and to make them from scratch, which I didn't realize was so unusual. Basically, nobody on staff had been around for a dictionary that wasn't based on something that already existed. And so these were really made from scratch. We had to kind of figure out a new way to work.
Emily Brewster: Because traditionally what we do is build on the works of existing dictionaries. The Collegiate Dictionary, for example, which is what the Merriam-Webster.com dictionary is based on, was really based on Noah Webster's dictionaries. I mean, there's a very clear lineage between them. So the bilingual dictionaries were completely new creatures. And I think it's important to note that you all weren't even allowed on the editorial floor in the Springfield office; you had your own little enclave, apart from the traditional lexicography that was going on.
Peter Sokolowski: That's right. We had our office in the garden level, also known as the basement. And the reason for that was it was determined by the powers that be before we were even hired that these new bilingual editors would probably have to talk to each other.
Emily Brewster: Because, as I think we have mentioned on this podcast before, traditionally nobody talks at Merriam-Webster if you are on the second floor of the building, which is where most of the lexicography goes on; that is the editorial floor. And there is no talking up there. It's a little bit less strict now than it used to be, but you all were expected to have actual conversations.
Peter Sokolowski: Because they did assume, quite correctly, that we'd have to figure a lot of stuff out. And we were all new. We were all young, with the exception of Eileen Harrity, who was a cross-reference editor and a real veteran editor at Merriam-Webster. She had worked on the Collegiate Dictionary for years, but she had also worked as a librarian and lived in Latin America, and she spoke very good Spanish and could read French quite well. So her knowledge of the languages was secondary to her knowledge of the sort of apparatus of dictionary making. She knew how to organize a Merriam-Webster dictionary. And it was our job, as the language specialists, to make sure that the language part was correct, because she was really the overall editor, making sure that the style of the dictionaries was correct. And that was an interesting way to work.
Emily Brewster: Did you make two dictionaries at the same time?
Peter Sokolowski: That's right. There were two: the Spanish and the French. It was determined that they would be the same size, and that the English side of these bilingual, bidirectional dictionaries would use the same list of words. Bidirectional means there's an English-French side and then a French-English side, and the same for Spanish. And so we were starting from kind of the same place.
Emily Brewster: Where did you get that list? What was that list?
Peter Sokolowski: That's kind of a neat thing. The executive editor at the time, John Morse, who is now retired, did what we often do, which is take kind of the practical approach. He took the smallest dictionary that Merriam-Webster publishes, which is the little red pocket dictionary, and it has about 40,000 headwords. And he said, well, we want a book with 80,000 words. So if those 40,000 English words are translated into French, that's the English side. Also, we don't have to reinvent the wheel. We've already made a good little dictionary of English, so let's use that as the basis for a good little Spanish and French bilingual dictionary. It was a very clever thing to do.
Emily Brewster: The pocket dictionary provides the essential vocabulary that a person is likely to need in going about their business. So it's a great place to start a bilingual dictionary, too.
Peter Sokolowski: Because these were not intended to be scholarly translators' dictionaries. These were intended to be very utilitarian: dictionaries that you might carry in the first few years of language class or take on vacation, for example. There's a little story behind how this happened, and it relates to the old way of doing business. In the days before the internet, there was a big shift in book publishing, in the way that books were marketed and sold, and that was the big chains, Barnes & Noble and Borders. And there was this ritual that every publisher had, which was going to Michigan and visiting the sales executives at Borders Books. You had a big presentation, you got 30 minutes, and every year you had your new titles and new ideas for marketing. So the vice president of sales and the executive editor at Merriam-Webster went to Borders.
And they said, well, we've got a new edition of the Collegiate coming up in a few years, and here are the Geographical and Legal dictionaries and all this stuff. And the executive at Borders said, "Great, we love the Collegiate. We sell the Collegiate in large numbers. We sell the red mass-market paperback dictionary in large numbers. But here's the thing: I'm meeting with you from Merriam-Webster for 30 minutes, and you have 13 titles that we sell. The next meeting I have is with Simon & Schuster, and we have 1,300 titles from that publisher that we sell. In other words, we love our relationship with you, but we want you to have more books, more titles, on our bookshelves. Can you come up with new titles?" And that's how the bilingual dictionaries actually began, because they came back to the office and said, "What haven't we done?" And it was clear that we hadn't done bilingual dictionaries. And the next neat little thing is that the first idea was, oh, we'll just license some, and maybe Americanize a British one or something like that. And it was determined that that was the way to go. But then the person who was put in charge of researching the partnership discovered that another American publisher had licensed the same bilingual dictionaries from the same British publisher. And when he came back with that information, we just said that's not the way we want to do things at Merriam-Webster. And that's when it was decided to make them from scratch.
Emily Brewster: So it strikes me that in writing these dictionaries, you did not have the thing that all the other dictionaries Merriam-Webster produced were built on, and that is our citation files. The Merriam-Webster citation files are slips of paper. Now they are also digitized, but at that time they were all slips of paper, each with a bit of context on it and a word identified. They are the evidence. They are the backing for definitions. They are what you assess to determine what a word means and how well established it is. Now, you all did not have a French citation database, right? You did not have a citation file.
Peter Sokolowski: You're exactly right. That's the biggest weakness. I sometimes think of our citation files as the raw form of the dictionary. This is the evidence that we've collected. I believe the Merriam-Webster citation files represent the largest body of collected evidence of any language in the world on paper. It's something like 16 million three-by-five cards. It's an enormous reference. And exactly, we had no such reference for Spanish or French. Now, this was in the mid-90s; this was 1994, '95. This was the very beginning of the commercial use of electronic corpora, or corpus research. And in many ways, the citation files formed a kind of corpus themselves. It occurs to me we should probably explain what a corpus is. It's a selected group of texts that are searchable. In a way, we all understand this now. This was long before Google, of course, and Google itself is a kind of messy corpus, but a linguistic corpus, one for this kind of purpose, is what I would call a clean one, which means there's no code or advertising.
There's no repetition. In other words, every article is represented only once, so that you don't get false counts. You have to be able to take a census of the language: how many times per million words of French is this word used? So we needed this new tool to very quickly catch up to the old technology that had given us a good, solid English dictionary, in order to make good, solid French and Spanish dictionaries. So we made these corpora by assembling them in-house. We used data from the Linguistic Data Consortium at the University of Pennsylvania; Mark Liberman, a great linguist there who runs Language Log, among other things, helped us. What he had was digital versions of newspapers, which is the perfect kind of language use for this kind of dictionary. It's edited prose, professionally written and professionally edited, pretty serious, and it hits the target: the words that are used most frequently. Not the most slangy terms, not the newest, maybe not covering comedy and sports as much, but really covering the big middle of a language, the standard words of a standard language like this. And so we took these scanned newspapers, and one at a time we would strip the codes and the tags from them and add them to a big file that became the French corpus and the Spanish corpus. And the one thing we also did was populate them very deliberately. In the case of the Spanish dictionary, we used only newspapers from the New World, because we recognized that most bilingual dictionaries on the market then (as today) were European or British. With Latin America being such a huge market and having such a huge population of Spanish speakers, we wanted Mexican and Colombian and Costa Rican and Puerto Rican and Cuban newspapers added to that database exclusively. So it was truly a New World dictionary: American English and Western Hemisphere Spanish. And for the French, a similar thing was done.
Most of the evidence, maybe two-thirds, was from French-speaking Canada, but then we did add a lot from France and Switzerland and Belgium and from French-speaking Africa as well. So the French one was a little less situated in North America. And there's one anecdote I can tell you about using a corpus and how it works. One of the things we did to use the corpus for research was to index the words, to find how often each word was used. And there was a word that I'd never encountered before in French. It was clear that the word meant email, and email was pretty new to me in the mid-nineties. When I had lived in France, I was unfamiliar with it, so I wouldn't have known the word for email. Also, in France, as in many places, a lot of people just used the English words for these things. But in the Canadian newspapers we found a word for email, courriel, which is a portmanteau, a smashup of two words: courrier, meaning "mail," and électronique, meaning "electronic." So it made sense. Email was courriel, probably because in French the word email looks just like émail, the word for enamel, like the enamel on your teeth, so it was an uncomfortable sort of homograph for many people. And so we did what we're supposed to do. We added the word to the dictionary with a little label that said Canadian: here's the word for email. And then a number of years later, five or six or seven years later, there was a little note in The Wall Street Journal that said the Académie française, the French Academy in Paris, had announced that the official word for email in French would be, and it was the same word, courriel. And I got a note from our president, John Morse. He just sent me an email that said, "Hey, write a pink on this," which means add this to the notes for the French dictionary so we can add this term to the dictionary. And I went back and said, "Hey, it's already in the dictionary. We were the first dictionary in the French language to have this word for email in it, because we were looking at Canadian sources, and the French-speaking Canadians had adapted and adopted that word before the Europeans did."
So I'm kind of proud of that, too. I sometimes say lexicography is all practice and no theory, which is sort of not really true, but the doing of the job is very much a routine of looking at evidence and reflecting that evidence. We found this word that we'd never heard of, but we added it to the dictionary because the evidence was there, and we were right. What we did do later was take the label Canadian off that term, because it's used in Europe now too.
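For readers curious about the mechanics, the corpus steps Peter describes (stripping codes and tags from each newspaper article, counting each article only once, and expressing word frequency per million running words) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the software the team actually used in the 1990s: the tag pattern, the hash-based check for exact duplicates, the letters-only tokenizer, and the function names `clean_article`, `build_corpus`, and `per_million` are all hypothetical.

```python
import hashlib
import re
from collections import Counter

# Assumption: raw articles carry SGML/HTML-style tags such as <doc> or <p>.
TAG_RE = re.compile(r"<[^>]+>")
# Assumption: a "word" is a run of letters (Unicode-aware, so é and ç count).
WORD_RE = re.compile(r"[^\W\d_]+")


def clean_article(raw: str) -> str:
    """Strip markup tags and collapse runs of whitespace."""
    return " ".join(TAG_RE.sub(" ", raw).split())


def build_corpus(raw_articles: list[str]) -> list[str]:
    """Clean each article and keep only the first copy of any exact duplicate."""
    seen: set[str] = set()
    corpus: list[str] = []
    for raw in raw_articles:
        text = clean_article(raw)
        # Hash the lowercased text so a repeated story is counted only once.
        digest = hashlib.sha1(text.lower().encode("utf-8")).hexdigest()
        if text and digest not in seen:
            seen.add(digest)
            corpus.append(text)
    return corpus


def per_million(corpus: list[str]) -> dict[str, float]:
    """Return each lowercased word form's frequency per million running words."""
    counts = Counter(
        word.lower() for text in corpus for word in WORD_RE.findall(text)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n * 1_000_000 / total for word, n in counts.items()}


if __name__ == "__main__":
    articles = [
        "<doc><p>Envoyez-moi un courriel.</p></doc>",
        "<doc><p>Envoyez-moi un courriel.</p></doc>",  # duplicate story
        "<doc><p>Le courriel remplace la lettre.</p></doc>",
    ]
    corpus = build_corpus(articles)
    print(len(corpus))  # 2
    print(round(per_million(corpus)["courriel"]))  # 222222
```

Hashing only catches exact repeats; a near-duplicate story with small edits would slip through and would need a fuzzier comparison.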
Emily Brewster: Once the French Academy adopted it officially, then it is no longer just Canadian. And again, lexicography must be based on evidence or it's a fabrication. We'll have more on the making of bilingual dictionaries right after the break. You're listening to Word Matters from Merriam-Webster and New England Public Media.
Neil Serven: I'm Neil Serven. Do you have a question about the origin, history, or meaning of a word? Email us at firstname.lastname@example.org
Peter Sokolowski: I'm Peter Sokolowski. Join me every day for the Word of the Day, a brief look at the history and definition of one word, available at Merriam-Webster.com or wherever you get your podcasts. And for more podcasts from New England Public Media, visit the NEPM podcast hub at NEPM.org.
Emily Brewster: And how long did you work on these dictionaries?
Peter Sokolowski: Well, we were told that it would take two years to produce two dictionaries with this little team. There were about six or seven of us, a couple of whom could function in both languages. And then we had principal roles: there was someone whose home language was French or Spanish, and someone whose home language was English but who had a bilingual knowledge of French or Spanish. I was that person for the French side. I grew up speaking English, and my mother has French Canadian roots, so I heard a lot of French as a child, but then I went off to college in France. So I had gained a kind of academic fluency in French, but I was by no means what we might call a native speaker, who has French as a home language. But we had a wonderful colleague from Quebec.
We were told it would take two years. Of course, it's axiomatic in dictionary publishing that projects take longer than you expect. The whole project took five or six years all told: five years, I think, for the Spanish and six for the French. But it was really an amazing amount of learning. I didn't even know this was a job. I didn't know people wrote dictionaries. And at first, for a long time, the job was sort of computer coding: taking off tags and making a corpus. I'd never heard of any of this. But then when we got to work, one of the immediate tasks was to make it look like a Merriam-Webster dictionary, because we had never had a bilingual apparatus, what we call a style file. So we had to think long and hard about things, and we worked hard to make them feel familiar to people who know Merriam-Webster dictionaries. If you look at our bilinguals, you'll see, for example, that we use the boldface colon very much in the same way that we do in the monolingual dictionaries. One of the problems that we kept encountering, and that took us a long time to solve, was how to show the example sentences in a bilingual dictionary. It sounds simple telling it now, but in our dictionaries some example sentences are put in angle brackets and just show a very brief example of the word in use, and some example sentences are actually put in boldface because they're fixed phrases. And we struggled with this. We tried to draw the line and understand the distinction between the two. And finally it was E. Ward Gilman, the director of defining at the time and the last member of the staff of Webster's Third Unabridged Dictionary who was still on staff, who helped us out.
He thought about it for a couple of days, and he came down to the basement and said, "Well, if there is an idiom in French or Spanish or English that can never be changed, that always has the same exact words in the same exact order, then that will appear in boldface, as what we call a bold note. And if you can change any of these words, or their tense, or whether they're plural or singular, then you will put it in angle brackets in lightface, as a normal example sentence, or verbal illustration, as we say." So that's the kind of thing we had to work on to make it feel and function like a Merriam-Webster dictionary. And we finally got there, and I'm very proud of that.
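Ward Gilman's test amounts to a simple decision rule, sketched below in Python under stated assumptions. Here the question "can any of these words be changed?" is approximated by checking whether the attested corpus forms of a phrase vary at all; the function name `example_style` and the sample evidence are hypothetical. In practice this remained an editor's judgment, since a small sample may show no variation even for a phrase that can vary.

```python
from typing import Iterable, Sequence


def example_style(attested_forms: Iterable[Sequence[str]]) -> str:
    """Decide how a phrase should be printed, following the fixed-vs-variable test.

    attested_forms: each corpus occurrence of the phrase as a sequence of words.
    Returns "boldface" (a bold note) when every occurrence has exactly the same
    words in the same order, and "angle brackets" (a lightface verbal
    illustration) when any word, tense, or number varies.
    """
    # Compare occurrences case-insensitively, keeping word order.
    distinct = {tuple(word.lower() for word in form) for form in attested_forms}
    if not distinct:
        raise ValueError("need at least one attested form")
    return "boldface" if len(distinct) == 1 else "angle brackets"


if __name__ == "__main__":
    # Hypothetical evidence: a fixed French idiom and one whose verb tense varies.
    fixed = [["coûte", "que", "coûte"], ["Coûte", "que", "coûte"]]
    variable = [["il", "pleut", "des", "cordes"], ["il", "pleuvait", "des", "cordes"]]
    print(example_style(fixed))     # boldface
    print(example_style(variable))  # angle brackets
```

Treating any change of word, tense, or number as variation mirrors the rule as Gilman stated it; capitalization alone is ignored so that a sentence-initial capital does not count as a change.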
Emily Brewster: They're great dictionaries. They're both still in print.
Peter Sokolowski: They're really successful dictionaries. I'm very proud of this too. And we should mention another of our colleagues. I think the only member of the staff who joined the bilingual department was Karen Wilkinson, who had, I believe, a master's in Spanish, and who had already worked on the Collegiate Dictionary and knew all kinds of stuff to help us. In fact, she has recently completed the second edition of the Spanish dictionary. These dictionaries sell really well in print. They exist mostly in print, it has to be said, but the Latin American market for books is a very strong one, and we've even seen our book sales for the Spanish dictionary go up in Latin America in recent years. We also made a second version, which kind of makes sense. The bilingual dictionary has two sections: the English-to-Spanish or English-to-French side, and then the Spanish-to-English or French-to-English side.
But they also include a kind of grammar of the foreign language written in English. So there's a grammar of French written in English at the beginning of the French dictionary. But for the Spanish dictionary, we decided to flip it and make an edition for Latin America, or for native speakers of Spanish in the United States, in which the preface is a grammar of English written in Spanish. Our dear friend Sean [inaudible 00:17:47], who grew up partly in Mexico and is a native speaker of Spanish, did that work of translating the prefatory material. And of course he was one of the principal definers on the dictionary. We also made a little tiny tourist version, pocket Spanish and pocket French, which sell well. And from that, we did something special: we made a special USAID edition of the French dictionary that was published for the government agency with no barcode and no price tag on it.
And we printed 50,000 or 70,000 of these to be given to people in French-speaking Africa. This was distributed by the US State Department and USAID. And then a number of years later, a group of English teachers brought to the United States by the US State Department, which sometimes gives tours to teachers of English, came by the offices at Merriam-Webster. Emily, you know that we sometimes host these teachers and give them little workshops in the office and a little tour of the dictionary. And there was this man from Côte d'Ivoire, the Ivory Coast, and in his suit pocket he had a copy of the USAID Merriam-Webster dictionary. He took it out of his pocket and said, "This is my dictionary." And the handlers from the US State Department said, "Oh, did you know you were coming to visit Merriam-Webster?" He said, "No, this is my dictionary. This was given to me a number of years ago, and I carry it with me everywhere." And I just thought, here is this English teacher from the Ivory Coast who had traveled all this time and all this distance to come not only to the building in which the book was made, but to meet me, of all people. He had no idea; I just happened to be giving the tour. But I said that's really special to me, to see that book come back from Africa and to see that it was put to good use, because he was an English teacher, so he really used his dictionary. It's one of those little things that happen in life that kind of complete the circle.
Emily Brewster: Let us know what you think about Word Matters. Review us on Apple Podcasts or email us at email@example.com. You can also visit us at NEPM.org, and for the Word of the Day and all your general dictionary needs, visit Merriam-Webster.com. Our theme music is by Tobias Voigt. Artwork by Annie Jacobson. Word Matters is produced by Adam Maid and John Voci. For Neil Serven, Ammon Shea, and Peter Sokolowski, I'm Emily Brewster. Word Matters is produced by Merriam-Webster in collaboration with New England Public Media.