Welcome to Semantic Domains

Semantic Domains

This website contains a list of nearly 1800 semantic domains. They are organized in a hierarchy under nine major headings so that similar domains can be found together. The list of domains was developed by Ron Moe, a linguist working with SIL International, as a tool for collecting the words of a language and developing a dictionary. But researchers have been finding other uses for the list.

Each domain includes

  • a number for sorting purposes
  • a domain label (consisting of a word or short phrase that captures the basic idea of the domain)
  • a short description of the domain
  • a series of questions designed to help people think of the words that belong to the domain
  • a short list of English words under each question that belong to the domain.
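
For readers who want to work with the list in software, the five parts of a domain can be sketched as a small record type. This is only an illustration, not the format used by any published tool. The class and field names, the dotted number "1.2.3", the description, and the question wording are all assumptions made up for the example; only the label Rain and its example words come from this page.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """An elicitation question and the example words listed under it."""
    text: str
    example_words: list[str] = field(default_factory=list)


@dataclass
class SemanticDomain:
    """One domain: number, label, description, and questions with words."""
    number: str       # sorting number; the dotted format is an assumption
    label: str        # a word or short phrase naming the domain
    description: str  # short description of the domain
    questions: list[Question] = field(default_factory=list)

    def sort_key(self) -> tuple[int, ...]:
        # Compare numbers part by part so that "1.10" sorts after "1.2".
        return tuple(int(part) for part in self.number.split("."))

    def all_words(self) -> list[str]:
        # Collect example words across all questions, keeping first-seen order.
        words: list[str] = []
        for question in self.questions:
            for word in question.example_words:
                if word not in words:
                    words.append(word)
        return words


# Hypothetical entry: the number, description, and questions are invented.
rain = SemanticDomain(
    number="1.2.3",
    label="Rain",
    description="Words used to talk about rain.",
    questions=[
        Question("What words refer to rain?", ["rain", "drizzle", "downpour"]),
        Question("What words refer to water from rain?", ["raindrop", "puddle"]),
    ],
)

print(rain.label, rain.all_words())
```

Sorting a list of such records with `sorted(domains, key=SemanticDomain.sort_key)` would keep each domain next to its neighbors in the hierarchy, which is the purpose of the number.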

You may freely use this list of domains and adapt it for your needs according to the Creative Commons Attribution-ShareAlike license. The list is intended to be used for any language in the world, but is currently based primarily on English. Research has been done on the domains of other languages and the results of that research have influenced this list. It is hoped that further research will enable this list to become more global.

What is a semantic domain?

A semantic domain is an area of meaning and the words used to talk about it. A domain is often given a name consisting of a common word in the domain. For instance English has a domain ‘Rain’, which includes words such as rain, drizzle, downpour, raindrop, puddle. We use these words to talk about the rain.

The words within a domain are related to each other by lexical relations. Linguists use the term lexical relations to refer to various kinds of relationships that exist between words. There are two basic types of lexical relations. The first type is known as collocates: words that are frequently used together in a sentence. For instance, we often use the words bird and fly in the same sentence. Bird and fly are related by the lexical relation agent:typical action. The second type is known as paradigm forms and includes relations such as synonyms, antonyms, and the generic-specific relation. The words big and large are close synonyms. Kind and unkind are antonyms. Bird is a generic term that includes the more specific term chicken.

As a child learns to speak, he forms lexical relations in his mind. We need these lexical relations in order to speak correctly. Each of us has a mental dictionary which is organized into a giant network of lexical relations. Within the network are important clusters, like cities and towns linked by roads. So a semantic domain is a cluster of words in the mental network. The words within the domain are linked by lexical relations and the domains themselves are linked by lexical relations.
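
The network picture can be made concrete with a toy model. The sketch below stores the word pairs mentioned on this page as labeled links and then groups words that are connected to each other. Treating a cluster as a set of connected words is a simplifying assumption for illustration, not a claim about how the mental dictionary works, and the function names are made up for the example.

```python
from collections import defaultdict, deque

# Word pairs and relation labels taken from the examples on this page.
RELATIONS = [
    ("bird", "fly", "agent:typical action"),
    ("bird", "chicken", "generic-specific"),
    ("big", "large", "synonym"),
    ("kind", "unkind", "antonym"),
]


def build_network(relations):
    """Map each word to its linked words and the label of each link.

    Links are stored in both directions, which ignores the direction of
    relations such as generic-specific (a simplification).
    """
    network = defaultdict(dict)
    for first, second, label in relations:
        network[first][second] = label
        network[second][first] = label
    return network


def clusters(network):
    """Group words that can reach each other by following links."""
    seen = set()
    groups = []
    for start in sorted(network):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            word = queue.popleft()
            group.append(word)
            for neighbour in sorted(network[word]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups


network = build_network(RELATIONS)
print(clusters(network))
# [['big', 'large'], ['bird', 'chicken', 'fly'], ['kind', 'unkind']]
```

With only four links the groups are tiny, but the same idea scales: the more relations are recorded, the more the groups start to resemble domains.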

How are semantic domains used?

Semantic domains are very useful in investigating the relationships among words in a speaker’s mind. They can be used to efficiently collect the words of a language. You can pick a word, any word, in your language and start thinking of other words that are similar to it. With a large list of semantic domains you can systematically collect most of the words of your language. To learn more about collecting words, see the RapidWords.net website. A number of semantically classified dictionaries have been published using the list of domains. Semantic domains are also useful for investigating the meanings of words. All the words in a domain share aspects of their meaning and usage. So you can efficiently define the words in a domain by comparing and contrasting them. Semantic domains are also used to facilitate internet searches.

The word collection method has been called the Dictionary Development Process (DDP) and, more recently, Rapid Word Collection (RWC). The list of semantic domains and instructions for using the method are available for download from the RapidWords.net website. The list has also been incorporated into the FieldWorks program and the WeSay program.

How was this list of domains developed?

Ron Moe, a linguist working in Nairobi, Kenya with SIL International, began working on the list in September 2001. He began with a list of domains developed by Roger Van Otterloo, an SIL linguist, for Kifuliiru, a Bantu language spoken in the Democratic Republic of Congo. Ron Moe led two workshops in which speakers of two Bantu languages, Gikuyu of Kenya and Lugwere of Uganda, each developed a list of semantic domains for their own language. These three lists were combined. Moe began collecting lists of domains from other languages, looking for ideas for other domains and investigating how each list was organized. Moe also began classifying the words of English in order to ensure that all the words of English could be classified somewhere in the list. As of this date (September 2012), over 60,000 English words and idioms have been classified, including the 20,000 most frequent words in the Corpus of Contemporary American English.

Annotated Bibliography

Grouped by language family

Austronesian

Mansaka is an Austronesian language of the Philippines. It has 100 domains.

Tuwali Ifugao is an Austronesian language of the Philippines. It has 150 semantic domains.

Kalinga is an Austronesian language of the Philippines. It has 150 domains.

Afro-Asiatic

Austro-Asiatic

Premsrirat S. Thesaurus and Dictionary Series of Khmu dialects in Southeast Asia. Mon-Khmer Studies-Mahidol University Special Publications. 2002.

This is a comparative work for a branch of Austro-Asiatic in southeast Asia. It has 200 domains.

Sedang is an Austro-Asiatic language of Vietnam. The dictionary contains 280 semantic domains in boxes scattered throughout the alphabetized entries. Not all the words are classified.

African

As its title suggests, the list is designed to elicit standardized word lists for comparative purposes within the African context. It has around 2,000 words in 220 domains.

This list was designed for languages in west Africa and contains 240 domains.

Bantu

This list of semantic domains was produced in a workshop by Lugwere speakers. The workshop was the second of two workshops designed to develop an emic list of domains for a Bantu language. The first workshop was for the Gikuyu language of Kenya. The second workshop was for Lugwere of Uganda. The Lugwere speakers were first asked to group 1,000 Lugwere words into piles of semantically similar words. They classified the 1,000 words into approximately 100 domains. They were then asked to add other words and create additional domains as needed. They added 2,000 words to the original 1,000 and ended with 300 domains organized in a hierarchy.

This list was produced by a group of Gikuyu (Bantu, Kenya) speakers who were asked to classify 1,000 words. The workshop was the first of two workshops designed to develop an emic list of domains for a Bantu language. The second workshop was for Lugwere of Uganda. The words were selected from Van Otterloo’s Kifuliiru list and chosen to cover the entire semantic range of the vocabulary of a language. The words were translated into Gikuyu so that the participants would be working with Gikuyu words. The participants were asked to put the words into piles of semantically similar words. They classified the 1,000 words into approximately 100 domains arranged in a hierarchy.

Roger Van Otterloo developed a list of domains for his Kifuliiru (Bantu, Congo) dictionary based on his considerable work on the language. It is one of the largest and best developed lists of domains for a non-Indo-European language that I am aware of. It was edited by Alison Nicolle and (with Roger's permission) formed the original basis for the DDP list of domains. It has around 900 domains.

Yukawa Y. A Classified Vocabulary of the Luba Language. Institute for the Study of Languages and Cultures of Asia and Africa; 1992.
Yukawa Y. A classified vocabulary of the Nilamba language. Institute for the Study of Languages and Cultures of Asia and Africa; 1989.

Developed for a project on the linguistic and cultural history of Tanzania, the list contains 1563 words in 13 domains.

Yukawa Y. A tentative questionnaire for the words of Bantu languages. Journal of Asian and African Studies. 1979;17:139-212.

Yukawa used his list in two published dictionaries (Yukawa 1989 and Yukawa 1992). His wordlist contains 82 domains.

English

This is also an excellent classified dictionary. It is designed as a tool for beginning learners of English, so it is much simpler than the Longman Language Activator. It has fewer words in each domain and tends to lump several domains together under a general heading. But it also includes the concrete domains which are lacking in the Longman work. Longman’s domains tend to be more abstract, while the Oxford domains tend to be based more on scenarios. However, there is a great deal of overlap and consistency between the two. A high proportion of the domains and vocabulary are the same. It has 630 domains.

This is by far the largest and best developed classified dictionary of English that I am aware of. Unfortunately it does not include domains for concrete objects. I generally agree with its classification of English words, and have found its groupings and definitions very helpful. I have relied on it more than any other work. I highly recommend buying a copy of this book as an excellent example of a classified dictionary. It is designed as a tool for advanced learners of English. It has 1052 domains.

Although there are numerous versions of Roget’s on the market, only some of them retain the organization contained in this version. The greatest drawback to Roget’s is that most entries contain more than one semantic concept. This is because most of the head words are polysemous. So one must do some analysis to separate the words in an entry into cohesive sets. Many of the head words are highly abstract and all of them are nouns, with the result that some are not even good English words. It also lacks definitions. However it is still a good source for lists of words classified under important concepts. The organization of the entries is highly abstract and often non-intuitive. It has 984 domains.

The FrameNet project is an attempt to identify and describe English semantic frames. Although semantic frames do not exactly correspond to semantic domains, there is a close correspondence. The list is far from complete but provides a very useful perspective on the semantics of lexical sets. It has around 370 frames.

Sino-Tibetan

A large Chinese classified dictionary with the words transcribed in Pinyin.

Greek

The domains in this work are limited due to the limited vocabulary and subject matter of the New Testament. The primary drawback to their system is that they classified words on the basis of a division into objects, events, and relationals. This is not (in my opinion) how the mind organizes words. The result is that closely related words are often found in different sections, simply because of their part of speech. Les Bruce and the LinguaLinks team modified the list slightly and included it in the LinguaLinks program. They also classified 25,000 English words as part of a tool to enable lexicographers to classify the words of their language. There are 846 domains in Louw and Nida’s dictionary and an additional 30 in the LinguaLinks revision.

General

Fleming I. Communication analysis: A stratificational approach. Dallas: Summer Institute of Linguistics. 1988.

Ilah Fleming developed a list of basic semantic constructions and constituents from a universal, etic viewpoint. Her purpose was to facilitate emic semantic analysis in previously under-described languages. Although her list is not a list of semantic domains, it is an excellent source for semantic concepts that tend to be expressed by functors.

Murdock GP. Outline of cultural materials. Human Relations Area Files; 1982.

The OCM categories were designed as a system for classifying anthropological notes. As such, the system lacks many lexical domains and some of its categories are not lexical in nature. However, it is fascinating to see how much overlap there is between the OCM categories and the semantic domains in the other lists. It has 732 categories.

A short list of 218 basic words in 15 domains.

Indo-European

Buck’s purpose was to list words from selected Indo-European languages that express selected semantic concepts. Consequently he chose to organize his dictionary by semantics rather than alphabetical order. Since he only includes around 1,100 concepts, his concepts are usually very basic. So his entries often correspond to the domain labels in other lists of domains. It has 172 domains.
