Friday, December 01, 2006

Abstract vs Concrete: the two genders?( the catogorization debate)

In my previous posts I have focussed on distinctions in cognitive styles based on figure-ground, linear-parallel, routine-novel and literal-metaphorical emphasis.

There is another important dimension on which cognitive styles differ and I think this difference is of a different dimension and mechanism than the figure-ground difference that involves broader and looser associations (more context) vs narrow and intense associations (more focus). One can characterize the figure-ground differences as being detail and part-oriented vs big picture orientation and more broadly as analytical vs synthesizing style.

The other important difference pertains to whether associations and hence knowledge is mediated by abstract entities or whether associations, knowledge and behavior is grounded in concrete entities/experiences. One could summarize this as follows: whether the cognitive style is characterized by abstraction or whether it is characterized by a particularization bias. One could even go a step further and pit an algorithmic learning mechanism with one based on heuristics and pragmatics.

It is my contention that the bias towards abstraction would be greater for Males and the left hemisphere and the bias towards Particularization would be greater for Females and the right hemisphere.

Before I elaborate on my thesis, the readers of this blog need to get familiar with the literature on categorization and the different categorization/concept formation/ knowledge formation theories.

An excellent resource is a four article series from Mixing Memory. I'll briefly summarize each post below, but you are strongly advised to read the original posts.

Background: Most of the categorization efforts are focussed on classifying and categorizing objects, as opposed to relations or activities, and the representation of such categories (concepts) in the brain. Objects are supposed to be made up of a number of features . An object may have a feature to varying degrees (its not necessarily a binary has/doesn't has type of association, one feature may be tall and the feature strength may vary depending on the actual height)

The first post is regarding classical view of concepts as being definitional or rule-bound in nature. This view proposes that a category is defined by a combination of features and these features are of binary nature (one either has a feature or does not have it). Only those objects that have all the features of the category, belong to a category. The concept (representation of category) can be stored as a conjunction rule. Thus, concept of bachelor may be defined as having features Male, single, human and adult. To determine the classification of a novel object, say, Sandeep Gautam, one would subject that object to the bachelor category rule and calculate the truth value. If all the conditions are satisfied (i.e. Sandeep Gautam has all the features that define the category bachelor), then we may classify the new object as belonging to that category.


Bachelor(x)= truth value of (male(x))AND(adult(x))AND(single(x))AND(human(x))

Thus a concept is nothing but a definitional rule.

The second and third posts are regarding the similarity-based approaches to categorization. These may also be called the clustering approaches. One visualizes the objects as spread in a multi-dimensional feature space, with each dimension representing the various degrees to which the feature is present. The objects in this n-dim space, which are close to each other, and are clustered together, are considered to form one category as they would have similar values of features. In these views, the distance between objects in this n-dim feature space, represents their degree of similarity. Thus, the closer the objects are the more likely that they are similar and the moire likely that we can label them as belonging to one category.

To take an example, consider a 3-dim space with one dimension (x) signifying height, the other (y) signifying color, and the third (z) signifying attractiveness . Suppose, we rate many Males along these dimensions and plot them on this 3-d space. Then we may find that some males have high values of height(Tall), color(Dark) and attractiveness(Handsome) and cluster in the 3-d space in the right-upper quadrant and thus define a category of Males that can be characterized as the TDH/cool hunk category(a category that is most common in the Mills and Boons novels). Other males may meanwhile cluster around a category that is labeled squats.

Their are some more complexities involved, like assigning weights to a feature in relation to a category, and thus skewing the similarity-distance relationship by making it dependent on the weights (or importance) of the feature to the category under consideration. In simpler terms, not all dimensions are equal , and the distance between two objects to classify them as similar (belonging to a cluster) may differ based on the dimension under consideration.

There are two variations to the similarity based or clustering approaches. Both have a similar classification and categorization mechanism, but differ in the representation of the category (concept). The category, it is to be recalled, in both cases is determined by the various objects that have clustered together. Thus, a category is a collection or set of such similar object. The differences arise in the representation of that set.

One can represent a set of data by its central tendencies. Some such central tendencies, like Mean Value, represent an average value of the set, and are an abstraction in the sense that no particular member may have that particular value. Others like Mode or Median , do signify a single member of that set, which is either the most frequent one or the middle one in an ordered list. When the discussion of central tendencies is extended to pairs or triplets of values, or to n-tuples (signifying n dim feature space) , then the concept of mode or median becomes more problematic, and a measure based on them, may also become abstract and no longer remain concrete.

The other central tendencies that one needs are an idea of the distribution of the set values. With Mean, we also have an associated Variance, again an abstract parameter, that signifies how much the set value are spread around the Mean. In the case of Median, one can resort to percentile values (10th percentile etc) and thus have concrete members as representing the variance of the data set.

It is my contention that the prototype theories rely on abstraction and averaging of data to represent the data set (categories), while the Exemplar theories rely on particularization and representativeness of some member values to represent the entire data set.

Thus, supposing that in the above TDH Male classification task, we had 100 males belonging to the TDH category, then a prototype theory would store the average values of height, color and attractiveness for the entire 100 TDH category members as representing the TDH male category.

On the other hand, an exemplar theory would store the particular values for the height, color and attractiveness ratings of 3 or 4 Males belonging to the TDH category as representing the TDH category. These 3 or 4 members of the set, would be chosen on their representativeness of the data set (Median values, outliers capturing variance etc).

Thus, the second post of Mixing Memory discusses the Prototype theories of categorization, which posits that we store average values of a category set to represent that category.


Similarity will be determined by a feature match in which the feature weights figure into the similarity calculation, with more salient or frequent features contributing more to similarity. The similarity calculation might be described by an equation like the following:
Sj = Si (wi.v(i,j))
In this equation, Sj represents the similarity of exemplar j to a prototype, wi represents the weight of feature i, and v(i,j) represents the degree to which exemplar j exhibits feature i. Exemplars that reach a required level of similarity with the prototype will be classified as members of the category, and those fail to reach that level will not.

The third post discusses the Exemplar theory of categorization , which posits that we store all, or in more milder and practical versions, some members as exemplars that represent the category. Thus, a category is defined by a set of typical exemplars (say every tenth percentile).

To categorize a new object, one would compare the similarity of that object with all the exemplars belonging to that category, and if this reaches a threshold, the new object is classified as belonging to the new category. If two categories are involved, one would compare with exemplars from both the categories, and depending on threshold values either classify in both categories , or in a forced single-choice task, classify in the category which yields better similarity scores.


We encounter an exemplar, and to categorize it, we compare it to all (or some subset) of the stored exemplars for categories that meet some initial similarity requirement. The comparison is generally considered to be between features, which are usually represented in a multidimensional space defined by various "psychological" dimensions (on which the values of particular features vary). Some features are more salient, or relevant, than others, and are thus given more attention and weight during the comparison. Thus, we can use an equation like the following to determine the similarity of an exemplar:
dist(s, m) = åiai|yistim - ymiex|

Here, the distance in the space between an instance, s, and an exemplar in memory, m, is equal to the sum of the values of the feature of m on all of dimensions (represented individually by i) subtracted from the feature value of the stimulus on the same dimensions. The sum is weighted by a, which represents the saliency of the particular features.

There is another interesting clustering approach that becomes available to us, if we use an exemplar model. This is the proximity-based approach. In this, we determine all the exemplars (of different categories) that are lying in a similarity radius (proximity) around the object in consideration. Then we determine the category to which these exemplars belong. The category to which the maximum number of these proximate exemplars belong, is the category to which this new object is classified.

The fourth post on Mixing Memory deals with a 'theory' theory approach to categorization, and I will not discuss it in detail right now.

I'll like to mention briefly in passing that there are other relevant theories like schemata , scripts, frames and situated simulation theories of concept formation that take into account prior knowledge and context to form concepts.

However, for now, I'll like to return to the prototype and exemplar theories and draw attention to the fact that the prototype theories are more abstracted, rule-type and economical in nature, but also subject to pragmatic deficiencies, based on their inability to take variance, outliers and exceptions into account; while the exemplar theories being more concrete, memory-based and pragmatic in nature (being able to account for atypical members) suffer from the problems of requiring large storage/ unnecessary redundancy. One may even extrapolate these differences as the one underlying procedural or implicit memory and the ones underlying explicit or episodic memory.

There is a lot of literature on prototypes and exemplars and research supporting the same. One such research is in the case of Visual perception of faces, whereby it is posited that we find average faces attractive , as the average face is closer to a prototype of a face, and thus, the similarity calculation needed to classify an average face are minimal. This ease of processing, we may subjectively feel as attractiveness of the face. Of course, male and female prototype faces would be different, both perceived as attractive.

Alternately, we may be storing examples of faces, some attractive, some unattractive and one can theorize that we may find even the unattractive faces very fast to recognize/categorize.

With this in mind I will like to draw attention to a recent study that highlighted the past-tense over-regularization in males and females and showed that not only do females make more over-regularization errors, but also these errors are distributed around similar sounding verbs.

Let me explain what over-regularization of past-tense means. While the children are developing, they pick up language and start forming the concepts like that of a verb and that of a past tense verb. They sort of develop a folk theory of how past tense verbs are formed- the theory is that the past tense is formed by appending an 'ed' to a verb. Thus, when they encounter a new verb, that they have to use in past tense (and which say is irregular) , then they will tend to append 'ed' to the verb to make the past tense. Thus, instead of learning that 'hold', in past tense becomes 'held', they tend to make the past tense as 'holded'.

Prototype theories suggest, that they have a prototypical concept of a past tense verb as having two features- one that it is a verb (signifies action) and second that it has 'ed' in the end.

Exemplar theories on the other hand, might predict, that the past tense verb category is a set of exemplars, with the exemplars representing one type of similar sounding verbs (based on rhyme, last coda same etc). Thus, the past tense verb category would contain some actual past tense verbs like { 'linked' representing sinked, blinked, honked, yanked etc; 'folded' representing molded, scolded etc}.

Thus, this past tense verb concept, which is based on regular verbs, is also applied while determining the past tense of irregular verb. On encountering 'hold' an irregular verb, that one wants to use in the past tense, one may use 'holded' as 'holded' is both a verb, ends in 'ed' and is also very similar to 'folded'. While comparing 'hold' with a prototype, one may not have the additional effect of rhyming similarity with exemplars, that is present in the exemplar case; and thus, females who are supposed to use an exemplar system predominantly, would be more susceptible to over-regularization effects as opposed to boys. Also, this over-regularization would be skewed, with more over-regularization for similar rhyming regular verbs in females. As opposed to this, boys, who are usinbg the prototype system predominantly, would not show the skew-towards-rhyming-verbs effect. This is precisely what has been observed in that study.

Developing Intelligence has also commented on the same, though he seems unconvinced by the symbolic rules-words or procedural-declarative accounts of language as opposed to the traditional confectionist models. The account given by the authors, is entirely in terms of procedural (grammatical rule based) versus declarative (lexicon and pairs of past and present tense verb based) mechanism, and I have taken the liberty to reframe that in terms of Prototype versus Exemplar theories, because it is my contention that Procedural learning , in its early stages is prototypical and abstractive in nature, while lexicon-based learning is exemplar and particularizing in nature.

This has already become a sufficiently long post, so I will not take much space now. I will return to this discussion, discussing research on prototype Vs exemplars in other fields of psychology especially with reference to Gender and Hemisphericality based differences. I'll finally extend the discussion to categorization of relations and that should move us into a whole new filed, that which is closely related to social psychology and which I believe has been ignored a lot in cognitive accounts of learning, thinking etc.

Sphere: Related Content


R2K said...

Real science on a blog! I love it.

Anesha said...

Hi Nice Blog . I don't really know a lot about Human Anatomy study or art, but that's just my 2 cents. Really great job though, Krudman! Keep up the good work!