Editor's note: Mike Fassino is president of EnVision Knowledge Products, Media, Pa. This is the second installment of a three-part series on neural networks. The first part, "Understanding back-propagation," appeared in the April 1997 issue of QMRR.

In the first article of this series I explored supervised learning neural nets, especially the back-propagating network.

To train a supervised learning neural network, one presents both the independent and dependent variables. The network's weights are slowly and systematically adjusted so that whatever the pattern of input variables, the network's output value will be maximally similar to the actual output value. As I described in the previous article, a back-propagating neural network learns a mapping from the input to the output variables. Supervised learning neural networks can only be used when it is appropriate to speak in terms of independent and dependent variables and in those applications where we know the actual value of both. The question naturally arises: what about those situations where there is no right answer, such as market segmentation and perceptual mapping, or situations where we do not know the value of the dependent variable, such as sales forecasting? In these applications, there is no specific mapping to learn and the network must discover the pattern and structure that exists in a database.

This article addresses this question by introducing a second neural network paradigm: unsupervised learning neural nets. Our focus will be on segmentation and perceptual mapping. Forecasting will be the subject of the third and final article in this series.

Unsupervised learning neural networks are appropriate when there is no uniquely correct answer and the network must learn to recognize patterns within the data, and then organize the data so that it supports understanding of the structure and relationships among data elements. In this article, we will be especially concerned with a type of neural network known as a Kohonen self-organizing map, named for its developer, the Finnish computer scientist Teuvo Kohonen. The only other significant unsupervised learning neural network is called an ART (adaptive resonance theory) map and will not be discussed here.

The Kohonen self-organizing map (or SOM for short) results in a two-dimensional map showing the location of both respondents and survey items. In assembling the two-dimensional map, the Kohonen SOM performs three functions that make it especially relevant for market researchers. These three functions will be verbalized below and then illustrated with data from a large panel survey of attitudes toward personal investment. The three central functions of the Kohonen SOM are:

1. Topological mapping: Respondents who have a similar pattern of ratings across the set of input items end up being located close to each other in the two-dimensional map. Obviously, this is a kind of clustering.

2. Emergent co-ordinate system: Different areas of the two-dimensional map become selectively tuned to specific survey items. The selective tuning causes a co-ordinate system of survey items to emerge which defines the two dimensions of the map. By knowing a respondent's location on the map, we can "read off" their pattern of response to the survey items.

3. Optimal discrimination: With a slight modification to the basic SOM, one can design a system in which the underlying co-ordinate system (and, therefore, the clustering of respondents) explains the greatest amount of variation in some outcome variable.

The easiest way to begin understanding these three central properties of the SOM is to dive into an actual example. Once the way to interpret the SOM is clear, I will explain how it works. I will end this article with some brief suggestions and illustrations of the unique strengths and weaknesses of the SOM.

The data

To illustrate the Kohonen self-organizing map, we will analyze a database from a large national panel. Approximately 3,000 respondents rated their level of agreement with 22 statements about personal investment strategies. The 22 statements are shown in Table 1 (see p. 49). For our illustration, we randomly selected 1,000 respondents to develop the SOM. One of the most important steps in successfully using any neural network, including the self-organizing map, with survey data is appropriately preprocessing the data. Just giving the network raw data is rarely effective.
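To make this concrete, here is a minimal preprocessing sketch in Python. It simply standardizes each of the 22 items so that no single item's scale dominates the distance comparisons inside the map; the variable names and the 1-to-7 rating scale are illustrative assumptions, not the exact procedure we used.

    # Minimal preprocessing sketch: z-score each of the 22 rating items.
    # (Illustrative only; the study's actual preprocessing involved more than this.)
    import numpy as np

    def preprocess(ratings):
        """ratings: array of shape (n_respondents, 22) of agreement scores."""
        means = ratings.mean(axis=0)      # per-item mean across respondents
        stds = ratings.std(axis=0)
        stds[stds == 0] = 1.0             # guard against constant items
        return (ratings - means) / stds   # standardized ratings

    # Example with made-up data: 1,000 respondents, 22 items on a 1-to-7 scale
    raw = np.random.randint(1, 8, size=(1000, 22)).astype(float)
    X = preprocess(raw)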

The results

The SOM finds nine segments in this database of 1,000 respondents and provides two ways to visualize the results. One can look at the center of each of the nine segments, as shown in Figure 1, or at individual respondent points, as shown in Figure 2. Figure 1 shows the location of the center of each of the nine segments; the circles show each segment's relative size. The extreme precision and structure of Figure 1 is no accident or mistake - it is the essence of the Kohonen self-organizing map.
Figure 2 shows each of the 1,000 individual respondent points. Since many points completely overlap, clusters appear as areas of greater density of data points. For example, Figure 3 shows all of the Segment 1 respondents' locations. Figure 3 represents a "blow up" of the lower left corner of Figure 2. There are 83 respondents in Segment 1. The fact that all 83 of these respondents fit into such a small area of the two-dimensional grid indicates that they form a compact cluster - all 83 have very similar ratings across the 22 segmenting items. As in a traditional point-vector perceptual map, the closer together two or more respondent points lie to each other, the more similar their ratings across all 22 items. Figure 4 shows a comparable "blow-up" for Segment 9. There are 116 respondents in this segment; again, all are tightly clustered together. It is evident that Segment 9 is not as highly compact as Segment 1, for reasons that will become clear later.

The following table shows how an individual item from our pool of 22 items works in the self-organizing map.

Deviation Scores: I enjoy managing my own savings and investments.

SEGMENT            DEVIATION SCORE

7  8  9            -0.85   -0.79   -1.21
4  5  6             0.49    0.06    0.04
1  2  3             1.04    0.65    0.57

The left panel of the table helps you orient by showing the location of the nine segment centers as in Figure 1. The right panel shows the deviation score for each segment on the first item, "I enjoy managing my own savings and investments."

The deviation score is calculated by simply subtracting the overall grand average, calculated from all 1,000 respondents, from each segment's average rating. The structure of the co-ordinate system is now apparent. As you move vertically across the grid, from Segment 1, 2 or 3 toward Segment 7, 8 or 9, agreement with this item decreases. This tendency of ratings to change smoothly as one moves across the two-dimensional grid is what is meant by a co-ordinate system. The fact that the neural network builds this co-ordinate system from the data rather than having it imposed on the data is what is meant by emergent. Thus, one of the characteristics of a Segment 1 respondent is a very high level of agreement with "I enjoy managing my own savings and investments."
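To make the arithmetic concrete, here is a one-function sketch of the deviation-score calculation. The function and variable names are hypothetical; all it does is subtract the grand mean from each segment's mean on a single item.

    # Deviation score for one item: segment average minus the grand average
    # over all respondents. Names are hypothetical.
    import numpy as np

    def deviation_scores(item_ratings, segment_labels, n_segments=9):
        """item_ratings: shape (n_respondents,); segment_labels: integers 1..n_segments."""
        grand_mean = item_ratings.mean()
        return {s: item_ratings[segment_labels == s].mean() - grand_mean
                for s in range(1, n_segments + 1)}

    # A positive score means the segment agrees with the item more than average,
    # as with Segment 1's +1.04 on "I enjoy managing my own savings and investments."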

The following table shows another example, this time for "Although I am interested in some growth, my primary goal is to preserve the principal of my assets."

Deviation Scores: Although I am interested in some growth, my primary goal is to preserve the principal of my assets.

SEGMENT            DEVIATION SCORE

7  8  9            -0.76    0.39    0.79
4  5  6            -0.87   -0.62    0.99
1  2  3            -1.15    0.16    1.08

Again, the co-ordinate system is quite clear. As you move horizontally across the grid, agreement with this statement systematically increases. There is virtually no systematic relationship between agreement and vertical movement.

The impact matrix

As these two examples illustrate, the emergent co-ordinate system defines smooth patterns of change in agreement as one moves across the SOM grid. Even with only two dimensions, the number of patterns that the SOM is able to capture is quite large. The impact matrix shows at a glance the relationship between the variables and the orientation of the grid.

The impact matrix provides two pieces of information for interpreting the self-organizing map:

1. The size of the coefficient shows the degree to which an axis is defined by the attribute, just as in factor analysis. Thus, "I enjoy managing my own savings and investments" has a strong definitional weight on the vertical axis as we saw in the previous illustration.

2. The sign of the coefficient shows the direction of movement. A large negative weight on the horizontal axis means that as one moves from left to right, agreement decreases. Conversely, a large positive weight on the horizontal axis implies that as you move from left to right, agreement increases. Large negative weights on the vertical axis mean that agreement declines as you move downward on the grid while large positive weights imply that agreement increases as you move upward.

Some items will have relatively high weights on both axes, representing important compound or diagonal movements like that shown for "I would never pay a fee, commission, or load to buy a mutual fund" (20), where Segments 2, 3 and 6 are the most willing to pay. Interestingly, these segments are predominantly risk averse and cover the spectrum of comfort in managing their own investments.

It is a simple thing to translate the impact matrix into a single, simple picture that shows what's going on, as shown in Figure 5. Table 1 shows the deviation scores for all 22 items.
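For readers who want something they can compute themselves, one simple way to approximate impact-style coefficients - shown here purely as an illustration, not as the exact procedure behind the impact matrix - is to correlate each item's ratings with respondents' horizontal and vertical positions on the map.

    # Approximate impact-style coefficients by correlating each item with the
    # horizontal and vertical map coordinates. (An illustrative stand-in only.)
    import numpy as np

    def impact_matrix(X, coords):
        """X: (n_respondents, n_items) ratings; coords: (n_respondents, 2) map positions."""
        n_items = X.shape[1]
        out = np.zeros((n_items, 2))
        for j in range(n_items):
            out[j, 0] = np.corrcoef(X[:, j], coords[:, 0])[0, 1]  # horizontal weight
            out[j, 1] = np.corrcoef(X[:, j], coords[:, 1])[0, 1]  # vertical weight
        return out  # the sign gives direction of movement, the size gives strength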

The vertical axis of the self-organizing map differentiates between people who feel they need help in managing their personal investments and those who feel personally up to the challenge. The horizontal axis provides a scaling of risk aversiveness. The segments to the far right (e.g., 3, 6 and 9) are more averse to risk than the segments to the left.

Segments 1 and 3 both prefer to manage their own investments, but Segment 3 is going to have a lower tolerance for risk than Segment 1, as indicated by their substantially lower levels of agreement with:

  • "I am willing to tolerate a short-term decline in the value of my investments if that's what it takes to achieve higher long-term returns."
  • "I prefer investments that require me to be somewhat involved and make decisions every so often."
    And higher agreement with:
  • "Although I am interested in some growth, my primary goal is to preserve the principal of my assets."

Moreover, as tolerance for risk decreases, there is a tendency for respondents to see themselves more as savers than investors. Similarly, as tolerance for risk decreases and a feeling of needing help with personal investments increases, there is a tendency to view investment products as overly complicated, as illustrated by the diagonal line in Figure 5.

Notice that a potential new product concept, "An investment program that focuses on a life stage approach to investment management," is attractive to the middle tier of segments (4, 5 and 6). Using the other interpretive features of the map, it would be easy to position this product concept selectively to each of these segments: targeting Segment 4 requires that the life cycle program provide the opportunity for substantial growth with acceptable levels of risk, while targeting Segment 6 requires a low-risk savings positioning. Positioning in terms of personal vs. professional management will not meaningfully differentiate the segments.

Summarizing, as one moves from segment to segment, things change smoothly. The great discontinuities you are probably used to with cluster-based segmentation are not present in the SOM. This probably jibes better with your own introspection that each of us is more or less like, say, a VALS Belonger or Achiever, rather than having all of the attributes of a Belonger and none of those associated with an Achiever.

It is easy to interpret any individual point in the SOM. For instance, Figure 4 showed all of the Segment 9 points and while they clearly cluster around the center, there is a subset of points (indicated with an arrow) that are apparently more open to risk than the majority of Segment 9. Similarly, there is a second subset (indicated with a bracket) that desires more personal involvement in their investments.

We turn now to a very brief overview of how the self-organizing map works. Some of the mathematical detail is very complex and, for that, the interested reader is referred to Kohonen's recent book, Self-Organizing Maps (Springer-Verlag, 1995). A self-organizing map consists of at least three layers:

1. An input layer where the network obtains information about the data (the 22 rating statements in our example).

2. An output layer where the network reports its results - in this case, the horizontal and vertical co-ordinates of each respondent's location in the two-dimensional grid. As I suggested previously, there may be additional layers, in which case the output of the SOM serves as an input to a back-propagating neural network or a traditional statistical procedure like linear regression. This is how my company's NeuroSeg program works.

3. A grid-like hidden layer, frequently referred to as a Kohonen layer.

I will describe how the hidden Kohonen layer works. There are three important features of the Kohonen layer: lateral interconnections, competitive learning and selective tuning.

Lateral interconnections

A small Kohonen layer is shown in Figure 6. The figure shows 16 processing units arrayed in four rows of four units per row, indicated by the circles. Notice that each of the 16 processing units is connected to all of its surrounding neighbors, indicated by the lines. These lateral interconnections are very important. The processing units in the Kohonen layer work like the processing units described in my previous article. In an SOM, however, each unit's activity influences its neighbors' activity and is influenced by the activity of its neighbors. Since each unit has a neighbor and is in turn a neighbor, each unit's activity influences the activity of all other processing units via the lateral interconnections. Of course, a unit influences its closest neighbors more strongly than those far away. A processing unit's activity can be either positive (excitatory) or negative (inhibitory). If a processing unit having a negative connection with its neighbor becomes activated, it makes it more difficult for its neighbor to become activated. Conversely, if the unit has an excitatory connection with its neighbor, its activity makes it easier for its neighbor to become activated. Activation means responding to a certain pattern in the input data.

Selective tuning

Training the SOM begins with presenting a line of data to the Kohonen layer. In our example case, this line of data consists of 22 variables. On the very first presentation, the processing units randomly respond to the 22 variables, some becoming activated by particular elements (questions), others being inhibited. Each time a processing unit becomes activated by a given input variable, its connection to that variable is strengthened. The next time it sees the particular variable, it is more likely to become activated. By becoming activated by a given variable, a processing unit will also inhibit other processing units from becoming activated by the same variable. After only a few presentations of data, small local neighborhoods of processing units become activated to particular patterns within the input data.

Beginning with a random seed, the Kohonen layer quickly builds an internal representation of the data wherein specific processing units become selectively tuned to specific data elements. If a processing unit elsewhere in the grid tries to become activated by the same elements, it is suppressed through lateral inhibition. This combination of suppressing other units while strengthening its own association with a given input pattern leads to the formation of the smooth co-ordinate system; neural network researchers refer to this as competitive learning since, in a very real sense, all of the processing units compete with each other for the chance to become activated by a data element and thereby have their connection to this element strengthened.
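For readers who want to see the mechanics spelled out, here is a minimal sketch of the standard Kohonen training updates. The neighborhood function plays the role of the lateral excitation and inhibition described above; the grid size, learning rate and neighborhood schedule are illustrative assumptions, not recommended settings.

    # Minimal Kohonen SOM training sketch (standard competitive learning with a
    # shrinking neighborhood; all settings are illustrative assumptions).
    import numpy as np

    def train_som(X, rows=10, cols=10, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        weights = rng.normal(size=(rows, cols, d))        # random initial tuning
        grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                    indexing="ij"), axis=-1)
        for epoch in range(epochs):
            lr = lr0 * (1 - epoch / epochs)               # decaying learning rate
            sigma = sigma0 * (1 - epoch / epochs) + 0.5   # shrinking neighborhood
            for x in X[rng.permutation(n)]:
                # every unit "competes"; the closest weight vector wins
                dists = np.linalg.norm(weights - x, axis=2)
                winner = np.unravel_index(dists.argmin(), dists.shape)
                # the winner and its neighbors are pulled toward the input
                g = np.exp(-np.sum((grid - np.array(winner)) ** 2, axis=2)
                           / (2 * sigma ** 2))
                weights += lr * g[..., None] * (x - weights)
        return weights

    def map_respondents(X, weights):
        """Return each respondent's (row, col) location - the winning unit."""
        flat = weights.reshape(-1, weights.shape[-1])
        idx = np.linalg.norm(X[:, None, :] - flat[None, :, :], axis=2).argmin(axis=1)
        return np.stack(np.unravel_index(idx, weights.shape[:2]), axis=1)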

In our example, the processing units located in the lower left of the grid became selectively tuned to:

  • "I enjoy managing my own savings and investments"
  • "I stay informed about the types of investments on the market today"
  • "I can make good decisions myself about when to buy and sell investments to maximize my gains"

The selective tuning to these elements means that a respondent having very strong agreement with these items would map onto the lower left of the grid. As agreement with these items decreases, the respondent maps away from the lower left, with a high level of disagreement mapping onto the upper right of the grid. Because of the lateral inhibition and excitation, respondents with similar patterns of ratings across all 22 items are pushed and pulled into a common area of the grid, while respondents with different patterns are pushed and pulled into different areas of the grid. All the pushing and pulling that goes on during training results in clusters that represent the central tendencies of the data. In fact, the Kohonen layer maps the probability density function of the response patterns.

This is a technical, but very important, point. Response patterns that are very rare occupy a very small proportion of the Kohonen layer while patterns that are relatively frequent take up a large proportion. Within this large proportion, the Kohonen layer is able to stretch itself to localize even small differences within groups of people whose central tendency is similar. This elasticity of the Kohonen layer (which comes about through the lateral excitation and inhibition and competitive learning) is what accounts for the easily observed clustering in Figures 1 and 2.

In Figure 2, there are nine segments and all of the respondents plot near one of the segments. The gaps of open space indicate response patterns where very few respondents fall. (These, as I mentioned, are respondents whose pattern of rating across the 22 items is very rare. These might be incorrectly coded or keypunched data, people who failed to understand the scale, people who intentionally gave inconsistent responses, or people who are just by nature contradictory, like those few in this study who felt they needed help with their investments but didn't want anyone to help them).

Applications and extensions

I will now address four very exciting and important things about applying the Kohonen SOM to segmentation data:

1. The segments appear to be very stable. Have you ever done a segmentation study on the same universe of respondents using the same questions at two different points in time and found that the proportion of respondents falling in each segment had shifted? This shifting reflects instability in the underlying segmentation scheme.

The Kohonen self-organizing map is extremely stable. This is a very important feature for segmentation analysis. It means that a particular respondent's data, measured at two different points in time, only has to be similar at both points for the respondent to map into the same segment. This test-retest stability issue is a frequent problem with Q-factor analysis and discriminant analysis, which pay particular attention to global aspects of the database, like the pattern of covariation among items. The SOM, while it certainly pays attention to the entire database, turns out to care more about the interrelationship of items within a respondent. We can illustrate this with our large mutual fund database. Remember that we randomly chose about 1,000 of the almost 3,000 respondents available.

To give some illustration of stability, we carefully selected five of the 22 items and trained a supervised learning neural net like that described in my first article with our original 1,000 respondents. Here we satisfy the criteria for a supervised learning neural net: we knew the value of the independent variables (in this case, the ratings on five of the 22 items) and the value of the dependent variable (the segment the SOM put people in using all 22 items). A back-propagating neural network quickly learns the highly nonlinear mapping from the five items to segment code. We then took 1,000 new respondents from the database - 1,000 respondents not used in our original SOM - and used our back-propagating neural network to determine in which segment a person fell based on the five items.

Finally, we ran these new 1,000 respondents through our trained SOM using all 22 items. If the SOM segmentation is stable, then the segment codes we derive using only five items should line up with what we get using all 22 items. The results are shown in the following table. Along the top we show the segment our fresh 1,000 respondents went into with the full 22 items on the trained SOM. Along the side we show the segment they went into with the five-item short form and a supervised learning neural network. The entries on the diagonal show the percentage of respondents that landed in the same segment with both schemes:

                       LONG FORM (SOM) SEGMENT
Short-Form      1     2     3     4     5     6     7     8     9
     1        100     1     0     2     0     0     0     0     0
     2          0    96     0     0     4     0     0     0     0
     3          0     0   100     0     0     0     0     0     0
     4          0     3     0    94     0     0     0     0     0
     5          0     0     0     0    96     0     0     0     0
     6          0     0     0     0     0    97     0     0     3
     7          0     0     0     4     0     0   100     0     0
     8          0     0     0     0     0     0     0   100     3
     9          0     0     0     0     0     3     0     0    94

Even with only five of the 22 items, we are able to correctly classify over 97 percent of our new sample of respondents. (Again I emphasize that we very, very carefully selected the five items and we very, very carefully preprocessed all the data.) Anyone who isn't impressed with this finding should ask their favorite market research statistician how they think they would do on this same task using Q-factor analysis or K-means clustering to form the segments and multiple linear discriminant analysis to test the five-item short form. On this database, we achieved 41 percent correct prediction using these traditional methods.

Of course, the best way to establish the test-retest reliability is to measure the same people at different points in time. Unfortunately, we do not have this kind of data available and so we must settle for the simulation outlined above.
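Here, for the record, is a rough sketch of the logic of that simulation. It substitutes a generic multi-layer perceptron (scikit-learn's MLPClassifier) for the back-propagating network we actually used, generates synthetic data in place of the real ratings, and codes segments crudely from grid position, so it illustrates the procedure rather than reproduces it; the five item indices are hypothetical, and it reuses the train_som and map_respondents sketches shown earlier.

    # Sketch of the short-form stability check (illustrative only; synthetic data,
    # hypothetical item indices, and a generic classifier stand in for the real study).
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(1)
    X_train = rng.normal(size=(1000, 22))   # stand-in for the original 1,000 respondents
    X_new = rng.normal(size=(1000, 22))     # stand-in for the fresh 1,000 respondents

    weights = train_som(X_train)            # from the earlier training sketch

    def segment_of(X):
        """Crude 3x3 segment code (1..9) from grid position - for illustration only."""
        loc = map_respondents(X, weights)
        return (loc[:, 0] // 4) * 3 + (loc[:, 1] // 4) + 1

    seg_train = segment_of(X_train)
    short_items = [0, 4, 9, 13, 21]         # hypothetical indices of the five items

    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
    clf.fit(X_train[:, short_items], seg_train)   # learn the 5-item -> segment mapping

    agreement = (clf.predict(X_new[:, short_items]) == segment_of(X_new)).mean()
    print(f"Short form agrees with the full 22-item SOM for {agreement:.1%} of respondents")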

2. The SOM can be easily modified so that the segments are always managerially relevant. Have you ever done a segmentation that told a nice story only to look at a cross-tab of market share by segment and find absolutely no difference between segments? Our NeuroSeg product adds a supervised learning neural network to the output of the Kohonen layer. This has some benefits for market researchers. For instance, you might want to do a segmentation and have the segments relate to customer satisfaction. In this case, you input the segmenting items to the Kohonen layer and the satisfaction rating (or ratings) to an output layer. NeuroSeg has an additional layer analogous to the back-propagation layer I described in the previous article sitting between the Kohonen and output layers. The errors in predicting satisfaction then feed back to the Kohonen layer, causing adjustments of all the lateral inhibition and excitation. After a moderate amount of training, the Kohonen layer becomes selectively tuned in such a way that the clustering of respondents maximally predicts overall satisfaction. In fact, the resulting SOM will have a co-ordinate system on satisfaction so that as you move across the grid, satisfaction systematically changes.

Any data can be used in the output layer to ensure that the resulting segmentation is relevant. For example, respondents in the investment database we have been examining were asked to identify the company with which they would make their next mutual fund investment. If we sent this question to the output layer of NeuroSeg, the database would not only be segmented, but the resulting segments would maximally differentiate between companies. Think of it as a simultaneous cluster and discriminant analysis where the results of the discriminant analysis are fed back to the clustering so that the resulting clusters give the best possible discrimination.
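A much-simplified, post-hoc version of this idea - not the feedback training that NeuroSeg performs, just a quick check of the same managerial question - is to see how much of the outcome variable respondents' map coordinates explain. The data below are synthetic placeholders.

    # Simplified post-hoc check (not the feedback architecture itself): how well do
    # respondents' map coordinates predict the outcome variable? Data are synthetic.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(2)
    coords = rng.uniform(0, 10, size=(1000, 2))   # stand-in for SOM grid positions
    satisfaction = rng.normal(size=1000)          # stand-in for an overall satisfaction rating

    r2 = LinearRegression().fit(coords, satisfaction).score(coords, satisfaction)
    print(f"Map coordinates explain {r2:.1%} of the variance in satisfaction")
    # In the feedback version described above, this prediction error would be fed
    # back to retune the Kohonen layer so the map itself becomes outcome-relevant.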

3. It is easy to determine the number of clusters that should be retained. Although I will not explain it in any detail here, there is a measure of how well the two-dimensional grid in the Kohonen layer is working. This measure is called the quantization error, or QE. The lower the QE, the better the segmentation.

Sometimes, of course, there is a trade-off. You might want six segments even though 12 have a much better QE. Generally, because of the self-organizing co-ordinate system, solutions with a large number of segments represent finer gradations of solutions with a smaller number of segments. In these cases, deviation charts like those shown earlier change much more slowly and smoothly as you move across the grid. In other words, a 12-segment solution is a lot like a six-segment solution, only each segment is much more compact and homogeneous. In other cases, however, the added segments allow for entirely new patterns of interrelationships to emerge.
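For the curious, here is a minimal sketch of the usual quantization-error calculation: the average distance between each respondent's ratings and the weight vector of his or her best-matching unit. The weights array is assumed to have the (rows, cols, items) shape used in the earlier training sketch.

    # Quantization error (QE): average distance between each respondent's ratings
    # and the weight vector of the best-matching unit. Lower is better.
    import numpy as np

    def quantization_error(X, weights):
        """X: (n_respondents, n_items); weights: (rows, cols, n_items)."""
        flat = weights.reshape(-1, weights.shape[-1])
        dists = np.linalg.norm(X[:, None, :] - flat[None, :, :], axis=2)
        return dists.min(axis=1).mean()

    # Comparing QE across candidate grid sizes is one way to judge how many
    # segments to retain, subject to the managerial trade-offs noted above.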

4. The SOM works very well for perceptual mapping data where you have a sample of respondents evaluating two or more products on three or more attributes. In this case, products with similar perceptual structure map close to each other and the attributes form a smoothly varying self-emerging co-ordinate system. The impact matrix and deviation scores show how respondents structure the marketplace and you can then easily read off the position each product occupies within this structure. Space prohibits us from showing examples of a neural network-based perceptual map, but I will be glad to send an example to anyone who is interested.

There are also some serious disadvantages and limitations of the Kohonen self-organizing map, including:

1. The SOM cannot be used with categorical data and doesn't do very well with ordinal data. With this kind of data, one will get better results with correspondence analysis than with a self-organizing map. The stronger the level of measurement, the better the SOM works.

2. If you are using a large number of items as the basis of the segmentation, you will need a large number of respondents. This is due to the competitive learning paradigm which requires many observations in order to accurately tune the weights.

3. The Kohonen self-organizing map frequently finds a lot of segments. With interactive presentation technology, this may or may not be a serious problem since, as I suggested above, solutions with a large number of segments usually find very fine gradations which, for the purposes of marketing strategy development, can be combined.

4. If you are used to the non-overlapping, highly discontinuous kinds of segments that cluster analysis finds, the idea of smooth transitions and overlapping segments might take some getting used to.