Development of tonal centres and abstract pitch as categorizations of pitch use.


Modelling how people establish a sense of tonality and how they encode pitch
invariance are important elements of research into musical cognition. This paper
describes simulations of processes that induce classifications of pitch and
interval use from a set of nursery-rhyme melodies. The classifications are
identified with keys and degrees of the scale. The extractive processes are
implemented as various forms of shunting, adding and tracking memory; ART2
networks, Kohonen feature maps and feedforward nets are used as classifiers, in
modular combinations. In the model, stable tonal centres emerge as general
categories of pitch use over the short to medium term, while degree categories
emerge from classifying interval use over the longer term. The representations
of degree reflect the similarity relations between degrees. Overall, this
research is concerned with the problem of how to abstract representations of
sequences in a way that is both resilient and adaptive. It uses various
extractive processes cooperatively to derive consistent representations from
sequences of pitches, and shows that information generated within one process
can be used to guide the development of another, in this case functional
representation.


KEYWORDS: Music, induction, tonality, pitch use, interval use, sequence
categorization, ART2, Kohonen feature maps, feedforward networks.


1. Introduction



The simulations described in this paper[1] are concerned with modelling the
abstraction of profiles of pitch use from music, and how statistical
representations may contribute to the development of musical tonality. Musical
tonality is a complex phenomenon. While not wishing to discuss in detail all the
processes that have been recognized as influencing tonal understanding, it is
worthwhile to outline its four major aspects.


Firstly, there is a precedent stage of psychoacoustic transduction. It is unclear
how the identities and relations emerging from this transduction influence other
tonal processes. While psychoacoustics seems insufficient to account for tonality
as a whole (Storr, 1992; Griffith, 1993a), it is obviously a necessary phenomenal
basis (Terhardt, 1974; Patterson, 1986). The psychophysical transduction of sound
is assumed to be a psychological constant, i.e. it has been acquired over
evolutionary time-scales.


Secondly, there are processes involving a dialogue between social conventions,
top-down schematic processes and psychoacoustic mechanisms. An example of this
kind of process is the evolution of temperaments leading to the system of keys in
western music. This involved an understanding of the mathematics underlying
tuning, the development of keyboard instruments, and the wish to resolve
aesthetic problems inherent in previous tunings. The historical development of
scales and temperaments suggests a multiplicity of processes: some as precedent
as gestalt grouping principles, others as tertiary as the formalization of the
properties and implications of pitch sets. How these perceptual and conceptual
processes interact is part of the ongoing research agenda for cognitive
musicology.


Thirdly, there is the coincidence of tonal with non-tonal musical dimensions
(Bregman, 1990; Handel, 1973). These are considered to include phrasing
(Narmour, 1984; Page, 1993), rhythmic and metrical structures (Peretz &
Kolinsky, 1993) and expressive timing (Shaffer et al., 1985). The influence
between tonality and these structures seems to some extent to be mutual. However,
very little is known for certain about the operation of coincidence between
musical dimensions, or how and when it takes place in auditory processing.


Fourthly, tonality involves functions that induce structure from the use that is
made of pitches. These functions are concerned with structure that arises over
time, i.e. they are sequential. Pitch use has three aspects: (1) the memorization
of sequences, (2) the abstraction of the attributes of pitch use within
sequences, (3) the compositional relations--in the general sense of putting
together--between the first two aspects. The simulations reported focus on the
second of these aspects, the abstraction of patterns of pitch use.


2. Computational Models of Key and Tonality


Over the last 30 or so years, various programs have been written with the aim of
identifying, from a stream of pitch information, the key of a piece of music.
Most of these have been concerned with the operation of a fully developed sense
of key (Simon, 1968; Winograd, 1968; Longuet-Higgins & Steedman, 1970; Holtzmann,
1977; Ulrich, 1977; Krumhansl, 1990a; Huron & Parncutt, 1993). More recently,
artificial neural networks (ANNs) have been used to model some of the processes
involved in learning about musical structure via simulated exposure to pieces of
music.


The idea that musical schemas emerge from broad perceptual processes,
classifying structured sequences of sound, is prominent in the work of both
Bharucha (Bharucha, 1987, 1991), who stresses the chordal structure of tonality,
and Leman (Leman, 1990, 1992), who stresses the harmonic (overtone) structure of
tonality. Frances (1988) argues that there is not enough information in melody by
itself to serve as the basis for tonality. The simulations described below
explore how much can be learned from a minimum of melodic information.


Leman's model (Leman, 1992) involves the temporal integration of the harmonic
constituents of pitches, and is based on the virtual-pitch theory of Terhardt et
al. (1982), as developed by Parncutt (1988). Pitch representations integrated
over a few seconds are related to tonal centres previously developed from
representations of chords. Tonal attribution is a function of the distance
between the current integrated pitch representation and tonal centres. The model
specifies no mechanism to account for the emergence of abstract pitch.


MUSACT (Bharucha, 1987) is a spreading activation model of chord and key
relations. Layers of units representing pitches, chords and keys are connected to
each other by virtue of membership, i.e. a pitch is only linked to those chords
it is a member of. When a set of pitches is fed into the network, the connections
propagate activation to chords and then to keys. Overall, the net settles into a
vote for different chords and keys. MUSACT has been extended using competitive
learning to simulate the acquisition of the chord and key schemas. The learning
model proposes that chords are grouped in pools. The chords in a group are
'yoked' together for the purposes of learning. When a chord of a particular type,
e.g. C major, is mapped to an uncommitted node in a group, the other nodes in the
group are also adjusted so as to be predisposed to the same pattern of pitches at
other points in the chromatic scale. Yoking chords in this way is necessary
because there is nothing to suggest, for example, that the chord c e g (C major)
is of the same type as f a c (F major). The learning mechanism incorporates the
pitch transformations through the chromatic pitch set that relate chords of the
same type built from different roots. By using this mechanism, MUSACT eschews the
use of interval information. However, it is arguable that if it used interval
information, then the chord pools could be developed in a simpler, and more
general, way out of categorizations of the interval structure of chords. The
issues that arise out of whether pitch relations are encoded by mechanisms of the
type proposed in MUSACT, or whether pitch relations are previously encoded as
intervals, allowing their subsequent use by other processes, is beyond the scope
of this paper. The model described below proposes that intervals are encoded, and
that this information is used to establish abstract pitch identity.


MUSACT models the construction of tonality as the integration of pitches into
chords and scales. It specifies only a black-box 'gating' model of the
abstraction of pitch into an invariant form such as the tonic sol-fa, or scale
degrees. Although the tonic sol-fa is primarily an aide-memoire, and the system
of degrees is primarily an analytical device, both describe an abstract, single
octave scale. These abstract scales play an important role in learning and
remembering music. MUSACT uses pitch class and key identities to gate pitch into
a form where the tonic of a scale is always 0. An equivalent mechanism is
described by Scarborough et al. (1989). How might such a mechanism be learned? It
is difficult to see how to establish a direct phenomenal basis for abstract pitch
in the overtone characteristics of pitches, as this does not vary by key--yet the
abstract function, and scale position, of pitches and chords depends upon the key
they are played within. However, differentiation does arise in a representation
of how pitches are related to other pitches over a period of time. These
relationships can be thought of either in terms of the statistics of the pitch
transitions (Griffith, 1993a), or in terms of the statistics of the interval
transitions associated with sets of pitches.


The model described below investigates constructive mechanisms that extract and
classify the statistics associated with the use of pitch classes and their
associated intervals. These mechanisms induce outlines of tonal structure
representing diatonic centres and degrees of the scale.


3. Pitch Use and the Development of Tonality


The music used in the simulations is a set of nursery rhymes. These are
represented by vectors that identify the pitch classes and intervals used in
western music.


The tonality of pieces of music is often described in terms of subsets of
pitches. In western music the diatonic scales are pre-eminent. Each diatonic
major scale consists of seven pitches related by a pattern of tone (T) and
semitone (S) intervals--T T S T T T S. Thinking about tonality as a set
emphasizes that a piece in a particular key tends to use those pitches which are
members of the scale identified with this key. Furthermore, different pitches are
emphasized by being used more and less frequently. Figure 1 shows how the
salience of pitches, from Krumhansl (1990a), differentiates the scales of C and
G--which differ by only one pitch.


Although this description is specific to the western pitch system, the
underlying principle is more general. The emphasis of some pitches over others
through repetition and return occurs in many musical systems, including Indian,
Chinese and Japanese (May, 1980). Deutsch (1975, 1978) has argued that
repetition aids memorization, and Krumhansl (1990b) has developed this kind of
insight into a theory of tonal structure. The psychological experiments of Krumhansl (1990a)
indicate a strong correlation between the measured salience of pitches in
different scales, and the frequency of occurrence and accumulated duration of the
pitch classes in pieces of music.


However, what is not clear is whether or not patterns of frequency of pitch
occurrence that arise over pieces of music are distinct enough to allow the
induction of categories underlying tonal structure, i.e. keys and degrees. The
first set of simulations illustrates, in a simple model, that it is possible to
track and classify the statistics associated with pitch use into tonal centres.


The frequency of pitch occurrence is a very general characterization of a pitch
sequence. Underlying it there are local patterns, such as phrases, verses, etc.,
and the overall pitch profile arises from them. The repeated use of short pitch
sequences forms contextual patterns that vary with the key. This is closely
related to the function of a pitch as a degree of the scale (Butler & Brown,
1984; Butler, 1989), and also with its identity in abstract mnemonic scales.


There is a variety of opinions about the status of encodings of interval in tonal
structure, for example, Krumhansl (1990b) and Butler and Brown (1984), and in
melodic memory, for example, Dowling (1984, 1988). However, the close
relationship between the two is well established. The representation of pitch use
in terms of intervals allows comparison between pitch use in different keys.


The model focuses on the frequency of occurrence of pitches and intervals in
diatonic major scales. As such, it can make no claim to be a complete model of
tonal induction. It is a partial synthesis of insights in the work of Krumhansl,
Butler and Brown and others (Krumhansl, 1990a; Butler & Brown, 1984; Brown, 1988;
Butler, 1989).[2] More generally, tonality is conceived to arise from a diverse
set of processes (Bharucha, 1987; Terhardt, 1984; Bregman, 1990; Balzano, 1980).


Several concerns have influenced the development of the model. Firstly, the idea
that classification is an adaptive process, concerned with developing stable,
consistent representations of aspects of experience. Secondly, that
classification is focused at appropriate levels of granularity, both
representationally and temporally, resulting in categorizations that are more or
less general (Lakoff, 1987). Thirdly, that representations derived at one level
of granularity may be used as a source of attentional focus by processes making
other classifications. These characteristics--adaptive stability, appropriate
categorical granularity and attentional structure--are all viewed as desirable
properties. In particular, the simulations explore how categorical processes
which attend to different dimensions of pitch use, over different
representational and temporal spans, can be used together.


The experiments to be described all use the same overall procedure. A stream of
vectors representing the melodies of the nursery rhymes is tracked by a memory
function. The content of the memory is then presented to an Adaptive Resonance
Theory (ART) type of ANN classifier.


4. Inducing Key from Patterns of Pitch Use


The nursery rhymes used in the simulations are from three collections
(Chesterman, 1935; Mitchel & Blyton, 1968; Anon, 1979). The training set
consisted of 60 songs, the test set of a further 25. Each song was present in the
training set transposed to all keys, giving in all 720 sequences in the training
set and 300 in the test set. The songs' profiles of frequency of pitch occurrence
were all highly correlated with the pitch salience profiles in Krumhansl (1990b).


In these first experiments, the songs are input to a self-organizing ART ANN
classifier, ART2 [3] (Carpenter & Grossberg, 1987). The model is outlined in
Figure 2. Each sequence is represented by a succession of vectors, each of which
identifies a pitch class. Three representations of pitch were compared.


The first representation, FREQ, is a simple identity vector of 12 elements. The
identity of the pitch is indicated by the value of one element being 1. In the
DUR representation, the value associated with a pitch reflects the length of the
note. A crotchet is 1, quavers are 0.5, etc. The SLI, timesliced representation,
involves the repetition of each FREQ vector a number of times to reflect its
duration. Semiquavers are presented twice, quavers four times, etc. A similar
representation was used by Todd (1989).
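As an illustration, the three encodings can be sketched in Python as follows.
This is a minimal reconstruction: the function names are mine, and the choice of
eight slices per crotchet is inferred from the statement that semiquavers are
presented twice and quavers four times.

```python
import numpy as np

def freq_vector(pitch_class: int) -> np.ndarray:
    """FREQ: a 12-element identity vector marking the pitch class with 1."""
    v = np.zeros(12)
    v[pitch_class] = 1.0
    return v

def dur_vector(pitch_class: int, beats: float) -> np.ndarray:
    """DUR: the pitch-class element carries the note length (crotchet = 1)."""
    v = np.zeros(12)
    v[pitch_class] = beats
    return v

def sli_vectors(pitch_class: int, beats: float, slices_per_crotchet: int = 8):
    """SLI: repeat the FREQ vector once per time slice to reflect duration.
    With 8 slices per crotchet, a semiquaver (0.25 beats) yields 2 vectors
    and a quaver (0.5 beats) yields 4, matching the text."""
    n = int(round(beats * slices_per_crotchet))
    return [freq_vector(pitch_class) for _ in range(n)]
```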


The pattern of pitch use presented to the ART2 classifier is constructed in a
memory (TRK), which tracks the number of occurrences of pitches in melodies. This
memory is implemented in a network form as a process of slow learning, shown in
Figure 3. The memory vector has elements equivalent to the input vector. At the
start of each song the values in the memory are set to zero. The values developed
within the memory during the passage of a song reflect the number of times
different pitches occur. The memory is presented to the ART2 classifier at the
end of each song.


Where values in the input vector X are between 0 and n, learning in the TRK
memory is as follows, where x_i(t) is the input value, w_i(t) is the memory
value, initialized to zero, and the response of the memory is determined by
eta^T:[4]


if x_i(t) > 0:  w_i(t + 1) = w_i(t) + eta^T x_i(t)(1 - w_i(t))


else:  w_i(t + 1) = w_i(t)
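A direct transcription of this update rule into Python might look as follows; a
sketch only, with the vectorized form and the example pitch stream being my own
additions.

```python
import numpy as np

def trk_update(w, x, eta_t):
    """One step of the TRK tracking memory: elements with active input
    move towards 1 at a rate scaled by the input value; all other
    elements are left unchanged."""
    active = x > 0
    w = w.copy()
    w[active] += eta_t * x[active] * (1.0 - w[active])
    return w

# Example: tracking pitch use through a short FREQ-encoded stream.
# The memory is reset to zero at the start of each song.
w = np.zeros(12)
for pc in [0, 4, 7, 0, 2, 0]:  # c e g c d c
    x = np.zeros(12)
    x[pc] = 1.0
    w = trk_update(w, x, eta_t=0.05)
```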


eta^T rates of 0.005, 0.01, 0.05, 0.1 and 0.5 were used. In all simulations
the network developed twelve nodes, each equivalent to a key. These nodes were
stable over the different eta^T rates and emerged with the ART2 vigilance (rho)
set between 0.91 and 0.97. The percentage of mappings to the home[5] key was
91.5-98.3%. After the network had been trained, it was tested with the developing
memory, pitch by pitch. This showed a generalization of between 79% and 90.3%.
Results for the FREQ representation were slightly better than for either the DUR
or SLI representation.


In both the training and test sets, the end-of-song memory patterns for those
songs lacking degree 3 and/or 4[6] were irregularly misplaced. The accuracy of
the pitch-by-pitch mappings reflects how closely the TRK memory approximates the
overall pattern of frequency of occurrence encoded in a key exemplar as a song
progresses. The songs that are consistently misattributed when the memory is
presented pitch by pitch often have atypical representations of degree 5, and
some have degree 6 or 7 missing.


The typicality of each song was measured by correlating the frequency of
occurrence of pitches in each song with the average frequency of occurrence of
pitches in all songs. There is a high correlation--at the 0.001 level of
significance--between end-of-song activation and song typicality for all
representations and eta^T rates. However, the correlation between the number
of correct pitch-by-pitch attributions and typicality is generally insignificant
at the <0.10 level for all eta^T rates, as are the correlations between song
length and song typicality, between end-of-song activations and song length, and
between correct mappings and song length.


The pattern of attributions indicates variations in pitch use within songs. The
pattern of frequency becomes more stable over time. The relationship between
typicality, activation and number of attributions is shown in Table I for four
songs. Two are typical in terms of the overall frequency of occurrence of their
degrees, two are not. Two are well attributed, two are not. That a song is not
typical of a key does not mean that it is closer to another key. The total
activation for all the key nodes over time was calculated for the selected
songs, and plotted in Figure 4. The final activations of the songs do not
reflect how quickly the pattern of activation emerges.


The networks were also tested with 27 pitch sequences, as described in Brown
(1988). These sequences were designed to show that the pitches used in a
sequence--the tonal content--are not enough to determine correct key
attribution. Only 35% are correctly attributed at the end of the sequence. 45%
of attributions are to the adjacent key, and the rest--all of which are chromatic
sequences--are to more distant keys. If these are excluded on the grounds that
the network was trained on diatonic melodies, the correct attributions rise to
42% and the incorrect attributions are all, bar one, to the adjacent keys. The
network's performance is better than was expected. However, the information
available to the network is obviously not sufficient to ensure it makes the
correct attributions.


Overall, the viability of inducing the form of pitch salience derived by
Krumhansl (1990a), is supported by the simulations. However, presenting the TRK
memory at the end of songs means that attention is paid only to fully developed
memory patterns. If the memory is presented to the classifier after each pitch has been
incorporated in the memory, then the categories are less clear-cut. This reflects
the undeveloped patterns tracked at the start of songs, which form nodes in their
own right. It could be argued that as more coherent nodes develop, these
impoverished nodes would eventually be pruned away, by some form of cognitive
economy. An alternative argument is that the classification of the statistics of
frequency of occurrence, at this level of generality, is likely to reflect an
attentional mechanism that is concerned with relatively stable patterns of pitch
use. It certainly seems feasible to construct a mechanism that will only present
stable patterns for classification.


The derived key centres exhibit relations that are congruent with some of the
topological relations described by Krumhansl (1990a). This reflects the nature of
the vector representations derived, and the concomitant geometrical proximity of
the exemplars for keys adjacent on the circle of fifths. Figure 5 shows an image
of the surface of a Kohonen feature map (Kohonen, 1989) developed from a set of
ART2 key exemplars. The key areas are discrete and each is adjacent to its
neighbours in the circle of fifths.


5. Abstracting Pitch from Patterns of Interval Use



The encoding of abstract pitch (degree) is accepted as an important part of the
mechanism that memorizes melodies. The work of Dowling (1984, 1988) indicates
that melodic memory involves the encoding of abstract pitch and the pattern of
intervals between pitches. These two components appear to be used with greater
and lesser accuracy in different situations. The model that is outlined below
investigates the relationship between the two by suggesting that it is possible
to derive representations of abstract pitches as classifications of patterns of
memorized interval use associated with pitches. The model assumes that the
categorical interval between two pitches in the chromatic scale has been
identified. It is a very similar model to that used to derive key centres from
frequency of pitch occurrence. However, it differs in two ways. First, the
statistics of the use of pitch classes are separated, so that the use of
different pitches can be compared. Second, pitch use is specified in terms of the
interval relationships between pitches. If pitch use is described directly in
terms of other pitches its representation is limited by the positional
specification of pitch within the input vector space[7]--and by implication
within actual pitch space. Interval, on the other hand, represents pitch
relations directly.


The model investigates whether different degrees of the scale are associated with
different patterns of intervals, sufficiently distinct to delineate
interval-based categories of pitch use. If they do, the representation will lend
support to the intervallic model of pitch function advocated by Browne (1981) and
Butler and Brown (1984). The model should reflect, for example, that the tonic 1
and the dominant 5 are functionally more similar than the tonic 1 and the leading
tone 7. This kind of property is more interesting than the position or identity
of the degree.


5.1. The Intervals of the Major Scale


In these simulations, intervals are identified in vectors in the same way as
pitches--a vector position being associated with an interval. The sets of
intervals within the diatonic major scale are described in Browne (1981), and are
shown in Table II. The pattern of intervals associated with each degree of the
scale is very similar, and will have limited discriminatory value.


However, as with the frequency of occurrence of pitches, the way in which the
sets of intervals are used is quite different, as can be seen in Figure 6. The
values plotted are calculated by summing the intervals that precede each pitch in
the songs.


The resulting counts of the different intervals associated with each pitch are
then expressed as percentages. The overall pattern of interval use shows a marked
preponderance of intervals between pitches that are near neighbours in the scale,
as might be expected in a set of melodies. In the model the representation of the
intervallic context of pitches is constructed in a set of memories similar to
those used to extract frequency of pitch occurrence, except that twelve memories
are used--one for each pitch.


5.2. An Outline Model of Pitch Abstraction


An outline of the model of pitch abstraction is shown in Figure 7. The functions
within the model are bipartite. Firstly, a process of self-organizing, bottom-up
statistical extraction, using ART2, classifies patterns constructed in echoic and
tracking memory models. The memory patterns are consolidated in a set of discrete
pitch-in-key representations guided by the key identified in the process
described in Section 4. These pitch-in-key representations comprise the most
general description of interval use associated with pitch. When they are
classified the result is a set of nodes identified with degrees of the scale.


The second element of the model is concerned with acquiring associative mappings
between pitch, key and degree. The degree identities, such as emerge from the
self-organizing statistical extraction, are used as teaching patterns in a
supervised learning model. Two kinds of association are learned. The first are
the associative mappings often taken to epitomize the abstract system of degrees.
This involves mappings between pitch, key and degree, to allow the recovery,
from a pair of identities, of the third identity. For example, if we know the
identity of pitch and key, e.g. f# and G major, then we know the degree--7.
Conversely, if we know the key and degree, G major and 7, then we know the
pitch, f#. Similarly, the identification of pitch and degree allows the
extrapolation of key. The second kind of mapping is between the developing
interval memories and degree identities. This mapping allows the identification
of degree directly from developing interval memories, as a song progresses, and
encodes intervallic patterns as tonal descriptors parallel to the identification
of keys.


The self-organizing part of the model was implemented using four similar memory
models, which derive representations of the intervallic context of each pitch
class. The general form of the memory is shown at the bottom of Figures 8 and 10.
It comprises two stages. The first stage traces the occurrence of intervals over
all the pitch classes. This echoic (ECH) or trace memory is of the shunting
(multiplicative), adding type (Grossberg, 1978), and is similar to that used by
Gjerdingen (1990). The box in Figure 10 shows the trace values that might occur
in such a memory using an eta^E rate of 0.5, applied to the sequence
e → b → c → a. Here one memory type is described. The SAT memory is a shunting,
adding, tracking memory. It is reset to zero at the start of each song. Where
values in the input vector X are either 0 or 1, x_j(t) is the input value,
w_j(t) is the echoic memory value, initialized to zero, and eta^E is the
memory rate:


if x_j(t) > 0:  w_j(t) = w_j(t - 1) + eta^E x_j(t)


else:  w_j(t) = eta^E w_j(t - 1)
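The two-stage memory can be sketched in Python as follows. The SAT update is
transcribed from the equations above, and the TRK stage reuses the tracking rule
of Section 4; the example (pitch, interval) stream and the rate values are
illustrative assumptions.

```python
import numpy as np

def sat_update(w, x, eta_e):
    """SAT echoic memory: active interval elements are incremented by
    eta^E; all other elements decay by the same factor, so the trace
    reflects the recency of intervals."""
    return np.where(x > 0, w + eta_e * x, eta_e * w)

def trk_update(w, x, eta_t):
    """TRK tracking memory, as in Section 4."""
    active = x > 0
    w = w.copy()
    w[active] += eta_t * x[active] * (1.0 - w[active])
    return w

# One ECH trace feeds twelve pitch-specific TRK memories; the TRK memory
# for a pitch is updated only when that pitch occurs.
ech = np.zeros(12)
trk = np.zeros((12, 12))
stream = [(4, 7), (11, 7), (0, 1), (9, 9)]  # hypothetical (pitch, interval) pairs
for pitch, interval in stream:
    ivec = np.zeros(12)
    ivec[interval] = 1.0
    ech = sat_update(ech, ivec, eta_e=0.5)
    trk[pitch] = trk_update(trk[pitch], ech, eta_t=0.01)
```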


The ECH memory models used are straightforward and are easily implemented. They
are described in more detail in Griffith (1993b). They produce a trace of events
in which the recency of each interval is reflected by a value between 0 and 1.
The results they produce are very similar.


The second stage is a set of TRK memories, like that used to track the frequency
of occurrence of pitch. One memory is dedicated to each pitch class and receives
input from the initial ECH memory. The tracking memory is identical in all four
models and is the same as that used in the model of key identification described
in Section 4, except that the frequency of occurrence is tracked separately for
the twelve pitch classes. Learning takes place in the TRK memory associated with
a pitch only when that pitch occurs.


The two-stage memory computes two things. Firstly, the ECH memory produces a
vector that reflects the recency of intervals. The actual values reflect the
eta^E rate and the configuration of the adding and shunting elements. Elements
that are more prominent (recent) in the ECH memory have larger values. Secondly,
the TRK memories are specific to pitch; tracking only takes place in a pitch
memory when that pitch occurs. Each such memory tracks the pattern of intervals
in the ECH memory, emphasizing the intervals that are consistently prominent in
a pitch's context by scaling up the tracking of the more recent intervals
relative to older, lower-valued intervals.


As the TRK memory for a pitch is only activated when that pitch occurs, the
intervals associated with less frequent pitches are tracked less often, and the
pattern that is tracked in the TRK memory is sensitive to the idiosyncrasies of
pitch use in particular songs. This problem is intrinsic to the pattern of
events being represented in this way and cannot be resolved by setting the
eta^E memory rate of the initial ECH memory to a high value. If it is set high,
the interval events will stay longer in the ECH memory, but the subsequent TRK
memory-tracking representations become less differentiated. The lower the eta^E
rate, the closer the contents of the TRK memory should approximate the frequency
of occurrence of intervals associated with each pitch.


5.3. Topological Maps of Pitch-interval Use


The representations developed within the pitch memories were investigated
initially using a Kohonen feature map (KFM) (Kohonen, 1989). These initial
simulations looked specifically at how differentiated the representations
developed in the two-stage memories were. All four types of memory were used.
The KFM used was a two-dimensional, 12 by 12 surface. The local update area was
initialized to 11, and the learning rate, eta^K, was 0.3. The set of
experiments covered four memory types, three eta^E rates--0.75, 0.5 and
0.25--and five eta^T rates--0.005, 0.01, 0.05, 0.1 and 0.5--making a set of
60 simulations. The experiment is outlined in Figure 8.
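The KFM stage can be approximated with a minimal self-organizing map, sketched
below. The winner-take-all search and the shrinking square update area follow
the description above (12 x 12 surface, initial update area 11, eta^K of 0.3);
the epoch count and the linear decay schedules are assumptions for illustration.

```python
import numpy as np

def train_kfm(data, width=12, dim=12, eta_k=0.3, radius=11, epochs=50, seed=0):
    """Minimal Kohonen feature map: winner-take-all search with a square
    update area that shrinks, and a learning rate that decays, over
    training. `data` rows stand in for end-of-song TRK memory vectors."""
    rng = np.random.default_rng(seed)
    weights = rng.random((width, width, dim))
    grid = np.dstack(np.meshgrid(np.arange(width), np.arange(width),
                                 indexing="ij"))
    for epoch in range(epochs):
        frac = epoch / epochs
        r = max(1, int(round(radius * (1.0 - frac))))  # shrinking update area
        lr = eta_k * (1.0 - frac)                      # decaying learning rate
        for x in data:
            dists = np.linalg.norm(weights - x, axis=2)
            winner = np.array(np.unravel_index(np.argmin(dists), dists.shape))
            mask = np.max(np.abs(grid - winner), axis=2) <= r  # square area
            weights[mask] += lr * (x - weights[mask])
    return weights
```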


The memories were presented at the end of a song. The pattern of interval use in
a pitch memory at this point reflects the pattern of intervals associated with a
particular pitch in a particular song and key. The degree identity associated
with the pitch and the key, e.g. g in C major being degree 5, is used to tag the
points plotted on the KFM. If the set of songs were homogeneous in their
association of pitches and intervals, then we would expect the KFM to develop
discrete areas associated with specific degrees. However, we have already seen in
Figure 4 that, pitch by pitch, there is a variety of pitch use in the different
songs. In fact, all the maps produced by these KFM simulations were highly
fragmented (Figure 9), with much intermixing of different degrees. The specific
pitch memories represent the variability of pitch use within songs more
accurately, but this makes the patterns more diffuse, and abstraction more
difficult.


How well these pitch memories reflect the variation in pitch use found among the
songs was assessed by comparing the contents of the memories with what was
expected to be in them. The reference figures for intervallic context are those
used to plot Figure 6, for the association of degrees and intervals. The mean
vectors for all the memories of each degree are calculated and compared to these
figures. The correlations for each degree are generally at the 0.001 level--a
minority are significant at the 0.01 level. The range of correlations confirms
that the performance of the memories is influenced by the setting of eta^E in
the initial tracking memory. The lowest eta^E rate of 0.25 gives the highest
correlations over all memory types, which confirms the characteristics of the
memory functions. With a lower eta^E rate, the ECH memory contains a shorter
history and so the TRK memory approximates more closely the frequency of
interval occurrence. The larger the eta^E rate, the more the TRK memory tracks
sets of values that reflect a sequence of intervals, diverging from simply
counting the associations between a pitch and its immediately prior intervals.


5.4. Integrating Patterns of Interval over Pitch and Key



The initial KFM simulation confirms that the variation in pitch use in different
songs--represented as patterns of intervals--is too great to allow the emergence
of clear patterns of intervals that reflect the functional contexts of pitches
identifiable with degrees. However, the network is not limited to learning the
pitch representations active during a song, in isolation. The key of each song,
identified in the process outlined in Section 4, is available to guide the
attention of a further process learning pitch representations identified with a
key.


Figure 10 outlines the simulation. The representations developed in the two-stage
memories are tracked using the same slow learning as an ART2 network (Carpenter &
Grossberg, 1987). This learning takes place in a layer of memories in which the
recipient node is determined by the current key and pitch class.[8] The
representations developed within these nodes can be used as the input to a
self-organizing ART2 network for classification.


In these simulations the association of pitch and key is implemented, for
convenience, by the use of indices. Another simulation established that this
mapping can be acquired via a process of self-organizing classification. Vectors
identifying pitch and key were concatenated to form pitch-in-key identities, e.g.
c in C or c in F. By setting the vigilance in an ART2 network high, only
identical pitch-in-key identity vectors are mapped to the same node. Each node
identified in this way can be associated with a second set of memory weights to
be used in learning pitch-in-key representations.
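A minimal sketch of this indexing scheme follows, assuming one-hot pitch and key
vectors; the exact-match dictionary stands in for an ART2 net run at high
vigilance, where only identical identity vectors share a node.

```python
import numpy as np

def pitch_in_key_id(pitch, key):
    """Concatenate one-hot pitch and key vectors into a 24-element
    pitch-in-key identity, e.g. c in C or c in F."""
    v = np.zeros(24)
    v[pitch] = 1.0
    v[12 + key] = 1.0
    return v

# With vigilance set high, only identical identity vectors map to the same
# node; a dictionary keyed on the identity reproduces that behaviour.
nodes = {}
def node_for(pitch, key):
    return nodes.setdefault(tuple(pitch_in_key_id(pitch, key)), len(nodes))
```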


The integration of intervallic patterns over keys was simulated for the same set
of eta^T and eta^E rates as for the KFM experiments. Each
interval was presented to the network and passed via the ECH memory into the TRK
memory identified with the current pitch. Then using the identity of the pitch
and key the contents of the TRK memory was learned by the appropriate
pitch-in-key node. The weights for this node learn in the same way as used within
the ART2 network. The result is that the pitch-in-key nodes develop
representations that lie at the centre of the cluster of memory vectors mapped to
them (Hinton, 1989). When these pitch-in-key representations are classified in a
self-organizing ART2 network, seven clearly differentiated categories of interval
use emerge, each populated with one degree of the scale only. The hierarchical
similarity relations between the degree exemplars developed in this way are shown
in Figure 11. The greatest similarities are between degrees 1, 5 and 4, with 1
and 5 being more similar to each other than 4 is to either. Degrees 2 and 6
occur as a pair quite close to the core trio, while 3 and 7 are more distant (Griffith,
1993b). This pattern conforms well to musical notions of functional similarities
between degrees of the scale, but does not measure the importance of each degree
in the scale. This is implicit in the relative size of the areas associated with
each degree representation when these are mapped on a KFM (Figure 12).
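The integration step can be sketched as follows. The ART2 slow-learning rule is
approximated here by a simple running-average update that moves each
pitch-in-key weight vector towards the TRK memories mapped to it, so that each
node settles at the centroid of its cluster (Hinton, 1989). The node layer and
indexing follow the text; the rate value and function names are mine.

```python
import numpy as np

# One weight vector per (key, pitch-class) pair; for diatonically limited
# tunes only 84 of the 144 entries are ever used (see Note 8).
pik = np.zeros((12, 12, 12))
eta_p = 0.01  # assumed slow-learning rate

def pik_update(key, pitch, trk_memory):
    """Move the pitch-in-key node's weights towards the current TRK
    memory, approximating ART2 slow learning; over many songs the
    weights converge on the centroid of the memories mapped to the node."""
    pik[key, pitch] += eta_p * (trk_memory - pik[key, pitch])
```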


The learning of the interval representations over pitch-in-key and the clear
degree categories that emerge from classifying them leaves open the question of
how well the pitch-by-pitch or end-of-song memories map to the degree categories.
The fragmentary nature of the KFM maps learned from the end-of-song memories
suggested that it would not be productive to look at the even shorter-term
pitch-by-pitch memories. Consequently, the end-of-song memories for a selection
of three eta^E and eta^T rates over the four memory types were taken
and classified to the seven degree nodes. The results for the test set are
broadly equivalent over the memories and parameter settings. The percentage of
correct attributions was very low, 42-57%. The percentage of attributions to
nodes of degrees a fifth above and below the correct degree identity is 16-23%
and 9-13%, respectively, and 2-8% were not mapped to any of the exemplars at all.
In this set of songs the use of the dominant 5 and subdominant 4 in some songs is
very similar to that of the tonic 1. Overall, it confirms the variation in the
data set and that this variation reflects functional similarity between degrees.
It also implies that the model is incomplete. Another mechanism is required to
enable the identification of degree directly from a developing interval memory.
It is very easy to see shortcomings of this kind as commensurate with human
performance. However, what characterizes human performance in this case is not
clear. The current assessment is based upon a reading of the score by a trained
musician. The significance of this appraisal in terms of the development of
tonality is not clear, nor how to construct a meaningful psychological test of
general human performance to compare with it.


5.5. Mapping Key and Pitch and Interval Memories to Degree



The abstract pitch representation formed within the model has several possible
uses. It can be used as input to further networks--e.g. memorizing a melody or
prototyping patterns of degree use. Similar representations have been used by
Gjerdingen (1990) and Page (1993), and it was for this purpose that the model was
originally conceived. Here, it is also used to acquire the associative mappings
between pitch & key → degree, between key & degree → pitch, and between
pitch & degree → key. These mappings were learned in a back-propagation network
(Rumelhart et al., 1986), but could have been achieved as easily with an ARTMAP
architecture. The network consisted of 24 input units, 12 hidden units and 12
output units for degrees of the scale.[9] The eta^B rate was 0.1 and the alpha
rate used was 0.9.
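A sketch of this associative net in plain NumPy is given below, assuming one-hot
pitch and key vectors concatenated into the 24-element input and a one-hot
degree target; the sigmoid units, weight initialization and momentum bookkeeping
are conventional choices, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (24, 12))  # input (pitch + key) -> hidden
W2 = rng.normal(0.0, 0.1, (12, 12))  # hidden -> degree output
dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, eta_b=0.1, alpha=0.9):
    """One back-propagation step with momentum (eta^B = 0.1, alpha = 0.9)."""
    global W1, W2, dW1, dW2
    h = sigmoid(x @ W1)
    y = sigmoid(h @ W2)
    d_out = (y - target) * y * (1.0 - y)    # output deltas
    d_hid = (d_out @ W2.T) * h * (1.0 - h)  # hidden deltas
    dW2 = -eta_b * np.outer(h, d_out) + alpha * dW2
    dW1 = -eta_b * np.outer(x, d_hid) + alpha * dW1
    W2 = W2 + dW2
    W1 = W1 + dW1
    return y
```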


The developing pitch-by-pitch and end-of-song memories were also mapped to
degree, using a back-propagation net.[10] The reason for using a supervised
network to learn the classification of interval memories to degree identity was
to facilitate more accurate recovery of the degree identity from the memory
patterns. The accuracy of this direct attribution in the ART2 network outlined in
Section 5.4 is quite low. Also, the self-organizing process of degree induction
is arguably rather indirect, with various stages and complexities. If the
pitch-by-pitch or end-of-song memories can be mapped directly to a degree
identity, this mapping can be used as a tonal descriptor parallel to the
identification of key arrived at by tracking pitch frequency of occurrence
(Butler & Brown, 1984).


The association between emerging memory patterns and degree identity was
simulated for the extreme and central combinations of eta^T and eta^E rates
over all the memory types. To do this, the developing memories were preserved
either at the end of each song, or pitch by pitch, and then used as input to the
back-propagation net. The target was the degree associated with the current
pitch-in-key identity. The simulations were all run with the same resources and
from the same initial conditions. The network consisted of 12 input units, 18
hidden units and 12 output units, and used an eta^B of 0.01 and an alpha of
0.9. The simulations were allowed to run for 1000 cycles. This approach was
taken because it was found that, over a range of resources and random initial
conditions, the network learned upwards of 90% of the patterns reasonably
quickly; a further 5-7% were learned by cycle 1000, after which very few more
were learned.[11] The networks learned the degree identity of 85-98% of the
end-of-song memories. However, this percentage dropped to 53-74% generalization
when the network was tested. The higher memory rates in both the ECH and TRK
memories produce the lowest figures. The network learns 58-93% of pitch-by-pitch
memory mappings to degree. Again, the higher memory rates produce the poorest
results. If they are discounted, the networks learn 82-93% and generalize to
73-80% on the test set. The network trained on pitch-by-pitch patterns
generalized to end-of-song patterns with 89-93% and 70-76% success for the
training and test sets respectively. Again, high ECH and TRK rates gave the
lower results. At the lower and medium memory rates the results are consistently
reasonable, but obviously open to improvement. It is possible that the process
could take more account of qualifications of the memory by non-tonal factors,
such as duration or metrical or rhythmic factors.


5.6. Conclusions



The simulations described in this paper have modelled the induction of two
aspects of tonal structure. Firstly, the acquisition of tonal centres equivalent
to keys and, secondly, the abstraction of pitch function from patterns of
interval use.


The first part of the model uses a tracking memory to extract a representation
of the frequency of occurrence of pitches in nursery-rhyme melodies. The
frequency-of-occurrence patterns are classified in an ART2 network, and the
result is a set of exemplars equivalent to keys. Subsequent testing showed that
the exemplars are robust. The classification of patterns of frequency of pitch
occurrence models one of a number of processes contributing towards tonal
structure.


One of these processes is the emergence of the abstract pitch identities that
seem to be used in the memorization of melodies. The model outlines a process of
developmental boot-strapping. The diatonic identities that emerge from tracking
pitch frequency of occurrence are used as an attentional mechanism to define the
reference of a process inducing patterns of interval associated with pitches. An
initial investigation using four types of composite trace and tracking memories
mapped interval memories on to Kohonen feature maps. This showed a considerable
diversity of patterns of interval across songs, and confirmed the need for an
attentional mechanism to guide learning. The model shows how more stable patterns
emerge when this attentional focus is used to integrate memories associated with
pitches in keys. These stable patterns differentiate patterns of interval use
very well, and are classified into discrete exemplars identified with the seven
degrees of the major diatonic scale, across a range of memory types and tracking
rates. Subsequently, pitch, key and degree identities were used to learn
three-way mappings that provide the basis for encoding, transposition and
retrieval of songs in any key. Also, the developing patterns of interval use
were mapped to degree. The model outlines how a process of statistical
extraction, implemented in an ANN mechanism (ART2), allows the construction,
from pitch sequences, of categories that reflect important aspects of tonal
structure. The
sequences, of categories that reflect important aspects of tonal structure. The
model assumes categorical pitch classes and intervals have already been
identified. It may be that some of the processes are better modelled in terms of
more distributed representations of pitch and interval. This is a moot point, as
is the general question of the relations between levels of representations, e.g.
psychoacoustic and categorical. What the model has explored is whether the
statistics of pitch and interval use over time are a good basis for identifying
keys and degrees. The indication is that these statistics do allow the
construction of stable representations of pitch use.


The model combines both unsupervised and supervised paradigms as elements of its
overall function. The intention has been to use these different ANN paradigms in
functionally appropriate ways. The model outlines an effective procedure that
uses its own derived information to guide subsequent processes. Further work
will investigate improving the generalization of the pitch-by-pitch
memory-to-degree mappings; using the degree mappings in conjunction with key
identification; using the representations in a model of transposition between
keys; using metrical and phrase-boundary information, as well as improving the
functional involvement of pitch-duration information. The model also needs to be
evaluated over a much wider set of songs using other types of scales.


It is clear from much of the experimental work over the last 30 years that
tonality is a complex mechanism. It seems unlikely that it can be resolved to a
single representational process. Rather it is likely that functions attending to
different aspects and descriptors are used and their various implications
resolved. In some situations, some indicators will predominate, in others where
perhaps some information is degraded, other descriptors may be relied upon. The
model described here has focused on how patterns of pitch and interval can be
used to induce exemplars of tonal centres and abstract pitch. It is a first
approximation of an inductive mechanism capable of learning the functionality of
key and degree.


Acknowledgements


My thanks to the editor of Connection Science, Noel Sharkey, for organizing the
reviews for this paper and to Peter Todd and the two reviewers for their comments.


Notes



1. The research presented in this paper is part of the author's doctoral
research, supervised by Noel Sharkey and Henry Shaffer and funded by a SERC
studentship.


2. This synthesis is discussed more fully in Griffith (1993b).


3. The implementation of ART2 used here was adapted from code originally written
by Paolo Gaudiano at Boston University. This code implemented the net illustrated
in Figure 10 of Carpenter and Grossberg (1987).


4. Because a variety of memories and learning mechanisms are described in this
paper, the eta rate used in each is superscripted with an identifier to avoid
confusion, e.g. eta^T is the eta for the TRK memory, eta^K is the eta for the
Kohonen feature map, etc.


5. As classification is unsupervised the idea of a correct mapping is not the
same as for a supervised network. However, any node has an identifiable
majority--for example if 91.5% of the instances mapped to a node n are identified
with C-major, this node is taken to be the home node for this key.


6. The network has not identified degrees of the scale at this point. The use of
degree names is a descriptive convenience, and also recognizes that all the songs
are presented in all keys.


7. In the pitch vector the equivalence of pitches in different keys can only be
recovered by rotating the vectors until the patterns coincide.


8. This layer consists of 84 nodes. If the tunes were not diatonically limited
the number would be 144.


9. The simulation used 12 outputs to ensure compatibility with future simulations
in which the data set may not be diatonically limited.


10. Again this could have been achieved by an ARTMAP network.


11. The criterion for determining that a pattern had been learned was the
required output becoming the maximum value in the output vector--rather than
being within a tolerated distance of the output.


Table I
The typicality and attribution of four songs from the training and test
set, using the FREQ representation at an eta^T rate of 0.05


Set    Song                  Typicality  Activation  Attribution (%)
Train  Bye Baby Bunting      0.6180      0.7850      2.75-22.2
Test   Little Bo Peep        0.6684      0.9107      100.0
Train  The Man With The Gun  0.8908      0.9469      50.98-74.51
Test   Ding Dong Bell        0.8452      0.9691      100.0


Table II
Intervals between the degrees of the major diatonic scale


Degree  1   2   3   4   5   6   7
1       U   M2  M3  P4  P5  M6  M7
2       m7  U   M2  m3  P4  P5  M6
3       m6  m7  U   m2  m3  P4  P5
4       P5  M6  M7  U   M2  M3  T
5       P4  P5  M6  m7  U   M2  M3
6       m3  P4  P5  m6  m7  U   M2
7       m2  m3  P4  T   m6  m7  U




DIAGRAM: Figure 1. The prominence of pitches in the scales of C and G major. The
figures were derived using the probe-tone method (Krumhansl, 1990a).


CHART: Figure 2. A model of the memorization and classification of pitch use in a
simple memory model and competitive-learning classifier.


CHART: Figure 3. Simple memory function which tracks the frequency of occurrence
of pitches in a sequence.


DIAGRAM: Figure 4. The total activations for Bye Baby Bunting (atypical),
Little Bo Peep (atypical), The Man With The Gun (typical) and Ding Dong Bell
(typical), FREQ simulation using an eta^T of 0.05, plotted over time.


ILLUSTRATION: Figure 5. Distribution of key areas over a KFM using the exemplar
weights developed in the FREQ simulation using an eta^T of 0.05. The KFM was a
10 x 10 surface, started with an initial update area of 7 x 7 and a learning
rate eta^K of 0.3.


DIAGRAM: Figure 6. The percentage of intervals associated with the degrees of the
scale in the training set.


CHART: Figure 7. Outline of the model of pitch abstraction showing the
relationship between processes classifying pitch use into keys and interval use
into degrees.


CHART: Figure 8. The mapping of composite memory functions tracking the frequency
of occurrence of interval against pitches in a song on to a KFM.


ILLUSTRATION: Figure 9. KFM of interval representation of pitch. Simulation AST,
eta^E 0.25 and eta^T 0.005.


CHART: Figure 10. Outline of a model showing the classification of abstract pitch
from patterns of interval use.


GRAPH: Figure 11. Hierarchical cluster analysis of the abstract pitch exemplars
developed within an ART2 network classifying AST memories of interval use
extracted over pitches and keys, eta^E 0.25 and eta^T 0.005.


ILLUSTRATION: Figure 12. Mappings of interval representation of abstract pitch.
Simulation illustrated is AST using eta^E 0.25 and eta^T 0.005.


References


Anon (1979) The Nursery Rhyme Book. London: Amsco Music Publishing.


Balzano, G. (1980) The group-theoretic description of 12-fold and microtonal
pitch systems. Computer Music Journal 4, 66-84.


Bharucha, J. (1987) Music cognition and perceptual facilitation: a connectionist
framework. Music Perception, 5, 1-30.


Bharucha, J. (1991) Pitch, harmony and neural nets: a psychological perspective.
In P.M. Todd & D.C. Loy (Eds), Music and Connectionism. Cambridge, MA: MIT
Press/Bradford Books.


Bregman, A. (1990) Auditory Scene Analysis. Cambridge, MA: MIT Press.


Brown, H. (1988) The interplay of set content and temporal context in a
functional theory of tonality perception. Music Perception, 5, 219-250.


Browne, R. (1981) Tonal implications of the diatonic set. In Theory Only, 5,
3-21.


Butler, D. (1989) Describing the perception of tonality in music: A critique of
the tonal hierarchy theory and a proposal for a theory of intervallic rivalry.
Music Perception, 6, 219-242.


Butler, D. & Brown, H. (1984) Tonal structure versus function: studies of the
recognition of harmonic motion. Music Perception, 2, 5-24.


Carpenter, G. & Grossberg, S. (1987) ART2: Self-organization of stable category
recognition codes for analog input patterns. Applied Optics, 26, 4919-4930.


Chesterman, L. (1935) Music for the Nursery School. London: Harrap.


Deutsch, D. (1975) Facilitation by repetition in recognition memory for tonal
pitch. Memory and Cognition, 3, 263-266.


Deutsch, D. (1978) Delayed pitch comparisons and the principle of proximity.
Perception and Psychophysics, 23, 227-230.


Dowling, W. (1984) Assimilation and tonal structure: comment on Castellano,
Bharucha, and Krumhansl. Journal of Experimental Psychology, 113, 417-420.


Dowling, W. (1988) Tonal structure and children's early learning of music. In J.
Sloboda (Ed.), Generative Processes in Music. Oxford: Oxford University Press.


Frances, R. (1988) La Perception de la Musique (J.W. Dowling, Trans.).
Hillsdale, NJ: Lawrence Erlbaum Associates. (Original work published 1954 by
Librairie Philosophique J. Vrin, Paris.)


Gjerdingen, R. (1990) Categorisation of musical patterns by self-organizing
neuronlike networks. Music Perception, 7, 339-370.


Griffith, N. (1993a) Modelling the Acquisition and Representation of Musical
Tonality as a Function Of Pitch-use through Self-Organising Artificial Neural
Networks. Department of Computer Science, University of Exeter. Unpublished PhD
thesis.


Griffith, N. (1993b) Representing the tonality of musical sequences using neural
nets. Proceedings of the First International Conference on Cognitive Musicology,
pp. 109-132, Jyvaskyla, Finland.


Grossberg, S. (1978) Behavioral contrast in short term memory: serial binary
memory models or parallel continuous memory models. Journal of Mathematical
Psychology, 17, 199-219.


Handel, S. (1973) Temporal segmentation of repeating auditory patterns. Journal
of Experimental Psychology, 101, 46-54.


Hinton, G.E. (1989) Connectionist learning procedures. Artificial Intelligence,
40, 185-234.


Holtzmann, S.R. (1977) A program for key determination. Interface, 6, 29-56.


Huron, D. & Parncutt, R. (1993) An improved key-tracking method incorporating
pitch salience and echoic memory. Psychomusicology (in press).


Kohonen, T. (1989) Self-organization and Associative Memory. Berlin:
Springer-Verlag.


Krumhansl, C. (1990a) Cognitive Foundations of Musical Pitch. Oxford: Oxford
University Press.


Krumhansl, C. (1990b) Tonal hierarchies and rare intervals in music cognition.
Music Perception, 7, 309-324.


Lakoff, G. (1987) Women, Fire and Dangerous Things: What Categories Reveal about
the Mind. Chicago: University of Chicago Press.


Leman, M. (1990) The Ontogenesis of Tonal Semantics: Results of a Computer Study.
Reports from the Seminar of Musicology SM-IPEM 18, Institute of Psychoacoustics
and Electronic Music, University of Ghent.


Leman, M. (1992) The theory of tone semantics: concept, foundation, and
application. Minds and Machines, 2, 345-363.


Longuet-Higgins, H. & Steedman, M. (1970) On interpreting Bach. Machine
Intelligence, 6, 221-239.


May, E. (Ed.) (1980) Musics of Many Cultures: An Introduction. Los Angeles, CA:
University of California Press.


Mitchel & Blyton (1968) The Faber Book of Nursery Songs. London: Faber.


Narmour, E. (1984) Toward an analytical symbology: The melodic, harmonic and
durational functions of implication and realization. In M. Baroni & L. Callegari
(Eds), Musical Grammars and Computer Analysis. Florence: Olschki.


Page, M.P.A. (1993) Modelling Aspects of Music Perception using Self-Organizing
Neural Networks. PhD thesis, University of Wales College of Cardiff.


Parncutt, R. (1988) Revision of Terhardt's psychoacoustical model of the root(s)
of a musical chord. Music Perception, 6, 65-93.


Patterson, R. (1986) Spiral detection of periodicity and the spiral form of
musical scales. Psychology of Music, 14, 44-61.


Peretz, I. & Kolinsky, R. (1993) Boundaries of separability between melody and
rhythm in music discrimination: a neuropsychological perspective. The Quarterly
Journal of Experimental Psychology, 46A, 301-325.


Rumelhart, D., Hinton, G. & Williams, R. (1986) Learning internal representations
by error propagation. In D. Rumelhart & J. McClelland (Eds), Parallel Distributed
Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations.
Cambridge, MA: MIT Press.


Scarborough, D., Miller, O. & Jones, J. (1989) Connectionist models for tonal
analysis. Computer Music Journal, 13, 49-55.


Shaffer, L., Clarke, E. & Todd, N. (1985) Meter and rhythm in piano playing.
Cognition, 20, 61-77.


Simon, H.A. (1968) Perception du pattern musical par auditeur. Science de l'Art,
V, 28-34.


Storr, A. (1992) Music and the Mind. Glasgow: Harper Collins.


Terhardt, E. (1974) Pitch, consonance, and harmony. The Journal of the Acoustical
Society of America, 55, 1061-1069.


Terhardt, E. (1984) The concept of musical consonance: A link between music and
psychoacoustics. Music Perception, 1, 276-295.


Terhardt, E., Stoll, G. & Seewann, M. (1982) Algorithm for extraction of pitch
and pitch salience from complex tonal signals. The Journal of the Acoustical
Society of America, 71, 679-688.


Todd, P. (1989) A connectionist approach to algorithmic composition. Computer
Music Journal, 13, 27-43.


Ulrich, W. (1977) The analysis and synthesis of jazz by computer. Proceedings of
the 5th IJCAI, pp. 865-872.


Winograd, T. (1968) Linguistics and the computer analysis of tonal harmony.
Journal of Music Theory, 12, 2-49.



By NIALL GRIFFITH, Department of Computer Science, University of Exeter, Prince
of Wales Road, Exeter EX4 4PT, UK. E-mail: ngr@uk.ac.exeter.dcs.



Source: Griffith, N. (1994) Development of tonal centres and abstract pitch as
categorizations of pitch use. Connection Science, 6, 155.


