MusicXML in Practice: Issues in Translation and Analysis

Recordare LLC

Originally published in Proceedings First International Conference MAX 2002: Musical Application Using XML (Milan, September 19-20, 2002), pp. 47-54. Copyright © 2002 Michael Good. All rights reserved.

ABSTRACT

Since its introduction in 2000, MusicXML has become the most quickly adopted symbolic music interchange format since MIDI, with support by market and technology leaders in both music notation and music scanning. This paper introduces the key design concepts behind MusicXML, discusses some of the translation issues that have emerged in current commercial applications, and introduces the use of MusicXML together with XML Query for music analysis and information retrieval applications.

Keywords

MusicXML, music notation, music information retrieval, music analysis, XQuery, Finale, Dolet.

1 INTRODUCTION TO MUSICXML

Many people have recognized the possible benefits of using XML for music representation [1, 2, 7]. Similarly, many efforts have been made over the years to come up with a higher-level interchange format for music notation than is provided by Standard MIDI Files [12, 16]. MusicXML combines these two trends, using XML technology to develop a new interchange standard for music notation.

MusicXML has been developed from a commercial perspective as opposed to a research perspective. Recordare's business model is founded on the commercial opportunities enabled by a standardized, Internet-friendly format for symbolic musical data. We needed to quickly demonstrate the viability of an interchange standard in the context of commercial musical applications. Our major technical risk was that we would develop yet another interchange format that could not or would not be adopted by the market leaders in music application software.

MusicXML Design Techniques

Previous efforts at music interchange standards have tended to fail in one of two ways.
In the case of SMDL [14], the design goal was overly ambitious, leading to an overly complex language with very few implementations. In the case of NIFF [6], the design goal was insufficiently general. The graphics-oriented focus of the language made it adequate for scanners and notation programs with graphics-oriented formats. Sequencers, databases, and notation programs with more underlying awareness of musical semantics were ill-served by the graphical focus. A critical mass of application support never developed. From our industrial perspective, we looked to MIDI and HTML as exemplars for developing a new music interchange standard. MIDI and HTML are both powerful enough to solve a good variety of industrial-strength problems. On the other hand, they are simple enough to learn and implement so that people can learn the basics easily, smoothly adding additional features over time. XML 1.0 shares these dual characteristics of power and simplicity. XML’s widespread adoption throughout the information technology industry allows music software to make use of the technology investments made by much larger industries. Several techniques were used to develop a usable, useful, powerful interchange standard:
MusicXML Application Support

MusicXML’s success is apparent from its adoption, which has been quicker than that of any symbolic music interchange format since MIDI. Figure 1 shows the software products and projects supporting MusicXML as of July 2002 [11].
Finale 2003, SharpEye Music Reader, and TaBazar are all shipping with MusicXML support on Windows. Recordare’s Dolet software supports MuseData and Finale 2000 to 2003. Finale 2003 can import SCORE files, allowing a two-step conversion into MusicXML. MusicXML files imported into Finale can be printed to the FreeHand MusicPad Pro electronic music stand, providing a new electronic format for MusicXML scores. Project XEMO has demonstrated a Java-based MusicXML Notation Viewer, currently in alpha test, running on Windows, Macintosh OS X, Linux, and Solaris systems. Middle C Software has announced support for MusicXML in their future scanning software products. NoteHeads has announced plans to import MusicXML files in version 1.7 of their Igor Engraver notation product. The KGuitar open-source guitarist environment for Linux, FreeBSD, and Solaris systems added MusicXML support in version 0.4.1. The NIFF and MIDI converters have been developed as internal prototypes at Recordare.

MusicXML’s successful adoption contrasts not only with past interchange languages, but also with other XML formats for symbolic music representation proposed in the past several years. Most of these formats cannot represent the full range of music possible in MusicXML. None has any commercial product support as of July 2002, much less support from music software industry leaders.

2 MUSICXML DESIGN ISSUES

MusicXML follows MuseData and other formats in separating underlying musical
representation from the specifics of a particular engraving or music
performance. As with MuseData, the three domains are combined within a single
format. The “logical domain” of music is found in MusicXML’s elements, while
details of the visual and performance domain are found in MusicXML’s attributes.
There are also dedicated elements for The integration of the three domains into a single format speaks to the need to cover an adequate range of music applications in a single notation format. The distinction between elements and attributes facilitates the segmentation of domains both for learning MusicXML and building applications. Distinctions between domains tend to be cleaner in theory than in practice. Given MusicXML’s commercial focus, it made sense not to be overly rigorous about these theoretical distinctions. To introduce how MusicXML represents musical scores, here is the musical equivalent of C's "hello, world" program for MusicXML. Here we will create about the simplest music file we can make: one instrument, one measure, and one note, a whole note on middle C:
Here is the musical score represented in MusicXML:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE score-partwise PUBLIC
"-//Recordare//DTD MusicXML 0.6b Partwise//EN"
"http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise>
<part-list>
<score-part id="P1">
<part-name>Music</part-name>
</score-part>
</part-list>
<part id="P1">
<measure number="1">
<attributes>
<divisions>1</divisions>
<key>
<fifths>0</fifths>
</key>
<time>
<beats>4</beats>
<beat-type>4</beat-type>
</time>
<clef>
<sign>G</sign>
<line>2</line>
</clef>
</attributes>
<note>
<pitch>
<step>C</step>
<octave>4</octave>
</pitch>
<duration>4</duration>
<type>whole</type>
</note>
</measure>
</part>
</score-partwise>
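As a rough illustration of how an application might read this file, the following Python sketch pulls the pitch and duration out of the example using only the standard library. It is not part of the original paper's tooling: the function name `describe_notes` is hypothetical, the DOCTYPE declaration and the key, time, and clef elements are omitted for brevity, and the duration is expressed in quarter notes by dividing `<duration>` by the `<divisions>` value.

```python
from fractions import Fraction
import xml.etree.ElementTree as ET

# The "hello, world" score from the text, trimmed to the elements used here.
HELLO_WORLD = """<score-partwise>
  <part-list>
    <score-part id="P1"><part-name>Music</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
      </attributes>
      <note>
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>4</duration>
        <type>whole</type>
      </note>
    </measure>
  </part>
</score-partwise>"""


def describe_notes(xml_text):
    """Return (step, octave, length in quarter notes, notated type) per pitched note."""
    root = ET.fromstring(xml_text)
    described = []
    for part in root.findall("part"):
        divisions = 1  # divisions per quarter note; assumed to be 1 until declared
        for measure in part.findall("measure"):
            for child in measure:
                if child.tag == "attributes":
                    declared = child.findtext("divisions")
                    if declared is not None:
                        divisions = int(declared)
                elif child.tag == "note" and child.find("pitch") is not None:
                    described.append((
                        child.findtext("pitch/step"),
                        int(child.findtext("pitch/octave")),
                        Fraction(int(child.findtext("duration")), divisions),
                        child.findtext("type"),
                    ))
    return described


print(describe_notes(HELLO_WORLD))
```

The sounding length (four quarter notes) and the notated symbol (`whole`) come from separate elements, so a program that only cares about one of them can ignore the other.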
For scores of this simplicity, MusicXML’s design roots are clearly apparent. This is basically an XML version of the MuseData representation. Several of MusicXML’s design elements, including the interchangeability between partwise and timewise formats, have been described previously [5]. Here we will focus on some additional design aspects that have proven to be important for music translation, and that look to be important for future work in musical analysis.

One key design choice is that each aspect of music semantics is represented in a different element. This provides the greatest flexibility for diverse music applications, especially once music information retrieval is included in the application mix. Our example analysis programs below will demonstrate some of the benefits of this design choice.

Another key design element carried over from MuseData is the importance of
separately representing what is heard vs. what is notated [13]. Take the issue of note duration. MusicXML follows MIDI and MuseData by putting the denominator of music duration, the number of divisions
per quarter note, in a separate, usually unchanging
This type of dual representation of sound and graphics, so crucial to support diverse industrial applications, contrasts with the graphical representations used in NIFF and the WEDELMUSIC XML format [1]. NIFF is a binary format, but if we translate its binary elements directly into an XML document, our middle C whole note would look something like this:
<Notehead Code="note" Shape="2" StaffStep="-2">
<Duration Numerator="1" Denominator="1"/>
</Notehead>
The very indirect nature of pitch representation makes NIFF and other graphical formats unusable for most performance and analysis applications. It even makes for problems in its intended use in the visual transfer between scanning and notation applications. The NIFF importer included in Sibelius 2.11 has bugs that are directly correlated to missing one or more of the multitude of steps needed to accurately determine musical pitch from NIFF data. Graphical formats have a long history in music representation, and are appropriate as internal formats for many applications, but have severe problems when used as the foundation of a general music interchange format.

3 MUSICXML TRANSLATION ISSUES

When we first started building programs to move between MusicXML and other music formats, we called them converters. Conversion implies the centrality of the change from one format to another. We have since realized that a more productive metaphor might be that of translation: the interpretation of one form of human expression into another [15]. Our first software translation products are named after the 16th-century French translator Etienne Dolet.

At Recordare we have produced four MusicXML translators to date: two-way translators for Finale and MuseData, and one-way translators from NIFF and to Standard MIDI Files. Each translation brought up different issues of interpretation that need to be successfully addressed to make an effective interchange format.
MuseData Translation

Our MuseData translator was built together with the initial design of MusicXML. The first version of MusicXML was primarily an adaptation of the MuseData format into XML form, with the addition of a timewise format to simulate Humdrum’s two-dimensional lattice structure within a hierarchical language. Our design decision was to make MusicXML a superset of MuseData, so that we could do a 100% conversion (as we thought of it then) from MuseData into MusicXML and then back again. We adopted features in their entirety even when we were unclear of their utility for a general-purpose translation language. Since the initial design, we have removed some of the documented features that are not used in practice, and included some undocumented features that are indeed used in MuseData files available from CCARH.

Given that MusicXML covers a superset of MuseData features, we encountered no major translation difficulties with this first piece of software. This gave us confidence that the XML language was indeed strong enough to serve as a basis for music representation without hidden problems that would only be revealed through implementation experience.

The translation issues have emerged later, with more programs translating back and forth to MusicXML. These programs may make use of features that are present in MusicXML but not in MuseData, or may be used to translate classical repertoire from later eras than MuseData’s design focus. For instance, translating pieces by Chopin, Mahler, and others with multiple large tuplets causes problems when the number of divisions needed for precise durations leads to notes with durations that cannot fit into a 3-character MuseData field.

NIFF and MIDI Translations

We next added translators for two binary formats: from NIFF to MusicXML and from MusicXML to Standard MIDI Files.
By testing translation with NIFF’s highly graphical format and MIDI’s performance-only format, we could determine whether MusicXML really did have the scope to handle a variety of music formats far afield from our MuseData and Humdrum starting points.

Building the NIFF translator is what gave us our more detailed understanding of the problems with highly graphical formats for music interchange. While our prototype works fine for testing purposes, it was clear that building an industrial-strength NIFF translator, while possible, would take a great deal of time and effort. We decided instead to persuade the author of the most commercially important application writing NIFF files (SharpEye Music Reader) to write MusicXML files as well. SharpEye was the first product to support the MusicXML format, and their implementation experience demonstrated that MusicXML could indeed be implemented successfully by third party developers.

Given the existence of MuseData to MIDI translators, we were not surprised when the MusicXML to MIDI translator posed no major challenges. An interesting aspect of both these translators is our use of XML as an intermediate format for both the NIFF and MIDI files. This creates an easier-to-program structure for these binary formats.

Finale Translation

Translating to and from Finale posed the largest challenges for MusicXML to date. As a fully-featured industrial application, it poses the expected challenges of dealing with a program whose feature set exceeds that of the interchange format. In many cases we added features that were necessary to support effective import from SharpEye to Finale (for instance, system and page breaks), but others are still unsupported.

The more interesting issues come from the differences in structure between
Finale and MusicXML files. In many of the fundamentals, there is a great deal of
similarity. Finale’s underlying frame structure (a single measure on a single
staff) has up to four layers, and each layer can have one or two voices. The
layers and voices are similar to MusicXML’s When we get to articulations and expressions, things work very differently in Finale and MusicXML. Finale is designed to be open-ended and extensible, so there are few of the built-in abstractions present in MusicXML. These abstractions must instead be inferred from the definition of a musical symbol in the Finale database. As an example, what Finale structure should translate to a
In practice we have found that the font definition works more reliably. Finale is a notation program, and people generally pay much more attention to appearance than playback within Finale files. But using this definition limits your translation to fonts that you have seen before, so that you know what musical glyphs are associated with each code in a music font.

Currently, the major barrier to even better Finale/MusicXML translations is the incomplete documentation for the Finale format provided in the current Finale 2000 plug-in developer’s kit. This is akin to a translator working just by the context of word usage, with only an incomplete set of dictionaries for reference. Coda has suggested that this documentation may be improved in a future version of the developer’s kit.

4 MUSICXML AND MUSICAL ANALYSIS

Music analysis and retrieval using large datasets of symbolic musical data has been hampered by the lack of an adequate, standardized format for symbolic music representation supported by commercial software tools. This gap makes it difficult to acquire and reuse either musical data or musical tools. The tools that are developed for music analysis research do not have the technical underpinnings to scale up to large-scale commercial usage of music information retrieval.

The need to use databases to build collections of symbolic music information is well understood [7], but the technology has been lacking. Building scalable database systems is a costly undertaking. It makes more sense for music applications to leverage the investment of other, better-funded application areas such as electronic commerce, as long as that technology is adequate, though not necessarily ideal, for the needs of musical applications.

XML has the potential to finally break through the database barrier through the efforts of the World Wide Web Consortium’s XML Query working group.
The group’s mission is “to provide flexible query facilities to extract data from real and virtual documents on the Web, therefore finally providing the needed interaction between the web world and the database world. Ultimately, collections of XML files will be accessed like databases.” [18] The current focus of the XML Query Working Group is the XQuery 1.0 language. Though this language is still a work-in-progress, available only in working draft form, there are already a dozen prototype implementations available for evaluation. These come both from major relational database vendors like Oracle and Microsoft as well as native XML database vendors like Software AG. The combination of an XML language for music and an XML query language is not sufficient by itself to break through the database barrier for music information retrieval. The two languages must be able to work together to solve musical problems. Early XQuery working drafts had significant problems in this area, lacking powerful facilities to deal with queries that combine aspects of sequence and hierarchy. These shortcomings have been addressed in the XQuery 1.0 working draft of April 30, 2002, and we have now been able to build our first interesting musical queries using XQuery and MusicXML. Given XQuery’s importance and scope, it is likely to be some time yet before the language definition is completed, issued as a W3C recommendation, and commercial tools made available for effective development of XQuery applications. Fortunately, for research purposes, many analysis applications can be developed effectively today with existing tools: the XML Document Object Model (DOM) [17] and the XML Path Language 1.0 (XPath) [3]. Musical analysis is not just applicable in musicological research; it can also be useful in music publishing. For instance, as Recordare publishes its editions of classical art songs, it is helpful to show the range of each song. 
This process can be automated by a musical analysis program working on the MusicXML data. Figure 3 shows a screen shot from a program that generates a distribution graph of the pitch range for any particular part in a piece of music. Here we are computing the range for the voice part of the last song in Schumann’s Frauenliebe und Leben, Op. 42.

Figure 3: Pitch Range Distribution Analysis Program

Figure 4 shows the synopsis produced by clicking on the “Report” button. It focuses on the low and high notes.

Figure 4: Pitch Range Synopsis Report

The program that generates this synopsis report is easy to write with MusicXML. For comparison, we will show two implementations. The first uses the DOM, programmed in Visual Basic 6.0 with Microsoft’s MSXML3 parser. An equivalent program can be built using XQuery. Our example uses the QuiP 2.1.1 prototype program from Software AG, which is based on the April 30 working draft of XQuery 1.0. QuiP and XQuery are both works in progress, so the syntax of a working program is likely to change by the time XQuery becomes a formal recommendation from the World Wide Web Consortium.

DOM Approach

The DOM approach is implemented within a function that takes a MusicXML
document and MusicXML part ID as input, and returns the dialog box string as
output. After the initial variable declaration and initialization, the variable
The program then loops through each pitch, calling the After all the pitches are searched, the program returns a string composed from the saved values for the lowest and highest MIDI pitches, along with their musical spellings and the measure where they were first encountered.
Function FindRange _
(ThisXML As DOMDocument30, _
ByVal PartID As String)
Dim oRoot As IXMLDOMElement ' Root of XML document
Dim oNodes As IXMLDOMNodeList ' Pitches to analyze
Dim oElement As IXMLDOMElement ' Current pitch
Dim oMeasure As IXMLDOMElement ' Parent measure
Dim lPitch As Long ' Current pitch
Dim lMinPitch As Long ' Lowest MIDI pitch
Dim sMinPitch As String ' Spelling of low pitch
Dim lMaxPitch As Long ' Highest MIDI pitch
Dim sMaxPitch As String ' Spelling of high pitch
Dim sMinMeasure As String ' Measure for low pitch
Dim sMaxMeasure As String ' Measure for high pitch
lMinPitch = 128
lMaxPitch = -1
Set oRoot = ThisXML.documentElement
Set oNodes = _
oRoot.selectNodes( _
"//part[@id='" & PartID & "']//pitch")
' Search each pitch for the lowest and highest
' values, saving the spelling and measure number.
Do
Set oElement = oNodes.nextNode
If oElement Is Nothing Then Exit Do
lPitch = MIDINote(oElement)
If lPitch < lMinPitch Then
lMinPitch = lPitch
sMinPitch = SpellNote(oElement)
Set oMeasure = _
oElement.selectSingleNode _
("ancestor::measure")
sMinMeasure = _
oMeasure.getAttribute("number")
End If
If lPitch > lMaxPitch Then
lMaxPitch = lPitch
sMaxPitch = SpellNote(oElement)
Set oMeasure = _
oElement.selectSingleNode _
("ancestor::measure")
sMaxMeasure = _
oMeasure.getAttribute("number")
End If
Loop
FindRange = "Lowest note is " & sMinPitch & _
" (MIDI " & lMinPitch & _
") in measure " & sMinMeasure & vbCrLf & _
"Highest note is " & sMaxPitch & _
" (MIDI " & lMaxPitch & _
") in measure " & sMaxMeasure
End Function
' Return MIDI note value from a MusicXML pitch
' element, ignoring microtones.
Function MIDINote _
(ThisPitch As IXMLDOMElement) As Long
Dim oElement As MSXML2.IXMLDOMElement
Dim lTemp As Long ' Temporary pitch
' Get octave
Set oElement = _
ThisPitch.selectSingleNode("octave")
lTemp = 12 * (CLng(oElement.Text) + 1)
' Get pitch step
Set oElement = _
ThisPitch.selectSingleNode("step")
Select Case oElement.Text
Case "a", "A": lTemp = lTemp + 9
Case "b", "B": lTemp = lTemp + 11
Case "c", "C": lTemp = lTemp + 0
Case "d", "D": lTemp = lTemp + 2
Case "e", "E": lTemp = lTemp + 4
Case "f", "F": lTemp = lTemp + 5
Case "g", "G": lTemp = lTemp + 7
End Select
' Get alteration if any
Set oElement = _
ThisPitch.selectSingleNode("alter")
If Not oElement Is Nothing Then
lTemp = lTemp + CLng(oElement.Text)
End If
' Assign and exit
MIDINote = lTemp
End Function
' Spell the pitch as a string, e.g. "C#4"
Function SpellNote _
(ThisPitch As IXMLDOMElement) As String
Dim oElement As IXMLDOMElement
Dim sSpell As String ' Temporary string
Dim sAlter As String ' Alteration string
' Get pitch step
Set oElement = _
ThisPitch.selectSingleNode("step")
sSpell = UCase$(oElement.Text)
' Get alteration if any
Set oElement = _
ThisPitch.selectSingleNode("alter")
If Not oElement Is Nothing Then
Select Case CLng(oElement.Text)
Case -2: sAlter = "bb"
Case -1: sAlter = "b"
Case 0: sAlter = vbNullString
Case 1: sAlter = "#"
Case 2: sAlter = "##"
Case Else
sAlter = "(" & oElement.Text & ")"
End Select
sSpell = sSpell & sAlter
End If
' Get octave
Set oElement = _
ThisPitch.selectSingleNode("octave")
sSpell = sSpell & oElement.Text
' Assign and exit
SpellNote = sSpell
End Function
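For readers without a Visual Basic 6.0 and MSXML environment, the same DOM-style analysis can be sketched in Python with the standard library. This is an illustrative port under stated assumptions, not the program used for the paper: the names `midi_note`, `spell_note`, and `find_range` are hypothetical, the sample score is invented test data, alterations are assumed to be whole semitones, and the score is assumed to be partwise. Because `xml.etree.ElementTree` has no `ancestor::` axis, the sketch walks measures from the top down instead of looking up from each pitch.

```python
import xml.etree.ElementTree as ET

STEP_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ALTER_SYMBOLS = {-2: "bb", -1: "b", 0: "", 1: "#", 2: "##"}

# Invented three-measure part, used only to exercise the functions below.
SAMPLE = """<score-partwise>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>E</step><octave>4</octave></pitch></note>
      <note><pitch><step>B</step><alter>-1</alter><octave>4</octave></pitch></note>
    </measure>
    <measure number="2">
      <note><pitch><step>C</step><alter>1</alter><octave>4</octave></pitch></note>
    </measure>
    <measure number="3">
      <note><pitch><step>D</step><octave>5</octave></pitch></note>
    </measure>
  </part>
</score-partwise>"""


def midi_note(pitch):
    """MIDI note number for a <pitch> element, ignoring microtones."""
    step = pitch.findtext("step").upper()
    octave = int(pitch.findtext("octave"))
    alter = int(pitch.findtext("alter", default="0"))
    return 12 * (octave + 1) + STEP_SEMITONES[step] + alter


def spell_note(pitch):
    """Spell a <pitch> element as a string such as "C#4"."""
    alter = int(pitch.findtext("alter", default="0"))
    symbol = ALTER_SYMBOLS.get(alter, "(%d)" % alter)
    return pitch.findtext("step").upper() + symbol + pitch.findtext("octave")


def find_range(root, part_id):
    """Return (low, high), each a (midi, spelling, measure number) tuple.

    Strict comparisons keep the first measure in which each extreme occurs.
    Both values are None if the part contains no pitches.
    """
    low = high = None
    for part in root.findall("part"):
        if part.get("id") != part_id:
            continue
        for measure in part.findall("measure"):
            for pitch in measure.iter("pitch"):
                entry = (midi_note(pitch), spell_note(pitch), measure.get("number"))
                if low is None or entry[0] < low[0]:
                    low = entry
                if high is None or entry[0] > high[0]:
                    high = entry
    return low, high


low, high = find_range(ET.fromstring(SAMPLE), "P1")
print("Lowest note is %s (MIDI %d) in measure %s" % (low[1], low[0], low[2]))
print("Highest note is %s (MIDI %d) in measure %s" % (high[1], high[0], high[2]))
```

As in the Visual Basic version, a single pass over the pitches is enough, since the spelling and measure number are saved whenever a new extreme is found.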
XQuery Approach

Our XQuery implementation follows a similar approach to the DOM
implementation. Since QuiP is a standalone prototype tool for learning XQuery,
we have hardcoded the file name and part ID that were parameterized in the DOM
example. This example takes a very simple approach to the query, reviewing all
the pitches twice in order to locate the minimum and maximum values. Once we
have these values, we then find the pitch elements whose MIDI note values match
the high and low values. XQuery results are returned in XML format, so we do not
need a
define function MIDINote(element $thispitch) returns integer
{
let $step := $thispitch/step
let $alter :=
if (empty($thispitch/alter)) then 0
else if (string($thispitch/alter) =
"1") then 1
else if (string($thispitch/alter) =
"-1") then -1
else 0
let $octave :=
integer(string($thispitch/octave))
let $pitchstep :=
if (string($step) = "C") then 0
else if (string($step) = "D") then 2
else if (string($step) = "E") then 4
else if (string($step) = "F") then 5
else if (string($step) = "G") then 7
else if (string($step) = "A") then 9
else if (string($step) = "B") then 11
else 0
return 12 * ($octave + 1) + $pitchstep + $alter
}
let $doc := document("MusicXML/Frauenliebe8.xml")
let $part := $doc//part[./@id = "P1"]
let $highnote :=
max(for $pitch in $part//pitch
return MIDINote($pitch))
let $lownote :=
min(for $pitch in $part//pitch
return MIDINote($pitch))
let $highpitch :=
$part//pitch[MIDINote(.) = $highnote]
let $lowpitch :=
$part//pitch[MIDINote(.) = $lownote]
let $highmeas :=
string($highpitch[1]/../../@number)
let $lowmeas :=
string($lowpitch[1]/../../@number)
return
<result>
<low-note>{$lowpitch[1]}
<measure>{$lowmeas}</measure>
</low-note>
<high-note>{$highpitch[1]}
<measure>{$highmeas}</measure>
</high-note>
</result>
This query returns the following result in XML:
<?xml version="1.0"?>
<result>
<low-note>
<pitch>
<step>C</step>
<alter>1</alter>
<octave>4</octave>
</pitch>
<measure>16</measure>
</low-note>
<high-note>
<pitch>
<step>D</step>
<octave>5</octave>
</pitch>
<measure>12</measure>
</high-note>
</result>
Melody retrieval provides a more typical XQuery example, using a FLWR
(for-let-where-return) expression. Here we are looking for the instances of the
Frere Jacques theme in the key of C. We simplify this query to look just
for the pitch step sequence of C, D, E, C. This query also assumes a partwise
MusicXML file. It will match instances of the pitch sequence that cross
<result>
{let $doc :=
document("MusicXML/frere-jacques.xml")
let $notes := $doc//note
for $note1 in
$notes[string(./pitch/step) = "C"],
$note2 in $notes[. follows $note1][1],
$note3 in $notes[. follows $note2][1],
$note4 in $notes[. follows $note3][1]
let $meas1 := $note1/..
let $part1 := $meas1/..
let $part2 := $note2/../..
let $part3 := $note3/../..
let $part4 := $note4/../..
where string($note2/pitch/step) = "D"
and string($note3/pitch/step) = "E"
and string($note4/pitch/step) = "C"
and (string($part1/@id) =
string($part2/@id))
and (string($part2/@id) =
string($part3/@id))
and (string($part3/@id) =
string($part4/@id))
return
<motif>
{$note1/pitch} {$note2/pitch}
{$note3/pitch} {$note4/pitch}
<measure>{$meas1/@number}</measure>
<part>{$part1/@id}</part>
</motif>
}
</result>
When run against a simple three-part round of Frere Jacques prepared in Finale and exported to MusicXML, the query returns six instances of the motif, the first of which is shown below:
<?xml version="1.0"?>
<result>
<motif>
<pitch>
<step>C</step>
<octave>5</octave>
</pitch>
<pitch>
<step>D</step>
<octave>5</octave>
</pitch>
<pitch>
<step>E</step>
<octave>5</octave>
</pitch>
<pitch>
<step>C</step>
<octave>5</octave>
</pitch>
<measure number="1" />
<part id="P1" />
</motif>
<!-- Remaining 5 motifs removed
for brevity -->
</result>
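The same pitch-step search can be approximated without an XQuery engine. The Python sketch below is an assumption-laden illustration rather than a translation of the query above: `build_score` and `find_step_motif` are hypothetical helpers, the two-part score is invented test data (not the Finale-exported round described in the text), and chords and multiple voices are ignored. Like the query, it compares only pitch steps of consecutive notes within one part and reports the measure in which each match begins.

```python
import xml.etree.ElementTree as ET


def build_score(parts):
    """Build a minimal partwise score from {part id: [steps per measure]}.

    Each measure is a sequence of step letters; None stands for a rest.
    """
    root = ET.Element("score-partwise")
    for part_id, measures in parts.items():
        part = ET.SubElement(root, "part", id=part_id)
        for number, steps in enumerate(measures, start=1):
            measure = ET.SubElement(part, "measure", number=str(number))
            for step in steps:
                note = ET.SubElement(measure, "note")
                if step is None:
                    ET.SubElement(note, "rest")
                else:
                    pitch = ET.SubElement(note, "pitch")
                    ET.SubElement(pitch, "step").text = step
                    ET.SubElement(pitch, "octave").text = "5"
    return root


def find_step_motif(root, motif):
    """Return (part id, starting measure number) for each match of motif."""
    wanted = list(motif)
    hits = []
    for part in root.findall("part"):
        # Flatten the part into (step, measure number) pairs in score order.
        # Rests have no <pitch>, so their step is None and they break a match.
        notes = []
        for measure in part.findall("measure"):
            for note in measure.findall("note"):
                notes.append((note.findtext("pitch/step"), measure.get("number")))
        for i in range(len(notes) - len(wanted) + 1):
            window = [step for step, _ in notes[i:i + len(wanted)]]
            if window == wanted:
                hits.append((part.get("id"), notes[i][1]))
    return hits


# Invented two-part example: the second part enters one measure later.
ROUND = build_score({
    "P1": ["CDEC", "CDEC"],
    "P2": [[None], "CDEC"],
})
print(find_step_motif(ROUND, "CDEC"))
```

Flattening each part into one list before matching lets a match span a barline, and keeping parts separate plays the role of the part ID comparisons in the query's where clause.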
5 CONCLUSION

MusicXML has built on the collective work of the XML and music representation communities to become the most widely adopted symbolic music interchange format since MIDI. As the language develops, it will encounter further challenges in the areas of translation and analysis. Our commercial experience to date bodes well for handling translation issues as MusicXML expands to include more data for tablature, percussion notation, and sequencer applications. Our recent XQuery experience gives us new hope that industry-standard XML database tools, combined with MusicXML-based representations, will provide powerful new tools for problems in musical data analysis and information retrieval.

ACKNOWLEDGEMENTS

Eleanor Selfridge-Field, Walter B. Hewlett, Barry Vercoe, and David Huron provided valuable advice and encouragement, along with their outstanding prior work in music representation. Graham Jones, Ian Carter, Jane Singer, William Will, and Craig Sapp were especially helpful during the Dolet for Finale beta test.

REFERENCES