The National Gallery
of the Spoken Word (NGSW) will create a significant, carefully organized
on-line repository of spoken word collections. A collaborative project
among the humanities, engineering, and library science, the gallery will
provide the first large-scale repository of its kind through the identification
and digital preservation of crucial materials in tape libraries throughout
the United States. It will pioneer developments in information storage
as it creates a recognized set of standards for preservation and access,
and constructs sophisticated and integrated search mechanisms. Just as
important, the collaborators on this project identify a complex set of
opportunities for research, teaching and outreach, because the most significant
measure of the value of the project will be the users it attracts. High
school teachers, college professors, government officials, journalists
and engaged citizens will therefore be crucial collaborators in the creation
of the NGSW.
The NGSW will address critical technical
problems that remain unsolved in the delivery of high-quality voice materials
on the WWW. First, many older analog versions of speech resources suffer
from machine noise, copying distortion, background sound and media deterioration.
As one of its primary tasks, the NGSW will create a repository of high
quality digital versions of key spoken material with standard bibliographic
and metadata access. Second, while a number of search techniques work
well for text, search techniques for very-large-scale databases do not
yet exist for spoken materials. Participants in this project include researchers
who are recognized leaders in the development of algorithms for searching
using acoustic and linguistic models. Third, in the first attempts to
present sound files on the WWW, little attention has been paid to standardization
of digitization techniques. This project will create a set of standards
for future development of sound on the web, including formatting,
sampling procedures, archiving of sound, and the presentation of materials.
The NGSW will help create a history of
sound in the age of its virtual reproducibility. By bringing the spoken
word across the Internet into living rooms, classrooms, research laboratories,
libraries, and government offices, and by delivering the transformative
power of language, rhetoric and speech via the Internet, the NGSW has
the potential to create a worldwide virtual community. However, the difference
between the classical polis and its online equivalent will be the profoundly
democratic nature of the online community. This process is already occurring
around the world, as is recorded by studies like Rhonda and Michael Hauben's
Netizens, but it has been based primarily on writing: e-mail, online
chat, bulletin boards. As Internet telephony becomes more feasible and
popular, more and more online community-building discourse is likely to
take the form of speech. Unless an organized project like the NGSW preserves
and publicizes historical speech, however, the emerging online discourse
will be deprived of the historical context out of which it emerges.
By making sure that all voices are represented, we can promote open and
democratic online discourse, with a rich trove of material to support
each perspective, rather than permitting information-rich countries, nations,
ethnic groups or ideological movements to define online oral and aural
history.
Project Significance: Voices on the Internet
At the heart of this project stands a
set of insights derived from media theory (McLuhan 1962, Singeltary and
Stone 1988). Speech incites reaction, while writing admits nothing in
return. Reading is fundamentally private; speech is necessarily social
and even political. Indeed, the Western rhetorical tradition is founded
on the public nature of speech and the physical proximity of audience
and speaker. While speech recording and radio broadcasts in the early
20th century virtually diminished the distance between speaker and audience,
the Internet has opened a new phase in the history of the reproducibility
of speech by translating the spoken word into a digital file that can
be heard quickly, anywhere in the world. This technology, however, is
still very young.
In order to create a fully communicative
online public sphere, the NGSW will focus upon four key themes:
* The development of political culture
in the twentieth century
* Popular culture in its historical context
* The dialectic of community and the individual
* International relations and conflicts
in the modern era
In addition to these themes, we intend to
situate these oral sources in several key contexts. These voices originate
in historical, contemporary and personal contexts, and the NGSW will provide
interpretive materials to help orient them for the user. In this way the
NGSW, like physical museums, will provide both a storage place for the
overall collection, and a public exhibit "space" for its most evocative
elements. However, unlike a physical museum, our virtual storehouse need
never rotate items out of the exhibited collection; we can continue to
build the most accessible parts of the exhibits and leave them on display.
Moreover, the NGSW avoids such traditional preservation problems as physical
deterioration, inaccessibility to fragile materials, unavailability to
simultaneous multiple users, and inaccessibility to persons with disabilities.
Michigan State University's (MSU) H-Net
(http://www.h-net.msu.edu) has trained scholars from all over the world
in the use of the Internet. Its specific training projects, involving
teachers and scholars from South Africa, Senegal, Poland, Russia, Portugal
and Japan, have developed public discourse through creative online dialogues.
Its most ambitious thematic project, the "Pluralism and Unity" resource
created for the World Exposition in Lisbon (http://www.expo98.h-net.msu.edu),
initiates a public discourse on many dimensions of American democracy,
and will serve as the starting point for an international discussion of
the same themes. As the partner responsible for coordinating development
of the NGSW, H-Net will draw on its experience to develop a broad user
base and to engage users as active participants in the community of public
discourse. We especially target the following groups of users:
* Educational users: The collections
in the NGSW will comprise a valuable resource for educators and students
at every level. By creating Web-based materials, lesson plans and exhibits,
the project partners will work directly with a range of school systems
and individual teachers to provide instruction and technical support for
classroom use of the NGSW. Materials from the NGSW could be used under
a wide spectrum of educational strategies, ranging from a traditional
teacher-centered approach to a more open and exploratory student-centered
approach. Students with visual disabilities could especially benefit from
the ready availability of sound resources.
* Government and policy-makers: An
easily available record of articulate public opinion can be used by policy-makers
to help develop programs which take full account of past arguments, including
public perceptions. As the NGSW develops collections based on interviews
with policy makers, it will provide a growing base of information about
crucial and controversial subjects unmediated by historical or other interpretations.
* Broadcast: Clean, broadcast-quality
audio files can provide an enormous resource for radio, television and
Internet journalists. This could prove to be an important route by which
the NGSW reaches a broader public.
* Research: Since not all the material
in the NGSW is available in transcript form, researchers in a variety
of historical and social science fields will find primary material that
is not readily available in any other form. At the same time, as scholars
increasingly incorporate multimedia and hypertext in their own work, the
voices in the NGSW will be available for inclusion in on-line publications.
* Publishers: As publishers move to
bring multimedia materials into online and CD-ROM publications, a resource
like the NGSW will prove invaluable to their efforts. With proper management,
it is possible that commercial publishers could prove to be sources of
revenue and intellectual partners in the NGSW's continuing development.
* Engaged citizens: The Internet would
not exist without the enthusiastic support of a much broader public base
than the constituencies above. The American public has demonstrated repeatedly
that it will engage in and learn from serious cultural and historical
debates if these are made properly accessible to them; cultural tourism
is booming, for example, and C-SPAN continues to have a devoted group
of followers. The NGSW will reach out to all levels of American society,
engaging the widest possible range of the members of the democracy.
Project Significance: Technological Innovation
The participation of the Speech Processing
Laboratory at MSU (Department of Electrical and Computer Engineering)
in this project allows the development of a robust search capability.
The great weakness of most sound data collections is the need to use
transcripts to navigate at a detailed level. As the NGSW attracts audience
and attention, users will benefit from a search capability that will allow
identification of key words, concepts and names. In collaboration with
the Robust Speech Processing Laboratory at Duke University, we will work
to develop solutions to the challenge of enabling keyword, topic, speaker,
and language searches of the sound files themselves. This search facility
will help make the site much friendlier for users in all of our target
categories.
In collaboration with the MSU Library,
the NGSW will also develop a set of standards and practices for the preservation
and presentation of recorded speech. First, we intend to develop standards
of clarity for files containing spoken sound (as opposed to musical preservation
and presentation) in cooperation with a range of institutions already
working in this area (see below). Second, clarifying copyright issues
regarding sound files is necessary. Third, the project will evaluate current
software as well as develop new processes for removing machine noise and
reducing copying distortion. Fourth, we will continue our commitment to
training staff and users in the use and development of these materials.
Finally, project staff will engage in active promotion of the site, travelling
to conferences and seeking publicity that will bring attention to the
quality and range of the NGSW's collections to ensure the broadest possible
public participation.
Institutional Context
The NGSW represents an important outgrowth
of ongoing educational and research initiatives in progress at each of
the partnership institutions. First, participants in this project have
demonstrated expertise in the analysis, use, and development of speech
materials. The Deller and Hansen laboratories represent more than 35 combined
years in speech processing research funded by an array of federal, state,
and private agencies including the NSF, ONR, NIH, US Veterans' Administration,
the Whitaker Foundation, Ameritech, IBM, AT&T and others. Deller and
Hansen co-authored the internationally used textbook Discrete-Time
Processing of Speech Signals, which will appear in a new edition in
1999. The baseline development for the NGSW search engines will be a
state-of-the-art topic-spotting algorithm developed by Hansen's group
at Duke. This development is described below in the section "Search Algorithms
and MetaData." Synergistic applications of Deller's recent work in system
identification adaptive filtering for speech recognition and coding, with
Hansen's longstanding expertise in speech enhancement, will deliver the
machinery needed for web-based enhancement tools.
The Chicago Historical Society [CHS]
is preserving and cataloging Studs Terkel's vast collection of oral interviews.
The majority of the Studs Terkel interview tapes are currently at risk
of physically disintegrating because of a manufacturing error in the
cassette tapes themselves. The project to preserve these cassettes has
three phases. The first, funded by Chicago public radio station WFMT
and the Chicago Community Trust, involved the creation of a sound preservation
laboratory, transcription of the labels on each reel of tape, and the
appointment of Studs Terkel as Distinguished Scholar in Residence at CHS.
The second, funded in part by the National Endowment for the Humanities
(NEH), involved systematically appraising, prioritizing, arranging, and
cataloging the tapes in order to make them available to researchers and
the public. A third phase will involve preservation reformatting of the
tapes and creation of a cool storage environment for the original tapes.
Jerry Goldman at Northwestern University
has pioneered the delivery of historical audio material over the Internet
with two award-winning projects, "Oyez, Oyez" and "History and Politics
Out Loud." H-Net has received a large grant from the NEH in collaboration
with Goldman to expand "History and Politics Out Loud," digitize the 900
original hours of MSU's Vincent Voice Library (VVL), and create web-based
audio resources for scholarly and educational use. This project will build
upon H-Net's use of digitized voice files and JavaScript animations to
develop historical arguments around citizenship and national identity
in turn-of-the-century America. This project represents many accumulated
hours of practical experience with technical and presentation issues
the NGSW will face. H-Net and the CHS are also partnering with the Archives
of the African National Congress to develop on-line oral histories of
the South African Liberation struggles.
Second, MSU has made a very serious commitment
to providing the infrastructure and resources necessary to bring online
material into the classroom and the public sphere. MSU was the first large
state university to provide email accounts to all its students. Currently,
the MSU system supports over 45,000 email accounts and permits each to
host webpages. MSU hosts MichNet, an important backbone of the Internet
in the Midwest. MSU has consistently supported projects such as H-Net:
Humanities and Social Sciences OnLine, the largest organized consortium
of scholarly networks in the world. MSU Libraries is actively involved
in text digitization efforts with classroom applications. As an acknowledgement
of MSU's commitment to pioneering educational technology, the State of
Michigan has awarded the university an annual grant of $10,400,000.
Third, many participants have developed
related expertise in a number of other critical areas. Therefore, the
NGSW will pioneer creation of an extensive testbed whose utility will
be largely generalizable. Application and integration of intellectual
tools developed under programs such as Phase 1 of the Digital Library
Initiative are necessary to realizing the full potential of this research.
By bringing together five units at MSU including H-Net, the Speech Processing
Laboratory, the MSU Libraries, the MSU Museum, and faculty in the College
of Education, as well as Northwestern University, and the CHS, the NGSW
will draw on the strengths of each of these individual units. Developing
this project in a partnership between a premier urban historical society,
a public land-grant and two private universities, the NGSW will demonstrate
how these techniques and research methods can be applied and used across
a range of institutions of diverse size and mission, as well as for a
broad constituency of academic and public individuals. Collaboration with
the Linguistic Data Consortium (LDC) at the University of Pennsylvania
and Duke University's Robust Speech Processing Laboratory (RSPL) will
contribute additional digitization, archiving, production, distribution,
and database management expertise. Applying these materials in a range
of educational levels and settings serves as an important testing ground
for the utility of these applications in classrooms across the nation.
Project Partners
H-Net: Humanities and Social Sciences OnLine
H-Net will coordinate project efforts, construct the web-based user-interface,
and design and implement evaluation metrics. While its technical and institutional
hub resides at MSU, H-Net is an interdisciplinary and worldwide consortium
of scholars and teachers. A diverse organization engaged in many disparate
and wide-ranging projects, H-Net has a longstanding commitment to maintaining
high scholarly and pedagogical standards while increasing access and availability
of resources to international scholars and students. The energizing force
behind H-Net is a shared desire to develop new opportunities for scholarship
and teaching that stem from the rapid technological development in computers
and electronic communication.
With more than 94 networks, which reach over 90 countries, H-Net
is the largest distributor of electronic discussion lists in the world,
and hosts one of the most extensive websites in the humanities and social
sciences. Serving both academic and public audiences, H-Net is currently
involved in a number of cooperative endeavors to digitize and make widely
available archival materials, artwork, artifacts, oral histories, and music.
The creation of online databases, exhibitions, photo archives, educational
outreach programs, and research tools is an important part of H-Net's work.
Central to these endeavors is a commitment to build unique, active, cordial
and enduring cooperation and dialogue among scholars and teachers around
the globe. It is a challenge to integrate resources into partnering institutions
and to overcome the physical limitations of individual collections. Yet
such partnerships can be an effective way to serve local needs, while
also developing collaborative endeavors. H-Net's partnerships with the
CHS, American Historical Association, American Political Science Association,
and Oral History Association will contribute expertise and an extensive
audience for the NGSW.
The NGSW will be housed on servers at MSU. From MSU, H-Net will create
and manage the web-based interface as well as facilitate development of
educational materials and online exhibits. Because it is such an interdisciplinary
organization, which reaches both academic and public audiences, H-Net's
outreach endeavors will involve the largest possible number of users.
Through its discussion networks and website, H-Net will ensure that the
NGSW is highly publicized. H-Net will also play a role in evaluation
of the NGSW and related resources.
Engineering Collaborators and Facilities
Speech Processing Laboratory: Michigan State University:
The Speech Processing Laboratory at MSU (SPL) is a cognate lab in the
Signal Processing Laboratories Consortium in the Department of Electrical
and Computer Engineering at MSU and the primary site for the fundamental
speech-processing research undertaken in this project at MSU. Much of
the major equipment necessary for this work is already in place in this
laboratory and in the general computing environment in the College of
Engineering. Some of the "local" facilities in the SPL include an array
of networked (Windows NT) Pentium-II-based personal computers, a SUN Sparc
10/30 workstation (Unix), and extensive software support for algorithm
development and signal acquisition and preprocessing. In addition to
facilities for signal processing, support facilities for graphics, word-processing,
and the like are also in place.
Robust Speech Processing Laboratory at Duke University:
The Robust Speech Processing Laboratory (RSPL) in the Department of Electrical
Engineering has been actively involved in sponsored research for ten years.
Most of the necessary computing and data acquisition equipment is in
place at the RSPL. Equipment and facilities available in RSPL include
a network of 7 SUN SPARCStations with 16-bit linear digital audio I/O
(SS-10/402, SS-5's, SS-4's, LX's), 17 GigaBytes of disk storage, a 600 Mbyte
optical read/write disk system, a dual-channel 48kHz Gradient 16-bit
stereo audio I/O system with dual telephone interface, a 48kHz DAT-Link
16-bit stereo audio I/O system, a collection of X-window workstations
and IBM PS-2 PC's with real-time digital signal processing hardware, three
Sony Digital Audio Tape Decks/units (DTC-700, DTC-3, DTC-7), Shure SM10A
head mounted close talking microphones, a Kenwood KA-88 integrated amplifier
and other audio equipment, and a 130 square foot sound resist anti-aud
io chamber. Signal processing analysis packages such as Matlab and a site
license for Entropic Systems Waves-ESPS/HTK speech analysis packages.
RSPL has an extensive library of speech data (5 GigaBytes), and is also
a member of the LDC.
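To put the audio and storage figures above in proportion, the following is a rough sketch of the storage arithmetic for uncompressed audio. It assumes linear PCM, decimal gigabytes, and no file-format overhead, none of which the proposal specifies; the function names are hypothetical and chosen only for illustration.

```python
def pcm_bytes_per_hour(sample_rate_hz, bits_per_sample, channels):
    """Bytes needed for one hour of uncompressed linear PCM audio."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * 3600


def hours_that_fit(disk_bytes, bytes_per_hour):
    """Hours of audio that fit in a given amount of storage."""
    return disk_bytes / bytes_per_hour


# 48 kHz, 16-bit stereo, as on the audio I/O systems listed above.
STEREO_48K = pcm_bytes_per_hour(48_000, 16, 2)

# Assumption: "17 GigaBytes" is read as 17 * 10**9 bytes.
DISK_17_GB = 17 * 10**9

print(STEREO_48K)                                      # 691200000
print(round(hours_that_fit(DISK_17_GB, STEREO_48K), 1))  # 24.6
```

Under these assumptions, one hour of 48 kHz 16-bit stereo occupies about 0.69 gigabytes, so 17 gigabytes holds roughly a day of audio. Mono speech or a lower sampling rate would stretch that figure proportionally.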
Linguistic Data Consortium: The LDC is a consortium
of universities, companies and government research laboratories. It creates,
collects and distributes speech and text databases, lexicons, and other
resources for research and development purposes. The University of Pennsylvania
is the LDC's host institution. The LDC was founded in 1992 with a grant
from the Advanced Research Projects Agency (ARPA), and is partly supported
by a grant from the Information and Intelligent Systems division of the
National Science Foundation. The LDC will derive a subcollection from the
NGSW for use in its speech-processing research collection.
InvoTek, Inc.: Tom Jakobs, President
of InvoTek, Inc., is a licensed Professional Engineer in the state of
Arkansas and has been developing devices to assist people with disabilities
for the past 10 years. Over this period he has been responsible for the
design of 9 products and numerous custom designs for individuals with
severe disabilities, including vision aids. InvoTek will consult with
the speech laboratories at MSU and Duke University to ensure that interfaces
to the NGSW are friendly to persons with disabilities.
Collections and Digitization Sites
Michigan State University Libraries: The VVL is
a unit within the MSU Libraries. It houses over 50,000 audio recordings
from the dawn of recording to the present. Its holdings include rare recordings
from Edison cylinders, wire recorders, and unique dictation devices.
The MSU audio library is named after G. Robert Vincent, inventor of the
V-Disc in World War II, and chief sound engineer at the Nuremberg Trials,
who founded the VVL when he came to MSU in 1961. The MSU Library has initiated
a digitization program for the original 900 hours of the VVL's holdings
in partnership with H-Net, and will digitize the NGSW materials. Through
this collaboration, the MSU Library has demonstrated its ability to:
* Select appropriate voice material for this project;
* Use the latest technology to enhance the audio quality and comprehension
of the voice material, by reducing machine noise and distortions from
copying;
* Efficiently transfer these often-fragile materials from audiotape
to digital form.
The MSU Libraries cataloguing department has extensive experience
in the creation of bibliographic records. The Library has been creating
and contributing MARC records to OCLC since 1975. Approximately 25,000
Voice Library titles in analog format have been cataloged by the MSU Libraries
and are accessible in the online catalog. In the last several years the
Library has been actively cataloging electronic resources, including a
partnership role in a CIC cooperative cataloging project to provide bibliographic
access to ARTFL, a digital archive of classics in French language
and literature. The Library holds an elected seat on the Policy Committee
(Executive Board) of the Program for Cooperative Cataloging (PCC), an
international cooperative cataloging effort housed at the Library of
Congress, and is a participant in the PCC's NACO project for name authorities.
The MSU Library holds over four million volumes, which makes it the
25th largest collection in the Association of Research Libraries - larger
than the libraries of some Ivy League universities. It serves not only
the 42,000 students at MSU, but also every citizen of Michigan through
its community borrower program, and has active outreach programs to school
districts. The library also has an active text-digitization program, which
works closely with the curriculum needs of undergraduate writing classes.
Through its Digital Sources Center the library supports a wide range
of projects including SGML, GIS, text and recorded speech digitization.
The Library also serves as a source of copyright information for the university
community.
Michigan State University Museum: Founded in 1857, the
MSU Museum was one of the first museums on a college campus in the Midwest,
and is one of the oldest university museums in the country. Research is
supported by grants from a variety of public and private sources, as well
as such federal and state agencies as the NEH, Michigan Council for the
Arts, Michigan Humanities Council, NSF, Army Corps of Engineers, National
Geographic Society, National Park Service, Institute of Museum Services,
and the Smithsonian Institution.
In addition to serving as a research center for extensive public
archeology throughout the Great Lakes region, agricultural history, vertebrate
paleontology, and vertebrate biology, the MSU Museum is the state center
for research, preservation, and documentation of Michigan and Midwest
traditions such as quilting, rag-rug making, decoy carving, Mexican-American
music, foodways, African-American gospel music, inland river boat building,
and Great Lakes maritime traditions. Staff folklife researchers have developed
a national model for identifying and assessing traditional cultural resources.
Part of the Michigan Traditional Arts Research Collection includes over
300 hours of oral history interviews with Native-American and Mexican-American
folk artists, quilters and musicians from throughout the Midwest. Interview
topics range from life histories of migrant fieldworkers, to creation
stories and local histories. All of these materials are catalogued. Many
of the tapes include accompanying patterns, quilts, photographs and other
physical artifacts. The majority of the interviews with Native-American
folk artists also have been transcribed.
Chicago Historical Society: The nation's premier historical
society, the CHS is a privately endowed, independent institution devoted
to collecting, interpreting, and publicly presenting the rich multicultural
history of Chicago and Illinois, as well as selected areas of American
history, through exhibitions, programs, research collections, and publications.
The CHS is committed to using its resources for research and education.
In addition to a number of educational initiatives and training sessions,
CHS has secured title to the Studs Terkel tapes and the intellectual property
they encode. CHS also holds a collection of audio and videotaped histories
conducted by high school students in Chicago Neighborhoods.
The Terkel collection includes 9,000 hours of interviews done by
Studs Terkel since 1954 on WFMT radio in Chicago. Terkel interviewed broadly
in politics and the arts. A complete listing runs for pages but includes
anyone of significance in American life and culture who visited Chicago
while Terkel's program ran: from Bertrand Russell to Abbie Hoffman, from
Bob Dylan to Luciano Pavarotti. The collection also includes all the interviews
Terkel has done over many years for his books. The collection is especially
strong in the arts, including musicians, painters, sculptors, novelists,
essayists, poets and artists of all kinds. The collection presents an
almost encyclopedic view of the arts in the United States in the last
45 years and, accordingly, is an important research resource. The NGSW
will provide the widest possible access to these tapes, as well as the
opportunity for electronic public history presentations bearing upon American
history and culture in the 20th century.
The Chicago Neighborhoods material includes about 50 hours of interviews
in each of four Chicago neighborhoods (Douglas/Grand Boulevard, Rogers
Park/West Ridge, Pilsen/Little Village, and Near West Side/East Garfield
Park). High school students trained by CHS conducted most of the interviews.
CHS asked them to interview their families and also to find a neighborhood
elder who could talk about what the area "used to be like." The interviews
are all with ordinary people talking about their lives and the places
where they live. They are moving and touching - and also funny in very
unexpected ways.
CHS works with approximately 5,000 teachers each year from Chicago
and suburban schools. About 50,000 elementary school students visit CHS
each year. CHS runs a regular series of workshops for teachers. These
workshops feature the use of the CHS website materials (http://www.chicagohistory.org)
and occur both in Chicago schools and at the CHS facilities. CHS has a
special relationship with the schools in each of four neighborhoods cited
above, and has developed online and print curricular materials on the
history of each neighborhood. CHS also has a specially-funded project
with a group of west-side schools called "History Explorers" and Douglas
Greenberg has a personal mentoring relationship with the administration,
faculty and students of the Paul Cuffee School on Chicago's south side.
Northwestern University: Northwestern University houses
the "Oyez! Oyez! Oyez!" multimedia relational database devoted to United
States Supreme Court and constitutional law, created and run by Jerry
Goldman. Supported by grants from NEH and NSF (9602170), the Oyez project
has become an authoritative resource for scholars, students, and professionals
across a range of disciplines. The Oyez audio archive holds more than
500 hours of oral arguments; this number will double in the next two years.
When complete, the audio archives will provide an accessible portal
to the great constitutional controversies of the last half of the twentieth
century.
While the authority and authenticity of the audio database have proved
popular to tens of thousands, our preliminary assessment reveals that
the database has also proved frustrating for many scholars and students.
In a sense, this is not surprising. The arguments are lengthy, ranging
from one to three hours. Digesting these materials is difficult work.
In addition, the arguments are not syllogistic exercises, neatly laid
out in Socratic fashion. They are expositions, with frequent interruptions
from the justices who may move from topic to topic as their concerns
direct them. Finding coherence in such material is daunting to experienced
and dedicated scholars and professionals. It can be overwhelming to the
uninitiated.
Recent advances in streaming technologies offer the promise of surmounting
audio's inherent linearity. Synchronized Multimedia Integration Language
(SMIL) is a proposed specification of the WWW Consortium (W3C) (Lohr 1998).
SMIL is a powerful way of linking text, images, and audio over the WWW.
The Oyez project is currently using SMIL to mark up argument transcripts
to synchronize audio and still images of the speakers with the displayed
text of the argument. This requires the use of transcripts stored in
the Supreme Court Library. They require scanning and verifying against
the audio source materials. In addition, voices must be identified and
all text must be tagged and time-coded. Once synchronized, however, it
will be possible to search the database using standard Boolean logic
and return multimedia replies. We maintain that such capabilities will
break the linearity barrier that today makes audio materials on the WWW
both compelling and frustrating.
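The idea of searching time-coded transcripts with Boolean logic can be sketched as follows. This is only an illustration under stated assumptions, not the Oyez project's implementation and not SMIL markup itself: the Segment structure, the boolean_search function, and the sample transcript lines are all hypothetical and invented for the example. Each hit carries the audio offsets a player would need to jump straight to the matching passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One time-coded stretch of transcript (hypothetical structure)."""
    speaker: str
    start_s: float  # offset into the audio file, in seconds
    end_s: float
    text: str


def words(segment):
    """Lowercased word set for a segment, with edge punctuation stripped."""
    stripped = {w.strip(".,;:?!\"'()").lower() for w in segment.text.split()}
    return stripped - {""}


def boolean_search(segments, all_of=(), any_of=(), none_of=()):
    """Return segments containing every all_of term, at least one any_of
    term (when any are given), and no none_of term."""
    hits = []
    for seg in segments:
        vocab = words(seg)
        if not all(t.lower() in vocab for t in all_of):
            continue
        if any_of and not any(t.lower() in vocab for t in any_of):
            continue
        if any(t.lower() in vocab for t in none_of):
            continue
        hits.append(seg)
    return hits


# Invented sample data, for illustration only.
segments = [
    Segment("Counsel", 0.0, 12.5,
            "The statute burdens speech, and the First Amendment forbids that."),
    Segment("Justice", 12.5, 20.0,
            "Does the statute regulate conduct rather than speech?"),
    Segment("Counsel", 20.0, 31.0,
            "It regulates conduct only incidentally."),
]

# "statute AND speech AND NOT conduct"
for hit in boolean_search(segments, all_of=["statute", "speech"],
                          none_of=["conduct"]):
    print(hit.speaker, hit.start_s, hit.end_s)
```

Because every result is a speaker plus a start and end offset, the same query answer could drive synchronized playback of audio, text, and a still image of the speaker, which is the kind of multimedia reply the paragraph above describes.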
SMIL offers the promise of a real advance in the ability to stream
multimedia in a useful (i.e. searchable) format on the WWW. The Oyez project
has a steady core of users who could easily be called upon to test interactions
of a SMIL database. With more than 80,000 page views a month, the Oyez
project can attract and hold the interest of a diverse audience. An empirical
test of SMIL is the logical next step, and the Oyez project is a good
place to begin. Northwestern University offers a state-of-the-technology
network and an established infrastructure to distribute substantial multimedia
collections. Previous grant support has provided the hardware, licensing
and content base to test the robustness of various SMIL implementations.
Educational Testbeds and Partnership School Districts
Integrating the NGSW resources into a range of K-12 school districts,
as well as college and university classrooms across the United States,
is an important part of this project. Faculty and staff in MSU's College
of Education will oversee the K-12 initiatives, while H-Net will coordinate
the university outreach. The partnership school districts span a range
of communities, from urban and suburban districts that face a serious
lack of social and economic resources, to poor rural districts, to one
district with a wealth of resources. These districts were chosen both
because of perceived needs by the districts themselves, and because of
the enthusiasm of individual teachers and superintendents. Each of these
districts also has strong and long-standing ties to the collection sites
and other partnership institutions. Through H-Net's networks and college
outreach programs, international faculty will also use the NGSW's collections
in their lectures and discussion sections across a broad range of
disciplines. NGSW resources will be integrated into classrooms at MSU, Northwestern,
and Duke University as well.
Beginning in the second year of the project, the College of Education
at MSU will conduct training sessions and summer workshops for teachers
from three targeted school districts in Michigan. The MSU College of Education
will provide follow-up and technical support for the districts through
the university's 24-hour helpline, coordination with district staff, and
site visits. The College's Departments of Teacher Education and Educational
Technology have international reputations as leaders in the education of
teachers. The vision of these programs parallels MSU's basic land-grant
philosophy, which is rooted in a strong commitment to contribute positively
to the challenges facing resource-poor rural and urban communities. Workshops
described in the section on CHS will train teachers from the Chicago area
in the use of web technologies. Members of CHS, including Greenberg, also
work directly in the city's school system through mentoring relationships
with administrators, faculty, and students.
Targeted school districts include public school districts such as
Baldwin, Benton Harbor, and Oak Park in Michigan, and the Paul Cuffee
School in Chicago. The Baldwin Community School district is located in
a small, rural community of approximately 6,000 persons in upper Michigan.
37% of the students are African-American, and 31% of the students live
below the poverty margin. In spite of its size and location, Baldwin has
a well-developed technology program, provides regular hands-on technology
training for its teachers, and employs a teacher to manage technology
in the district. A newly-hired superintendent has also taken over management
of the system, and has implemented a five-year plan designed to engage
parents, community businesses, and organizations in collaborative efforts
to improve academic performance and prepare students for post-secondary
work. The NGSW will be a fundamental part of this plan.
Benton Harbor and the Paul Cuffee School are both urban districts.
Benton Harbor is a small, urban town of 36,000 persons situated along
the St. Joseph River on the western coast of Michigan. Once a prosperous
fruit-growing region, Benton Harbor was also home to foundries and plants
for automobile parts, the Heath Company and Whirlpool. The loss of these
foundries and auto plants in the 1960s and 1970s seriously undermined
the local economy. 84.3% of the 6,400 students in the Benton Harbor School
District currently qualify for free and reduced lunches, well above the
state average. During the last two years, however, community stakeholders
concerned with the education, health, and social welfare of youth in Benton
Harbor have committed themselves to long-term partnerships with the school
district to improve educational resources. The NGSW fits squarely with
the aims of these community businesses and parents, and will be an important
part of these initiatives.
The Paul Cuffee School is also located in an economically poor, urban
neighborhood. However, the CHS has played a large role in revitalizing
the school by providing a wealth of resources. Through local donors and
commitment of the district, the school has acquired computer technology
to enhance educational resources. Greenberg has personally been involved
in CHS's mentoring program and has worked extensively in the Cuffee School.
NGSW is a natural extension of this work, and will provide further
resources to the Cuffee classrooms.
Oak Park is another targeted district. Located in the Detroit area,
Oak Park is an ethnically diverse and economically stratified community.
40% of students in Oak Park's public school system are eligible for the
free and reduced lunch program. The district has recently implemented
a six-year strategic planning process to effect substantive improvement
in student achievement. A professional development committee, with
representation from the administration, classroom teachers, community,
and parents, is focusing on improving student achievement and establishing
parameters for measuring success. The NGSW will continue to improve access
to classroom resources, and to set new measures for success in the districts.
By integrating these multimedia resources into lesson plans and student
research projects, the NGSW can help to transform the students' learning
experiences, while continuing to build on community initiatives and goals.
Discussion of Key Aspects of Project
Preservation and Access:
Preservation has at least three meanings for librarians and archivists
(Conway 1996), and is central to preserving the quality, longevity, integrity
and accessibility of the digital data contained in the NGSW. First, preservation
makes valuable resources available. Digital conversion can be one of the
most cost-effective and viable means of preserving deteriorating audiotapes
with appropriate standards setting. Second, digital conversion can be
used to create a high-quality copy of an item, thus protecting the original.
By obviating physical handling of audiotapes, digitization prevents further
deterioration. Third, protecting the data stream from corruption or destruction
through careful choice of a storage medium is necessary to ensure a long
life expectancy of a digital audio system. The NGSW meets all these preservation
needs, as well as allowing use of these digitized resources by the widest
possible audience.
Longevity of the collections will be
ensured by storing the sound data in standardized formats to facilitate
the ability to migrate data, indexes and software to future technologies.
The NGSW collaborators will also remain intimately involved in the creation
of those access systems (Comm. on Preservation and Access 1995). Quality,
defined in this context as the usefulness and usability of systems, is
conditioned significantly by the limitations of capture, storage, and
replay technology. Digital
conversion places less emphasis on obtaining
a faithful reproduction of the original in favor of finding the best representation
of the original in digital form. By developing mechanisms and techniques
for judging the quality of digital audio reproductions, the NGSW will make
it possible to capture and preserve as much intellectual and aural content
as is technically possible and then make that content available to listeners
in ways that are most appropriate to their needs.
Setting standards for digitization of
speech files is an essential component of preservation (see below). We
will adhere to sampling frequency and resolution standards (16 kHz/16
bit) that faithfully preserve acoustic content and endeavor to develop
digital enhancement and filtering techniques that improve perceptual
measures while minimally affecting authenticity. Another important task
in the creation of a massive acoustic database is the specification of lossless
compression routines that make efficient use of available channel bandwidth.
Indexing and careful authentication procedures to make sure files are
not altered intentionally or accidentally (Lynch 1995, Wiebel 1995) will
be important steps toward ensuring the physical integrity of the digital
audio files. Developing metadata interchange standards for audio files,
including tools and techniques that will allow structured, documented,
and standardized information about data files and databases to be shared
across platforms, systems, and international boundaries are all central
objectives of the NGSW. Finally, these steps will assure the widest possible
access to these historically significant resources. The NGSW will develop
access systems that take advantage of the multiple interfaces possible
on the WWW. The collaborators will also encourage vendors to provide open
system architectures for audio digitization and encourage backward compatibility
in new system designs.
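One common way to check that a digital file has not been altered,
intentionally or accidentally, is to record a cryptographic digest at
ingest and recompute it later. The Python sketch below illustrates the
idea with SHA-256 from the standard library. The choice of SHA-256 and
the function names are assumptions made for illustration; the proposal
does not specify a particular authentication procedure.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of an in-memory byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def file_digest(path, chunk_size=1 << 20):
    """Digest a file in fixed-size chunks so long recordings need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_unaltered(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest against the one recorded at ingest."""
    return digest(data) == recorded_digest


# Stand-in bytes for a few 16-bit audio samples (illustrative only).
original = b"\x00\x01" * 8
recorded = digest(original)

print(is_unaltered(original, recorded))            # same bytes as at ingest
print(is_unaltered(original + b"\x00", recorded))  # one extra byte appended
```

Storing the recorded digest alongside the bibliographic metadata would
let any site holding a copy verify it independently.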
Digitization and Search Engine: The
digitization efforts proposed for the NGSW will involve several of the
nation's largest physical repositories of historically-significant spoken
word collections which, because of physical problems of preservation and
access, have previously been accessible to only a small number of individuals.
Digitizing these speeches, oral histories, and newscasts will make these
voices available to a wide public audience for the first time. At the
same time, this effort will preserve these recordings for future generations.
Many of these collections are currently decaying and in danger of being lost
altogether. MSU's VVL and Museum, and the CHS house some of the nation's
most significant speech holdings. By digitizing approximately 18,000
hours of these important collections, this project will serve as a model
for preservation activity at other universities, at historical societies,
and for private collections.
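To give a sense of scale, the short Python calculation below estimates
the uncompressed storage needed for roughly 18,000 hours of audio at the
16 kHz/16 bit standard mentioned earlier. It assumes single-channel
(mono) linear PCM with no compression and no container overhead; those
assumptions are ours and are not stated in the proposal.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz sampling frequency
BYTES_PER_SAMPLE = 2      # 16-bit resolution
CHANNELS = 1              # assumption: mono recordings
HOURS = 18_000            # approximate size of the planned digitization


def pcm_bytes(hours, rate=SAMPLE_RATE_HZ, width=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Bytes of uncompressed linear PCM for the given duration in hours."""
    return hours * 3600 * rate * width * channels


per_hour = pcm_bytes(1)
total = pcm_bytes(HOURS)
print(f"per hour: {per_hour / 1e6:.1f} MB")
print(f"total:    {total / 1e12:.2f} TB")
```

Under these assumptions one hour occupies about 115 MB and the full
collection about 2 TB, which is why lossless compression and efficient
storage figure in the planning.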
Currently several speech digitization
standards are in use on the Internet, yet no archival standards have been
adopted for speech files. In order to ensure the widest possible access
to these materials, development of such standards is key. MSU will work
with the LDC to develop speech digitization standards and preserve these
holdings. Because the LDC is an open consortium of universities, companies
and government research laboratories which creates, collects and distributes
speech and text databases, lexicons and other resources for research
and development purposes, its guidance and partnership will contribute
an important research and knowledge base to this endeavor. Based on H-Net's
successful collaborative model of decentralized control, digitization
will be conducted at the individual collection sites in consultation with
the LDC and the MSU and Duke speech processing laboratories. This process
will assure that collection curators retain control over their individual
collections while ensuring uniformity of digitization standards and protocols.
Extensive consultation between engineers
at MSU's SPL, Duke University's RSPL, and the LDC will also address methods
used to compress speech data for efficient storage. In collaboration with
the individual collection sites, the project partners will preserve and
digitize the recordings, and construct search and retrieval mechanisms.
Balancing preservation concerns with the ability to provide quick access
to acoustic features for search will be a central aim of this project.
Each of the project partners will strive
to preserve the original quality of recordings while also allowing the
materials to be efficiently searched. Algorithms will be developed to
enhance noisy and degraded recordings. This will improve listening quality,
intelligibility and authenticity of the recordings, while also allowing
each user of the database to adjust the enhancement to suit his or her
particular needs. Ultimately, this project will produce web-based equalization,
noise reduction, and enhancement software that can be used by the researcher,
educator, student or general public to adjust the acoustics to optimize
important perceptual features.
Search Algorithms and Metadata: This
project will result in construction of a search engine, or set of search
engines, to find speech citations in response to four classes of keyboard
inquiries:
* Inquiry Type 1: Speaker and subject
are known (and entered by the user in response to initial interrogations).
* Inquiry Type 2: Speaker and approximate
wording are known (entered). The intention of the search is to locate
the precise wording of a particular quotation.
* Inquiry Type 3: Identify speaker
of given (or approximately given) quotation (entered).
* Inquiry Type 4: Find all speeches
on a given topic (topic entered).
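The four inquiry classes above can be told apart by which fields the
user supplies. The Python sketch below shows one possible way to route a
query on that basis. The Inquiry structure, the enum names, and the rule
that wording takes precedence over subject when both accompany a speaker
are hypothetical choices made for illustration, not part of the planned
search engine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InquiryType(Enum):
    SPEAKER_AND_SUBJECT = 1   # Type 1: speaker and subject known
    SPEAKER_AND_WORDING = 2   # Type 2: speaker and approximate wording known
    IDENTIFY_SPEAKER = 3      # Type 3: quotation given, speaker unknown
    TOPIC_ONLY = 4            # Type 4: all speeches on a topic


@dataclass
class Inquiry:
    """Fields a user might enter at the keyboard (hypothetical form)."""
    speaker: Optional[str] = None
    subject: Optional[str] = None
    wording: Optional[str] = None


def classify(inquiry: Inquiry) -> InquiryType:
    """Map the supplied fields onto one of the four inquiry classes."""
    if inquiry.speaker and inquiry.wording:
        return InquiryType.SPEAKER_AND_WORDING
    if inquiry.speaker and inquiry.subject:
        return InquiryType.SPEAKER_AND_SUBJECT
    if inquiry.wording:
        return InquiryType.IDENTIFY_SPEAKER
    if inquiry.subject:
        return InquiryType.TOPIC_ONLY
    raise ValueError("an inquiry needs at least a subject or some wording")


print(classify(Inquiry(speaker="Roosevelt", subject="conservation")))
print(classify(Inquiry(wording="ask not what your country")))
```

Each class could then be dispatched to a different search strategy, for
example bibliographic lookup for Type 1 and acoustic keyword spotting
for Types 2 and 3.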
The SPL at MSU and RSPL at Duke will work
collaboratively on this problem and will incorporate results developed
by LDC for the editing process. The basis for the search engine(s) will
be an efficient search algorithm for topic identification developed by
Hansen at the RSPL. The topic-spotting system developed by the RSPL is
based on context-dependent, continuous density hidden Markov models (HMMs)
(Pellom and Hansen 1997, 1998). The user specifies a set of text-based
keywords for a topic search. The spotter automatically extracts the phonetic
pronunciation for each keyword from a 120,000 word phonetic dictionary
developed at Carnegie Mellon University. If the keyword does not exist
in the dictionary, a set of letter-to-sound rules is used to approximate
the phonetic transcription. Keywords are then modeled using quasi-triphonic
HMMs. Non-keyword speech is modeled using context-independent HMMs. For
pre-recorded data, the topic spotter can process data approximately
6 times faster than real time (24 hours of audio in 4 hours of computing).
It can handle arbitrarily
long files and has been used to scan as much as 24 hours of data (from
CNN Headline News) at a time.
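The keyword-to-phones step described above (dictionary lookup first,
letter-to-sound rules as a fallback) can be sketched in a few lines of
Python. The two-entry dictionary and the one-phone-per-letter fallback
table below are toy stand-ins invented for illustration; they are far
cruder than the Carnegie Mellon dictionary or the rule set used by the
RSPL system, and the function name is hypothetical.

```python
# Toy pronunciation dictionary using ARPAbet-style phone symbols.
TOY_DICTIONARY = {
    "court": ["K", "AO", "R", "T"],
    "speech": ["S", "P", "IY", "CH"],
}

# Deliberately naive fallback: a fixed phone sequence for each letter.
TOY_LETTER_TO_SOUND = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"],
    "f": ["F"], "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"],
    "k": ["K"], "l": ["L"], "m": ["M"], "n": ["N"], "o": ["AA"],
    "p": ["P"], "q": ["K"], "r": ["R"], "s": ["S"], "t": ["T"],
    "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"], "y": ["Y"],
    "z": ["Z"],
}


def pronounce(keyword):
    """Return (phones, source) for a keyword, preferring the dictionary."""
    word = keyword.lower()
    if word in TOY_DICTIONARY:
        return TOY_DICTIONARY[word], "dictionary"
    phones = []
    for letter in word:
        # Characters with no rule (digits, hyphens) are simply skipped.
        phones.extend(TOY_LETTER_TO_SOUND.get(letter, []))
    return phones, "letter-to-sound"


for kw in ["Court", "oyez"]:
    phones, source = pronounce(kw)
    print(f"{kw}: {' '.join(phones)} ({source})")
```

The resulting phone sequence is what a spotter would use to assemble a
keyword model from its phone-level HMMs.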
The digitized data will contain a wide
range of recording conditions, microphone variation, background/telephone/channel
distortions, and distortions due to age and condition of the analog media.
The RSPL topic search engine has processing phases to address a range
of these issues. While scanning an input audio stream, the keyword spotter
can classify the incoming data as (male/female), (music/noise), and (high
quality/telephone quality). The (music/noise) model was trained from material
captured from NPR radio broadcasts. In addition, for tasks requiring
speaker identification, the group at RSPL has recently developed a speaker
identification method based on non-uniform feature sampling which achieves
the same performance (99.3%) at a rate 23 times faster (Pellom and Hansen
1997) than a standard Gaussian mixture model approach by MIT Lincoln
Lab (Reynolds and Rose 1995).
Starting with Hansen's search engine,
a major challenge will be to develop methods for adaptation and enhancement
of the existing acoustic and language models to perform satisfactorily
on various partitions of the VVL and other databases. Much work will need
to go into this partitioning in conjunction with the methods used for
search. This work will begin immediately upon the availability of a small
database of digitized speech in the early stages of the sampling process,
and will continue for most of the project duration to incorporate new
data, results, archive structures, and methods for adaptation as these
result from cognate research. The other main "search" issue is how to
incorporate the user-supplied information (speaker, dates, etc.) into
the search and, more importantly, to find fast, hierarchical search methods
to permit as much searching in near-real-time as possible. Some current
work in Deller's lab on fast HMM evaluation could be very useful for speeding
up the search for appropriate utterances (Deller and Snider 1993, Lee
and Deller 1998). In addition, it will be necessary to maintain a dictionary
of available speakers for search, and to allow for fast re-training methods
for new speech data as they become available (Arslan and Hansen 1998,
1999).
Incorporating metatags and other identifiers
will play a role in making search possible across an expanding collection
of sound resources spanning a broad range of original recording type,
age, and subject matter (Heery 1996). Several levels of metadata will
also be used to help users identify speeches based on content, or audio-oriented
data which will provide information about the sound itself. Through a
system of relational databases which will balance the bibliographic and
audio-oriented expertise of the partners, the NGSW will develop and apply
the first system of audio-oriented metadata while optimizing the effectiveness
of speech search tools (De Rose 1995). Searching mechanisms for audio
files must provide access based not only on author, title, edition, imprint,
subjects, etc., but also on characteristics of the sound itself. The ability
to search large bodies of audio information for such ephemeral qualities
as the speakers' accent could reveal relationships as yet unnoticed or
unremarked, and possibly open up new areas of scholarship. In addition,
markup that identified relevant content and structure would facilitate
such a discovery process.
Although there is no clear agreement
in the library and archival communities about how audio files should be
encoded, SGML is the standard coding system for text and offers many benefits
for sound files. SGML imposes no fixed set of component types, and is
a public, non-proprietary standard, to which software vendors conform.
Such encoding will require some extra effort, but careful selection of
syntax and conventions will make the encoding task manageable (Herwijnen
1986). In addition to SGML tagging, traditional library catalog records
and metadata will be used to ensure a workable access system for users
of the NGSW from the beginning of the project. MSU Library staff will
create MARC bibliographic records that will be accessible in the
Library's online catalog and on OCLC, an international bibliographic database.
These records will be converted using one of several existing programs
from MARC to Dublin Core metadata records that will provide bibliographic
access via the WWW. Both MARC and Dublin Core are international standards
that assure consistent bibliographic record structures for effective retrieval
(USMARC 1994; OCLC 1997). Access points on both MARC and Dublin
Core records will include authorized forms of speakers and keyword access
to key subject concepts. Name authority standards will be applied to assure
that consistent forms of speakers' names are used as access points. These
bibliographic records will allow the user to define or narrow a search
to a particular topic or speaker through a textual approach.
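A MARC-to-Dublin Core conversion is essentially a field crosswalk. The
Python sketch below maps a handful of common MARC tags to Dublin Core
elements to show the shape of such a conversion. It is a simplified
illustration only: the mapping table covers just five fields, the
record is an invented example represented as (tag, value) pairs, and
none of this reproduces the existing conversion programs the project
would actually use.

```python
# Simplified crosswalk from MARC tags to Dublin Core element names.
MARC_TO_DC = {
    "100": "creator",       # main entry, personal name
    "245": "title",         # title statement
    "260$c": "date",        # date of publication
    "520": "description",   # summary note
    "650": "subject",       # topical subject heading
}


def marc_to_dublin_core(record):
    """Convert a list of (tag, value) pairs into a dict of Dublin Core elements.

    Repeatable fields such as subjects accumulate in a list; tags with no
    entry in the crosswalk are dropped.
    """
    dc = {}
    for tag, value in record:
        element = MARC_TO_DC.get(tag)
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


# Invented record for illustration; not taken from any catalog.
sample = [
    ("100", "Roosevelt, Theodore, 1858-1919"),
    ("245", "Campaign speech"),
    ("300", "1 sound cylinder"),          # physical description: not mapped here
    ("650", "Speeches, addresses, etc."),
    ("650", "Political campaigns"),
]

for element, values in marc_to_dublin_core(sample).items():
    print(f"dc:{element} = {values}")
```

Applying name authority control before conversion means the creator
value is already in its authorized form when it reaches the Dublin Core
record.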
The LDC currently has automated procedures
to facilitate the task of developing the audio-oriented metadata, while
the SPL and RSPL laboratories at MSU and Duke University will work collaboratively
to provide quality assessment (Deller, Proakis, and Hansen 1993) and
user-friendly interfaces for adjusting perceptual quality. Use of a standard
relational database management system will facilitate this coordination
of efforts. MSU will operate the servers and the database management system,
and perform standard operational tasks including tape backups. Through
careful design of metadata and searching techniques, the NGSW will begin
to answer some of the questions most challenging to electrical and computer
engineering as well as library and information science. Our paramount
concern in this project is making this digital information available to
future generations. Metadata, digitization standards, and carefully designed
search systems will help ensure the longevity and data quality of these
digital documents (Rothenberg 1997; Comm. on Pres. and Access 1996).
Enhancement, Restoration and Robust Search
Mechanisms: Development of algorithms for enhancement of noisy
and degraded recordings will also be a central objective of this project.
This will improve the listening quality, intelligibility, and authenticity
of the recordings. Yet, these features are not necessarily improved concurrently.
Tradeoffs will exist in any attempt to improve one of these perceptual
features. Each user of the database may wish to adjust the enhancement
to suit a particular need. A historian or archivist may view particular
types of background noise as important context for a given subject matter,
while a linguist or engineer may be more interested in a specific speech
or recording feature. One objective of this research is to develop web-based
equalization, noise-reduction, and enhancement software that can be used
by the researcher, student or general inquirer, to adjust the acoustics
to optimize important perceptual features. This will further facilitate
use of the NGSW materials to suit a range of multi-disciplinary purposes.
The MSU and Duke speech processing laboratories
will work collaboratively on this issue. Hansen's group at Duke has been
responsible for formulating a number of effective speech enhancement algorithms
based on constrained iterative spectral constraints (Auto-LSP) (Hansen
and Clements 1991), auditory constrained speech enhancement algorithms
(ACE-I [Nandkumar and Hansen 1995], ACE-II [Hansen and Nandkumar 1995]),
and morphological based constraints (MCE) (Hansen 1994). In addition,
many of the traditional and more recent speech enhancement algorithms
developed by other researchers are also available for system integration.
One novel approach is a text-based speech enhancement method where knowledge
of the phone sequence is used to formulate a dependent enhancement method
for the requested audio stream (Hansen and Pellom 1997). Research will
be needed to formulate acoustic background classifiers that prescribe
which enhancement method would be most effective for a given distortion.
Pre-defined enhancement configurations will be made available for users
wishing to select a preferred enhancement method, as well as suggestions
for settings that yield the most "authentic" sounding reproduction with
some optimal degree of noise suppression.
Some novel research that combines Hansen's
expertise on quality assessment (Hansen and Arslan 1995, Hansen and Nandkumar
1995), with Deller's work on set-membership identification (Deller 1996)
will be explored to develop a user-friendly interface for adjusting perceptual
quality. Work on this issue will begin in Year 2, following collection
of a suitable database.
Compression, Encryption, and Copyright
Protection: All academic communities inherently contain differing
and often conflicting perspectives on intellectual property issues. As
producers of intellectual property, university presses and faculty are
concerned with preserving copyright protection; as consumers of intellectual
property, university libraries and faculty are more concerned with issues
of "fair use"; while instructional design groups are both producers and
consumers. These conflicting perspectives lie at the heart of H-Net, which
is constitutionally committed to open access at the same time it emerges
as one of the largest humanities publishers in the digital age. Public
discussion helps to develop national policies on intellectual property
rights that will be in the best interests of higher education. Copyright
issues for recording of the spoken word are especially important and largely
unresolved because there is a lack of litigation and case law in this
area. One of the research projects in this grant is to examine the issues
and to develop guidelines for this and other voice collections. In this
work, we start from the position that: a) 17 USC 108(f)(3) exempts news
broadcasts; b) 17 USC 107 allows the NGSW to make fair use of broadcast
segments, depending on the amount and substantiality of the segment;
c) For broadcasts since 1978, permission from broadcast networks should
be sought in a good faith effort to secure their support; d) For voices
since 1923, permission from the speakers should be sought in a good faith
effort to secure their support. The exception is federal employees speaking
on government business: these speeches are presumed to have the status
of government documents.
Recent research on encryption coding
of speech and images (Kuo, Deller and Jain 1996) conducted in Deller's
SPL at MSU could be employed to create novel and highly secure "watermarks"
for the speech files. The transform encryption coding technique
can potentially be used to create both highly-compressed and highly-secure
signal transmissions with virtually indestructible watermarks. Much research
has been conducted on how to encode digital images with a small "signature"
that is not perceived by the eye because of the natural masking properties
of the human visual system, but will protect commercial interests and
intellectual property (IEEE Int. Conf. ASSP, 1998). This project will
develop a similar system for audio files. The differences between the
speech and image files are significant because of the differences in the
physical properties of the sensory stimuli (dynamic auditory signals vs.
static images), in their digital representations (bandwidth, signal dimension)
and in the human perception of these stimuli. The "masking" effects and
construction of digital techniques to exploit these effects pose new challenges.
Developing such a system will make it easier to obtain distribution permissions
and new resources from other parties. Work on this problem will begin
as soon as a minimal database is available.
User Interfaces for Blind Persons:
At approximately the midterm of the project, following development of
a prototype user interface for the evolving NGSW, we will begin work on
the development of web-based interfaces for the blind. This work will
consist principally of adapting existing prototype interfaces to operate
in a sound-only mode, assuming the availability of certain audio and speech-recognition
capabilities at the user's input terminal. The interfaces will be designed
to support a flexible array of commercially available, state-of-the-art
audio interface devices. InvoTek, Inc., which has extensive experience
in the development of augmentative and assistive devices, will be the
key developer of these interfaces, in consultation with the speech groups
at MSU and Duke.
Gallery Collections: In addition to
allowing users access to the full repository of sound files, the NGSW
will be composed of collections that span a broad range of topics and
interests. Exhibits will be designed with accompanying text and
graphics. Connected through a set of relational databases, this system will
facilitate use of the collections in classrooms as well as for a broad
range of resource purposes. The proposed collections will include over
60,000 bibliographic records. Drawing from the rich collections of the
CHS, MSU's VVL, MSU Museum, and Northwestern University, sample collections
within the NGSW would include:
News and Newsmakers: Drawing primarily
on the holdings of MSU's VVL, this will include selections of speeches
by Teddy Roosevelt, Eugene V. Debs and Buffalo Bill Cody, as well as
news broadcasts and special events from 1940 through the 1980s which
are currently housed as part of the Historical Voices and Janak collections
at MSU. Watergate/Vietnam, including a wide variety of perspectives
on Vietnam and Watergate, from presidential speeches to newscasts, is
another strength of this collection.
20th Century Inventors and Scientists:
From Thomas Edison's first cylinder recordings to John Glenn talking
about exploring space, this collection will include recordings that
are historically significant both because of their content and
speaker, as well as the technical achievements discussed. These holdings are
currently located in the VVL.
American Life: Using the oral interviews
on which Studs Terkel based his books, and which are owned by the CHS,
this collection will showcase a broad range of American experience and
stories that span social, political and cultural life in the 20th century.
Chicago Neighborhoods: Owned by
the CHS, this collection includes family genealogies and oral histories
conducted in several Chicago neighborhoods by local high school students.
These recordings provide a detailed account of urban life, and
offer a full range of neighborhood accents for linguistic study.
Folklife and Lore: This collection
is composed of taped interviews with a variety of American folk artists.
Recorded stories of Native-American quilters and Mexican-American folk
artists from across the Midwest are a special strength of the collection.
These holdings are currently housed at the MSU Museum.
History and Politics Out Loud:
Voices of U.S. presidents, secretaries of state and other government
officials make up the vast majority of this collection, which is housed
at Northwestern University.
Supreme Court Decisions: U.S. justices
and a range of court cases can be heard in these recordings, providing
a far greater range of experience to listeners than reading the transcripts
alone. This collection is also provided by Northwestern University.
World War II: Including a selection
of broadcast news from the Ripps collection at the VVL, this collection
includes broadcast news recorded from 1940 to 1945, from Pearl Harbor
to the dropping of the atomic bombs.
The range and scope of recorded materials
included in this collection will make the NGSW a central resource for
a broad range of social sciences. The NGSW will also become part of the
educational infrastructure as a place where teachers at all levels can
go for reliable aural-learning resources.
Educational Resources and Tools: The
NGSW represents an opportunity to think about education at both K-12 and
post-secondary levels in new and exciting ways that make full use of the
new information technologies, while maintaining the highest pedagogical
and scholarly standards. This project will continue efforts that will
break down communication barriers between teachers and scholars, while
facilitating development of classroom tools which empower instructors,
provide training, support and models of successful new teaching techniques,
and forge new links between scholarly societies, museums and educational
institutions. Additionally, the NGSW represents an opportunity to focus
on children who reside in poor and minority school districts by developing
and implementing curricula that promote multiple approaches to learning.
Aural resources can be used to challenge students and develop skills,
while drawing on students' background and interests.
A variety of web interfaces will be tailored
for use by students in a range of grade levels, teachers, and college
faculty. Not only will users be able to search, select and listen to sections
and subsections of sound files, but they will also be able to generate
webpages that will contain the information required for the classroom.
Following the "shopping basket" online model of commercial operations
like Amazon.com, the WWW interface will allow teachers to collect materials
and use them to create a webpage. The page itself will either reside on
the project server at MSU, or be downloadable to a server or desktop for
further editing and incorporation into the user's existing Web resources.
The sound files themselves will be served from NGSW's machines. Using
relational databases to provide narrative context, video clips, and graphics
will help teachers to easily construct multimedia online lesson plans,
or students to construct multimedia classroom projects. Teachers and students
can also choose to keep their sites private, or to include them in what
will become a growing public gallery of lesson plans and educational tools
created by NGSW personnel. Using the H-Net model, the galleries will be
"curated" by scholars and teachers to ensure quality.
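The "shopping basket" workflow described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the project's interface: the names Clip, Basket, add and to_html are hypothetical, and the rendered page is deliberately minimal. It shows a teacher collecting clips while browsing and then generating a standalone page that links back to sound files served from the repository.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Clip:
    """One selected sound clip (all field names are hypothetical)."""
    title: str
    url: str          # address the sound file would be served from
    note: str = ""    # narrative context supplied by the teacher


@dataclass
class Basket:
    """A 'shopping basket' of clips collected while browsing."""
    clips: list = field(default_factory=list)

    def add(self, clip: Clip) -> None:
        # Ignore a clip whose URL is already in the basket.
        if all(c.url != clip.url for c in self.clips):
            self.clips.append(clip)

    def to_html(self, page_title: str) -> str:
        """Render the basket as a small standalone lesson page."""
        items = "\n".join(
            f'  <li><a href="{escape(c.url)}">{escape(c.title)}</a>'
            f" {escape(c.note)}</li>"
            for c in self.clips
        )
        return (
            f"<html><head><title>{escape(page_title)}</title></head>\n"
            f"<body><h1>{escape(page_title)}</h1>\n<ul>\n{items}\n</ul>\n"
            "</body></html>"
        )
```

Because the generated page is plain HTML containing only links, it could either stay on a project server or be downloaded and edited further, while the sound files themselves continue to be served from the repository.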
Users will be able to search for particular
sound clips. An advisory board with representatives from scholarly societies
and a range of universities and K-12 schools will determine criteria,
such as discipline, speaker, time period, and other access points to
index the search engine. Each display will contain detailed information
about the file, such as length, language, whether an accompanying video
clip is available, and so on. Once the user finishes collecting material,
she can move to a "collating page" which will permit the organization
of links to prepare a classroom presentation or a "reserve readings" page
where students can later link for study and review. Programming would
combine server-side CGI scripts and client-side Java to reduce server
load. Users can draw from the entire repository or specific collections.
For teachers who have no multimedia feed into their classrooms, this method
of WWW delivery will make material collection particularly easy and much
less time-consuming than trips to various libraries. NGSW will provide
instructions for copying sound files to cassette tapes. Slides for classroom
use can be created from online images. Texts can, of course, be printed
out along with accompanying outline maps and either distributed in paper
form or printed on transparency film.
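As a rough illustration of searching by access points such as speaker, discipline and time period, the following Python sketch filters a small in-memory catalog and formats the per-file details mentioned above (length, language, availability of video). The Record fields, the search and describe functions, and any sample values used with them are assumptions made for this example; the actual criteria are to be set by the advisory board.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Catalog entry for one sound file (field names are hypothetical)."""
    title: str
    speaker: str
    discipline: str
    year: int
    seconds: int
    language: str = "English"
    has_video: bool = False


def search(records, *, speaker=None, discipline=None, years=None):
    """Return records matching every criterion given; None means 'any'."""
    hits = []
    for r in records:
        if speaker is not None and speaker.lower() not in r.speaker.lower():
            continue
        if discipline is not None and r.discipline.lower() != discipline.lower():
            continue
        if years is not None and not (years[0] <= r.year <= years[1]):
            continue
        hits.append(r)
    return hits


def describe(r):
    """One display line: title, speaker, year, length, language, video flag."""
    minutes, secs = divmod(r.seconds, 60)
    video = "video available" if r.has_video else "audio only"
    return (f"{r.title} ({r.speaker}, {r.year}); "
            f"{minutes}:{secs:02d}; {r.language}; {video}")
```

A call such as search(catalog, discipline="History", years=(1940, 1945)) would return only the records satisfying both criteria, and passing a single collection as the records argument restricts the search to that collection.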
Exhibits and model lesson plans will
also be provided. Each will be highly interactive and will prompt written
student responses, as well as providing invitations to explore different
levels and pathways within the NGSW website. Student responses could
be posted publicly, providing an important opportunity for exchange between
students around the globe. Students will have the option of responding
either to the exhibit itself, or comments by other students.
Educational Testbeds: Testing these
applications and tools in a range of educational institutions, as well
as incorporating teacher-designed materials as a central part of this
site, is fundamental to ensuring wide use of the NGSW. Focusing on schools
in districts that serve economically disadvantaged and minority children
also provides an important opportunity to use new ways of thinking about
teaching and learning to address the high dropout rates, sporadic attendance
and poor academic performance confronting these areas. The NGSW project
will also effectively respond to calls for students to be educated to
meet the challenges of a rapidly changing technological society.
This project will harness the intense
support of teachers who are committed to multiple approaches to learning
and the integration of educational technology into their curricula and
instructional strategies. Over a five-year period, the NGSW will be integrated
into multiple classrooms within the collaborating districts. We see four
potential models for classroom integration of the materials in the NGSW.
First, and in the simplest case, the teacher can use the collections to
bring sound material into a traditional lesson plan. Second, the teacher
can invite students to explore one or more of the collections using a
set of thematically appropriate criteria in a guided version of active
learning. Third, students can initiate their own exploration of the materials,
especially in advanced classes. Fourth, the more controversial issues
can be framed as debates in which students can see contrasting points
of view and work out their own solutions to the issues.
Programmers at the NGSW will create public
interfaces tailored to all four models. The creative use of interactive
database programming will reduce the opportunity cost for busy teachers
who wish to utilize the NGSW's resources in the classroom. The interface
will allow teachers to construct and download classroom presentations;
a student version will help students create their own projects, which
will either be stored on the NGSW's servers or downloaded to a school
or personal computer. In addition, we will provide technical support.
Through a series of summer workshops
and training sessions, teachers in these districts will receive professional
development credit and training to help them integrate these materials
into their classrooms. During these workshops, teachers will build web-based
resources to incorporate materials from the NGSW into their lesson plans,
and will be partnered with teachers in different school districts who
teach the same grade levels. In addition to receiving year-round technical
support from MSU's College of Education and staff, these teachers will
also communicate with each other as they work to implement the tools in
their classrooms. Follow-up sessions throughout the academic year, and
site visits by MSU College of Education staff, will also facilitate the
success of these projects.
Similar programs will be implemented
for college faculty and instructors across a range of disciplines. Through
summer workshops, faculty will learn to construct syllabi and tools that
draw on the aural resources in the collection for use both inside and
outside their classrooms. These lesson plans will also become available
as models for public use. Faculty will be recruited through a variety
of means, including the extensive, international network of over 90,000
H-Net subscribers.
Implementation
First year:
Create prototype interfaces and digitize at least 1000 items to populate
the NGSW. The educational interface, which will allow educators to build
lesson plans, is installed.
Second year: The first teacher
groups will meet in the summer to work on curriculum preparation. First
use of NGSW in K-12 setting. NGSW is used in college teaching. Search
engine is tested. Twenty-five percent of digitization and cataloging
of originally targeted collections is complete. Construction of exhibits
begins.
Third year: Additional grades
and classrooms begin to use the NGSW. Search engine testing and training
of the system against a wide range of speech-types continues. Outreach
through H-Net recruits other college faculty to use the NGSW in teaching.
Fifty percent of the digitization and cataloging is completed. Construction
of exhibits continues.
Fourth year: Additional grades
and classrooms use the NGSW. Search engine is made available for limited
search types to get feedback from real users. Seventy-five percent of
the digitization and cataloging is completed. Construction of exhibits
and solicitation of feedback on utility of site continues.
Fifth year: NGSW is institutionalized
at testbed schools. The search engine is completed and installed. One
hundred percent of digitization and cataloging of original collections
is completed. Work begins on new collections and outreach continues to
add oral history and other interview material.
Broad Implications for Research and Generalizability
The real strength of the NGSW project
lies in the scope of its multi-disciplinary and collaborative partners.
The range of institutions, organizations and individuals involved will
ensure that the NGSW will be widely used and easily expandable to meet
a range of research and pedagogical needs. The material archived and made
accessible in the NGSW will be used for purposes that go well beyond pedagogy.
Research in history and the social sciences is gradually moving to the
Web. For example, H-Net publishes the largest and most timely collection
of book reviews in the world. The NGSW will provide primary materials
currently inaccessible to most scholars. Currently, only a few scholarly
projects have emphasized sound: the most notable of these is "May it
Please the Court," which provided cassette tapes of Supreme Court hearings
as well as analysis of these materials. As online multimedia journals
begin to build interactive resources and so change the nature of scholarly
discourse, historians will be able to include clips from the NGSW as evidence
for their arguments. Cultural historians informed by anthropological research
have focussed on the semiotics of speech, the ways in which speakers signal
their audiences with subtle verbal cues. Writing about this is less effective
than showing it, and the discourse among historians as these materials
are examined will inevitably reveal aspects of the discourse previously
hidden. With the use of the advanced search capability, scholars will
be able to scan large bodies of untranscribed sound files. Since the
reduction of speech to a transcript is an expensive process, the need
to create a transcript is currently a significant barrier to accessibility.
So, for example, large bodies of primary oral history material, such as
the Columbia University Oral History Project, could become far more useful
to scholars with the successful development of the NGSW's unique and innovative
search capability. This project will enhance the techniques used in speech
technologies through creation of novel search, compression, and watermarking
techniques. Development of enhancement software will break important ground,
while facilitating use of this resource to suit a variety of research
needs. Making speech resources widely accessible to scholars, teachers,
students, and a wide public audience, remains the guiding aim behind this
endeavor. Through development of standardized digitization processes at
a number of collection sites, this project will further ensure that the
NGSW becomes a network of information that will be able to expand indefinitely
to interface with a wide variety of other campus networks and tools.
Not only will scholars benefit from this,
but the creators of the NGSW believe that this resource will be of substantial
benefit to policy makers. In particular, one of the most significant problems
in policy decisions is a lack of awareness of previous arguments on the
same subject. Through the search capability, research assistants for policy
makers will quickly be able to provide historical background material
which can help shape current arguments and decision-making processes.
Historical sound clips can be downloaded and used in the preparation
of briefing documents, which can be delivered online or offline. Although
Washington has had the funding available for research of this kind for
some time, the NGSW will make historical materials readily available
for state, county and local governments, and for NGOs. In current debates
over such issues as the environment and welfare, understanding the historical
context can lead to better policy decisions and the formulation of more
persuasive arguments. By bringing sound clips into the public sphere
as public-domain resources, the NGSW will help level the playing field
in current political debates at every level. As term limits become more
and more common, the common historical memory of policy makers disappears
more rapidly. The NGSW can help mitigate this effect by providing a way
to preserve the oral lore of governing communities.
One of the great dangers of the current
state of the Internet is that many people are beginning to exploit primary
materials for commercial purposes. This could have a chilling effect on
multimedia teaching and inhibit the development of innovative research
techniques. More important, it can stifle the free-wheeling creativity
and independence of spirit which have been the hallmarks of the Internet
culture. We have no way of predicting what use the broad general public
will wish to make of these materials, but we have confidence that the
Library of Congress, through its American Memory project, has made the
right decision in making its material freely available to the public.
Restricting access to the materials that make up America's collective
cultural memory can only smother an active engagement with the American
past.
Evaluation
The design and implementation process
of the NGSW will be guided by a number of integrated evaluation methods.
An interactive, online evaluation process will be implemented beginning
at the first phase of project development. Use of the Internet and website
to provide and solicit feedback from users and collaborators, user tracking
and questionnaires will all be integral parts of the NGSW website. As
this site is intended to support scholarship, school learning, and visits
by curious web surfers, automated measures that identify users, their
navigational choices and their reactions are crucial (Cohen, Tsai, Chechile,
1995; Dede, 1985). Audience profile, concerns, and feedback on usability
will be central to determining who is using the site, how easy it is to
navigate for specific research and educational purposes, as well as ensuring
that this information is accessible to the widest possible public audience.
In particular, we are interested in what keeps a user interested in the
selection they have chosen, especially when the user is a student or a
curious surfer. To make this assessment process manageable, users will
be selected at random and asked to participate in an NGSW interview. Early
efforts will be aimed at measuring the reliability of these data, with
follow-up interviews for some users in order to determine the veracity
of their responses.
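One simple way to draw the random interview sample described above is sketched below in Python. The function name select_for_interview, the fraction parameter and the optional seed are assumptions introduced for illustration; the proposal does not specify a sampling rate or procedure.

```python
import random


def select_for_interview(user_ids, fraction, seed=None):
    """Pick a simple random sample of distinct users to invite.

    fraction is the share of distinct users to invite (0 < fraction <= 1).
    A seed makes the draw repeatable, which helps when auditing the sample.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    unique = sorted(set(user_ids))  # each user can be picked at most once
    if not unique:
        return []
    k = max(1, round(len(unique) * fraction))  # invite at least one user
    return random.Random(seed).sample(unique, k)
```

Sampling without replacement from the set of distinct users keeps frequent visitors from being over-represented, which matters if the interviews are meant to reflect the whole audience.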
Measuring the successes and limitations
of the NGSW in educational settings will be a significant focus of the
evaluation process. These efforts will focus primarily on the partnership
districts and on the teachers who participate in the workshops conducted
by the MSU College of Education. In particular, we are interested in 1)
the use and value of the audio as stand-alone media for education, and
2) how the audio can best be used with complementary visual media. We
anticipate differential patterns of use based on the available media and
the student aptitudes, and expect our research to help tune the NGSW to
assist the widest variety of students. Using an ATI model (Snow, 1989)
with an emphasis on assessing conceptual and complex understanding (Mayer
and Sims, 1994), learning will be modeled along with covariant measures
of aptitude and use. As the model is calibrated and seamless mechanisms
for online data collection become available, we anticipate the use of
the model as an online tool to support students with quick diagnosis
of poor learning strategies.
As the portions of the NGSW are adopted
for curricular use, groups of teachers will be selected from K-12 and
post-secondary schools to identify additional needs and further evaluate
project performance and usability. We expect this group to inform the
development team of key content and tools whose addition would extend
the use of the NGSW in schools, as well as ways to dovetail components
of the NGSW with traditional parts of the curriculum. Lastly, groups of
teachers will be used to collect data for the purpose of validating outcomes
from learning assessments.
Publication and dissemination of methods
and findings will be a key component to generate and provide scholarly
feedback on the research questions and tools. Published papers and reports
in academic journals and Internet discussion networks will facilitate
multi-disciplinary peer review as well as making an increasing number
of individuals aware of the NGSW.
Finally, as this is a true interdisciplinary
effort, it is important to chronicle and assess the collaborative process
(Kruger, Cohen, 1996). Several research groups at different sites are
playing key roles, with the simultaneous goals of producing research,
standards, and a remarkably useable product. Some kinds of collaborations
are likely to serve some goals, but not others. For what purposes are
face to face meetings essential? When is email appropriate? These and
related questions become crucial for developing the NGSW as well as similar
projects. To help address these questions, key personnel will be required
to keep an online diary of their ideas and reflections on the project
and its goals. We anticipate that these diaries will help keep the project
on track and offer an opportunity to understand the role different kinds
of collaborations play.
In summary, we expect the assessment
to offer valuable answers to the following questions:
* How do scholars, students and curious users
interact differentially with the resources in the NGSW?
* What interactions best serve individuals
in each of these groups?
* How can we best diagnose potentially unrewarding
visits to the NGSW and improve the chances for success?
* What kind of collaborative processes best
serve large scale research ventures such as the development of the NGSW?
* How can the resources of the NGSW best
serve a diverse set of students and teachers?
REFERENCES CITED
Arslan, L.M., J.H.L. Hansen. "Selective Training
in Hidden Markov Model Recognition," accepted for publication in IEEE
Trans. on Speech and Audio Processing. vol. 7, no. 1, January 1999.
Arslan, L.M., J.H.L. Hansen. "Likelihood
Decision Boundary Estimation between HMM Pairs in Speech Recognition."
IEEE Trans. Speech and Audio Processing. vol. 6, no. 4, pp. 410-414, July
1998.
Bearman, D. and K. Sochats. "Metadata requirements
for evidence," Archives and Museum Informatics, 1996.
Bender, W., D. Gruhl, N. Morimoto, A. Lu.
"Techniques for Data Hiding," IBM Systems Journal. vol. 35, no. 3-4,
pp. 313-336, 1996.
Brassil, J., L. O'Gorman. "Electronic Marking
and Identification Techniques to Discourage Document Copying," Info Hiding,
pp. 227-235, 1996.
Cohen, S., Tsai, F., and Chechile, R. A.
"A model for assessing student interaction with educational software."
Behavior Research Methods, Instruments, and Computers, vol. 27, n. 2,
pp. 251-256, 1995.
Conway, Paul. Preservation in the Digital
World (Washington: Commission on Preservation and Access, 1996).
Caronni, G., H.H. Brüggemann and W. Gerhardt-Häckl,
eds. "Assuring Ownership Rights for Digital Images" in Reliable IT Systems-VIS
'95. pp. 251-263 (Germany: Vieweg, 1995).
Cox, I.J. and Matt L. Miller. "Human Vision
and Electronic Imaging II," SPIE 3016, pp. 92-99, February 1997.
Cox, I.J., J. Kilian, T. Leighton, T. Shamoon.
"A Secure, Robust Watermark for Multimedia," Info Hiding, pp. 185-206,
1996.
Day, M.W. "Extending metadata for digital
preservation." Ariadne. No. 9, May 1997.
Day, M.W. "Preservation of electronic information:
a bibliography." (http://www.ukoln.ac.uk/~lismd/preservation.html), 1997.
Dede, C. Intelligent Computer Assisted Instruction:
A Review and Assessment of ICAI Research and Its Potential for Education.
(Educational Technology Center: Cambridge, MA, 1985).
Deller, J.R., Jr., "Application of OBE algorithms
to speech analysis, recognition, and coding," invited chapter in: M.
Milanese, J. Norton, H. Piet-Lahanier, and E. Walter, eds. Bounding Approaches
to System Identification (London: Plenum, 1996).
Deller, J.R., Jr., J.G. Proakis, and J.H.L.
Hansen. Discrete Time Processing of Speech Signals (New York: Macmillan,
1993).
Deller, J.R., Jr., R.K. Snider. "Reducing
redundant computation in HMM evaluation," IEEE Trans. on Audio and Speech
Processing. vol. 4, pp. 465-471, October 1993.
De Rose, S. "Structured Information: Navigation,
Access, and Control," presented at the Berkeley Finding Aid Conference,
[http://sunsite.berkeley.edu/FindingAids/EAD/derose.html], April 1995.
Foote, J., G. Jones, K. Jones, S. Young.
"Talker-Independent Keyword Spotting for Information Retrieval," Proc.
Eurospeech, vol. 3, pp. 2145-2149, 1995.
Hansen, J.H.L. "Morphological Constrained
Enhancement with Adaptive Cepstral Compensation (MCE-ACC) for Speech Recognition
in Noise and Lombard Effect," IEEE Trans. Speech and Audio Processing:
Special Issue on Robust Speech Recognition, vol. 2, n. 4, pp. 598-614,
October 1994.
Hansen, J.H.L., L. Arslan. "Robust Feature
Estimation and Objective Quality Assessment for Noisy Speech Recognition
using the Credit Card Corpus," IEEE Trans. Speech and Audio Processing.
vol. 3, n. 3, pp. 169-184, May 1995.
Hansen, J.H.L., M. Clements. "Constrained
Iterative Speech Enhancement with Application to Speech Recognition,"
IEEE Trans. on Signal Processing. vol. 39, n. 4, pp. 795-805, April 1991.
Hansen, J.H.L., S. Nandkumar. "Robust Estimation
of Speech in Noisy Backgrounds Based on Aspects of the Auditory Process,"
Journal of Acoustical Society of America. vol. 97, n. 6, pp. 3833-3849,
June 1995.
Hansen, J.H.L., S. Nandkumar. "Objective
Quality Assessment and the RPE-LTP Vocoder in Different Noise and Language
Conditions," Journal of the Acoustical Society of America. vol. 97, n.
1, pp. 609-627, January 1995.
Hansen, J.H.L., B. Pellom. "Text-Directed
Speech Enhancement using Phoneme Classification and Feature Map Constrained
Vector Quantization," Speech Communication. vol. 21, pp. 169-189, April
1997.
Hardy, I.T. "Internet Archives and Copyright,"
Documenting the Digital Age Conference, 1997.
Heery, R. "Review of metadata formats,"
(http://www.ukoln.ac.uk/metadata/review.html), 1996.
Herwijnen, Eric. Practical SGML. (Netherlands:
Kluwer Academic Publishing, 1990).
IEEE International Conference on Acoustics,
Speech, Signal Processing. Conference Proceedings, Seattle, May 1998.
International Organization for Standardization.
Information Processing -- Text and Office Information Systems -- Standard
Generalized Markup Language. ISO 8879: 1986 (E).
Koch, E., J. Zhao. "Towards Robust and Hidden
Image Copyright Labeling," Proceedings of 1995 IEEE Workshop on Nonlinear
Signal and Image Processing, pp. 452-455 (Neos Marmaras, Halkidiki, Greece,
June 20-22, 1995).
Kruger, L., Cohen, S., Marca, D., and Matthews,
L. "Using the Internet to extend training in team problem solving," Behavior
Research Methods, Instruments, and Computers. vol. 28, n. 2, pp. 248-252,
1996.
Kuo, C.J., J.R. Deller, Jr., and A.K. Jain.
"Pre/post filter for performance improvement of transform coding," Image
Communication, vol. 8, pp. 229-239, 1996.
Lee, Y.B. and J.R. Deller, Jr. "State-space
formulations of discrete symbol HMM decoding for fast match," in preparation.
Lohr, S. "Real Networks Hopes New Software
Will Open Up Medium," New York Times, July 13, 1998.
Lynch, C. "The Integrity of Digital Information:
Mechanics and Definitional Issues," Journal of the American Society for
Information Science vol. 45, pp. 77-84, April 1995.
Matsui, K., K. Tanaka. "Video-Steganography:
How to Secretly Embed a Signature in a Picture," Journal of the Interactive
Multimedia Association Intellectual Property Project. vol. 1, n. 1, pp.
187-205, January 1994.
Mayer, R., & Sims, V. "For whom is a
picture worth a thousand words? Extensions of a dual-coding theory of
multimedia learning." Journal of Educational Psychology. vol. 86, pp.
389-401, 1994.
McLuhan, M. The Gutenberg Galaxy: The Making
of Typographic Man (Toronto, 1962).
Nandkumar, S., J.H.L. Hansen. "Dual-Channel
Iterative Speech Enhancement with Constraints Based on an Auditory Spectrum,"
IEEE Trans. Speech and Audio Processing, Vol. 3, n. 1, pp. 22-34, January
1995.
OCLC. "Dublin Core Metadata." [http://purl.org/metadata/dublin_core],
1997.
OCLC. "Description of Dublin Core Elements."
[http://purl.oclc.org/metadata/dublin_core_elements], 1997.
Payette, Sandra D. and Oya Y. Rieger. "Supporting
Scholarly Inquiry: Incorporating Users in the Design of the Digital Library,"
The Journal of Academic Librarianship. vol. 24, n. 2, pp. 121-129, March
1998.
Pellom, B.J., J.H.L. Hansen. "An Efficient
Scoring Algorithm for Gaussian Mixture Model based Speaker Identification,"
submitted to Signal Processing Letters, December 1997.
Pellom, B.J., J.H.L. Hansen. "Automatic Segmentation
of Speech Recorded in Unknown Noisy Channel Characteristics," Speech Communication:
Special Issue on Robust Speech Recognition in Unknown Communication Channels.
vol. 24, Fall 1998.
Pellom, B.J., J.H.L. Hansen. "A Duration-Based
Confidence Measure for Automatic Segmentation of Noise Corrupted Speech,"
accepted to ICSLP-98: Inter. Conf. Spoken Language Processing, Sydney,
Australia, December 1998.
Preserving Digital Information (Washington:
Commission on Preservation and Access, 1995).
Reynolds, D., R. Rose. "Robust Text-Independent
Speaker Identification Using Gaussian Mixture Speaker Models," IEEE Trans.
Speech and Audio Processing. vol. 3, n. 1, pp. 72-83, 1995.
Rose, R., D. Paul. "A Hidden Markov Model
Based Keyword Recognition System," Proc. IEEE ICASSP-90. vol. 1, pp. 129-132,
1990.
Rothenberg, J., "Ensuring the longevity of
digital documents," Scientific American. vol. 272, n. 1, pp. 24-29, January
1995.
Rothenberg, J. "Metadata to support data
quality and longevity," (http://www.computer.org/conferen/meta96/rothernberg_paper/ieee.dataquality.html),
1996.
Singletary, M. and G. Stone. Communication
Theory and Research Applications (Ames, Iowa, 1988).
Smith, J.R., B.O. Comiskey. "Modulation and
Information Hiding in Images," Info Hiding, pp. 207-226, 1996.
Swanson, M.D., B. Zhu, A.H. Tewfik. "Transparent
Robust Image Watermarking," IEEE International Conference on Image Processing,
vol. III, pp. 211-214, 1996.
Task Force on the Archiving of Digital Information,
Preserving Digital Information. (Washington, D.C.: Commission on Preservation
and Access, 1996).
USMARC Format for Bibliographic Data, including
Guidelines for Content Designation. Washington, Cataloguing Distribution
Service, Library of Congress, 1994.
van Schyndel, R.G., A.Z. Tirkel, C.F. Osborne.
"A Digital Watermark," International Conference on Image Processing, vol.
2, pp. 86-90, Austin, TX, 1994.
Weibel, Stuart, "The Foundation of Resource
Description," D-Lib Magazine [http://www.dlib.org], July 1995.
|