Scalable Text-to-Speech Software Architecture
Semester 1, 2022
Summary
In this assignment, you will demonstrate your ability to design, implement, and deploy a web application that can process a high load, i.e. a scalable application. You will be asked to deploy a tool that accepts text input and generates synthesized speech output. Specifically, you need to support:
• Uploading and generating speech for text input of varying sizes.
• Access via a specified REST API for use by front-end interfaces.
• Remaining responsive to the user while generating speech.
Your service will be deployed to AWS and will undergo automated correctness and load-testing to ensure it meets the required scalability.
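One common way to remain responsive while speech is being generated is to accept a job, return an identifier immediately, and do the slow work in the background. The following is a minimal in-memory sketch of that pattern, not the required design. The class name JobStore, the status strings, and the synthesize callable are all hypothetical; the real request and response shapes are defined by the interface specification, and a deployed service would likely use a durable queue rather than a single in-process thread.

```python
import queue
import threading
import uuid


class JobStore:
    """Hypothetical in-memory job tracker: submit() returns at once,
    a background thread performs the slow synthesis."""

    def __init__(self, synthesize):
        # synthesize is any callable taking text and returning a result;
        # it stands in for the real speech generation step (assumption).
        self._synthesize = synthesize
        self._jobs = {}
        self._lock = threading.Lock()
        self._queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, text):
        """Record the job as pending and return its id without blocking."""
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = {"status": "PENDING", "result": None}
        self._queue.put((job_id, text))
        return job_id

    def status(self, job_id):
        """Return a copy of the job record so callers can poll safely."""
        with self._lock:
            return dict(self._jobs[job_id])

    def wait_all(self):
        """Block until every submitted job has been processed."""
        self._queue.join()

    def _run(self):
        # Worker loop: take one job at a time and record its outcome.
        while True:
            job_id, text = self._queue.get()
            try:
                update = {"status": "COMPLETED", "result": self._synthesize(text)}
            except Exception as exc:
                update = {"status": "FAILED", "result": str(exc)}
            with self._lock:
                self._jobs[job_id] = update
            self._queue.task_done()
```

A caller would submit text, receive the job id straight away, and poll status() until the job is no longer pending.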
1 Introduction
Text-to-speech software supports accessibility, enables smart-home devices, and even breaks down language barriers. Unfortunately, text-to-speech is computationally intensive. While the technology has made great advances over the past few decades, many open-source implementations are still inefficient.
Task For this assignment, the University of Queensland is looking to convert all course content into speech. This will support visually impaired students in their studies. All course content, from Slack messages and Blackboard announcements to textbooks, must be converted to speech. You will be responsible for designing and implementing a service to generate synthesized speech for use across the entire university.
Requirements As you might imagine, Blackboard announcements occur frequently and should be converted in almost real time, while textbooks are set ahead of semester and may take several days to process. The university will experience peaks of usage. At the start of semester, instructors set many textbooks which need to be processed. The university will also experience usage lows over the summer holiday period, when few conversions will be required. The university is not willing to pay for the resources required during usage peaks all year round. Your implementation must be able to scale dynamically based on the current number of jobs to be processed.
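To illustrate what scaling on the number of jobs can mean in practice, the sketch below computes a target worker count from the current backlog. It is only one possible policy. The function name desired_workers and every threshold in it (jobs_per_worker, min_workers, max_workers) are illustrative assumptions, not values given by the assignment.

```python
import math


def desired_workers(queued_jobs: int,
                    jobs_per_worker: int = 10,
                    min_workers: int = 1,
                    max_workers: int = 20) -> int:
    """Return a worker count proportional to the backlog, clamped to bounds.

    All default values are hypothetical and would need tuning under load.
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    # Round up so a partial batch of jobs still gets a worker.
    needed = math.ceil(queued_jobs / jobs_per_worker)
    # The floor keeps the service available during quiet periods;
    # the ceiling caps cost during peaks.
    return max(min_workers, min(max_workers, needed))
```

With these assumed defaults, an empty queue keeps one worker running, a backlog of 95 jobs asks for 10 workers, and a very large backlog is capped at 20. On AWS, a comparable rule is usually expressed as an auto scaling policy driven by a queue-length metric rather than computed by hand.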
2 Interface
Your service will be utilised by almost every system in the university. Every university service must support text-to-speech on the first Monday of semester two. An interface specification has already been developed and distributed to existing service owners, who are working hard to deliver support for their services.
You must implement this interface exactly as described. The interface specification is available to all service owners online: https://csse6400.uqcloud.net/assessment/chatterbox
3 Implementation
To ensure that your service faithfully generates the voice clips that our tests expect, there are some restrictions on how your service can be implemented.
Your service must utilise the chatterbox command-line tool provided for this assignment. This tool is available on PyPI and may be installed using pip: https://pypi.org/project/uq-chatterbox/