
Fixover
Amit Ben Mordechai | Zohar Moneta | Yael Kachlon | Bar Vaknin
Collaboration with Dr. Meirav Taieb-Maimon

OUR STORY
While writing a document, a writer makes many changes, both during writing and afterward. Typing mistakes need to be fixed, bad grammar needs to be corrected, and words need to be added or removed. All of these edits require repositioning the cursor, which is done mainly with the mouse. Moving between the keyboard and the mouse slows down the workflow considerably, and when it happens often, it becomes very time-consuming.

With Fixover, the user simply looks at the error and gives a voice command, and the error is corrected automatically.

OUR VISION
The Original Vision
In this project, we address the problem described above. Our solution combines a gaze-positioning technique with speech-to-text, allowing editing operations while the user's hands stay on the keyboard. The system uses gaze positioning to identify where the user is looking in the text editor. This way, when users want to change the word they are looking at, they say the word and it is changed automatically.
The Adapted Vision
Coronavirus restrictions limited our work with an eye-gaze system, so we adapted our vision for Fixover to this constraint. Fixover is built generically so that an eye-gaze system can easily be added later. The current implementation uses the mouse cursor instead: like the point reported by an eye-gaze system, the cursor position is a coordinate on the screen. The system uses the mouse cursor position to simulate the gaze location in the text editor, and when a trigger word is detected, it performs the edit the user requested. This document describes the requirements of Fixover with an eye-tracking device; a follow-up project will complete Fixover so that it works with such a device.
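The idea that a mouse position and a gaze point are interchangeable screen coordinates can be sketched as a small interface. This is an illustrative Python sketch, not the project's code: the names EyeTracking, MouseTracking, GazeTracking, get_position and on_gaze_sample are hypothetical, and the pointer reader is injected because the real pointer API depends on the platform.

```python
from abc import ABC, abstractmethod
from typing import Callable, Tuple

Point = Tuple[int, int]


class EyeTracking(ABC):
    """Anything that can report where the user is 'looking' on the screen."""

    @abstractmethod
    def get_position(self) -> Point:
        """Return the current (x, y) screen coordinate."""


class MouseTracking(EyeTracking):
    """Current version: the mouse pointer stands in for the gaze point."""

    def __init__(self, read_pointer: Callable[[], Point]):
        # read_pointer is injected (hypothetical), e.g. a wrapper around the OS pointer API.
        self._read_pointer = read_pointer

    def get_position(self) -> Point:
        return self._read_pointer()


class GazeTracking(EyeTracking):
    """Future version: keeps the latest sample pushed by an eye-tracking device."""

    def __init__(self) -> None:
        self._latest: Point = (0, 0)

    def on_gaze_sample(self, x: int, y: int) -> None:
        # Called by a (hypothetical) device driver for every new gaze sample.
        self._latest = (x, y)

    def get_position(self) -> Point:
        return self._latest
```

Because callers depend only on get_position(), swapping the mouse implementation for a gaze implementation would not change the rest of the system, which is the property the adapted vision relies on.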

Fix
This feature is used when a word contains a spelling error. To correct the error, the user places the mouse pointer over the error and says the trigger word "Fix". The error is then corrected and highlighted for two seconds.
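A minimal sketch of the Fix flow, assuming the Text Editor component has already converted the (x,y) coordinate into a character offset in the document (that mapping is specific to Microsoft Word and is not shown). DictionarySpellChecker is a hypothetical stand-in for the Word and NHunspell spell checkers; it uses Python's difflib.get_close_matches for suggestions, which is not how those checkers rank suggestions.

```python
import difflib
import re
from typing import Iterable, List, Optional, Tuple

WORD = re.compile(r"[A-Za-z']+")


class DictionarySpellChecker:
    """Toy stand-in for the Word / NHunspell spell checkers (hypothetical)."""

    def __init__(self, vocabulary: Iterable[str]):
        self._known = {w.lower() for w in vocabulary}
        self._ordered = sorted(self._known)          # sorted candidate list for suggestions

    def is_correct(self, word: str) -> bool:
        return word.lower() in self._known

    def suggest(self, word: str) -> List[str]:
        # Up to three similar known words, best match first.
        return difflib.get_close_matches(word.lower(), self._ordered, n=3, cutoff=0.6)


def fix(text: str, offset: int, checker: DictionarySpellChecker
        ) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Correct the misspelled word under `offset`; return (new_text, span_to_highlight)."""
    for m in WORD.finditer(text):
        if m.start() <= offset < m.end():
            word = m.group()
            if checker.is_correct(word):
                return text, None                    # nothing to fix
            suggestions = checker.suggest(word)
            if not suggestions:
                return text, None                    # no suggestion available
            new = suggestions[0]
            return text[:m.start()] + new + text[m.end():], (m.start(), m.start() + len(new))
    return text, None                                # pointer is not over a word
```

The returned span is what the editor would highlight for two seconds. The sketch lower-cases suggestions, so a real implementation would also need to restore the original capitalization.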

Add
This feature is used when the user wants to add words. The user places the mouse pointer where the words should be added and says the trigger word "Add" followed by the words to add. The words are then added and highlighted for two seconds.
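A sketch of the Add use case on a plain string, again assuming the (x,y) coordinate has already been mapped to a character offset. Moving to the end of the word under the pointer, so that a word is never split, is our assumption; the text does not say exactly where inside a word the insertion happens.

```python
from typing import Tuple


def add_words(text: str, offset: int, words: str) -> Tuple[str, Tuple[int, int]]:
    """Insert `words` after the word under `offset`; return (new_text, span_to_highlight)."""
    # Move to the end of the word under the pointer so that no word is split.
    while offset < len(text) and not text[offset].isspace():
        offset += 1
    new_text = text[:offset] + " " + words + text[offset:]
    start = offset + 1                               # first character after the added space
    return new_text, (start, start + len(words))


# Pointer on "I", user says "Add really":
# add_words("I like tea", 0, "really") returns ("I really like tea", (2, 8))
```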

Move Cursor
This feature is used when the user wants to change the cursor position. The user places the mouse pointer where the cursor should go and says the trigger word "Move". The cursor then moves to the mouse pointer position.

Change
This feature is used when the user wants to correct the last spelling error relative to the cursor position. The user says the trigger word "Change", and the last error relative to the cursor is corrected and highlighted for two seconds.
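A sketch of Change, interpreting "the last error relative to the cursor" as the closest misspelled word that ends at or before the cursor offset (our assumption). The is_correct and suggest callables are hypothetical stand-ins for the Spell Checker component.

```python
import re
from typing import Callable, List, Optional, Tuple

WORD = re.compile(r"[A-Za-z']+")


def change(text: str, cursor: int,
           is_correct: Callable[[str], bool],
           suggest: Callable[[str], List[str]]) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Correct the closest misspelled word ending at or before `cursor`."""
    last = None
    for m in WORD.finditer(text):
        if m.end() > cursor:
            break                                    # ignore words after the cursor
        if not is_correct(m.group()):
            last = m                                 # most recent error seen so far
    if last is None:
        return text, None
    options = suggest(last.group())
    if not options:
        return text, None
    new = options[0]
    return text[:last.start()] + new + text[last.end():], (last.start(), last.start() + len(new))
```

Because no pointer position is involved, this is the one correction use case that works purely from the cursor.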

Fix Word
This feature is used when there is no spelling error, but the user intended to write a different word. The word that was written must be close in spelling to the intended word. For example, the user intended to write "word" but wrote "world" instead. To correct the error, the user places the mouse pointer over the error and says the trigger word "Fix" followed by the correct word. The written word is then replaced with the word the user said.
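One way to measure how close the written word is to the intended one is edit (Levenshtein) distance. The sketch below is an assumption about how proximity could be measured, not the project's documented method: among the words within a hypothetical window of characters around the pointer offset, it picks the one closest to the spoken word and replaces it, unless even the best candidate is more than a hypothetical max_distance edits away.

```python
import re

WORD = re.compile(r"[A-Za-z']+")


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete ca
                           cur[j - 1] + 1,               # insert cb
                           prev[j - 1] + (ca != cb)))    # substitute (free if equal)
        prev = cur
    return prev[-1]


def fix_word(text: str, offset: int, spoken: str,
             window: int = 20, max_distance: int = 2) -> str:
    """Replace the word near `offset` that is most similar to `spoken`."""
    candidates = [m for m in WORD.finditer(text)
                  if m.end() >= offset - window and m.start() <= offset + window]
    if not candidates:
        return text
    # Smallest edit distance wins; ties go to the word nearest the pointer.
    best = min(candidates,
               key=lambda m: (levenshtein(m.group().lower(), spoken.lower()),
                              abs(m.start() - offset)))
    if levenshtein(best.group().lower(), spoken.lower()) > max_distance:
        return text                                      # nothing close enough
    return text[:best.start()] + spoken + text[best.end():]
```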

Replace
This feature is used when the user wants to replace a word in the text with a new word that need not be similar in spelling. The user places the mouse pointer on the word to be replaced and says the trigger word "Replace" followed by the old word and the new word. The old word is then replaced and highlighted for two seconds.
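Since the spoken old word already identifies the target, the pointer position only needs to choose between repeated occurrences. A sketch under that assumption, picking the whole-word, case-insensitive occurrence of the old word closest to the pointer offset (replace_near is a hypothetical name):

```python
import re


def replace_near(text: str, offset: int, old: str, new: str) -> str:
    """Replace the occurrence of `old` (whole word, any case) closest to `offset`."""
    matches = list(re.finditer(r"\b" + re.escape(old) + r"\b", text, re.IGNORECASE))
    if not matches:
        return text                                  # the old word is not in the text

    def distance(m) -> int:
        if m.start() <= offset < m.end():
            return 0                                 # pointer is on this occurrence
        return min(abs(offset - m.start()), abs(offset - m.end()))

    best = min(matches, key=distance)
    return text[:best.start()] + new + text[best.end():]
```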

Replace All
This feature is used when the user wants to replace all instances of a word with a new word. The user says the trigger words "Replace All" followed by the old word and the new word. All instances of the old word are then replaced with the new word and highlighted. If the user does not want to change a specific instance, double-clicking the highlighted word switches it back to the old word. To finish, the user says "Done" and the highlighting is removed.
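Replace All is the only use case whose state lasts until "Done". A hypothetical ReplaceAllSession class can keep the text split around the instances, so reverting one instance never shifts the positions of the others. The instance index passed to revert() is an assumption; a real editor would derive it from the double-clicked position.

```python
import re
from typing import List, Tuple


class ReplaceAllSession:
    """Replace every instance of `old`, let the user revert single instances, then finish."""

    def __init__(self, text: str, old: str, new: str):
        pattern = re.compile(r"\b" + re.escape(old) + r"\b")
        self._pieces = pattern.split(text)           # text between the instances
        self._originals = pattern.findall(text)      # the instances themselves
        self._new = new
        self._use_new = [True] * len(self._originals)
        self.active = True                           # highlights visible until "Done"

    def _render(self) -> Tuple[str, List[Tuple[int, int]]]:
        out, spans, pos = [], [], 0
        for i, piece in enumerate(self._pieces):
            out.append(piece)
            pos += len(piece)
            if i < len(self._originals):
                word = self._new if self._use_new[i] else self._originals[i]
                if self._use_new[i]:
                    spans.append((pos, pos + len(word)))  # only replaced words are highlighted
                out.append(word)
                pos += len(word)
        return "".join(out), spans

    def text(self) -> str:
        return self._render()[0]

    def highlights(self) -> List[Tuple[int, int]]:
        return self._render()[1] if self.active else []

    def revert(self, index: int) -> None:
        """Double-click on a highlighted instance: put the old word back."""
        self._use_new[index] = False

    def done(self) -> str:
        """The user said "Done": remove the highlights and return the final text."""
        self.active = False
        return self.text()
```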

ARCHITECTURE
Our system architecture consists of five major components:
Speech-To-Text – This component transcribes audio streams into text in real time, using speech recognition powered by deep neural network models. It recognizes trigger words and user commands and alerts the Engine component.
The Speech-To-Text component supports two cloud services, Microsoft Azure and IBM Watson, which require an Internet connection, and one local library that works offline. When the Internet connection is lost, the component automatically switches from the cloud service to the local library.
Engine – This component coordinates all the other components and is responsible for starting the program. The Engine waits for alerts from the Speech-To-Text component. When triggered, it queries the Eye-Tracking component for (x,y) coordinates, decides which use case to perform according to the trigger word, and calls the Text-Editor component to apply the change to the text file at the given coordinates.
Eye-Tracking – In the current version, this component returns (x,y) coordinates that represent the mouse position on the screen.
In a future version, the component will use an eye-gaze device to continuously follow the user's gaze and determine where on the screen the user is looking. On request, it will return (x,y) coordinates that represent the user's gaze location on the screen.
Text Editor – This component provides an interface to the working text document and performs actions such as reading and editing it. In the current version, the component works with Microsoft Word.
Spell Checker – This component finds spelling errors and provides spelling suggestions. In the current version, it supports the Microsoft Word spell checker (recommended) and the NHunspell spell checker.
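The Engine's dispatch step can be sketched as a table from trigger phrases to handlers. In this illustrative Python sketch, the handlers are hypothetical callables that would wrap the Text Editor and Spell Checker calls, and get_position stands in for the Eye-Tracking component. Longer phrases are tried first so that "Replace All" is not mistaken for "Replace".

```python
from typing import Callable, Dict, List, Optional, Tuple

Handler = Callable[[int, int, List[str]], object]


class Engine:
    """Routes a recognised phrase to the matching use case (illustrative sketch)."""

    def __init__(self, get_position: Callable[[], Tuple[int, int]],
                 handlers: Dict[str, Handler]):
        self._get_position = get_position            # Eye-Tracking component
        self._handlers = {k.lower(): v for k, v in handlers.items()}
        # Longest trigger phrases first, so "replace all" wins over "replace".
        self._triggers = sorted(self._handlers, key=lambda t: len(t.split()), reverse=True)

    def on_speech(self, transcript: str) -> Optional[object]:
        """Called by the Speech-To-Text component with each recognised phrase."""
        words = transcript.split()
        lowered = [w.lower() for w in words]
        for trigger in self._triggers:
            parts = trigger.split()
            if lowered[:len(parts)] == parts:
                x, y = self._get_position()          # ask where the user is looking
                return self._handlers[trigger](x, y, words[len(parts):])
        return None                                  # not a command: ignore it
```

Fix and Fix Word share the trigger "Fix", so in this sketch the fix handler would choose between them by checking whether any words follow the trigger.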

USER INTERFACE DRAFT


TESTS
Unit Test
In the unit tests, we tested small units of the system to verify that each function in the Speech-to-text, Spell-checker, Text-editor, and Engine components works correctly. For each unit, we examined good cases, bad cases, and edge cases.
Speech-to-text - The ‘SpeechToText’ class holds an object of type ‘InterfaceSpeechToText’. This object represents a cloud service or a local library (Microsoft, IBM, or the local library) and has three functions: connect(), listen(), and disconnect(). To test the ‘SpeechToText’ class, we created a mock named ‘MockSpeechToText’ that implements the speech-to-text interface and simulates these three functions to suit the tests.
Text-editor - We added a dedicated file containing pre-written text, on which the tests are performed.
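The Speech-to-text unit test described above can be sketched in Python as follows. InterfaceSpeechToText, MockSpeechToText and SpeechToText mirror the names in the text, but their bodies are our assumptions; in particular, this SpeechToText holds both a cloud service and a local service plus a hypothetical injected is_online check, to also show the automatic switch to the local library when the Internet connection is lost.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Optional


class InterfaceSpeechToText(ABC):
    """A speech service: a cloud (Microsoft, IBM) or a local library."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def listen(self) -> Optional[str]: ...

    @abstractmethod
    def disconnect(self) -> None: ...


class MockSpeechToText(InterfaceSpeechToText):
    """Test double: replays scripted phrases and records every call."""

    def __init__(self, phrases: List[str]):
        self._phrases = list(phrases)
        self.calls: List[str] = []

    def connect(self) -> None:
        self.calls.append("connect")

    def listen(self) -> Optional[str]:
        self.calls.append("listen")
        return self._phrases.pop(0) if self._phrases else None

    def disconnect(self) -> None:
        self.calls.append("disconnect")


class SpeechToText:
    """Uses the cloud service when online and the local library otherwise."""

    def __init__(self, cloud: InterfaceSpeechToText, local: InterfaceSpeechToText,
                 is_online: Callable[[], bool]):
        self._cloud = cloud
        self._local = local
        self._is_online = is_online
        self._current: Optional[InterfaceSpeechToText] = None

    def listen_once(self) -> Optional[str]:
        wanted = self._cloud if self._is_online() else self._local
        if wanted is not self._current:              # connectivity changed (or first call)
            if self._current is not None:
                self._current.disconnect()
            wanted.connect()
            self._current = wanted
        return self._current.listen()

    def stop(self) -> None:
        if self._current is not None:
            self._current.disconnect()
            self._current = None


def test_switches_to_local_when_offline() -> None:
    online = [True]
    cloud = MockSpeechToText(["fix"])
    local = MockSpeechToText(["add hello"])
    stt = SpeechToText(cloud, local, lambda: online[0])
    assert stt.listen_once() == "fix"
    online[0] = False                                # simulate losing the connection
    assert stt.listen_once() == "add hello"
    stt.stop()
    assert cloud.calls == ["connect", "listen", "disconnect"]
    assert local.calls == ["connect", "listen", "disconnect"]
```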
Integration Test
The integration tests were performed at the use-case level, from the Engine to the TextEditor. All seven use cases were examined: fix, fix word, add, move, change, replace, and replace all. The tests are performed on a dedicated file containing pre-written text.
System Test
The system tests examined the workflow of the program across all components, from the voice component to the text editor component: opening the text file, listening to the user, identifying a trigger word, and changing the file as needed.
To simulate listening to the user (the Speech-To-Text component), we created a mock that implements the speech-to-text interface and simulates hearing a trigger word. We also created a mock that implements the Eye-Tracking interface and simulates the cursor position on the screen.
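A compressed system-test sketch: two hypothetical mocks replace the microphone and the eye tracker, and a toy loop runs listen, detect trigger, read position, edit. It assumes one screen unit equals one character cell (x is the column, y is the line) and uses a corrections dictionary in place of the Spell Checker; both are simplifications for illustration only.

```python
from typing import Dict, List, Optional, Tuple


class MockSpeech:
    """Simulates the user saying a fixed sequence of phrases."""

    def __init__(self, phrases: List[str]):
        self._phrases = list(phrases)

    def listen(self) -> Optional[str]:
        return self._phrases.pop(0) if self._phrases else None


class MockEyeTracking:
    """Simulates the pointer (or gaze) resting on one screen coordinate."""

    def __init__(self, position: Tuple[int, int]):
        self._position = position

    def get_position(self) -> Tuple[int, int]:
        return self._position


def run_system(lines: List[str], speech: MockSpeech, tracker: MockEyeTracking,
               corrections: Dict[str, str]) -> List[str]:
    """Listen, detect the trigger, read the position and edit, until the phrases run out."""
    lines = list(lines)                              # work on a copy of the "file"
    while (phrase := speech.listen()) is not None:
        if phrase.strip().lower() != "fix":
            continue                                 # only "Fix" is wired up in this sketch
        col, row = tracker.get_position()            # assumption: x = column, y = line
        line = lines[row]
        start = line.rfind(" ", 0, col) + 1          # start of the word under the column
        end = line.find(" ", col)
        end = len(line) if end == -1 else end        # end of that word
        word = line[start:end]
        if word in corrections:
            lines[row] = line[:start] + corrections[word] + line[end:]
    return lines
```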
Acceptance Test
The acceptance tests were written as a client instruction manual that lists manual tests. For each test, the manual gives a description, the preparation needed before the test, and the expected result.
Our client, Meirav, ran all the acceptance tests together with us and confirmed that they passed.

Maintenance
In the following document, you will find all the information needed to maintain the system and add new features, along with an explanation of the structure and responsibilities of each component.
The document covers the following topics:

How to add a new speech-to-text service

How to add a new command

How to send a message from Speech-To-Text to Engine

How to handle an Internet disconnection (when working with a cloud service)

How to add a new spell checker

How to send a message from Engine to UI

How to add a new use case

How to add a new tracking class

How to add a new text editor

Future Version Plans
Future versions will offer support for the following:

- Eye-tracking system support. The eye-tracking system will monitor the user's gaze and allow the program to be used without a mouse (hands on the keyboard only).
- Support for additional text editors, such as Google Docs.
- Support for additional use cases, such as copy-paste, delete, and auto-close.
