Table of contents
- What is the problem?
- Contexts of use
- Q1: What is the current state of collaborative translation practices and technologies?
- Q2: How to best integrate a collaborative translation platform with existing Computer-Assisted Tools?
- Q3: How can Machine Translation help collaborative translation communities?
- Q4: How useful is the current implementation of CLWE?
- Q5: How to better isolate textual elements in a page that need translation?
- Q6: What is the value of supporting cross-lingual searching, and how best to implement it?
- Q7: How could bilingual alignment technology decrease reliance on users for assessing translation completion?
- Q8: Design translation editing interface to prevent original contributions in the context of a translation transaction
- Q: Experiment with alternative up-to-dateness measures
- Q: Incorporate translation management tools
- Q: ???
What is the problem?
Starting in September 2008, Marta Stojanovic and Alain Désilets of the National Research Council of Canada will begin a 12-month R&D effort around the Cross Lingual Wiki Engine (CLWE) Project.
As many of you know, choosing a good research question is a very difficult task, so please help us by reading the possible ideas below, providing comments, and rating them. A good research question is one for which:
- The answer is not known already, and cannot be found easily.
- The answer matters and has large practical consequences for a particular community.
Thanks for your help. We are aiming to choose one of them by mid-September.
Note: We're doing this partly as an experiment in the spirit of « The wisdom of crowds » :
BTW: When you rate ideas, make sure you make up your own mind and write your answer down before looking at ratings from other folks.
Contexts of use
While collaborative translation has applications in a wide range of situations, we are particularly interested in research that will have impact in the following contexts:
- Government organizations that have some sort of legal obligation to provide content in multiple languages (ex: Canadian Government, UN departments, European Commission departments).
- Companies that need to produce user documentation for their products in multiple languages, and who want to outsource this work to the community of users.
Q1: What is the current state of collaborative translation practices and technologies?
Description
There are lots of sites that are doing collaborative translation, and many technologies that are used to support them. A partial list can be found here:
At this point, nobody seems to have a good handle on everything that is happening. It would be useful to write a synthesis of the current practices and tools.
For example, we could write a survey that analyzes the different communities and tools in terms of the extent to which they operate without relying on the Assumptions of conventional translation processes.
Why is this question important?
This is important so we know what has been done already, so we can figure out what the important unresolved problems are, and can focus on solving those instead of re-inventing the wheel.
What makes this a research question?
This is not hardcore quantitative research, but it falls in the category of qualitative research. It will involve gathering information, writing and analysing surveys, and synthesizing the information into a big picture.
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
Q2: How to best integrate a collaborative translation platform with existing Computer-Assisted Tools?
Description
Professional translators have all sorts of Computer Assisted Translation (CAT) tools at their disposal (ex: terminology databases, translation memories), which amateur translators working in collaborative fashion often do not have.
Which of these tools should be integrated into collaborative translation platforms, and how?
In this project, we would integrate open source CAT tools into TikiWiki, have them used in an actual environment involving amateur translators, and gather feedback about their usefulness, limitations, and suggested improvements.
Another related issue is that some organisations like the UN and EU agencies want to use collaborative translation to outsource translation to communities. But they already have expensive CAT infrastructure, which includes workflow management systems and large terminology databases and translation memories. How can these proprietary tools be integrated into an open source collaborative platform?
Why is this question important?
This is important because CAT tools have great potential for increasing the productivity of volunteer translators in a collaborative environment.
What makes this a research question?
CAT tools are pretty mature, and we know how to build them for professional translators. We also know that they have a good impact on productivity.
But it's not clear to what extent the tools need to be different to help amateur translators, and to what extent they will actually improve their productivity.
Moreover, the more open and unpredictable technical environment in which amateur translators work poses a number of design questions that will be interesting research from a Human Computer Interaction perspective.
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
Q3: How can Machine Translation help collaborative translation communities?
Description
Collaborative translation communities often do not have sufficient human resources to cover all language pairs, and to provide translation of all content in a timely fashion.
Machine Translation might help in several ways:
- Automatically provide a "gist quality" translation of new content. This would be only a temporary measure until a human translator finds the time to fix it.
- Allow volunteer translators to translate content from a source language that they can't read. For example, the MT system would provide a "bad" English translation of a page written in Japanese, and the user could fix that bad English without having to actually read the original Japanese.
Why is this question important?
This is important because communities don't want to spend most of their human resources and energy on translation as opposed to creation of original content. MT has the potential to provide "good enough" translation at a fraction of the cost in human resources that fully manual translation requires.
What makes this a research question?
MT is still bleeding edge technology, so any application that uses it is definitely research.
While there have been studies of the use of MT outputs for the purpose of gisting, and as first drafts to be post-edited by human translators, those have focused on translation of whole documents.
In the context of a collaborative community, we are more likely to want to apply MT to updates to pages. There are some interesting new issues with that context.
For example, consider a French page that is perfectly translated by a human. Someone adds two sentences to the English page. Wouldn't it be nice to be able to insert an MT translation of just those two sentences into the French page, maybe highlighting them in yellow with a warning saying that they were MT translated? Could it be that two potentially poorly MT translated sentences are more easily understandable when presented in the context of a perfectly translated document? Also, how do we go about reliably inserting those two sentences at the right place in the French page (ex: using alignment technology)?
Also, suppose I have an English page that is initially all translated by MT to French. Then, I manually correct the bad MT translation to make it perfect. In particular, I modify the structure of sentence 2 to make it sound more like a French sentence (the MT translation used an English-like sentence structure). Then, someone changes the English sentence number 2. What should the MT system do? Should it replace French sentence 2 by an MT translation of the newly modified English sentence 2? If so, chances are that I will have to redo the structure modification in the French sentence 2. Is there a way that the MT system could learn from my correction made to the original French sentence 2, and use the same sentence structure to retranslate the updated English sentence 2?
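The two page-update scenarios above can be sketched roughly as follows. This is only an illustration under strong assumptions: the previous English and French versions are already split into sentences and aligned one-to-one by position (a real system would need alignment technology to establish that), and `mt_stub` and `propagate_update` are hypothetical names, with `mt_stub` standing in for a real MT system.

```python
import difflib

def mt_stub(sentence):
    # Hypothetical placeholder: a real system would call an MT engine here.
    return "[MT] " + sentence

def propagate_update(old_src, new_src, old_tgt, translate=mt_stub):
    """Carry source-side sentence changes over to the target page.

    Assumption: old_src[i] and old_tgt[i] are translations of each other.
    Unchanged sentences keep their existing (human) translation; inserted
    or modified sentences are machine translated and marked as such.
    """
    matcher = difflib.SequenceMatcher(a=old_src, b=new_src, autojunk=False)
    new_tgt = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            new_tgt.extend(old_tgt[i1:i2])
        elif tag in ("insert", "replace"):
            # A "replace" discards the human-corrected target sentence,
            # which is the loss described in the second scenario.
            new_tgt.extend(translate(s) for s in new_src[j1:j2])
        # "delete": the corresponding target sentences are simply dropped.
    return new_tgt

old_en = ["Wikis are easy to edit.", "Anyone can contribute."]
old_fr = ["Les wikis sont faciles à modifier.", "Tout le monde peut contribuer."]
new_en = ["Wikis are easy to edit.", "They keep a history.", "Anyone can contribute."]
print(propagate_update(old_en, new_en, old_fr))
```

The sketch handles the first scenario (new sentences land between their translated neighbours) but not the second: when an English sentence is modified, the manually corrected French sentence is simply overwritten, which is the open problem.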
There may also be some "softer" Human Computer Interaction types of issues. For example, how best to entice readers of a bad MT translation (either of a whole page, or just of part of a page) to become active participants in the community by fixing the translation?
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
Q4: How useful is the current implementation of CLWE?
Description
We have made real progress in the CLWE project, thanks to the excellent work by Louis-Philippe Huberdeau. For a demo, see:
How useful is this to end users as it is now? What are the remaining problems to be addressed?
Why is this question important?
CLWE is still at the beta stage, and it is crucial to evaluate it in real-use situations in order to improve it.
What makes this a research question?
This is not a hardcore, quantitative style of research, but it falls within the realm of more qualitative Human Computer Interaction research.
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
Q5: How to better isolate textual elements in a page that need translation?
Description
The CLWE system does a pretty good job of knowing when, say, the French page is missing some edits that have been made in the English and Spanish pages.
But it does not do a great job of identifying the actual textual elements in the English and Spanish pages that need to be reproduced in French.
The actual issues are complex and a bit hard to explain, but are described in the paper entitled "The Cross-Lingual Wiki Engine: Enabling Collaboration Across Language Barriers" (soon to be available on the web... Google for the title). See the Limitations section of that paper for a description of the problem, and the Future research section for a description of a potential solution.
Why is this question important?
The current way of displaying what needs to be translated is based on diff technology, which can cause a lot of confusion when new page changes are interleaved with translations from another language. For example, when translating a change from English to French in a case where the English page has had interleaved modifications, the system might indicate that certain portions of the English page need translation into French when in fact these passages were originally written in French and then translated into English.
This can cause the users to completely lose faith in the system.
What makes this a research question?
While diff technology is pretty straightforward, patching technology isn't, and often requires that a human be kept in the loop. The main challenge of this project is to find a way to:
- Take a diff between say, versions v5 and v6 of the English page
- Show those diffs in the context of the current version of the English page, say v9.
As far as we know, this is not a trivial problem. More advanced isolation of textual elements in a page that need translation significantly complicates the range of possible translation workflows.
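To make the two steps above concrete, here is a naive Python sketch. It is not how CLWE works; it assumes pages are lists of lines, uses the standard `difflib` module for the diff, and locates each changed line in the current version by exact match only. The function names are hypothetical.

```python
import difflib

def added_lines(old, new):
    """Return the lines that difflib reports as inserted or changed in `new`."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added

def locate_in_current(changed, current):
    """Map each changed line to its index in `current`, or None if it
    no longer appears there verbatim."""
    return {line: (current.index(line) if line in current else None)
            for line in changed}

v5 = ["Intro.", "Install the tool."]
v6 = ["Intro.", "Install the tool.", "Run the tests."]
v9 = ["Welcome.", "Intro.", "Run the tests.", "Install the tool."]

changed = added_lines(v5, v6)
print(locate_in_current(changed, v9))
```

Exact matching breaks as soon as a later version rewords the line (the lookup returns None), which illustrates why placing an old diff in the context of the current page is not trivial.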
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
Q6: What is the value of supporting cross-lingual searching, and how best to implement it?
Description
In a site that is collaboratively translated, some of the information may be available only in particular languages and not in others.
When searching for information, users probably want to find the information no matter in which language it is present. But obviously they don't want to write the same query in different languages.
There are experimental technologies for doing cross-lingual search. For example, writing a query in English, and having the system search for that in all languages (usually by automatically translating the query to different languages). Combined with a Machine Translation system for translating the hits found in different languages, this might be good enough for people to find and understand information in pages written in languages that they can't read.
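The query-translation approach can be illustrated with a toy Python sketch. Everything in it is made up for illustration: the tiny lexicon stands in for a real query translation component, the page contents are invented, and matching is a plain all-words test rather than a real search index.

```python
# Hypothetical toy data: a tiny query lexicon and a few pages per language.
QUERY_LEXICON = {
    "fr": {"install": "installer", "wiki": "wiki"},
    "es": {"install": "instalar", "wiki": "wiki"},
}
PAGES = {
    "en": {"Setup": "how to install the wiki"},
    "fr": {"Mise en route": "comment installer le wiki",
           "Sauvegarde": "sauvegarder la base"},
    "es": {"Copia": "copia de seguridad"},
}

def translate_query(terms, lang):
    """Translate English query terms word by word; unknown words pass through."""
    if lang == "en":
        return list(terms)
    lexicon = QUERY_LEXICON.get(lang, {})
    return [lexicon.get(term, term) for term in terms]

def cross_lingual_search(query):
    """Run one English query against the pages of every language."""
    terms = query.lower().split()
    hits = []
    for lang, pages in PAGES.items():
        translated = translate_query(terms, lang)
        for title, text in pages.items():
            words = set(text.lower().split())
            if all(term in words for term in translated):
                hits.append((lang, title))
    return sorted(hits)

print(cross_lingual_search("install wiki"))
```

A real implementation would then pass the foreign-language hits through MT so that the user can read them.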
Does such a feature have value for collaborative translation communities? If so, where does it lie? How can we best implement such features?
Why is this question important?
This is another way to deal with the fact that in collaboratively translated sites, it is not always possible to translate all relevant information to all languages in a timely fashion.
What makes this a research question?
Cross Lingual Search technology is still bleeding edge, so it's not clear that it will work to a sufficient level to provide value to end users. We plan to find out by building it and trying it out with real end users.
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
Q7: How could bilingual alignment technology decrease reliance on users for assessing translation completion?
Description
The CLWE system currently relies heavily on the user to tell it when a particular translation task is complete (Complete Translation versus Partial Translation buttons for saving). If the user mistakenly pushes the wrong button, this may result in changes not being propagated to other languages, or in substantial confusion for subsequent translators of the same page.
One way to alleviate this problem would be to use automatic bilingual sentence alignment technologies to perform a basic sanity check on the alignment of the saved target page with the source page. The system could then notify the user when the alignment does not seem to correspond to their choice of Complete Translation versus Partial Translation button.
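As a rough idea of what such a sanity check could look like, here is a Python sketch. It does not use a real sentence aligner: it assumes a naive punctuation-based sentence splitter and simply compares sentence counts and character-length ratios, position by position. The function names and the ratio thresholds are hypothetical choices.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def looks_complete(source, target, min_ratio=0.5, max_ratio=2.0):
    """Crude stand-in for alignment: same number of sentences, and each
    target sentence has a plausible length relative to its source."""
    src, tgt = split_sentences(source), split_sentences(target)
    if len(src) != len(tgt):
        return False
    return all(min_ratio <= len(t) / len(s) <= max_ratio
               for s, t in zip(src, tgt))

def check_save(source, target, button):
    """Return a warning for the user, or None when the choice looks plausible.
    `button` is either "complete" or "partial"."""
    complete = looks_complete(source, target)
    if button == "complete" and not complete:
        return "The target does not seem to cover the whole source page."
    if button == "partial" and complete:
        return "The target seems to cover the whole source page."
    return None

english = "The cat sleeps. The dog barks."
print(check_save(english, "Le chat dort.", "complete"))
```

The check only warns; the user's button choice would still decide how the save is recorded.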
Why is this question important?
Users are currently confused about which button to press and often press the wrong button at the wrong time. As pointed out above, this can have dire consequences.
Even if you know what to push when, it's easy to get into a groove where you always click on the Partial Translation button to do partial saves, and then forget to get out of that groove for the final save. Finally, even then, it's easy to forget to translate, say, a sentence, or to accidentally delete one, so having the Complete Translation button do a sanity check would be useful.
What makes this a research question?
Although bilingual alignment is a fairly mature technology, it's still not 100% robust. Also, it's usually not employed to do sanity checks on translations. It's more often used to produce parallel sentences for training MT systems, or for populating a Translation Memory.
So figuring out how to make it work well for that particular context will involve some amount of applied research.
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
Q8: Design translation editing interface to prevent original contributions in the context of a translation transaction
Description
The CLWE system currently requires that users not mix translation and original contributions within the same transaction. In our limited experience using the system, this can be hard to do, especially when one notices an important mistake in the source text, while in the midst of translating it. Unfortunately, if a user makes an original edit while in the midst of a translation dialog, that original edit may never be propagated to other languages.
There does not seem to be an easy way to allow users to mix original edits and translations in the same transaction. However, we can constrain the translation user interface in such a way as to prevent the temptation. For example, instead of displaying the full text of the source page in an edit box, we could display most of it in a read-only text box, and only display those parts that need to be translated in editable text boxes.
This constrained user interface may also help track translations at a sentence-by-sentence level, which in turn may help perform sanity checks on translation alignments (as per the previous section). Or, it could be that conversely, automatic bilingual alignment technology is needed in order to identify which sentences the user should be able to edit in the target text (that is, which sentences in the target text correspond to changed sentences in the source text).
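One possible data model for such a constrained form is sketched below in Python. It is an illustration only: it assumes the target page is already split into sentences and that the system already knows which sentence positions need translation (which may itself require alignment technology). `Segment`, `build_form` and `apply_edits` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    editable: bool  # True only for parts that need translation

def build_form(target_sentences, needs_translation):
    """Mark each sentence as editable or read-only.
    `needs_translation` is a set of sentence indexes."""
    return [Segment(text, index in needs_translation)
            for index, text in enumerate(target_sentences)]

def apply_edits(form, edits):
    """Apply {index: new_text} edits, refusing changes to read-only segments."""
    result = []
    for index, segment in enumerate(form):
        if index in edits:
            if not segment.editable:
                raise ValueError(f"segment {index} is read-only in a translation")
            result.append(edits[index])
        else:
            result.append(segment.text)
    return result

form = build_form(["Bonjour.", "The tool is new."], {1})
print(apply_edits(form, {1: "L'outil est nouveau."}))
```

Rejecting edits to read-only segments is what keeps original contributions out of the translation transaction; a user who spots a mistake elsewhere would have to fix it in a separate edit.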
Why is this question important?
As pointed out above, mixing original content with translated content in the same translation transaction is fairly common, and it results in original content not being translated into other languages. That's bad.
What makes this a research question?
Designing a UI to do this is not trivial. See our first attempts here: Mockup of a constrained translation
Also, it may require the use of bilingual alignment technology, which, while relatively mature, is still not 100% accurate and has never been used for that particular purpose.
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
Q: Experiment with alternative up-to-dateness measures
Description
The current CLWE system uses a measure of up-to-dateness that is acceptable and provides useful information as-is. However, it is imprecise and could certainly be improved. Potential solutions include:
- Changing the unit used for counting changes and employ
- Changing the insertion/deletion weights.
- Dynamically adapting the change counting unit as well
of the page.
- Performing deeper content analysis to determine if an
- Presenting the measure graphically instead of numerically
the imprecise nature of the value to the end user.
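To show what "counting unit" and "insertion/deletion weights" could mean in practice, here is a Python sketch of one possible measure. It is not the formula CLWE uses: the score definition, the default weights and the function names are all assumptions made for illustration. The counting unit is a pluggable function (words by default).

```python
import difflib

def pending_change(translated_from, current_source, unit=str.split,
                   w_insert=1.0, w_delete=0.5):
    """Weighted count of units inserted into or deleted from the source
    since the version the translation was made from."""
    a, b = unit(translated_from), unit(current_source)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    inserted = deleted = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            inserted += j2 - j1
        if tag in ("delete", "replace"):
            deleted += i2 - i1
    return w_insert * inserted + w_delete * deleted

def up_to_dateness(translated_from, current_source, unit=str.split,
                   w_insert=1.0, w_delete=0.5):
    """Hypothetical score in [0, 1]: 1.0 means nothing is pending."""
    pending = pending_change(translated_from, current_source,
                             unit, w_insert, w_delete)
    size = max(len(unit(current_source)), 1)
    return max(0.0, min(1.0, 1.0 - pending / size))

print(up_to_dateness("the wiki is fast", "the wiki is fast and free"))
```

Swapping `unit` for a sentence splitter or changing the weights gives different scores for the same edit history, which is the kind of variation this question proposes to explore.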
Why is this question important?
Providing readers and volunteer translators with a measure of up-to-dateness is important to help them:
- Figure out which version to read to get the most up-to-date information.
- Figure out which language needs the most translation work.
What makes this a research question?
It is not clear how best to measure up-to-dateness. It will require a lot of trial and error to come up with something that makes the most sense.
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
Q: Incorporate translation management tools
Description
Although current features of CLWE allow users to find out what translation work needs to be done for any given page, users have no way of easily assessing which pages, among all those on a given site, are in most need of translation work. To deal with this issue, we could implement simple reporting and visualization tools to help users answer questions such as:
- What urgent translation requests need to be fulfilled
- What highly-visited pages in my native language are
- What's the average state of up-to-dateness for pages
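A simple report of this kind could be sketched as follows in Python. The page records, their field names (`title`, `visits`, `up_to_dateness`) and the ranking rule (visits multiplied by staleness) are all hypothetical; they only illustrate how visit counts and per-language up-to-dateness might be combined.

```python
def rank_pages(pages, language):
    """List page titles by how much translation attention they need in
    `language`: heavily visited and out-of-date pages come first."""
    scored = []
    for page in pages:
        staleness = 1.0 - page["up_to_dateness"].get(language, 0.0)
        scored.append((page["visits"] * staleness, page["title"]))
    scored.sort(reverse=True)
    return [title for score, title in scored if score > 0]

def average_up_to_dateness(pages, language):
    """Average up-to-dateness of all pages for one language (0.0 if no pages)."""
    values = [page["up_to_dateness"].get(language, 0.0) for page in pages]
    return sum(values) / len(values) if values else 0.0

# Made-up example records.
pages = [
    {"title": "Home", "visits": 1000, "up_to_dateness": {"fr": 0.9}},
    {"title": "Install", "visits": 400, "up_to_dateness": {"fr": 0.5}},
    {"title": "FAQ", "visits": 50, "up_to_dateness": {"fr": 1.0}},
]
print(rank_pages(pages, "fr"))
print(average_up_to_dateness(pages, "fr"))
```

A page with no translation at all in the requested language is treated as fully stale, so it ranks purely by its visit count.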
Why is this question important?
Such tools can increase participation by translators, by making it easy for them to find important and relevant translation work. They may also allow people to act as volunteer coordinators and motivate translators by telling them where their contributions are most needed.
What makes this a research question?
This may require a bit of trial and error, and some usability research to figure out the optimal mix of features.
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
Q: ???
Description
Why is this question important?
What makes this a research question?
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).