WASHINGTON - Bill Waawaate is Indigenous, smart, educated, and the millionaire founder of a highly successful snowmobile company. He is also a comic book superhero from a First Nation in Canada.
"The aim here is to help Canadians understand Indigenous culture and to erase the stereotypes about First Nations communities," said Joseph John, the Montreal-based designer and publisher of the Citizen Canada comic book series.
John wanted his feather-caped superhero to speak English, French, and Cree, a language spoken by more than 95,000 First Nations people in Canada. He assumed he could rely on Google Translate for help.
But the app, which supports 109 languages, does not offer Cree or any of the other roughly 150 Indigenous languages spoken today in North America.
Group of Woodland Cree people, Fort George, James Bay, Quebec, 1893
So John started an online petition urging Google to add Cree to its translation engine. That petition has so far received nearly all of the 7,500 signatures he had hoped for.
"For me, it just doesn’t make sense," John told VOA. "Google Translate does offer Maori, the Indigenous language of New Zealand, which is spoken by only about 50,000 persons. How can a company with 135,000 people working for it in 40 nations across the globe not find the resources to add Canada's most widely spoken Indigenous language?"
"Indigenous languages are incredibly important to us," Google spokesperson Justin Burr said via email. As it turns out, though, Cree is a "low resource" language, which means there aren’t enough written translations of Cree documents to populate and "train" automated translation systems like Google’s.
Burr said Google is actively working toward adding more low-resource languages.
"One of those ways is we lean heavily on our contributor community, which allows native speakers to add valuable feedback, verify translations, et cetera, to languages that we do support, as well as languages we have yet to support," said Burr. "Beyond that, we are working on new machine learning techniques that allow us to support the low resource languages with less training data."
University of Colorado linguist Andrew Cowell specializes in Indigenous-language documentation. He explained to VOA some of the challenges for a machine to translate Indigenous languages.
Portrait of Sequoyah, who in the early years of the 19th century developed the first Cherokee language syllabary.
"Most of the world’s languages aren’t written. They are spoken as household or community languages that are not regularly used in any kind of literate way," said Cowell. "The pattern all over the world is that someone speaks one language at home and then they write in the national language. And so that language isn’t represented online. And even if it is, there won’t be any standardized writing system because people make it up as they go."
Adding a language to Google Translate requires the input of "hundreds of millions of words," according to Cowell. "And it needs to be what's called 'clean data,' which means that you have the same spelling and grammar conventions."
Cree is actually a series of dialects that gradually change across Canada.
"Cree is actually considered to be multiple different languages by linguists -- East Cree, Wood Cree, Swampy Cree, Plains Cree, et cetera," said Cowell. "Even within those languages, there is a good deal of regional variation. So, the 'Cree language' is more complex — and each community of speakers is smaller — than would be suggested by statements that '95,000 people speak Cree.'"
Let's talk: Why is Google Translate still so bad? Will Google Translate replace translators?