Google’s powerful search engine is defeating some court-ordered publication bans in Canada and undermining efforts to protect young offenders and victims.
Computer experts believe it’s an unintended, “mind-boggling” consequence of Google search algorithms.
In six high-profile cases documented by the Citizen, searching the name of a young offender or victim online pointed to media coverage of their court cases, even though their names do not appear anywhere in the news articles themselves.
It’s a curious anomaly that appears to apply primarily to results produced by Google’s search engine. Similar searches in Bing and Yahoo! do not link the protected names to news coverage with the same consistency.
Informed of the findings, a Google spokesperson said the company would take action based on individual complaints. “If search results that violate local laws are brought to our attention, we’ll remove them,” Google Canada’s Aaron Brindle said in a written statement.
The problem was discovered by Citizen court reporter Gary Dimmock, who found that a Google search of a young offender’s name linked to news articles about the youth’s case.
In that case, a Barrhaven youth was found guilty of making a series of fake bomb threats and 911 calls that triggered police SWAT team responses to schools, shopping malls and personal residences across North America. He was 16 at the time and, as a result, his identity was barred from publication by provisions of the Youth Criminal Justice Act.
Dimmock asked editors to investigate the situation and ensure that the newspaper hadn’t accidentally embedded the youth’s name in its website.
That subsequent investigation found the problem did not lie with the newspaper or its online source code.
There was also metadata to consider. Metadata comprises the various information fields that reporters or editors fill out to describe an article, its photos or its videos, but that are not visible to the public. Again, the name appeared nowhere in the metadata.
Further inquiries revealed a similar issue existed in at least five other high-profile Ottawa cases in which publication bans had been in effect, as well as in court cases in Montreal and Windsor.
Media outlets are privy to the names of people whose identities are protected by publication bans by virtue of having a reporter in the courtroom.
A Google search for the name of an Ottawa-based RCMP officer convicted of confining, starving and abusing his son links to coverage of that court case. The officer’s identity is protected by a judge’s order designed to shield his son from publicity.
The officer’s name had never been reported by the Citizen or any other media outlet. The abused boy, now 15, was never identified in any article published online. Yet a search of the boy’s name produces results that link to coverage of the case.
In another high-profile case, a 17-year-old was charged in November 2016 with a series of offences after local mosques, synagogues and a church were spray-painted with hateful graffiti. The young offender has never been named in online news articles, but a search of his name connects to them.
In yet another case, searching the name of a 1970s sex-assault victim revealed court coverage of the trial of a Kingston teacher and Ontario Hockey League billet found to have abused him.
In Windsor, former OHL player Ben Johnson was convicted last year of sexually assaulting a “near comatose” 16-year-old girl at a nightclub. The victim’s name, protected by a publication ban, is now tied to online court coverage of the case.
The Google feedback loop means that in select cases, court-ordered publication bans are being undermined.
Most online news articles are archived on the web without an expiry date, which means the pieces will exist there for years to come.
Potential employers, co-workers, friends, partners — anyone searching an individual’s name — can therefore be led to online coverage that exposes the individual’s past. The situation permits exactly the exposure the court-ordered bans were designed to prevent.
Ottawa lawyer Michael Crystal said he believes Google’s search results may violate Canadian law and open the company to a class-action lawsuit from those whose privacy has been violated.
“It’s very frightening that so many people could have access to this information and it’s virtually unprotected,” said Crystal, a privacy and data breach specialist.
“In the battlefield of privacy law, it seems like we’re always fighting a rearguard action in terms of protecting privacy.”
Google’s Aaron Brindle declined to answer questions about how the search engine can link a given name to court coverage that doesn’t contain it. “Hundreds of factors contribute to what search results appear for a given query,” he said, “including things like PageRank, the specific words that appear on websites, the freshness of content and your region.”
Google co-founder Larry Page once described the perfect search engine as “understanding exactly what you mean and giving you back exactly what you want.”
To that end, Google uses programming algorithms to instantly sort through trillions of web pages to produce the most relevant results. It handles more than 3.5 billion searches a day, and dominates the search engine landscape.
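PageRank, which Brindle names as one of the hundreds of factors, scores a page by the scores of the pages linking to it. The following is a minimal, invented sketch of the classic power-iteration idea, vastly simplified from the real system, which weighs many other signals:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration PageRank: each page repeatedly passes a
    share of its score along its outgoing links, plus a small
    'teleport' share so every page keeps a baseline score."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p, outgoing in links.items():
            # A page with no outgoing links spreads its score evenly.
            targets = outgoing if outgoing else pages
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        rank = new
    return rank

# Invented four-page web: "a" has two incoming links, so it wins.
web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
scores = pagerank(web)
print(max(scores, key=scores.get))  # → a
```

The graph and page names here are hypothetical; the point is only that link structure alone, with no knowledge of page content, already produces a ranking.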
Photo by Leon Neal/Getty Images
Algorithms 101
What is an algorithm?
In simple terms, an algorithm is a set of precise, step-by-step instructions to accomplish a task. By telling a friend how to reach your house from across town, you’re creating an algorithm. A recipe is an algorithm, so is the list of instructions to set up your new TV.
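The directions example above can be written as code almost word for word. This invented snippet simply shows that an algorithm is an ordered list of precise steps:

```python
def directions_to_house():
    """Return driving directions as an ordered list of steps —
    an everyday algorithm expressed as code. The route is invented."""
    steps = [
        "Head north on Main Street for 2 km",
        "Turn right onto Elm Avenue",
        "Continue for 3 blocks",
        "The house is the third on the left",
    ]
    return steps

# Carrying out the algorithm means following the steps in order.
for i, step in enumerate(directions_to_house(), start=1):
    print(f"{i}. {step}")
```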
What is a computer algorithm?
A computer algorithm is a set of instructions that automate the task you want a computer to perform. This logical set of rules — rooted in mathematics — allows a computer to solve problems quickly, efficiently and consistently: It means, for instance, that every time you use an accounting spreadsheet to make financial calculations, the correct answers are produced. If you do the same thing twice, you get the same results. The best algorithms are the most efficient, and reach the correct answer with the fewest steps.
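Both properties above — determinism (same input, same output) and efficiency (fewest steps) — show up in one of the most common textbook algorithms, binary search, which finds an item in a sorted list by halving the search range each step rather than checking every entry. A minimal sketch:

```python
def binary_search(sorted_items, target):
    """Return the index of target in a sorted list, or -1 if absent.
    Deterministic: the same inputs always produce the same answer.
    Efficient: each loop iteration halves the remaining range."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search([1, 3, 5, 7, 9], 7))  # → 3
```

For a list of a million sorted entries, this reaches an answer in at most about 20 steps, while checking entries one by one could take a million.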
How do algorithms work?
Programming languages, such as Python, JavaScript and Ruby, are used to express algorithms in a form that can be read and understood by a computer. These languages are used to write the computer code that unleashes the power of an algorithm, and allows it to solve massive problems — quickly.
What kind of problems can they solve?
Algorithms are at the heart of almost everything your computer does. When Google Maps finds you the best way to a hockey rink — identifying the fastest route from dozens of possibilities — algorithms are at work. Playing chess against the computer is based on a program that uses algorithms to predict the best possible response to every one of your moves. Search engines, which comb through billions of web pages to find the most relevant results, are also based on a complex set of algorithmic instructions.
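The route-finding example can be sketched concretely. This toy breadth-first search finds the fewest-hops path through an invented road network — a simplification of what mapping services do, since they also weigh travel times:

```python
from collections import deque

def shortest_route(graph, start, goal):
    """Breadth-first search: explore the road network outward from
    start, one hop at a time, so the first path reaching goal is
    guaranteed to use the fewest hops."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no route exists

# Invented road network: two possible routes to the rink.
roads = {
    "home": ["main_st", "elm_ave"],
    "main_st": ["rink"],
    "elm_ave": ["bridge"],
    "bridge": ["rink"],
}
print(shortest_route(roads, "home", "rink"))  # → ['home', 'main_st', 'rink']
```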
How does Google use algorithms in its search engine?
Google’s search engine uses algorithms both to analyze the words you use in a search and to produce the results that most accurately answer your query. The algorithms interpret spelling mistakes and assess words that have more than one meaning. Google has organized the internet into a massive index. Its algorithms comb that index to match keywords, while also ranking pages by how relevant and trusted their content is. The algorithms instantly analyze hundreds of factors, including the “freshness” of the content, the number of other important websites that link to it, your location and your search history.
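The “massive index” described above is, at its core, an inverted index: a map from each word to the pages containing it. This invented toy version shows the keyword-matching half of the job (ranking the matches is the harder part):

```python
def build_index(pages):
    """Build an inverted index: word -> set of pages containing it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    word_sets = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*word_sets) if word_sets else set()

# Invented two-page web for illustration.
pages = {
    "a.html": "ottawa court orders publication ban",
    "b.html": "hockey scores from ottawa",
}
index = build_index(pages)
print(search(index, "ottawa court"))  # → {'a.html'}
```

Note the limitation relevant to this story: a pure inverted index can only ever return pages that literally contain the query words, so whatever links a protected name to coverage must come from signals beyond the page text.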
The company is constantly refining its algorithms so that its searches do not simply return pages with the most matching “keywords,” but rather those with the most relevant information. Last year alone, Google made 1,600 tweaks to its search engine algorithm.
A former company executive has revealed Google search results are sometimes informed by common search patterns.
In May 2012, Amit Singhal, then a Google senior vice-president, wrote a blog post to unveil major changes to the search engine that tapped into the “collective intelligence of the web.”
“We can now sometimes help answer your next question before you’ve asked it because the facts we show are informed by what other people have searched for,” Singhal wrote.
The inner workings of Google’s search formula are a closely guarded secret. But based on publicly available information, it’s possible to make educated guesses about how publication bans are being thwarted.
In some cases — including the case of the young Ottawa swatter — a protected name was used on social media or in blogs by private individuals, not media outlets, that described the incident in question and linked to coverage. Evidence suggests Google’s algorithm learns to associate web pages with search queries when a link to the page frequently appears alongside a particular phrase.
Some people have used the algorithm to make mischief. For example, in a co-ordinated effort launched by a political blogger in 2003, the search terms “miserable failure” linked to pages about then-U.S. president George W. Bush. The results-altering practice became known as “Google bombing.”
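The anchor-text mechanism behind “Google bombing” can be sketched in a few lines. The idea is that the words placed around a link get indexed as if they described the page being linked to — so a page can rank for a phrase it never contains. All data below is invented for illustration:

```python
# Each entry: (text surrounding a link, page the link points to).
links_on_the_web = [
    ("miserable failure", "example.gov/president-bio"),
    ("miserable failure", "example.gov/president-bio"),
    ("official biography", "example.gov/president-bio"),
]

# Index the surrounding words against the *target* page, not the
# page the words actually appear on.
anchor_index = {}
for anchor_text, target_url in links_on_the_web:
    for word in anchor_text.lower().split():
        anchor_index.setdefault(word, []).append(target_url)

# The biography page now surfaces for "failure" — a word that may
# never appear on the page itself.
print(anchor_index["failure"])
```

The same mechanism, operating on blog posts and social media that pair a protected name with links to court coverage, would explain some of the cases the Citizen documented.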
Other publication ban cases are harder to explain, particularly those in which a protected name has never appeared anywhere on the web in connection to a crime. Such is the case for the unnamed RCMP officer found guilty of abusing his son.
Aristides Gionis, a computer science professor at Finland’s Aalto University, suggested the search results may be influenced by common search patterns.
For example, people who know or suspect they know the identity of someone whose name is covered by a publication ban might search for terms such as “John Doe RCMP Ottawa child abuse” and click on links to articles about the case. If enough people do that, the search engine might learn to associate those articles with searches for “John Doe,” even though the name does not appear in the article.
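Gionis’s hypothesis can be sketched as a simple counting process. This is speculation made concrete, not a description of Google’s actual system, and every query and URL below is invented:

```python
from collections import Counter

# Hypothetical click log: (what the user searched, what they clicked).
click_log = [
    ("john doe rcmp ottawa child abuse", "citizen.com/rcmp-case"),
    ("john doe rcmp", "citizen.com/rcmp-case"),
    ("john doe ottawa", "citizen.com/rcmp-case"),
    ("john doe", "citizen.com/rcmp-case"),
]

# Count how often each query term co-occurs with a clicked page.
associations = Counter()
for query, clicked_url in click_log:
    for term in query.split():
        associations[(term, clicked_url)] += 1

# Once a term crosses a (hypothetical) threshold, the engine treats
# it as describing the page — even though the page never contains it.
THRESHOLD = 3
learned = {pair for pair, count in associations.items() if count >= THRESHOLD}
print(learned)  # only ("john", ...) and ("doe", ...) survive
```

After enough such searches, the protected name alone points at the article — precisely the “hidden association” Gionis describes, learned with no human ever publishing the name alongside the coverage.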
“Search engines are smart enough to learn some hidden associations … but not smart enough to know that the associations should not be used in certain cases,” explained Gionis, an expert in data mining and algorithmic data analysis.
Publication bans: A primer
What is a publication ban?
A publication ban is a court order that prohibits anyone from publishing, broadcasting or transmitting information that identifies an individual involved in the justice system. There are different kinds of bans. Some are meant to protect the integrity of a court case before trial, others the privacy of a victim or witness, and still others safeguard the identity of a young offender.
What legislation is at work?
Canada has an open courts system, but both federal and provincial laws impose limits on what the media can publish. The Criminal Code, the Youth Criminal Justice Act, the Mental Health Act and the Child and Family Services Act all have provisions for publication bans that protect the names of individuals involved in the system.
How are bans used in criminal proceedings?
Publication bans are often imposed early in court proceedings to ensure that potential jurors are not tainted by evidence heard during bail hearings, preliminary hearings and voir dires — hearings conducted during a trial to assess the admissibility of certain evidence. Judges can also impose publication bans to protect the names of witnesses or sexual assault victims. They also have the power to impose publication bans whenever required to ensure the “proper administration of justice.”
How do publication bans apply to young offenders?
The Youth Criminal Justice Act applies to anyone who was under 18 at the time of an alleged offence. One of the law’s fundamental principles is that the identity of a young person should be protected in order to make rehabilitation easier. The offender’s identity can sometimes be published if the court imposes an adult sentence.
How has the internet affected publication bans?
The internet and social media have complicated publication bans, and made them difficult to police. In 2012, for instance, the federal government conceded its longstanding publication ban on early election results could not survive the internet age: It lifted an election night blackout on Atlantic Canada returns that had stood since 1938. “The ban … does not make sense with widespread use of social media and modern communications technology,” said Tim Uppal, then minister of state for democratic reform. The ban was in place to ensure that all Canadians cast their ballots without being influenced by early results.
University of Toronto computer science professor Periklis Andritsos, a data-mining expert, said he could not isolate a technical explanation for the phenomenon. “It’s mind boggling,” he said. “I would like to be able to say 100 per cent this is it, but I can’t.”
The best theory Andritsos could offer is that a critical number of people have used the protected name alongside similar search terms, thereby establishing a pattern of links to news coverage.
“If there are a lot of links pointing to these sites, it could be the case,” he said. “It must be accidental, but I can’t find explanations for it.”
Montreal lawyer Allen Mendelsohn said the case involves an untested area of the law since no one employed by Google would know the protected names.
“It must be Google’s technology figuring things out on its own, scraping information in an artificially intelligent way,” said Mendelsohn, an internet law specialist who teaches privacy law at McGill University.
“This could be a violation of a publication ban, but whether it is or not is a bit more complicated because Google is not seen as a publisher, per se, and that adds complexity to the situation.”
— with files from Claire Brownell, National Post