Large sets of numbers: how can statistics reveal whether they have been faked? (partly reposted)

贵圈

https://bbs.comefromchina.com/threads/1733668/
Joined
2014-10-21
Messages
13,078
Honor score
2,545
Reputation points
273
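In everyday life we constantly come across large amounts of naturally and randomly generated data: population figures, land areas, product sales, the financial statements of large companies, vote counts in elections, and so on. In such naturally occurring data, how often does a number begin with the digit 1 (e.g., 1732, 12, 136, 1, 16, 198539, 108934556...)? Going by intuition, most people would say 1 in 9, or about 11.1%.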

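That intuition, however, is wrong. In many real-world datasets of this kind, numbers beginning with 1 make up about 30.1% of the total, roughly three times the intuitive figure! And numbers beginning with 9? Only about 4.6%! Surprising, isn't it?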

In 1938 the American physicist Frank Albert Benford described this regularity and expressed it precisely as a formula, which later became known as Benford's Law (Figure 1).
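Benford's formula gives the probability that a number's leading digit is d (d = 1, ..., 9) as P(d) = log10(1 + 1/d). A minimal Python sketch of that formula (the function name `benford_first_digit` is our own, chosen only for illustration) reproduces the 30.1% and 4.6% figures quoted above:

```python
import math

def benford_first_digit(d: int) -> float:
    """Benford probability that the leading digit equals d, for d in 1..9."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Print the expected share of each leading digit as a percentage.
for d in range(1, 10):
    print(d, f"{benford_first_digit(d):.1%}")
```

The nine probabilities add up to exactly 1 because the terms telescope: log10(2/1) + log10(3/2) + ... + log10(10/9) = log10(10) = 1.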



 

贵圈

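This is a commonly used scientific tool for detecting large-scale data fabrication. Anyone who has done research, especially research involving large amounts of data, should be familiar with it.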
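One common way to turn this into a screening check is a Pearson chi-square goodness-of-fit test on the first digits. The sketch below is a minimal illustration under stated assumptions, not a forensic tool: the 5% significance level, the helper names (`leading_digit`, `benford_chi_square`, `red_flag`) and the two toy datasets are our own choices for demonstration. A large statistic only marks data for closer inspection.

```python
import math
from collections import Counter

# Expected Benford first-digit probabilities, P(d) = log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Chi-square critical value for 8 degrees of freedom (9 digits - 1) at the 5% level.
CHI2_CRIT_8DF_05 = 15.507

def leading_digit(n: int) -> int:
    """First significant digit of a positive integer."""
    return int(str(n)[0])

def benford_chi_square(values) -> float:
    """Pearson chi-square statistic of observed first digits against Benford."""
    digits = [leading_digit(v) for v in values if v > 0]
    total = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d, p in BENFORD.items():
        expected = total * p
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat

def red_flag(values) -> bool:
    """True if the first digits deviate significantly from Benford (a screen, not proof)."""
    return benford_chi_square(values) > CHI2_CRIT_8DF_05

powers_of_two = [2 ** n for n in range(1000)]   # spans about 300 orders of magnitude
uniform_3digit = list(range(100, 1000))         # each first digit appears exactly 100 times

print(round(benford_chi_square(powers_of_two), 2), red_flag(powers_of_two))
print(round(benford_chi_square(uniform_3digit), 2), red_flag(uniform_3digit))
```

The powers of 2 conform closely to the law, while the flat distribution of 100 to 999 is rejected immediately. Neither result, on its own, says anything about how the numbers were produced.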
 

贵圈

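So why is this statistical regularity in numbers considered science? Because it simply is science.

If a scientific paper was found to contain this kind of statistical anomaly, its author could lose their job over it and never again find a foothold in the scientific community. (At least, that used to be the case.)

In today's scientific community, it's hard to say.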
 

贵圈

How exactly is it fake? Go on, tell us.
 

rottenmelon

Senior member
Joined
2016-12-06
Messages
4,684
Honor score
1,029
Reputation points
223
If the mainstream media say it's fake, then it's fake. Why make such a fuss?
 

livingeverywhere

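If you delete my posts, that just shows how afraid you are of what I say. Anyone who believes JB and Kamala Harris really won 81 million votes basically has a cognitive impairment; stay away from them.
Joined
2008-08-02
Messages
9,565
Honor score
1,162
Reputation points
373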
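If the mainstream media say it's fake, then it's fake. Why make such a fuss?
Hahaha, exactly what 春长 says. But the various mainstream outlets don't agree with each other either, so what then?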
 

贵圈


livingeverywhere


lindamy

Square dancing goes on in Times Square
VIP
Joined
2005-11-23
Messages
20,351
Honor score
4,508
Reputation points
373

Fact check: Deviation from Benford’s Law does not prove election fraud

By Reuters Staff

Social media users have been sharing posts that say a mathematical rule called Benford’s Law provides clear proof of fraud in the U.S. presidential election. However, research papers and academics consulted by Reuters consistently say that deviation from Benford’s Law does not prove election fraud took place.

Benford’s law says that in many naturally occurring sets of numbers, the first digits of these numbers (eg. the ‘1’ in ‘15’) are not evenly distributed.

Measurements with a lower first digit occur more frequently: 1 is the first digit in a number about 30 percent of the time while 9 begins less than 5 percent of numbers. In certain data sets ranging from rainfall amounts to town populations, the numbers follow a Benford’s Law distribution. Deviation of data from Benford’s law has been examined in areas such as finance to detect if something is not right, for example fraud, mistakes or misstatements (here, here).

The posts, such as those here and here, show graphs that compare candidates’ vote tallies by leading digit with the distribution expected under Benford’s law, in order to contend that Biden’s vote tallies do not follow Benford’s Law but Trump’s do. Posts state that Benford’s law is a test that has been used before to detect fraud (here). Captions on the posts include: “Joe Biden’s votes violate Benford’s Law”; “It’s easy to win if you cheat”; “Statistically impossible odds […] now MATH doesn’t even agree with their faux victory.”

Reuters sought comment from experts regarding these claims.

Theodore P. Hill, Professor Emeritus of Mathematics at Georgia Tech, Atlanta, cautioned that regardless of the distribution uncovered, the application of Benford’s Law would not provide definitive evidence that fraud took place.

“First, I’d like to stress that Benford’s Law can NOT be used to ‘prove fraud’,” he told Reuters by email. “It is only a Red Flag test, that can raise doubts. E.g., the IRS has been using it for decades to ferret out fraudsters, but only by identifying suspicious entries, at which time they put the auditors to work on the hard evidence. Whether or not a dataset follows BL proves nothing.”

Walter Mebane, Professor at the Department of Political Science and Department of Statistics at the University of Michigan (here), authored a December 2006 article (here) on the application of Benford’s Law to US presidential election results. The article noted some limitations of the approach, but said in its Abstract: “The test is worth taking seriously as a statistical test for election fraud.”

Nevertheless, in its Discussion section Mebane’s article also said of the second-digit (2BL) test: “In any case, the 2BL test on its own should not be considered proof either that election fraud has occurred or that an election was clean. A significant 2BL test result can be caused by complications other than fraud. Some kinds of fraud the 2BL test cannot detect.”

On Nov. 9, 2020, in response to “several queries”, Mebane published a paper called “Inappropriate Applications of Benford’s Law Regularities to Some Data from the 2020 Presidential Election in the United States” (here). His paper says, “The displays shown at those sources using the first digits of precinct vote counts data from Fulton County, GA, Allegheny County, PA, Milwaukee, WI, and Chicago, IL, say nothing about possible frauds” before examining the reasons behind this statement.

“It is widely understood that the first digits of precinct vote counts are not useful for trying to diagnose election frauds,” he writes.

Elsewhere, a study called “Benford’s Law and the Detection of Election Fraud”, published in 2011 by Joseph Deckert, Mikhail Myagkov, Professor of Political Science at the University of Oregon (here) and Peter Ordeshook, Professor of Political Science at Caltech (here), found that Benford’s Law was “problematical at best” when applied to elections: “We find that conformity with and deviations from Benford's Law follow no pattern. […] Its “success rate” either way is essentially equivalent to a toss of a coin, thereby rendering it problematical at best as a forensic tool and wholly misleading at worst.” (here)

Dr Jen Golbeck, Professor of the College of Information Studies at the University of Maryland (www.cs.umd.edu/~golbeck/), said in a thread on Twitter (here) that the claims in the social media posts are false, citing the above article. She told Reuters, “There is just not solid evidence that Benford works in elections at all. The results are profoundly mixed. Which means it’s not evidence of anything.”

Golbeck points out that the numbers on some graphs being cited by social media users are not even labelled, whilst the law “works on very specific types of numbers”. She added that none of the research that analyzes the Benford Law is as simplistic as the analysis people are posting: instead, research uses “quite advanced statistical techniques”, often looking at the second digits which have their own expected distribution.
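The second-digit distribution mentioned here (the one Mebane's 2BL test relies on) follows from the same logarithmic law: the probability that the second digit is d2 (0 to 9) is the sum, over every first digit k = 1..9, of log10(1 + 1/(10k + d2)). A short Python sketch, with an illustrative function name of our own:

```python
import math

def benford_second_digit(d2: int) -> float:
    """Benford probability that the second digit equals d2, for d2 in 0..9."""
    if not 0 <= d2 <= 9:
        raise ValueError("second digit must be between 0 and 9")
    # Add up the probabilities of every two-digit prefix 10k + d2, k = first digit 1..9.
    return sum(math.log10(1 + 1 / (10 * k + d2)) for k in range(1, 10))

for d2 in range(10):
    print(d2, f"{benford_second_digit(d2):.1%}")
```

The result is much flatter than the first-digit law, falling only from roughly 12.0% for a second digit of 0 to about 8.5% for 9, which is one reason a simple glance at a bar chart is not enough.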

The specific case of the Milwaukee results was also examined by Professor Boud Roukema of Poland’s Nicolaus Copernicus University, who has previously studied the application of Benford’s Law to the 2009 Iranian elections (arxiv.org/abs/0906.2789). He told Reuters by email: "A major flaw in applying Benford's law to the Milwaukee results is that the logarithmic distribution - how many "powers of tens" there are - in the numbers of votes per ward in Milwaukee is very narrow. In other words, half of all the wards have total votes from about 570 to 1200, and the logarithmic average (mean) is about 800.

“Biden overall got about 70% of the votes in Milwaukee. So the most likely vote for Biden (in the simplest model, assuming no falsification) in a typical Milwaukee ward is something like 0.7 times 800, which is 560 votes. We expect about half the Biden votes to lie between about 400 and 850 in typical Milwaukee wards.

“So the most popular first digit of the votes for Biden should be 5 - the first digit of 560 - and 4s and 6s and 7s should also be reasonably frequent.

“This is just what we see in the blue vertical bars in top left figure in the diagram at (here). So Benford's law reasoning, applied to the real data, shows no reason to suspect fraud here.”
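Roukema's argument can be checked with a back-of-the-envelope model. The sketch below assumes, purely for illustration, that ward totals are log-normally distributed with a median of about 800 and an interquartile range of roughly 570 to 1200 (the figures in his quote), and that Biden receives a flat 70% in every ward. Real ward-level shares vary, so this is a simplification and not Roukema's own analysis; the function names are hypothetical. Instead of simulating, it computes the first-digit probabilities directly from the normal CDF of log10(votes).

```python
import math

BENFORD_1 = math.log10(2)  # Benford's share for a leading 1, about 30.1%

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a normal distribution, via math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def lognormal_first_digits(log10_median: float, log10_sigma: float) -> dict:
    """First-digit probabilities when log10(value) ~ Normal(log10_median, log10_sigma)."""
    probs = {}
    for d in range(1, 10):
        p = 0.0
        for k in range(-5, 15):  # decades 10^-5 .. 10^14, far wider than needed here
            lo, hi = k + math.log10(d), k + math.log10(d + 1)
            p += normal_cdf(hi, log10_median, log10_sigma) - normal_cdf(lo, log10_median, log10_sigma)
        probs[d] = p
    return probs

# Assumed inputs loosely based on the Milwaukee figures quoted above (illustrative only).
ward_median = 800
q1, q3 = 570, 1200
# For a normal distribution the interquartile range is about 1.349 standard deviations.
log10_sigma = (math.log10(q3) - math.log10(q1)) / 1.349
biden_median = 0.7 * ward_median  # hypothetical flat 70% share, about 560 votes

for d, p in lognormal_first_digits(math.log10(biden_median), log10_sigma).items():
    print(d, f"{p:.1%}")
```

This crude model does not reproduce Roukema's exact digit ranking, but it shows the main point: with no fraud simulated at all, a leading 1 appears only about 17% of the time instead of Benford's 30.1%, and the digits 4, 5 and 6 together account for close to 39%. A narrow range of magnitudes alone pushes the first digits away from Benford.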

The academic and digital research coalition Election Integrity Partnership also cautioned against the conclusion that deviation from Benford’s Law is evidence of election fraud (here). It pointed out that for the law to hold, all numbers must be equally likely to appear and the numbers must span multiple orders of magnitude (e.g., ranging from 100 to 10,000,000). They say that one of these conditions is not met everywhere in the election: “For vote tallies, all numbers are equally likely, but not all states meet the second assumption. In the state of Nevada, Esmeralda County has around 900 people while Clark County has over 2,250,000 people. In the state of Vermont, the bounds are much narrower.”
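The second condition is easy to quantify as log10(max / min), the number of powers of ten a dataset spans. A tiny Python helper (hypothetical, not taken from any of the cited sources), applied to the two example ranges in the paragraph above:

```python
import math

def orders_of_magnitude(values) -> float:
    """How many powers of ten the positive values span: log10(max / min)."""
    positive = [v for v in values if v > 0]
    return math.log10(max(positive) / min(positive))

print(round(orders_of_magnitude([100, 10_000_000]), 2))  # the example range quoted above
print(round(orders_of_magnitude([900, 2_250_000]), 2))   # Esmeralda vs. Clark County, Nevada
```

The example range covers five orders of magnitude and the two Nevada counties about 3.4; the text gives no figures for Vermont, only that its range is much narrower.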

VERDICT

False. The degree to which Benford’s Law can be used as an indicator of electoral fraud has been debated by academics, but the application of the rule to the leading digit of local vote tallies is problematic and apparent deviation from the law cannot be used alone to prove electoral fraud, experts say.

This article was produced by the Reuters Fact Check team. Read more about our fact-checking work here.

 

贵圈


lindamy

Believe whichever one you like.
 

rottenmelon

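The mainstream media have huge professional teams collecting and analyzing data,
so what they show the uninformed masses is the truth, presented through data and the facts extracted from it...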
 