Post-1
Errors in Statistical Analysis
Brian Boger posted Feb 3, 2022 8:24 AM
Two factors that can lead to errors in statistical analysis are misunderstanding the p-value and misinterpreting the correlation between two events. The probability value, known as the p-value, is used to determine whether there is a statistically significant impact or difference when performing statistical analysis. According to Bhatia (2017), “P value do not measure the probability that the studied hypothesis is true, or the probability that the data was produced by random choice alone. Hence, business and organizational decisions should not be based only on whether a p-value passes a specific threshold” (para. 5). Misinterpreting the correlation between two events can occur when two events appear to be related and one is assumed to cause the other (Bhatia, 2017).
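A quick way to see why a bare p &lt; 0.05 threshold is weak evidence on its own is to simulate experiments in which nothing is going on. The Python sketch below is illustrative only (the coin-flip setup and all numbers are my own, not from the cited sources): it runs many experiments on a perfectly fair coin and counts how often the exact two-sided p-value still falls below 0.05.

```python
import math
import random

N = 100  # coin flips per experiment

# Exact binomial pmf under the null hypothesis (fair coin).
pmf = [math.comb(N, k) * 0.5**N for k in range(N + 1)]

# Two-sided exact p-value for each possible head count:
# total probability of outcomes no more likely than the observed one.
pval = [sum(q for q in pmf if q <= pmf[k]) for k in range(N + 1)]

# Run many experiments in which the null hypothesis is TRUE,
# and count how often p < 0.05 anyway: pure false positives.
random.seed(0)
experiments = 2000
false_positives = sum(
    pval[sum(random.random() < 0.5 for _ in range(N))] < 0.05
    for _ in range(experiments)
)
rate = false_positives / experiments
print(f"false-positive rate: {rate:.3f}")
```

Even though the null hypothesis is true in every single experiment, a few percent of the runs still look "significant," which is exactly why a decision should never rest on the threshold alone.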
According to Velickovic (2017), “In 2012, the New England Journal of Medicine published a paper claiming that chocolate consumption could enhance the cognitive function. The basis for this conclusion was that the number of Nobel Prize laureates in each country was strongly correlated with the per capita consumption of chocolate for that country” (para. 1). The paper’s correlation was computed from group-level (country-level) data, and the conclusion was then applied to individuals. To support such a claim, individual-level data should have been collected and analyzed to determine whether consuming chocolate is actually associated with cognitive function; without that data, the claim is speculation. Scientists who identified this error and reran the analysis reached a different conclusion: there is no link between chocolate consumption and enhanced cognitive function (Velickovic, 2017). The potential consequences of this error are that chocolate sales could skyrocket and the health of children and adults could suffer from eating more chocolate than they should.
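The group-level versus individual-level trap in the chocolate example can be reproduced with a tiny synthetic dataset (all numbers invented for illustration): the country averages line up perfectly, while the pooled individuals show the opposite relationship.

```python
# Hand-built data for three hypothetical "countries".
# Each pair is (chocolate intake, test score) for one individual.
# Within every country the individual-level relationship runs the
# OPPOSITE way from the country averages.
countries = {
    "A": [(-3.0, 3.0), (3.0, -3.0)],
    "B": [(-2.0, 4.0), (4.0, -2.0)],
    "C": [(-1.0, 5.0), (5.0, -1.0)],
}

def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / (vx * vy) ** 0.5

# Group-level analysis: correlate the country AVERAGES.
means = [(sum(x for x, _ in p) / len(p), sum(y for _, y in p) / len(p))
         for p in countries.values()]
group_r = pearson(means)              # exactly +1.0: averages line up

# Individual-level analysis: pool every person into one sample.
individuals = [pt for p in countries.values() for pt in p]
individual_r = pearson(individuals)   # negative: the "effect" reverses
```

The country-level correlation is a perfect +1.0, yet the pooled individual-level correlation is strongly negative, so a conclusion drawn from group averages and applied to individuals can be not just weaker but reversed.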
One remedy for a lack of individual-level data is multilevel modeling. Group-level data can be fed into a multilevel model, which represents the hierarchical structure of the data by grouping or clustering observations and distinguishing the variables that operate at each level. When data are analyzed in this form, it becomes easier to identify or rule out relationships between attributes and thus to determine whether correlations truly exist (Gray, 2018).
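A full multilevel model estimates this with random effects, but the core idea, separating the between-group effect from the within-group effect, can be sketched without any libraries by centering each observation on its group mean. This is a minimal illustration of that decomposition, not a real multilevel model, and the data below are invented:

```python
# Clustered data: (group, x, y), with three observations per group.
data = [
    ("A", 1.0, 5.0), ("A", 2.0, 4.0), ("A", 3.0, 3.0),
    ("B", 4.0, 8.0), ("B", 5.0, 7.0), ("B", 6.0, 6.0),
    ("C", 7.0, 11.0), ("C", 8.0, 10.0), ("C", 9.0, 9.0),
]

def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

groups = {}
for g, x, y in data:
    groups.setdefault(g, []).append((x, y))

# Between-group effect: slope across the group means.
gx = [sum(x for x, _ in pts) / len(pts) for pts in groups.values()]
gy = [sum(y for _, y in pts) / len(pts) for pts in groups.values()]
between = slope(gx, gy)

# Within-group effect: slope after centering each observation on
# its own group's mean (the deviations pool across groups).
dx, dy = [], []
for pts in groups.values():
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    dx += [x - mx for x, _ in pts]
    dy += [y - my for _, y in pts]
within = slope(dx, dy)
```

Here the between-group slope is +1 while the within-group slope is -1: analyzing only group averages would completely reverse the individual-level effect, which is the same trap as in the chocolate example.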
References
Bhatia, R. (2017, August 30). Top 6 most common statistical errors made by data scientists. Analytics India Magazine. https://analyticsindiamag.com/top-6-common-statistical-errors-made-data-scientists/
Gray, K. (2018, October 1). What is multilevel modeling? The Digital Transformation People. https://www.thedigitaltransformationpeople.com/channels/analytics/what-is-multilevel-modeling/
Velickovic, V. (2017, February 6). What everyone should know about statistical correlation. American Scientist. https://www.americanscientist.org/article/what-everyone-should-know-about-statistical-correlation
Post-2
Statistical Errors and Effects
Liz Danhires posted Feb 2, 2022 8:59 AM
Analytical follies can reduce the effectiveness of using statistics to back up a specific stance. The statistical threshold of alpha = 0.05 is frequently used to classify a result as significant or non-significant: if a p-value is less than the threshold, the result is considered statistically significant. While this is useful information for testing a hypothesis, it cannot serve as the sole evidence (Makin & Xivry, 2019). Another factor that can lead to errors is interpreting correlation as causation. The relationship between two highly correlated variables could be caused by coincidence, reverse causation, or an entirely unknown third cause (Makin & Xivry, 2019). To prevent these errors from skewing results, additional tests should be performed to gather more evidence.
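The correlation-is-not-causation error is easy to reproduce when two variables share a common driver: they can correlate strongly with no direct link between them. The sketch below uses the classic ice-cream-and-drownings illustration; the numbers and the linear relationships are invented for the example.

```python
# Two series driven by the same underlying variable (a common
# cause), with no direct link between them.
temperature = [10, 14, 19, 25, 30, 28, 22, 16]     # monthly temps
ice_cream = [2 * t + 3 for t in temperature]       # sales follow temp
drownings = [0.5 * t + 1 for t in temperature]     # swimming follows temp

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(ice_cream, drownings)   # perfectly correlated, yet
                                    # neither causes the other
```

Sales and drownings track each other perfectly here, yet the only mechanism is the shared driver (temperature); intervening on ice cream sales would do nothing to drownings. Only further tests that control for the common cause can separate correlation from causation.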
As an example of a statistical error, experts predicted that a rapid increase in crime rates would occur in the early to mid-1990s. These predictions were based on a pattern of continuous growth in crime observed through the 1980s (Levitt, 2004). Instead, crime rates began to fall rather than rise, and they have generally continued to fall through the present day. In this case, errors in statistical analysis caused a failure to predict the actual change in the crime rate: the analysis did not account for factors such as an increased police presence, an aging population, receding drug usage, and the passage of laws that shrank the at-risk population (Levitt, 2004). A reduction in crime is obviously a positive thing, but the statistical error left the public searching for answers as to why the crime rate dropped rather than rose, leading to incorrect and often harmful conclusions.
Many believed that the drop in crime was due to innovative policing strategies such as stop-and-frisk, which led to public calls for expanding the program in higher-crime areas such as New York City and Chicago (Levitt, 2004). In reality, these policing strategies did not have a significant impact on the crime rate and disproportionately affected Black and Latino populations (Badger, 2020). Increased police presence, however, was associated with a 5-6% reduction in the crime rate from 1991 to 2001 (Levitt, 2004). If all of these factors had been incorporated into the statistical analysis, the reasons behind the falling crime rate would have been more widely understood, and effective measures could have been implemented across the country to reduce crime further.
References
Badger, E. (2020, March 2). The lasting effects of stop-and-frisk in Bloomberg’s New York. The New York Times. https://www.nytimes.com/2020/03/02/upshot/stop-and-frisk-bloomberg.html
Levitt, S. D. (2004). Understanding why crime fell in the 1990s: Four factors that explain the decline and six that do not. The Journal of Economic Perspectives, 18(1), 163–190.
Makin, T. R., & Xivry, J. O. (2019, October 9). Science forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife. https://elifesciences.org/articles/48175