Data’s Edge: Why More Isn’t Always Better

13 June 2025
Mariam Pachuashvili, ACTR
Databases and Statistics Team Lead

 

Research Data Quality vs. Quantity – A Historical Lesson That Changed the World of Research Analytics

Imagine a scenario: your company makes a strategic decision based on feedback from 2.4 million customers. A volume of data that large naturally suggests the results must be accurate. But what happens if they are wrong?

The story of 1936 teaches us that in research, “more” does not always mean “better” – a lesson that remains crucially relevant today, in the era of big data and AI.

 

The 2.4-Million Mistake: How America’s Largest Poll Went Wrong

The Literary Digest was like the New York Times of its era – trusted, respected, influential. For nearly two decades, it had successfully predicted the outcomes of American presidential elections through mass polls. In 1936, the magazine faced another major task: predicting the result of the presidential clash between Franklin Roosevelt and Alfred Landon.

The scale of the research was impressive: to assess the contest between Roosevelt and Republican candidate Alfred Landon, the magazine mailed questionnaires to 10 million Americans.

Some 2.4 million respondents returned the questionnaire – a number impressive even by today’s standards. Based on this enormous sample, The Literary Digest confidently predicted a Landon victory with 57% of the vote.

However, the result was completely opposite: Roosevelt defeated Landon by a large margin – 523 electoral votes to just 8.

 

Quantity vs. Quality of Data

What Caused This Error?

The Literary Digest built its sampling frame from three sources: its own subscriber lists, telephone directories, and automobile registration records.

In 1936, at the peak of the Great Depression, owning a car or a telephone – and even subscribing to magazines and newspapers – was a luxury. This meant the survey reached only relatively affluent Americans: a social stratum with above-average incomes that traditionally supported Republicans.

This led to a sampling bias – a systematic deviation that rendered the results of the entire study invalid.

 

The Other Side: How the Correct Methodology Worked

For the same election, George Gallup’s American Institute of Public Opinion took a scientifically different approach. It:

  • used a sample of only 50,000 respondents;
  • built a representative sample covering all social strata;
  • correctly predicted the result with a margin of error of just 1.4%.

Thus, a correctly composed sample of 50,000 respondents achieved accuracy with 48 times fewer resources, while a flawed sample of 2.4 million produced an entirely wrong prediction.

This story demonstrates a fundamental truth for modern analytics: when a sample is not representative, the results cannot be generalized – no matter how large the data volume.
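To see why sheer size cannot rescue a biased sample, here is a toy simulation in Python. The population size, the 61% support level, and the unequal response rates are invented assumptions for illustration only; the point is that the small random sample lands near the truth while the far larger biased one does not.

```python
import random

random.seed(1936)

# Hypothetical population of 1,000,000 voters: 61% support candidate A.
population = [1] * 610_000 + [0] * 390_000
random.shuffle(population)

# Small but representative: a simple random sample of 5,000 voters.
representative = random.sample(population, 5_000)

# Huge but biased: supporters of A respond only half as often as opponents.
biased = [v for v in population if random.random() < (0.15 if v else 0.30)]

true_share = sum(population) / len(population)
print(f"True support:            {true_share:.1%}")
print(f"Random sample of 5,000:  {sum(representative) / len(representative):.1%}")
print(f"Biased sample of {len(biased):,}: {sum(biased) / len(biased):.1%}")
```

Run it and the biased sample of roughly 200,000 responses misses the true figure by double-digit percentage points, while the small random sample stays close to it.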

 

Digital Era Dilemma

Today, when companies collect millions of data points daily, the same fundamental principles apply. Social media platforms track our behavior, apps record our location, and companies’ CRM (Customer Relationship Management) systems log every customer interaction, yet quantity still cannot replace quality:

  • Online surveys often face self-selection bias – only the most motivated (and often those with the most extreme positions) customers participate, which significantly limits the generalizability of the results.
  • Social media analytics often treat the loudest opinions as the most representative (salience bias), creating the false impression that the views of active users reflect the position of society as a whole. In reality, a few very active users generate more “noise” than thousands of silent ones, and algorithms amplify precisely that noise.
  • Data in CRM systems systematically exclude passive customers (coverage bias) – that is, those who interact with the company relatively rarely. As a result, business analysis is based only on the behavior and opinions of active customers, which creates a biased picture.

The lesson from 1936 is still valid: it’s not the quantity that matters, but the quality.

 

Quality Formula: How to Ensure Reliable Analytics

  1. Probability Sampling

In modern research, probabilistic sampling is used – a method that ensures every person in the target population has a known and calculable chance of being included in the survey. This creates the foundation for forming a representative sample.
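As a minimal sketch of the idea, the Python snippet below draws a simple random sample – the most basic probability design – from a customer frame. The frame, sample size, and seed are illustrative assumptions, not a real dataset.

```python
import random

def simple_random_sample(frame, sample_size, seed=2025):
    """Draw a simple random sample: every unit in the frame has the same,
    known inclusion probability of sample_size / len(frame)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(frame, sample_size)

# Illustrative frame of 10,000 customer IDs
frame = [f"customer_{i}" for i in range(10_000)]
sample = simple_random_sample(frame, 500)

inclusion_probability = 500 / len(frame)
print(f"Each customer's chance of selection: {inclusion_probability:.1%}")  # 5.0%
```

In practice the design is usually stratified or clustered rather than purely random, but the defining property is the same: the inclusion probability of every unit is known in advance.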

  2. Weighting Methodology

Even when a sample is correctly drawn, real-world factors can still influence the research results, for example:

  • non-response bias – some groups are less likely to respond (do you answer phone calls from unknown numbers?);
  • coverage error – the survey fails to reach certain segments at all;
  • mode effects – different data collection methods yield different results (people respond differently to online, telephone, and in-person interviews).

This is where weighting becomes crucial – a statistical procedure that adjusts the data by assigning each respondent’s answer a weight based on their group’s share of the population.

Practical Example: If, in a B2B survey, small companies represent 70% of the respondents, while their actual share in the market is 40%, their responses will be weighted with an appropriate coefficient so that the final results accurately reflect the real market structure.
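A rough sketch of that adjustment, using the hypothetical B2B figures above (the satisfaction scores are invented purely for the example):

```python
# Sample vs. population composition (small firms over-represented).
sample_share = {"small": 0.70, "large": 0.30}
population_share = {"small": 0.40, "large": 0.60}

# Post-stratification weight = population share / sample share.
weights = {seg: population_share[seg] / sample_share[seg] for seg in sample_share}
# small ≈ 0.57, large = 2.0

# Illustrative average satisfaction score per segment (1–10 scale).
mean_score = {"small": 8.2, "large": 6.1}

unweighted = sum(sample_share[s] * mean_score[s] for s in sample_share)
weighted = sum(sample_share[s] * weights[s] * mean_score[s] for s in sample_share)

print(f"Unweighted estimate: {unweighted:.2f}")  # 7.57 – dominated by small firms
print(f"Weighted estimate:   {weighted:.2f}")    # 6.94 – reflects the real market mix
```

The weighted figure is the one that can be generalized to the market, because it restores each segment to its true proportion.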

 

Risk Management: How to Avoid “Million-Dollar Mistakes”

Data Quality Audit – regular assessment of sample representativeness, from research design through to final analysis.

Transparency in Limitations – open communication about research limitations, confidence intervals, and potential biases.

Responsibility in Data Interpretation – analytics must be based not only on the numbers but on a properly selected analytical framework: context, hypotheses, and what the data does not tell us.
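On the transparency point, even a simple margin-of-error calculation makes a report more honest. The sketch below uses the standard 95% normal approximation for a survey proportion under simple random sampling; the 54% estimate and 1,000-respondent sample size are illustrative.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """95% confidence interval for a survey proportion (normal
    approximation, simple random sampling, no design effect)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

low, high, moe = proportion_ci(0.54, 1_000)
print(f"Estimate: 54.0% ±{moe:.1%} (95% CI: {low:.1%} – {high:.1%})")
# Estimate: 54.0% ±3.1% (95% CI: 50.9% – 57.1%)
```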

 

Takeaway: The Principle That Changes Everything

The Literary Digest fiasco of nearly 90 years ago teaches us the golden rule of modern business analytics:

Representative Sample + Statistically Correct Weighting = Reliable and Reality-Oriented Insights for Decision-Making

This formula is not just statistical theory – it is the foundation of competitive advantage.

When companies truly understand the needs of their customers, employees, and market (and not just the loudest voices), they gain a strategic advantage that actually works.

The key is to remember: properly collected small data is better than a million noisy data points that do not reflect reality.

*ACT Research’s analytics team works with companies to ensure their research processes are methodologically reliable and business results-oriented, because you cannot make good decisions based on bad data.*
