Frequently Asked Questions
About GoodBed's Mattress Testing
This document answers common questions about how we test and evaluate mattresses, including the principles, frameworks, and safeguards that guide our methodology.
Our goal is to produce ratings that are both scientifically grounded and practically useful – translating complex performance characteristics into clear, comparable insights. The FAQs below outline how our testing is structured, how results are interpreted, and how they are used to inform product research, comparisons, and personalized recommendations. They are a complement to the more general information we provide about our scientific mattress testing methodology.
At the same time, certain elements of our methodology remain proprietary. This is intentional and necessary to preserve the integrity of the system and prevent misuse. In particular, detailed test procedures can be difficult to execute and analyze correctly, and if replicated without the necessary expertise or rigor, they can produce misinterpreted and misleading results. When presented alongside similar-looking methods, such results can create the appearance of credibility without delivering the same level of accuracy. Protecting these elements helps ensure that our testing continues to reflect genuine product performance and remains clearly differentiated from less rigorous or potentially misleading approaches.
How are various metrics tested?
Our testing is conducted using a standardized, repeatable protocol developed over four years. This comprehensive battery of lab tests is designed to measure the full set of objective mattress characteristics that we refer to as Mattress DNA™. Each test is designed to enable precise and consistent measurement of a specific attribute across a wide range of mattress types, constructions, and price points, using a unified methodology.
Most of these assessments rely on quantitative measurement, and in many cases are entirely data-driven. Where quantitative methods are not applicable or sufficiently reliable, structured qualitative assessments are used, guided by defined evaluation frameworks to maintain consistency.
Across all metrics, our goal is to assess products in both absolute and relative terms, enabling meaningful comparison while also conveying real-world performance implications. The full test battery encompasses dozens of individual tests. We do not disclose step-by-step procedures.
Are testing methods quantitative or qualitative?
Measurement approaches vary by metric. Many core performance characteristics (e.g., cooling, edge support, motion isolation, softness, memory feel, bounce, cushioning depth) are derived from quantitative measurements. Other attributes may be based on structured qualitative assessments of materials, construction, or features (e.g., suitability for specific use cases such as children or latex sensitivity). Some metrics use a hybrid approach, wherein quantitative test data is translated into normalized ratings using defined interpretive frameworks. In these cases, limited human interpretation may be applied, although we continuously work to reduce subjectivity as methods and data reliability evolve.
What scoring scales are used?
All scores are normalized onto a consistent, standardized scale that we maintain across our database to enable apples-to-apples comparisons.
Most performance metrics use a 1-10 scale, where 10 represents the best level of performance that a consumer can expect, and 1 represents the worst. The specific formulas by which we translate raw test data into normalized ratings are proprietary.
By contrast, "Feel" metrics are treated differently. These represent characteristics that can't be universally better or worse, since they're experienced subjectively based on a given sleeper's personal preferences. These metrics are presented on a 1-9 scale, with 5 representing the average across all available models.
What qualifies as a high vs low score for each metric?
Rating scales on GoodBed are designed to convey both relative standing and a clear consumer takeaway. Specifically, scores are calibrated against the full distribution of products we've tested to reflect where a product falls within the market, while also mapping that position to a practical interpretation of performance (e.g., whether performance is poor, good, or exceptional). As a result, a given score communicates not only how a product compares to alternatives, but also what conclusion a consumer should reasonably draw about its performance in that area.
For "Feel" metrics, where there are no universally better or worse ratings, a higher number simply reflects having more of that characteristic, while a lower number reflects having less. 1 and 9 represent the extreme ends of the available range of options.
For other metrics ("Fit" and "Features"), each quantitative rating is anchored to a descriptive performance category. Here are the rating categories for performance metrics on GoodBed:
- 10 - Exceptional
- 9 - Excellent
- 8 - Very Good
- 7 - Good
- 6 - Pretty Good
- 5 - OK
- 4 - Mediocre
- 3 - Fair
- 2 - Poor
- 1 - Very Poor
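The score-to-category mapping above is simple enough to express directly. The sketch below does so in Python; the labels come from this FAQ, while the function name and error handling are illustrative assumptions.

```python
# Mapping GoodBed's 1-10 performance scores to their descriptive
# categories, as listed in the FAQ. The function name and validation
# behavior are illustrative, not part of GoodBed's published system.

PERFORMANCE_LABELS = {
    10: "Exceptional",
    9: "Excellent",
    8: "Very Good",
    7: "Good",
    6: "Pretty Good",
    5: "OK",
    4: "Mediocre",
    3: "Fair",
    2: "Poor",
    1: "Very Poor",
}

def label_for_score(score: int) -> str:
    """Return the descriptive category for a 1-10 performance score."""
    if score not in PERFORMANCE_LABELS:
        raise ValueError("score must be an integer from 1 to 10")
    return PERFORMANCE_LABELS[score]
```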
Do you calculate an overall score for each mattress?
We do not calculate or publish overall scores. Mattresses are highly personal products, and the same product can perform very differently depending on an individual’s unique needs and preferences. As a result, we believe that aggregating performance into a single weighted score is inherently misleading. Instead, we present ratings for individual attributes so that each characteristic can be considered in the context of what matters most to a given consumer. In our matching system, we use each consumer’s stated requirements to dynamically weight relevant characteristics when calculating a personalized match score, placing greater emphasis on the attributes they indicate are most important and, for Feel metrics, on how closely a product aligns with their stated preferences in that area.
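The structure of a preference-weighted match score can be sketched as follows. This is a minimal illustration of the general idea, assuming performance metrics on a 1–10 scale (higher is better), Feel metrics on a 1–9 scale scored by closeness to the sleeper's stated preference, and consumer-supplied importance weights. The specific formula, names, and normalization are assumptions for illustration only; GoodBed's actual matching system is not disclosed.

```python
# Illustrative sketch of a preference-weighted match score.
# Assumptions (not GoodBed's actual formulas):
# - performance ratings on 1-10, normalized so higher is better
# - Feel ratings on 1-9, scored by closeness to the stated preference
# - per-attribute importance weights supplied by the consumer

def match_score(perf_ratings, feel_ratings, feel_prefs, weights):
    """Return a 0-100 match score for one mattress.

    perf_ratings: {attribute: rating on 1-10}
    feel_ratings: {attribute: rating on 1-9}
    feel_prefs:   {attribute: preferred value on 1-9}
    weights:      {attribute: importance weight >= 0}
    """
    total, weight_sum = 0.0, 0.0
    for attr, rating in perf_ratings.items():
        w = weights.get(attr, 0.0)
        total += w * (rating - 1) / 9          # normalize 1-10 to 0-1
        weight_sum += w
    for attr, rating in feel_ratings.items():
        w = weights.get(attr, 0.0)
        closeness = 1 - abs(rating - feel_prefs[attr]) / 8  # 1-9 range
        total += w * closeness
        weight_sum += w
    return 100 * total / weight_sum if weight_sum else 0.0
```

A product that scores 10 on an attribute weighted as important contributes fully to the match, while an unweighted attribute contributes nothing, mirroring the dynamic-weighting behavior described above.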
How long does it take to perform the testing?
Completing a full battery of tests across all metrics typically requires approximately 50–60 hours of actual testing time per mattress. The elapsed calendar time can vary based on scheduling and sequencing of tests. Additional time is required for analysis, validation, and synthesis of results. The duration of each test is consistent across all products tested.
How many times are tests repeated?
For the majority of tests, measurements are performed in triplicate, outliers are removed, and the remaining values are averaged. However, destructive tests, such as durability testing, are performed only once.
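The "measure in triplicate, drop outliers, average" procedure can be sketched in a few lines. The outlier rule used here (discard values that deviate from the median by more than a fractional tolerance) is an assumption chosen for illustration; the FAQ does not specify the actual rule.

```python
# Illustrative sketch of triplicate aggregation: remove outliers,
# then average the remaining measurements. The median-deviation
# outlier rule and tolerance value are assumptions, not GoodBed's
# actual procedure.
from statistics import mean, median

def aggregate_triplicate(values, tolerance=0.15):
    """Average repeated measurements after removing outliers.

    A value is treated as an outlier if it deviates from the median
    by more than `tolerance` as a fraction of the median.
    """
    m = median(values)
    kept = [v for v in values if abs(v - m) <= tolerance * abs(m)]
    return mean(kept) if kept else m
```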
How do ratings compare to category averages or similar mattresses in your database?
Our normalized rating system is inherently benchmarked. When developing the scales and translating raw performance data into ratings, we calibrated against a broad set of tested mattresses that reflected the full range of performance – including both the extremes and more typical outcomes. As a result, normalized scores automatically reflect category norms and relative standing within the market. This framework also allows us to contextualize a given product’s performance relative to relevant peer groups, such as similar construction types, price tiers, or feel profiles.
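One common way to express this kind of benchmark-based calibration is a percentile rank against the distribution of previously tested products. The sketch below uses that approach purely as an illustration; GoodBed's actual translation formulas are proprietary, and the percentile mapping here is an assumption.

```python
# Hypothetical sketch of benchmark-based normalization: placing a raw
# measurement within the distribution of reference products, then
# mapping that position onto a 1-10 scale. The percentile-rank approach
# and the linear mapping are illustrative assumptions.
from bisect import bisect_right

def percentile_rank(raw_value, reference_values):
    """Return the fraction of reference measurements at or below raw_value."""
    ref = sorted(reference_values)
    return bisect_right(ref, raw_value) / len(ref)

def to_ten_point_scale(raw_value, reference_values):
    """Map a percentile rank onto a 1-10 rating (illustrative)."""
    p = percentile_rank(raw_value, reference_values)
    return max(1, round(1 + 9 * p))
```

Because the reference set spans the full range of observed performance, a product's normalized score automatically encodes its relative standing in the market.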
How is firmness (softness) calibrated across different mattresses?
We refer to this metric as softness to distinguish it from supportiveness, as it's intended to describe a Feel characteristic rather than an assessment of structural support. Softness is measured using standardized tests that analyze mattress deflection under varying levels of applied force, allowing us to capture how the mattress responds at different depths. This includes separate assessments of upper-level softness and deeper softness, both of which are included in the aggregate softness rating that we publish.
To ensure consistency across different materials and constructions, we supplement traditional deflection testing (which applies force gradually) with additional measurements that apply force more rapidly. This allows us to capture differences in how materials respond under faster loading conditions, which can materially affect perceived softness (e.g., slower-responding materials may feel firmer under quick interaction than under slow compression). Incorporating both types of inputs enables us to better reflect the softness as actually experienced by a user, while maintaining comparability across products.
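Reading firmness off a force-deflection curve at different depths can be sketched as simple interpolation, loosely modeled on indentation force deflection (IFD) style testing. The depths, data format, and interpolation below are illustrative assumptions, not GoodBed's actual procedure.

```python
# Hypothetical sketch: reading force at chosen deflection depths off a
# measured force-deflection curve, to distinguish upper-level from
# deeper softness. The linear interpolation and input format are
# illustrative assumptions.

def force_at_deflection(curve, target_mm):
    """Linearly interpolate force (N) at a target deflection (mm).

    curve: list of (deflection_mm, force_N) pairs, sorted by deflection.
    """
    for (d0, f0), (d1, f1) in zip(curve, curve[1:]):
        if d0 <= target_mm <= d1:
            t = (target_mm - d0) / (d1 - d0)
            return f0 + t * (f1 - f0)
    raise ValueError("target deflection outside measured range")
```

Sampling the same curve at a shallow and a deep target then yields separate upper-level and deeper softness inputs of the kind described above.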
The softness rating scale is calibrated to reflect the full range of observed softness levels across the category and is presented as a preference-based characteristic, meaning there is no universally better or worse outcome – only what is more or less suitable for a given individual.
How is bounce measured?
Bounce characteristics are assessed using controlled, repeatable interactions designed to isolate responsiveness and energy return. We evaluate bounce at multiple depths within the mattress to reflect how it is experienced in real-world use, including surface-level bounce (e.g., light contact or initial movement), mid-level bounce (e.g., repositioning or rolling), and deeper bounce (e.g., full body weight loading such as sitting or dropping onto the mattress). These distinctions capture how each mattress responds under different conditions.
To measure this, we use drop-based interactions with precisely controlled, standardized inputs to simulate how force is applied at different intensities and depths, allowing us to observe energy return across these layers. This multi-level approach enables us to provide a more complete and realistic representation of bounce as experienced by a user, while maintaining comparability across products. Specific test mechanics are proprietary.
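For a drop-based test, energy return has a standard physics simplification: for a mass dropped from rest, potential energy scales linearly with height (E = mgh), so the ratio of rebound height to drop height gives the fraction of energy returned. The FAQ confirms drop-based interactions are used; the formula below is that common simplification, not GoodBed's proprietary test mechanics.

```python
# Illustrative energy-return calculation for a drop test, assuming
# rebound height is measured for a controlled drop. This is a textbook
# simplification (E = m*g*h), not GoodBed's actual test mechanics.

def energy_return(drop_height_m, rebound_height_m):
    """Fraction of potential energy returned by the surface (0-1)."""
    if drop_height_m <= 0:
        raise ValueError("drop height must be positive")
    return rebound_height_m / drop_height_m
```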
Are final ratings adjusted by editorial judgment?
The degree of editorial input varies by measurement type. For metrics based on qualitative or hybrid assessments, some structured judgment is applied by design. For metrics derived from quantitative testing, scores are primarily determined by the underlying data and translation formulas, with minimal editorial input once methods have reached maturity.
During the introduction or refinement of test methods, limited editorial oversight may be applied to ensure that the translation of raw data into normalized ratings remains consistent with established scales and prior results. This oversight is transitional and diminishes as methods are validated and standardized. In steady state, quantitative metrics are produced with little to no discretionary adjustment.
Can any raw test data be shared with participating brands?
We provide participating brands with a detailed report that includes the raw test outputs collected on their products. This data is shared confidentially and is intended strictly for internal use; it is not published on our website or shared with consumers, readers, or other third parties.
In addition, we can often provide contextualized insights to help interpret results, including how a given product performs relative to relevant peer groups within our database of tested products.