When Kohl’s, one of North America’s largest retail chains, was considering introducing a new product category, furniture, many executives were tremendously enthusiastic, anticipating significant additional revenue. The top management team, however, felt this innovation was too risky, as it would significantly affect the layout and logistics of the retail chain. They decided to run a six-month test in which 70 stores introduced the new category while the others did not. The 70 experimental stores showed a net decrease in revenue: products that had lost floor space to make room for the furniture experienced a drop in sales, so that Kohl’s was actually losing customers overall.

The case of Kohl’s shows that, in the absence of sufficient data to inform decisions about proposed innovations, managers often have to rely on their experience, intuition, or conventional wisdom, none of which is necessarily relevant. Managers who adopt a scientific approach can instead resort to experiments in which the company separates an independent variable, the presumed cause, from a dependent variable, the observed effect, holds all other potential causes constant, and then manipulates the former to study changes in the latter. The manipulation, followed by careful observation and analysis, yields insight into the relationship between cause and effect, which ideally can be applied to and tested in other settings.

To obtain that kind of knowledge and to ensure that business experimentation is worth the expense and effort, companies need to ask themselves the following crucial questions. The first regards the experiment’s purpose: Does the experiment have a clear purpose? Does it focus on a specific innovation under consideration? What do managers hope to learn from it? The second question regards the managers’ buy-in: Have stakeholders committed to abide by the results of the experiment?
What specific changes would be made on the basis of the results? How will the organization ensure that the results aren’t ignored? How does the experiment fit into the organization’s overall innovation agenda? The third question regards feasibility: Is the experiment doable? Does it have a testable prediction? What sample size is required to achieve reliable results? Can the organization feasibly conduct the experiment at the test locations for the required duration? The fourth question regards reliability: How can we ensure reliable results? What measures will be used to account for systematic bias, whether conscious or unconscious? Do the characteristics of the control group match those of the test group? Have any remaining biases been eliminated through statistical analysis or other techniques? Would others conducting the same test obtain the same results? The fifth question regards value: Has the organization gotten the most value out of the experiment? Does the organization now have a better understanding of which variables cause which effects?

Here are a few suggestions on how to conduct business innovation experiments effectively. The first: run experiments on a testable and falsifiable hypothesis. It is tempting to build an experiment around a question such as: Is developing a new product feature worth the cost? Or: Should we lower or increase our research and development spending? Beginning with a question that is related to the company’s strategy and to creating value for the customer is indeed the right thing to do, but it is misguided to think that a single experiment will solve large problems. The reason is simple: multiple factors go into solving big problems. Managers must have the patience to break large problems down into subproblems, develop a theory and specific hypotheses about them, and then run experiments to test those hypotheses. The second suggestion: use interventions large enough to constitute a sizable treatment.
Companies experiment when they don’t know what will work best. Faced with this uncertainty, it may sound appealing to start small in order to avoid disrupting things, but a treatment that is too small can make its effect impossible to detect. If the goal is to see whether an intervention, the innovation, will make a difference to customers, the intervention should be large enough.

The third suggestion: gather the right data. Once the intervention has been identified, it is necessary to choose what data to look at. One important design attribute of good experiments is having data to understand pre- and post-treatment effects. This difference-in-differences (diff-in-diff) approach makes it possible to isolate the differential effect of the treatment and effectively interpret the differences between the experimental and the control group. Furthermore, managers should make a list of all the internal data related to the outcome they would like to influence, and decide when the measurements will be taken. They should include data both about things they hope will change and about things they hope will not change as a result of the intervention, because it is necessary to be alert for unintended consequences. Managers should also think about sources of external data that might add perspective. For example, consider a company that wishes to launch a new cosmetic product and has to decide which type of packaging leads to the highest customer loyalty and satisfaction. It may decide to run a randomized controlled trial, for example an A/B test on different packages, across geographical areas. In addition to measuring recurring orders and helpline customer feedback (internal data), it might track online user reviews on Amazon and look for differences among customers in different areas (external data).

The fourth suggestion: identify the target population and pick the right sample. Managers should choose a subgroup of customers that matches the customer profile the company wants feedback from.
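The diff-in-diff logic described above can be sketched in a few lines. This is a minimal illustration with made-up revenue averages (the numbers and names are hypothetical, not from the text), showing how the control group’s pre/post change is subtracted out to net away trends that affect both groups:

```python
# Hypothetical pre/post average revenues (illustrative numbers only)
treatment = {"pre": 100.0, "post": 112.0}   # units that received the intervention
control   = {"pre": 100.0, "post": 105.0}   # comparable units left unchanged

def diff_in_diff(treat, ctrl):
    """Difference-in-differences estimate: the treatment group's pre/post
    change minus the control group's pre/post change. Subtracting the
    control change nets out trends common to both groups (seasonality,
    the overall economy, etc.)."""
    return (treat["post"] - treat["pre"]) - (ctrl["post"] - ctrl["pre"])

effect = diff_in_diff(treatment, control)
print(effect)  # 7.0 — the change attributable to the treatment under these assumptions
```

With these illustrative figures, the naive pre/post comparison would credit the intervention with a 12-point gain, while diff-in-diff shows that 5 of those points would have happened anyway.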
It might be tempting to take the easiest avenue to a subgroup, such as online users, but such a sample might lead to wrong inferences if it does not represent the target customers.

The fifth suggestion: whenever possible, use randomization. Randomly assign some people to a treatment group and others to a control group: the treatment group receives the innovation the company wants to test, while the control group receives what the company previously had on offer. The first rule of randomization is not to let participants decide which group to be in, or the results will be meaningless. The second is to make sure there really are no differences between the treatment and control groups. The second rule is not always easy to follow. One classic form of randomized controlled trial is the A/B test, also known as a split test. A/B tests work very well in entrepreneurial settings where the company is testing greenfield options, such as variants of a given product. But in established companies with existing products, it is necessary to identify the appropriate counterfactual and anchor the intervention to it. For example, a company might experiment by testing new features of an application on Sunday rather than on Monday. The problem is that Sunday users may be systematically different from Monday users, even if one controls for the volume of users on each day.

The sixth suggestion is not to change the experimental design midstream. Before managers run an experiment, they must decide how many observations they want to collect. This decision affects not only the cost of the experiment but also its inferential power: larger sample sizes, other things being equal, improve the quality of the estimates and of the inference. Another important decision regards the magnitude and the time horizon of the experiment, which should be chosen so that the intervention is strong enough to work and runs long enough to drive the hypothesized changes.
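The first rule of randomization, that participants never choose their own group, can be enforced mechanically by shuffling the units and splitting the list. A minimal sketch (the unit names and seed are hypothetical, for illustration):

```python
import random

def randomize(units, seed=42):
    """Randomly split experimental units into treatment and control.
    The assignment is made by the experimenter's shuffle, never by the
    participants themselves (first rule of randomization). A fixed seed
    makes the split reproducible for auditing."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (treatment, control)

customers = [f"customer_{i}" for i in range(100)]
treatment_group, control_group = randomize(customers)
print(len(treatment_group), len(control_group))  # 50 50
```

With large enough samples, random assignment makes the second rule (no systematic differences between groups) hold in expectation; in practice managers should still compare observable characteristics of the two groups before launching.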
Rigorous experiments require that managers accept the results and make decisions based on the evidence they offer. If managers get the result they expected, this will encourage them to pursue the idea or change contained in the intervention. If not, something important has still been learned, and it can inform further experimentation. It is entirely wrong to make other interventions during an experiment, to change the decision threshold, or to keep running the experiment until the results look good. Managers should stick to the experimental design and plan to the extent possible.
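Fixing the number of observations before the experiment starts, rather than running until the results look good, is usually done with a power calculation. A minimal sketch using the standard two-group formula for a difference in means; the default z-values correspond to roughly 5% significance and 80% power, and the effect size and standard deviation below are hypothetical inputs, not figures from the text:

```python
import math

def sample_size_per_group(effect, sd, z_alpha=1.96, z_beta=0.84):
    """Observations needed per group to detect a difference in means of
    `effect`, given outcome standard deviation `sd`, using the standard
    formula n = 2 * (z_alpha + z_beta)^2 * (sd / effect)^2.
    Defaults approximate a two-sided 5% test with 80% power."""
    return math.ceil(2 * ((z_alpha + z_beta) ** 2) * (sd / effect) ** 2)

# Hypothetical example: detect a 5-unit lift when the outcome's sd is 10
print(sample_size_per_group(5, 10))  # 63 per group
```

Halving the detectable effect quadruples the required sample, which is why underpowered “start small” experiments so often end inconclusively.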