Adaptive Hybrid Methods for Choice-Based Conjoint Analysis: A Comparative Study

Adaptive choice-based conjoint analysis (ACBC) and hybrid individualized two-level choice-based conjoint analysis (HIT-CBC) were developed to improve standard choice-based conjoint analysis through additional interviewing techniques. Both methods have demonstrated their applicability in comparison to standard choice-based conjoint methods. The objective of our study was a direct comparison of the two adaptive hybrid methods ACBC and HIT-CBC. To this end, we analysed the previous comparative literature on the methods and used the results to conduct both a Monte Carlo simulation study and an empirical study for validity comparisons. The simulation study confirms the vulnerability of HIT-CBC to incorrect respondent ratings in the last part of the questionnaire. The empirical findings reveal an advantage of ACBC in comparison to the current version of HIT-CBC. We conclude that the rating tasks in the last section of HIT-CBC questionnaires reduce the predictive validity of the method and suggest an improvement to HIT-CBC.


Introduction
Conjoint analysis is widely used (Hauser et al., 2006). However, there are many different conjoint analysis methods, and selecting the best method for a (research) problem is a difficult task for researchers and practitioners. Several new methods have arisen in the field of choice-based conjoint analysis (CBC) in recent years. In the present article, we evaluate two adaptive hybrid methods that were developed for product concepts with many attributes and levels.
In classical CBC choice tasks, respondents choose their preferred (product) concept from a set of two or more concepts (Louviere & Woodworth, 1983; Batsell & Louviere, 1991). Because such choice decisions are similar to (purchase) decisions in reality, it is argued that CBC performs better than other methods for preference measurement (Louviere & Woodworth, 1983; Toubia et al., 2004), a claim supported by several studies (Louviere et al., 2004; Toubia et al., 2004; Eggers & Sattler, 2009). In recent years, online surveys have garnered increasing popularity, and rising online processing power allows for adapting questions on the basis of prior responses (Toubia et al., 2004). These advances led to methods that create respondent-specific choice tasks during the survey (e.g., Toubia et al., 2004 and Toubia et al., 2007 suggested (probabilistic) polyhedral methods, and Yu et al., 2011 suggested Bayesian methods for adaptive design generation).
Adaptive hybrid approaches use additional interview techniques to improve the efficiency of the choice tasks. This approach is known from adaptive conjoint analysis (ACA), introduced in the 1980s (Johnson, 1987). Adaptive hybrid methods for CBC were suggested by Johnson and Orme (2007) and Eggers and Sattler (2009). Eggers and Sattler (2009) introduced the hybrid individualized two-level choice-based conjoint approach (HIT-CBC), while Johnson and Orme (2007) developed adaptive choice-based conjoint (ACBC). It is claimed that, due to the more elaborate interview technique, the new adaptive hybrid methods can handle product concepts with many attributes and levels more efficiently (Johnson & Orme, 2007; Eggers & Sattler, 2009).
In previous studies, ACBC and HIT-CBC demonstrated their applicability in comparison to the standard CBC approach, which does not include additional interviewing techniques. Our study intends to offer the first direct comparison of these two adaptive hybrid methods. We particularly aimed to compare the performance of those methods in the case of complex products with many attributes and levels. Therefore, we conducted a Monte Carlo simulation study and an empirical study on car preferences for our validity comparisons.
In section 2, we provide the methodological background of ACBC and HIT-CBC. In section 3, we give an overview of previous comparative studies on either ACBC or HIT-CBC. In section 4, we present the results of the Monte Carlo simulation, followed by the results of our empirical study in section 5. The discussion of the results and the conclusion follow in the final section.

Adaptive Choice-Based Conjoint (ACBC)
ACBC was established by Johnson and Orme (2007), and several researchers have applied it in recent years. For example, Chapman et al. (2009) conducted a study on consumer electronics, Jervis et al. (2012) studied sour cream, Boesch and Weber (2012) studied the industrial milk, potato and wheat markets, and Heinzle et al. (2013) carried out a study on real estate investments.
The core of ACBC is the CBC portion of the questionnaire, but, due to the adaptive approach, it is possible to reduce irrelevant information and to make tasks more engaging for the respondents. The researcher defines the attributes and attribute levels before starting data collection in an empirical study. The questionnaire can then be constructed with the Sawtooth Software package. In general, ACBC questionnaires can be divided into four main sections, which are illustrated in appendix B.
In the configurator section, respondents indicate the best level for each attribute. When the analyst assumes a general preference order for specific attributes, the evaluation can be omitted for those attributes. Different product concepts are shown in the screening section. Here, respondents are not asked to make final choices; instead, they indicate whether or not each concept is a possible choice. The design of the concepts is regulated so that mainly the individually preferred levels from the configurator section are presented in the screening section.
If potential non-compensatory screening rules can be detected from previous questions, respondents have to state whether an attribute level is a must-have or an unacceptable level. If several such levels are detected, respondents must choose the most desired or least acceptable. In the choice task section, respondents evaluate the remaining concepts from the screening section. The "winning" concept in each choice set advances to the next stage until a final winner is identified.
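The tournament logic of the choice task section can be illustrated with a minimal sketch. It assumes that each task shows up to three concepts and that the winner rejoins the pool of remaining concepts; the function name `run_tournament`, the concept labels and the utility values are hypothetical, and the order in which the actual software forms the choice sets may differ.

```python
import random

def run_tournament(concepts, pick_winner, per_task=3):
    """Reduce a pool of concepts to a single winner.

    Each task shows up to `per_task` concepts; the chosen concept
    advances, the others are dropped (assumed pooling order).
    """
    pool = list(concepts)
    n_tasks = 0
    while len(pool) > 1:
        task, pool = pool[:per_task], pool[per_task:]
        pool.append(pick_winner(task))  # the "winning" concept advances
        n_tasks += 1
    return pool[0], n_tasks

# Hypothetical respondent who always picks the concept with the highest utility.
rng = random.Random(1)
utilities = {f"concept_{i}": rng.random() for i in range(16)}
winner, n_tasks = run_tournament(
    utilities, lambda task: max(task, key=utilities.get)
)
```

With 16 concepts and three concepts per task, this sketch needs eight tasks to identify the final winner.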
The design algorithm of ACBC generates random designs that are not optimal with regard to design efficiency measures. Nevertheless, the setup of the algorithm ensures near-orthogonality and high design efficiency values.
In ACBC, individual part-worth utility values are estimated with the Hierarchical Bayes algorithm. Apart from the data from the choice task section, further information can be used for utility estimation. Every choice from the configurator section may be coded as a choice task where respondents choose 1 of K levels. Information from the screening section can be coded as binary choices where the respondent is assumed to compare the utility of the concept to the utility of a constant threshold prior to making a choice. Part-worth thresholds for the none-option can be estimated via the dropped concepts from the screening section.
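The two coding rules described above can be sketched as follows. This is only an illustration of the idea: the record layout (dictionaries with the keys `alternative` and `chosen`) and the function names are assumptions made for this example and do not reflect the internal data format of any software package.

```python
def code_configurator(levels, chosen_level):
    """Code one configurator answer as a choice among K single-level alternatives."""
    return [
        {"alternative": level, "chosen": int(level == chosen_level)}
        for level in levels
    ]

def code_screener(concept, accepted):
    """Code one screening answer as a binary choice: concept vs. constant threshold."""
    return [
        {"alternative": concept, "chosen": int(accepted)},
        {"alternative": "threshold", "chosen": int(not accepted)},
    ]

# Hypothetical example: an attribute with three levels and one screened concept.
byo_task = code_configurator(["petrol", "diesel", "hybrid"], "hybrid")
screening_task = code_screener(("hybrid", "premium equipment"), accepted=False)
```

In both cases exactly one alternative per task is marked as chosen, so the records can be appended to the regular choice data before estimation.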
Additionally, several more optional components are available in ACBC. For example, a calibration section at the end of the survey can be used to estimate a further part-worth threshold for the none-option. Furthermore, the summed prices approach enables the analyst to incorporate conditional prices.

Hybrid Individualized Two-Level Choice-Based Conjoint Approach (HIT-CBC)
HIT-CBC was established in Eggers and Sattler (2009). In addition to the initial application of Eggers and Sattler, we found two further published studies using this methodology (Kaltenborn, 2012; Zenker et al., 2013). Eggers and Sattler (2009) conducted a study on flights, Kaltenborn (2012) on the German telecommunication market, and Zenker et al. (2013) applied HIT-CBC to residence location choice.
In HIT-CBC, the core of the questionnaire is the CBC portion. The identification of best and worst levels in the initial section makes it possible to simplify the choice design to a fixed 2^k design, which is design-efficient and Pareto-optimal.
HIT-CBC surveys can be divided into the following three main stages, which are also illustrated in appendix C. In the first stage, respondents identify the best and worst levels for every attribute. When the analyst assumes a general preference order for specific attributes, the evaluation of the best and worst levels can be omitted for those attributes. Only the best and the worst levels of each attribute are included in the choice task section, in which a none-option can optionally be incorporated. In the last stage, respondents evaluate the remaining levels on a rating scale with the identified best and worst levels as the maximum and minimum scaling points.
HIT-CBC uses a fixed design approach. Because only the best and worst levels identified in the first section appear in the choice tasks, every design reduces to a 2^k factorial. A shifting procedure algorithm is applied for the generation of the 2^k design (Burgess & Street, 2003). The algorithm is easy to implement and ensures design efficiency and orthogonality. Due to the knowledge of best and worst levels, dominated choice sets can be excluded a priori. As a consequence, the remaining choice sets are Pareto-optimal.
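The design logic described above can be illustrated with a simplified sketch. It assumes the coding 1 = best level and 0 = worst level, uses the full 2^k factorial as the starting design and a single all-ones generator to build paired choice sets. These are illustrative assumptions and not the exact construction of Burgess and Street (2003), which typically starts from a fractional design and may use several generators.

```python
from itertools import product

def shifted_pairs(k, generator=None):
    """Pair every profile of a 2^k factorial with its shifted copy (mod 2)."""
    generator = generator or (1,) * k  # assumed generator: flip every attribute
    pairs = set()
    for profile in product((0, 1), repeat=k):
        shifted = tuple((x + g) % 2 for x, g in zip(profile, generator))
        pairs.add(frozenset((profile, shifted)))  # drop duplicate pairs
    return sorted(tuple(sorted(pair)) for pair in pairs)

def dominates(a, b):
    """True if concept a is at least as good as b on every attribute and differs."""
    return a != b and all(x >= y for x, y in zip(a, b))

def pareto_optimal(pairs):
    """Keep only choice sets in which no concept dominates the other."""
    return [(a, b) for a, b in pairs if not dominates(a, b) and not dominates(b, a)]

design = pareto_optimal(shifted_pairs(3))
```

For k = 3, the shifting step yields four distinct pairs; the pair that contrasts the all-worst with the all-best concept is dominated and is removed, leaving three choice sets.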
An advantage of the 2^k design is that the number-of-levels effect (NLE) cannot occur in HIT-CBC. The NLE can occur when the numbers of levels are not equally distributed among the attributes. It biases the results since attributes with a higher number of levels tend to have an artificially higher attribute importance (Wilde et al., 2008). Individual part-worth utilities are estimated with the Hierarchical Bayes algorithm. Because of the 2^k design, only parameters for the best and worst levels must be estimated with the algorithm. Utilities for the remaining levels are adjusted according to the ratings of the interpolation section.
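One plausible reading of this adjustment is a linear interpolation between the estimated part-worths of the worst and best levels. The sketch below assumes a rating scale from 0 (worst level) to 10 (best level) and a linear mapping; both the function name and the linearity are assumptions made for illustration.

```python
def interpolate_partworth(rating, u_worst, u_best, scale_min=0, scale_max=10):
    """Place an intermediate level between the worst and best part-worths.

    Assumes the rating is linear in utility (an illustrative assumption).
    """
    share = (rating - scale_min) / (scale_max - scale_min)
    return u_worst + share * (u_best - u_worst)

# Hypothetical estimates for one attribute of one respondent.
u_mid = interpolate_partworth(rating=5, u_worst=-1.0, u_best=1.0)
```

Under this reading, any error in the rating translates proportionally into an error in the intermediate-level utility, which is why the quality of the ratings matters for the method.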
An additional section (WTP elicitation) can be incorporated after the identification of the best and worst levels. In that stage, respondents state their willingness to pay for two concepts. One consists of the best levels for every attribute; the other, of the worst levels. Consequently, prices for every other concept must lie between these two stated prices.

ACBC-Related Validity Comparisons
We identified four studies dealing with the validity of ACBC. These studies all compared the validity of ACBC to that of standard choice-based conjoint models. Validity was evaluated in terms of hit rates and (market) share predictions for holdout tasks. Three of these studies showed a predominance of ACBC: Johnson and Orme (2007) evaluated validity and several qualitative criteria in two studies. The first study was on laptops (ten attributes). ACBC showed comparable market share predictions (in terms of mean absolute error) and significantly greater hit rates than standard CBC. Even though the median interviewing time in ACBC was almost twice the time in standard CBC, the majority of the respondents preferred ACBC. The second study of Johnson and Orme (2007) was on recreational equipment (eight attributes) and produced comparable results. However, a significant difference in hit rates was not observable. Orme and Johnson (2008) studied home purchases (ten attributes). They compared three different setups of ACBC and standard CBC. With regard to hit rates, ACBC yielded favourable results, although differences between ACBC and CBC were not significant. In terms of market share predictions, descriptive analysis of mean absolute error implied a predominance of ACBC. The qualitative results (e.g., respondents' enjoyment of the tasks in the questionnaire) demonstrated that respondents preferred ACBC. Chapman et al. (2009) studied a consumer electronics product with eight attributes. In terms of within-subject preference (hit rates in holdout tasks), CBC performed slightly better. However, a significant difference could not be observed. Additionally, the authors analysed the performance of the methods with regard to observed market data, with better results for ACBC.

HIT-CBC-Related Validity Comparisons
We identified two studies dealing with the validity of HIT-CBC. Eggers and Sattler (2009) compared HIT-CBC to standard CBC. They conducted a simulation study and an empirical study on European flights (six attributes). The simulation study showed a predominance of HIT-CBC in two different scenarios. Variances in utility estimates were smaller, and medians of the utility estimates were closer to the theoretical values. The predominance of HIT-CBC was clear, especially in the case of heterogeneity among "respondents." In the empirical study, no significant difference between the methods was observed with regard to hit rates for holdout tasks. Market share predictions led to similar results. Additionally, they found that the overall enjoyment of the respondents was similar to CBC. Kaltenborn (2012) studied mobile phone contracts (seven attributes) and compared the validity of HIT-CBC, Graded Paired Comparison and standard CBC. On the one hand, he found significantly greater hit rates for CBC compared to Graded Paired Comparisons and HIT-CBC. On the other hand, market share predictions for HIT-CBC yielded the best results. Therefore, a clear predominance of one method was not observable in this study.

Summary
Few scientific studies have compared ACBC and HIT-CBC to other methods. The ACBC-related studies identified in section 3.1 compare the validity of ACBC to standard CBC. The results imply a predominance of the adaptive hybrid method compared to standard CBC. For HIT-CBC, we identified only two comparative studies. The study of Kaltenborn (2012) analysed hit rates and market shares, leading to inconsistent results. The study of Eggers and Sattler (2009) compared HIT-CBC to standard CBC. The empirical study revealed similar results for HIT-CBC and CBC. Their simulation study revealed a predominance of HIT-CBC when ratings for the intermediate level utilities (levels between best and worst levels) of HIT-CBC were correct. In practice, this is only true if the ratings in section three of the HIT-CBC questionnaire are carried out correctly by the respondents. In the case of incorrect ratings, the utilities are biased and the results may differ. However, Eggers and Sattler (2009) studied this effect only a posteriori and not as a part of the simulation study. In our simulation study, we investigate the effect of incorrect ratings in the HIT-CBC questionnaire directly (section 4).

Simulation Study
Previous research on the adaptive hybrid methods ACBC and HIT-CBC compared these methods to standard CBC (and in one case to Graded Paired Comparisons). In our review of the literature, we did not identify a study that directly compares ACBC and HIT-CBC, either in simulations or in empirical studies. Therefore, we compare ACBC, HIT-CBC and standard CBC in our simulation study. Within that study, we defined eight attributes: three attributes having five levels, two attributes having four levels, two attributes having three levels and one attribute having two levels. Theoretical part-worth utilities were constructed according to the suggestions of Arora and Huber (2001) and Eggers and Sattler (2009) and can be found in appendix A. In order to account for the effect of preference heterogeneity, we varied the theoretical utility values among three groups. Every group consisted of 50 simulated respondents, leading to a sample size of n = 150 simulated respondents. Choices in the different choice sets were simulated in the following way: we varied the individual utilities by adding a Gaussian distributed error value with mean zero and standard deviation 0.5 to the theoretical utility value. Then we predicted choices according to the first choice rule. For ACBC and HIT-CBC, we assumed that best and worst levels were identified correctly.
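The choice simulation described above can be sketched as follows. The sketch assumes that one error value is drawn per part-worth and simulated respondent (the text does not specify whether errors are redrawn per choice set); the attribute names, utility values and function names are hypothetical.

```python
import random

def draw_individual_utilities(theoretical, rng, sd=0.5):
    """Add Gaussian noise (mean 0, given sd) to every theoretical part-worth."""
    return {key: u + rng.gauss(0.0, sd) for key, u in theoretical.items()}

def first_choice(choice_set, utilities):
    """First choice rule: pick the concept with the highest total utility."""
    totals = [sum(utilities[level] for level in concept) for concept in choice_set]
    return totals.index(max(totals))

rng = random.Random(42)
# Hypothetical theoretical part-worths for two attributes.
theoretical = {
    ("brand", "A"): 1.0, ("brand", "B"): -1.0,
    ("price", "low"): 0.5, ("price", "high"): -0.5,
}
choice_set = [
    [("brand", "A"), ("price", "high")],
    [("brand", "B"), ("price", "low")],
]
individual = draw_individual_utilities(theoretical, rng)
chosen = first_choice(choice_set, individual)
```

Repeating the draw for every simulated respondent in each of the three groups would yield the simulated choice data that are passed to the estimation step.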
For the rating tasks in section three of HIT-CBC, we defined three different scenarios. In the first scenario, we assumed that the ratings were correct. In order to take into account the possible vulnerability of HIT-CBC to incorrect ratings, we defined two further scenarios. In scenario 2, we varied the ratings by adding an error value with mean zero and standard deviation one; in scenario 3, by adding an error value with mean zero and standard deviation two. For the estimation of individual part-worths, we applied the Hierarchical Bayes (HB) algorithm from Sawtooth Software. We used the same HB settings for all methods. For the evaluation of the methods, we computed the mean absolute error over all utility estimates and respondents (Table 1). The adaptive hybrid methods yielded significantly (p < 0.01) lower mean absolute errors than standard CBC, with one exception: when the ratings in section 3 were perturbed by an error with a standard deviation of 2 rating points on an eleven-point rating scale (scenario 3), HIT-CBC did not perform significantly better than standard CBC (alpha = 0.05).
ACBC and HIT-CBC produced similar results when the ratings were perturbed by an error with a standard deviation of 1 rating point (scenario 2).
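The rating scenarios and the evaluation criterion can be sketched as follows. Rounding the perturbed rating to a whole scale point and clamping it to the range 0 to 10 are assumptions made for this illustration, as are the function names; the text only specifies the mean and standard deviation of the error.

```python
import random

def perturb_rating(true_rating, sd, rng, lo=0, hi=10):
    """Add Gaussian error (mean 0, given sd) to a rating on an eleven-point scale."""
    noisy = round(true_rating + rng.gauss(0.0, sd))  # assumed: whole scale points
    return min(hi, max(lo, noisy))                   # assumed: clamp to the scale

def mean_absolute_error(estimates, truths):
    """Mean absolute deviation between estimated and theoretical utilities."""
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(truths)

rng = random.Random(7)
# Standard deviation of the rating error in scenarios 1, 2 and 3.
scenarios = {1: 0.0, 2: 1.0, 3: 2.0}
ratings = {s: [perturb_rating(6, sd, rng) for _ in range(5)] for s, sd in scenarios.items()}
```

In scenario 1 the ratings are reproduced exactly, whereas in scenarios 2 and 3 they scatter around the true value with increasing spread.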

Empirical Study
The aim of the empirical study was to compare the validity of ACBC and HIT-CBC in a real-world application with a complex product. Standard CBC was not superior to ACBC or HIT-CBC, neither in previous studies nor in our simulation study. Therefore, we decided to abstain from standard CBC in the empirical study and rather to concentrate on a direct comparison of the two adaptive hybrid approaches. For this purpose, we conducted an online study aiming to measure preferences for cars in the German market.

Research Design
The adaptive hybrid methods ACBC and HIT-CBC were suggested for the measurement of preferences in the case of complex product concepts (i.e., many attributes and levels) (Eggers & Sattler, 2009; Sawtooth Software, 2009a). In order to compare the methods in a meaningful application, we chose cars as an example of a complex product with a total of eight attributes and 31 attribute levels (Table 2).
Preferences for cars in the German market were assessed in the choice experiment. Our discrete choice experiment was hypothetical, and the aim of our study was to compare the validity of ACBC and HIT-CBC. In developing a choice experiment, the first step is to decide on the attributes and attribute levels. This determination influences all results and is consequently very important for a preference study (Helm et al., 2008). We developed the attributes and corresponding levels for our study in the following way. We first reviewed scientific and commercial literature to detect possible items (e.g., Bunch et al., 1993; Moore, 2004; Eggers & Eggers, 2011; Rust, 2011). Afterwards, we ran expert interviews and a pretest to determine the attributes and attribute levels listed in table 2. We found that alternative drive systems are an important issue for car purchases, and we identified battery electric vehicles, natural gas vehicles and hybrids as the most important alternative drive systems. In our study, we collectively termed range extended electric vehicles (REEV), plug-in hybrid electric vehicles (PHEV) and hybrid electric vehicles (HEV) as "hybrids." Because of their very early market stage and the lack of filling station infrastructure in Germany, fuel cell vehicles were not used in the study. First-generation biofuels were excluded from the study as well. In Germany, biofuels are mostly used as an admixture in petrol and are not regarded as an actual alternative to conventional drive systems.
The attribute "vehicle equipment" was explicitly explained in the online questionnaire in the following way: "Grassroots equipment includes air conditioning, car radio with CD and USB, electric windows, six airbags, ESP and TCS. Premium equipment includes in addition to grassroots equipment heated seats, car computer with navigation system, alarm, parking sensors, lane departure warning system and bending light." For the evaluation of cars' eco-friendliness, several authors studied the attribute CO2 emissions (Bhat et al., 2009; Achtnicht, 2012; Ziegler, 2012; Olson, 2013). However, expert interviews and the pretest revealed that this attribute does not belong to the most important attributes, and we consequently excluded it from the study.
The levels of the brand attribute facilitate diversity in prices ranging from low-priced (Kia) to premium brands (BMW). We also included brands that, at the time of our study, were already successful in selling alternative fuel vehicles (Toyota hybrids, Renault battery electrics and VW natural gas vehicles) on the German market.
Purchase and energy prices represent typical German market prices from the compact class segment at the time of our study. We did not exclude any combinations of attribute levels that were unrealistic at the time of our study (e.g., battery electrics with high driving ranges), as respondents were explicitly asked to make their choices in a hypothetical situation. This setup allowed us to anticipate the technological innovations in the German car market in the coming years.
For measuring preferences, it is important that potential respondents have a basic knowledge of the car decision problem in general and alternative drive systems in particular (Helm et al., 2008). Therefore, we selected students who acquire technical knowledge concerning cars during their vocational training. Every respondent answered the HIT-CBC and the ACBC tasks in a randomly determined order in the online survey. This randomized within-subjects design allowed us to compare the methods for each respondent individually and to preclude order effects. For a discussion of the within-subjects design, see Charness et al. (2012). Additionally, respondents evaluated both methods in terms of closeness to reality, task simplicity, enjoyment and their overall pleasure. The online survey was open from November 2012 until January 2013. Altogether, 423 respondents completed the survey.
The HIT-CBC part was arranged in the following way. In the first part, respondents selected the best and worst levels for all attributes including price. In order to reduce complexity, we treated price as a "usual" attribute, i.e., we refrained from the WTP elicitation. Afterwards, respondents considered twelve choice sets, which were presented in a randomly determined order. These choice sets consisted of three alternatives each. A none-option was not given. In the last part of our study, the respondents evaluated the remaining levels on an eleven-point scale.
The ACBC portion was created according to the suggestions of Sawtooth Software (2009a). We set the number of screening tasks to 7; the number of concepts per screening task to 4; the minimum number of attributes to vary from BYO selections to 2 and the maximum to 3; the number of unacceptables to 4 and the number of must-haves to 3. The maximum number of products brought into the choice tournament was 16, and the number of concepts per choice task was 3. No calibration concepts were included in the survey. Again, price was treated as a "usual" attribute, i.e., we refrained from the summed pricing approach in order to reduce the complexity of the tasks.
To measure the internal predictive validity, respondents additionally evaluated three holdout choice sets, which were not used for estimation of individual utility values. Furthermore, a holdout group of 66 respondents (Group 2) served as an external prediction benchmark to assess cross-sample predictive validity. This group only evaluated the three holdout choice sets mentioned above.
For data analysis, the software packages from Sawtooth Software were applied. For assessing validity measures and further statistical tests, R was used.

Empirical Results
In order to compare the validity of ACBC and HIT-CBC, we applied the following assessment criteria:
• Goodness of fit
• Predictive validity (regarding hit rates and RMSE)
• Qualitative criteria (respondents' evaluation of the methods, interviewing time analysis)
For the estimation of individual part-worths, which were the basis for most of the assessment criteria, we applied the Hierarchical Bayes algorithm (Sawtooth Software, 2009b) to both methods.

Goodness of Fit
The goodness of fit of the model indicates the internal information processing within a method. In our empirical comparison, we assess the goodness of fit through the pseudo-R² measure, which is a typical measure for logit models (McFadden, 1974; Hauser, 1978; Hensher et al., 2007). As mentioned above, we applied Sawtooth Software's Hierarchical Bayes algorithm for the estimation of the individual part-worths in ACBC and HIT-CBC.
The model output contains the percent certainty measure, which is equivalent to the pseudo-R² measure (Hauser, 1978; Sawtooth Software, 2009b).
The pseudo-R² of a choice model is not exactly the same as the R² of a linear regression model. However, there exists a direct empirical relationship between the two measures, and pseudo-R² values in the range of 0.3 to 0.5 can be translated as an R² of between 0.6 and 0.9 for the linear model equivalent (Domencich & McFadden, 1975; Hensher et al., 2007). Therefore, a model with a pseudo-R² of 0.3 can be interpreted as a decent model (Hensher et al., 2007). In our empirical study, both methods revealed similar pseudo-R² values, and both methods led to decent model fits (ACBC: pseudo-R² = 0.529; HIT-CBC: pseudo-R² = 0.476).
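For reference, the pseudo-R² (percent certainty) compares the log-likelihood of the estimated model with that of a null model in which every alternative of a choice set is equally likely. The following sketch computes it from the predicted probabilities of the chosen alternatives; the function name and the input values are illustrative.

```python
import math

def percent_certainty(chosen_probs, set_sizes):
    """Pseudo-R² = 1 - LL(model) / LL(null), with equal chances under the null."""
    ll_model = sum(math.log(p) for p in chosen_probs)
    ll_null = sum(math.log(1.0 / size) for size in set_sizes)
    return 1.0 - ll_model / ll_null

# Hypothetical predicted probabilities of the chosen concept in four tasks
# with three alternatives each.
fit = percent_certainty([0.7, 0.6, 0.8, 0.5], [3, 3, 3, 3])
```

A value of 0 means that the model predicts no better than chance, and a value of 1 means that every observed choice is predicted with certainty.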

Predictive Validity
Predictive validity indicators assess the ability of a method to predict real choices (Helm et al., 2008). Following the published scientific literature (Toubia et al., 2004; Helm et al., 2008; Sawtooth Software, 2009a; Chapman et al., 2009; Eggers & Sattler, 2009; Meißner et al., 2011), we analysed hit rates and market share predictions for measuring internal predictive validity. Hit rates for the three holdout choice tasks were computed from expected part-worths through the first-choice rule, and market share predictions were evaluated with mean absolute error (MAE). Both measures were tabulated as mean values across the three holdout choice tasks. Accordingly, external MAE was derived for the holdout group (Group 2). Overall, the predictive validity indicators demonstrated that both methods were able to predict choices (table 3).
The majority of the choice predictions were successful for both methods. In particular, both hit rates were significantly (McNemar's test; p<=0.001) higher than the random hit rate of 25% (as we presented four stimuli in the holdout sets). Comparing the two analysed methods, ACBC performed better than HIT-CBC in the three holdout tasks (McNemar's test; p<=0.05). ACBC also performed better in market share predictions, as shown in the internal and external MAE (table 3).
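The two predictive validity measures can be sketched for a single holdout task as follows. The sketch assumes that predicted shares are obtained by counting first-choice predictions across respondents; the function names and data are hypothetical.

```python
def hit_rate(predicted, observed):
    """Share of respondents whose predicted choice matches the observed choice."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

def choice_shares(choices, n_alternatives):
    """Share of respondents choosing each alternative of one holdout task."""
    return [choices.count(alt) / len(choices) for alt in range(n_alternatives)]

def share_mae(predicted_shares, observed_shares):
    """Mean absolute error between predicted and observed shares."""
    diffs = [abs(p - o) for p, o in zip(predicted_shares, observed_shares)]
    return sum(diffs) / len(diffs)

# Hypothetical holdout task with four stimuli and eight respondents.
predicted = [0, 1, 2, 0, 3, 1, 0, 2]
observed = [0, 1, 1, 0, 3, 2, 0, 2]
hits = hit_rate(predicted, observed)
mae = share_mae(choice_shares(predicted, 4), choice_shares(observed, 4))
```

In this example, six of eight predictions are correct, although the predicted and observed shares coincide exactly; this shows that the two measures capture different aspects of predictive validity.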

Qualitative Criteria
We asked respondents to evaluate the two methods in terms of closeness to reality, task simplicity, enjoyment and their overall pleasure using a seven-point scale from 7 = high to 1 = low. The assessment of these subjective criteria is important, since tasks that are too difficult or too monotonous may lead to less carefully determined trade-offs and consequently to less validity (Hartmann & Sattler, 2004).
In every category, ACBC was preferred by the respondents (t-Test, p < 0.05) (table 4). As mentioned in section 5.1, the presentation order of the methods was randomly determined for each respondent. Therefore, the differences cannot be explained by order effects. In HIT-CBC, task simplicity was evaluated poorly. Since sections one and two of HIT-CBC are quite similar to the tasks in ACBC, we suppose that the rating tasks in section three are crucial for the poor evaluation of the task simplicity criterion. We therefore assume that the rating of the intermediate levels is complicated for respondents. Another important criterion is the time needed to complete the survey. Very time-consuming questionnaires may tire or bore respondents, which increases the risk of biased evaluations. As longer survey times usually incur higher costs, time is also a cost factor for commercial studies (Helm et al., 2008). Regarding the time respondents spent on the methods, we did not find a significant difference (t-Test, α = 0.05). The average time used per method was approximately 9 minutes (ACBC 551s; HIT-CBC 535s).
Additionally, we considered the number-of-levels effect. While ACBC is vulnerable to the effect, it cannot occur in HIT-CBC because of the 2^k design in the choice sets. Previous research suggests that number-of-levels effects are most likely for respondents who do not have well-defined preferences (Wittink et al., 1989; Wittink et al., 1992; Hair et al., 2010). The inclusion of alternative fuel vehicles in our car preference study allows us to account for a degree of preference uncertainty. Battery electrics, hybrids and natural gas vehicles are new products on the German market, and their market share is very small. In 2012, the market share of alternative fuel vehicles was only 0.96% in new vehicle registrations (German Federal Motor Transport Authority, 2013). Thus we assume that most respondents do not have well-defined preferences for alternative fuel vehicles.
In order to study the number-of-levels effect, we analysed individual attribute importances of the attributes with the most attribute levels: brand, drive system and price, each of which had five levels. For brand and drive system, the individual attribute importance was not significantly higher in ACBC than in HIT-CBC (t-Test; alpha = 0.05). For the attribute price, we observed a significantly higher importance for ACBC (t-Test; p < 0.001). Therefore, clear evidence for the occurrence of the number-of-levels effect in ACBC could not be observed in our empirical study. In the recent literature, we could not identify any scientific studies analysing the number-of-levels effect for ACBC. This is an opportunity for further research.
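Individual attribute importances are commonly derived from the range of the part-worths within each attribute, normalised over all attributes. The sketch below uses this common definition as an assumption, since the exact computation is not spelled out here; the part-worth values are hypothetical.

```python
def attribute_importances(partworths):
    """Importance of each attribute: its part-worth range divided by the sum of ranges."""
    ranges = {
        attribute: max(levels.values()) - min(levels.values())
        for attribute, levels in partworths.items()
    }
    total = sum(ranges.values())
    return {attribute: r / total for attribute, r in ranges.items()}

# Hypothetical part-worths of one respondent.
respondent = {
    "brand": {"A": 1.0, "B": 0.0, "C": -1.0},
    "price": {"low": 0.5, "high": -0.5},
}
importances = attribute_importances(respondent)
```

Because the importance depends only on the best and worst level of an attribute, additional intermediate levels can change it only through their effect on the estimated range, which is the mechanism behind the number-of-levels effect.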

Discussion and Conclusion
Recent adaptive hybrid methods refine the concept of CBC by introducing respondent-specific choice tasks. The aim of our work was to compare the validity of two adaptive hybrid methods, namely ACBC and HIT-CBC, which were suggested for the measurement of preferences in the case of complex product concepts. For this purpose, we reviewed scientific literature for previous comparative studies on the methods. Based on the results of the literature review, we conducted a Monte Carlo simulation study and an online study on the German car market for empirical validity comparisons. In order to compare the methods in a meaningful empirical application, we chose cars as an example. We defined eight attributes and a total number of 31 attribute levels.
The simulation study revealed advantages of the adaptive hybrid methods in comparison to standard CBC. However, it also revealed the vulnerability of HIT-CBC to incorrect ratings of intermediate attribute levels. The empirical study showed that both methods could be applied to the complex study design. Regarding predictive validity measures (hit rates and market share predictions), we found that both methods were able to predict choices. The direct comparison of the two adaptive hybrid methods ACBC and HIT-CBC demonstrated that, especially with regard to hit rates and the qualitative criteria, ACBC performed significantly better than HIT-CBC.
The analysis of qualitative criteria (closeness to reality, task simplicity, enjoyment and overall pleasure) is important, since questionnaires that are too difficult or too monotonous may reduce a method's validity (Helm et al., 2008). The results showed that the complex study design did not lead to dissatisfaction of the respondents and consequently showed the applicability of both methods to the complex study design. This finding is in line with former studies on ACBC (Johnson & Orme, 2007; Orme & Johnson, 2008; Sawtooth Software, 2009a) and HIT-CBC (Eggers & Sattler, 2009). More specifically, we found that ACBC outperformed HIT-CBC with regard to the four measured qualitative criteria. The criterion with the worst rating was the task simplicity of HIT-CBC. Eggers and Sattler (2009) posited that HIT-CBC might be vulnerable to incorrect ratings of intermediate levels.
Their simulation study revealed that incorrect ratings bias aggregate-level analyses substantially. However, they did not study the effect directly, i.e., they included a rating error a posteriori and not during simulations of individual responses. Therefore, one focus of our simulation study was the actual simulation of incorrect ratings. We found that the performance of HIT-CBC depends to a great extent on the ratings in the last section of the HIT-CBC questionnaire. With that result in mind, our comparative empirical study yields further important outcomes. Firstly, we found that the predictive validity of HIT-CBC was worse than that of ACBC, even though the simulation study showed that both methods should lead to similar results if the rating error in section 3 of HIT-CBC is small. Secondly, respondents evaluated the task simplicity of ACBC significantly higher than that of HIT-CBC. Since the first sections of HIT-CBC are quite similar to the analogous tasks in ACBC, we suppose that the rating tasks in section 3 are crucial to the poor evaluation of the task simplicity criterion of HIT-CBC. We therefore assume that respondents have difficulties with these rating tasks, leading to incorrect ratings and consequently to lower predictive validity of HIT-CBC. We suggest a methodological improvement of HIT-CBC in the rating section that could be tested in future research. A possible solution might be to create rating tasks that are more appealing graphically. Additionally, it might be beneficial to combine the first section (identification of the best and worst levels) and the rating section of HIT-CBC.
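The sensitivity to rating errors discussed above can be illustrated with a small sketch. It is not the simulation design of our study. It rests on a simplifying assumption made purely for illustration: the part-worth of an intermediate level is obtained by linear interpolation between the part-worths of the worst and the best level, using a rating on a 0 to 100 scale. The function names, the utility range of -1 to 1, the true ratings and the normally distributed rating error are all hypothetical.

```python
import random


def interpolate_part_worths(worst, best, ratings):
    """Map 0-100 ratings of intermediate levels onto the [worst, best] utility range.

    Assumption for illustration only: linear interpolation between the
    part-worths of the worst and the best level.
    """
    return [worst + (r / 100.0) * (best - worst) for r in ratings]


def noisy_ratings(true_ratings, sd, rng):
    """Add normally distributed error to each rating and clip to the 0-100 scale."""
    return [min(100.0, max(0.0, r + rng.gauss(0.0, sd))) for r in true_ratings]


def mean_abs_error(estimated, true):
    """Mean absolute deviation between estimated and true part-worths."""
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(true)


def simulate(sd, n_respondents=2000, seed=1):
    """Average part-worth error over simulated respondents for a given rating error sd."""
    rng = random.Random(seed)
    worst, best = -1.0, 1.0            # hypothetical part-worths of worst/best level
    true_ratings = [25.0, 50.0, 75.0]  # hypothetical true ratings of three intermediate levels
    true_pw = interpolate_part_worths(worst, best, true_ratings)
    total = 0.0
    for _ in range(n_respondents):
        observed = noisy_ratings(true_ratings, sd, rng)
        estimated = interpolate_part_worths(worst, best, observed)
        total += mean_abs_error(estimated, true_pw)
    return total / n_respondents


if __name__ == "__main__":
    for sd in (0.0, 5.0, 20.0):
        print(f"rating error sd={sd:>4}: mean absolute part-worth error={simulate(sd):.3f}")
```

Under these assumptions, the error in the intermediate part-worths grows with the standard deviation of the rating error, even when the part-worths of the best and worst levels are known exactly.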
In our empirical study, we did not observe a significant impact of the number-of-levels effect on the results. However, the main focus of our study was validity comparisons rather than an analysis of the number-of-levels effect. As we could not identify any scientific studies analysing the number-of-levels effect for ACBC in recent literature, our results may provide a first reference for future studies on the number-of-levels effect in ACBC applications.
When interpreting our findings, some limitations of our study should be taken into account. Firstly, the attribute price was treated as an "ordinary" attribute. However, both methods also provide opportunities for a deeper study of the price attribute and, consequently, of WTP figures. One question ripe for further research is how the inclusion of additional pricing components, such as the summed pricing approach in ACBC and the WTP elicitation in HIT-CBC, influences the empirical results and the validity of the methods in different applications.
The second limitation of our study concerns the measurement of predictive validity. For this purpose, we included holdout tasks in the questionnaire. Additionally, a holdout group served as an external prediction benchmark. In order to overcome the hypothetical bias of such an approach, several authors suggest observing real purchase decisions for validity assessments. Wertenbroch and Skiera (2002) and Ding (2007) evaluated real purchase decisions directly in the survey. Chapman et al. (2009) compared the predicted market shares from internal holdout tasks to observed actual market shares. However, for big-ticket items and very innovative product concepts (for which no physical products are on the market so far), the inclusion of real purchase decisions is quite challenging, although it would be an interesting topic for further research.
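For readers less familiar with the two predictive validity measures, the following sketch shows how a hit rate and a mean absolute error (MAE) of share predictions are commonly computed from holdout data. It assumes a first-choice rule (the concept with the highest total utility is predicted to be chosen). The function names and all numbers are hypothetical and do not reproduce our data or our estimation procedure.

```python
from typing import Sequence


def first_choice(utilities: Sequence[float]) -> int:
    """Index of the concept with the highest total utility (first-choice rule)."""
    return max(range(len(utilities)), key=lambda i: utilities[i])


def hit_rate(predicted: Sequence[int], observed: Sequence[int]) -> float:
    """Share of holdout tasks in which the predicted choice equals the observed choice."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("predicted and observed must be non-empty and of equal length")
    hits = sum(1 for p, o in zip(predicted, observed) if p == o)
    return hits / len(predicted)


def mean_absolute_error(predicted_shares: Sequence[float],
                        observed_shares: Sequence[float]) -> float:
    """Mean absolute deviation between predicted and observed choice shares."""
    if len(predicted_shares) != len(observed_shares) or len(predicted_shares) == 0:
        raise ValueError("share vectors must be non-empty and of equal length")
    total = sum(abs(p - o) for p, o in zip(predicted_shares, observed_shares))
    return total / len(predicted_shares)


if __name__ == "__main__":
    # Hypothetical total utilities of three holdout concepts for four respondents.
    utilities = [
        [0.4, 1.1, -0.2],
        [1.3, 0.2, 0.1],
        [0.0, 0.5, 0.9],
        [0.7, 0.6, -1.0],
    ]
    observed_choices = [1, 0, 1, 0]  # hypothetical observed holdout choices
    predicted_choices = [first_choice(u) for u in utilities]
    print("hit rate:", hit_rate(predicted_choices, observed_choices))

    # Hypothetical predicted vs. observed choice shares of the three concepts.
    print("MAE:", mean_absolute_error([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))
```

The hit rate is an individual-level measure, whereas the MAE compares aggregate shares; a method can therefore perform differently on the two measures.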
In the simulation study, we mainly investigated the impact of incorrect ratings in HIT-CBC. Of course, there are several other factors that would also be interesting to study (e.g., number of attributes, number of attribute levels, number of simulated consumer groups). This is a direction for further research.
The findings of our study provide implications for practitioners. As our results imply a superiority of ACBC, we suggest this method for practical applications. However, since ACBC might be vulnerable to the number-of-levels effect, we advise those who use it to be alert to the possible occurrence of the effect, especially in studies where respondents do not have well-defined preferences (Wittink et al., 1989; Wittink et al., 1992; Hair et al., 2010). We do not recommend the application of HIT-CBC in its current version. The vulnerability of the method to incorrect ratings in the last section of the questionnaire should be remedied before practical application.
For the analyst, the choice of the method may also depend on practicability. Sawtooth Software provides a software package for ACBC that enables the analyst to create the questionnaire and, after fielding, to derive utility estimates (Sawtooth Software, 2009a). Such a software package is not yet available for HIT-CBC. However, because the experimental design for HIT-CBC is fixed and easy to create, this should not strictly prevent application of the method: both Sawtooth Software's CBC/HB package and the R package "bayesm" can be used for utility estimation (Rossi, 2013).

Table 1. Mean absolute errors in the simulation study

Table 2. Explored attributes and attribute levels for cars

Traditionally, the car market consists of several vehicle types that are often differentiated by means of vehicle size. Since several car attributes vary heavily among vehicle types, we concentrated on one vehicle type in our study, namely the compact class. The compact class was chosen because it was the best-selling vehicle class in recent years. The market share of compact class vehicles was 23.8% in 2012 (German Federal Motor Transport Authority, 2013).

Table 3. Predictive validity assessment in terms of hit rates and MAE

Table 4. Respondents' evaluation of qualitative criteria for ACBC and HIT-CBC
* Statistically different at p < 0.05 (t-test).