RESEARCH MANUAL

I. Purpose of Manual

II. Roles of market research

III. Market research methodology

 

I. Purpose of this manual

The purpose of this manual is to provide managers with guidelines for the techniques and methods of various market research studies that have proven effective and meet accepted standards for reliability, validity and usefulness.

Marketing research is a tool for reducing risk in marketing and business management. It is therefore critical to understand the different ways that marketing research can be used and, importantly, the kinds of studies, research designs, interviewing techniques and samples that are most appropriate for the situation.

This manual is designed to give managers an understanding of the alternatives available, the basic terminology used by researchers, the pros and cons of certain popular research techniques, and a recommended direction in each of a number of specific situations.

The manual will cover these subjects:

new product research
product research
strategic research
communications research/testing
package testing
name testing
pricing research
tracking studies
sampling
qualitative vs. quantitative research
the research brief/action standards

The minimum recommended standards for each type of research will be presented, along with a review of the alternative methods and techniques available. Such issues as interview method, sample size and design, analysis plan, type of stimuli to be used, and statistical tests to be applied will be discussed.

The goal of this manual is to give those who do not have access to staff research counsel a set of ground rules and guideposts for the utilization of market research.

II. Roles of Market Research

The main role of market research is to reduce risk in making decisions. Its function is to bring to the table the opinions, reactions, attitudes and beliefs of the customers at whom we intend to target our brands. Ultimately their responses will determine if we have made the best decision for our business. Research is essentially used to predict what will happen before we commit ourselves to a course of action that has major economic consequences.

There are basically four kinds of decisions in which marketing research plays a role: decisions involving the product (taste, texture, color, ingredients, aroma, strength, formulation); decisions involving communications (advertising strategy, package design and graphics, campaign strategy, advertising execution, media strategy); decisions involving price; and decisions involving the brand (overall brand position, extension opportunities, growth strategy). In most cases, these decisions concern existing brands, but in many instances they involve new to market products.

III. Market Research Methodology

A. NEW PRODUCT RESEARCH

1. Types of Studies

There are three types of research that can play a role in developing new to market products: planning studies that provide strategic direction, exploratory studies that help to generate ideas and concepts and evaluative studies/concept tests that help to determine the appeal of new concepts and products.

a. Planning studies

These are quantitative. They are concerned with learning not only "what" but also "how many" and "how often". They include studies that "map" a market to provide perspective on where to position new to market products and on the strengths and weaknesses of current brands; segmentation studies that identify niches to be targeted with new products; and attitude and behavior studies that help determine the volume potential of products and categories based on incidence and frequency of use/purchase data.

b. Exploratory studies

These are qualitative. Their purpose is to stimulate ideas, to aid in concept development. They include focus group sessions, one-on-one depth interviews, in-market observations and a range of variations on these themes.

"Brainstorming" sessions with participants from sales, production, marketing and the trade can be very productive if properly led and focussed. Moderators can be found who are expert in leading such sessions and in synthesizing the results.

Exploratory research is most productive when a new to market strategy has been developed. Knowing the main objectives, the target and the sources of business being sought is key to this strategy. Launching a series of focus groups before this is done is wasteful and inefficient.

There are no rules for exploratory research except to recognize that it is not conclusive or decisive information. It should be viewed as anecdotal. Bear in mind that increasing the number of focus groups from two to ten does not make the investigation "quantitative" or representative. An idea that springs from one interview is as good as one that comes from ten. The issue to determine is whether the idea has marketing validity. That is where evaluative research comes in.

c. Evaluative studies/concept tests

These are quantitative. Their purpose is to determine if a new to market concept has enough potential to be developed. A new to market product concept is a "bundle" of elements: core idea, name, position, package and graphics, price. It is important to evaluate these elements in combination, since they each will impact on the appeal of the whole bundle.

The most commonly used technique for evaluating new to market ideas early in development is "concept testing". One form of concept is verbal: a statement that describes the core idea of the product devoid of any embellishment. Another form is visual or graphic: a combination of words and pictures designed to convey the core idea and some of the intangible or emotional qualities that are associated with the concept. Using a mock-up or sample of the product itself, in combination with either the verbal or visual stimuli, is suggested if feasible.

Having selected the form of execution, the concept is reproduced and shown to people who are judged to be in the market target. Their reactions are determined by using a structured questionnaire. The questions include intent to buy (usually measured with a "likelihood to buy" scale), an assessment of the uniqueness of the concept, some rating of its perceived qualities compared to an existing product (scaled "degree of difference"), and price/value questions. If a prototype or sample product is available, the interview should include a tasting, followed by another series of intent to buy and perception/image questions.

If a structured questionnaire is used, it is possible to evaluate as many as 15-20 concepts in a single test, though only a few can be tasted by any one respondent. When conducting such tests it is advisable to include concepts for existing brands presented in a similar form. Reactions to these will provide a benchmark or control against which the performance of the new concepts can be judged.

Sample size for such tests should be a minimum of 100-150 target people per cell. The exposure and interviews should be conducted individually, avoiding the influence of other people's reactions on the respondent.

B. PRODUCT RESEARCH

The three most typical situations requiring product testing are:

1) change in formulation of an existing product
2) new to market product development
3) competitive product challenge

1. Product testing methods:

"Blind" vs. Branded

"Blind" testing, where the respondent is unaware of the brand name(s), is used when issues of product formulation are being tested, such as in a cost-reduction formula change. In this instance, the purpose of the research is to determine how many people detect a difference between the original product and the proposed new formulation(s). Double blind triad tests are used to determine consumers' ability to detect formulation changes on existing brands. Respondents are asked to "identify" the different product among three (two of which are the same).

Another situation that calls for "blind" testing is the evaluation of a new to market product vs. a well established competitor. Here, the issue is to determine if the proposed new to market product appeals to enough people on its own merits to warrant continued development. By testing "blind", the respondents' reactions reflect only on the products, and the considerable image advantage of the competitive brand is kept out of the picture.
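As a hedged illustration of how results from the triad test described above might be read: a respondent who cannot tell the products apart will still pick the odd one out by chance about one time in three. The sketch below (Python, standard library only) computes how likely a given number of correct identifications would be if every respondent were guessing. The function name and the sample figures are illustrative assumptions, not standards set by this manual.

```python
from math import comb

def triad_guess_probability(correct, respondents, chance=1 / 3):
    """Probability of at least `correct` right answers if all respondents guess."""
    return sum(
        comb(respondents, k) * chance**k * (1 - chance) ** (respondents - k)
        for k in range(correct, respondents + 1)
    )

# Hypothetical result: 45 of 100 respondents pick the odd product.
p_value = triad_guess_probability(45, 100)
print(f"Chance of 45 or more correct by guessing alone: {p_value:.4f}")
```

A small probability suggests that respondents really can detect the formulation change; a large one suggests the result is consistent with guessing.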

Branded product tests are used when one wishes to determine the combined impact of formulation and brand image on people's reactions. Bear in mind that a strong brand will perform better than a weak one in a branded test, even if its formula was not preferred by a majority in a blind test. Ultimately, however, a product must face the competition in branded form.

Therefore a branded test should be used as the final evaluation of a product's ability to draw business from established competitors in its category or competitive frame.

Monadic versus paired comparison

A monadic test is the presentation of a single product for evaluation. Respondents are asked to sample the product and evaluate it in the context of their own standards, usually by comparing it to the "brand they use most often", their "regular" brand, or by using some overall evaluative scale--e.g. "excellent, very good, good, fair, not so good, poor". Paired comparison tests involve giving a respondent two products to sample at the same time and asking him/her to choose between the two and to indicate why one is preferred over another.

Paired comparison testing is advised early on in product formulation, when the issue is to select a final formula to move ahead with. Alternative formulations may be tested versus each other as well as versus those of major competitors. These tests should be performed "blind" and the order of presentation must be rotated since the first-presented product is usually given a more favorable rating.
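The rotation of presentation order described above can be sketched as follows. This is a minimal illustration that assumes respondents are numbered in the order they are interviewed; the function name and the product labels "A" and "B" are hypothetical.

```python
def rotated_orders(products, n_respondents):
    """Cycle the serving order so each product is tasted first equally often."""
    orders = []
    for i in range(n_respondents):
        shift = i % len(products)
        orders.append(products[shift:] + products[:shift])
    return orders

# Hypothetical paired comparison with six respondents.
orders = rotated_orders(["A", "B"], 6)
for respondent, order in enumerate(orders, start=1):
    print(respondent, order)
```

With two products this simply alternates A-first and B-first. With three or more products, this simple cycling balances which product comes first but does not cover every possible sequence.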

Monadic testing is advised when a "final" formulation is selected. In this instance, it is preferable to have the product evaluated in-home over time, rather than in a one-taste situation. This approach will give a more realistic appraisal of the way people will respond to a product after using a bottle. Most spirit products need to be evaluated over time since there may be either a wear-out effect or a build in appeal that will not be detected by a single taste test.

Central location/intercept vs. in-home

Most product tests are conducted in central locations, such as malls, halls, survey centers, meeting rooms, and focus group facilities. Respondents are recruited to attend a "taste test". Recruiting on the spot is called "intercept" interviewing. The other approach "prerecruits" people, usually by telephone or mail. Respondents are promised a gift (usually money) for their participation. Compared to in-home tests, central location tests have the advantage of being less expensive and quicker to do. Central location testing is best when a one time taste of the product, either monadic or paired comparison, is used.

It is best used when choices among alternatives are being made, not when a "final" formula is being evaluated prior to market introduction.

In-home testing involves the respondent being given a sufficient supply of the test product to bring home and to consume in a "normal" way over time. He/she may be interviewed by phone, in person, or by a self administered questionnaire that is mailed to the research office. This method provides a more realistic appraisal of a product's appeal, and while not predictive of market place performance, it will reveal if a product has a short "honeymoon" before people tire of its taste. This is the only way to find this out short of marketing the product on a small or large scale.

Sequential monadic test

A variation on monadic tests is called "sequential monadic" or sometimes, monadic/comparison testing. In this design, a respondent is given a sample of one product to taste, usually in a central location setting, and asked to evaluate it against his or her own standards; essentially a monadic evaluation. A second product is then produced and the person is asked to taste and compare it to the first sample. The evaluation is framed as a direct comparison between the two products. Thus a single interview is used to produce both a monadic evaluation and a paired comparison. To equalize the bias of presentation order, the products are rotated from person to person for the monadic evaluation. This technique is useful when several formulas are being tested for a single product. In this case, the test formulas can be evaluated monadically and a competitive product may be used in the second or paired comparison portion of the test.

2. Sample size and composition

A cardinal rule for product testing is that it be conducted among people who are in the target group for the brand or the category. A cost-reduction formula test must be conducted among core users of the brand. A test of a new to market product should be conducted among users of the category. If the entry is designed to compete directly with one or two major brands, the users of those brands should be the subjects of the test. If the user group for a brand comprises men and women, both should be included in its testing.

If ethnic segments represent the target for a product, the test must be conducted among the members of these segments. Quotas may be established to ensure that various subgroups (age, gender, ethnic background, etc.) are represented in their proper proportions. In some instances, where a subgroup is particularly important to the success of a brand, additional interviews may be conducted to permit separate analysis of their responses.

The size of the sample should be determined by two factors: 1) the subgroups in the study that will be analyzed independently; and 2) the degree of risk involved in the decision being tested. Normally, a cell of 100-150 target respondents will provide a sufficient base to analyze responses and compare reactions to other independent cells in a monadic test. If it is important to analyze men and women independently, then it is advisable to have at least 100 of each in each cell or group being analyzed. In tests where significant economic consequences will result from the decision, it is wise to use larger samples (200 per cell) so that smaller differences will be needed for "statistical significance".
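To make the link between cell size and "statistical significance" concrete, the sketch below estimates how large a gap between two independent cells is needed at the 95% confidence level for the cell sizes mentioned above. It is a rough guide only: it assumes a normal approximation, equal cell sizes, and a rating that about half of respondents give (the most demanding case). The function name and defaults are illustrative assumptions.

```python
from math import sqrt

def difference_needed(cell_size, base_rate=0.5, z=1.96):
    """Approximate gap between two equal-sized cells needed at 95% confidence."""
    standard_error = sqrt(2 * base_rate * (1 - base_rate) / cell_size)
    return z * standard_error

for n in (100, 150, 200):
    print(f"{n} per cell: about {difference_needed(n) * 100:.1f} percentage points")
```

Under these assumptions the required gap falls from roughly 14 percentage points at 100 per cell to roughly 10 points at 200 per cell, which is why larger cells are advised when the decision carries major economic consequences.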

Significance testing

"Significance testing" is used to determine how confident one may be in interpreting a difference between two products' ratings (or the ratings assigned by two subgroups), e.g. on taste appeal. This is a statistical test that measures the amount of variation that is likely to occur within and between the test cells "normally" (i.e. the "noise" in the system). For a difference to be "significant" it must be larger than the level of difference produced by the "noise" in the system. As a sample gets larger, the size of the difference needed to reach significance is reduced. Conversely, the smaller a sample gets, the larger the difference must be between two cells to reach "significance". If significance tests are applied, it is suggested that the research agency test at the 95% confidence level (see section F).

3. Questionnaire design

There are basically two kinds of questions in a product test questionnaire: evaluative and diagnostic.

Evaluative questions are designed to determine whether or not a person likes the product and the intensity of liking or dislike. These questions may take the form of a verbal "overall rating" (excellent, very good, good, fair, not so good, poor), a numerical rating (1-10, with 1 being the lowest and 10 being the highest rating), or a "comparative" verbal rating, usually against the "regular" brand (much better, somewhat better, neither better nor worse, not quite as good, not nearly as good).

Diagnostic questions are so called because their aim is to help understand why people reacted the way they did to the product, and presumably to give direction for improvement of the product if it did not receive favorable ratings.

Diagnostic questions range from "open end" questions such as "tell me why you did not like the product", or "why do you feel that way?" to structured rating scales where respondents are asked to evaluate such specific aspects of the product as taste/flavor, texture, aroma, mouthfeel, strength, and color. These are usually rated on verbal or numerical scales (excellent-poor or 1-10).

In a paired comparison test, a preference scale is used to measure the degree of preference for one product over another. This can take the form of a numerical scale (prefer "A" over "B" from 1-10 points), or a verbal scale (prefer A to B much more, somewhat more, prefer B to A much more, somewhat more, have no preference between A and B). Diagnostic scales in a paired comparison test should be used to determine specific strengths of one product over another. Each product attribute (taste, flavor, aroma, texture etc.) is rated individually by comparing the two test products. Respondents are asked if they prefer A, prefer B or have no preference. Where a preference is stated, they are asked to rate the strength of their preference (prefer the taste of A much more, somewhat more).

C. STRATEGIC RESEARCH

1. Strategic research goals

Strategic research takes many forms, depending upon the objectives and needs of the brand or company. What distinguishes strategic research from the other forms of studies is its purpose: to put the marketing strategy of a brand on the most fruitful track.

Strategic research encompasses such issues as brand position, selection of the market target, identifying the most powerful benefits on which to focus the brand's communications, and determining the competitive frame in which a brand does or should compete.

With such broad and overarching issues to be decided, there is no one technique, method or study that can answer all. However there are a number of specific kinds of information that are critical to the development of sound brand strategy:

1) benefits--understanding the nature of the emotional and rational benefits that people seek from the category and the brand;

2) segmentation-- knowing how the customer market may be segmented, if at all, into subgroups who have particular needs, desires or tastes;

3) brand character/personality--defining the character of the major brands in the market, including the qualities that make them distinctive and separate them from each other;

4) role of the product--understanding the way the product category is used by its customers, including the situations in which it is consumed or given;

5) heavy users--determining if there are a small number of users who consume a disproportionately large percentage of the volume of the category and understanding who these people are, what they are like and what motivates them.

 

2. Strategic Research Studies

There are a variety of studies that come under the heading of strategic research: market segmentation study, market definition study, attitude and usage study, market structure study, buying incentive study, etc. At the core of each study there is usually one or at most two major objectives.

a) Segmentation studies

A segmentation study is designed to determine if and how a market can be usefully segmented for strategic purposes. Segmentation may be done by attitude, behavior, or characteristics of users. The idea behind this study is that a category can be broken into subgroups each with distinct needs, attitudes or characteristics; a brand may then be targeted to one of these segments as a way of focussing its marketing effort and separating it from competitors.

These studies are large, complex and often very expensive. They require a skilled technician to conduct the analysis, a skilled researcher to design the questionnaire and draw a proper sample, and an experienced analyst to draw meaning from the data. Without the above qualifications, such studies should not be undertaken. Because of the complexity of the questionnaire, these studies should be conducted in person or by using a combination of telephone and personal interview.

b) Market Definition study

This refers to a study designed to provide a blueprint of a market, answering such questions as incidence of use, frequency of use, characteristics of users, brand status and hierarchy, concentration of use (heavy vs light users), seasonality, market dynamics (new entrants vs. category defectors), regionality, and price partitioning.

Such a basic study is helpful when entering a category with a new product for the first time, or when trying to assess a newly emerging category. It provides information that will help in making some fundamental decisions.

This kind of study requires a large and representative sample; a probability sample is preferable (see glossary). It is not technically difficult or complex, but does require a well designed questionnaire and a carefully planned analysis. The complexity of the questionnaire requires a personal interview or a combination of telephone and personal interviews.

c) Attitude and Usage study

This study is one of the most basic strategic studies. In many ways it is similar to a market definition study, except that it contains more information about brand images, benefits and attributes that are desirable to users, and such behavioral data as frequency of use and purchase, usage situations and occasions, and competitive products and categories. An A&U study provides a snapshot of a category, with profiles of the major brands, their users and their images, their strengths and weaknesses.

As is the case with any strategic study, it requires a large and representative sample for reliable and projectable data, a well designed questionnaire and analysis plan. These studies may be conducted by telephone, by mail, by a combination of telephone and mail or in person.

d) Market Structure Study

This study is designed to reveal how the products in a market compete with each other on a multi-dimensional basis. It permits one to "map" the brands and categories in a market and to identify the "territory" that each occupies as well as the extent to which they overlap in their "turfs".

A market structure study can be invaluable when planning to enter a market with a new entry, as it can identify areas where competition may be less intense and still offer volume potential. Such studies involve the use of many kinds of sorting and rating questions, and thus require personal interviews, usually of quite some length. They also require the use of advanced statistical programs. Specialists who have experience in this type of study should be used, else the results will not be fruitful. As with most strategic studies, the sample should be large enough to provide stable sub-groups and representative of the population being surveyed.

e) Buying Incentive Study/Brand Equity Study

More narrowly focussed than the previously described studies, a Buying Incentive Study is used to identify the key benefits that will form the centerpiece of the brand's position and communications. In such a study, the category has been defined as well as the market target. The brands against which the target brand will compete have also been identified. The objective of the study is to identify the benefits that are most important to people in the target and those that have the most potential leverage for the brand in question.

This study requires a thorough exploration of the nature and make up of the rational and emotional benefits that are relevant to users of the category, using a statistical technique called "factor analysis". It also requires the measurement of the images of each of the major brands on these benefits among users and non-users.

This study requires a large and representative sample, the services of skilled and experienced statisticians and a good marketing oriented analyst.

Often the marketing objectives necessitate strategic research requiring a combination of two or more of the above types of studies.

D. COMMUNICATIONS RESEARCH/TESTING

1. Creative development research

Communications research is useful at two points in time. The first is once strategy has been decided upon, when exploratory research of a qualitative nature is often helpful to the agency and the brand group. This is called creative development research. Its purpose is to provide feedback from target customers on rough ideas, concepts and executional directions for the benefit of the creative group. This is not testing or evaluative research, and so is not required to meet rigorous standards of protocol. Creative development research can be done with focus group sessions, one-on-one interviews or small (mini) groups. Respondents are exposed to a variety of stimuli (rough executions, headlines, concept statements and the like) and are questioned about their reactions, feelings, associations and attitudes toward these stimuli. It is up to the moderator and the creative people who are observing to interpret and draw conclusions from what they see and hear. There is no structured set of measurements for creative development research. It is successful when the creative people come away with new ideas and insights.

2. Pretesting: Measuring advertising effectiveness

The second point where communications research is more than useful is after a campaign has been executed but before it has been placed in the media. This is the time for evaluative research. In the case of pre-testing, there are formal guidelines and protocols to follow. Such issues as technique, exposure method and measurements are vitally important to the pretesting of advertising.

The goal of pretesting an ad or commercial is to determine if the proposed advertising is effective before exposing it in the media. Effectiveness is defined as the ability of the advertisement to achieve the goals for which it was created. In most cases there are three criteria that comprise effectiveness: awareness, memorability and persuasion.

a) Awareness, memorability and persuasion

There are many pretesting services that offer standardized techniques designed to evaluate advertising effectiveness. The three basic yardsticks of advertising effectiveness are awareness, memorability and persuasion. These can be measured individually or in combination.

Awareness refers to the ability of an advertisement to attract the attention of a reasonably large proportion of people who have an opportunity to see it. Measuring awareness is important since the first job of any ad is to get people to notice and attend to it. There are a number of ways that the awareness-getting ability of an ad can be measured. Placing the ad in an environment that contains many other ads as well as entertainment material creates a realistic test.

Measuring the number of people who attend to or read the ad in this environment and expressing this number as a percentage of the total number of people who were exposed to the environment (or had an "opportunity to see" the ad) provides an "attention score".

Memorability refers to the extent to which people who are exposed to an ad retain something about the message as a result of seeing the ad. This is important, since most ads are designed to impart information as well as have an emotional impact on readers/viewers. Memorability is particularly important in the case of a new to market introduction, where there are no residual impressions on which to build a brand image, and the product has a limited time to establish itself in the competitive environment.

Persuasion, on a macro level, refers to the ability of an advertisement to affect people's attitudes towards buying or using the brand. On a micro level, persuasion can mean the ability of an ad to change the way people perceive the brand, particularly with regard to the strategic issues being addressed by the advertising.

b) Measuring Awareness

This is the simplest of the three main criteria of effectiveness to measure. Awareness is usually measured by placing the test ad or commercial in the context of other advertising and editorial (portfolio or magazine) or programming (tv or cinema) without any prior notice to the respondent. Those who are exposed to the portfolio, magazine or program are generally questioned immediately after exposure.

They are asked to recall advertising they read or saw in the test vehicle, and if mention is made of the test brand or ad, they are asked to describe what they remember seeing or reading. This is considered "unaided" recall. Those who do not mention the test ad are asked if they recall seeing or reading an ad for (test brand). If they do, they are also asked to describe what they remember seeing or reading. This is called "aided" recall.

Awareness is expressed as the percentage of people who had an opportunity to see an ad or commercial who recall reading or seeing it at some time after the exposure. For example, if 30 people in a sample of 100 say they recall reading an ad for (test brand) and are able to describe something specific about the ad, the awareness or "recall score" for this ad would be 30%. Whether this is a good, bad or indifferent result depends upon the action standards which were set prior to the test. If the brand and category have a history of test results for comparable situations, this "score" can be evaluated in the context of that history. Without such a context, there are no absolute standards against which this result can be judged.
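The recall score arithmetic above can be sketched as follows. The margin of error is an addition for illustration: it assumes a simple random sample and a normal approximation at the 95% level, and the function names are hypothetical.

```python
from math import sqrt

def recall_score(recalled_and_described, exposed):
    """Share of exposed respondents who recall the ad and can describe it."""
    if exposed <= 0:
        raise ValueError("exposed must be a positive count")
    return recalled_and_described / exposed

def margin_of_error(score, exposed, z=1.96):
    """Approximate 95% margin around a percentage from a simple random sample."""
    return z * sqrt(score * (1 - score) / exposed)

# The manual's example: 30 of 100 exposed respondents recall and describe the ad.
score = recall_score(30, 100)
margin = margin_of_error(score, 100)
print(f"Recall score: {score:.0%}, plus or minus about {margin * 100:.0f} points")
```

The margin is a reminder that a score from 100 respondents is itself an estimate, which is one more reason to judge it against action standards and test history rather than in isolation.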

c) Measuring Memorability

Memorability is a more complex dimension to evaluate than is awareness. Usually, memorability is determined by the amount of information that people can recall or "play back" in an interview some time after exposure to the ad or commercial. The "playback" obtained in the post-exposure interview described above is the source of the "memorability" information.

The more people who can accurately recite the content of the advertising, the more effective it is in registering its message.

However, there is a catch to this issue. Memorability does not necessarily mean that people are persuaded by the message. In fact, sometimes people can recite the copy points of an ad almost verbatim, but their perceptions or image of the product that those copy points were designed to change remain unchanged! There are instances where playback of the copy content does not reveal the true impact of an advertisement on people's perceptions or image of the brand. In other words, what people say you said is not necessarily what they feel you meant to say.

An ad designed to communicate "prestige" may successfully convey that this is its intent, but the image of the brand's prestige could remain unmoved by exposure to the ad. Therefore, it is important to measure the impact of an advertisement on people's perceptions or image of the brand in addition to knowing what they can recall from its content.

d) Measuring Persuasion

On its most macro level, persuasion is the ability of an advertisement to increase a person's propensity to buy the brand as a result of exposure to the advertising. Since most people do not like to admit that advertising influences their buying decisions, direct questions will not successfully measure the persuasive power of an ad. It must be measured indirectly.

This can be done in one of two ways. One is a Pre-Post design, whereby the brand preferences of the sample are measured before exposure to the advertising and again after exposure.

Differences in the percentage of people who choose the subject brand after exposure are attributed to the test advertising. The strength of this design is its sensitivity. It does not take very large differences between pre and post levels to reach "statistical significance".
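One way to see why the Pre-Post design is sensitive: because the same people answer twice, only those who change their brand choice carry information. The sketch below uses an exact McNemar-style test, a common choice for paired before/after data. The manual does not name a specific test, so treat this as an assumption, along with the function name and the sample counts.

```python
from math import comb

def pre_post_switch_test(switched_to, switched_away):
    """Exact two-sided p-value based only on respondents who changed choice."""
    n = switched_to + switched_away
    if n == 0:
        return 1.0  # nobody switched, so there is no evidence of any shift
    k = max(switched_to, switched_away)
    # Probability of a split at least this lopsided if switching were 50/50.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)

# Hypothetical result: 14 respondents switch to the test brand, 3 switch away.
p_value = pre_post_switch_test(14, 3)
print(f"Two-sided p-value: {p_value:.4f}")
```

Respondents who give the same answer before and after do not enter the calculation at all, so a modest net shift among switchers can reach significance.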

The weakness of this design is bias caused by transparency of the test. When people become aware that the test is measuring their brand preferences, their answers are affected by more than just the impact of the test ad. In such tests, it is important to disguise the pre-exposure question.

The second design for measuring persuasion is called Test-Control, whereby separate and independent samples of people are used: one sample is exposed to the test ad and a second sample is exposed to a control ad or to no ad for the brand at all. The level of brand selection or attitude toward buying the test brand is compared between the test and control samples.

If a larger proportion of people in the "test" cell favor the advertised brand than in the control cell, the difference can be attributed to the test advertising.

In such tests, it is important to have large enough cells to make statistical comparisons (at least 100 people per cell), and to make sure that the cells are as evenly matched as possible in critical characteristics so that their pre-exposure attitudes to the test brand will be comparable.
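A minimal sketch of how the test and control cells might be compared is shown below. It uses a standard two-proportion z-test, which is an assumption chosen for illustration rather than a method prescribed by this manual. The function name and the counts are hypothetical.

```python
import math

def two_proportion_z_test(chose_test, n_test, chose_control, n_control):
    """Return (difference, z, two-sided p-value) for test cell minus control cell."""
    p_test = chose_test / n_test
    p_control = chose_control / n_control
    # Pooled proportion under the assumption that the ad has no effect.
    pooled = (chose_test + chose_control) / (n_test + n_control)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_test - p_control, z, p_value

# Hypothetical result: 42 of 100 choose the brand in the test cell, 28 of 100 in control.
diff, z, p_value = two_proportion_z_test(42, 100, 28, 100)
print(f"difference: {diff:.2f}, z: {z:.2f}, p-value: {p_value:.3f}")
```

With cells of 100, a 14 point difference only just passes the conventional 5% threshold in this example, which illustrates why cells much smaller than 100 are hard to interpret.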

The test-control design can also be used to measure persuasion on a micro level. That is, a battery of image dimensions can be included in the interview, and the ratings of the test brand on these dimensions can be compared between the test and control cells. The dimensions should be strategically relevant to the advertising, and in this way, a more valid measure of the advertising impact on brand image can be obtained than by using verbal playback of advertising copy.

For example, by comparing the rating of the brand on a dimension such as "prestige" in the test cell with its ratings in the control cell, the impact of the ad on this dimension will be revealed.

e.) Pretesting services

A list of pretesting services is provided, along with an assessment of each, in the "SUPPLIERS" section of this manual. There are a number of characteristics of pretesting services that should be taken into account when selecting one. Having chosen one particular technique or service, it is advisable to stick with it, assuming it produces useful results. This will allow for the establishment of a set of expectations and standards which will help to make the interpretation of results of each new test all the more insightful.

There are basically four standards that can be applied to any test design: 1) How "normal" is the test situation? 2) How normal is the exposure technique? 3) How many exposures to the advertising (film/video) are incorporated? 4) How actionable will the results be?

Taking each in turn:

1) The test situation: The test situation should ideally be in the respondent's own home or the environment where the advertising is most likely to be seen under real life circumstances. To the extent that the exposure situation is unlike "reality", the effect on the responses of respondents may bias the results of a test.

2) The exposure technique: The advertising should be exposed in the context that most closely parallels its "normal" environment. This means a magazine or billboard for print, a program for video, or a cinema for film. Removing an ad from its normal context will cause it to be perceived differently in unknown ways.

3) The number of exposures: Most advertisements are seen two or three times before they have an impact. This is particularly true for video, which usually has too short a duration (30 seconds) to deliver its message fully on first exposure.

For this reason, tests that incorporate multiple exposures should be sought over those that rely on a single exposure. In the case of print, techniques that permit repeated and/or prolonged exposure should be favored over those that use a single brief exposure at one point in time.

4) Actionability: If a pretest is being done for a go/no-go decision, the action standards need to be outlined in advance and the ways the test will be interpreted should be clearly spelled out beforehand. To the extent that "norms" can be used, this is helpful, though norms must be viewed with a critical eye. Only those tests which most closely parallel the one being done with respect to strategy, target, category etc. should be used as yardsticks.

Some of the issues involved in choosing a pretesting service include:

1. Geographic location

Locations of test sites may vary or be standardized. The appropriateness of the markets for standardized services should be evaluated for each brand. Cities with atypical consumption of the product or category should be avoided for test purposes.

2. Sampling

Each pretest service has its own methods for selecting and drawing the sample for a pretest. These include:

...Method for recruiting: central location intercepts; telephone; mail; social groups (church, school)
...Nature of the sample: cross section of general population; screened for specific characteristics
...Sampling method: quota; probability; haphazard
...Sample size per test cell: range from 100-500

3. Test location

The location of the test site is important to consider. Generally there are four alternatives offered by pretesting services: a mall/hall research suite, a commercial meeting room in a hotel or hall, in-home, or the private testing sites of the service.

4. Exposure method

Keep in mind that the more natural the method of exposure, the more valid will be the test results. As exposure techniques become more unreal, the chances increase that the test situation will influence the results. Pretesting services offer a range of exposure techniques. Some use forced exposure with a captive audience, others use a partial "forced exposure" by inviting the audience to view, and the most desirable use a natural exposure either on-air or in-home. For print testing, some services place the test ad in a "portfolio", which is a mock magazine, while others use slides that are exposed for brief intervals, and others simply place the test ad in a folder with a group of eight to ten "clutter" ads.

For t.v. commercial testing, some services place the ad into a program along with other commercials. The program may be exposed in a studio, over cable or closed circuit t.v. or via video cassette on a monitor. Again, the more normal the context of exposure, the more valid will be the test results.

5. Timing of the interview

For the most part, it is desirable to limit the questioning of respondents prior to exposing the test advertising. This limits the degree of "forewarning" or calling undue attention to the test subject. Basically, pre-exposure questions should be limited to qualifying matters, preferably disguised in the context of other questions.

Post exposure questioning can take place immediately after exposure, if a measure of awareness and persuasion is taken, or it may be delayed as much as 24 hours, to measure residual memorability and persuasion.

E. PACKAGE TESTING

Packaging may well be the most important marketing communications agent for our products. The ability to attract attention on crowded shelves, effectiveness in communicating the desired image, brand identity, price/value and quality are all critical tasks for packaging.

In most cases, the testing will be among alternatives, and the role of the test will be to help select one for further development or adoption. When testing alternatives, the evaluative measurements will be comparative; responses to each package will be compared to determine which performs best. A secondary role of package testing is to provide guidance to improve pack design. In all cases, these are quantitative techniques. Qualitative research is not appropriate for later stage evaluation of package design.

The first thing to determine when conducting a packaging test is which criteria will be used to evaluate the package(s). There are essentially three major kinds of information that package testing will provide: 1) visibility on the shelf; 2) image communication; 3) impact on intention to buy.

Tests that emulate store shelf observation are effective in providing visibility measures. Often the new package is shown via slides on a crowded retail shelf in the context of competitive products.

Eye tracking techniques or brand recall and recognition are used to measure visibility. To measure intent to buy, the package is presented for a longer period of time after the visibility measure is taken.

Measuring the impact of packaging on image is more difficult. Such evaluation would involve either pre/post measurement of imagery components and/or exposed vs. unexposed cells in the test design.

There are several package testing services that offer sound methods and techniques for evaluating new product packages or modifications in existing packaging. These are listed in the supplier section of the manual.

F. NAME TESTING

Many of the comments made above for package testing apply to name testing. It is vital to know the desired image or position for a new brand in order to evaluate the results of a name test. In virtually every case, the role of a name test will be to help select one from a number of alternatives.

The "artificial product test" can be used quite effectively when testing names outside of the context of pack design. Qualified respondents are recruited to a central location, ostensibly to give their reactions to a new product. They are told that there are samples of four different products, each identified only by a name. They are asked to taste and evaluate a sample of each product. Differences in how they react to each "product" will of course be due to the influence of the names, since the products will be the same.

Probing for the top of mind associations to each test name is also useful, as these responses will provide clues to the kinds of images that each name stimulates.

Most qualified research suppliers can provide the design for name testing, as special equipment or techniques are not required.

G. PRICE RESEARCH

Using research to evaluate pricing is one of the most difficult endeavors, as there are no proven techniques for predicting price elasticity. There are, however, several methods for evaluating relative prices in the context of perceptions of value. Using them requires conducting a research study among category users that will provide specific information on brand price value perceptions.

The range of acceptable prices for a given brand may be determined by a survey that comprises four questions. Respondents are presented with a representation of the brand in question and given a card with a price scale ranging from far below to far above the current average retail price of the brand. The scale is marked in small increments. People are asked four questions: 1) at what price on the scale would they consider the brand to be too expensive to buy? 2) at what price would they consider it to be too cheap to be any good? 3) at what price would they consider it to be a good value for the money? 4) at what price would they consider it to be expensive but worth it?

The answers to each of these four questions are plotted on frequency curves. The point at which the too expensive and too cheap curves intersect represents the indifferent price point. That is the point at which as many people feel the price is too high as feel it is too low. The point at which the good value and expensive but worth it curves intersect is the optimum price point.

The answers to the too expensive and too cheap questions provide an idea of the range of acceptable prices, with a distribution of the proportion of people whose purchases are at risk as the price moves toward the high point or whose image of the brand may be damaged as the price goes down.

If a separate analysis of the answers of brand users, non-users, heavy users, etc. is made, this study can provide sound strategic guidance for pricing decisions.

H. TRACKING STUDIES

1. What are they?

A tracking study is either a continuous survey reported out at regular intervals or a survey taken at regular intervals (called waves or "dips") among comparable samples of people using the same set of questions. Its purpose is to compare the answers obtained in each time period with the prior answers to measure changes in such factors as brand salience (top of mind awareness), brand preference, brand imagery and brand usage. The object of the tracking study is to know what is happening to a brand or brands in the minds of the people who buy or use it as well as those who are intended to buy or use it. Often tracking studies are undertaken specifically to evaluate the progress of advertising campaigns over time.

Sales data do not reveal much beyond how much product is being sold. Understanding the dynamics of the marketplace, the changes in brand perceptions or imagery and the changes in behavior with respect to brands or categories are essential to the management of a brand's marketing programs.

2. Methods and Measures

a. Methods

Tracking studies may be conducted by telephone (where appropriate), mail, or in person. It is usually more efficient to combine tracking studies for many brands into one wave of screening for eligible respondents, since each contact may yield a qualified respondent for one of the several brands being measured.

Once begun, it is critical to maintain comparability of sample, questionnaire, screening technique, markets and analytical techniques over time because the goal is to attribute any changes seen in the data to marketplace effects rather than artifacts of the research.

b. Measures

There are essentially three key measures to be "tracked" over time: brand and advertising salience (how many people mention the brand unaided?), brand usage (how many people list the brand as one of their "preferred" brands, and at what level?) and brand imagery (how has the image of the brand held up or changed over time, particularly on the key strategic dimensions?) These factors should be tracked for the brands in question as well as for their major competitors. In addition to these measures, intent to buy and product satisfaction can be helpful indicators of future events and can signal potential problems that should be investigated further.

Measures of these factors over time will provide managers with a good barometer of the brand's strength in the marketplace and a good predictor of its future sales trends. The impact of marketing and advertising programs can be inferred from the changes that are observed, as well as the impending threat of competitive activities.

Measures of advertising recall and impressions can be added to the tracking study questionnaire.

c. Frequency of survey

The most effective and sensitive method of monitoring the consumer mind is to conduct continuous interviews, on a weekly or daily basis, and cumulate the results to show moving averages. This method will reveal changes in brand status as they occur and will also be highly sensitive to the brand's marketing and competitive activities. A continuous tracking study requires a relatively small number of interviews per day or week and gains its stability as the sample grows and accumulates over time.
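A minimal sketch of the moving average idea is given below. It assumes equal numbers of interviews each week, so that a simple average of weekly percentages is adequate; with unequal weekly samples the interviews themselves should be pooled. The four week window and the weekly figures are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average, one value for each full window of periods."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical weekly unaided brand awareness (%) from a continuous tracker.
weekly_awareness = [20, 24, 19, 25, 27, 23, 29, 31]

smoothed = moving_average(weekly_awareness, window=4)
print(smoothed)
```

The weekly figures bounce around because each week's sample is small, while the four week averages move more steadily and make the underlying trend easier to read.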

An alternative is to conduct regular waves or dips of surveys. These will be less sensitive to events in the marketplace and will provide mainly gross indications of brand status.

Tracking studies that are conducted once a year are the least sensitive and revealing. They are useful to show "seismic" changes in a category, but are not very helpful in diagnosing the underlying reasons for change.

I. SAMPLING

The choice of sample is the most important decision made in the design of a study. The sample definition, design and method will affect the validity, reliability, representativeness and cost of the study more than any other single factor. There are four primary issues to determine in designing the sample for a study: whom to interview (sample universe), how many people to interview (sample size), how to select the respondents (sampling method), and how to control the sampling process (sample quality controls).

1) defining the sample universe

The term sample universe refers to the total population you wish to investigate. Precision is important! If, for example, a test of a new cream based liqueur formula were being conducted, and the brand concept was for a premium priced position, the sample universe for the test should be defined as men and women who bought and drank a premium priced cream based liqueur within the past six months. If existing data revealed that heavy users (those who drink cream based liqueurs at least once per week) account for the vast majority of volume (75% or more), the sample universe should be men and women who drink a premium priced cream based liqueur at least once per week.

Since screening costs (the cost of identifying a qualified respondent) may often equal the costs of interviewing, a change of sample universe definition can have significant cost implications. In the above example, by changing the definition from men and women who bought and used a premium price cream based liqueur to heavy users, the incidence of qualified respondents may drop precipitously. So instead of one in ten people who may qualify, the incidence may fall to one in fifty! Is it worth the cost?

The answer in most cases is yes, because a test that is not conducted among the right sample universe is worthless as a predictor of results in the marketplace. And that is why the research is being done in the first place.
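The effect of incidence on screening effort can be worked through with simple arithmetic. The sketch below is illustrative only: the target of 150 completed interviews and the function name are assumptions, while the one in ten and one in fifty incidence figures come from the example above.

```python
from fractions import Fraction
import math

def contacts_needed(target_interviews, incidence, acceptance_rate=Fraction(1)):
    """Expected number of people to screen to complete the target interviews.

    incidence:       share of people contacted who qualify
    acceptance_rate: share of qualified people who agree to be interviewed
    """
    return math.ceil(target_interviews / (incidence * acceptance_rate))

# Hypothetical target of 150 completed interviews.
broad_universe = contacts_needed(150, Fraction(1, 10))  # bought within six months
heavy_users = contacts_needed(150, Fraction(1, 50))     # drink at least weekly
print(broad_universe, heavy_users)
```

Narrowing the universe multiplies the screening effort by five in this example, and the effort rises further if some qualified people decline to be interviewed.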

2) sample size

How many people should be interviewed? This decision affects the cost and reliability of a study. The size of the sample should be determined by two factors: how much accuracy is required from the study and what subgroups or cells will be analyzed independently. Accuracy refers to the likelihood that the results of the study would be similar if 100 identical studies were conducted at the same time. In other words, you want to be sure that the results of the study were not a "fluke" or artifact of some unknown factor. This is what is referred to as the confidence level.

The higher the confidence level, the less likely it is that a different result would occur if the study were replicated. For example, in a test of formula "A" vs. "B", "A" is preferred over "B" by a margin of 12 percentage points (36% vs. 24% with 40% "no preference"). The sample size is 150 respondents per cell for a total of 300. The report says that this difference is "significant at the 95% level of confidence +/- 4 percentage points". This means that had 100 such tests been conducted at the same time under identical conditions, the results ("A" over "B" by 12 points, 36% vs 24%) would have occurred in 95 of the 100 tests, with "A"'s margin of preference over "B" ranging from 8 to 16 percentage points.

Such a result can be interpreted to mean that formula "A" is likely to be better received in the marketplace than would formula "B" and, all other factors being equal, should be used for the new product.

If a smaller sample were used, say 75 people per cell, the sampling error would be roughly 1.4 times as large, because sampling error shrinks only with the square root of the sample size. The difference in preference between "A" and "B" would therefore have needed to be correspondingly larger to reach the 95% confidence level. With the same observed 12 point difference, a higher level of risk is introduced into the decision to choose "A" over "B".

The calculation of error ranges around sample statistics can be done only with a probability sample.

Probability sampling is the only method of sampling that produces statistics whose accuracy can be assessed, though it is common mispractice to use sampling error statistics and terminology on non-probability samples. Sampling accuracy is not related linearly to sample size. It takes four times the sample size to double the accuracy.
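The relationship between sample size and accuracy can be illustrated with the usual margin of error formula for a percentage. The sketch below assumes a simple random (probability) sample and a 95% confidence level; the 50% result and the sample sizes are hypothetical.

```python
import math

def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate margin of error for a proportion; z=1.96 gives 95% confidence.

    Valid only for a probability sample, as noted in the text.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Hypothetical survey result of 50% at two sample sizes.
moe_100 = margin_of_error(0.5, 100)
moe_400 = margin_of_error(0.5, 400)
print(f"n=100: +/- {moe_100:.1%}   n=400: +/- {moe_400:.1%}")
```

Quadrupling the sample from 100 to 400 cuts the margin of error in half, from about 9.8 to about 4.9 percentage points.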

If a survey is being conducted where the results will be used to project volume or financial information and the economic consequences of inaccuracy will be great, a probability sample should be used. When conducting tests to determine the relative effectiveness of two or more advertisements or product formulas, it is not necessary to have such stringent sampling methods and sizes. If there are gross differences between the ads or formulae, cell sizes of 100 drawn from a proper universe with good sample controls can be relied upon. However, if one is trying to identify small or "nuance" differences, or if a formula change in an existing product is being tested, it is advisable to increase the sample size to a minimum of 250 qualified respondents per cell.

3) sampling method

Sampling method refers to the way the people to be interviewed are selected. There is a "pure" method, called probability sampling, wherein each person in the universe has a known chance of being part of the sample, and two other methods that do not permit any way to estimate sampling error. For studies that require precision and accuracy, probability sampling should be used. However, this is a very expensive process, and can add 25%-50% to the cost of a survey. A decision should be made as to whether the study being done requires "projectability". If it does, then this is the only method that should be used.

The two "non-purist" sampling methods are "quota sampling" and "haphazard sampling." Quota sampling involves defining the qualifications of the respondents being sought and screening for them among a population, either by telephone or from pedestrian traffic.

Haphazard sampling has no design or system. It involves taking anyone who comes along or answers the phone who will participate in the study. This kind of sample is representative only of the people who happen to be in the area at the time of the survey, or happened to be home and answered the telephone at the time of the survey. Sometimes inaccurately referred to as "random sampling", haphazard sampling should be used only when the study being done requires a minimum of accuracy, reliability or representativeness.

Using lists from which qualified respondents may be located is often a way to reduce screening costs. However, lists are not without bias. Names on lists are often obtained by self-selecting means (voluntary submission of purchase cards, coupon redemption, members of a group, etc.). The people on these lists will be quite different from the population at large, even if they meet the sample qualifications. Therefore, samples drawn from lists should be used only in situations where the sample biases will not impact on the decisions to be made from the research.

4) quality controls

Sample quality involves getting the right respondents, getting the right proportion of respondents in each category, and getting a representative sample of the right respondents.

Getting the right respondents is controlled by what is called the screener. The screener is a series of questions that are asked of potential respondents when they are first contacted. The questions act as a filter to identify the people who have the qualities or characteristics that the study requires. A careful record must be kept of each attempted contact and each person screened. In this way the supplier can report on the incidence of qualified respondents and the acceptance rate (the proportion of qualified respondents who agreed to be interviewed).
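The two rates the supplier should report can be computed directly from the contact records. The sketch below is a minimal illustration; the function name and the counts are hypothetical.

```python
def screening_rates(people_screened, qualified, interviewed):
    """Return (incidence, acceptance rate) from the screening record."""
    incidence = qualified / people_screened    # share of screened people who qualify
    acceptance_rate = interviewed / qualified  # share of qualified who take part
    return incidence, acceptance_rate

# Hypothetical record: 2,000 people screened, 200 qualified, 120 interviewed.
incidence, acceptance_rate = screening_rates(2000, 200, 120)
print(f"incidence: {incidence:.0%}, acceptance rate: {acceptance_rate:.0%}")
```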

5) Use of incentives

These days many people are unwilling to be interviewed. If the acceptance rate falls below 50% of the people you contact, the sample quality may be compromised, since it will include only "agreeable" people. For studies that involve long interviews (over fifteen minutes in length), or those that involve self-administered questionnaires, it may be necessary to use an incentive. Money is often the most effective incentive, but there are cases where coupons or gifts will work.

J. Qualitative and Quantitative Research

It is necessary to point out when each type of research is appropriate and inappropriate.

When decisions need to be taken that involve knowing how many people feel a certain way, act a certain way, or perceive a brand a certain way, a quantitative study is called for. Strategic decisions, evaluation of advertising, concepts, packages, etc. all require measurement via quantitative research.

Qualitative research is appropriate for exploration, hypothesis development, and determining the language and types of people who may be in a market. It is not a decision making tool and should not be employed as such. This is important to note because qualitative research is more user friendly than is quantitative research. It is easy to understand what people are saying when observed from behind a one way mirror or on video tape. Their comments may have more force and seem more "real" than does a cold column of percentages.

But keep in mind that these comments are not representative of the population you are studying. Even if five focussed group sessions each of ten respondents are conducted, the sample is 5 not 50.

It is often more difficult and possibly more costly to conduct a quantitative study than it is to do a few focussed groups. But if the economic consequences of the decisions to be taken are great, the difficulty and expense are called for.

K. THE RESEARCH BRIEF

Writing a good brief will save time and expense in dealing with a research agency. It will also help speed the interpretation of results of a study by making clear its objectives and standards of action.

The brief calls for you to answer a number of questions about the proposed study: 1) why is the research being proposed at this time? 2) what are the business issues for which the information is needed? 3) what information is believed to be needed? 4) what can/will we do with the information? 5) what level of acceptance/performance will trigger implementation of required action? 6) who is the target audience for the proposed product, advertising, etc.?

The purpose of answering these questions is to put into focus the goals of the research and the way the information will be used before the study is undertaken.

Questions 4 and 5 have to do with the action standards of the research. It is essential that these be agreed to before the study is implemented.

Special materials required (if any), budget parameters for the study and timing requirements should also be indicated.

In response, the research agency should describe how the study they propose to conduct will address the objectives and what information will be provided that will form the decision criteria. Any normative data or experiences from other studies that will be used to help interpret the results should be indicated in the proposal.

Agreement should be reached on these specific issues before the study is authorized.
