Scale development takes a lot of time and research (Gehlbach & Brinkworth, 2011), so in this toolkit we suggest some robust scales that have been developed through a rigorous research process. The advantage of using scales is that they are more precise and reliable than single items when it comes to assessing the underlying theme that they aim to measure (McIver & Carmines, 1981). For example, in your questionnaire you could include a scale about student self-efficacy, which would include 5 items that collectively capture how much students believe they can succeed in achieving academic outcomes. We encourage you to use the student self-efficacy scales that we have adapted from existing, previously validated scales.

Gehlbach and Brinkworth (2011) synthesised several known survey design practices to create a rigorous and reliable process for designing survey scales. This process draws on both potential survey participants and experts in the field in an effort to reduce measurement error and increase the validity of new survey scales. The six-step process is as follows:

Step 1: Literature review, with the goal of precisely defining the construct in relation to the literature on the subject and identifying how existing measures of the construct (or related constructs) might be useful in the development of a new scale. For example, if you are developing a new scale to measure educational self-efficacy, you would consult the literature to establish how scholars define self-efficacy.

Step 2: Interviews and focus groups with potential respondents to establish whether the newly refined conceptualisation of the construct matches the way the prospective respondents think about it. You will need to recruit a sample of participants for your interviews or focus groups from the population of target respondents (e.g., a sample of Imperial students, if the scale in question is to be administered to Imperial students). Again using the example of creating a new educational self-efficacy scale, this step will help you to determine whether the way scholars define educational self-efficacy matches the way Imperial students think about it.

Step 3: Synthesise the literature review with the interview and focus group data to reconcile the differences that emerge between academic and lay understandings of the construct in question. In this step, when scholars and respondents agree conceptually but describe the indicators, or sub-themes, differently, you can use the vocabulary of the respondents. This will help you to create a list of indicators, or sub-themes, around which you can develop your items in Step 4.

Step 4: Preliminary item development. The goal of this step is to develop items that represent the indicators, or themes, that arose out of the synthesis of the literature and the interview and focus group data. In this step, it is advisable to develop a few more items than you will want to have in your final scale. For example, if you want your final scale on educational self-efficacy to have 5 items, develop a preliminary list of 8-9 items. Please see the “Top Tips for Developing Items and Response Options” section above.

Step 5: Expert validation of preliminary items. It is advisable to return to your academic audience and ask experts in the field (e.g., self-efficacy) to participate in an online survey to review the items and provide feedback on how they might be improved (or eliminated altogether). This step will help to ensure that the items you developed match your conceptualisation of the construct (e.g., educational self-efficacy), and may also reveal indicators that are missing. When you reach out to experts in the construct of interest, be sure to provide them with your definition of the construct, ask them to indicate how relevant each item you developed is to the construct, and ask them to note any important aspects of the construct that are not reflected in your items. You can then revise or remove items based on the feedback you receive from the experts.
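
As a rough illustration of how the expert relevance ratings from Step 5 might be summarised, the sketch below averages each item's ratings and lists the items that fall below a cutoff. Everything in it is an assumption made for illustration: the item wordings, the 1-4 rating scale, the ratings themselves, and the cutoff of 3.0 are hypothetical and do not come from Gehlbach and Brinkworth (2011). Experts' written comments should still guide the final decision.

```python
from statistics import mean

# Hypothetical expert ratings: each expert rates each item's relevance
# to the construct from 1 (not relevant) to 4 (highly relevant).
ratings = {
    "I can master the hardest topics in my course": [4, 4, 3, 4],
    "I enjoy attending lectures": [2, 1, 2, 3],
    "I can do well in my exams if I prepare": [3, 4, 4, 3],
}

REVIEW_CUTOFF = 3.0  # assumed threshold, for illustration only


def items_to_review(ratings, cutoff=REVIEW_CUTOFF):
    """Return the items whose mean expert rating falls below the cutoff."""
    return [item for item, scores in ratings.items() if mean(scores) < cutoff]


# Print each item's mean rating, then the items flagged for revision.
for item, scores in ratings.items():
    print(f"{mean(scores):.2f}  {item}")
print("Revise or remove:", items_to_review(ratings))
```

A low mean rating only flags an item for a closer look; whether to revise or remove it remains a judgement call.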

Step 6: Cognitive pretesting interviews to determine whether your respondents consistently interpret your remaining items as you intend them to. In each one-on-one interview, you will ask potential respondents to 1) re-state each item in their own words (without using any of the words in the item itself) and 2) think aloud while coming to their answer to each question. This process can help you identify clear trends from multiple respondents about any problematic items, and provide insight on how you can make changes (e.g., adapting the vocabulary of the item to make it easier to understand) (Willis, 2005).

After these six steps, the items can be pilot tested with a sample of potential respondents. From the pilot tests, you will need to analyse the mean and variability of each individual item, the inter-item and item-total correlations, and the reliability of the scale, among other analyses. These analyses help you to understand how well the items perform together as a scale, and to test whether the measures of the construct are consistent with what you understand the construct to be (Gehlbach & Brinkworth, 2011).
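
The pilot analyses described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the pilot data are invented, the helper names (pearson, cronbach_alpha, corrected_item_total) are hypothetical, and Cronbach's alpha is used here as one common reliability estimate, which the source does not specifically name. In practice you would probably use a statistics package and a much larger sample.

```python
from statistics import mean, stdev, variance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one list per item)."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]  # each respondent's scale total
    item_var = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))


def corrected_item_total(items, index):
    """Correlate one item with the sum of the remaining items."""
    rest = [sum(v for j, v in enumerate(row) if j != index)
            for row in zip(*items)]
    return pearson(items[index], rest)


# Hypothetical pilot data: 3 items (one list each), 6 respondents,
# responses on a 1-5 scale.
pilot = [
    [4, 5, 3, 2, 4, 5],
    [4, 4, 3, 2, 5, 5],
    [3, 5, 2, 2, 4, 4],
]

for i, col in enumerate(pilot):
    print(f"item {i + 1}: mean={mean(col):.2f} sd={stdev(col):.2f} "
          f"item-total r={corrected_item_total(pilot, i):.2f}")
print(f"inter-item r (items 1 and 2)={pearson(pilot[0], pilot[1]):.2f}")
print(f"alpha={cronbach_alpha(pilot):.2f}")
```

An item with very low variability or a weak item-total correlation is a candidate for revision or removal before the scale is finalised.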


Gehlbach, H., & Brinkworth, M. E. (2011). Measure twice, cut down error: A process for enhancing the validity of survey scales. Review of General Psychology, 15(4), 380-387. Retrieved from

McIver, J. P., & Carmines, E. G. (1981). Unidimensional scaling. Beverly Hills, California: SAGE.

Willis, G. B. (2005). Cognitive interviewing: A tool for improving questionnaire design. Thousand Oaks, California: SAGE.