Validating a school functionality evaluation tool

Introduction
The literature on school improvement suggests that interventions are more likely to succeed when they are implemented in schools with a certain basic level of functionality. If education is to achieve educational outcomes, improve learners' well-being and breadth of skills (including socio-emotional skills) for the 21st century, then attention must be placed not only on access to schooling but also on quality of education. The question is: How do we ensure that interventions in education are achieving both quality and educational outcomes, and how do we measure the relationship between quality and educational outcomes? Taylor and Prinsloo (2005) argue for creative and innovative ways of addressing the challenges facing the education system globally, with a focus on new school performance indicators rather than relying on pass rates. These indicators include, for example, enrolment, governance, management, leadership and teaching. Furthermore, they argue for a systemic approach to school interventions. Harris et al. (2006) postulate that interventions aimed at dysfunctional schools need to take into consideration both the external environment and the internal environment of the school itself.
Khulisa Management Services (Khulisa) have conducted research and evaluations in South African schools since 1993. The authors hypothesised that school functionality influences learner outcomes. This hypothesis was based on the review of international research, as well as a 'gut feel' that this is particularly relevant in the context of poverty, social exclusion and resource constraints in South Africa. However, there was a gap in assessment tools to determine the status of school functionality with the ultimate aim of examining whether there is a relationship between school functionality and learning outcomes. Consequently, there was a need to develop an assessment tool.
Whether evaluators adapt an existing tool or design a new one, a key challenge is how to validate these tools, ensure that they are context-specific and confirm that they meet quality standards. To meet evaluation quality standards, evaluators must ensure that evaluation tools and data accurately measure the indicators and variables that they purport to measure. The principles include, amongst others:
• Validity, to confirm that what is set out to be measured is measured and to what extent the measurement represents the reality it claims to represent.
• Reliability, to determine the extent to which the measurement tool, analysis or specification (variable) is consistent and dependable.
• Relevance of information, to meet the requirements and scope of the evaluation and the organisation or programme, and the extent to which the information answers the question, indicator or objective.
• Ethics, to protect and respect the rights of beneficiaries and participants and to ensure that choices about what is right and wrong in relation to values and behaviours are based on ethical principles.
• Equity: fairness, impartiality and the absence of bias or discrimination in relation to both the assessment tool and the participation of individuals.
To meet these standards in evaluation practice, we, as evaluators, need tools and instruments that provide accurate data. This article focuses on the process Khulisa undertook to develop a school functionality tool. The process included an assessment of the evidence gathered to support the meaningfulness, usefulness and appropriateness of the tool's properties (Chan 2014). Lessons learned from the process are shared with the aim of guiding evaluators undertaking similar processes to gather quality evidence in evaluations. The tool's validity and reliability scores and results are not presented in this article, as this work is ongoing; rather, the focus is on lessons learned from the validation process to date.

Research method and design
Over several different evaluations focusing on learner outcomes, Khulisa developed a school functionality tool. The tool was developed based on a review of international and South African literature, engagement with key stakeholders in the education sector in South Africa and through a series of implementation phases across various geographic sites during evaluations conducted since 2011.
The tool included quantitative and qualitative indicators in several pillars (or characteristics) of schools that work, which include teaching and curriculum delivery, learning outcomes, contextual environment, resources, administration, governance, community and professional development.
Khulisa's school functionality tool used a weighted scoring mechanism of assigning 0-4 points for indicators of primary school functionality, which are combined to calculate an overall school functionality rating. The tool allows for distinguishing between four general types of schools, namely, (1) highly functional schools, (2) functional schools, (3) stagnant but functional schools and (4) dysfunctional schools.
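As a hedged illustration of the weighted scoring mechanism described above, the following sketch combines 0-4 indicator scores into an overall percentage rating and maps it onto the four school types. The indicator names, weights and cut-off thresholds are invented for illustration and are not Khulisa's actual values.

```python
# Hypothetical sketch of a weighted 0-4 scoring mechanism.
# Indicator names, weights and thresholds are illustrative assumptions.

def functionality_rating(scores, weights):
    """Combine 0-4 indicator scores into a 0-100% functionality rating."""
    total_weight = sum(weights.values())
    weighted = sum(scores[k] * weights[k] for k in weights)
    return 100 * weighted / (4 * total_weight)  # 4 = maximum points per indicator

def classify(rating):
    """Map an overall rating onto one of the four school types."""
    if rating >= 80:
        return "highly functional"
    if rating >= 60:
        return "functional"
    if rating >= 40:
        return "stagnant but functional"
    return "dysfunctional"

# Example: four hypothetical indicators, with teaching weighted double
scores = {"teaching": 3, "governance": 2, "resources": 4, "administration": 3}
weights = {"teaching": 2.0, "governance": 1.0, "resources": 1.0, "administration": 1.0}
rating = functionality_rating(scores, weights)
school_type = classify(rating)
```

In practice, the weights and cut-offs would be set from the tool's table of specifications and the criteria for standards rather than chosen ad hoc.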
The tool originally started out as a 1-day exercise but over time has evolved into a rapid assessment tool to be executed by a trained evaluator or field researcher, thereby minimising disruption to teaching and learning practice as well as school management. It relies mostly on observations, with evaluators rating what they observe against a set of criteria. For corroboration purposes, the tool includes a collection of photographs of certain elements of school functionality, such as the condition of the school toilets, kitchen and periphery (e.g. outdoor space and school fences). Initially, the tool was administered using MS Excel, and photographs were taken separately. More recently, the tool was administered using mobile data collection software such as Open Data Kit (ODK) and Tangerine®, which allow for real-time data collection using cellphones and tablets.
The administrative guide details procedures and considerations to ensure that ethical principles are upheld. This includes ensuring informed consent from the principal and school management and that photographs do not include children's identifiable features, such as their faces.
A systemic reflection and evaluation process was undertaken by Khulisa in 2019 to document the steps undertaken to design and validate the tool, and to extract lessons to guide future validation processes.

Ethical considerations
This article followed all ethical standards for research without direct contact with human or animal subjects.

Results
The process of designing, refining and validating the school functionality tool
For the past 8 years, Khulisa has been in the process of refining and validating the school functionality tool. Reflection on the process indicates that the four phases identified by Creswell (2012) were followed. These phases are: (1) planning, (2) construction, (3) quantitative evaluation and (4) validation.
This section (Table 1) reflects on our experience and describes the process undertaken under each phase.

Planning
This phase includes, firstly, identifying the purpose of the tool, the content area and the relevant stakeholders; secondly, reviewing the literature to check for the existence of similar tools and to determine definitions of the variables and constructs to be measured; and, lastly, developing open-ended questions to present to and engage with relevant stakeholders.
The results of these elements should inform the development of the tool scope and components.
For Khulisa, the opportunity to develop the tool emerged in 2011 as we were contracted by a Foundation to conduct a 6-year evaluation of a range of projects implemented in 60 schools across the country. Our proposal included going into the schools before they received the projects' interventions to ascertain their level of functionality. Significant effort was expended to design a school functionality data collection tool that could be administered with verifiable data.
Our evaluation team conducted an extensive review of international and South African literature on school functionality to determine the variables and constructs to be measured. Together with the funder, Khulisa identified relevant stakeholders with which to engage to support the development of the tool. Finally, Khulisa consulted with an education Foundation, academics from four South African universities and education experts. This led to the development of several indicators and the various school functionality pillars. This planning phase also occurred iteratively in subsequent evaluations, where we had to update the literature review with more recent research findings.

Construction
This phase is about developing the tool. The first step is to identify the tool's objectives and develop a table of specifications whereby each indicator is linked to a concept and overall theme (Statistics Solutions 2018). Upon completion, it is time to build the tool, which includes selecting question formats (e.g. multiple choice, nominal scales, ordinal scales, Likert scales, etc.) based on the type of data required for each question and/or indicator. When developing the tool, other sector or subject-area specialists can be involved in the development process and in reviewing the tool.
Once the tool is built and reviewed, it is presented to peers and other stakeholders to match items to specifications; if an item does not directly match, it needs to be revised. The contents of a tool are considered valid when the indicators adequately reflect the various dimensions of the objective of the tool (Benson & Clark 1982). In the end, the tool is reviewed by relevant stakeholders who critique the quality of individual items and the tool as a whole (Statistics Solutions 2018).
In developing the school functionality tool (Figure 1), we workshopped the design, content and indicators collaboratively with relevant stakeholders. This included local Foundations working in education, academics and the South African Department of Basic Education. This exercise was important to establish buy-in and obtain input into the indicators we had developed based on the literature and engagements in the initial phase discussed above.
The initial development of the tool did not include school-level input, as it was not part of the evaluation design. However, the evidence to support the tool development came from academics and education officials who engaged directly with the realities of school management and functionality. At this stage, tool validation was not the primary intention. The value of the process was that it garnered evidence and insights from a select group that subsequently informed a pilot test of the tool. Tool construction was not a once-off process, as we held workshops every time we revised and adapted the tool for use in our evaluations, thus at each round obtaining further input, refining the items and ensuring items directly matched the relevant variables. The image below provides a snapshot of some of the questions included in a recent iteration of the tool.
In a later evaluation, we developed a table of specifications (Table A1), which included reference to the literature, government norms and standards, scales of measurement and criteria for standards.

Quantitative evaluation (and current results)
This phase involves pre-testing, or a 'first pilot', of the tool with a representative sample and collecting feedback on the tool. The resulting feedback and measurements can assist in revising a tool based on evidence, rather than just a 'gut feel'.
In this phase, as with all our instruments, we pre-tested or 'piloted' the school functionality tool: firstly, to receive feedback on the length, language and clarity of the tool; and secondly, to adapt and refine the tool to ensure relevance to the South African context and to determine the consistency of measures and responses.
Khulisa used the tool in several different evaluations over the years to inform the piloting process. The tool was tested and refined over a series of pilots carried out during six evaluations where the tool was tested in a total of 962 sites (including schools and early childhood development centres), as illustrated in Table 2.
Edits to the tool, following feedback, included:
• The initial tool included learner outcomes as measured by the Annual National Assessments (ANAs). However, when the ANAs were discontinued by the Department of Basic Education (DBE) in 2015, these data were no longer available. Consequently, this indicator was removed from the 2017 version of the tool. Then in 2019, the tool was revised to include an indicator of learner literacy outcomes, as it was a requirement of the evaluation being conducted at the time. These data were obtained through primary data collection.¹
• Initially, the tool took a full day to administer. The feedback was that the tool could not be administered easily within the given timeframe. The tool was subsequently revised to allow the evaluator to observe the different indicators during a school day across different settings (e.g. kitchen, safety of school, etc.) and enter the ratings into the tool following each point of observation. This was a more efficient use of the evaluator's or researcher's time, improving the cost-effectiveness of the evaluation without compromising quality.
• Initially, the tool was administered using paper and pencil, and the data entered into a laptop. The feedback was that this was time-consuming, cumbersome and led to errors in data entry. With the development of rigorous open-source mobile data collection platforms, we began to implement the tool using mobile data collection applications. This meant that the tool could be easily administered in real time, which avoided duplicate entries, was less time-consuming and improved management of the data. For example, it allowed instant access to the data for daily quality checks, improved fieldworker management and ultimately improved data quality.

1. The evaluation findings are not currently available for public release.
Furthermore, mobile data collection improved the ability of the evaluation team to verify observations and ratings through the use of photographs and global positioning system (GPS) locations.
Importantly, as the tool required observations, providing adequate training (for inter-rater reliability), and having a method to moderate responses, was critical, hence the use of photographs and the inclusion of supervisors in the field.
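One common quantitative check of inter-rater reliability is Cohen's kappa, which compares observed agreement between two raters against the agreement expected by chance. The sketch below is illustrative only; the ratings are invented and are not drawn from Khulisa's data.

```python
# Illustrative inter-rater reliability check using Cohen's kappa.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items on a nominal/ordinal scale."""
    n = len(rater_a)
    # Proportion of items on which the two raters agree exactly
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal distribution
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two researchers rating the same ten toilet-condition photographs 0-4 (invented)
a = [4, 3, 3, 2, 4, 1, 0, 3, 2, 4]
b = [4, 3, 2, 2, 4, 1, 0, 3, 2, 3]
kappa = cohens_kappa(a, b)
```

A low kappa would signal exactly the situation described above: raters applying different frames of reference, prompting retraining or clarification of the measurement criteria.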
Originally, the tool was designed to provide a rating of school functionality. The ratings were qualitatively confirmed across several evaluations and through peer review by experts in the field, evaluators, clients and government officials.
The premise behind school functionality is that dysfunctional schools lack the leadership, management and other skills needed to run a school effectively, and that efforts to improve teaching and learning will not have an effect because teachers are not teaching and learners tend not to be learning. At the other end of the spectrum, highly functional schools do not require programme intervention. In the middle are functional schools that would benefit from programme support, often led by entrepreneurial principals who gather resources from many sources and put them to use. The international literature does not discuss 'stagnant schools'. This category was added by Khulisa when collecting data to describe schools that once were either functional or highly functional but now are operating on legacy good practices and resources.
It is only as we have moved into the full validation phase that we are beginning a process of quantitatively evaluating the reliability and validity of the tool (discussed below).

Validation
This final step involves quantitatively establishing validity through a final round of testing the tool and reviewing the data against criteria. Here, it is important to understand the different constructs of validity and the relevance of each to the purpose of an evaluation in a specific context.
• Content validation determines the extent to which the items on a tool represent the domains or constructs that the tool intends to measure. At least three experts should be consulted (Statistics Solutions 2018).
• Criterion-related validation determines if a tool is a good predictor of an expected outcome that it is theoretically expected to predict. Here, a correlation coefficient of over 0.60 indicates a significant positive relationship (Creswell 2012; Statistics Solutions 2018).
• Construct validation determines how well a test or experiment measures up to its claims, that is, whether the score recorded by a tool is meaningful, significant, useful and has a purpose. It achieves this by comparing the relationship of a question from the scale to the overall scale, testing a theory to determine whether the outcome supports the theory, and correlating the scores with other similar or dissimilar variables (Statistics Solutions 2018).
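To make the criterion-related threshold concrete, the sketch below computes a Pearson correlation between hypothetical school functionality ratings and learner outcome scores and checks it against the 0.60 benchmark cited above. All data values are invented for illustration.

```python
# Illustrative criterion-related validity check: Pearson correlation
# between tool ratings and an outcome measure (all values invented).
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient between two samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

functionality = [55, 62, 70, 48, 81, 66]   # hypothetical tool ratings (%)
reading_score = [41, 50, 63, 35, 72, 58]   # hypothetical mean learner scores
r = pearson_r(functionality, reading_score)
meets_threshold = r > 0.60  # benchmark from Creswell (2012)
```

In a real validation exercise, one would also test the statistical significance of r and inspect the scatter for outliers before drawing conclusions.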
Either one or all of these types of validity may be conducted. The decision is based on what the tool will be used for and the strength of validity required.
Given the iterative nature of the school functionality tool; the fact that it has been reviewed by academics, experts and government officials; and that it aligns with the literature and with government norms and standards, provides scales of measurement and has set criteria for standards, Khulisa believes that the content validity of the tool has been adequately established (although it would benefit from further review by school officials, which Khulisa will undertake as part of a subsequent application of the tool).
Khulisa has begun a construct and criterion-related tool validation process using data from a recent evaluation, which looked at the impact of three reading interventions on learner reading outcomes. In this evaluation, a team of researchers administered an adapted version of the school functionality tool to 229 schools in one province in South Africa. The tool was administered alongside other evaluation tools including learner reading assessments, teacher and principal questionnaires, classroom observations and a parent questionnaire.
The school functionality tool administered in this evaluation collected information on various domains, including the status of food and nutrition, hygiene and healthcare, the school environment, teaching and curriculum delivery, learning and teaching materials and school management. As it relied mostly on observations, the tool by its nature reflected the judgements of the trained researcher, where each researcher's response was influenced by his or her own frame of reference (albeit informed by rigorous training) as to acceptable quality standards. For verification purposes, fieldworkers took photographs of certain elements. The researchers intend to explore whether school functionality status potentially affects the effectiveness of the reading interventions in schools, and therefore have an effect on learner reading performance.
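One simple way to begin exploring whether school functionality moderates intervention effectiveness, as intended above, is to compare the treatment-control gap in outcomes within functionality subgroups. The sketch below uses invented group means and is not drawn from the evaluation's data; a full analysis would use a regression model with an interaction term.

```python
# Hedged sketch (invented data): does the treatment effect differ between
# functional and dysfunctional schools?

def mean(xs):
    return sum(xs) / len(xs)

# Learner reading scores by (functionality group, evaluation arm); all invented
scores = {
    ("functional", "treatment"): [62, 58, 65, 60],
    ("functional", "control"): [50, 48, 53, 49],
    ("dysfunctional", "treatment"): [41, 39, 44, 40],
    ("dysfunctional", "control"): [38, 37, 42, 39],
}

def treatment_gap(group):
    """Mean treatment-control difference within one functionality group."""
    return mean(scores[(group, "treatment")]) - mean(scores[(group, "control")])

gap_functional = treatment_gap("functional")
gap_dysfunctional = treatment_gap("dysfunctional")
moderation = gap_functional - gap_dysfunctional  # >0 suggests functionality moderates the effect
```
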

Discussion
The following key lessons emerge from our experience developing, refining and starting the process of validating the school functionality tool. The lessons below provide guidance to evaluators embarking on a tool validation process.
Firstly, it is important that there is sufficient time to develop, test and refine the tool. Assuming it is not possible to build the full process into one evaluation because of cost and time limitations, building a tool over time and across a range of existing and relevant projects (where possible) can provide useful insights. There is a reciprocal advantage in that budgets from various evaluations can contribute to the development of a tool that can be used by a wider audience and, conversely, evaluation commissioners benefit from building on an established tool rather than starting over. Disadvantages include having to adapt the tool to serve the interests of different stakeholder groups and to suit the needs of different evaluations.
Secondly, it is critical to build in time and resources for a validation process from the beginning. Looking retrospectively at our process, establishing content validity required several rounds of reviews and pre-tests, and iterative tool refinement, to come to a point where the tool encompassed the correct constructs in line with the literature, government norms and standards, and with contextually appropriate scales of measurement and set criteria for standards. When designing or adapting a tool to context, it is advisable to plan for reliability and validation from the start. The process requires many team members with different skill-sets and technical specialists (in our case, tool development specialists, statisticians and education specialists) to assist in the process.
Thirdly, rigorous training of researchers is imperative. As previously explained, the tool relies mostly on observations, which are subject to the observer's frame of reference. Thus, it is important to establish that different researchers are collecting data in a consistent way. This involves rigorous training for the researchers and checking for inter-rater reliability. It typically involves a 3-day training process: the first day consists of familiarisation with the tool and training on the ethics of collecting data in schools (including photographing children); the second day involves experiential learning, where researchers collect real data on site; and the third day includes feedback to researchers and revisions (if required) to the tool.
Finally, we learned that including sources of verification, in our case the option for photographs to be taken, was a key element to ensure consistency of scoring. For example, by examining photographs taken of the toilets, one can determine whether researchers are rating these in the same way. If this is not the case, there is a need to explore why not (e.g. do the researchers require more training? or, is the question or measurement criterion not clear?). Given the sensitivity of this type of data source, it is important that such data are adequately protected in line with the relevant laws and legislation.

Conclusion
Validity is an ongoing process over time (Benson & Clark 1982; Creswell 2012), and the deeper and more rigorous the analysis and the greater the range of samples, the stronger the case for validity. Khulisa has started the process of validation, with substantial evidence towards the content validity of the tool, as documented in this article. Khulisa next intends to examine the construct and criterion-related validity of the school functionality tool. We are statistically analysing the internal consistency of the items within each of the domains in the tool and intend to conduct a confirmatory factor analysis to establish the construct validity of the tool. Once validity is fully established, we will look at whether the results from the tool indicate any significant differential treatment effects on learner performance. The results of these analyses will be written up for publication in future journal articles.
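As an illustration of the internal-consistency analysis mentioned above, Cronbach's alpha for the items within a single domain might be computed as follows. The items and scores are invented for illustration and do not represent the tool's actual domains.

```python
# Illustrative internal-consistency check using Cronbach's alpha (invented data).

def variance(values):
    """Sample variance (n - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

def cronbach_alpha(items):
    """items: one list of scores per item, all over the same respondents."""
    k = len(items)
    item_vars = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

# Four hypothetical 0-4 items from one domain, scored across six schools
items = [
    [4, 3, 2, 4, 1, 3],
    [3, 3, 2, 4, 2, 3],
    [4, 2, 1, 4, 1, 2],
    [3, 3, 2, 3, 1, 3],
]
alpha = cronbach_alpha(items)
```

An alpha of roughly 0.7 or above is conventionally taken to indicate acceptable internal consistency, although the appropriate threshold depends on the purpose of the measurement.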