Towards defining and advancing ‘Made in Africa Evaluation’



Introduction
The field of evaluation in Africa is at a critical juncture as it faces new scrutiny and questions about its responsiveness to context and its sensitivity to the needs and realities of the continent's populations (Chilisa & Mertens 2021:241-253). Program evaluation is defined by Fournier (2005) as: [A]n applied inquiry process for collecting and synthesizing evidence that culminates in conclusions about the state of affairs, value, merit, worth, significance, or quality of a program, product, person, policy, proposal, or plan. (pp. 139-140) The concept of 'Made in Africa Evaluation' (MAE), by contrast, has circulated without an agreement on its meaning. For example, in 2013 during the African Evaluation Association's (AfrEA) conference in Yaoundé, Cameroon, thought leaders espoused many views of MAE without a common agreement on its meaning. In her landmark 2015 synthesis paper, Chilisa explored the concept's history, meaning, and application by examining the consensus (and dissensus) amongst some expert evaluators in the field. That paper, commissioned by AfrEA with support from the Bill and Melinda Gates Foundation, picked up the thread from Chilisa and Malunga's (2012) Bellagio conference paper on the same topic. She discussed, for example, the centrality of relational epistemology, methodology, and axiology in MAE, as well as the importance of context. Chilisa (2015) moved the field towards conceptualising MAE to hedge against the proliferation of different conceptualisations of the idea, using a Strengths, Weaknesses, Opportunities, and Threats (SWOT) analysis as her methodology. Her synthesis paper yielded notable results, one of which was the identification of potential ways forward for the MAE concept in Africa. Further, Chilisa's (2015) work posited that MAE should challenge the current practice of designing evaluation tools that do not pay attention to contexts in Africa, and that it should recognise and promote African diversity manifesting itself through different cultures, customs, languages, histories, and religions.
MAE must challenge the current evaluation practice that leaves stakeholders wondering how exactly the community is benefitting from evaluation. It must challenge the evaluation that shows great successes of an intervention whilst the reality on the ground is entirely different. It must also question the marginalisation of African data collection tools like storytelling, folklore, talking circles, music, dance, and oral traditions.
She further asserted that MAE must be a tool for development. It should address the gap between the way we think development works and the way evaluation is done. To address this, evaluators should be more open about African peoples' beliefs and values about what constitutes development in Africa (Chilisa 2015). Another of Chilisa's findings is that MAE draws on knowledge from African History, Political Science, Anthropology, Sociology, African Philosophy, African Oral Literature, and African Knowledge Systems, which makes it a transdisciplinary concept. Her study also established that most evaluation experts interviewed agreed that worldviews and paradigms about the nature of reality, knowledge, and values of the African people should constitute MAE methodology.
She also signalled some discord and unresolved questions regarding the contours of the concept, due in part 'to whether scholars can originate evaluation practices and theories rooted in African world-views and paradigms and indeed if African paradigms exist' (Chilisa 2015:27). As such, her effort stopped short of offering a concise definition of MAE. This study sought explicitly to build on Chilisa's foundational work to contribute to MAE's refinement and to ascertain the extent to which it is gaining acceptance and prominence amongst those engaged in evaluation efforts across Africa.
Theoretically, this study was informed by a postcolonial critique of the development project and neoliberalism (Fanon 1965; Harvey 2007; Tiffin 1995). Our analysis also drew on decolonising and indigenous methodologies as used in research and evaluation (Chilisa 2012; Cloete 2016).
To further argue for the need for MAE, Uwizeyimana (2020) promoted Africa-rooted evaluation and recognised the African ubuntu philosophy as its bedrock. The ubuntu philosophy is the heart-felt connection and interconnectedness between the African people. It expresses how Africans own and do things collectively rather than individually. This philosophy is known by different names in different African countries and cultures. For example, it is known as botho to the Sotho and Tswana people of Lesotho, Botswana, and South Africa. The Yoruba people of Nigeria exemplify this philosophy in the ajobi and ajose philosophies (Omosa 2016). In arguing for mainstreaming of the ubuntu philosophy, Uwizeyimana (2020) opined that even though the philosophy is integral to the concept of MAE, it has weaknesses and serious consequences for African evaluation, which must be addressed for it to effectively become the bedrock of Africa-rooted evaluation.
We postulate that MAE represents an alternative to the Western-centric epistemologies and ontologies that characterise the neoliberal 'development project' (McMichael & Weber 2020). Many critiques of neoliberalism in development have examined its failure in Africa through the lens of postcolonialism (Lundgren & Peacock 2010). Postcolonial indigenous theory and decolonising and indigenous methodologies present a worldview that deconstructs neoliberal 'truths' and norms that have heretofore been presented as normal and natural, showing them instead to be colonising and inequitable (De Sousa Santos 2018; Tamale 2020). Informed by this framing, this study addressed the following research questions:
1. How do thought leaders in the African evaluation field define MAE?
2. How are MAE principles operationalised and presented in evaluation reports?
3. What next steps do African evaluation thought leaders believe are necessary to advance the MAE concept?
In the remainder of this article, we present the methods, results, and implications of our empirical research aimed at addressing the research questions. Whilst the field of evaluation has benefited from an increase in conceptual literature on MAE in recent years (Chilisa & Mertens 2021), we posit that the social and scientific value of this article derives from the fact that the research reported here takes an empirical approach designed to help the field move towards a clearer conceptualisation and definition of MAE, thereby positioning the concept for further uptake, use, and study.

Study design
This multiple methods study employed a Delphi technique, interviews, prioritisation of actionable items by the same panel of experts used for the Delphi technique, and document analysis of evaluation reports. The Delphi approach involved two rounds of online surveys and analysis of participants' statements (Hsu & Sanford 2007). In addition to the Delphi process, our Delphi participants completed an online questionnaire that gathered data on topics beyond the Delphi format, such as their perception of the next steps needed to develop the MAE concept. We then conducted in-depth interviews with two additional evaluation experts, which strengthened our Delphi-related results. Finally, we reviewed evaluation guidance documents and reports to seek evidence of whether and how the aspects of MAE were operationalised therein. Figure 1 is a schematic diagram of the various methods used in this study and how they relate to each other.
Multiple methods generally strengthen research designs because specific strategies have both strengths and weaknesses (Brewer & Hunter 2006). Mertens (2008) argued that using multiple methods helps in developing credible and accurate measurements and can increase validity. It achieves this by triangulating sources and capitalising on the strengths of each method employed (Creamer 2017).
We selected a Delphi analysis to address our first research question for several reasons. Firstly, the iterations embedded in use of the Delphi technique made it possible to build consensus or dissensus (Hsu & Sanford 2007) amongst those we surveyed concerning the MAE concept. The method's feedback process provided an opportunity for the experts involved in our Delphi process to reassess their initial judgements. Secondly, the approach is well-suited to gather detailed data from experts in a way that promotes their broad participation because expert respondents could be located anywhere geographically. Finally, the use of the Delphi technique as a temporally flexible approach allowed participants time for reflection concerning their responses and therefore helped to reduce pressure on them (Dalkey 1972; Hsu & Sanford 2007).
We purposively selected 17 prospective participants. We reached out to those individuals using publicly available email addresses, and seven of the 17 agreed to participate in the Delphi portion of this study. Two additional individuals agreed to an in-depth interview; their comments and insights added validity to our findings. For both the Delphi phase and the interviews, we selected from amongst a group of potential participants who met at least one of the following criteria: (1) top management bureaucrats who are evaluators or evaluation commissioners in African governments, multilateral intergovernmental organisations (e.g. United Nations International Children's Emergency Fund [UNICEF]), NGOs, and bilateral development entities in Africa; (2) African evaluation thought leaders, based on their work with AfrEA and previous championing of MAE; or (3) individuals who have conducted evaluation research and have written explicitly or indirectly about MAE in their publications. Additionally, we required that invited participants have at least 10 years' experience in evaluation research or practice.
We used email to invite and subsequently communicate with respondents for this online study. A follow-up email was sent at the end of five business days to remind individuals who had not replied to their initial invitation to respond. We sent that reminder twice, using a modified approach to survey reminders as proposed by Dillman, Smyth and Christian (2014). When we had recruited seven participants, enough to form our panel of experts, we sent an introduction letter, the first round of a web-based Qualtrics survey and a consent form to all who had indicated interest. We sent a reminder email at the end of five business days urging participants to complete the survey. We gave participants 10 business days to complete that effort, after which we began to analyse the data provided by the surveys. We obtained participants' email addresses through publicly available databases, such as those of the Voluntary Organizations for Professional Evaluation (VOPEs), other published materials, and the AfrEA website.

Delphi method
The first-round Delphi survey provided the expert panel with a list of 10 statements describing MAE. To construct those statements, we identified prominent and common concepts that previous authors had employed in the salient literature to describe MAE. We list those statements, shown as S1 to S10, and their sources in Table 1. In this round, we asked participants to rate the relative importance of the derived MAE descriptors on a scale of one (least important) to six (highly important).
In addition to rating the importance of each descriptive statement, we asked each respondent to provide up to five additional descriptors that, in their view, described MAE but were not captured in the original 10 depictions. These additional statements (indicated as B1, B2, etc.) were then included in the subsequent round to be rated by the rest of the expert panel. The statements rated in the second round were therefore a combination of statements on which there was dissensus and additional statements suggested in round one. The second round followed the same procedure as round one.

Developing consensus criteria for both rounds
We defined respondent consensus as the extent to which individual scores demonstrated agreement concerning an item's level of importance (Vo 2013). More specifically, we calculated the variance of ratings for each statement as well as the average variance amongst all descriptions evaluated. For this study, we defined consensus as having been attained when the variance for a statement was less than the average variance of all descriptors judged in that round. Conversely, we judged that disagreement remained amongst our respondents when an individual statement's variance was greater than the average variance of all of the descriptions evaluated. Statements with very low variance or deviation from the mean suggested consensus. We constructed a two-by-two matrix to plot the relative mean scores and variance scores for all statements (see Figure 2). Statements found to have high consensus would then appear in quadrants I and II in Figure 2. We included statements on which disagreement remained in a second survey for re-rating by our expert respondents. Round two followed the same process of analysis to determine the level of importance attached to each remaining statement by our study participants.
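The consensus rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' analysis script: the ratings are hypothetical, the function name `consensus_flags` is ours, and the use of population variance is an assumption because the text does not say whether population or sample variance was calculated.

```python
from statistics import mean, pvariance

def consensus_flags(ratings):
    """Return per-statement (mean, variance, consensus) plus the average variance.

    Consensus is flagged when a statement's variance is below the average
    variance of all statements rated in the round, as described in the text.
    Assumption: population variance (the source does not specify which).
    """
    stats = {label: (mean(scores), pvariance(scores))
             for label, scores in ratings.items()}
    average_variance = mean(variance for _, variance in stats.values())
    flags = {label: (m, v, v < average_variance)
             for label, (m, v) in stats.items()}
    return flags, average_variance

# Hypothetical ratings from seven panellists on the 1-6 importance scale.
hypothetical_ratings = {
    "S1": [6, 6, 5, 6, 6, 5, 6],
    "S2": [1, 6, 3, 5, 2, 6, 4],
    "S3": [4, 4, 5, 4, 4, 5, 4],
}

flags, average_variance = consensus_flags(hypothetical_ratings)
for label, (m, v, reached) in flags.items():
    status = "consensus" if reached else "re-rate in next round"
    print(f"{label}: mean={m:.2f} variance={v:.2f} -> {status}")
```

Because the threshold is the round's own average variance, it is relative: a statement is judged against the spread of the other statements in the same round rather than against a fixed cut-off.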

Developing a working definition of Made in Africa Evaluation
After analysing both rounds of surveys, we noted the final mean and variance values for each statement on which our respondents reached consensus. For example, the final mean value for S8 at the end of the second round was 5.20 (data and results are shared in full in the Results section, below). We also plotted the final mean value of each consensus statement against the final variance value in a scatter plot diagram. Quadrants I and III in Figure 2 contain statements with high mean values and hence, high importance in our respondents' view. Meanwhile, quadrants I and II contain statements with low variance values, and therefore, consensus. More importantly, quadrant I offers statements with high mean and low variance values. In other words, panellists reached consensus and accorded these descriptors a high level of importance in both survey iterations.
We performed a content analysis on the quadrant I statements. Two coders jointly selected a central theme for each statement and employed those as codes for each (Corbin & Strauss 2015; Glaser & Strauss 1967). Taken together, those constructs constituted the elements used to elucidate a working definition of MAE, shared in the results and discussion sections.
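The quadrant logic can be expressed as a small function, sketched here under stated assumptions. The text places high-mean statements in quadrants I and III and low-variance statements in quadrants I and II; quadrant IV (low mean, high variance) is inferred as the remaining cell. The cut-off values and the variance used in the example are hypothetical, since the text reports only the mean of 5.20 for S8 and does not state the cut-off used for a "high" mean.

```python
def quadrant(mean_score, variance, mean_cutoff, variance_cutoff):
    """Assign a statement to a quadrant of the mean-by-variance matrix.

    I:   high mean, low variance  (important, consensus)
    II:  low mean,  low variance  (consensus, less important)
    III: high mean, high variance (important, no consensus)
    IV:  low mean,  high variance (inferred; not named in the text)
    """
    high_mean = mean_score >= mean_cutoff
    low_variance = variance < variance_cutoff
    if high_mean and low_variance:
        return "I"
    if low_variance:
        return "II"
    if high_mean:
        return "III"
    return "IV"

# S8's mean of 5.20 is reported in the text; the variance and both
# cut-offs below are hypothetical placeholders for illustration.
s8_quadrant = quadrant(5.20, variance=0.16, mean_cutoff=4.0, variance_cutoff=1.0)
print("S8 falls in quadrant", s8_quadrant)
```

Only statements landing in quadrant I would feed the content analysis that produces the working definition.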

Interviews
To triangulate and augment the validity of the findings from the Delphi portion of our analysis, we interviewed two additional African evaluation experts. Because they were not available to participate in the full-fledged Delphi study, these participants agreed instead to an individual interview to share their perspective on the MAE concept. We conducted a semi-structured online interview (via Skype video) with each individual. Whilst a larger sample of interviewees would have added still further nuance to the study, we appreciate the triangulation and thick description provided by even these two in-depth interviews.
We transcribed the two audio recordings in Microsoft Word 2010. Following Corbin and Strauss (2015) and Glaser and Strauss (1967), we undertook whole text analysis of our transcripts. Our participants expressed emotions and we paid attention to their tone of voice and emphasised phrases. We read each transcript twice and identified text relevant to our research questions by highlighting and noting it in the margin. Each segment of text comprised an excerpt, and excerpts consisted of one or more sentences or paragraphs. A full sentence was the smallest unit of analysis. In the data corpus, whenever two or more excerpts communicated the same information, we included only one in our analysis.
We scrutinised each excerpt and assigned one or more codes to capture its meaning. We compared and contrasted each code with the others we had assigned to identify distinctive properties. We then organised the resulting codes into a list and sought to develop categories that could encompass more than one code. Finally, we examined the contents of each category for coherence. We continued this process until we were satisfied that each of our categories was unique. The results of the interview process are presented in the Results and Discussion section, alongside the rest of the broader study's results.

Document analysis
To address our second research question (how are MAE principles operationalised and presented in evaluation reports?), we asked our Delphi survey respondents to suggest links to reports they had written, or of which they were aware, that employed the MAE concept. However, when our participants did not suggest any reports, we purposively selected six evaluation reports from the databases of recognised evaluation funders and commissioners that potentially provided evidence of applying the MAE concept. We examined evaluation reports from the archives of the United States Agency for International Development

Actionable items prioritisation methodology
To address our third research question, we administered a separate survey in which we asked our experts to evaluate the importance and feasibility of 12 actionable items to further develop and promote the MAE concept, as enumerated by Chilisa (2015). Because Chilisa presented these steps originally to chart a possible path forward for MAE, we used our empirical study as a way to build on and extend her 2015 work. Chilisa's action steps are represented in Table 2 by statements W1 to W12. Note that this survey was not part of the Delphi technique; it was administered separately to the same sample of experts who participated in the Delphi portion of the study.
We used Microsoft Excel to calculate the means for each of the statements concerning the criteria we asked our respondents to employ to evaluate each. We created a slope graph, which is presented in the Results and Discussion section, depicting the relationship between the assigned mean scores for the level of importance and the level of feasibility of the 12 actionable items.

Results from the Delphi process
Panellists rated the importance level of a total of 15 statements as a part of the Delphi process that addressed a range of issues linked to the MAE concept. In the end, the panellists rated four statements (S5, S7, S8, and B3) as most important, as shown in Table 3. As a reminder, the 'S' statements were from the original Delphi round's prompt, whilst 'B' statements were generated by expert participants themselves and then included in Round 2. Because our first research question was to define the MAE concept more effectively, we derived the central theme of each of the four statements and used those descriptors as codes for each (Corbin & Strauss 2015; Glaser & Strauss 1967).
Each idea presented above is considered central to each of the statements. Taken together, these animating ideas as represented in the central ideas/codes form a working definition of MAE: Evaluation that is conducted based on AfrEA standards, using localised methods or approaches with the aim of aligning all evaluations to the lifestyles and needs of affected African peoples while also promoting African values.
This new definition aligns with Cloete and Auriacombe (2019) in their critique of Africa-rooted evaluation as a good example of decoloniality. They portrayed Africa-rooted evaluation as an evaluation approach that promotes African practices and aligns with African identities and conditions. Furthermore, one aspect of our new definition of MAE is using localised African approaches with the aim of aligning all evaluations to the lifestyles of the African people. This aspect further strengthens the argument for the ubuntu concept as the main fabric of MAE as elucidated by Chilisa and Malunga (2012), Cloete and Auriacombe (2019), and Uwizeyimana (2020). These authors presented ubuntu as epitomising the sense of community, collectiveness, and love amongst the African people, which is consistent with our new definition that places a premium on aligning evaluation to the lifestyle and needs of African people whilst promoting African values.

Results from the interview process
Seven central themes emerged in our interviews with the two evaluation experts: (1)  In short, this interviewee contended that there is a need to understand further the philosophy that undergirds localised knowledge to deepen awareness of its implications for MAE.
Furthermore, it is important to highlight that these seven themes align with the statements the Delphi panellists rated as important, as discussed above. For example, one of the central themes from the interviewees is the integration of AfrEA standards in MAE. This theme aligns with conducting evaluation studies that are consistent with evaluation standards developed and used by AfrEA. Also, the central theme focused on the relevance of culturally responsive evaluation (CRE) in MAE aligns with adapting evaluation work to the lifestyle and needs of the African communities where evaluation is conducted. This alignment shows a convergence between the central themes from the interviewees and the statements rated as important by the Delphi panel.

Results from the document analysis
The second research question of this study was to explore how MAE principles are operationalised and presented in evaluation reports. To illustrate the presence and distribution of each theme in each report, we used a concept map, which appears as Figure 3. The figure suggests that the six evaluation reports align with AfrEA's evaluation standards and guidelines, by showing evidence of being realistic, prudent, serving the needs of the intended stakeholders, and conducted legally and ethically. The reports also show evidence of addressing the needs of the African people. Furthermore, African values were evident and promoted in reports 2, 3, 5, and 6, whilst evaluators employed localised methods in reports 2, 3, and 6. For example, report 2 showed evidence of employing localised methods by using focus groups that comprised traditional leaders/Indunas, traditional healers, youths, and others. Additionally, report 2 showed evidence of promoting African values by targeting traditional leaders/Indunas and healers as part of the methodology for the evaluation.

Results from the actionable items prioritisation process
For the third research question, the panellists considered the 12 actionable items from Chilisa (2015) (represented by W1 to W12), which she presented as waypoints for refinement of the MAE concept. Only statements W4 (fund research on MAE and evaluation that may be used as a test case for MAE) and W10 (review AfrEA guidelines in the light of the MAE approach) stood out for our respondents. These two statements have high mean scores for both their levels of importance and feasibility. These findings appear in Table 4 and Figure 4. Statement W4 has mean scores of 4.43 and 4.00 for the level of importance and of feasibility, respectively, whilst statement W10 has the same mean score of 4.29 for both. Figure 4 is a slope graph that depicts the difference in mean scores that our respondents assigned for importance and feasibility for each of Chilisa's 12 action items. In this slope graph, statement W10 is represented by the straight orange horizontal line, because its mean scores for importance and feasibility are identical; the high mean score shows that the panellists consider the statement both important and feasible. Statement W4 is represented by the yellow line, which slopes down from left to right, reflecting a difference of 0.43 between its mean scores for importance and feasibility. Both of its mean scores are nonetheless high, showing that the panellists also see this statement as important and feasible.
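The gaps that the slope graph visualises can be recomputed directly from the reported means. The sketch below uses only the four mean scores stated in the text for W4 and W10; the helper name `slope_gap` and the labels "flat", "slopes down", and "slopes up" are ours, introduced for illustration.

```python
# Mean scores reported in the text for the two statements that stood out.
reported_means = {
    "W4": {"importance": 4.43, "feasibility": 4.00},
    "W10": {"importance": 4.29, "feasibility": 4.29},
}

def slope_gap(scores):
    """Importance minus feasibility, rounded to two decimals."""
    return round(scores["importance"] - scores["feasibility"], 2)

gaps = {label: slope_gap(scores) for label, scores in reported_means.items()}

for label, gap in gaps.items():
    if gap == 0:
        direction = "flat"
    elif gap > 0:
        direction = "slopes down"  # importance exceeds feasibility
    else:
        direction = "slopes up"    # feasibility exceeds importance
    print(f"{label}: gap={gap:.2f} ({direction})")
```

A positive gap means panellists rated an item as more important than it is feasible, which is the pattern reported for W4.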

Implications
The first result of this study, in response to Research Question 1, is the newly elucidated concise definition of MAE, which in turn lends itself to a number of other implications for evaluator training and capacity building, evaluation practice, evaluation policy, and research on evaluation. We address each of these implications next.

Evaluator training
As shown in the definition of MAE derived from our empirical study, the recognition of AfrEA and other relevant Voluntary Organizations for Professional Evaluation (VOPE) guidelines, the use of localised knowledge and approaches, the increased consideration of the lifestyles of populations of interest, and the promotion of African values are central to the concept of MAE. Previous efforts have sought to expand the field by teaching evaluation competencies to ensure that would-be evaluators possess necessary technical skill-sets (Thomas & Madison 2010). However, beyond acquiring such competencies, our findings illuminate the need for African evaluators to become deeply aware of African philosophies and values, as revealed across the continent. These thoughts have also been expressed elsewhere. Cram (2018) opined that, to become responsive and adapt their practice to African cultures and values, evaluators must seek to acquire adequate knowledge and become more aware of African values. Cram also encourages partnership between evaluators and tribal members, arguing that evaluators should seek advice and feedback from tribal members in the evaluation process, which will further deepen their knowledge of African cultures and values.
Another example is the philosophy of ubuntu introduced earlier ('I am because we are'). This philosophy is woven through the fabric of many African cultures and communities. In such communities, no single person can claim to speak for the entire community.
Evaluation practitioners in Africa should be trained to prioritise the promotion of African values and increased consideration for the lifestyles of the African people in their understanding of the theory and practice of evaluation. If this type of training is encouraged, it will reduce the influence of the Eurocentric models of evaluation which continue to deny the important place of Africa's rich history, context, and philosophy in evaluation.

Evaluation practice
Our finding in response to the third research question, that panellists prioritised reviewing AfrEA guidelines in the light of the MAE approach, corroborates the need, expressed elsewhere, to review current AfrEA guidelines as definitions of MAE evolve. This can potentially enhance MAE and, ultimately, yield better evaluation practice in Africa (Chilisa 2015). For continuous growth and development in any field, there is a need to revisit foundations and guidelines that constitute the field of practice and improve on them continually. The governing board of AfrEA might consider reframing AfrEA guidelines to align them with the current thinking on MAE.

Research on evaluation
As with every nascent and emerging concept, MAE will continue to be enriched. It will continually be shaped and framed by different perspectives and thinking so that we can start seeing changes in practice. One key finding from this study is the need for further research to operationalise localised methods and approaches. For example, what are specific examples of localised methods or approaches? What are the implications of methods involving storytelling, local courts, campfires, and proverbs? Also, what are the ways to actively represent and recognise these approaches in evaluation reports? Chilisa (2012, 2015) and Cloete (2016) have contributed much to the exploration of these terms and approaches. However, the need remains for further research along these lines.

Study limitations
Delphi methodology conveys important advantages, but it also has its limitations. Questions are often raised about the accepted sample size for a good Delphi study. Also, because the Delphi methodology is iterative and sequential as a result of the layered feedback process integral to the concept and use of it, some uncertainty can arise about the process when the sample size drops during the study because of participant attrition. Notably, in this study, because of personal and other issues beyond their control, two panellists had to be excused during the second round of the survey, and this reduced the number of panellists from seven to five.
However, it has been empirically established that the sample size has minimal impact on the quality of data during a Delphi study. What is most important in a Delphi study is the level of training and knowledge of panellists about the subject matter. In particular, Akins, Tolson and Cole (2005) established that response characteristics are stable for a small expert panel. In other words, there is stability in response characteristics irrespective of the sample size. One final methodological quandary related to our use of the Delphi method is that it is not itself a Made in Africa approach. It is grounded in Western epistemological, ontological, and methodological assumptions. Yet, whilst some may find it ironic to study African methodologies using a non-African method, we maintain that the tool was appropriate for this type of study, that is, helping to arrive at expert-based consensus. Further support for this view is evident in the argument of Cloete (2019) in his critique of coloniality. He argued that to totally reject Eurocentric research and evaluation approaches is rigid and totally misplaced. Instead, African evaluators must acknowledge the importance and validity of Western research and evaluation approaches whilst using them as a supplement to indigenous evaluation approaches.
In addition to the established findings discussed above, this study included interviews with two other stakeholders who champion MAE. These interviewees were initially scheduled to be part of the Delphi panel but opted out because of their busy schedules. These interviews provided additional perspective on the findings from the Delphi. The participants interviewed not only offered their understanding and definition of the concept, but also offered a critical viewpoint on the consensus definition developed from the Delphi.
Additionally, the six reports sampled to address the second research question are not a comprehensive reflection of all evaluations on the continent. As such, claims about the mainstreaming of the MAE concept in Africa may not be robust. However, the sample is sufficient to address the question because the main thrust of the question is to test-run the developed consensus definition of MAE and explore some illustrative ways in which evaluation in Africa aligns with the principles of MAE.

Conclusion
This article's primary contribution to the field is a working definition, although tentative, of MAE, which other practitioners and scholars are invited to further test and apply. We posit that the definition shared in this manuscript is a significant accomplishment in evaluation theory in Africa, which will, in turn, influence the practice on the continent. Beyond coming up with a definition of MAE, which is a critical step in evaluation theory and practice in Africa, the evidence presented above points to the need for the concept of MAE to be mainstreamed by making sure that it gains acceptability, prominence, and wider use amongst African evaluators. This can be one step in generating new possibilities for praxis in the face of the dominant power-knowledge assemblages that characterise postcolonial contexts.
It is important to note that this study made a step towards this by investigating how the concept is presented and operationalised in evaluation reports. Additionally, the panel of experts prioritised the next steps for the concept in Africa, which also moves the concept towards mainstreaming. However, even though these are important considerations for mainstreaming the concept in evaluation practice, there is a need for further research that will ingrain and mainstream the concept and make sure that it gains wider coverage, acceptability, prominence, and use on the African continent. Lastly, as with every emerging concept, it is expected that the findings from this investigation will contribute to improving evaluation theory and practice in Africa, although they will also require further critical testing and feedback. Insights gained from future research on the MAE concept will inform the needed efforts to more clearly describe and articulate the concept, enrich the discipline and ultimately improve practice and policymaking.