What works for poor farmers? Insights from South Africa’s national policy evaluations

Background: Growing numbers of developing countries are investing in National Evaluation Systems (NESs). A key question is whether these have the potential to bring about meaningful policy change, and if so, what evaluation approaches are appropriate to support reflection and learning throughout the change process.
Objectives: We describe the efforts of commissioned external evaluators in developing an evaluation approach to help critically assess the efficacy of some of the most important policies and programmes aimed at supporting South African farmers from the past two decades.
Method: We present the diagnostic evaluation approach we developed. The approach guides evaluation end users through a series of logical steps to help make sense of an existing evidence base in relation to the root problems addressed, and the specific needs of the target populations. No additional evaluation data were collected. Groups who participated include government representatives, academics and representatives from non-governmental organisations and national associations supporting emerging farmers.
Results: Our main evaluation findings relate to a lack of policy coherence in important key areas, most notably extension and advisory services, and microfinance and grants. This was characterised by: (1) an absence of common understanding of policies and objectives; (2) overly ambitious objectives often not directly linked to the policy frameworks; (3) lack of logical connections between target groups and interventions and (4) inadequate identification, selection, targeting and retention of beneficiaries.
Conclusion: The diagnostic evaluation allowed for uniquely cross-cutting and interactive engagement with a complex evidence base. The evaluation process shed light on new evaluation review methods that might work to support a NES.


Introduction
written largely from the perspective of those working closely within NESs. Documentation has accordingly focused on (1) assessing the effect of the system on building evaluation into the culture of government departments; (2) describing the procedural and institutional arrangements needed to support a functioning NES; and (3) assessing the extent to which users within the system perceive the process to be affecting policy change. Whilst these reflections are important, (4) the extent to which these evaluation systems encourage the emergence of approaches that are suitable to evaluation in the South African context should be considered as well (Goldman et al. 2018).
The contribution of this article is to reflect on these two latter measures of evaluation system success, using the lens of a particular evaluation we participated in under the auspices of South Africa's NES. This was an evaluation that we undertook as commissioned external evaluators working on South Africa's 2014/2015 National Evaluation Plan (NEP), referred to at the time of commissioning as a 'Diagnostic Evaluation of the Government-Supported Smallholder Farmer Sector' (see 'Evaluation Approach' below). The evaluation was to inform the review of some of the most significant policies and programmes implemented by the South African government to support small-scale farmers since 1994.
This article is structured as follows. Firstly, we provide a background to the historical and political context of the evaluation. We then outline the evaluation method we adopted (and adapted) over time, given that the term 'diagnostic evaluation' is not well known outside the South African NES context, and that scholarship is scant regarding how this approach might be operationalised and further redefined. We then move on to highlighting the knowledge contribution of the evaluation. The evaluation we undertook was unprecedented in its scope, drawing on a very rich evidence base of internal and external evaluations to provide critical insights into what works for small-scale farmers in different contexts. Drawing on this, we critically engage with the plausibility of many of the assumptions that characterise the government's approach to small-scale farmer development. Drawing parallels to the growing global evaluation evidence base, we comment on how many of these assumptions are common to smallholder farmer development programmes in other developing countries. We conclude the article by reflecting on the role the evaluation has played in furthering our understanding of methods suitable to support this kind of critical engagement within the framework of a NES. We use this reflection as a means to identify what could have been done differently by all partners in order to have enhanced uptake and utilisation of the evaluation findings even further.

Evaluation scope
Policies and interventions to support the pro-poor development of smallholder, small-scale or otherwise emerging farmers are an important part of the development agenda of most developing countries. In South Africa, the post-apartheid government has introduced many interventions targeting a broad range of sectors and amounting to billions of US dollars in expenditure. But have these interventions worked to improve the lives and prospects of emerging farmers? Unfortunately, the consensus in South Africa is not overwhelmingly positive. As the years progressed in the wake of the first democratic elections, policies and programmes were under increasing scrutiny. The growing consensus was that the interventions offered to date had not resulted in improved participation of smallholders and black farmers (DOA 2009:56).
With the inception of a South African NEP in 2012/2013, an officially endorsed national evaluation mechanism was finally available that held promise to systematically address these emerging questions around policy effectiveness. Between inception and 2016, South Africa's Cabinet commissioned no fewer than nine evaluations spanning the most important state-sanctioned interventions in support of small-scale farmers over the past two decades. Table 1 presents a high-level summary of five of these nine national programmes that were subject to review under the auspices of the NEP between 2012 and 2016: the Comprehensive Agricultural Support Programme (CASP 2015); Comprehensive Rural Development Programme (CRDP 2013); Micro Agricultural Financial Institutions of South Africa (MAFISA 2016); Land Recapitalisation and Development Programme (RECAP 2013) and the Restitution Programme (2016). These five programmes represent some of the most important efforts over the past two decades by the South African government to realise their broad post-apartheid vision of eliminating poverty and reducing inequality by promoting rural development and land reform. Interventions ranged from on-farm input and infrastructural interventions, to microfinance, extension support, land restitution claims and value chain development.
Although individual departments were mandated as part of the NES to develop improvement plans on the basis of approved evaluation reports, personal communications with those working within the NES suggested to the evaluation team that there had been challenges in implementing these improvement plans. Part of the problem was that there was so much overlap in the mandates, objectives and activities of the different national 'programmes' subject to the NEP evaluations. Moreover, many of the individual evaluation reports were strongly emphasising the need for a coordinated, holistic response; in other words, they were calling for a fresh, integrated solution.
From this realisation, another NEP evaluation was commissioned: a diagnostic evaluation of the smallholder farmer supporting sector as a whole. This evaluation would aim to develop the basis (diagnostic) for a more integrated and ultimately evidence-based set of solutions that draw on a more comprehensive understanding as to what is working, and what is not, on the whole for small-scale farmers across South Africa. As the service providers commissioned to undertake this evaluation, it was our task to operationalise this work.

Policies under review
Historically, the Comprehensive Agricultural Support Programme (CASP) was the oldest programme under review. Soon after its launch in 2004, government began to provide smallholder farmers with financial services through the Micro Agricultural Financial Institutions of South Africa (MAFISA) programme, the financial pillar of CASP. Norms and standards for agricultural extension and advisory services were established in the same year. These were applied in the design of the Extension Recovery Plan (ERP) implemented by the Department of Agriculture, Forestry and Fisheries (DAFF) in 2008 to improve extension across the sector (DAFF 2011). In 2008, the Land and Agrarian Reform Project (LARP) was initiated in response to the low rate of transfer of land by other programmes since 1994 (DOA 2008). This was complemented by the Comprehensive Rural Development Programme (CRDP) from 2009, the flagship of the new Department of Rural Development and Land Reform (DRDLR). In the same year, the Revitalisation of Smallholder Irrigation Schemes programme started and ran until 2014. Alongside these initiatives, the Land Recapitalisation and Development Programme (RECAP) was implemented in 2010 to replace previous forms of funding for land reform including support grants for farmers in distress, coupled with a revised Strategic Plan for Smallholder Support (DAFF 2013).
In total, the NEP evaluations for the review periods reported an expenditure across all five projects of R32.76 billion (US$2.45bn). The full extent of some expenditure remains unclear. In some of the projects, for example, with CASP and RECAP, beneficiary numbers and per-capita expenditures could only be inferred from the sub-samples of projects and participants consulted in the evaluation. For MAFISA, the records on loan beneficiaries were incomplete, and only the number of loans disbursed (and not number of beneficiaries) could be estimated.

Evaluation approach
The Terms of Reference made it clear that the main purpose of the evaluation was to help make sense of the lessons from the five preceding NEP evaluations as well as the wider (global) evaluation evidence base. This assessment, the Terms of Reference for the evaluation explained, would ideally pave the way forward for a revised policy framework for government support to smallholders. No new primary empirical data were to be collected. Rather, the idea was to initiate an evaluation process that would help end-users make sense of the existing evidence base vis-à-vis the root problems addressed, and the specific needs of the target populations.
The Terms of Reference labelled this as a diagnostic evaluation. To the best of our knowledge, this term is not well known in the evaluation literature outside of South Africa. Most of the information in the public space on the evaluation approach comes from the guidelines provided by the South African Department of Planning, Monitoring and Evaluation (DPME). These guidelines describe a typically ex-ante evaluation approach, defined as follows: Diagnostic evaluation is intended to help a programme manager to design a policy, project, programme or plan, or to revise these once they have been operating for some time. The purpose of the diagnostic evaluation is to provide empirical evidence to a programme manager or policymaker of the root causes of a particular problem, situation or opportunity, and to provide the evidence on which to base a strong theory of change and design for a new, or revised intervention (DPME 2014:2).
In this sense, a diagnostic evaluation holds similarities with what many evaluators would identify as a needs assessment approach, although the emphasis on identifying possible solutions suggests a theory and design evaluation element too. Development of a 'diagnostic' evaluation method posed considerable challenges. As others have pointed out, the literature on NESs remains predominantly authored by Western scholars, which creates a challenge to finding useful frameworks that speak to emerging evaluation trends in an African context (Goldman et al. 2018). The 11-page guidance document provided (i.e. DPME 2014) outlined a series of broad phases: (1) understanding the current situation; (2) understanding the root causes that contribute towards a problem; (3) identifying possible solutions; and (4) testing the plausibility of these solutions. The guidelines were, however, vague on specifics around how these phases could be operationalised, making somewhat cryptic references to needs assessment, forecasting, systematic reviews and also techniques related to the strategic planning domain, such as root cause and situational analysis. In short, there was little established precedent for how to conduct this evaluation, and the sheer scope and scale of the interventions under review were daunting. There was also the variety of empirical evidence gathered by many distinct evaluation processes, but at the same time a lack of budget, scope or mandate to collect new empirical data (or re-analyse the existing secondary data). Finally, there was the subtext in the Terms of Reference itself: the task, above all, was to lead a participatory and flexible evaluation process, which would essentially assist policymakers in making sense of so many evaluation reports. We needed a systematic approach to first clarifying, and then testing, assumptions about how these programmes were working to address specific problems in the given context.
We were also aware that the adopted method would need to fully appreciate the complexities of the content we were working with. We could not just aim to focus on simple interventions and a linear process, but rather we needed an approach fluid enough to allow us to capture the reality of this complex situation.

Evaluation method
Being evaluators and not strategists by training, we leaned towards a theory-driven evaluation approach (Chen 1990; Weiss 1997), and developed a method that would allow for a certain amount of adaptation and flexibility (Figure 1). With the four-step process outlined in the DPME (2014) guidelines broadly in mind, we operationalised the initial step around understanding the current situation by means of constructing so-called 'programme theory' models. These models would outline how and why the five major programmes and policies we were tasked to review are assumed to bring about positive changes for small-scale farmers. Modelling a programme's theory goes by many other names and definitions (including the ubiquitous theory-of-change), but we took as an initial starting point the framework suggested by Rossi, Lipsey and Freeman (2004). This framework views programme theory as having two aspects: an implementation theory and an impact theory. The implementation theory, in turn, can be broken down into three components: (1) organisational models, which outline the proposed implementation structures that should be in place for the policy or programme to work as planned; (2) service delivery models, which articulate the steps through which the programme intends to reach the target beneficiaries and deliver services to them; and (3) service utilisation schematics, which further clarify the steps (from the beneficiaries' perspective) that they are expected to go through to access, receive and then to disengage from the service. Impact theory is similar to a theory-of-change, in that it elucidates the causal impact logic of the programme by clarifying the key activities and their intended effects in the short, intermediate and long-term.
As it turned out, the 'initial' step of eliciting programme theories for the five existing programmes subject to NEP review proved to be one of the most difficult (and also enlightening) parts of the evaluation process. Initially, the evaluation team used the service utilisation models to clarify the target beneficiaries of the programmes in an effort to clearly articulate how beneficiaries might be identified, engaged with and finally transitioned out of services (Rossi et al. 2004). This is because in a theory-based evaluation, a programme or policy is deemed ineffective if it serves the wrong population or is serving an insufficient number of the target population (Chen 2005:78). In other words, 'if the problem was incorrectly identified, or the focus is on symptoms not causes, then inappropriate mechanisms, the wrong beneficiary group or ineffective service delivery instruments may be chosen' (DPME 2014:2). For the implementation theory, if even the programme designers are unclear as to who the target beneficiaries are, and how they are supposed to be identified and recruited into the programme, one could safely anticipate (even without the benefit of an implementation evaluation) that one might encounter programme implementation issues. For the impact theory, we would expect even in a complex system to recognise the logical articulation of the mechanism that connects a given activity with outcomes for each target grouping. If this was missing, it speaks to a need to pay particular attention to design issues. As Rogers (2000) points out, an evaluator cannot declare whether a programme is working without understanding how it is supposed to work.
Our next task was to use a deductive content analysis of the existing NEP evaluation reports to test the plausibility of these initial models more systematically. An Excel spreadsheet was created with the five NEP programmes on the horizontal axis and programme theory elements on the vertical axis. The latter were target group, implementation and funding mechanism(s), organisational component(s), specific planned activities and determinants and key outcomes. Content from the NEP evaluation reports was then coded according to each of these corresponding elements in order to systematically assess the programme implementation and impact theories against the existing evidence base of NEP implementation and (where available) impact evaluations from the global evidence base.
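The layout of this coding matrix can be illustrated with a minimal Python sketch. This is not the spreadsheet the evaluation team used; the function names, the way the final elements are split into 'specific planned activities and determinants' and 'key outcomes', and the decision to file the two sample excerpts under 'target group' are our illustrative assumptions. The excerpts themselves are quotations from the RECAP and MAFISA evaluations cited later in this article.

```python
# Illustrative sketch only: a programme-by-element coding matrix for
# deductive content analysis. Names and structure are hypothetical.

PROGRAMMES = ["CASP", "CRDP", "MAFISA", "RECAP", "Restitution"]

# Programme theory elements (vertical axis). The split of the last two
# elements is an assumption about how the list in the text should be read.
ELEMENTS = [
    "target group",
    "implementation and funding mechanism(s)",
    "organisational component(s)",
    "specific planned activities and determinants",
    "key outcomes",
]


def new_matrix():
    """Return an empty matrix: one list of coded excerpts per cell."""
    return {e: {p: [] for p in PROGRAMMES} for e in ELEMENTS}


def code_excerpt(matrix, element, programme, source, excerpt):
    """File an excerpt from an evaluation report under one matrix cell."""
    if element not in matrix:
        raise KeyError(f"unknown programme theory element: {element}")
    if programme not in matrix[element]:
        raise KeyError(f"unknown programme: {programme}")
    matrix[element][programme].append((source, excerpt))


def uncoded_cells(matrix):
    """List (element, programme) cells with no evidence coded yet."""
    return [(e, p) for e in ELEMENTS for p in PROGRAMMES if not matrix[e][p]]


matrix = new_matrix()
code_excerpt(
    matrix, "target group", "RECAP", "RECAP 2013:12",
    "lack of clarity on the selection criteria for beneficiaries/projects "
    "has resulted in the inclusion of beneficiaries/farms that did not "
    "really need to be assisted",
)
code_excerpt(
    matrix, "target group", "MAFISA", "MAFISA 2016:50",
    "the MAFISA loan distribution is not aligned to the provincial "
    "profile of smallholder farmers",
)
print(len(uncoded_cells(matrix)), "cells still without coded evidence")
```

Listing the empty cells is one simple way to see where the evidence base is silent on a given element of a programme's theory.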
Concurrently with this systematic assessment, we held three stakeholder engagement workshops co-facilitated by the DPME, as part of the diagnostic evaluation method. Stakeholder groups who participated in these engagements included government representatives from DAFF, DRDLR and DPME, academics from local universities and representatives from the Lima Rural Development Foundation, a South African non-governmental organisation. Representatives from national associations such as the African Farmers Association of South Africa, the National Agricultural Marketing Council of South Africa and the National Emergent Red Meat Producers' Organisation also participated.
With stakeholders, problem statements were developed to better understand the specific needs of various types of small-scale farmers (target beneficiaries). After some discussion, it became clear that a typology would be indispensable in providing a viable conceptual framework for these programmes. By consensus, the typology presented in Table 2 was adapted for this purpose.
The purpose of distinguishing the categories outlined in Table 2 was not to silo small-scale farmers, but rather to recognise that there is a continuum of support services that differ according to need. Farmers will naturally move in and out of these categories (not always linearly), out of farming altogether, or in some cases fall into more than one category at a single point in time, for example, where a smallholder supplies both tight and loose value chains. However, understanding which category farmers are located in ensures that farmers receive the services that they need at a particular point in their development, which is more useful than adopting a universal approach.
The idea to define problem statements vis-à-vis specific farmer typologies or categories of target beneficiaries was a step that had not actually been anticipated a priori by the evaluation team. Rather, this was a need that emerged from our collaborative engagements, and the method followed inductively on from this suggestion. This culminated in the development of typology-specific programme impact theories about what interventions stakeholders currently understand to be required to activate change pathways for key target groups. The plausibility of these typology-specific impact pathways would then be assessed not only against the NEP evaluation content matrix, but also against the external (global) evidence base regarding what was working and not working to bring about desired changes to specific farmer groupings. This global evidence base was accessed by means of a narrative literature review, the content of which was revised several times after iterative feedback from stakeholders and the DPME.

Programme coherence
The process of iteratively building (and then scrutinising) programme theory models revealed much about shared understandings of programme design and purpose. Whilst all the programmes under review clearly defined both outcomes and target groups, the connections (whether linear or not) between target groups on the one hand and interventions and outcomes on the other hand were often incompletely defined. This phenomenon was frequently commented on in the NEP evaluation reports. For example, whilst MAFISA loans were found to be a positive incentive to attract new entrants into farming (MAFISA 2016), the mechanisms that linked microcredit access for smallholder farmers to the anticipated outcomes of the intervention, such as job creation, entrepreneurial development, income generation, sustainable livelihoods and food security, were not even loosely defined. Not surprisingly, a major barrier to success identified consistently throughout the MAFISA evaluation was that in the absence of on-site technical assistance and genuine mentorship, new entrants are likely to fail (MAFISA 2016). In the RECAP (2013:11) evaluation, the evaluation team frequently commented on the lack of common understanding of RECAP and its objectives amongst all stakeholders. As a result, it was hardly surprising that the evaluation concluded that 'RECAP is not appropriately designed to achieve its intended objectives. The objectives are too ambitious, with most of them not directly linked to the programme'.
An additional insight from the diagnostic evaluation was that a poor understanding of who the target audience was tended to also be implicated in implementation failure. With the CASP (2015) evaluation, the commissioned team expressed concern that the selection of the beneficiaries was poor and that CASP needs to develop proper selection criteria. The concern regarding selection and its potential effect on the efficacy of the programme was detailed further later in the report: [I]n such projects, it is stated that some beneficiaries are not committed to farming, and only join to the projects to benefit from government grants. This is said to lead to poor or lack of participation in project related activities (CASP 2015:65).
The criticism was not limited to CASP. The RECAP (2013:12) evaluation found that 'lack of clarity on the selection criteria for beneficiaries/projects has resulted in the inclusion of beneficiaries/farms that did not really need to be assisted'. The authors of the evaluation went on to state the following: [W]ithin provinces, project officers and provincial government officials responsible for RECAP do not seem to agree on the number of projects/beneficiaries targeted for recapitalisation. This difference of opinion on the number of targeted beneficiaries also exists between provincial and national government officials (RECAP 2013:12).
Cases were identified in the RECAP evaluation where it was difficult to understand how some beneficiaries came to benefit from the programme, because of a clear lack of need. This finding was flagged as of great concern to the evaluation team and denoted as 'a considerable wastage of public funds' (RECAP 2013:12). Similarly, the MAFISA (2016:50) evaluators concluded that 'the MAFISA loan distribution is not aligned to the provincial profile of smallholder farmers' and that 'the extent to which MAFISA reached its target population appears limited' (MAFISA 2016:ix). Specifically, despite MAFISA's stated objective to achieve equitable distribution of funds across a range of eligible farmer groupings, there was a bias towards larger loans, and over half (56.9%) of the MAFISA loans were used to finance larger livestock production. Vegetable production represented just 2.2% of the MAFISA loan book, and poultry less than 1% (MAFISA 2016:38). It was also notable that whilst marginal groups were part of the MAFISA target population, the service delivery and utilisation plans failed to show how these groups were to be identified and receive services. Similar comments were noted in the CASP and CRDP evaluation findings, which showed that marginal or vulnerable groups were not reached to the degree intended (CASP 2015; CRDP 2013).
These insights and others suggested that future programme design be better aligned with the contextual needs of the diversity of the farmer categories shown in Table 2. This emerging finding was in keeping with concordant voices which had hitherto lacked the convincing empirical evidence base needed to lend weight to their convictions. In the musings of Machethe (2004:11), it certainly seems unreasonable to assume that the government's 'one-size-fits-all' approach would work, as all farmers should not be assumed to be working towards the same objectives.
We next shift towards explicitly outlining some of the most important knowledge contributions of this evaluation. In the interest of brevity, we focus on the two service categories that featured most frequently and prominently in the implementation plans of the NEP programmes we reviewed: extension and advisory services and financial services.

Knowledge contributions in key intervention categories
Extension and advisory services
The first question we asked was: How plausible are the impact pathways identified for these services? The literature review did indicate that extension services are very likely only effective if delivered at very high intensities (see e.g. Waddington et al. 2014). Historically, in the 1980s and 1990s, economic analysis showed very low short-term returns on investment in public extension expenditure (only 3%), and in the long run, the returns might even be negative (Schimmelpfennig et al. 2000). Unfortunately, a critical NEP evaluation on South Africa's ERP was unavailable to the evaluation team at the time of this work, which might have enriched this debate further. However, individual extension support and training through provincial extension services had been offered in CASP, CRDP, MAFISA and RECAP, and the mixed results of these offerings were noted in the five NEP reports. CASP was found to be moderately successful in terms of increasing access to extension advice, with a 17% increase in the number of project managers (i.e. CASP projects or farms) reporting that they received extension advice after CASP (CASP 2015). However, the evaluation also found that for most farmers (63%), marketing their products did not improve as a result of CASP extension interventions. The CRDP (2013) evaluation found that the programme provided little advice to cooperatives in establishing crucial market linkages, whilst the RECAP (2013) evaluation found that only 39% of farmers interviewed indicated that their access to output and input markets improved.
As we conducted the diagnostic evaluation, there was a recurrent suggestion from stakeholders that mentorship might be feasible for market-oriented smallholders in formal (tight) value chains. The reasoning was that this target group does not face the same problems of access to inputs, linkages to market, extension and credit. Instead, they do face issues of lack of power and equity in the value chain. However, the RECAP (2013) evaluation found limited skills transfer from mentors to beneficiaries and the CRDP (2013:xi) evaluation found that 'cooperatives did not receive appropriate mentoring support'. Even if one assumes that appropriate support and skills transfer occur, positive impacts on on-farm technical efficiency do not necessarily translate into higher farm income (Martey et al. 2015).
Ultimately, the weight and credibility we can attribute to these evaluation findings are muted by the descriptive nature of the NEP evaluation designs, and the lack of a strong counterfactual. Aliber and Hall (2012) in many ways anticipated these suggestive descriptive trends in a useful analysis that puts these figures into harsh perspective.
Responding to a policy statement that indicated that 5500 new extension officers were needed to address South Africa's strategic aims, the authors pointed out that this would require tripling the current size of the extension corps. Everything else being held equal, this would enable 5.4% of black farmers to receive attention as opposed to only 1.8%. The message seems clear that, for this intervention at least, strategic targeting was critical to making this investment garner returns. South African policy and planning documents set ideal extension officer-to-farmer ratios of between 1:250 and 1:500 (DAFF 2005), but actual extension support coverage is unclear (Aliber & Hall 2012; Cousins 2013). If the government is going to take on this new area, it is likely that targeting needs to improve, and resources need to be increased or at least strategically focused. The dose and intensity of required interventions in relation to observed impacts on specific target groups then urgently need to be assessed in terms of overall cost-effectiveness.
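The arithmetic behind this argument can be made explicit with a short, purely illustrative Python sketch. It assumes, as the 'everything else being held equal' framing implies, that coverage scales in direct proportion to the number of extension officers; the function names are hypothetical, and the implied current corps size is our inference from the two quoted figures taken at face value, not a number reported by Aliber and Hall (2012).

```python
# Illustrative sketch only: proportional-scaling arithmetic for extension
# coverage. Function names are hypothetical.


def scaled_coverage(current_share_pct, corps_multiplier):
    """Coverage (%) after scaling the corps, assuming coverage is directly
    proportional to officer numbers (everything else held equal)."""
    if corps_multiplier <= 0:
        raise ValueError("corps_multiplier must be positive")
    return min(100.0, current_share_pct * corps_multiplier)


def implied_current_corps(new_officers, corps_multiplier):
    """Corps size implied if adding `new_officers` multiplies the corps by
    `corps_multiplier` (an inference, not a reported figure)."""
    if corps_multiplier <= 1:
        raise ValueError("corps_multiplier must exceed 1")
    return new_officers / (corps_multiplier - 1)


# Tripling the corps from a base coverage of 1.8% of black farmers.
print(round(scaled_coverage(1.8, 3), 1))   # 5.4
# If 5500 new officers amount to a tripling, the implied base corps size:
print(implied_current_corps(5500, 3))      # 2750.0
```

Under this proportionality assumption, even a tripling of the corps leaves well over 90% of black farmers without attention, which is why the question of who is targeted matters more than the headline number of officers.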

Microfinance and grants
A range of options was proposed by stakeholders depending on the needs of each target group; however, as with infrastructural investments, precise mechanisms for access to financial capital must be explored further. If anything emerged from the diagnostic evaluation, it was a strong call for government to be more circumspect in future regarding the roll-out of financial interventions. Recent empirically strong impact evaluations around the area of farmer financing have clearly indicated that finance does not unilaterally benefit all recipients. There seems to be clear evidence that credit access can be highly beneficial to those select target groups who have an interest in (and motivation to) rapidly capitalise on the cash injection. This has been supported both in the success case studies from the MAFISA (2016) evaluation, and the broader evaluation literature. In one notable South African evaluation, Karlan and Zinman (2010) found convincing positive impacts on incomes, food consumption, community status and overall optimism of a loan (after it had been taken out and repaid) in a group of small-scale farmers who received a loan, compared to an equivalent counterfactual group of small-scale farmers who were identical in their motivation for, and eligibility for, receiving a loan.
During our stakeholder engagements, stakeholders proposed that financial support be provided through intermediaries (e.g. commodity associations) and through a mix of grants, capital expenditures and loans. There is some evidence to support this suggestion. For example, an impact evaluation from Mali (Beaman et al. 2015) explored the hypothesis that cash grants are more effective than micro-loans (of equivalent value) in supporting female smallholders. The evaluation results found that grants led to higher agricultural investments and profits than loans. Those who received grants cultivated slightly more land, and outputs and profit went up by 13% and 12%, respectively. However, households that declined loans but then received grants showed almost zero marginal returns on the grants. This evaluation provides support to the general finding within the MAFISA (2016) evaluation, which is that grants and loans effectively trigger the start-up of value-adding activities by select target groups who are in a position (either through their intrinsic entrepreneurial tendencies or through structural facilitators) to rapidly capitalise on the cash injection.
Whether the financing mechanism is grants, loans or a combination of the two -what is clear from both the NEP evaluations and the literature is that the success of these interventions rests on successful screening and targeting processes. It is critical that the right recipients be identified for these interventions -and the 'right' recipients here may essentially be those individuals who are on the cusp of a major transition, and only require an enabling environment. Given this finding, major critiques emerging from the MAFISA (2016) evaluation are especially concerning. These include observations that loan distribution was not aligned to the provincial profile of smallholder farmers, that the implementation process was not representative of smallholder groupings generally, and that the selection and targeting of beneficiaries were done with very poor accountability. The MAFISA (2016) evaluation further found that only 43% of sampled loan recipients self-reported having paid back their loan. As a result, the evaluation strongly recommended more oversight of how financial intermediaries implement MAFISA loans. However, financial intermediaries invariably indicated that transactional costs for such screening and oversight are high and not recovered by the 7% interest received from MAFISA debtors.
As implemented, a model like MAFISA has proven to be unsustainable and requires top-up funding because of low repayments and the costs associated with technical assistance. The onus is now on the government to find a way in which cost-effective targeting, screening and loan distribution mechanisms can be identified and provided to financial intermediaries as a means to manage the financing mechanism. Such screening tools and procedures are well established in the commercial loan sector -and it seems reasonable that suitable targeting and management mechanisms could be developed in line with a new model for financial aid distribution.

Policy implications
All social interventions, policies or programmes are based on an assumption of change. Despite a large number of policies and programmes implemented by the South African government over the past two decades, the general consensus following the first decade of policy implementation has been that the smallholder farmer sector in South Africa has not progressed as intended or as required (Karaan et al. 2012).
The primary contribution of our research was to utilise the framework of South Africa's NES to systematically elicit, and then interrogate, the plausibility of assumptions of change in this sector even further. A second contribution relates to the overall appropriateness of the so-called diagnostic evaluation method to answering these questions, as well as the necessary adaptations, innovations and refinements that were required to ensure that this approach contributes significantly to a well-functioning NES.
Reflecting on the diagnostic evaluation, we regard it as a useful process that allowed for uniquely cross-cutting and interactive engagement with a complex (and at times overwhelming) evidence base. It revealed fundamental issues in the coherence of several programmes and their underlying theories, which the government had been supporting for decades. As a result, we could identify a need for programmes to take the definition of their target groups much more seriously, by asking a simple set of questions: 'Who is supposed to be affected by the intervention, and what problem is this supposed to solve?' This, of course, is in some ways glaringly obvious, but the questions were constructively challenging to many stakeholders. For our purposes, the process sparked fresh conceptualisation of the very problems the government programmes were supposed to be addressing.
The second step, which involved an interrogation of the solutions that are deemed by stakeholders most appropriate given these problem statements, revealed a number of counter-intuitive findings, but more importantly opened up priorities for future evaluations and research. Most notably, extension services, which were regarded as important for Category C and D farmers, showed little current evidence for efficacy, with the literature suggesting that they would probably only be effective if delivered at much higher intensities than the government currently envisions. More rigorous cost-effectiveness evaluations would be valuable in this area. Microfinance and grants featured prominently in past (and future) government plans, but what was clear from both the NEP evaluations and the literature is that the success of these interventions rests on successful screening and targeting of beneficiaries. Stronger institutional mechanisms and a more selective targeting process are needed to operationalise this further. Finally, although on-farm infrastructural interventions were deemed critical for all typologies of farmers, the international literature clearly showed these interventions were risky -a finding hinted at in the NEP reports, but the evidence base remains weak in this regard.
Notwithstanding these successes, serious challenges remain. The report, which was approved in 2016 by the commissioning DPME, has not, to the knowledge of the current evaluators, been taken to Cabinet for approval. Moreover, South Africa's 2018 National Budget continues to show support for even some of the most contentious programmes considered by the national policy evaluations. Certainly, on the surface, it is hard to detect evidence for policy change. For example, the DAFF has subsequently developed a National Policy on Comprehensive Producer Development Support, where the only reference to the diagnostic evaluation is in an annex, and even then, the policy references a diagram documenting a proposed theory of change for producer development support which does not appear in any of the approved evaluation documents. We hope that our emphasis on the knowledge contributions of this article will open the research findings to the broader public discussion that they warrant.
Whilst there is indeed still much that we do not know, to persist in the claim that there is insufficient evidence to move forward is not supportable. The evidence base is incomplete, but it is more compelling now that it has been assessed holistically in relation to key problems policymakers and stakeholders identified as in need of redress. There seem to be a lot of common problems and overlaps: mistakes that were perhaps consistently made, some of which we have sought to highlight in this article. It is our hope that, in future, discussing the importance of the different impact models developed for different categories of farmers will be an essential part of programme planning, as well as the planned evaluation cycle.
Notwithstanding this important knowledge, a key reflection from this evaluation process is that we still do not know much about impact. All five NEP evaluations were commissioned as impact and implementation evaluations, but many of the shortcomings around implementation listed in this article made impact evaluation premature. That said, most of the national policy evaluations could speak comfortably about implementation and determined with quite plausible precision that programmes were often not implemented well, or as planned, or, even worse, that there was, in fact, at times not even a clear implementation plan against which to assess progress in the first place. As evaluators, we often argue that in the absence of successful implementation, questions about impact become futile; but even assuming that credible conclusions about impact could have been made from the national policy evaluations, our point around the political environment persists. If the MAFISA evaluation, for example, had been able to show that microfinance on the whole did not work relative to a counterfactual scenario -would this really have changed the South African government's position on the perceived need for a microfinance branch to their programming?
Looking forward, the DPME has indicated a desire and need to move towards reliably establishing impact of their policies and programmes. Whilst there is an increasingly rich discourse around what constitutes rigorous impact evaluation for policy-based decision-making, in many contexts, this is likely to default to a push towards a rigorous counterfactual-based approach -as has been generously supported by international organisations -most notably the International Initiative for Impact Evaluation (3ie).
Whilst we cannot fault the move towards more 'rigorous' impact-orientated evaluations in principle, having completed this evaluation we have a grain of salt to add. Whilst a counterfactual-based approach may well be methodologically plausible to many evaluators and their peers -as well as possibly reassuring for international funders and observers -this approach in itself may not necessarily serve the needs of policy makers who have already committed on a policy basis to the roll-out of certain interventions across smallholder categories. Indeed, one of the most resounding messages from this experience is that most of the interventions described in the NEP policies are in fact not negotiable. Politicians quite understandably develop policies and determine their programmatic elements in response to the wants of their constituents. Evaluators who wish to influence policy need to begin their evaluations with the questions around the differences within and between recipients of key interventions in their ability to make the programme theory work for them. The value of this diagnostic evaluation was in many senses its explicit emphasis on aligning problems with specific types of farmers and scrutinising the plausibility of interventions to work for them, and in this we come quite close to Chen's (2010) emphasis on viable validity. Given what we have witnessed as evaluators working in the NES, there is a strong argument that more evaluations like this should be done.
The above realisation has been a profoundly important one, as it has direct bearing on our final reflections as to how co-creation of the evaluation agenda should really unfold in a NEP phase. We have come to doubt that it is really useful to ask whether interventions work on average for small-scale farmers; the more useful question is what it is about the interventions that might cause them to work. The proposition, after Pawson (1997), is that evaluators need to begin their evaluations with some theory of the differences within and between recipients of key interventions in their ability to sustain the programme theory if impact evaluation is to prove truly useful to policy makers. This alignment of context, mechanism and outcome is, in the words of realist evaluation proponents, very different to the mere sub-group analysis that a rigorous impact evaluation might allow for (particularly given constraints of evaluation design and sample size) (Pawson 1997). The earlier cited impact evaluation from Mali on microfinance versus grants by Beaman et al. (2015) is a good example of this kind of research. Through very clever experimental design, the evaluation was able to balance a counterfactual-based impact evaluation approach with meaningful research that exposed the different causal mechanisms triggered in very specific contexts for different types of beneficiaries. However, reaching this level of rigour and sophistication in experimental evaluation design might well require that an entire (that is, diagnostic) evaluation be done before we can even think about developing the terms of reference for a meaningful rigorous impact evaluation. Unfortunately, in NESs and elsewhere, few impact evaluations are commissioned in such a manner as to allow for this elaborate groundwork.
Ofir (2013) asserted the following in her rousing call to action for revolutionising evaluation for development in Africa: [T]his is not about 'cultural sensitivity', but rather about the fundamental questioning of worldviews, frameworks and definitions on which evaluation theory and practice -and resultant development -have been built. The potential for new theories and practices that might revolutionise development evaluation is not yet quite clear, but fledgling efforts need to be harnessed and nurtured. (Ofir 2013:586) Prior to facilitating this diagnostic evaluation, we must confess that, as Ofir (2013:586) asserted, exactly what form these 'new theories and practices' in evaluation would take was similarly not 'quite clear' to us. Reflecting now on this evaluation, our contribution as evaluators to this process is perhaps just a little more lucid. In keeping with Ofir, we acknowledge how senseless it was to simply break down policies, programmes and interventions into simple parts, be they linear or otherwise. Rather, the solution lay in cutting across programmes and even intervention categories, to get to a clearer understanding of change trajectories as linked to diverse (and often overlapping) beneficiary types and specific root problems. For this, we require a fluidity and adaptability in the evaluation method, grounded in an approach that requires iterative feedback and collaboration between evaluators and evaluation end-users. South Africa's DPME, as well as other African governments who have supported the roll-out of NESs, should be commended for supporting this type of evaluation, and for supporting it as not only a strategic priority, but also an essentially innovative endeavour.

Conclusion
Although our evaluation process has mapped out an alternative pathway forward for critically needed interventions directed towards smallholder farmers, the reality is that much work is still to be performed. The potential for this to happen remains within the boundaries of the government's commitment to a developmental state as documented in South Africa's National Development Plan. We are conscious that larger policy questions (about state involvement) could arise and that these could have major political implications. These questions were both beyond the mandate and scope of this evaluation and beyond our evidence base. We cannot pass judgement on this point, much less present recommendations. However, unless addressed, this issue will continue to undermine government's efforts to link evaluation more strongly to development and transformation objectives.