What works in attracting and retaining teachers in challenging schools and areas?

ABSTRACT This paper describes a systematic review of international research evidence identifying the most promising approaches to attracting and retaining teachers in hard-to-staff areas. Only empirical studies that employed a causal or suitable comparative design and had robust measurements of recruitment and retention outcomes were considered. Studies were assessed for strength of evidence, taking into account threats to trustworthiness that may bias the results. A search of 13 electronic databases and Google/Google Scholar identified 20 distinct research reports that met the inclusion criteria. Financial incentives were the only approach that seemed to work in attracting teachers to challenging schools, but they were not effective in retaining them. To keep teachers working in challenging schools, a supportive and conducive working environment would be needed. Other approaches such as mentoring, support, or teacher development do not have strong evidence of effectiveness, largely because much of the research on these approaches was weak. More robust research capable of addressing causal questions is therefore urgently required to determine their impact in attracting and retaining good teachers in areas where they are most needed. A long-term solution would be to change school-allocation policies and improve economic conditions in such areas so that the problem of staffing does not arise.


Background
Education systems worldwide attempt to provide good quality education for their citizens, and this requires a supply of high-quality teachers. Supply has reportedly become more difficult in recent times because of challenges in recruiting and retaining teachers. Reports of teacher shortages in both England (Boffey & Helm, 2015; Hazell, 2018; Sky News, 2017) and the US (e.g. Garcia & Weiss, 2019) have dominated newspaper headlines in the last few years.
In England (Foster, 2019; TES Global, 2019) and the US (Sutcher et al., 2016), teacher shortages are predicted to get worse as the pupil population rises and more teachers leave the profession before retirement. For some schools and regions, however, this overall shortage of supply is more serious because they already face great difficulties in attracting and retaining teachers by virtue of their undesirable location and student intake (House of Commons, 2017).
Attracting and retaining suitably qualified teachers in some subjects and geographical areas is a challenge common to the school staffing policies of many developed countries (European Commission/EACEA/Eurydice, 2018). Teacher shortages related to the remoteness of some regions are mentioned in half of the countries that participated in the European Commission survey. In other countries, it was the high cost of living and high proportion of disadvantaged pupils in some large urban cities (such as Brussels and London) that reportedly made it difficult to attract and retain teachers.
In England, there are geographical cold spots where schools are rated as least likely to have teachers in shortage subjects with a relevant degree. In coastal rural areas, which can be highly deprived, 7% of secondary teachers are unqualified, compared with 4.6% in more affluent inland rural areas (Social Mobility Commission, 2017). Regionally, the North East, West Midlands and East of England are less likely to have teachers with a relevant degree teaching shortage subjects compared to London. For example, only 17% of physics teachers in poorer schools outside London have a relevant degree, compared with 52% in affluent areas in the rest of the country (Sibieta, 2018).
In the House of Commons (2017) 5th Report on teacher recruitment and retention, the government acknowledged that there were wide regional variations in teacher supply. While there have been plans to encourage more teachers to work in areas most in need, these have not been very successful. The pilot for a National Teaching Service, for example, which was set up to get teachers to teach in areas most struggling to recruit, had to be abandoned after managing to recruit only 54 of 1,500 intended teachers (Hazell, 2016).
Most education systems in the world use similar strategies in their efforts to attract and retain teachers in hard-to-staff schools and for some high demand subjects. Among these are financial incentives, such as bursaries and scholarships (e.g. DfE, 2019a). Since 2018/19, the government in England has been piloting early-career payments for some shortage subject teachers and student loan reimbursement for science and language teachers in some local authorities, to incentivise such teachers to stay in the profession (Foster, 2019). Other strategies used in many countries include alternative certification, mentoring and induction, and support for new teachers. For example, the DfE in England has introduced an Early Career Framework to be rolled out in September 2020 in some challenging areas (DfE, 2019b).
Most of these programmes have not been robustly evaluated. Although the use of monetary inducements has been tested in a number of studies, especially in the US, there is so far no synthesis of the research findings, so the evidence of their effectiveness is still unclear. These incentives are expensive and it would be a waste of taxpayers' money and the country's resources to continue using them if there is no evidence that they work. There is also an opportunity cost as the money used for these incentives could be otherwise channelled to more effective programmes. If they show promise it is important to know how they can best be implemented, and the extent to which they can be deployed in other countries facing similar challenges. It is therefore crucial that these strategies are robustly evaluated and tested before more money is spent on them worldwide.
As far as we know, this is the first large-scale comprehensive single-study review of teacher recruitment and retention policies to address teacher shortages in hard-to-staff areas/schools. Previous reviews often do not consider the reliability of the evidence and the design of research in their evaluation (e.g. Wheeler & Glennie, 2007). This paper summarises the findings of a systematic review of empirical research to identify the most promising approaches in attracting and retaining teachers in hard-to-staff schools and areas.

Search strategy
This review was part of a larger review that addresses teacher recruitment and retention (R&R) issues in general. For this paper we identified and analysed those studies relevant to R&R in challenging areas and/or schools. Challenging areas refer to school districts or states (as in the US) and remote rural areas where recruiting and retaining teachers has been difficult. Hard-to-staff schools refer to schools where teacher R&R has been difficult. These include, for example, schools in high poverty areas, low performing schools and schools where it is particularly difficult to attract and retain teachers of certain subjects.
The studies in this review were identified from a search of 13 educational, psychological and sociological electronic databases. These were supplemented by studies known to us and by a daisy-chain search following up relevant studies mentioned in previous systematic reviews.
The search terms included teacher recruitment, teacher retention, teacher shortages, teacher supply and policy initiatives, incentives, approaches and schemes (and their synonyms). As the purpose of this review was to identify approaches that show evidence of impact, only studies that employed a causal design were included (see Gorard, 2013 for a definition of a causal design). Therefore, the search terms also included any causal term (or a synonym) or any research design that would be appropriate for testing a causal model, such as experiments, quasi-experiments, regression discontinuity and difference-in-difference. A scoping review was first conducted to test the sensitivity of the search terms on well-known sociological, educational and psychological databases, to ensure that the search terms picked up relevant pieces of literature as well as known studies on this topic. Following this, a very general and inclusive statement of search terms was generated for each database. These were tested, adjusted and retested iteratively to ensure that as little relevant material as possible was missed. We modified the syntax for different databases to suit the idiosyncrasies of each, but used similar key words.
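The way substantive and causal terms combine in such a search can be sketched in a few lines of Python. This is only an illustration: the term lists are the examples named above (not the full set with synonyms), and the grouping into three AND-ed blocks, the quoting and the `any_of` helper are our assumptions rather than the syntax actually submitted to any of the databases.

```python
# Illustrative only: example terms from the text, not the full search strings.
substantive_terms = ["teacher recruitment", "teacher retention",
                     "teacher shortages", "teacher supply"]
policy_terms = ["policy initiatives", "incentives", "approaches", "schemes"]
causal_terms = ["experiment", "quasi-experiment",
                "regression discontinuity", "difference-in-difference"]


def any_of(terms):
    """Join terms into one parenthesised OR block (hypothetical helper)."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


# Assumed structure: a record must match at least one term from every block.
query = " AND ".join(any_of(block) for block in
                     (substantive_terms, policy_terms, causal_terms))
print(query)
```

Each database would need its own variant of this string (field tags, truncation symbols and so on), which is why the syntax was adjusted per database while the key words stayed similar.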
To determine the causal evidence of policies and initiatives on teacher recruitment and retention, we included only studies using experimental (e.g. randomised control studies) or quasi-experimental designs (e.g. regression discontinuity, matched comparison, difference-in-difference, longitudinal time-series analysis and instrumental variables) and large-scale longitudinal studies, or similar.
The scoping review and previous reviews of literature suggested that there were few robust experimental evaluations of policy initiatives or approaches for teacher R&R. The decision was therefore made to include any empirical studies with at least some type of comparative design, but which would have low ratings for trustworthiness in terms of causal claims.
The search was limited to studies published or reported in the English language. We intentionally did not set any date limits, to keep the search open. To avoid publication bias, the search included any material published or unpublished that mentions both substantive and causal terms.
A total of 6,708 research reports were identified and exported to EndNote for screening. This review was completed at the end of 2018 and therefore does not include studies published after 2018.
There is no one definition of teacher retention. While most studies considered retention as involving teachers staying within their current school, in others retention referred to teachers staying within the school district, the state, state-funded schools, or even within teaching as a profession. The same mix appears in claims about teacher wastage in England (See & Gorard, 2019). In this review, we included any studies that look at retention of teachers regardless of how this is defined.

Screening
Each identified study was first screened to remove duplicates, and for relevance on the basis of title and abstract. Only studies that related specifically to recruitment and retention for hard-to-staff areas/schools were retained. This process removed 6,161 studies, leaving 547 which were read in full.
We screened the studies applying the inclusion and exclusion criteria. Studies were included if they were:
• Empirical research
• About activities aimed at attracting people into teaching or about retaining teachers in teaching
• Specifically about recruitment and retention of classroom teachers
• About incentives/initiatives/policies or schemes on teacher recruitment and retention
• About mainstream teachers in state-funded/government schools
• Studies that had measurable outcomes (either retention or recruitment)
• Studies that relate to mainstream education
• Recruitment and retention of traditional shortage subject teachers (e.g. mathematics, science and design and technology)
Studies were excluded if they were:
• Not primary research
• Not published or reported in English
• Not actually a report of research at all
• Simply descriptions of programmes or initiatives with no evaluation
• Not about strategies or approaches to improve recruitment or retention of teachers (e.g. observational or correlational studies of factors influencing recruitment and retention)
• Studies that have no clear evaluation of outcomes
• Studies with non-tangible or non-measurable outcomes (e.g. teachers' attitude or beliefs or perceptions)
• Ethnographic studies and narrative case studies
• Opinion pieces, guidance briefs or manuals on how to attract and retain teachers
• Outcome is not teacher recruitment or retention
• Not about recruitment and retention of teachers
• If it is specifically about school leaders, school administrators or teaching assistants
• Outcome is about student achievement (e.g. Cowan & Goldhaber, 2016)
• Not about mainstream teachers, e.g. special education teachers or ethnic minority teachers
• Not relevant to the context of English speaking developed countries (e.g. Duflo et al., 2007)
• Not relevant to the research questions
• Anecdotal accounts from schools about successful strategies
A large number of studies involving surveys or comparisons before and after with no comparison groups were eventually excluded because they do not add to the evidence base. There were also many studies about recruitment and retention initiatives and what some schools or school districts have adopted, but with no evaluation of the outcomes. These were excluded.
At this stage the full reports were skim-read by one researcher. A sample of 10 studies now thought not to meet the inclusion criteria were then reviewed by the other three members of the research team for consensus on their inclusion or exclusion decision.
Only 52 studies deemed to be relevant to the research question were retained.

Data extraction
Key information necessary for strength of evidence assessment was extracted from each of the included studies. Such information included the research design, sample size, allocation to groups, outcome measures, missing data, methods of analysis and the results. This process excluded a further 30 studies. These were narrative discussions of previous research or studies that merely asked respondents about strategies they thought worked or were important to them. Three reports were of different approaches to evaluating the same intervention by the same set of authors. These were treated as being one report here.
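The counts reported across the search, screening and data extraction stages can be checked with simple arithmetic. The short Python sketch below uses only the figures given in the text; the variable names are our own shorthand, and the final step (treating three reports of the same intervention as one) is our reading of how the remaining reports reduce to the 20 distinct reports cited in the abstract.

```python
# Counts as reported in the text; stage labels are our own shorthand.
identified = 6708             # reports exported to EndNote
removed_at_screening = 6161   # duplicates and irrelevant titles/abstracts
read_in_full = identified - removed_at_screening

retained_after_full_text = 52
excluded_at_extraction = 30
reports_remaining = retained_after_full_text - excluded_at_extraction

# Assumption: three reports of the same intervention count as one,
# which removes two from the tally of distinct reports.
distinct_reports = reports_remaining - (3 - 1)

print(read_in_full)       # 547
print(reports_remaining)  # 22
print(distinct_reports)   # 20
```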

Evidence assessment
In total, 20 reports that were deemed relevant and met our inclusion criteria were retained, and their strength of evidence was assessed using the 'sieve', a tool for judging the trustworthiness of research findings. This determines how much confidence can be placed in the findings, and is a necessary step to ensure that the evidence is trustworthy. If public investment is to be made in recruitment and retention programmes, it is crucial that the most robust evidence is given the most weight. This ensures that the synthesis is not misled by automatically giving equal weighting to good and weak studies. This step is important since much of education policy so far has been based on incorrect, misleading or incomplete evidence, which probably explains why some of the initiatives have not been successful in achieving their objectives. The strength of evidence was assessed based on five criteria (Table 1): the research design (e.g. whether the design was appropriate for a causal claim, such as an RCT with random assignment of cases and whether there was a comparator group), scale of the study (smallest cell size), level of attrition, validity of outcome measurement (e.g. administrative data versus teacher self-report) and other threats to validity (e.g. conflicts of interest). All such factors are important (Slavin & Smith, 2008). Each study was then assigned a score between 1* (the minimum standard to be given any weight, including some kind of comparison) and 4* (the most robust that could be expected in reality). Four-star studies are the most secure, meaning that the evidence is most reliable. We did not treat the source of any publication, or the reputation of its author/researcher or funder, as any guarantee of research quality. Instead we judged the strength of evidence for each of the included studies by applying the sieve (Table 1).
Note that throughout the paper we use the phrase 'quality' of studies to refer to the quality of evidence in establishing causality and not the quality of research.
The table is to be read from left to right starting with the strongest design, and down the row. The strongest evidence (rated 4*) must have at least one comparison group, and the groups being compared must be randomly allocated. If the sample size is large (the threshold is arbitrary, but around 50-100 cases per group), then it stays as 4*, but if the groups are small to medium (i.e. fewer than 50 in each group) then it drops a star. If the groups are very small, e.g. randomly allocating two schools (one to intervention and one to control), it drops to 1*, since the evidence will not be reliable as the two schools may differ on unobserved measures. The rating cannot go up the scale.
If the study starts with a large sample and the sample is randomly allocated to two groups, it starts with a 4*. If there is a high level of dropout or missing cases it drops to 3*. If the attrition is over 25% it drops to 2*. If 50% or more data is missing (e.g. non-response) then it will be given 1* as the groups can no longer be regarded as random or equivalent. If the study has a large randomly allocated sample with no or small attrition (4*), but the outcome is measured based on teachers' perceptions or teachers' report of intention to stay or leave, it drops to 2* as in reality teachers may or may not stay despite their reported intention.
A study that uses two comparable groups which are not randomly allocated, such as a difference-in-difference approach (e.g. comparing outcomes before and after intervention in one state in the US with another), will be rated 3*. A star is dropped if the sample is small, and two stars if the sample is very small. A further star is dropped if there is high attrition. Studies with no comparison groups at all and no before and after comparison will be rated 0. These are not discussed in this paper.
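The rating rules described above can be expressed as a small function. The following Python sketch is a simplification under stated assumptions, not the sieve itself (Table 1 remains the authoritative version): the function name, argument names and scale labels are hypothetical; scale is passed as a label because the text gives no cut-off for "very small"; the intermediate step from 4* to 3* for a "high level of dropout" is omitted because no threshold is given; and attrition in non-randomised designs is handled with the same caps as for randomised designs.

```python
def sieve_rating(randomised, has_comparison, scale, attrition,
                 self_reported_outcome):
    """Return a 0-4 star rating under the simplifying assumptions above."""
    if not has_comparison:
        return 0  # no comparison of any kind: given no weight

    rating = 4 if randomised else 3  # starting point set by the design

    # Scale of the smallest group, as a label (assumed encoding).
    if scale == "very_small":
        rating = min(rating, 1)
    elif scale == "small_medium":  # fewer than about 50 cases per group
        rating -= 1
    elif scale != "large":
        raise ValueError(f"unknown scale label: {scale!r}")

    # Attrition or missing data, as a fraction of the original sample.
    if attrition >= 0.50:
        rating = min(rating, 1)
    elif attrition > 0.25:
        rating = min(rating, 2)

    # Self-reported intentions are weaker than administrative records.
    if self_reported_outcome:
        rating = min(rating, 2)

    return max(rating, 1)  # any comparative study keeps at least 1*


# A large randomised study with low attrition and administrative outcomes.
print(sieve_rating(True, True, "large", 0.05, False))   # 4
# A large difference-in-difference study (not randomised).
print(sieve_rating(False, True, "large", 0.05, False))  # 3
# A large randomised study that measures only intention to stay.
print(sieve_rating(True, True, "large", 0.05, True))    # 2
```

Because every step either lowers the rating or leaves it unchanged, the sketch respects the rule that a rating cannot go up the scale.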
To ensure inter-rater reliability in evidence assessment, a sample of 10 studies were rated independently by the first three authors. Their ratings were compared and there was a 98% agreement. The disagreements were mostly with the 1* and 2* studies. In such cases, further reading of the full papers was carried out and discussed before a consensus was reached.
We would like to point out that this is a judgement of the trustworthiness of the evidence in establishing causal impact and not of the quality of research. However good a research protocol may be, there are always compromises in real-life studies. For example, interventions or programmes may not go as planned, people drop out or cannot be traced, or randomisation of cases is not feasible for ethical or logistical reasons. Therefore, the ratings here reflect the evidence and not necessarily the quality of work. Also, our review aims to answer a causal question, so studies using a correlational design may be of high quality for a different question but are not as suitable for this review because they do not answer our research question.

Synthesis
The research reports were classified according to whether they were about recruitment, retention or both, and sorted according to the types of approaches. A broad classification of incentives/initiatives was created. These include financial incentives (e.g. signing bonuses, wage uplifts, scholarships and loans), other not directly financial incentives (e.g. housing benefits, retirements, pension, health care and child care benefits) and other non-financial incentives (e.g. alternative routes into teaching, staff development, mentoring and induction and workload reduction) or a combination.
Approaches with the most highly rated studies showing positive effects are considered the most promising. It has to be made clear that a lack of evidence of impact does not mean that an approach is ineffective, but rather that the existing evidence is such that its effectiveness cannot be determined.

The results
The 20 studies reported 26 individual outcomes relevant to either or both the recruitment and retention of teachers (Table 2). Most involved some kind of financial incentives, and on balance, such approaches appear to work. Many are from the US, while very few are from England.

Use of financial incentives
All the recruitment studies that met our minimum criteria for inclusion were conducted in the US, apart from one which was based in Brazil. The stronger studies suggest that offering financial incentives appears to work in attracting teachers to hard-to-staff schools and areas. Of the nine study outcomes that met at least our minimum criteria for a causal claim, three reported positive outcomes for the use of some kind of monetary inducement. These three were of a higher quality (i.e. 2* and above). All three were relevant to the US context. The highest quality study, rated 3* (Hough & Loeb, 2013), showed positive outcomes for recruitment (Table 3), but not for retention. Hough and Loeb used a difference-in-difference approach to compare the recruitment and retention of 1,611 applicants in the San Francisco Unified School District, which awards higher salaries/bonuses to teachers teaching shortage subjects and in schools with a high proportion of poor and ethnic minority students, with those of teachers in different school districts before and after the introduction of the policy. These teachers were also given a retention bonus if they stayed on after four years and more after eight years. The results showed an increase in the proportion of shortage subject teachers in hard-to-staff areas from 27% to 37%. There was also an increase in the proportion of new hires in the targeted group (those that received the incentives) from 49% to 54%. However, there was no difference in the retention rates of targeted and non-targeted teachers. Over 90% of teachers stayed on in the district and over 85% stayed in their school, in both groups. This comparison is difficult because of the economic downturn in 2008 when unemployment was high. Such a retention bonus might be more effective in a more competitive labour market. The other four lower-rated studies were also carried out in the US.
The two medium strength pieces (2*) also showed positive results for recruitment, but not for retention. Steele et al. (2010) evaluated the Governor's Teaching Fellowship (GTF) scheme, involving a 20,000 USD incentive to attract and retain new teachers to low-performing schools for four years. Teachers had to repay 5,000 USD for each year that they did not meet the commitment. An instrumental variable design was used, based on 718 GTF teachers, excluding those who could not be tracked, had missing data, or were not enrolled at recognised institutions. GTF recipients were not randomly selected, and so may have had a predisposition to teach in low-performing schools. Twice as many teachers were enrolled during GTF as in the years before and after, and 28% more taught in low-performing schools. It seemed that money was an attractor. However, there was no difference in retention rates (75% over four years) between recipients and non-recipients, despite the penalty clause. Glazerman et al. (2013) examined the impact of the Talent Transfer Initiative, which offered bonuses to the highest performing teachers for agreeing to move to and stay in low-performing schools. The incentive was 20,000 USD paid in instalments over a two-year period. Teachers who were already teaching in low-performing schools received a 10,000 USD retention stipend if they remained in the school over the two-year period. The participants included 85 teacher pairs matched on school characteristics and randomised to intervention or not, across 114 elementary and middle schools. Because the teacher pairs changed their personnel between randomisation and the start of the school year, the two groups were no longer equivalent at the beginning of the study. Of the vacancies assigned to the scheme, 88% were filled, compared to 44% the year before, and 71% in the comparison group. Retention after one year was 93% (70% in the comparator group), and 60% after two years (compared to 51%).
The results suggest that while the transfer incentive may have had a positive impact on teacher recruitment and retention during the payout period, the effect did not last once the payment stopped.
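The narrowing gap between the groups can be seen by taking simple differences of the percentages reported for the Talent Transfer Initiative. The Python sketch below uses only those reported figures; the dictionary keys are our own labels, and the differences are raw gaps in percentage points, not effect estimates (the groups were no longer equivalent at the start of the study).

```python
# Reported percentages for the Talent Transfer Initiative
# (Glazerman et al., 2013): (intervention group, comparison group).
reported = {
    "vacancies filled": (88, 71),
    "retained after one year": (93, 70),
    "retained after two years": (60, 51),
}

# Raw gap in percentage points for each outcome.
gaps = {outcome: treated - comparison
        for outcome, (treated, comparison) in reported.items()}

for outcome, gap in gaps.items():
    print(f"{outcome}: {gap:+d} percentage points")
```

On these figures the retention gap falls from 23 percentage points after one year to 9 after two years.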
The weaker studies (in terms of design for a causal question) are more mixed in results. Fowler (2003) examined the Massachusetts Signing Bonus Program for New Teachers, offering a 20,000 USD bonus for highly qualified people switching careers to teaching. Recipients received training before being assigned to high-need schools, and were provided with further training, support and mentoring. There was no explicit comparison group. The programme failed to recruit candidates from outside the area. Despite advertising across states, only seven candidates from outside Massachusetts were recruited over four years. It could be that other states were also experiencing severe teacher shortages, and were offering higher salaries. The programme also failed to place all teachers in high-need schools (only 71% from the first cohort, and 48% and 35% in following years). Dropout among bonus recipients was higher than the national average (46% by the third year), and highest in the high-need districts (55%). A survey of head teachers suggests that bonus recipients on the scheme were more attracted to the fast-track scheme than to the bonus incentive (Churchill et al., 2002). Evaluation of signing bonus incentives in general suggests that any effect tends to be short-lived (Choi, 2011). Goldhaber et al.'s (2010) analysis suggests that teachers will need to be paid more to get them to teach and stay in challenging schools. The study compared salaries in private and public schools using a combination of administrative datasets. The sample included 56,354 public school teachers and 10,760 private school teachers. The results showed that private schools with a high proportion of poor students paid their teachers 17% higher salaries than schools with an average number of poor students. This is more than the salary premium that public school teachers were paid to teach in disadvantaged areas.
Of course, there are other differences between the two sectors, such as working conditions, which should not be ignored.
In a longitudinal retrospective cohort study, Gordon and Vegas (2005) analysed the impact of a funding reform in Brazil which stipulated that at least 60% of additional funds be allocated to teacher wages. The reform saw an increase in the number of teachers (Castro, 1998; World Bank, 2002). However, this study may not be directly relevant to the US or UK context as the intervention coincided with major education reforms in Brazil which saw additional educational resources for some municipalities and the legislation that all teachers must be qualified. It is therefore difficult to attribute any causal effect. Before the funding reform there was already an increase in the number of teachers and a reduction in student:teacher ratios, although not in the poorest areas. The impact of the programme is therefore difficult to assess.

Alternative routes into teaching
The evidence for alternative certification to get teachers to teach in hard-to-staff schools or areas is unclear. Only two studies met our inclusion criteria and were rated at least 1*. Both were conducted in the US. The first is a case study of Teach for America (similar to England's Teach First scheme) in the rural Mississippi-Arkansas Delta region, an area of geographical isolation with a heavily ethnically segregated school population (Dwinal, 2012). The programme recruited high performing university graduates through an intensive selection process. Candidates are committed to teach for at least two years in state schools. The low response rates (under 20%) to interviews with principals, and a comparison between regions over time using the weaker measure of vacancy rates rather than number of teachers recruited, made it difficult to establish the impact on recruitment. There was no decrease in vacancy rates relative to other areas, partly because the programme imposed limits on the number of participants in each district (so directing them elsewhere). Retention rates for Teach for America teachers were also low, as suggested by other studies (e.g. Clark et al., 2017; Decker et al., 2004; Glazerman et al., 2006; Henry et al., 2014; Raymond et al., 2001).
The second study, by Clewell and Villegas (2001), reported a positive impact of the Pathways to Teaching Careers programme, which included the use of paraprofessionals and noncertified teachers, and Peace Corps Fellows. The paraprofessional and noncertified programmes involved identifying non-qualified staff already working in schools and offering them scholarships as well as other support services to help them obtain qualified teacher status. The Peace Corps Fellowship identifies and supports potential teachers from returning Peace Corps volunteers (similar to the Troops to Teachers programme in England). Fellows are placed in schools on a full-time contract and paid a salary while they work towards a teaching qualification. This was a six-year study which was largely based on self-report, with a high level of missing data. Only 44% reported where they were teaching initially, and only 31% after three years. Pathways teachers reported higher completion rates than traditionally certified teachers (75% to 60%). A high proportion (84%) ended up teaching in hard-to-staff schools and had better retention rates over three years compared to the national average (81% to 71%). This could be because Pathways teachers had to agree to continue teaching in the schools they were trained in for a specified period. Some were also already working in the schools.

Improving working conditions
We found only one study that met our inclusion criteria with a minimum 1* rating. In this study, Waters-Weller (2009) explored the relationship between retention of teachers and the working conditions of schools (defined as reduction in class size and teaching load, plus more planning time) in Virginia, USA. This was an exploratory cross-sectional study looking at the relationship between the attitudes of teachers towards low socioeconomic status schools, and the kind of incentives likely to increase retention. The survey of 3,525 teachers in two urban districts had only a 29% response rate. The majority of teachers indicated that they would stay in their current school for the next year, including those who were in high poverty schools. They generally indicated that extra money for salaries and bonuses was not necessarily needed to keep them if the school had an excellent administrator, but money was an inducement to transfer to a poor school. The design of the study could not establish a causal link between improved working conditions or retention bonuses and retention, hence it was rated only 1* for strength of evidence.

Improving retention
There were 17 studies that examined the impact on teacher retention. Again, none were of the highest quality. All the stronger studies (rated 2* and above) were conducted in the US and suggested no lasting benefit from financial incentives for retention of teachers in hard-to-staff schools (e.g. Hough & Loeb, 2013; Steele et al., 2010; Glazerman et al., 2013; Fowler, 2003). These have been discussed in the recruitment section above. A further 11 studies dealt solely or mostly with retention of teachers in hard-to-staff schools (Table 4); all reported positive effects.

Use of financial incentives
Although Clotfelter et al. (2008), a 3* study, indicated a positive effect of financial incentives on retention, it concurred with the other studies above that incentives work only as long as they are available and that once removed, they have no lasting effect. Clotfelter et al. (2008) examined the impact of the North Carolina annual bonus scheme on the retention of qualified mathematics, science and special education teachers in high poverty and challenging schools, using a difference-in-difference approach. Teachers received the bonus ($1,800 per year) for as long as they stayed in the eligible school. This was a reasonably well-conducted study, using administrative data for four years on public school teachers to estimate the likelihood of teachers leaving a particular school. The research compared hazard rates before and after the implementation of the bonus programme, between eligible and ineligible teachers in the same schools, and between teachers in eligible schools and those in schools that narrowly missed out on being eligible. Teachers receiving the bonus were an estimated 15% less likely to leave at the end of the school year compared to other teachers in the same schools.
The other 2* studies suggested positive impact on teacher retention, but those from the US often involved a tie-in where teachers are committed to staying on if they receive the financial incentives. Fitzgerald (1986) looked at offering an annual stipend (of between 500 USD and 2,000 USD) to encourage teachers to teach in schools with a high proportion of pupils eligible for free or reduced lunches, in high priority areas in the US. The study used a difference-in-difference approach to compare the retention rates of teachers in 25 high priority schools with 25 high poverty control schools not receiving the stipend. The groups were similar in terms of pupil and teacher characteristics. Vacancies dropped in treatment schools in the first year, and the fall in retention rates was lower than for control schools (ES = +0.39).
In Norway, Falch (2010) examined the impact on the retention of teachers in high-vacancy schools of paying teachers differential wages, using a difference-in-difference approach. In the period 1993/4 to 2002/3, Norway had a central wage system, but teachers in schools with high vacancies received a wage premium of between 7.5% and 12%. Over the nine years, schools were initially eligible if they had 20% more 'shortages' than the previous year. This increased to 30% for 1996/7 and 1997/8, and then returned to 20% for the last four years. In total, 161 schools received the wage premium at least once, and in these schools the attrition rate of teachers was lower than in comparison schools by 6%. The reporting of this study, however, was not clear, and the number of schools and teachers included varied considerably over time. This makes it difficult to judge the efficacy of the incentive.
Feng and Sass (2018) considered the effects of the Florida Critical Teacher Shortage Program, from 1986 to 2011, on the retention of teachers in shortage subject areas (mathematics, science and special education). Early career teachers were offered loan forgiveness of up to 10,000 USD to pay off their student loan but only if they taught in a shortage subject for at least 90 days. There was a recruitment bonus for new teachers of up to 1,200 USD (to cover removals or equipment), and a retention bonus of up to 1,200 USD if teachers continued to teach a shortage subject the next year. Since subjects designated as shortage changed over time, the teachers eligible for these incentives also changed over time. These variations were used to compare bonus recipients with non-recipients, in terms of recruitment and attrition, using a proportional hazard model that took into account student demographics, pupil prior behaviour, prior achievement, class size, teacher gender, race/ethnicity, salary base and experience.
Loan forgiveness reportedly had a positive effect on the likelihood of teachers staying in teaching the following year, reducing attrition by 12%. The one-time retention bonus for shortage subject teachers also reduced the likelihood of teachers leaving by 25%, but not once funding was removed.
Three studies have common authors and so they are treated as one complex study. Fulbeck (2011) evaluated the impact of ProComp (Professional Compensation for Teachers), a teacher incentive programme in Denver that included 10 financial incentives. School-based incentives were awarded to teachers at schools serving low-income students, at high-performing schools, and at schools making the most progress in mathematics and reading. Eligibility was restricted to members of teacher unions who were not working in charter schools. The total number of teachers included in the retention analyses was 4,145, representing 91% of all Denver Public School District teachers. This study employed interrupted time-series and difference-in-difference regression models. The average change in retention rate was −0.06% before ProComp and +1.5% afterwards, and participation in ProComp increased retention rates by 2.1 percentage points. It was more effective in hard-to-staff schools (ES 0.25) compared to others (ES 0.08). Retention was higher in high poverty schools where teachers were eligible to receive a financial incentive to stay. Fulbeck and Richards (2015) looked at all 7,333 public school teachers in Denver from 2006 to 2010 who were eligible for the ProComp incentive (regardless of whether they received it) and who made at least one voluntary move within the district (n = 989). The incentive tended to attract teachers to high growth and high performing schools, and was less successful for schools with a high proportion of low-income pupils. A limitation of the study is its inability to take account of other factors, such as principals' hiring preferences and the actual school vacancies advertised, which may lead to the effect of financial incentives being over-estimated.
Fulbeck (2014) looked at participation in ProComp and teacher mobility in high poverty areas, using longitudinal teacher-level data from 2001/2 to 2010/11, and comparing teachers who received ProComp with those who did not, and those who taught in high poverty schools with those who did not. Teachers working in high poverty schools were more likely to move but the odds of leaving the district (and so losing the incentive) were lower for ProComp teachers than for others. The study suggests that the incentive alone was not enough to compensate for poor working conditions, issues with school leadership and school climate.
The 1* study, Colson and Satterfield (2018), tested the effects of a teacher compensation plan, known as the Innovation Acceleration Fund, on the retention of SEN (special educational needs), mathematics, science and language teachers in a small rural district. This was a merit pay system, paying teachers deemed effective based on the contentious Tennessee Value-Added Assessment System. The total potential population was reported as 134. Of these, 93 volunteered for the compensation scheme. Teachers who did not want to have individual teacher effect results were excluded. Only 56 of these were deemed effective. Around 80% of teachers who participated in the compensation scheme were retained, compared to 70% of those who did not participate. The report does not include effect sizes, and the design means that volunteers were compared with non-volunteers, hence the 1*.

Mentoring, support and induction
Five studies looked at the impact of some kind of early support system for new teachers, either through mentoring, induction or teacher preparation. All reported positive results for retention, but all were lower quality studies rated 1*, apart from one rated 2* (Gold, 1987). The effectiveness of such programmes therefore cannot be determined. Gold (1987) evaluated the New York City Retired-Teachers-as-Mentors Program by comparing mentees with a comparison group of non-mentored teachers. The programme recruited retired teachers as mentors for new in-service teachers. The study used Board of Education records and questionnaires completed by teachers, mentors and principals. The results showed that retention rates went up for all, but the rates were higher for the mentored teachers (85% and 80% in the second year). It is not clear whether mentors were randomised to new teachers in eligible schools, and no account was taken of missing data in the analysis.
An evaluation of a mandatory mentoring system for new teachers in a rural school district in North Carolina (Anthony, 2009) reported an increase in teacher retention (defined as the proportion of teachers returning each year to the school system). Both mentors and mentees were given training. Data on retention were taken from the school system database. The proportion of teachers returning to the school system increased each year from 84% in 2005/6, before the programme, to 92% in 2007/8. There was, however, no counterfactual as part of this study, and it is therefore a very weak study for a causal question.
Positive results on retention were also reported for a statewide programme known as the Texas Beginning Educator Support programme, which offers instructional support and mentoring for beginning teachers (Fuller, 2003). Although this was a statewide programme, participation was selective, and it is unclear how selection was organised. Using the state personnel database, the study compared the retention rates of beginning teachers who participated in the scheme with those not participating, from 1999/2000 to 2002/03. The participants had higher retention, but this could be at least partly due to the prior selection process.
Helfeldt et al. (2009) described a four-year internship programme aimed at retaining new teachers in high-need urban schools. Interns were paid, with full teacher benefits, and worked as full-time regular teachers in the classroom. They were assigned an approved trained mentor, and 8,000 USD from the intern's salary was paid towards this mentoring scheme. The sample included only 38 interns and 8 mentors, and the bulk of the analysis concerned participant perceptions of the programme. The programme was reported as effective in retaining teachers in high-need urban schools, with 100% of teachers staying on in teaching one year later, compared to state retention of 81%.
In another study, also based in the US, Lyons (2007) evaluated a teacher preparation programme where participants were volunteers, selected for their commitment to the goals of the programme. Unfortunately, much of the reporting is unclear. Findings suggest that teachers exposed to all programme components were less likely than the national average to leave classroom teaching after a year in a high poverty school.

What is the most promising approach to recruiting and retaining teachers in hard-to-staff schools/areas?
Most of the work described here concerns financial incentives of some kind. Looking at the number of positive studies with higher evidence rating, financial incentives appear to be a promising approach in recruiting teachers. Offering remission of student loans, higher salaries or premiums for teaching in hard-to-staff areas and schools is effective in attracting teachers. However, it is not clear that such external motivation is desirable, or attracts the best teachers, and it is quite clear that the attraction is not lasting.
The stronger studies indicate that financial incentives are effective in retaining teachers in hard-to-staff areas, but only when there is a kind of tie-in involved where teachers are committed to staying on in the school or district for a specified period or else incur a penalty. The impact disappears once the incentive is removed. For example, in the US retention is associated with receipt of the incentive (e.g. Clotfelter et al., 2008; Glazerman et al., 2013; Hough & Loeb, 2013; Steele et al., 2010) and teachers are committed to stay in the challenging school/area or continue teaching shortage subjects for a specified period. In Norway, teachers in high vacancy schools receive a wage premium, but lose this once they move to a low vacancy school. In contrast, recruitment and retention schemes in the UK often do not have such stipulations. This is probably why bursary schemes, for example, have not been successful (Worth et al., 2018). The DfE's own analysis suggests that the proportion of bursary holders in state-funded teaching was lower than that of non-bursary holders. It should be noted that bursary holders are trainees in shortage subjects while non-bursary holders are not. This indicates that the bursaries are not attracting shortage subject teachers to state-funded schools. To address this, the government is piloting a phased bursary for mathematics teacher trainees, with a lower bursary upfront and two additional years of payments once in teaching, to encourage them to stay in teaching for at least two years after they qualify (DfE, 2019a, 2019b). We do not yet know if this strategy works and if the amount of the incentive is sufficient to keep beginning shortage subject teachers in teaching in challenging schools.
The use of financial incentives is usually premised on the assumption that if sufficiently well compensated, people can be enticed to go into teaching or be persuaded to stay on in teaching. The evidence in this review suggests that, on their own, monetary inducements are clearly not enough to keep teachers in challenging schools or in difficult areas.
The question is how much is enough to compensate for working in these challenging schools and areas. DeFeo et al. (2016) estimated that the differential to compensate for factors that might make a community or school more or less attractive ranged from 0.85 to 2.01, with remote rural communities having higher differentials. Other studies suggest that such salary compensation only had a short-term effect (Bueno & Sass, 2016). Working and living conditions, and a lack of community engagement, were reported to be important factors in teachers' decision to stay or leave (Fulbeck, 2014; Goldhaber et al., 2010; Waters-Weller, 2009). Such financial compensation should therefore be accompanied by improvements in working conditions for its effects to be sustained.

Is there evidence that approaches other than financial incentives work in recruiting and retaining teachers in challenging schools?
There is no good evidence yet that other approaches, such as mentoring and induction, teacher development and alternative routes into teaching, work for recruitment and retention in high-need areas. The evidence for these programmes is mixed and unclear. The strongest studies find little or no impact. The positive studies often have a mix of activities in the intervention, making it difficult to attribute support as the active ingredient in any success, and some report intentions as outcomes (rather than actual figures on attrition).
A number of studies also looked at 'grow your own' (training and then recruiting from the local community), but none of these could establish causation. Almost all these studies were based on stakeholders' anecdotal reports of successful practice in their own school or district. Most of the research we found was very weak in design, and all of the stronger evidence involved easier-to-measure, more concrete strategies (such as financial incentives). The absence of evidence should not be taken as evidence of absence. What is needed now is for research in this area to use stronger designs that can provide causal answers to these questions.

Conclusion
More research with the kind of designs needed to address causal issues is urgently required to cover mentoring, support, training for teaching in difficult schools, and a host of other alternative approaches that could be combined with financial interventions to attract good teachers and then keep them where they are needed most. In the medium to longer term, a more comprehensive approach would be to change school allocation and economic policies so that there are no longer such clearly defined schools and areas with high levels of poverty, meaning that these schools would not be as hard to staff, even though some would remain geographically isolated.
We recognise that in any review of this scale some studies may have been missed, and new and more robust studies may be conducted in the future. This may alter the findings of our review, but given the evidence available at the time of this review the strongest evidence is for financial incentives.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
The project was funded by the Economic and Social Research Council [ES/R007349/1].

Notes on contributors
Beng Huat See is Associate Professor at Durham University. She is a Fellow of the Higher Education