How AI Helps Kids Learn: Evidence, Risks & Strategy

Executive Summary

The question "how can AI help kids learn new things" has moved beyond speculation. A growing body of rigorous evidence, including peer-reviewed randomized controlled trials (RCTs) from Harvard, a joint Google and Eedi study of LearnLM, Stanford's Tutor CoPilot trial, and a World Bank evaluation in Nigeria, demonstrates that well-designed, pedagogically-tuned AI tutoring can produce real and sometimes large learning gains. These same studies, alongside cautions from Microsoft Research, Carnegie Mellon University, MIT, RAND, and UNESCO, make clear that benefits are conditional: they depend on careful design, human oversight, and institutional guidance that is currently lagging adoption.

At a Glance

Key finding: In a Harvard physics RCT, students learned more than twice as much in less time with an AI tutor versus an active-learning classroom, while being more engaged (Source: Scientific Reports / Nature, 2025).
Key finding: A World Bank RCT in Edo, Nigeria measured a 0.23 standard deviation gain in English, equating to roughly 1.5 to 2 years of business-as-usual schooling (Source: World Bank, 2025).
Key opportunity: Teachers who use AI weekly save an average of 5.9 hours per week, the equivalent of six weeks across a school year (Source: Gallup, 2025).
Key risk: Knowledge workers with higher confidence in generative AI engaged in significantly less critical thinking, offloading cognitive effort to the technology (Source: Microsoft Research / CMU, 2025).
Key risk: Guidance lags adoption; only 35% of district leaders reported providing students AI training as of spring 2025 (Source: RAND, 2025).
Key gap: The strongest causal evidence concerns older students and young adults; direct, long-term causal evidence for young children remains thin.

Strategic implications. The evidence supports a measured, design-led deployment of AI tutoring rather than either blanket enthusiasm or blanket prohibition. The interventions that worked shared two traits: they were pedagogically fine-tuned (not raw chatbots) and they kept humans in the loop. Decision-makers in education systems, edtech firms, and ministries should treat human oversight, integrity safeguards, and equity provisioning as non-negotiable preconditions, not afterthoughts. (Interpretation.)

Background & Context

The aspiration behind AI tutoring has a clear historical anchor. Benjamin Bloom's "2 sigma problem" - the observation that one-on-one tutoring can raise the average student roughly two standard deviations above conventional classroom instruction - is frequently cited as the benchmark that AI tutoring aspires to scale affordably (Source: Studient, 2025; verification: interpretation). The core promise is that software can approximate the personalization of a human tutor at a fraction of the marginal cost, thereby reaching children who could never afford private instruction.

Adoption among children and teens is no longer marginal. According to Pew Research Center, the share of U.S. teens aged 13 to 17 who say they have used ChatGPT to help with schoolwork doubled from 13% in 2023 to 26% in 2024 (Source: Pew Research Center, 2025). A later survey reported by GovTech found that about 54% of U.S. teens say they use AI chatbots to help with schoolwork (Source: GovTech, 2026; verification: partially-verified). The two figures are not directly comparable, since Pew asked about ChatGPT specifically and the later survey covered AI chatbots in general, but both point to rapid mainstreaming.

The commercial layer has matured in parallel. Duolingo launched its "Max" subscription in March 2023, using OpenAI's GPT-4 to power features including "Roleplay" (conversational practice with characters) and "Explain My Answer" (context-specific feedback on mistakes) (Source: Duolingo, Inc., 2023). Platform-scale players such as Khan Academy report reaching very large learner populations, though, as discussed below, the headline figures originate from the company and advocacy partners rather than independent peer review.

Crucially, institutional governance has begun to form around this expansion. UNESCO issued its "Guidance for generative AI in education and research," calling for a human-centered approach and appropriate regulation as AI tools enter classrooms (Source: UNESCO, 2023). The OECD has separately published guidance on leveraging AI to support students with special education needs (Source: OECD, 2025). The policy posture is converging on a recurring theme: capability is racing ahead of guidance.

Evidence Review

This section presents the verified and partially-verified findings that bear directly on whether and how AI helps children learn. The table below summarizes the core causal and survey evidence, followed by detail on the most consequential studies.

Claim	Source	Date	Confidence	Verification
LearnLM students 5.5 pts more likely to solve novel problems (66.2% vs 60.7%)	LearnLM Team, Google & Eedi (arXiv)	2025	High	Verified
Expert tutors approved 76.4% of LearnLM messages with zero/minimal edits	LearnLM Team, Google & Eedi (arXiv)	2025	High	Verified
AI tutor: students learned 2x as much in less time vs active learning	Kestin et al., Scientific Reports (Nature)	2025-06-03	High	Verified
Nigeria RCT: 0.23 SD English gain, ~1.5-2 years of schooling	World Bank	2025-12-11	High	Verified
Tutor CoPilot: +4 pts math mastery; +9 pts for lower-rated tutors	Stanford / NSSA	2024	High	Verified
Teachers using AI weekly save 5.9 hours/week	Gallup	2025-06-25	High	Verified
Only 35% of districts train students on AI; 80%+ students untaught	RAND Corporation	2025-09-30	High	Verified
Teen ChatGPT-for-schoolwork use doubled 13%→26% (2023-2024)	Pew Research Center	2025-01-15	High	Verified
Higher AI confidence linked to less critical thinking (319 workers)	Microsoft Research / CMU	2025-04	High	Verified
ChatGPT essay-writers showed lowest brain engagement (EEG)	MIT Media Lab (via TIME)	2025	Medium	Partially-verified
96% of instructors believe students cheated (up from 72% in 2021)	John Wiley & Sons	2024-07-29	High	Verified
Khan Academy: +18 hrs/60 skills nearly doubles NJSLA gains	Khan Academy Blog	2026-02-07	Medium	Partially-verified
Khan Academy: every 10 skills ≈ +1 point on PARCC	Khan Academy Blog	2025-11-21	Medium	Partially-verified
Personal AI tutor case study: up to +15 percentile points	arXiv case study	2023	Medium	Partially-verified

The strongest causal evidence

Four studies anchor the credible case for AI-assisted learning. First, the Harvard physics department RCT found that students learned more than twice as much in less time with an AI tutor compared with an active-learning classroom, while also being more engaged and motivated (Source: Scientific Reports / Nature, 2025). This is a high-confidence, peer-reviewed result, and it directly counters the assumption that AI tutoring is merely a convenience rather than an efficacy lever.

Second, the exploratory RCT across five UK secondary schools (N=165) using Google's pedagogically fine-tuned LearnLM found that students guided by the AI were 5.5 percentage points more likely to solve novel problems on subsequent topics (66.2% versus 60.7%) than those tutored by human tutors alone (Source: LearnLM Team, Google & Eedi, arXiv, 2025). Equally important for risk assessment, supervising expert tutors approved 76.4% of LearnLM's drafted tutoring messages with zero or minimal edits, indicating reliable pedagogical instruction with low risk of unsafe or hallucinated content (Source: LearnLM Team, Google & Eedi, arXiv, 2025).
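Percentage points and relative change are easy to blur when reading a result like this one. As a minimal sketch (the function name `compare_rates` is illustrative and not taken from the study), the two success rates reported above can be expressed both ways:

```python
from typing import Tuple


def compare_rates(treatment_pct: float, control_pct: float) -> Tuple[float, float]:
    """Return (percentage-point difference, relative change in percent)."""
    point_diff = treatment_pct - control_pct
    relative_pct = 100.0 * point_diff / control_pct
    return point_diff, relative_pct


# Success rates on novel problems, as reported in the text above.
points, relative = compare_rates(66.2, 60.7)
print(f"{points:.1f} percentage points, {relative:.1f}% relative improvement")
```

The 5.5-point difference corresponds to a relative improvement of roughly 9% over the human-tutor baseline. This arithmetic says nothing about statistical uncertainty, which matters for an exploratory trial with N=165.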


Third, the World Bank randomized evaluation of an LLM-based after-school tutoring program in Edo, Nigeria measured a 0.23 standard deviation gain in English, with learning gains equating to roughly 1.5 to 2 years of business-as-usual schooling, situating the intervention among some of the most cost-effective education programs (Source: World Bank, 2025). This matters because it provides causal evidence from a low-resource, developing-country context, expanding the geographic and economic relevance of the findings.
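To give the effect size some intuition, the sketch below converts a standard-deviation gain into a percentile shift and backs out the annual learning rate implied by the "1.5 to 2 years" conversion. It assumes normally distributed scores, which is an assumption of this sketch rather than something the source states, and it does not reproduce the World Bank's own conversion method. The function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def percentile_of_median_student(effect_size_sd: float) -> float:
    """Percentile, within the comparison group, reached by a median student
    whose score rises by effect_size_sd standard deviations.
    ASSUMPTION: scores are normally distributed."""
    return 100.0 * STD_NORMAL.cdf(effect_size_sd)


def implied_annual_gain(effect_size_sd: float, years_equivalent: float) -> float:
    """SD gained per year of business-as-usual schooling implied by the conversion."""
    return effect_size_sd / years_equivalent


nigeria_sd = 0.23  # English gain reported in the text above
print(f"0.23 SD: 50th -> {percentile_of_median_student(nigeria_sd):.0f}th percentile")
print(f"2.00 SD: 50th -> {percentile_of_median_student(2.0):.0f}th percentile")
for years in (1.5, 2.0):
    rate = implied_annual_gain(nigeria_sd, years)
    print(f"{years} years equivalent implies {rate:.3f} SD per year")
```

Under the normality assumption, a 0.23 SD gain moves a median student to about the 59th percentile, against about the 98th for Bloom's two-sigma benchmark. The years-of-schooling equivalence implies that business-as-usual schooling in this setting yields only about 0.12 to 0.15 SD per year, which is why a modest-looking effect size translates into so much schooling time.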

Fourth, the Stanford RCT of Tutor CoPilot, involving over 700 tutors and 1,000 students from underserved communities, found that students whose tutors used the AI tool were 4 percentage points more likely to master math topics, with the largest gains (+9 percentage points) accruing to students of lower-rated tutors (Source: Stanford / NSSA, 2024). This is a human-AI augmentation design - AI assisting human tutors rather than replacing them - and the differential benefit to lower-rated tutors is a distinctive signal about leveling effects.

Teacher productivity and access

A Gallup-Walton Family Foundation poll of over 2,200 teachers found that those who use AI tools at least weekly save an average of 5.9 hours per week, amounting to six weeks over the course of a school year that can be reinvested elsewhere (Source: Gallup, 2025). The reinvestment of teacher time is an indirect but material channel through which AI can help children learn: time freed from administrative and preparatory tasks can flow toward direct instruction and individual attention.
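The "six weeks" figure depends on how a school year and a work week are defined. The sketch below makes that arithmetic explicit. The 37.4-week school year and the 40-hour work week are assumptions chosen for illustration; they are not confirmed by the source text above.

```python
from typing import Tuple


def annual_time_saved(hours_per_week: float,
                      school_weeks: float,
                      workweek_hours: float) -> Tuple[float, float]:
    """Return (total hours saved per year, equivalent number of work weeks)."""
    total_hours = hours_per_week * school_weeks
    return total_hours, total_hours / workweek_hours


HOURS_SAVED_PER_WEEK = 5.9  # from the poll cited above
SCHOOL_WEEKS = 37.4         # ASSUMPTION: length of the teaching year in weeks
WORKWEEK_HOURS = 40.0       # ASSUMPTION: hours in one work week

hours, weeks = annual_time_saved(HOURS_SAVED_PER_WEEK, SCHOOL_WEEKS, WORKWEEK_HOURS)
print(f"{hours:.0f} hours per year, about {weeks:.1f} work weeks")
```

Under these assumptions the saving is about 221 hours, or roughly 5.5 forty-hour weeks, which is consistent with the rounded "six weeks" headline. A shorter assumed school year or a longer work week would give a smaller figure.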

Inclusion and special education needs

The evidence for accessibility is promising but less causally rigorous. The University of Virginia Teaching Hub reports that text-to-speech is a prominent use of generative AI by students with disabilities, and that AI tools can support learners with invisible disabilities such as ADHD, dyslexia, and dysgraphia by making content more accessible (Source: University of Virginia Teaching Hub; verification: partially-verified). Separately, an AI-driven individualized learning plan delivered via a point-of-care digital platform (CognitiveBotics) was evaluated for children with Autism Spectrum Disorder; it used interactive videos, chatbots, and AI games tailored to each child, with continuous capture of attention and retention (Source: European Psychiatry / NCBI PMC, 2024; verification: partially-verified). As listed under Known evidence gaps, this autism study was a small single-arm pre/post evaluation without a control group, which limits causal claims.

Platform-reported evidence (treat with caution)

Several widely circulated figures come from platforms or advocacy partners rather than independent peer review. A three-year panel study in Newark, NJ (with teacher, student, and year fixed effects) found that increasing Khan Academy use by 18 hours or 60 skills mastered year over year can almost double the average year-to-year state test (NJSLA) gains (Source: Khan Academy Blog, 2026; verification: partially-verified). A related three-year efficacy study estimated that every 10 new skills a student learns is associated with about a one-point gain on New Jersey's PARCC test (Source: Khan Academy Blog, 2025; verification: partially-verified). These are correlational panel studies with fixed effects rather than randomized trials, so causal interpretation should be cautious. Advocacy reporting separately claims the platform reaches over 150 million learners, with 50+ efficacy studies showing students progressing 30-50% faster (Source: Stand Together, 2025; verification: unverified). A separate arXiv case study of a personal AI tutor using GPT-3-generated microlearning reported that active engagement led to an average improvement of up to 15 percentile points compared with a parallel course without the AI tutor (Source: arXiv, 2023; verification: partially-verified).

The risk evidence

The countervailing evidence is substantial. A Microsoft Research and Carnegie Mellon University study (CHI 2025) of 319 knowledge workers found that those who placed higher confidence in generative AI engaged in significantly less critical thinking, effectively offloading cognitive effort to the technology (Source: Microsoft Research / CMU, 2025). An MIT Media Lab study using EEG found that participants who used ChatGPT to write essays showed the lowest brain engagement of three groups and consistently underperformed at neural, linguistic, and behavioral levels (Source: TIME, 2025; verification: partially-verified). As listed under Known evidence gaps, the MIT study is a small preprint (about 54 participants), and details circulating online are inconsistent and should be checked against the primary paper. Note also that the Microsoft/CMU sample consisted of adult knowledge workers, so its relevance to children is by extension rather than direct observation.

On academic integrity, a Wiley survey found that nearly all instructors (96%) believed at least some of their students cheated in the past year, a sharp rise from 72% in Wiley's 2021 survey (Source: John Wiley & Sons, 2024). For early childhood, researchers warn that excessive screen time among young children is linked to delayed language development, reduced executive function, and disruptive behaviors, a key concern for AI tools aimed at the youngest learners (Source: Frontiers in Education, 2026; verification: partially-verified).

AI Intelligence Analysis

The evidence resolves into a clear and important pattern: the design of the AI system, not the mere presence of AI, determines whether children learn more. The two strongest positive results - Harvard and LearnLM - both involved systems engineered for pedagogy. LearnLM was explicitly described as "pedagogically fine-tuned," and its 76.4% expert-approval rate for drafted messages is a direct measure of pedagogical reliability (Source: LearnLM Team, Google & Eedi, arXiv, 2025). By contrast, the risk evidence - Microsoft/CMU's cognitive offloading and MIT's low brain engagement - emerged from general-purpose chatbot use for tasks like essay writing, where the AI substitutes for the cognitive work rather than scaffolding it.

Strategic insight: The decisive variable is whether the AI does the thinking for the child or makes the child do the thinking. LearnLM and Tutor CoPilot show gains because they scaffold problem-solving and keep humans supervising; the offloading studies show harm because the AI produces the output the learner was supposed to generate. "AI for learning" and "AI as answer machine" are different products with opposite cognitive effects. (Interpretation.)

A second pattern is the leveling effect. Tutor CoPilot delivered its largest gains (+9 percentage points) to students of lower-rated tutors (Source: Stanford / NSSA, 2024), and the World Bank's Nigeria result came from an underserved context (Source: World Bank, 2025). This suggests AI tutoring may compress quality variance by raising the floor - bringing less-expert instruction closer to expert instruction. (Interpretation.) If this signal holds at scale, the policy value of AI tutoring is greatest precisely where human tutoring capacity is weakest.

A third pattern is the human-in-the-loop dependency. The safest, most reliable result (LearnLM) was one where expert tutors supervised AI output, and the largest augmentation result (Tutor CoPilot) was one where AI assisted human tutors. The strongest negative results occurred in unsupervised, individual use. Hypothesis: the efficacy and safety of child-facing AI may be a function of the degree of human oversight built into the workflow, with fully autonomous child-AI interaction sitting at the higher-risk end of the spectrum.
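The human-in-the-loop pattern can be made concrete with a small review gate in which no AI draft reaches a student without a tutor's decision. This is a hypothetical sketch for illustration only: the names `Review`, `ReviewedDraft`, `apply_review`, and `light_touch_rate` are invented here and do not describe how LearnLM, Eedi, or Tutor CoPilot are actually built.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Review(Enum):
    APPROVED = "approved"      # sent to the student unchanged
    MINOR_EDIT = "minor_edit"  # sent after a small tutor edit
    MAJOR_EDIT = "major_edit"  # substantially rewritten by the tutor
    REJECTED = "rejected"      # discarded; nothing from the AI is sent


@dataclass(frozen=True)
class ReviewedDraft:
    ai_draft: str
    review: Review
    sent_text: Optional[str]  # what the student actually sees, if anything


def apply_review(ai_draft: str, review: Review,
                 tutor_text: Optional[str] = None) -> ReviewedDraft:
    """Gate an AI draft behind a tutor decision before it can be sent."""
    if review is Review.APPROVED:
        return ReviewedDraft(ai_draft, review, ai_draft)
    if review is Review.REJECTED:
        return ReviewedDraft(ai_draft, review, None)
    if not tutor_text:
        raise ValueError("an edited draft needs the tutor's revised text")
    return ReviewedDraft(ai_draft, review, tutor_text)


def light_touch_rate(log: List[ReviewedDraft]) -> float:
    """Share of drafts sent with zero or minimal edits."""
    if not log:
        raise ValueError("no reviewed drafts")
    light = sum(d.review in (Review.APPROVED, Review.MINOR_EDIT) for d in log)
    return light / len(log)


# Made-up example log, not study data.
log = [
    apply_review("Try factoring the left side first.", Review.APPROVED),
    apply_review("What is 3x when x = 2?", Review.MINOR_EDIT,
                 "What is 3x when x is 2?"),
    apply_review("The answer is 12.", Review.MAJOR_EDIT,
                 "What do you get if you substitute x = 2?"),
    apply_review("Off-topic reply.", Review.REJECTED),
]
print(f"{light_touch_rate(log):.0%} approved with zero or minimal edits")
```

A metric of this shape is one way to track something like the 76.4% expert-approval figure over time. The design choice that matters is that the reviewed text, not the raw draft, is the only thing that can be sent.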

The emerging signal that should most concern decision-makers is the divergence between adoption velocity and guidance maturity. RAND data show that as of spring 2025 only 35% of district leaders reported providing students with training on AI, and over 80% of students said teachers did not explicitly teach them how to use AI for schoolwork (Source: RAND, 2025). Yet the share of teens using ChatGPT for schoolwork doubled to 26% in a single year per Pew, and a later survey put AI chatbot use for schoolwork at 54% (Sources: Pew Research Center, 2025; GovTech, 2026). The result is that a large and growing share of students are using AI for schoolwork in an environment where the cognitive-offloading risks documented by Microsoft/CMU and MIT are most likely to materialize, and where the pedagogical scaffolding behind the LearnLM and Harvard results is largely absent. (Interpretation.)

The integrity data reinforce this. The jump in instructors believing students cheated, from 72% to 96%, coincides with the generative AI era (Source: John Wiley & Sons, 2024). Interpretation: unsupervised generative AI access without integrity redesign converts a learning tool into a completion tool, which is the offloading failure mode in institutional form.

Competitive & Market Implications

Industry and competitor landscape

The market is bifurcating into pedagogically-engineered systems and general-purpose chatbots repurposed for study. Google's LearnLM represents the engineered-pedagogy track, with verified RCT evidence and a high expert-approval rate (Source: LearnLM Team, Google & Eedi, arXiv, 2025). Duolingo occupies the consumer-application track, embedding GPT-4 into Roleplay and Explain My Answer features (Source: Duolingo, Inc., 2023). Khan Academy sits in the platform-scale track with large reported reach but evidence that is platform-generated and correlational rather than independently peer-reviewed (Sources: Khan Academy Blog, 2025-2026; Stand Together, 2025). Interpretation: the competitive moat is shifting from raw model access toward verifiable pedagogical efficacy and demonstrable safety, since foundation models are increasingly commoditized.

Customer and institutional impact

For schools, the most immediately bankable value proposition is teacher time. The Gallup finding of 5.9 hours saved per week for weekly AI users is a concrete operating-efficiency case that does not depend on contested learning claims (Source: Gallup, 2025). For families and students, the value proposition is personalization and access. For ministries and NGOs in resource-constrained settings, the World Bank's cost-effectiveness framing is the headline: an intervention ranked among the most cost-effective at delivering 1.5 to 2 years of learning (Source: World Bank, 2025).

Geographic implications

The Nigeria result is strategically significant because it demonstrates efficacy outside affluent, high-connectivity contexts (Source: World Bank, 2025). However, equity advocates caution that AI tutoring access alone will not close achievement gaps, because home connectivity and structural decisions limit who benefits, raising digital-divide concerns (Source: Valere, 2026; verification: opinion). Interpretation: the same technology that can level instructional quality can simultaneously widen access gaps if deployment ignores connectivity and device disparities. The geographic dividend is conditional on infrastructure provisioning.

Regulatory environment

The presence of UNESCO guidance and OECD guidance signals that the regulatory perimeter is forming around human-centered design and inclusion (Sources: UNESCO, 2023; OECD, 2025). Interpretation: vendors that pre-align with human-oversight and data-privacy expectations will face lower future compliance friction than those optimizing purely for engagement metrics.

Scenario Analysis

Scenario 1: Optimistic - Pedagogy-led scaling

Assumptions: Education systems adopt pedagogically fine-tuned, human-supervised AI tutoring (LearnLM-style) at scale; teacher time savings are reinvested into instruction; integrity and guidance frameworks mature alongside adoption.

Supporting evidence: Harvard's more-than-double learning gains (Source: Scientific Reports / Nature, 2025); LearnLM's 5.5-point novel-problem advantage and 76.4% expert approval (Source: LearnLM Team, Google & Eedi, arXiv, 2025); Nigeria's 0.23 SD gain (Source: World Bank, 2025); Tutor CoPilot's leveling effect (Source: Stanford / NSSA, 2024); Gallup teacher time savings (Source: Gallup, 2025).

Confidence level: Medium. The efficacy evidence is strong, but it is concentrated in older students and specific contexts, and depends on disciplined, design-led deployment that is not yet the norm.

Scenario 2: Neutral - Uneven, two-track adoption

Assumptions: Engineered tutoring delivers gains in well-resourced or well-designed programs, while the majority of children continue using general-purpose chatbots without scaffolding or instruction; guidance partially catches up but lags in many districts.

Supporting evidence: RAND's guidance gap (35% district training; 80%+ students untaught) (Source: RAND, 2025); teen adoption figures of 26% (Pew) and 54% (GovTech) (Sources: Pew Research Center, 2025; GovTech, 2026); Wiley's integrity strain (Source: John Wiley & Sons, 2024). The coexistence of strong efficacy studies and strong risk studies makes a mixed outcome the modal expectation.

Confidence level: High. This scenario is the straight-line extrapolation of current adoption-versus-guidance dynamics.

Scenario 3: Risk - Cognitive offloading and equity erosion

Assumptions: Unsupervised chatbot use dominates; children offload thinking; critical-thinking and integrity harms accumulate; the digital divide widens; early-childhood screen-time harms surface where age-inappropriate tools are deployed.

Supporting evidence: Microsoft/CMU cognitive offloading (Source: Microsoft Research / CMU, 2025); MIT low brain engagement (Source: TIME, 2025; partially-verified); teacher concern that AI is impacting critical thinking (Source: Axios, 2025; verification: opinion); early-childhood screen-time harms (Source: Frontiers in Education, 2026; partially-verified); equity warnings (Source: Valere, 2026; opinion).

Confidence level: Medium. The mechanisms are evidenced, but long-term effects on retention, motivation, and critical-thinking development beyond short study windows are not yet well established, so the magnitude of realized harm is uncertain.

Strategic Recommendations

Priority 1: Mandate pedagogically-tuned, human-supervised AI rather than open chatbot access

Action: Education systems and vendors should require that child-facing AI tutoring be pedagogically fine-tuned and operate with human oversight, mirroring the conditions under which LearnLM and Tutor CoPilot succeeded.

Expected impact: High. This is the configuration that produced the verified gains (Harvard, LearnLM, Tutor CoPilot) and the high safety signal (76.4% expert approval).

Risk level: Low to medium - primarily implementation and procurement cost.

Confidence score: High, grounded in multiple verified RCTs (Sources: Scientific Reports / Nature, 2025; LearnLM Team, Google & Eedi, arXiv, 2025; Stanford / NSSA, 2024).

Priority 2: Close the guidance gap with explicit AI-use instruction and integrity redesign

Action: Districts should provide students structured training on how to use AI for learning, and redesign assessments to reward reasoning rather than output, directly addressing the RAND guidance gap and the Wiley integrity surge.

Expected impact: Medium to high. Teaching learners to use AI as a scaffold rather than an answer machine targets the exact mechanism (cognitive offloading) flagged by Microsoft/CMU and MIT.

Risk level: Low.

Confidence score: High on the problem (RAND, Wiley verified); medium on the intervention's effect size, since direct evidence that training reverses offloading is not in the supplied evidence (Interpretation) (Sources: RAND, 2025; John Wiley & Sons, 2024; Microsoft Research / CMU, 2025).

Priority 3: Provision for equity and protect young children

Action: Pair any AI tutoring rollout with connectivity and device provisioning to avoid widening the digital divide, and apply age-appropriate screen-time safeguards for early-childhood deployments, aligning with UNESCO and OECD guidance.

Expected impact: Medium. Equity provisioning converts the leveling potential (Tutor CoPilot, Nigeria) into realized gains for those who need it most; screen-time safeguards mitigate documented early-childhood harms.

Risk level: Medium - depends on funding and infrastructure.

Confidence score: Medium. Equity and screen-time concerns are evidenced at opinion-to-partially-verified levels (Sources: Valere, 2026; Frontiers in Education, 2026; UNESCO, 2023; OECD, 2025).

Decision framework: Design for scaffolding -> Keep humans in the loop -> Teach AI literacy -> Provision for equity -> Measure long-term effects. (Interpretation.)

Sources & Evidence Appendix

LearnLM Team, Google & Eedi - "AI tutoring can safely and effectively support students: An exploratory RCT in UK classrooms," arXiv, 2025. Verified.
Kestin et al. - "AI tutoring outperforms in-class active learning: an RCT," Scientific Reports (Nature), 2025-06-03. Verified.
World Bank - "From Chalkboards To Chatbots: Evaluating the Impact of Generative AI on Learning Outcomes in Nigeria," 2025-12-11. Verified.
Khan Academy Blog - "Multiple Studies Show Khan Academy Drives Learning Gains," 2026-02-07. Partially-verified.
Khan Academy Blog - "Khan Academy Improves State Test Scores: Results from New 3-Year Efficacy Study," 2025-11-21. Partially-verified.
Stanford / National Student Support Accelerator - "Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise," 2024. Verified.
Gallup - "Three in 10 Teachers Use AI Weekly, Saving Six Weeks a Year," 2025-06-25. Verified.
RAND Corporation - "AI Use in Schools Is Quickly Increasing but Guidance Lags Behind," 2025-09-30. Verified.
Pew Research Center - "About a quarter of US teens have used ChatGPT for schoolwork," 2025-01-15. Verified.
Microsoft Research / Carnegie Mellon University - "The Impact of Generative AI on Critical Thinking" (Lee et al., CHI 2025), 2025-04. Verified.
TIME (MIT Media Lab study) - "ChatGPT's Impact On Our Brains According to an MIT Study," 2025. Partially-verified.
Duolingo, Inc. - "Duolingo Max Shows the Future of AI Education," 2023-03-14. Verified.
University of Virginia Teaching Hub - "Using AI to Support Students with Disabilities." Partially-verified.
OECD - "Leveraging AI to support students with special education needs," 2025-09-24. Verified.
John Wiley & Sons - "AI Has Hurt Academic Integrity in College Courses but Can Also Enhance Learning," 2024-07-29. Verified.
European Psychiatry / NCBI PMC - "Effectiveness of AI-driven Individualized Learning Approach for Children with ASD," 2024-08-27. Partially-verified.
arXiv - "Implementing Learning Principles with a Personal AI Tutor: A Case Study," 2023. Partially-verified.
Axios - "Teachers warn AI is impacting students' critical thinking," 2025-03-30. Opinion.
Frontiers in Education - "Sustainable education management of AI in early childhood education," 2026-03-18. Partially-verified.
Valere - "AI Tutoring Equity Gap: Why Access Alone Won't Close It," 2026-04-01. Opinion.
UNESCO - "Guidance for generative AI in education and research," 2023. Verified.
GovTech - "Survey Finds 54% of U.S. Teens Use AI for Schoolwork," 2026-02-26. Partially-verified.
Studient - "How AI Solves Bloom's 2 Sigma Problem," 2025-11-13. Interpretation.
Stand Together - "How personal AI tutors are helping students and reshaping education," 2025-12-12. Unverified.

Known evidence gaps

Khan Academy's headline figures (150M learners; 30-50% faster; 50+ studies) are company/advocacy sourced, not independently peer-reviewed.
Khan Academy NJ/Newark results are correlational panel studies with fixed effects, not RCTs; causal interpretation should be cautious.
The MIT "brain on ChatGPT" study is a small preprint (about 54 participants); circulating figures are inconsistent and should be checked against the primary paper.
The strongest RCT evidence involves older students/young adults or specific contexts; long-term causal evidence for young children remains thin.
The autism/CognitiveBotics study is a small single-arm pre/post evaluation without a control group.
No single authoritative, independently-verified figure for global AI-in-education adoption among children specifically was located.
Long-term effects on retention, motivation, and critical-thinking development beyond short study windows are not yet established.

Audit Trail

Conclusion	Claim	Evidence	Source
AI tutoring can produce large learning gains	Students learned 2x as much in less time vs active learning	Peer-reviewed RCT, verified	Scientific Reports / Nature, 2025
Pedagogically tuned, tutor-supervised AI outperformed human tutors alone on novel problems in an exploratory trial	+5.5 pts novel-problem success (66.2% vs 60.7%)	Exploratory RCT, N=165, verified	LearnLM Team, Google & Eedi, arXiv, 2025
AI tutoring is highly cost-effective in low-resource contexts	0.23 SD English gain ≈ 1.5-2 years of schooling	Randomized evaluation, verified	World Bank, 2025
AI augmentation levels instructional quality	+4 pts math mastery; +9 pts for lower-rated tutors	RCT, 700+ tutors, 1,000+ students, verified	Stanford / NSSA, 2024
AI frees significant teacher time	5.9 hours saved/week for weekly users	Poll of 2,200+ teachers, verified	Gallup, 2025
Adoption outpaces guidance	35% districts train students; 80%+ untaught	Survey, verified	RAND, 2025
AI study help is mainstreaming among teens	13%→26% (2023-2024); later 54%	Surveys, verified / partially-verified	Pew, 2025; GovTech, 2026
Over-reliance reduces critical thinking	Higher AI confidence → less critical thinking	Study of 319 workers, verified	Microsoft Research / CMU, 2025
Unsupervised AI writing lowers cognitive engagement	Lowest brain engagement among ChatGPT users	EEG study, small preprint, partially-verified	MIT Media Lab via TIME, 2025
AI strains academic integrity	96% of instructors believe students cheated (up from 72%)	Survey, verified	John Wiley & Sons, 2024
Early-childhood AI raises screen-time concerns	Excessive screen time linked to developmental harms	Journal article, partially-verified	Frontiers in Education, 2026
Access alone will not close gaps	Connectivity and structural barriers limit benefit	Advocacy analysis, opinion	Valere, 2026
Governance is forming around human-centered AI	UNESCO and OECD guidance published	Institutional guidance, verified	UNESCO, 2023; OECD, 2025

Final note on confidence: This report's positive conclusions rest on high-confidence, verified RCTs, but those trials are concentrated among older students and specific contexts. Its risk conclusions rest on a mix of verified and partially-verified evidence. The single most important unresolved question - the long-term effect of AI on children's critical-thinking and retention - is explicitly unestablished in the available evidence and should govern the pace of deployment. (Interpretation.)
