ANA RUIZ ALONSO-BARTOL
University of California, Davis, Estados Unidos
https://orcid.org/0000-0002-0281-6762
ERIK GARABAYA-CASADO
University of California, Davis, Estados Unidos
https://orcid.org/0000-0002-4782-6665
CLAUDIA SÁNCHEZ-GUTIÉRREZ
University of California, Davis, Estados Unidos
https://orcid.org/0000-0001-8081-7528
Abstract
This study explores the integration of Artificial Intelligence tools into the curriculum of a beginner-level college Spanish course, focusing on the use of ChatGPT-3.5 to support writing tasks. This pilot aimed to compare the reception of AI-generated feedback against more traditional instructor-led written corrective feedback (WCF). Students were divided into three sections: (a) a control group that received instructor feedback following a color-coded system based on the Dynamic WCF approach (Hartshorn et al., 2010); (b) an experimental group that used ChatGPT instead, following a prompt created through an iterative process to mimic the same categorization of errors as the control group; and (c) a second experimental ChatGPT group whose prompt entailed less error categorization. Results, collected through pre-/post-quarter surveys that included Likert scales and open-ended questions, suggest that only when AI operates under detailed instructions and a supported approach do students perceive the quality, clarity, and usefulness of the feedback as comparable to that of the instructor. Additionally, students in the categorized AI feedback section experienced a reduction in writing anxiety nearly comparable to that of the control group. In contrast, a prompt designed to elicit less categorized feedback led to lower levels of satisfaction and greater distrust toward AI as a feedback tool. This preliminary study highlights that careful AI prompt engineering and teacher support are key factors in ensuring the pedagogical effectiveness of AI in the language classroom and in managing writing anxiety.
Keywords
Artificial Intelligence; written corrective feedback; prompt engineering; curricular implementation; L2 writing anxiety.
Experiencias de estudiantes principiantes de español con feedback del profesor y basado en IA: un estudio exploratorio
Resumen
Este estudio explora la integración de herramientas de Inteligencia Artificial en el currículo de un curso universitario de español para principiantes, con especial atención al uso de ChatGPT-3.5 como apoyo en tareas de escritura. Este proyecto piloto se centró en comparar la recepción del feedback generado por IA frente a la retroalimentación basada en Written Corrective Feedback (WCF) aportada por el profesor. Los estudiantes se dividieron en tres secciones: (a) un grupo de control que recibía comentarios del instructor siguiendo un sistema de codificación por colores basado en el enfoque Dynamic WCF (Hartshorn et al., 2010); (b) un grupo experimental que empleaba en su lugar ChatGPT con una prompt diseñada tras un proceso iterativo para imitar las categorías del grupo de control; y (c) un segundo grupo experimental que utilizaba una prompt que requería una menor categorización. Los resultados, recogidos a través de encuestas pre y post-trimestre con escalas Likert y preguntas abiertas, sugieren que únicamente cuando la IA opera bajo instrucciones detalladas y específicas, la percepción de calidad, claridad y utilidad del feedback se acerca a la del profesor. Además, estos estudiantes experimentaron una reducción en la ansiedad de escritura comparable al grupo de control. En contraste, la falta de control en la formulación de la prompt resultó en menor satisfacción y mayor desconfianza hacia la IA como herramienta de apoyo. Este estudio preliminar sugiere que el diseño cuidadoso de las instrucciones dirigidas a modelos de IA es un factor clave para garantizar su eficacia pedagógica en el aula de lenguas y controlar la ansiedad de escritura.
Palabras clave
Inteligencia Artificial; retroalimentación escrita; ingeniería de prompts; implementación curricular; ansiedad de escritura en segundas lenguas.
Recibido el 16/04/2025
Aceptado el 24/06/2025
Cómo citar/how to cite
Ruiz Alonso-Bartol, A., Garabaya-Casado, E. & Sánchez-Gutiérrez, C. (2025). Beginner Spanish student experiences with AI and teacher written corrective feedback: an exploratory study. Revista Internacional De Lenguas Extranjeras / International Journal of Foreign Languages, (23), 161-196. https://doi.org/10.17345/rile23.4177
The rapid emergence of generative Artificial Intelligence (AI) tools such as ChatGPT (OpenAI, 2023) has prompted both enthusiasm and concern in educational contexts. Some fear that these technologies may undermine learning by allowing students to produce texts with minimal effort and redefining the limits of plagiarism (Satariano & Kang, 2023; Stokel-Walker, 2022; Yan, 2023). Others, however, argue that when used thoughtfully under the guidance of educators, AI can complement rather than replace active learning (Escalante et al., 2023; Kohnke et al., 2023; Tseng & Warschauer, 2023).
In the context of second language (L2) writing, ChatGPT shows particular potential for supporting written corrective feedback (WCF; Yang & Li, 2024). Providing effective and timely feedback is essential for developing writing skills (Bitchener, 2021; Bitchener & Ferris, 2012; Hartshorn et al., 2010; Hyland & Hyland, 2006), yet teachers often face significant constraints on their time and capacity, especially in large classes (J. Lee & Vahabi, 2018; Yu et al., 2021). AI tools such as ChatGPT, which can deliver immediate and scalable feedback, offer new opportunities to expand learners’ access to WCF without substantially increasing instructor workload (Guo, 2024; Mizumoto & Eguchi, 2023; Yang & Li, 2024; Zou et al., 2025).
Despite these possibilities, research on AI-assisted WCF has largely focused on learners of English, and primarily at intermediate or advanced proficiency levels. Much less is known about how beginner-level learners, particularly those studying languages other than English, engage with AI-generated feedback. Furthermore, while existing studies have examined learning outcomes (Athanassopoulos et al., 2023) or student interaction and engagement with ChatGPT feedback (Zou et al., 2025), fewer have addressed the practical challenges instructors encounter when integrating AI tools into the curriculum. Key decisions regarding prompt design, the structure of feedback cycles, and strategies to promote student engagement with AI feedback remain underexplored in naturalistic classroom studies.
This exploratory study seeks to address these gaps by examining the use of ChatGPT for WCF in a beginner-level Spanish course where writing and self-editing are emphasized. Three approaches to feedback are compared: (1) instructor-only feedback based on a list of set error categories, (2) ChatGPT feedback with a prompt based on the same categories as those used by instructors, and (3) ChatGPT feedback with a prompt that only requests general, less categorized error areas. The study reflects on the practical lessons learned from implementing AI-assisted feedback in a real classroom setting, from designing the AI prompt to using it in L2 writing tasks, and provides a comparison of students' perceptions of the different types of feedback. The aim is to offer insights for instructors of Spanish and other languages who are considering integrating AI tools into their teaching and are looking for guidance on both the benefits and the challenges of doing so.
Written corrective feedback (WCF) has long been recognized as an essential component of second language (L2) writing instruction, with extensive research demonstrating its effectiveness in improving learners' grammatical accuracy and writing fluency (Bitchener, 2021; Bitchener & Ferris, 2012; Cheng & Zhang, 2021; Ferris & Kurzer, 2019; Hyland & Hyland, 2006; Kang & Han, 2015). A key dimension of WCF concerns the level of explicitness provided in the feedback, ranging from direct feedback, where the instructor supplies the correct form, to more indirect feedback, where errors are indicated but learners are left to identify and resolve them (Hyland & Hyland, 2006; Kang & Han, 2015). Although direct feedback offers immediate clarity and may be simpler to unpack, indirect feedback may foster deeper learning by requiring students to engage in problem-solving and hypothesis testing about their linguistic choices, while also facilitating iterative writing processes and building their autonomy (Bowles & Gastañaga, 2022; Ellis, 2009; Mao & Crosthwaite, 2019; Mao & Lee, 2020). Beyond increasing self-efficacy, this active engagement has been associated with greater retention of linguistic forms and the development of self-editing strategies over time, particularly when learners receive appropriate guidance on how to interpret and apply indirect feedback (Ferris & Roberts, 2001; Ferris & Kurzer, 2019).
In addition to the level of explicitness, WCF can also vary in its scope, being either focused or unfocused—that is, targeting specific categories of errors or covering a broader range of issues without limiting attention to particular forms (Frear & Chiu, 2015). Both approaches carry advantages and limitations: while unfocused feedback may overwhelm learners, especially beginners, due to the sheer number of corrections, focused feedback risks leaving unaddressed errors that may become fossilized if not eventually corrected (Bonilla López et al., 2018; Lee, 2020; Mao & Lee, 2020; Nassaji & Kartchava, 2021). As a result, instructors typically adopt a flexible approach, selecting a level of focus that aligns with the specific learning objectives of the course and the nature of the writing task (Ferris, 2012).
Building on these foundations, and considering the physical and emotional demands placed on teachers when providing feedback (Lee & Vahabi, 2018), the emergence of automated writing corrective feedback tools has introduced new opportunities for delivering more personalized and immediate feedback to language learners (Chen & Cheng, 2008). Unlike earlier automated systems, recent generative AI tools such as ChatGPT offer dynamic, conversational responses that can be adapted to individual learner needs (Vera, 2023). This adaptability not only increases the quantity of feedback learners receive but also makes the process more interactive and responsive to student input (Han & Li, 2024). However, the effectiveness of ChatGPT as a feedback tool depends heavily on the design of the prompts that shape its output (Lee, 2024; Naamati-Schneider & Alt, 2024). Without carefully crafted prompts, AI-generated feedback risks being overly general, inconsistent, or even misleading, potentially causing confusion rather than promoting learning (Yoon et al., 2023). While ChatGPT provides scalable feedback options, current evidence suggests that it functions best not as a replacement for teacher feedback, but as a complementary tool that enhances and supports teacher input (Lee, 2024; Steiss et al., 2024). Determining the most effective balance between automated feedback and teacher-led guidance remains an ongoing area of exploration in L2 writing pedagogy.
In this context, prompt engineering has emerged as a critical practice for maximizing the educational value of AI tools like ChatGPT. At its core, prompt engineering involves crafting queries for Large Language Models (LLMs) that elicit desired outputs aligned with pedagogical goals (Liu et al., 2023). Even subtle variations in prompt design can significantly affect the accuracy, relevance, and tone of AI-generated feedback (Jacobsen & Weber, 2025). To address this challenge, Lo (2023) proposed the CLEAR framework (concise, logical, explicit, adaptive, reflective), which offers practical guidelines for designing effective prompts. However, prompt engineering remains a developing field. Although specific frameworks for educators have been proposed (see UNESCO, 2024), many teachers still face the challenge of navigating AI tools independently, and the steepness of the learning curve often depends on factors such as individual characteristics, time constraints, and technological familiarity (Isemonger, 2023). For this reason, sharing real-life classroom experiences with prompt design and implementation is especially valuable for building communities of practice and prompt databases tailored to specific instructional contexts, such as L2 written corrective feedback, that educators can reuse and adapt across a variety of educational settings.
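To make these guidelines concrete, the sketch below assembles a hypothetical WCF prompt whose parts are annotated with the CLEAR components; the wording, function, and sample categories are our own illustrative assumptions rather than a prompt drawn from the literature, and the "reflective" component corresponds to revising the prompt across testing rounds rather than to any single line of code.

```python
# Illustrative only: a WCF prompt skeleton organized around the CLEAR
# framework (Lo, 2023). The wording and categories are hypothetical
# examples, not prompts used in this study (see the prompt engineering
# narrative below).

def build_wcf_prompt(student_text: str, categories: list[str]) -> str:
    """Assemble a feedback request for a beginner L2 Spanish draft."""
    return (
        # Concise and logical: short instructions in a deliberate order.
        "You are helping a beginner learner of Spanish revise a draft.\n"
        # Explicit: state exactly what kind of feedback is (not) wanted.
        "List the errors sentence by sentence. Do NOT provide corrected "
        "forms; only name each error's type and location.\n"
        # Adaptive: constrain the output to the learner's level.
        "Mark only errors a beginner could reasonably fix on their own.\n"
        f"Use only these categories: {', '.join(categories)}.\n\n"
        f"Student text:\n{student_text}"
    )

print(build_wcf_prompt("Yo me gusta viajar.", ["verb conjugation", "spelling"]))
```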
An understudied aspect of prompt engineering concerns not only the design of prompts themselves but also the language in which they are written, as well as the overall effectiveness of AI models in providing feedback in languages other than English. Regarding the language of the prompt, Yang and Chen (2025) compared prompts written in English and Chinese that were designed to elicit feedback on erroneous sentences in their respective languages. Their study found no significant differences between the two in terms of the accuracy, level of detail, or comprehensibility of the feedback produced. However, it is well established that most LLMs used broadly in the United States are primarily optimized and trained on English texts, leaving many minority languages underrepresented, both linguistically and culturally (Godwin-Jones, 2025). In a recent study, Fokides and Peristeraki (2025) demonstrated that ChatGPT provided significantly more accurate feedback for English texts than for Greek, highlighting the need for instructors to carefully consider the limitations of AI-generated feedback in languages other than English. These findings suggest that while prompt design is crucial, the language of both the prompt and the target text may also influence the effectiveness of AI-assisted feedback, reinforcing the importance of teacher oversight when deploying these tools in diverse linguistic contexts.
In addition to examining the quality and quantity of AI-generated WCF, recent research has also explored students’ perceptions of this feedback in comparison to teacher or peer feedback. For example, Zeevy Solovey (2024) compared these three feedback modes in an upper-intermediate L2 English course. While students appreciated the immediacy and accessibility of AI feedback, 46.67% reported preferring teacher feedback over any other type, and 33.33% indicated that their ideal scenario would combine both AI and teacher feedback. In their qualitative comments, students noted that although AI feedback allowed them to correct errors quickly, they ultimately viewed the teacher as the expert, whose feedback was more detailed, reliable, and targeted. Overall, while students welcomed the support provided by AI feedback, they continued to place greater trust in the feedback given by their instructor and found peer feedback to be the least useful. These subjective measures align with findings from studies on the quality of ChatGPT feedback, such as Steiss et al. (2024), which show that teacher feedback tends to be more accurate, supportive, and focused on essential features of the text, regardless of the overall writing quality of the student. Collectively, prior research in L2 English suggests that ChatGPT and other AI chatbots can serve as a valuable complement to teacher feedback, particularly in contexts where the teacher is not immediately available. However, given known differences in feedback quality across languages, there remains a need to replicate these findings in other linguistic contexts and to compare different types of feedback (e.g., focused vs. unfocused).
The present study seeks to address this gap by comparing three types of feedback: (1) teacher feedback based on a semi-focused list of error categories (e.g., grammatical gender agreement, tense-aspect), (2) AI feedback following similar error categories, and (3) AI feedback using broader, non-specific categories (e.g., grammar, vocabulary). Conducted in a beginner-level Spanish language class, the study offers insights into an under-researched group of learners. To the best of our knowledge, there are currently no empirical data on L2 Spanish students’ experiences with ChatGPT feedback, and more broadly, most of the literature on WCF—whether AI-based or traditional—has centered on intermediate to advanced learners rather than beginners (Chong, 2019). This exploratory study aims not only to compare students’ experiences across the three feedback conditions but also to provide a detailed account of the decision-making and prompt engineering processes involved in designing this AI-based feedback intervention.
Concretely, this exploratory study aims to answer the following research questions:
1. What are students’ perceptions of AI versus teacher feedback, regarding their overall experiences and impact on their writing anxiety?
2. How do their perceptions and experiences vary depending on the level of specificity of AI feedback?
In addition to answering these questions, the study aims to provide a detailed account of the authors’ experience going through the iterative process of prompt engineering, resulting in a prompt that produced feedback as similar as possible—at least in terms of quantity—to that of a teacher. This narrative section of the methods is intended to be of value to language teachers interested in implementing similar AI feedback protocols in their beginner Spanish classes.
This exploratory study was conducted within the Spanish program at a higher education institution in Northern California. Specifically, it analyzed courses from the first quarter of the Spanish program, designed for absolute beginners—though most students had some prior exposure to Spanish, either in high school or at home. The course, titled Spanish for Travelers, was structured following the frameworks of the flipped classroom, task-based learning, and specific-purpose approaches. Grammar, vocabulary, and various communicative contexts and dialogues related to traveling abroad were integrated into the curriculum. Each hybrid section (meeting twice per week) typically enrolls approximately 25 students per quarter.
Students enrolled in this course were invited to participate in the study during the first week of class. One of the researchers visited the class to explain the project, addressing issues related to anonymity, voluntary participation, benefits, and the ability to withdraw at any point during the quarter. Students who consented to participate received extra credit for their final grade. Participation involved completing a survey at both the beginning and end of the quarter. The compositions analyzed in this study were already embedded in the curriculum and therefore mandatory for all students, ensuring that participation did not require additional effort. Students’ demographic information can be consulted in Table 1. The specific methodological differences between the three groups are further explained in Section 3.4. below.
Table 1. Student Demographic Information
Groups | Male (N) | Female (N) | Non-binary/Prefer not to say (N) | Age range in years (Mean)
Control Teacher DWCF (N=16) | 8 | 8 | 0 | 17-23 (19.62)
Experimental 1 Categorized AI (N=19) | 4 | 14 | 1 | 18-22 (19.84)
Experimental 2 Uncategorized AI (N=11) | 4 | 7 | 0 | 18-25 (19.91)
In turn, instructors teaching the participating sections also provided consent, completed a demographic pre-quarter survey and maintained reflective diaries, in which they documented their experiences providing feedback on student assignments. The instructor responsible for both the Control and Categorized AI groups is a native Spanish speaker with a PhD in Second Language Acquisition, has four years of experience teaching in higher education, and is also a co-author of this article. The Uncategorized AI section was taught by a non-native Spanish speaker with a PhD in Linguistics and 25 years of language teaching experience.
The length of the course was 10 weeks (quarter system), with one activity module per week. Students met with the instructors in person twice per week, and completed at-home assignments that presented grammar and vocabulary beforehand, following the flipped classroom approach. Table 2 shows a list of the different activities, quizzes, and tasks included in the curriculum.
Table 2. Course activities descriptions
Task | Description | Frequency
iSpraak | Students record themselves reading a text related to the content seen in class. The platform assigns a score based on pronunciation. | 1 per week
Playposit | Course-specific videos featuring on-screen comprehension questions related to vocabulary, grammar, and cultural contexts from the week. | 2 per week
Grammar/Vocabulary Quizzes | Structural activities used to reinforce the grammar and vocabulary from the week. | 1-2 per week
Cultural/Contextual Activities | More open-ended activities where students look up cultural information that complements the content of the week (e.g., a recipe, a trip). | 2-3 per week
Writing Tasks | Written compositions completed on the students' computers in class, consisting of a variety of textual genres related to the course content. | Biweekly: a first draft every other week, followed by the reviewed draft the week after
Review Quizzes | Longer, open-book quizzes with a mix of structural activities and open-ended questions to review each week's content. | 1 per week
Self-Assessment Quiz | Multiple-choice and open-ended questions that ask students to elaborate on how confident they were and how well they understood the week's content. | 1 per week
Oral Exam | 3-to-5-minute oral exam conducted in pairs, following a scripted dialogue related to the different modules' contexts (e.g., ordering food at a restaurant, a taxi driver-client conversation). | One midterm and one final oral exam
The writing tasks (WTs) were developed to fit the SPA1Y course's content and skills progression, which was designed following the Task-Based Language Teaching (TBLT) methodology and built around a Study Abroad/traveling theme (see Sánchez-Gutiérrez et al., 2022). The WTs were created to simulate real-life speech acts useful in those contexts. Spanning different academic and personal communication genres (namely formal email information requests, social media posts, and informal writing), the WTs aimed to practice course grammar and vocabulary in different registers. They started by asking students to write 3-4 sentences at the beginning of the 10-week quarter, building up to 4-6 by the end. In turn, the coding system was developed as a mid-focused approach, situated between fully comprehensive written corrective feedback (WCF), which can overwhelm students, especially at this introductory level, and focused WCF, which would not be as useful for students' varied needs and would not be as ecologically valid, given that it would only cover specific linguistic structures (Chong, 2019; Ferris, 2010; Ferris & Roberts, 2001; Lee, 2020; Mao & Lee, 2020).
In addition, these writing tasks had also been implemented in an attempt to increase students' self-efficacy, by incorporating regular writing and self-editing practice beyond isolated asynchronous assignments that required only one version and portrayed writing as a product rather than as an iterative process (Manchón & Leow, 2020). In conjunction with these efforts, the Dynamic Written Corrective Feedback method (DWCF; Evans et al., 2010; Hartshorn et al., 2010) was implemented. DWCF was designed based on Skill Acquisition Theory (DeKeyser, 2007) so that writing "instruction, practice, and feedback should be meaningful, timely, and constant" (Hartshorn & Evans, 2012, p. 225). Indirect feedback was therefore chosen over direct types that provide the correct form (Ellis, 2009), to increase student agency and self-regulation and to align with the course's student-centered approach (Muñoz-Basols & Bailini, 2019; Papi et al., 2020; Yang et al., 2022). Color-coding was then prioritized for its manageability over other indirect methods, such as abbreviations, which may prove more complicated to implement and decipher. Based on common errors in previous student samples at the same level, the error categories were expanded from ongoing program-level explorations of DWCF's adaptability for First-Year Spanish, to allow for additional flexibility and avoid overly broad categories. Different colors were thus included for a) vocabulary, b) verb conjugation, c) other grammar (e.g., gender-number agreement, tú/usted register), d) spelling, e) connection issues, f) rephrasing/major structural errors, and g) unnecessary words.
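As a compact illustration of the scheme just described, the mapping below encodes the seven categories as they might be represented when building a prompt or an error log; the category labels come from the list above, while the specific color assignments are hypothetical, since the text does not report which color marked which category.

```python
# Hypothetical encoding of the seven color-coded DWCF error categories
# described above. Category labels follow the text; the color assignments
# themselves are invented for illustration.
COLOR_CODES = {
    "yellow": "vocabulary",
    "green": "verb conjugation",
    "blue": "other grammar (e.g., gender-number agreement, tú/usted register)",
    "pink": "spelling",
    "orange": "connection issues",
    "red": "rephrasing/major structural errors",
    "purple": "unnecessary words",
}

# Inverting the mapping makes it easy to look up the color for a category,
# e.g., when rendering instructions for students or for the AI prompt.
CATEGORY_TO_COLOR = {category: color for color, category in COLOR_CODES.items()}
print(CATEGORY_TO_COLOR["spelling"])  # -> pink
```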
In terms of logistics, participating students were asked to create a copy of a Google Doc template in which to write all their compositions for the quarter. This template was divided into three parts, regardless of course section. As shown in Appendix 1, students wrote the first draft in Section A of the template. Section B was reserved for pasting a screenshot of the teacher/AI feedback, reviewing it, and categorizing their mistakes into an already-provided error log table, identifying errors according to the above-mentioned categories. Lastly, Section C was a space to write the self-corrected second version. For all three groups, grades remained hidden and feedback could not be accessed until exactly a week later, when students received it in class and used it to self-revise and submit a final version. They then received a new grade and, if necessary, additional feedback at the discretion of the instructor. This process was repeated for WTs 2, 3, and 4, while WTs 1 and 5 served as pre- and post-test samples, respectively: a short introduction written in class and repeated in the last class of the quarter. For teacher grading, a rubric had been developed for the first draft across all sections (Appendix 2), focusing on the application of 1) course grammar and 2) course vocabulary, worth 1.5 points each; 3) relevant, detailed content based on the prompt, worth 1 point; and 4) comprehensibility of, and cohesion between, the full sentences, worth an additional point. Moreover, a different rubric was created for the self-edited final version (Appendix 3), which assessed 1) whether the student addressed everything that had been marked (regardless of how accurately), worth 2 points; 2) adherence to the template and instructions, worth 1 point; and 3) efforts to actually improve the text (accuracy of self-edits), worth 2 points.
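The point structure of the two rubrics can be summarized as in the sketch below; the data structures and scoring helper are our own illustration of the values reported above (Appendices 2 and 3), not materials from the study.

```python
# Sketch of the two grading rubrics described above. Criterion labels and
# point values follow the text; the helper function is illustrative.

FIRST_DRAFT_RUBRIC = {
    "course grammar": 1.5,
    "course vocabulary": 1.5,
    "relevant, detailed content based on the prompt": 1.0,
    "comprehensibility and cohesion of full sentences": 1.0,
}  # totals 5 points

FINAL_VERSION_RUBRIC = {
    "addressed everything that was marked": 2.0,
    "adhered to the template and instructions": 1.0,
    "made efforts to actually improve the text": 2.0,
}  # totals 5 points

def total_score(rubric: dict[str, float], earned: dict[str, float]) -> float:
    """Sum earned points, capping each criterion at its rubric maximum."""
    return sum(min(earned.get(criterion, 0.0), maximum)
               for criterion, maximum in rubric.items())

# Example: strong vocabulary, weaker grammar and content.
print(total_score(FIRST_DRAFT_RUBRIC, {
    "course grammar": 1.0,
    "course vocabulary": 1.5,
    "relevant, detailed content based on the prompt": 0.5,
    "comprehensibility and cohesion of full sentences": 1.0,
}))  # -> 4.0
```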
Regarding the engineering of the AI prompts, the prompts were carefully designed through five rounds of edits, in order to avoid the revision pressure and frustration with excessive or non-actionable feedback mentioned above, since students themselves were going to be engaging with ChatGPT-3.5 and retrieving its feedback in class. First, the instructions that teachers were going to receive for providing their feedback on the writing tasks (WTs) were written up (Figure 1) and fed to that version of ChatGPT, alongside de-identified sample student paragraphs from the same tasks, written by past students who had consented to a similar study.
Figure 1. Prompt for Categorized Section - Round 1
This prompt was tested with two samples (Figures 2 & 3), corresponding to the two ends of the proficiency scale commonly seen in this beginner class, as determined by one of the researchers, who has an SLA background and more than 10 years of teaching experience. With this first prompt, the "Upper Beginner" sample, in which the teacher-researcher had marked only 3 errors the previous quarter, returned 8 errors in ChatGPT. In turn, the "Low/Mid Beginner" sample, previously marked by the teacher as containing 9 errors, received 13 errors from ChatGPT. These discrepancies were compounded because the AI was also listing all correct structures and providing revised forms via direct feedback, even though the prompt specifically asked it not to do so, and even for issues that beginner students could reasonably tackle on their own. In total, ChatGPT would thus provide 36 feedback items to a student engaging in this exercise, which seemed excessive and demotivating (Lee, 2020). This feedback also included confusing, vague comments like "The sentence is clear but consider the verb adjustment", or seemingly unnecessary or inadequate ones given the WT's goals, prompt, and rubric, as in "**Connection and Structure**: 'pero' connects ideas well, but consider a semicolon for a stronger pause."
Figure 2. Comparison of Teacher (left) versus AI (right) Feedback on High Beginner Sample using First Prompt
Figure 3. Comparison of Teacher (left) versus AI (right) Feedback on Low/Mid Beginner Sample using First Prompt
Therefore, the prompt was refined and, since the "Low/Mid Beginner" sample seemed to lead to the most discrepancies, we focused on this end of the spectrum for further prompt engineering, adding a sentence to emphasize that there was no need to mention correct structures. As seen in Figure 4, this version returned 10 errors, down from the first round, but still included unnecessary noise, such as noting sentences that were fully correct. This second version also continued to provide direct rewrites consistently, despite being specifically prompted not to engage in direct feedback. In addition, ChatGPT was not following the instruction to organize its feedback by sentence and was repeating the same comments in different sections, as seen in items 1 and 5.
Figure 4. AI Feedback on Low/Mid Beginner Sample using Second Prompt
Once again, the prompt was reorganized into shorter sentences to isolate each individual instruction and re-tested with the same "Low/Mid Beginner" sample. This third version again returned 10 comments, followed the desired sentence-by-sentence structure, and increased the number of indirect feedback items, although some comments still merely pointed out correct structures. In addition, as shown in Figure 5, some comments were deemed by the teacher-researcher to be optional or unnecessary (like the "Word order" item on Sentence 4), and one was even incorrect (the "Spelling" item on Sentence 4).
Figure 5. AI Feedback on Low/Mid Beginner Sample using Third Prompt
The prompt was then refined a fourth time, constraining ChatGPT to highlight only errors that interfered with communication, without optional improvements, and further emphasizing the indirect feedback approach. Figure 6 shows that this fourth prompt also resulted in 10 marked items with a high percentage of direct feedback, much like the previous round, although with different examples of unnecessary ("Vocabulary" item in Sentence 2) or inaccurate comments ("Gender and Number Agreement" item in Sentence 5).
Figure 6. AI Feedback on Low/Mid Beginner Sample using Fourth Prompt
After the general format had been tested for a few rounds, this fourth prompt was also adapted for the additional treatment section. As detailed in Section 3.4. below, this group was meant to follow the same general feedback philosophy and format specifications (so as to ensure a level of support that would be ethical for students working through this novel treatment), yet to remain slightly less structured by not requesting that the AI feedback be categorized into the specific error categories that teachers were to follow. The uncategorized prompt therefore simply asked ChatGPT to identify grammar, meaning, spelling, and sentence-structure mistakes, as seen in Figure 7:
Figure 7. Prompt for Uncategorized Section - Round 4
Figure 8 shows the number of marked errors, 12, which was deemed acceptable considering the 9 that the teacher had previously marked for this sample. However, most of the feedback appeared to be direct, and there were also some inadequate suggestions, such as the "Word Order" item in Sentence 2, as well as mischaracterizations of errors, such as describing the issue in Sentence 10 as "Preposition" instead of "Article". Therefore, the prompt was slightly edited in a fifth round of testing (Figure 9) by tweaking the language, order, and length of sentences to emphasize each intended instruction, with better results.
Figure 8. AI Feedback on Low/Mid Beginner Sample using Fourth Prompt with Uncategorized AI Prompt
Figure 9. Prompt for Uncategorized Section - Round 5
Once that prompt had been through several rounds, it was also tested with a "True Beginner" sample, also fairly common in the target courses, in addition to the "Low/Mid Beginner" text from above. The teacher-researcher had marked this sample with 16 errors the previous quarter yet, despite being aware that more issues were affecting communication, had decided against adding more feedback to avoid overwhelming the student, who was very enthusiastic about learning Spanish but struggling with accuracy. Moreover, one of those 16 markings corresponded to direct feedback, considered to be outside what had been covered and expected in the course at that point, and one additional comment was provided. When prompted with the equivalent fifth prompt for categorized feedback, ChatGPT arrived at 17 errors. As seen in Figure 10, there were still some miscategorizations (such as classifying a missing "ñ" as an accent issue under "Gender and Number Agreement" in Sentence 3) and some broad suggestions in the last three items. Yet all comments corresponded to indirect feedback and followed the intended sentence-by-sentence order and structure, thereby achieving the overall intended approach.
Figure 10. Comparison of Teacher (top) vs. AI (bottom) WCF on True Beginner Sample using Categorized Fifth Prompt
This "True Beginner" sample was also tested with the uncategorized prompt, leading to many fewer errors than in the previous attempt, although most of them were corrected directly. Therefore, the prompt was updated a sixth and final time, for added clarity and to emphasize the indirect approach and desired structure (as seen in Figure 11).
Figure 11. Prompt for Categorized Section - Round 6
This sixth prompt test, shown below in Figure 12, was repeated several times, which showed that if the queries remained in the same tab session, feedback and format became increasingly inaccurate. Therefore, the same prompt was re-tested by closing the tab and opening a different session, at which point ChatGPT returned slight variations, but similar overall ratios in line with what has been described.
Figure 12. AI Feedback on True Beginner Sample using Sixth Prompt with Uncategorized AI Prompt
Despite variability in the percentage of direct feedback and in the ratio of optional, unnecessary, or inadequate comments, the prompt usually provided positive reinforcement and was deemed supportive enough to be piloted in the classroom. The aim was to explore AI feedback as a companion tool, not a full replacement for teacher feedback, given that teachers would be present during the AI interactions, would supervise its feedback when grading, and would provide written feedback on other short asynchronous assignments. In addition, it presented the potential to be useful for building students' writing, editing, and self-regulation skills, as this additional source of feedback would provide opportunities to strengthen their monitoring skills. Therefore, after the final updates, the implemented prompt is shown in Figure 13 (see also Appendix 4 for all final prompts).
Figure 13. Final AI Feedback on True Beginner Sample
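For readers who would rather script this kind of prompt testing than repeat it manually in the browser, the sketch below approximates the "fresh session" procedure with the OpenAI Python SDK: each call sends only the prompt and the sample, with no prior conversation history, so earlier interactions cannot contaminate the output. This is an assumed analogue only, since the study used the ChatGPT-3.5 web interface rather than the API, and the model name is simply the closest API counterpart.

```python
# Sketch of automated prompt testing under a "fresh session" constraint,
# using the OpenAI Python SDK (openai>=1.0). Illustrative only: the study
# itself used the ChatGPT-3.5 web interface, not the API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_feedback(prompt: str, student_text: str) -> str:
    # Building a fresh messages list per call means no conversation history
    # is sent, mirroring the practice of closing the tab and starting over.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # closest API analogue to ChatGPT-3.5
        messages=[{"role": "user", "content": f"{prompt}\n\n{student_text}"}],
        temperature=0,  # reduce run-to-run variation while testing prompts
    )
    return response.choices[0].message.content

# Re-running the same sample several times exposes output variability,
# as observed across the testing rounds described above.
sample = "Hola, me llamo estudiante. Yo gusta la playa."
for run in range(3):
    print(f"--- run {run + 1} ---")
    print(get_feedback("List the errors indirectly, sentence by sentence.", sample))
```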
The three sections of participants were divided as follows: 1) Control Group (feedback was provided by the instructor following the modified DWCF method with color-coding, see Section 3.2. above); 2) Experimental Categorized AI Group (feedback was provided by AI using the prompt designed by the researchers, aiming to mimic the DWCF error categories, see Section 3.3. above); and 3) Experimental Uncategorized AI Group (when interacting with ChatGPT, students used a less categorized prompt). As indicated earlier, the Control and Categorized AI groups shared the same instructor.
It should be noted that the additional experimental section (Uncategorized AI) could have been even broader, had we not limited the format of the requested AI feedback as much, or had we not included a prompt at all. Yet the variability observed during the prompt engineering rounds regarding the amount and directness of WCF led us to believe that it would not be ethical to have some students receive fully uncontrolled feedback, given potential frustrations with overwhelming or non-actionable comments and revision pressure (Cheng & Liu, 2022). Moreover, privacy considerations were also important when asking students to input their writing into proprietary AI; therefore, WT instructions specifically asked them to de-identify any personal information before entering their samples. Students were also offered the choice of either logging in with their student account (while turning off any personal customizations) or remaining logged out to avoid any tracking of their data. In addition, in the event they needed to re-enter their text, students were instructed to close the ChatGPT tab and start over in a different one, to avoid potential memory retention bias from previous interactions (Lin & Crosthwaite, 2024).
Pre- and post-quarter questionnaires were implemented using Qualtrics (2023) and included sections for informed consent, demographic/linguistic background, perceptions about the writing and feedback regimens, and perceptions about generative AI, both generally and specifically for academic purposes. Moreover, a version of Cheng's Second Language Writing Anxiety Inventory (2004), adapted for Spanish writing, was administered before and after the treatment.
Descriptive statistics provided by the Qualtrics Suite and Excel were used for the quantitative data. Participants' open-ended comments were analyzed and categorized, as shown in Table 4, following inductive Qualitative Content Analysis (Mayring, 2000); a phenomenological approach (Creswell & Creswell, 2017) was then applied to teachers' journals to triangulate the data from the student questionnaires.
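As a scriptable alternative to the Qualtrics Suite and Excel used here, the per-group means and standard deviations reported below (Table 3) could be computed with pandas as in the following sketch; the column names and response values are hypothetical.

```python
# Minimal sketch of the descriptive statistics behind Table 3, using pandas
# instead of the Qualtrics Suite and Excel employed in the study. Column
# names and response values are hypothetical.
import pandas as pd

responses = pd.DataFrame({
    "group": ["Control", "Control", "Categorized AI",
              "Categorized AI", "Uncategorized AI", "Uncategorized AI"],
    "q1_fb_useful": [5, 4, 4, 5, 4, 3],    # 1-5 Likert agreement
    "q7_needed_help": [3, 4, 2, 3, 4, 3],  # reverse-keyed: lower is better
})

# Mean and (sample) standard deviation per group for each Likert item.
summary = responses.groupby("group").agg(["mean", "std"]).round(2)
print(summary)
```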
The primary purpose of this pilot study was to explore the reception of AI tools integrated into the curriculum's writing activities. Recurrent themes were identified through repeated readings of the responses, and categories were refined to capture shared perceptions and experiences. As shown in Appendix X, none of the experimental participants identified AI as the most outstanding part of the course. Given the communicative nature of the course, it is not surprising that "Speaking in class/In class activities" emerged as the most valuable component, followed by various homework assignments.
Table 3 presents students' perceptions of the usefulness of the type of feedback they received, whether from teacher color-coding or from AI, using a Likert agreement scale. The standard deviation was below 1.00 for most of the parameters analyzed in the Control, Categorized, and Uncategorized AI groups, indicating consistency in participants' impressions. In the Control group, students generally appreciated instructor comments and found them useful, with the exception of the last two items: the need for additional clarification to self-correct drafts (mean 3.20) and the difficulty of completing the error log (mean 2.53), which the Uncategorized section teacher also noticed in her journal. Participants in the Categorized AI section reported a lower need for additional comments on their feedback. This section's instructor was nonetheless glad to interact with students during these sessions, although most questions seemed to be logistical. Students in the Uncategorized AI group, however, expressed an even greater need for further assistance. Regarding the quality of feedback, the Control and Categorized AI sections seem to share similar perceptions, yet the ratings were overall lower in the Uncategorized AI results.
Table 3. Usefulness of Feedback (FB) Received from Instructor/AI
Statement | Control Mean (SD) | Categorized AI Mean (SD) | Uncategorized AI Mean (SD)
1. Overall, the FB I received was useful to develop my language skills | 4.27 (0.68) | 4.21 (0.67) | 4.00 (0.63)
2. Overall, I engaged with the FB | 4.13 (0.72) | 4.14 (0.83) | 3.60 (0.92)
3. The [codes (colors)/ChatGPT] FB was useful for me to self-correct my draft | 4.27 (0.68) | 4.21 (0.56) | 3.80 (0.75)
4. The amount of FB was manageable | 4.33 (0.70) | 4.36 (0.61) | 4.20 (0.60)
5. The amount of FB was enough to help me self-correct my draft | 4.27 (0.57) | 4.14 (0.99) | 3.90 (0.70)
6. The FB was specific for my needs | 4.20 (0.75) | 4.07 (0.88) | 3.60 (0.92)
7. I felt like I needed side comments or additional clarification from the teacher in order to self-correct my draft | 3.20 (1.11) | 2.50 (0.98) | 3.40 (1.20)
8. Completing the error logs was hard to do based on the [color-coded/ChatGPT] FB | 2.53 (0.96) | 2.71 (1.28) | 2.70 (1.10)
Note. Higher scores represent positive experiences, except in items 7 & 8, where lower scores reflect more positive experiences.
In addition to the Likert-scale items, the follow-up open-ended section similarly showed that a majority of students responded positively to the self-correction sessions in general, in both the AI- and the teacher-mediated formats. Moreover, participants in the two AI groups were explicitly asked an open-ended question about their experiences receiving feedback from an AI platform compared to feedback from their instructor. Student responses reflected a range of perspectives, revealing both perceived advantages and limitations of AI-mediated feedback, as seen in Table 4. Most participants in the Categorized group leaned towards overtly praising ChatGPT's WCF, with 8 out of 14 respondents, followed by 3 who expressly preferred teacher-led WCF and 3 who showed appreciation for both. The trends were reversed in the Uncategorized group, where 6 out of 10 open-ended answers stated a preference for human feedback, while 3 described the AI feedback in more positive terms and 1 remained mixed, albeit very succinctly.
Table 4. Continuum of Student Perceptions about AI- vs. Teacher-mediated Feedback
Overtly positive about AI WCF:
• "Benefits was [sic] the instant response and straightforwardness." (ID8_Categorized)
• "The process was smooth and efficient" (ID9_Uncategorized)
• "I think AI gives you more detail on why it's wrong and I found that super helpful." (ID19_Categorized)
• "It was great. It was clear and gave clear instructions and I knew exactly were [sic] to fix. very beneficial." (ID15_Categorized)
• "It helped to make things fast and personalized since waiting for a teacher could be time consuming." (ID12_Categorized)

More mixed/combined appreciation of AI WCF:
• "It really helped because I got feedback from AI and the professor." (ID4_Categorized)
• "The benefits were that the feedback was precise and easy to understand with the disadvantages being that if you didn't understand why you got something wrong there was not much help which is when a teacher is useful." (ID11_Categorized)

Appreciative of AI WCF but explicit dispreference / Overtly negative about AI WCF:
• "The benefits of it is its accessibility as it is an online feature and is given free for anyone to use. There was no anxiety at all, since it is just artificial intelligence. That being said, I believe that getting advice or any sort of feedback from my teacher will always be better." (ID6_Uncategorized)
• "I do not think there were any major disadvantages for me, but it is also nice to get feedback from a person who speaks the language versus a computer. I had a hard time trusting AI and submitting my work because I wasn't sure if I was doing enough corrections, or inaccurate corrections and would have liked to have had more human feedback on the writing assignments specifically." (ID10_Uncategorized)
• "It was helpful, giving me specific pointers that I needed. The teacher doing the same thing would have also helped, perhaps better since there is some sort of engagement there." (ID8_Uncategorized)
• "I would rather have comments from a teacher. Knowing that AI messes up occasionally makes me trust it far less than I would an instructor." (ID10_Categorized)
• "I felt that AI only corrected my mistakes and did not give valuable feedback as a teacher would." (ID4_Uncategorized)
• "I would prefer real feedback, the AI still can make some mistakes and tried to correct my words with things I have yet to learn." (ID5_Uncategorized)
These and the remaining open-ended questions highlight the positive attributes assigned to AI WCF. In the Categorized group particularly, several students highlighted the speed and accessibility of AI-generated feedback and, somewhat less unanimously, its clarity. These participants appreciated that the platform provided exhaustive suggestions that helped improve their writing and complemented instructors' support and that, given teachers' workloads, its responses were more immediate. In connection with the Table 3 item about needing extra clarification, only one student reported engaging with ChatGPT for support, since "you can command it to help you in varying degrees, depending on what type of aid you feel is necessary" and "control holistically" whether to receive a direct answer (ID14_Categorized), with another referring to the possibility to "ask it dumb questions" (ID1_Uncategorized).
Regarding the negative attributes, however, most students in both sections pointed to trust and accuracy issues. Those perceptions were particularly salient among the Uncategorized group's students, who also reported a feeling of being corrected excessively or on aspects beyond their proficiency. These impressions were shared by instructors in their reflections, since both reported inaccurate feedback and direct corrections; in the Categorized section, one student even noticed this and asked the instructor about it. The Uncategorized section teacher noticed an "oversight" when ChatGPT did not recognize a student's name (present despite instructions to anonymize drafts) and therefore corrected the corresponding verb "Me llamo" (my name is), which seemed problematic to her "perhaps because her name isn't traditionally English-sounding." In a very similar case, the Categorized group teacher happily recorded that a student had noticed the incorrect feedback themselves while self-correcting in class and asked whether to ignore it, which to the instructor evidenced "a good level of internalization of the grammatical rule". However, this teacher also noticed that ChatGPT left some issues unmarked that the students did not catch on their own and that he felt the duty to highlight when grading, yet without docking points, which ended up canceling out most if not all of the time saved by not providing the initial WCF. The other teacher, in contrast, felt that grading was very easy and only spent some time reviewing students' writing, AI markings, and revisions, especially when these seemed too advanced for their level. In that regard, both instructors' journals mentioned this concern while grading, whereas only one student mentioned potential plagiarism violations as a downside, fearing that "it becomes a crutch and can lead to cheating" (ID16_Categorized).
In line with the overall positive perceptions about the writing tasks, descriptive statistics of the Second Language Writing Anxiety Inventory (SLWAI, modified for Spanish from Cheng, 2004) showed that all sections decreased their L2 writing anxiety throughout the course. However, as seen in Table 5, the students from the teacher-feedback section decreased their anxiety the most, followed closely by the section that received categorized AI feedback, and finally by the section with uncategorized AI feedback.
Table 5. Comparison of Decrease in L2 Writing Anxiety Scores (SLWAI) by Treatment Section
Sections | Pre-quarter average score | Post-quarter average score | Overall decrease
Control (N=16) | 54.7 pts | 48.25 pts | 12.5%
Experimental Categorized AI (N=19) | 51.9 pts | 46.1 pts | 11.8%
Experimental Uncategorized AI (N=11) | 53.5 pts | 52.1 pts | 4.5%
Once again, qualitative data shed light on some of these dynamics, since most open-ended comments highlighted a general perception of low anxiety surrounding the writing tasks, regardless of the section. Some students from the AI groups linked it to a lack of "negative feedback" (ID13_Categorized) or to the nature of the source, which seemed to lower the stakes of their attempts at L2 writing, since "it is just artificial intelligence" (ID6_Uncategorized), "it did not feel as personal when compared to instructor feedback" (ID17_Categorized) and "having a human look over my work is a little scary" (ID2_Categorized). Whenever anxiety was present, students related it mostly to the lack of trust and support mentioned above, and to a lesser extent to having "a lot of writing to go over [...], so it could get confusing" (ID13_Categorized), or to an initial learning curve that eventually passed. Relatedly, teachers' journals also showed their own initial anxiety about not providing feedback first and entering an innovative treatment that required trust and considerable logistics, although it eased with time and repetition. Another source of concern for the instructor who taught with both teacher WCF and AI WCF was the potential for excessive feedback as texts grow longer, corrections that he would not include at the current course level, and a lack of engagement with the rubric comments when students were working on ChatGPT.
This pilot study aimed to explore students' perceptions of AI-provided feedback compared to teacher feedback. Regarding research question 1, results point to a continuum of student impressions about AI/teacher feedback. Despite remaining positive overall in quantitative terms, students' qualitative comments often hedge AI's perceived benefits of speed, accessibility, and clarity with recognition of the value of ensuring a minimum of human feedback, at times showing an explicit preference for the latter. In that regard, students particularly seemed to appreciate the added trustworthiness, accuracy, engagement, personalization, and contextualization provided by teachers, who could more deftly take into account the specifics of the course, proficiency level, and real-life connections, as one of the teachers also noted. Teachers seemed more concerned than students about the potential for expanded plagiarism stemming from using AI as a self-revising tool, and not particularly concerned about a lack of effort or engagement (Satariano & Kang, 2023; Stokel-Walker, 2022; Yan, 2023).
Throughout the study, students in all groups reported a general decrease in writing anxiety from pre- to post-test, as measured by the adapted SLWAI scale. However, the group that received teacher feedback (Control Group) showed the largest reduction in anxiety levels (12.5%), followed closely by the Categorized AI group (11.8%), and more distantly by the Uncategorized AI group (4.5%). This trend suggests that more structured and personalized feedback, whether human or AI-mediated, may contribute to greater student confidence in their ability to produce better writing. The relatively smaller reduction in anxiety in the Uncategorized AI group might reflect the challenges posed by less guided and broader error codes, echoing calls for educators to use AI as a companion rather than a replacement (Escalante et al., 2023; Kohnke et al., 2023; Tseng & Warschauer, 2023) and the importance of prompt engineering (Yang & Chen, 2025).
In terms of research question 2, questionnaire data regarding feedback quality show that the Control and Categorized AI groups shared similar positive perceptions. Students in the Control group generally found the color-coded teacher WCF clear and useful, though they still expressed some difficulty completing the error logs and a need for additional clarification. Interestingly, participants in the Categorized AI group reported an even lower need for further clarification, perhaps due to the combination of ChatGPT providing brief explanations, including metalinguistic feedback, and more informative error categorization, together with the instructor's logistical support. In turn, students in the Uncategorized AI group expressed a greater need for further assistance and gave overall lower ratings, hinting that some degree of specificity in the feedback from the instructor/AI may be more effective for students to develop as agentive, self-regulated writers and editors (Muñoz-Basols & Bailini, 2019; Papi et al., 2020; Yang et al., 2022).
Participants' qualitative perceptions of the received feedback support this interpretation. When explicitly asked to reflect openly about AI/instructor feedback, students shared mixed impressions. Some, especially in the Categorized section, praised the immediacy, accessibility, and exhaustiveness of AI-generated feedback, which may connect to its potential to serve as a personalizable tool (Vera, 2023). In turn, most students in the Uncategorized group favored instructor feedback, emphasizing its pedagogical value and reliability. In parallel, concerns about the accuracy of AI WCF were also shared in teacher reflections, in terms of overly advanced, inaccurate, de-contextualized, and direct feedback, to different degrees. In addition, despite some students' concerns about teacher workloads, only one of the two instructors felt true relief when letting ChatGPT handle the bulk of the course's written feedback. The other, in contrast, felt a large responsibility to monitor the AI comments and compensate for its above-mentioned shortcomings, questioning some previous findings about its potential (Guo, 2024; Mizumoto & Eguchi, 2023; Yang & Li, 2024; Zou et al., 2025). Together, these findings align with voices that highlight the potential of AI as a complementary resource in language instruction, particularly when its implementation is guided and purposeful (Barrot, 2023; Lin & Crosthwaite, 2024).
As a result of this preliminary study, the writing and feedback treatment was updated and is now being implemented in the classroom for the ongoing academic term with several modifications, described in this section. Firstly, direct feedback was still present in some of ChatGPT's output, as both students and the instructor-researcher observed. The new prompt was thus updated to include a summary of what students were asked to do in the task, for context. Moreover, the indirect WCF approach was further emphasized in the prompt with the following language: "Do NOT correct the mistakes yourself and please do NOT give me the correct form directly, but instead list the mistakes out for me and specify based on the types of errors [described above]". In addition, verbal instructions were added during the first writing task (WT), with a warning regarding the potential for ChatGPT to provide inaccurate or irrelevant feedback, to increase students' monitoring and critical skills without making them too distrustful. A model was also added, showing a past student's complete process of submitting each step of a WT. Finally, given the difficulty of completing the error log, as reflected in students' quantitative ratings and teachers' reflections, that piece was deleted from the templates and the grade and replaced by a written discussion towards the end of the quarter, looking back at patterns and reflecting in English on the entire writing/feedback process.
All in all, here are some proactive recommendations for the pedagogical implementation of AI tools for L2 writing feedback at beginner levels:
• Choose the AI tool, its timing and format of implementation based on the specific learning goals of the corresponding writing tasks.
• Reflect on how these tasks will be assessed, clarifying to students your rationale and your expectations regarding (dis)honest academic uses of the tool, and providing a rubric detailing how they will be graded.
• If possible, emphasize L2 writing as a multi-step process where mistakes are opportunities for learning, rather than as a one-off product.
• Design your AI prompt to be specific and clear. Try to provide it with the context of the writing task.
• Test out your AI prompt for the desired tool several times, mimicking the conditions in which students would use it.
• Think through the logistics of implementing the tools, including timing, potential technical difficulties and learning curves.
• Considering the variability in AI output, offer support to students, synchronously if possible, so that they do not feel alone throughout the process or perceive the tool as a replacement for you as a teacher. In addition, try to encourage opportunities for them to reflect on said output and to use it critically.
• Keep in mind the ethical, environmental, and privacy implications of the AI tool, and offer alternatives that mitigate them as much as possible.
Data from this preliminary study offer meaningful insights into learners’ preferences, levels of trust, and perceived usefulness of both human and AI-mediated feedback. In addition to quantitative trends showing a generally positive reception of both teacher and AI feedback, the qualitative data suggest some polarization between participants in the two AI groups, with varying balances among students who fully welcomed the AI input, those who saw it only as a complement to needed teacher feedback, and others who explicitly stated a preference for human WCF. Students in the Categorized AI Group generally recognized the practical advantages of AI feedback, particularly its speed, accessibility and, to a lesser extent, clarity. However, many participants in the Uncategorized AI Group expressed a clear preference for instructor feedback, citing greater trust in human input and the pedagogical value of more personalized, context-aware guidance. A similar pattern was observed for writing anxiety: the Control and Categorized AI Groups showed larger reductions, while the Uncategorized AI Group’s anxiety also decreased, but trailed clearly behind.
This initial exploration of integrating AI tools into the beginner-level Spanish curriculum, while still exploratory, reveals potential for further research and pedagogical innovation. The findings highlight the importance of a carefully designed prompt, as results suggest that more structured feedback and error categorization lead to a greater reduction in writing anxiety and a higher perceived quality of feedback. Conversely, prompts that exerted less control and produced less clarity were associated with a smaller decrease in writing anxiety and with increased skepticism toward AI as a reliable source of feedback.
The study’s limitations relate to the small sample sizes, especially the discrepancies between sections and the added attrition in some open-ended questions, which prevent the results from being generalized to other populations. In addition, despite seven rounds of updates, the prompt engineering still produced variability and undesired direct corrections when students interacted with ChatGPT, resulting in potential inconsistencies that speak to the uncertainty of studying “live” tools that cannot be fully controlled for. Future, scaled-up research should therefore collect students’ written samples to assess the accuracy and specificity of the feedback they actually received and to compare teacher and AI feedback directly. In addition, data that specifically measure AI-related trust would prove useful.
Ana Ruiz Alonso-Bartol: Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Visualization, Writing – original draft, review & editing.
Erik Garabaya-Casado: Conceptualization, Formal analysis, Methodology, Visualization, Writing – original draft, review & editing.
Claudia Sánchez-Gutiérrez: Conceptualization, Methodology, Writing – original draft, review & editing.
ATHANASSOPOULOS, Stavros, MANOLI, Polyxeni, GOUVI, Maria, LAVIDAS, Konstantinos, & KOMIS, Vassilis (2023). The use of ChatGPT as a learning tool to improve foreign language writing in a multilingual and multicultural classroom. Advances in Mobile Learning Educational Research, 3(2), 818-824. https://doi.org/10.25082/AMLER.2023.02.009
BARROT, Jessie S. (2023). Using ChatGPT for second language writing: Pitfalls and potentials. Assessing Writing, 57, 100745. https://doi.org/10.1016/j.asw.2023.100745
BITCHENER, John (2021). Written Corrective Feedback. In H. Nassaji & E. Kartchava (Eds.), The Cambridge Handbook of Corrective Feedback in Second Language Learning and Teaching (pp. 207–225). Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108589789.011
BITCHENER, John, & FERRIS, Dana R. (2012). Written corrective feedback in second language acquisition and writing. New York, NY: Routledge.
BONILLA LÓPEZ, Marisela, VAN STEENDAM, Elke, SPEELMAN, Dirk, & BUYSE, Kris (2018). The differential effects of comprehensive feedback forms in the second language writing class. Language Learning, 68(3), 813-850. https://doi.org/10.1111/lang.12295
BOWLES, Melissa A., & GASTAÑAGA, Kacie (2022). Heritage, second and third language learner processing of written corrective feedback: Evidence from think-alouds. Studies in Second Language Learning and Teaching, 12(4), 675-696. https://doi.org/10.14746/ssllt.2022.12.4.7
CHEN, Chi-Fen Emily, & CHENG, Wei-Yuan Eugene (2008). Beyond the Design of Automated Writing Evaluation: Pedagogical Practices and Perceived Learning Effectiveness in EFL Writing Classes. Language Learning & Technology, 12(2), 94-112. http://llt.msu.edu/vol12num2/chencheng/
CHENG, Xiaolong, & LIU, Yan (2022). Student engagement with teacher written feedback: Insights from low-proficiency and high-proficiency L2 learners. System, 109, 102880. https://doi.org/10.1016/j.system.2022.102880
CHENG, Xiaolong, & ZHANG, Lawrence Jun (2021). Sustaining University English as a Foreign Language Learners’ Writing Performance through Provision of Comprehensive Written Corrective Feedback. Sustainability, 13, 8192. https://doi.org/10.3390/su13158192
CHENG, Yuh-show (2004). A measure of second language writing anxiety: Scale development and preliminary validation. Journal of Second Language Writing, 13(4), 313-335. https://doi.org/10.1016/j.jslw.2004.07.001
CHONG, Sin Wang (2019). A systematic review of written corrective feedback research in ESL/EFL contexts. Language Education and Assessment, 2(2), 57-69. https://doi.org/10.29140/lea.v2n2.138
CRESWELL, John & CRESWELL, J. David (2017). Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. Sage.
DEKEYSER, Robert (2007). Skill acquisition theory. In B. VanPatten & J. Williams (Eds.), Theories in second language acquisition (pp. 97–113). Mahwah, NJ: Lawrence Erlbaum.
ELLIS, Rod (2009). Corrective feedback and teacher development. L2 Journal, 1(1), 3–18. https://doi.org/10.5070/l2.v1i1.9054
ESCALANTE, Juan, PACK, Austin, & BARRETT, Alex (2023). AI-generated feedback on writing: Insights into efficacy and ENL student preference. International Journal of Educational Technology in Higher Education, 20(1), 57. https://doi.org/10.1186/s41239-023-00425-2
EVANS, Norman W., HARTSHORN, K. James, MCCOLLUM, Robb M., & WOLFERSBERGER, Mark (2010). Contextualizing corrective feedback in second language writing pedagogy. Language Teaching Research, 14(4), 445-463. https://doi.org/10.1177/1362168810375367
FERRIS, Dana R. (2010). Second language writing research and written corrective feedback in SLA: Intersections and practical applications. Studies in Second Language Acquisition, 32(2), 181–201. https://doi.org/10.1017/S0272263109990490
FERRIS, Dana R. (2012). Written corrective feedback in second language acquisition and writing studies. Language Teaching, 45, 446–459.
FERRIS, Dana R. & ROBERTS, Barrie (2001). Error feedback in L2 writing classes: How explicit does it need to be? Journal of Second Language Writing, 10(3), 161-184. https://doi.org/10.1016/S1060-3743(01)00039-X
FERRIS, Dana R., & KURZER, Kendon (2019). Does Error Feedback Help L2 Writers?: Latest Evidence on the Efficacy of Written Corrective Feedback. In K. Hyland & F. Hyland (Eds.), Feedback in Second Language Writing: Contexts and Issues. Cambridge: Cambridge University Press, 106-124. https://doi.org/10.1017/9781108635547.008
FOKIDES, Emmanuel, & PERISTERAKI, Eirini (2025). Comparing ChatGPT's correction and feedback comments with that of educators in the context of primary students' short essays written in English and Greek. Education and Information Technologies, 30(2), 2577-2621. https://doi.org/10.1007/s10639-024-12912-8
FREAR, David, & CHIU, Yi-hui (2015). The effect of focused and unfocused indirect written corrective feedback on EFL learners’ accuracy in new pieces of writing. System, 53, 24-34.
GODWIN-JONES, Robert (2025). Technology integration for less commonly taught languages: AI and pedagogical translanguaging. Language Learning & Technology, 29(2), 11–34. https://hdl.handle.net/10125/73609
GUO, Xiaoshuang (2024). Facilitator or thinking inhibitor: understanding the role of ChatGPT-generated written corrective feedback in language learning. Interactive Learning Environments, 1–19. https://doi.org/10.1080/10494820.2024.2445177
HAN, Jining, & LI, Mimi (2024). Exploring ChatGPT-supported teacher feedback in the EFL context. System, 126, 103502. https://doi.org/10.1016/j.system.2024.103502
HARTSHORN, K. James, & EVANS, Norman W. (2012). The differential effects of comprehensive corrective feedback on L2 writing accuracy. Journal of Linguistics and Language Teaching, 3, 16–46. https://sites.google.com/site/linguisticsandlanguageteaching/home-1/volume-3-2012-issue-2/volume-3-2012-issue-2---article-hartshorn-evans
HARTSHORN, K. James, EVANS, Norman W., MERRILL, Paul F., SUDWEEKS, Richard R., STRONG-KRAUSE, Diane, & ANDERSON, Neil J. (2010). Effects of dynamic corrective feedback on ESL writing accuracy. TESOL Quarterly, 44, 84-108. https://doi.org/10.5054/tq.2010.213781
HYLAND, Ken, & HYLAND, Fiona (2006). Feedback on second language students' writing. Language Teaching, 39(2), 83-101. https://doi.org/10.1017/S0261444806003399
ISEMONGER, Ian (2023). Generative Language Models in Education: Foreign Language Learning and the Teacher as Prompt Engineer. TEFL Praxis Journal, 2, 3–17. https://doi.org/10.5281/zenodo.10402411
JACOBSEN, Lucas Jasper, & WEBER, Kira Elena (2025). The Promises and Pitfalls of Large Language Models as Feedback Providers: A Study of Prompt Engineering and the Quality of AI-Driven Feedback. AI, 6(2), 35. https://doi.org/10.3390/ai6020035
KANG, EunYoung, & HAN, Zhaohong (2015). The Efficacy of Written Corrective Feedback in Improving L2 Written Accuracy: A Meta-Analysis. The Modern Language Journal, 99(1), 1–18. https://doi.org/10.1111/modl.12189
KOHNKE, Lucas, MOORHOUSE, Benjamin Luke, & ZOU, Di (2023). ChatGPT for language teaching and learning. RELC Journal, 54(2), 537-550. https://doi.org/10.1177/00336882231162868
LEE, Icy (2020). Utility of focused/comprehensive written corrective feedback research for authentic L2 writing classrooms. Journal of Second Language Writing, 49, 1–7. https://doi.org/10.1016/j.jslw.2020.100734
LEE, Joseph J., & VAHABI, Farzaneh (2018). Second Language Teachers’ Written Response Practices: An In-House Inquiry and Response. Journal of Response to Writing, 4(1), 34-69. https://scholarsarchive.byu.edu/journalrw/vol4/iss1/3
LEE, Yoo-Jean (2024). Can my writing be polished further? When ChatGPT meets human touch. ELT Journal, 78(4), 401-413. https://doi.org/10.1093/elt/ccae039
LIN, Shiming, & CROSTHWAITE, Peter (2024). The grass is not always greener: Teacher vs. GPT-assisted written corrective feedback. System, 127, 103529. https://doi.org/10.1016/j.system.2024.103529
LIU, Pengfei, YUAN, Weizhe, FU, Jinlan, JIANG, Zhengbao, HAYASHI, Hiroaki, & NEUBIG, Graham (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9), 1-35. https://doi.org/10.1145/3560815
LO, Leo S. (2023). The CLEAR path: A framework for enhancing information literacy through prompt engineering. The Journal of Academic Librarianship, 49(4), 102720. https://doi.org/10.1016/j.acalib.2023.102720
MANCHÓN, Rosa M. & LEOW, Ronald P. (2020). Investigating the language learning potential of L2 writing: Methodological considerations for future research agendas. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 336-355). John Benjamins. https://doi.org/10.1075/lllt.56.14man
MAO, Shiman Shae & CROSTHWAITE, Peter (2019). Investigating written corrective feedback: (Mis)alignment of teachers’ beliefs and practice. Journal of Second Language Writing, 45, 46–60. https://doi.org/10.1016/j.jslw.2019.05.004
MAO, Zhicheng, & LEE, Icy (2020). Feedback scope in written corrective feedback: Analysis of empirical research in L2 contexts. Assessing Writing, 45, 100469. https://doi.org/10.1016/j.asw.2020.100469
MAYRING, Philipp (2000). Qualitative Content Analysis. Forum: Qualitative Social Research, 1(2). https://doi.org/10.17169/fqs-1.2.1089
MIZUMOTO, Atsushi, & EGUCHI, Masaki (2023). Exploring the potential of using an AI language model for automated essay scoring. Research Methods in Applied Linguistics, 2(2), 100050. https://doi.org/10.1016/j.rmal.2023.100050
MUÑOZ-BASOLS, Javier & BAILINI, Sonia (2019). Análisis y corrección de errores. In J. Muñoz-Basols, E. Gironzetti & M. Lacorte (Eds.), The Routledge Handbook of Spanish Language Teaching: Metodologías, contextos y recursos para la enseñanza del español L2 (pp. 94-108), Routledge. https://doi.org/10.4324/9781315646169
NAAMATI-SCHNEIDER, Lior, & ALT, Dorit (2024). Beyond digital literacy: The era of AI-powered assistants and evolving user skills. Education and Information Technologies, 29(16), 21263-21293. https://doi.org/10.1007/s10639-024-12694-z
OPENAI. (2023). ChatGPT (3.5) [Large language model]. https://chat.openai.com/chat
PAPI, Mostafa, BONDARENKO, Anna, WAWIRE, Brenda, JIANG, Chen, & ZHOU, Shiyao (2020). Feedback-seeking behaviour in second language writing: Motivational mechanisms. Reading and Writing, 33, 485–505. https://doi.org/10.1007/s11145-019-09971-6
QUALTRICS XM (2023). Provo, UT, USA. https://www.qualtrics.com
SÁNCHEZ-GUTIÉRREZ, Claudia Helena, LLORENTE BRAVO, Marta, GUERRA, Kathleen, AGUINAGA ECHEVERRÍA, Silvia (2022). It Works in Theory and in Practice: A Practical Guide for Introducing TBLT In a Beginner Spanish Program. L2 Journal, 14(3), 1-18. https://doi.org/10.5070/L214354581
SATARIANO, Adam, & KANG, Cecilia. (2023). How nations are losing a global race to tackle A.I.’s harms. The New York Times. Retrieved from https://www.nytimes.com/2023/12/06/technology/ai-regulation-policies.html?searchResultPosition=3
STEISS, Jacob, TATE, Tamara, GRAHAM, Steve, CRUZ, Jazmin, HEBERT, Michael, WANG, Jiali, MOON, Youngsun, TSENG, Waverly, WARSCHAUER, Mark, & OLSON, Carol Booth (2024). Comparing the quality of human and ChatGPT feedback of students’ writing. Learning and Instruction, 91, 101894. https://doi.org/10.1016/j.learninstruc.2024.101894
STOKEL-WALKER, Chris (2022). AI bot ChatGPT writes smart essays—Should professors worry? Nature. https://doi.org/10.1038/d41586-022-04397-7
TSENG, Waverly, & WARSCHAUER, Mark (2023). AI-writing tools in education: If you can’t beat them, join them. Journal of China Computer-Assisted Language Learning, 3(2), 258-262. https://doi.org/10.1515/jccall-2023-0008
UNESCO (2024). Guía para el uso de la IA generativa en educación e investigación. https://unesdoc.unesco.org/ark:/48223/pf0000389227
VERA, Fernando (2023). Enhancing English language learning in undergraduate students using ChatGPT: A quasi-experimental study. Libro de Actas del Congreso Internacional de Aprendizaje Activo, 18–21. https://apolo.unab.edu.co/ws/portalfiles/portal/27240222/Libro-de-actas-CIAA-2023.pdf
YAN, Da (2023). Impact of ChatGPT on learners in a L2 writing practicum: An exploratory investigation. Education and Information Technologies, 28(11), 13943-13967. https://doi.org/10.1007/s10639-023-11742-4
YANG, Christine, & CHEN, Howard Hao-Jan (2025). ChatGPT and L2 Chinese writing: evaluating the impact of model version and prompt language on automated corrective feedback. Computer Assisted Language Learning, 1–29. https://doi.org/10.1080/09588221.2025.2453205
YANG, Lu, & LI, Rui (2024). ChatGPT for L2 learning: Current status and implications. System, 124, 103351. https://doi.org/10.1016/j.system.2024.103351
YANG, L. F., LIU, Y., & XU, Z. (2022). Examining the effects of self-regulated learning-based teacher feedback on English-as-a-foreign-language learners’ self-regulated writing strategies and writing performance. Frontiers in Psychology, 13. https://doi.org/10.3389/fpsyg.2022.1027266
YOON, Su-Youn, MISZOGLAD, Eva, & PIERCE, Lisa R. (2023). Evaluation of ChatGPT feedback on ELL writers' coherence and cohesion. arXiv. https://doi.org/10.48550/arXiv.2310.06505
YU, Shulin, ZHENG, Yao, JIANG, Lianjiang, LIU, Chunghong, & XU, Yiqin (2021). “I even feel annoyed and angry”: Teacher emotional experiences in giving feedback on student writing. Assessing Writing, 48, 100528. https://doi.org/10.1016/j.asw.2021.100528
ZEEVY-SOLOVEY, Orly (2024). Comparing peer, ChatGPT, and teacher corrective feedback in EFL writing: Students’ perceptions and preferences. Technology in Language Teaching & Learning, 6(3), 1482. https://doi.org/10.29140/tltl.v6n3.1482
ZOU, Shaoyan, GUO, Kai, WANG, Jun, & LIU, Yu (2025). Investigating students’ uptake of teacher- and ChatGPT-generated feedback in EFL writing: a comparison study. Computer Assisted Language Learning, 1–30. https://doi.org/10.1080/09588221.2024.2447279
**PLEASE MAKE YOUR OWN COPY TO YOUR INSTITUTIONAL GOOGLE DRIVE**
Writing task 2.A - First draft
Please paste here your first draft
Writing task 2.B - Get AI feedback:
Please paste here the entire interaction with ChatGPT where you got feedback on your first draft:
Writing task 2.C - Self-revision of draft:
Before reviewing your draft, log how many errors of each type were included in ChatGPT's feedback, to the best of your abilities:
|  | Writing Task 2 |
| --- | --- |
| Vocabulary errors |  |
| Verb conjugation errors |  |
| Other grammar errors (gender and number…) |  |
| Orthography errors (spelling, accents, punctuation…) |  |
| Connection errors (word/connector missing, word order) |  |
| Needs large rephrasing |  |
| Unnecessary words |  |
Please rewrite your self-edited revision here, marking any changes you make from your first draft WITH A DIFFERENT COLOR, BOLD OR UNDERLINE:
Writing Tasks Rubric (1)
| Criteria | Pts |
| --- | --- |
| Content: Student addresses all required points in depth and provides enough relevant details. | 1 pt |
| Grammar: Student applies grammatical structures from the course. You will lose points if you don't use verb tenses and structures from the modules covered so far. You will also lose points if you use outside resources that result in highly advanced structures that are beyond this course's level and purposes. | 1.5 pts |
| Vocabulary: Student applies varied vocabulary from the course and uses it effectively. You will lose points if you don't use vocabulary and expressions from the modules covered so far. You will also lose points if you use outside resources that result in highly advanced expressions that are beyond this course's level and purposes. | 1.5 pts |
| Comprehensibility & Cohesion: Student makes an effort to create full sentences and to connect ideas clearly, comprehensibly and cohesively. | 1 pt |

Total Points: 5
Self-editing Writing Task Rubric (1)
| Criteria | Pts |
| --- | --- |
| Covered all mistakes | 2 pts |
| Followed template & marked changes in a different color | 1 pt |
| Applied feedback to truly improve self-edits | 2 pts |

Total Points: 5
Categorized AI group prompt:
I am learning Beginner Spanish, and I am going to write a few sentences I have created to practice my writing skills. I need you to provide detailed feedback on which mistakes affect comprehension. Give your feedback sentence by sentence and only for mistakes that relate to a beginner level, not optional improvements. Please refrain from correcting the mistakes directly but instead point them out and specify the types of errors based on the following categories. For each sentence, give me bullet points for the following types of mistakes: 1) vocabulary (meaning, word order, English words), 2) verb conjugations only for the present tense, 3) gender and number agreement, 4) spelling (orthography, punctuation and accents), 5) connection between ideas and words, 6) words that shouldn't be there and must be taken out. No need to point out what is correct and only mention a category if there are mistakes in it within that sentence.
Uncategorized AI group prompt:
Please read the following instruction step by step: I am learning Beginner Spanish, and I am going to write a few sentences I have created to practice my writing skills. I need you to provide feedback on which mistakes affect comprehension. Give your feedback sentence by sentence and only for mistakes that relate to a beginner level, not optional improvements. Please refrain from correcting the mistakes directly, but instead point them out and specify the types of errors based on the following categories. In bullet points for each sentence, please identify grammar, vocabulary (meaning), spelling and sentence-structure mistakes. Do not correct the mistakes yourself or give the correct form directly, but instead point the mistakes out and specify the types of errors based on the above-mentioned categories. No need to point out what is correct and only mention a category if there are mistakes in it within that sentence.
Updated prompt (current implementation):
Please read the following instruction step by step: I am learning Beginner Spanish, I have created a paragraph to practice my writing skills. I need you to provide feedback on which mistakes affect comprehension. Please identify mistakes related to a beginner level, not optional improvements. For each sentence, give me bullet points for the following types of mistakes: 1) vocabulary (meaning, word order, English words), 2) verb conjugations only for the present tense, 3) gender and number agreement, 4) spelling (orthography, punctuation and accents), 5) connection between ideas and words, 6) sentence structure issues, 7) words that shouldn't be there and must be taken out. Do NOT correct the mistakes yourself and please do NOT give me the correct form directly, but instead list the mistakes out for me and specify based on the types of errors described above. No need to point out what is correct and only mention a category if there are mistakes in it within that sentence. For context, here is the summarized prompt that I was responding to: [“INSERT YOUR PROMPT HERE”].
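Although students in this study interacted with ChatGPT-3.5 through its standard chat interface, instructors who want to stress-test a prompt revision across many sample drafts, as recommended above, could script the process. The sketch below is a minimal, hypothetical illustration of that workflow using OpenAI's Python API; it is not part of the study, and the model name, task summary, sample drafts, and helper function are illustrative assumptions. The prompt is abridged from the updated version above.

```python
# Illustrative sketch only (not part of the study): batch-testing the updated
# feedback prompt against sample drafts via OpenAI's chat API. Assumes the
# `openai` Python package (v1+) and an OPENAI_API_KEY environment variable;
# the task summary and sample drafts below are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical task summary inserted where the template reads
# ["INSERT YOUR PROMPT HERE"].
TASK_SUMMARY = "Describe your family and your daily routine in the present tense."

# Abridged version of the updated prompt above; [...] stands for the full
# error-category list given in the appendix.
FEEDBACK_PROMPT = (
    "Please read the following instruction step by step: I am learning "
    "Beginner Spanish, I have created a paragraph to practice my writing "
    "skills. [...] Do NOT correct the mistakes yourself and please do NOT "
    "give me the correct form directly, but instead list the mistakes out "
    "for me and specify based on the types of errors described above. For "
    "context, here is the summarized prompt that I was responding to: "
    f'["{TASK_SUMMARY}"]'
)

# Hypothetical beginner drafts with seeded errors (agreement, spelling).
SAMPLE_DRAFTS = [
    "Mi familia es muy grande. Tengo dos hermanos y un hermana pequeño.",
    "Todos los dias yo despierto a las siete y como el desayuno con mi madre.",
]

def get_feedback(draft: str) -> str:
    """Send the prompt plus one draft and return the model's feedback text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"{FEEDBACK_PROMPT}\n\n{draft}"}],
        temperature=0,  # lower run-to-run variability while testing
    )
    return response.choices[0].message.content

for i, draft in enumerate(SAMPLE_DRAFTS, start=1):
    print(f"--- Draft {i} ---\n{get_feedback(draft)}\n")
```

Running such a loop over a handful of representative drafts, and scanning the output for undesired direct corrections, approximates the kind of iterative testing that produced the seven rounds of prompt revisions described in this study.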
|  | Control |  | Categorized AI |  | Uncategorized AI |  |
| --- | --- | --- | --- | --- | --- | --- |
|  | Count² | % | Count | % | Count | % |
| Homework in general | 1 | 5% | 3 | 11% | 4 | 27% |
| Playposit | 2 | 10% | 4 | 14% | 2 | 13% |
| iSpraak | 4 | 19% | 1 | 4% | 1 | 7% |
| Quizzes | 1 | 5% | 3 | 11% | 2 | 13% |
| Studying on my own | 1 | 5% | 2 | 7% | 0 | 0% |
| Writing Tasks (AI not mentioned) | 1 | 5% | 1 | 4% | 2 | 13% |
| AI specifically [NA Control group] | 0 | 0% | 0 | 0% | 0 | 0% |
| Speaking in class / In-class activities | 9 | 43% | 13 | 46% | 4 | 27% |
| Professor | 1 | 5% | 1 | 4% | 0 | 0% |
| Other | 1 | 5% | 0 | 0% | 0 | 0% |
| TOTAL COUNTS (MOST HELPFUL ACTIVITIES) | 21 |  | 28 |  | 15 |  |
_______________________________
¹ There was some attrition in these open-ended questions, as some participants left them blank or wrote NA.
² The number of responses per category may exceed the number of participants, as each participant could mention several activities in this open-ended comment.