Abstract
<jats:p>The study aims to develop and methodologically justify a comprehensive psychometric-didactic model for the validation and systematic integration of micro-assessment based on Large Language Models into the preparation of international students for the Ukrainian Language Proficiency Test (ULPT). This research specifically addresses the "psychometric gap" between traditional linear testing environments and the rapid, large-scale content generation capabilities of modern generative artificial intelligence. The research employs an integrated "DBR-ABV Loop" model, which synthesizes Design-Based Research for iterative task improvement with Argument-Based Validation for continuous evidence collection regarding assessment reliability. The methodological framework follows a rigorous four-stage cycle: Domain Definition through prompt design based on B1/B2 descriptors, Production of micro-tasks and distractors, empirical Testing with student interaction logs, and Reflection to update design principles. Implementation of the proposed model demonstrates that LLM-generated tasks, when supported by adaptive feedback and dynamic distractor generation based on interference errors, significantly enhance diagnostic accuracy and reduce random guessing. The study reveals that the system's ability to adjust linguistic complexity in real time maintains an optimal level of cognitive load for each learner. Instant, explanatory feedback operates within the student's Zone of Proximal Development, providing the scaffolding needed to foster linguistic autonomy and reduce exam-related anxiety. The DBR-ABV Loop effectively bridges the "psychometric gap" between high-speed AI content generation and the requirement for scientific validity in language testing. The transition from static testing to adaptive micro-assessment transforms ULPT preparation from a stressful control mechanism into a supportive, personalized learning process.
This model provides a solid foundation for personalized educational trajectories and creates new perspectives for scaling the system to assess productive speech skills such as writing and speaking.</jats:p>