Note on FLN assessments
Updated: Dec 19, 2022

Introduction
The learning crisis in India’s education system is not new, but its measurement through student assessments has become especially pertinent against the backdrop of NEP 2020 implementation and COVID-19-related learning losses. Within this, the focus has been on measuring and benchmarking Foundational Literacy and Numeracy (FLN) skills among students. Two notable national reports were recently made public: the National Achievement Survey (NAS 2021) and the Foundational Learning Study (FLS 2022). Alongside these, LFE has developed an FLN tool in the Maharashtra context and gathers data from five districts with the respective district administrations. These three tools, together with the ASER reports, provide a snapshot of the current status of students’ learning, albeit each in a slightly different way.
In this piece, we bring to the fore the differences and similarities between these four tools, the insights each throws up, and the key considerations that need to guide future interventions on the measurement of student learning.
Comparing four tools
On the face of it, all four exercises measure FLN, but a closer look reveals fundamental differences. While the NAS, FLS, and LFE assessments are administered in school, only ASER collects data at the household level. ASER administers the same tool across grades to test reading and arithmetic skills, whereas NAS, FLS, and LFE use different tools for different grades to test grade-appropriate competencies. The most recent ASER data for Maharashtra is from 2018. Each of the four tools defines FLN similarly, with overlapping categories, but organizes the components differently and uses different scoring patterns. As a result, their raw scores cannot be directly compared. Their differences are explored in depth here:

Learning from the data
A direct comparison of the four raw datasets would be technically incorrect, as they differ in competency frameworks, data collection timelines, scales, and sampling strategies, to name a few. Such a comparison would require more rigorous quantitative methods that can control for these differences and establish reliability and validity in each case (as done here).
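To make the scale problem concrete, here is a minimal Python sketch with entirely hypothetical numbers: two surveys reporting scores on different scales cannot be compared via raw means, and even the simplest within-survey standardization removes only the scale difference, not the differences in frameworks, timelines, or sampling. Real cross-survey linking would need psychometric equating (for instance, IRT-based methods), not this naive approach.

```python
# Illustrative sketch only; all figures below are hypothetical.
from statistics import mean, stdev

# Hypothetical grade 3 literacy scores reported on different scales
survey_a = [48, 52, 55, 60, 63]   # e.g. a percent-correct style score
survey_b = [310, 325, 340, 355]   # e.g. a 0-500 scaled-score style metric

def z_scores(scores):
    """Standardize scores within one survey (mean 0, SD 1)."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

# Comparing raw means across surveys is meaningless: the scales differ.
print(mean(survey_a), mean(survey_b))   # 55.6 vs 332.5 -- not comparable

# Even after standardizing, the surveys still differ in competency
# frameworks, timelines, and sampling; z-scores remove only the scale gap.
print(z_scores(survey_a))
print(z_scores(survey_b))
```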
However, for the lay administrator and education sector enthusiast, we looked at the data at face value to understand the broad trends each source reveals in its own right.
Trends in averages
For the purposes of this note, we take the latest available data for Maharashtra. Given the differences in their scope, we focus only on data for grades 3 and 5. The data from the four sources are presented visually here. When comparing student literacy and numeracy data, the four tools show different patterns for both grade 3 and grade 5, even at the level of averages.

What does this data tell us?
It is worth asking what each of these datasets reveals for any data-based intervention. While broad trends are useful in informing opinion and debate, any shift in student learning will require targeted interventions: supporting teachers, providing additional resources to schools, and understanding the impact these programs have. All of these necessarily require granular data and detailed analysis. Such data can empower stakeholders at all levels, from teachers and parents at the school level to district and state administrators.
Looking at usefulness in such a scenario, we find that each of these reports provides granular detail at a different level:
While NAS provides data by competency for each grade, it reports only average scores and does not show the distribution of students across performance levels. This gives an average status but does not show where specifically students and teachers will need support (a toy example after this list illustrates how identical averages can hide very different needs).
Meanwhile, although ASER does show the percentage of students at a specific skill level, it tests only a limited set of competencies.
On the other hand, the FLS tests only grade 3 students, but it reports detailed competencies along with the percentage of students in each performance category.
While the LFE data does provide granular detail, it misses out on the benchmarking that helps determine performance levels.
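As the toy illustration below shows (all numbers hypothetical), two groups with identical average scores can have very different support needs; only a distribution view, of the kind ASER-style level-wise reporting offers, reveals the difference that averages alone conceal.

```python
# Toy illustration with hypothetical numbers: identical means,
# very different underlying needs.
from statistics import mean

group_a = [50, 50, 50, 50]          # everyone near the middle
group_b = [10, 20, 80, 90]          # half the students far behind

print(mean(group_a), mean(group_b))  # both 50 -- averages look the same

# A distribution view (share of students below a level) immediately
# shows that group_b has students needing targeted support.
below_40_b = sum(1 for x in group_b if x < 40) / len(group_b)
print(f"{below_40_b:.0%} of group_b scores fall below 40")  # 50%
```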
On the positive side, ASER also collects a wealth of information at the school, village, and household levels that adds depth to its insights: for instance, enrollment, access to resources, engagement with materials, tuition enrollment, and, since COVID, the status of school re-opening and access to online learning, among others. The Foundational Learning Study (FLS 2022, NCERT) provides insights against international benchmarks and is the only study to have been conducted in 20 different languages. The LFE assessment identifies competencies specific to the grade level and one level below, indicating student performance both at grade level and one grade lower.

What does this mean as we move forward?
What becomes clear is that each of these tools measures different indicators and can yield contradictory results when read together. Here, based on discussions with experts and literature reviews, we present some aspects that need to be considered for evidence-based decision-making:
No tool is comprehensive in its coverage of grades. This can be attributed to the high cost of student assessment: testing every grade with a truly representative sample is expensive and technically challenging in many respects.
Further, the competencies tested are limited to a few FLN aspects and do not cover all the student competencies identified under NIPUN. Many competencies will require regular testing at the classroom level itself; others, relating to students’ socio-emotional and well-being needs, are yet to see standardized testing tools at scale.
Furthermore, the reliability of NAS and ASER has been called into question at both the state and district levels. For instance, one study found that NAS 2017 averages are likely to be artificially high and might contain little information about a state’s relative performance. The same study found that “ASER data is most reliable for comparing state averages but less reliable for looking at changes in state averages, district averages, or changes in district averages”. Creating reliable tools for large-scale testing is a long-term, continuous process. While the quality of the data can be questioned, these central exercises are necessary to provide a national repository that other organisations may not have the resources to build. Efforts have to be directed to ensure that data quality is maintained.
Micro contexts at the school and block level can also produce differences in student results that have significant implications for learning. For instance, from the exercise we conducted at LFE, we learned that students in tribal development schools struggled far more in language than students in other local body schools. Intuitively, this appears to be the result of language barriers and challenges with the medium of instruction that tribal students face. While national databases are useful in identifying broader trends, these granular student needs can only be captured with comprehensive testing.
Last, depending on the objective, appropriate choices have to be made about which tools and datasets to use. While a single report is useful for establishing a baseline, only when viewed against a midline or endline will the data truly reflect change and aid future planning.