Fairness in Dynamic Co-orchestrated Collaborative Learning AI Systems

Andey Ng
7 min read · Oct 7, 2020



This series documents my workflow and thought process — from findings in the user studies to their impact on my own personal life — while designing teacher dashboards and prototypes for a student-pairing algorithm in an AI-driven collaborative tutoring platform. The research explores fairness and bias in educational technologies through an interdisciplinary lens: design, technology, social, and ethical.


Hi, my name is Andey Ng, and this summer I was a research assistant at Carnegie Mellon University in the Co-Augmentation, Learning, & AI Lab, advised by postdoc Vanessa Echeverria and Professor Vincent Aleven.

This summer, I designed user studies for APTA (Adaptive Peer Tutoring Assistant). APTA gives students, teachers, and AI systems hybrid control over transitions, and it adapts (or can be adapted) to different classrooms, teachers, and students' prior knowledge. I worked on the pairing system's algorithms to find the optimal balance of the hybrid model between the AI, teachers, and students.


The concept of co-orchestration is co-developing a tool between the users (teachers and students) and the developers. It prevents a third party from imposing tools and ideas that ultimately influence others' lives without their say, an idea that stems from the accessibility principle of "Nothing About Us Without Us."

The dynamic aspect allows the program to be uniquely tailored to each teacher's needs and wants, letting the user choose the level of AI involvement in the pairing system.

These ideas of co-orchestration and dynamic systems illustrate the importance of understanding that every user and every person is different. Co-designing tools between developers and users creates satisfied individuals who will actually adhere to and use these tools.

These concepts have helped me implement better solutions in my own personal life. I realized that co-designing house rules and expectations with my parents, where we compromise on decisions, created a much happier environment in which I was more than happy to adhere to their expectations. And by understanding that my siblings and I are all different, I realized that creating dynamic, flexible rules between ourselves and our parents would let the algorithm (or in this case, our household expectations) appease everyone's wants.

Work Flow

1. Take initial data from the pilot studies to derive conclusions

By developing affinity diagrams and clustering similar findings together, I found that students were most eager to engage in peer tutoring when the tutees (the students receiving help) had competent tutors. At first I was hesitant about designing this dashboard and pairing algorithm: how can you measure competency?

Affinity diagram of initial pilot study

However, I learned that by compartmentalizing the pilot studies through affinity diagramming, I could quantify the qualitative data and gauge the level of interest in, and weighting of, different features.

Affinity Diagram of Pairing algorithm

Because most of the students in the pilot studies preferred to be paired with students of similar competency, I realized that an algorithm developed in an unregulated manner would create a disparity between two groups: one of proficient students and one of struggling students.
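To make this concrete, here is a minimal sketch (the scores are invented toy data, not from the study) of what unregulated similarity-based pairing does: sorting by competency and pairing neighbors ends up grouping proficient students with each other and struggling students with each other.

```python
def pair_by_similarity(scores):
    """Pair each student (by index) with the nearest-scoring peer."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # walk the ranking two at a time: similar scores end up together
    return [(ranked[i], ranked[i + 1]) for i in range(0, len(ranked) - 1, 2)]

scores = [95, 92, 88, 55, 50, 48]  # toy competency scores
print(pair_by_similarity(scores))  # [(0, 1), (2, 3), (4, 5)]
```

Every pair stays inside its own competency cluster, so help never flows from the stronger group to the weaker one.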

Relating this educational disparity to my own life, I realized that many students like me feed ourselves the narrative that "we are building technology and products that help people."

Born and raised in Silicon Valley, I realized that we, too, feed ourselves similar narratives that keep us highly motivated and mentally sane. However, throughout this research and my time living in Pittsburgh, I realized that many high-tech products help people, but not all groups of people.

Through my explorations in Pittsburgh, I saw the large socio-economic disparity between the affluent areas near CMU and the outskirts of the city. On one hand, Pittsburgh is a post-industrial city with a large population of blue-collar workers left in the dust by the loss of the steel industry. On the other, new affluent neighborhoods are emerging around Google's, Apple's, and Facebook's Pittsburgh offices, stimulating the economy in other directions. The arrival of these companies is now gentrifying the city, pushing blue-collar workers to its outskirts. The technology is helping people — a specific group of people. More precisely, it is creating a disparity in which technology helps the rich get richer and the poor get poorer.

Thus, I found the agency to direct my research toward the social implications of technology, and toward creating fair, equitable opportunities for students to mutually benefit and succeed with classroom technologies, helping to close the educational disparity.

Logic flows and mind maps to work through scenarios for the user study to address

AI systems simply enumerate the decisions in our minds, identify patterns in our behaviors, and amplify those decisions into biases. The issue with yes-or-no's, 1's and 0's, and binary questions and answers is that they create definitive decisions that become exponentially extreme. The dynamic aspect of this project leaves room for nuance rather than forcing two polarizing decisions.

These user studies were designed to understand the social implications of my algorithm design in a task as simple as pairing students together, taking an ethical, technical, and social perspective in the design process.

2. Prototyping lo-fi wireframes for the user study.

The user study was developed to vet the human biases teachers bring when they decide what a "fair" pairing among students is, and to test how much teachers trust the AI system.

Manual Pairing System User Test
  1. The first user study, illustrated above, was a single-blind test where I introduced the prototypes under the guise of reducing cognitive load, testing what teachers deemed a "fair pairing" of students without AI suggestions, thereby exploring biases in collaborative learning.
Suggestive AI Pairing System User Test

2. The second user study, illustrated above, tested how teachers felt about a suggestive AI system in which they still control whom they choose, but the AI offers suggestions. This study was intended to vet their subconscious biases, explore how teachers are influenced by AI, and understand how they feel about it.

Completely Autonomous AI Pairing User Test

3. The third user study, illustrated above, tested whether teachers felt they could trust the AI system when it had completed the pairing without the teachers' input. The teacher was allowed to modify the pairing after the AI had chosen the pairs. Additionally, the student names were anonymized to test how teachers felt when the pairings were completely objective and based on competency.

3. Conclusions and Solutions

After drawing conclusions from the user study and uncovering subconscious biases — down to what information is shown and where it lies on the screen — my participants and I co-designed ways to:

  • Mitigate opportunities for harmful bias against students
  • Develop fair pairings among students in collaborative learning
  • Implement safeguards to prevent disparity in classrooms
Pairing based on Complementary Skills

Firstly, the figure above illustrates a pairing method that the user study participants and I developed, where students are paired based on individualized subsets of complementary skills. Student 1's strengths align with Student 2's weaknesses, and vice versa, creating a space where both students can mutually benefit. Additionally, we frame the pairing as peer tutors, rather than imposing a power dynamic with the titles of tutor and tutee.
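A small Python sketch of this complementary-skills idea, assuming toy skill profiles and a simple gap-based complementarity score (the study's actual metric may differ): the bigger the gap on a skill, the more one partner can tutor the other on it.

```python
from itertools import combinations

def complementarity(a, b):
    """Sum, over skills, of how much the stronger partner can help the weaker."""
    return sum(abs(a[skill] - b[skill]) for skill in a)

def pair_complementary(students):
    """Greedy matching: repeatedly pair the two most complementary students."""
    unpaired, pairs = set(students), []
    while len(unpaired) >= 2:
        best = max(combinations(unpaired, 2),
                   key=lambda p: complementarity(students[p[0]], students[p[1]]))
        pairs.append(best)
        unpaired -= set(best)
    return pairs

students = {  # hypothetical mastery levels (0-1) per skill
    "s1": {"adding": 0.9, "subtracting": 0.2},
    "s2": {"adding": 0.3, "subtracting": 0.9},
    "s3": {"adding": 0.8, "subtracting": 0.8},
    "s4": {"adding": 0.2, "subtracting": 0.3},
}
print(pair_complementary(students))
```

Here s1 and s2 pair up because their strengths mirror each other's weaknesses, leaving s3 and s4 together.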

Pairing based on Alternating Student Proficiency

Secondly, the diagram above shows the safeguard method we co-designed for a pairing system, to ensure the knowledge disparity among students stays within a consistent range. In a list of students sorted from most to least proficient (ranked by objective skills such as adding, subtracting, etc.), the system pairs the strongest student of the upper (more proficient) half with the strongest student of the lower (less proficient) half, and continues down the ranks until the weakest student of the upper half is paired with the weakest student of the lower half. This method serves as a safety net so that the disparity in the classroom does not increase.
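As a sketch — assuming an even class size and a roster already sorted by proficiency (the names are made up) — this safeguard reduces to splitting the ranking in half and zipping the halves together by rank:

```python
def pair_alternating(ranked):
    """ranked: student names sorted most -> least proficient (even length)."""
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[half:]
    # rank i of the upper half pairs with rank i of the lower half,
    # so every pair spans the same number of ranks
    return list(zip(upper, lower))

roster = ["Ana", "Ben", "Cho", "Dee", "Eli", "Fay"]  # hypothetical names
print(pair_alternating(roster))  # [('Ana', 'Dee'), ('Ben', 'Eli'), ('Cho', 'Fay')]
```

Because every pair spans exactly half the roster's ranks, no pairing has a dramatically larger proficiency gap than any other.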

Personal Takeaways

My key takeaways from this project include:

  • Identifying and understanding what educational technologies are
  • Exploring the social implications of technology in a classroom setting
  • Understanding the importance of designing and prototyping user studies to identify loopholes in the bigger picture

Ultimately, this research has given me a new approach to my work moving forward and sparked a new academic interest that I will continue to explore: the intersection of education, technology, and advocacy, and the nascent field of fairness and accountability in tech ethics.