Category: Documenting

  • We Built an AI Auto-Grader That Outperformed Humans. Here’s What It Means for the Future of College.

    As a Graduate Teaching Assistant at Arizona State University, my baseline job description was standard: teach labs, hold office hours, and grade. But if you put a team of software and AI engineers in a room with hundreds of repetitive worksheets, quizzes, and pre-labs, they’re going to do what they do best: build an automation engine.

    My team and I set out to build an end-to-end AI Auto-Grader System integrated directly with Canvas. By the time we were done, we hadn’t just saved ourselves hours of manual labor—we discovered that the AI actually graded better and more objectively than human graders ever could.

    But getting there required bypassing massive data privacy hurdles, re-engineering data pipelines, and confronting a glaring question about the fundamental value of a modern university degree.

    The Architecture: Canvas, OpenRouter, and Excalidraw

    The core premise was straightforward: pull student submissions from the Canvas LMS API, grade them using LLMs, and push the scores back to Canvas alongside detailed feedback.

    To maximize accuracy, we didn’t just throw raw prompts at an API. We built a structured grading schema engine. The system ingested the master answer key, examples of partially correct answers, and explicit rubrics on how to allocate partial credit. We primarily relied on OpenAI LLMs for inference, with a fallback routing mechanism to OpenRouter to handle rate limits and test alternative open-source models.
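
    A minimal sketch of how the grading inputs and the fallback routing could fit together. Every name here (GradingSpec, build_prompt, RateLimited, grade_with_fallback) is hypothetical and not taken from our codebase, and the provider callables merely stand in for real OpenAI or OpenRouter clients, so no actual API is called.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class GradingSpec:
    """Hypothetical container for the material fed to the model."""
    answer_key: str
    partial_examples: Sequence[str]
    rubric: str


def build_prompt(spec: GradingSpec, submission: str) -> str:
    """Assemble one prompt from the key, partial-credit examples, and rubric."""
    examples = "\n".join(f"- {e}" for e in spec.partial_examples)
    return (
        f"ANSWER KEY:\n{spec.answer_key}\n\n"
        f"PARTIAL-CREDIT EXAMPLES:\n{examples}\n\n"
        f"RUBRIC:\n{spec.rubric}\n\n"
        f"STUDENT SUBMISSION:\n{submission}\n"
    )


class RateLimited(Exception):
    """Raised by a provider callable when it is being throttled (assumed)."""


def grade_with_fallback(
    prompt: str, providers: Sequence[Callable[[str], str]]
) -> str:
    """Try each provider in order; move on only when one is rate limited."""
    last_error = None
    for call in providers:
        try:
            return call(prompt)
        except RateLimited as err:
            last_error = err
    raise RuntimeError("all providers were rate limited") from last_error
```

    Passing the providers as an ordered list keeps the routing policy in one place: the primary model goes first and any alternative goes after it.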

    To prevent the AI from being overly punitive, we also engineered a programmatic grading balancer. The function calculated the delta between the highest achieved mark and the maximum possible mark, automatically normalizing the curve across the cohort to ensure fair evaluation.
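
    One way to read that balancer, as a rough sketch: take the gap between the maximum possible mark and the highest mark anyone achieved, then add that gap to every score. The additive shift and the function name balance_scores are assumptions for illustration; a multiplicative rescale would be an equally plausible reading.

```python
def balance_scores(scores, max_possible):
    """Shift every score up by (max_possible - highest score).

    Assumption: the curve is a flat additive shift, capped at max_possible.
    """
    if not scores:
        return []
    delta = max_possible - max(scores)
    if delta <= 0:
        # Someone already hit the ceiling, so no adjustment is applied.
        return list(scores)
    return [min(s + delta, max_possible) for s in scores]


print(balance_scores([70, 85, 92], 100))  # prints [78, 93, 100]
```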

    Bypassing IT Hurdles Without Storing Data

    Our biggest bottleneck wasn’t the AI—it was compliance. The ASU IT software approval team enforces strict quality and privacy standards. The primary directive: The software absolutely could not store student information.

    Grading a student without holding state or keeping a database of their records forced us to build an ephemeral data pipeline. Initially, we had to resort to pulling raw PDFs from Canvas, running inference in memory, pushing the grades, and immediately wiping the data context.
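
    The shape of that stateless loop can be sketched roughly as follows. The three callables (fetch, grade, push) are hypothetical stand-ins for the Canvas download, the LLM call, and the Canvas grade upload; none of them is a real Canvas API function. Note that Python cannot guarantee memory is scrubbed, so "wiping" here only means dropping references and never writing to disk or a database.

```python
from typing import Callable, Iterable, Tuple


def grade_ephemerally(
    fetch: Callable[[], Iterable[Tuple[str, bytes]]],
    grade: Callable[[bytes], Tuple[float, str]],
    push: Callable[[str, float, str], None],
) -> int:
    """Grade each submission in memory and keep nothing once it is pushed."""
    graded = 0
    for submission_id, pdf_bytes in fetch():
        score, feedback = grade(pdf_bytes)
        push(submission_id, score, feedback)
        graded += 1
        # Drop the references right away; only the running count survives.
        del pdf_bytes, score, feedback
    return graded
```

    Returning only a count means the caller never receives student data, so there is nothing for it to log or persist by accident.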

    To bypass the brittle nature of OCR on random student PDFs, we built a custom frontend. Students entered their answers directly into structured fields, which supported digital drawings via an integrated Excalidraw canvas. When a student hit submit, this data was cleanly embedded into a standardized PDF format and auto-pushed to Canvas, giving the LLM a pristine, structured document to evaluate in real time.

    The Discovery: Why AI Graded Better Than Us

    When we compared the AI’s performance against human grading, the results surprised our professors. The AI was objectively superior in three core areas:

    • Absolute Objectivity: Human graders are prone to fatigue, cognitive load, and accidental bias. An essay graded at 11:00 PM after a long day looks different than one graded at 9:00 AM. The AI evaluated the last paper with the exact same baseline logic as the first.
    • Hyper-Detailed Feedback: The bottleneck for human TAs is time. We can only write so many paragraphs of explanation per student. The AI, however, provided massive, highly nuanced, and descriptive feedback on why a mark was deducted and how to fix it.
    • An Actionable Feedback Loop: Because the comments were so detailed, students actually used them to improve on subsequent labs. It turned grading from a punitive metric into a genuine learning tool.

    The Existential Question: If the University is AI, Why Pay for the University?

    The success of this project was supported by ASU, and the faculty loved the efficiency. But as engineers building this reality, it forced us to look at the horizon.

    If an AI auto-grader can evaluate technical work more accurately and provide better mentorship via feedback than a human expert constrained by time, the role of the traditional educator changes fundamentally. Teachers and graders will either become prompt architects and supervisors, or find themselves increasingly obsolete in the administrative loop.

    This shifts the existential crisis down to the consumer—the student.

    If the primary value of higher education has historically been access to expert evaluation, structured feedback, and curriculum delivery, what happens when that entire stack can be run locally or via a cheap API? If a student can deploy an open-source agentic pipeline to guide them through a textbook, test them, grade them objectively, and explain their mistakes for pennies, why pay tens of thousands of dollars for a university degree?

    We built a tool to solve a logistics problem in a university lab. In doing so, we might have just caught a glimpse of how the traditional university model unbundles itself from the inside out.

  • Side-Track Is Live on the App Store

    I’ve been working on this app, part of a larger product, on and off for a few months. It wasn’t a straight sprint. Progress came in bursts between other responsibilities, moments of motivation followed by stretches where life simply got in the way. Still, slowly, it started to resemble something real.

    Right before Christmas, I finally felt it was ready enough to submit for App Store review. Hitting that submit button felt like crossing a small but meaningful threshold. Whatever happened next, at least the app had reached someone else’s hands.

    By the new year, I had a response.

    Rejected.

    The reason itself was frustrating in a very particular way. The app had been reviewed on a platform it wasn’t designed to support. Side-Track was built for iPhone. I had explicitly removed support for iPad and macOS. Yet the review feedback indicated it had been tested on iPad, where it understandably did not work.

    I replied, explained the situation, and asked for the app to be reviewed on the intended platform. And then I waited.

    Waiting is where things tend to unravel a bit. With no response, doubt started creeping in. Maybe I had missed something. Maybe there really was a bug I hadn’t caught. This was my first iOS app, after all, and it didn’t feel unreasonable to assume the mistake was mine.

    I was tired, juggling other work, and slowly made peace with the idea that this wasn’t shipping anytime soon. I braced myself for another rejection email and mentally pushed the app down my list of immediate priorities.

    Then today, I got an email I honestly did not see coming.

    “Congratulations! We’re pleased to let you know that your app, Side-Track, has been approved for distribution.”

    It took a moment to register.

    Relief came first. That quiet exhale you don’t realize you’re holding. I went back to what I was doing, trying not to make a big deal out of it. People ship apps every day. This wasn’t some monumental achievement.

    But a few minutes later, I stood up and realized I felt lightheaded.

    That’s when it clicked. I was genuinely happy. Elated, even. That slow, delayed payoff after weeks of uncertainty hit harder than I expected. Delayed gratification, it turns out, is pretty powerful.

    This isn’t a finish line. If anything, it feels like the very first marker on a long road. Maybe one percent in. There’s still a lot of work left to do, and many things I want to improve, rethink, or build from scratch. But this small moment of progress made something clear.

    If making progress feels this good, then maybe it’s worth sticking with it.

    Side-Track is now live on the App Store everywhere.

    If you give it a try, I’d really appreciate your thoughts and constructive feedback. There’s still plenty to build, and your input will help shape what comes next.

  • Enhancing Attention

    Over the years, I’ve tried several things to improve my attention and focus, all to achieve higher productivity and output.

    Here I list several of those simple experiments and the results I’ve collected.

    Listening to Podcasts you don’t understand:

    You can achieve higher productivity by cutting out all social media, but I found that too restrictive to maintain long term. It was much too easy for me to go back to binge-watching YouTube and anime, or doomscrolling on Instagram. In an attempt to fix this, I tried listening to Ukrainian podcasts. I don’t speak Ukrainian, though. I understood 0% of these podcasts.

    But they did keep me at my workstation. I found it slightly hard to just walk away while a podcast was playing. I could always pause it and walk away, but it felt wrong to leave something unfinished. I’ve always tried to finish the YouTube videos I’d started, even if I had to watch them at 2x or 3x speed. Perhaps that applies to Ukrainian podcasts as well.

    Colorful moving images are my Achilles’ heel. Pathetic!

    I learned that I will watch just about anything through two interactions.

    1. My ‘smol’ cousin brothers were watching cartoons on the TV, with talking cars and firetrucks. When my uncle turned the TV off, I found myself almost as irritated as the kids.
    2. I once switched my YouTube language settings to German. I didn’t really understand German, and I was also looking to learn it. Maybe I’ll find it boring and stop watching YouTube, I thought. Bro, I was watching those German videos as though they were in English. I understood almost 0% of the words, but enjoyed watching all the same.

    Watching Videos in Black and White:

    This kinda works. Every video seems just a little more dull, more boring. The information was still there, though, so it didn’t impede learning. I don’t know why I stopped. I’m going back to this. Your friends will complain frequently though.

    Methylphenidate:

    Methylphenidate hydrochloride is an ADHD medication prescribed to improve attention and help you stay motivated for longer. This definitely works, but you do run the risk of working on things for too long.

    If you hate your job, that’s your brain telling you that something isn’t working for you. With methylphenidate, though, you can power through. You probably shouldn’t, but you can, and you likely will, and that’s not good.

    You can only get this with a doctor’s prescription.

    Another question I’ve had about methylphenidate: is it actually improving focus, or is it just keeping you awake? You’ll have trouble going to sleep while the drug is in effect. Working on something boring makes you sleepy, and ADHD meds can help with that. F*ck sleep, who needs it!

    But if it’s just drowsiness, there are other solutions.

    Reduced-Carb Diet:

    I’m pretty sure carbs are what make me drowsy. I can really tell since my present diet is almost devoid of carbs. Whenever I consume a significant amount of carbs, I immediately become drowsy and fall asleep.

    Now that I pay attention to this, I notice that those around me face the same issue.

    I frequently see my friends working, then eating carbs, slacking off and then taking naps. It doesn’t always go in that order, but it’s evident enough to be observed, especially among students, since no one’s paying for their time.

    I do worry about the long term effects of such a diet though.

    Coffee:

    I’ve sworn by coffee several times, but I’m not sure coffee has a true net positive effect on productivity. I just couldn’t observe one consistently. There is plenty of research suggesting coffee is excellent for productivity, however.

    Good Sleep:

    While I wasn’t able to collect decent data in support of getting good sleep, I did read some splendid research suggesting getting decent sleep can be on par with using cognition enhancing nootropics.

    Paper is linked here: https://onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2869.2005.00468.x

    Here is the take-away from the paper:

    All three tested stimulants have associated costs, particularly concerning side effects. However, caffeine is proposed as the reasonable ‘first line of defense’ due to its safety, proven effectiveness (enhanced by infrequent use), low cost, and wide availability. Modafinil should be the second option if caffeine is insufficient, as it is effective and has a good side-effect profile, though its scheduled status and cost limit initial use. Dextroamphetamine is strictly reserved as the third-line defense, only for acute use when caffeine or modafinil are expected to fail.

    However, the amount of caffeine in question is 600 mg. This is far higher than even a gym-bro would consume.

    Nootropics:

    Speaking of nootropics, some research indicates that they work best when used to restore cognitive prowess rather than enhance it. This could explain why some swear by their efficacy while others think they’re a sham.

  • Hackathons might be Dying

    I’ve attended a few hackathons, often through ASU, and they’ve painted a disappointing picture of what hackathons are or have become.

    Online, especially in memes, hackathons are often portrayed as high-energy events full of incredibly skilled, competitive developers building impressive prototypes in record time. In reality, many of the ones I’ve attended were filled with students still very early in their learning journeys, several struggling with basic remote deployment or project setup.

    Most recently, I attended Sunhacks, one of ASU’s larger hackathons. While I appreciate the effort that went into organizing it, I left unsure of what the event was really trying to achieve.

    The strong presence of sponsoring companies, Google, Amazon, Base44, and others, seemed to steer the event toward lame AI-related projects. I don’t think this was intentional; it’s just what happens when the showcased tools and challenges revolve around LLM APIs. As a result, many teams, including mine, ended up producing AI-driven web apps that all felt somewhat similar. Very few projects stood out as novel or experimental, and even the more creative ones didn’t seem to receive much recognition.

    The judging process also suffered from scaling issues. There were too few judges for the number of teams, which likely led to uneven evaluations. Early teams had a better chance of being seen thoroughly, while others may have been skipped or reviewed hastily. This kind of fatigue bias is well known, and should be easy to plan around, but somehow, the organizers missed it completely.

    That said, there were positives. The event offered great opportunities to socialize and meet new people, and I got to see several neat ideas and clever implementations from other teams, even if none of them ended up winning.

    Still, it’s hard not to notice the broader trend. With the economy tightening and companies hiring fewer students, there’s a growing sense of disengagement at these events. Many company representatives seemed to be there merely to maintain a presence, devoid of any real enthusiasm.

    It may suggest a larger trend: a waning trust in the economy at large, where both companies and students are becoming more cautious, more restrained, and less optimistic about the near term and possibly the long term as well.

    If you’ve had a different experience, I’d love to hear your thoughts.

  • Feed My Starving Children: How Good Design and Engineering Make Goodwill Scalable

    I had an opportunity to volunteer at Feed My Starving Children (FMSC), and I came away amazed not just by their achievements, but by how well they’ve built everything around their mission.

    Most nonprofits struggle to balance compassion with coordination, but FMSC has somehow mastered both. They’ve built a machine that blends technology, logistics, marketing, branding, capital, manpower, the spirit of competition, and goodwill into something that feels more like a community celebration than charity work.

    Turning Labor Into Leverage

    Running any operation, even nonprofits, costs money. Labor, especially in developed countries, is expensive. But FMSC has flipped that challenge on its head. They use volunteers, and the volunteers are also part of their marketing.

    Almost everyone in their packing facilities is a volunteer, from the people sealing bags to those stacking boxes on pallets. And yet, it doesn’t feel like “work.” They’ve made volunteering fun by gamifying the whole process.

    The Joy of Packing Meals

    When I joined, our group had six packing stations competing to see who could pack the most meals. Every few minutes, someone would shout that one more box had been packed, everyone would cheer, and we’d push to beat the other tables. In just two hours, our group packed over 44,000 meals, enough to feed thousands of children. This would not be possible without FMSC’s fantastic planning and execution.

    There’s music, energy, laughter, and a sense of friendly rivalry that makes time fly. It’s smart design: people want to help, but they also want to feel like they’re part of something exciting and effective. FMSC gives them exactly that.

    The Tech Behind the Impact

    What impressed me most, though, was the technology behind the experience. Everything runs smoothly because of their website and digital systems.

    You sign up online, pick a location and time slot, get email reminders leading up to your shift, and even receive a confirmation after you check in at the facility. The website isn’t just functional, it’s strategically built to eliminate friction at every step.

    It also handles:

    • Volunteer scheduling and time slot management
    • Group coordination (for schools, churches, or companies)
    • Donations and meal sponsorships
    • E-commerce for artisan goods made by communities in places like Haiti
    • Impact tracking and progress updates

    The entire volunteer experience — from sign-up to packing — feels like it’s been engineered for engagement. And that’s where FMSC really stands out. They’ve invested in systems that scale generosity.

    Trust and Transparency

    People give their time and money when they can trust it’ll make a difference. FMSC reinforces that trust beautifully. Every session begins and ends with real numbers: how many meals you packed, how many kids that feeds, and where the food is going.

    They also partner with schools, churches, and local organizations in Africa, Haiti, and other countries to ensure the food gets where it’s needed most. You can see the results right there on their website, stories of children who are healthier, growing, and even able to go to school because of these meals.

    Engineering Hope

    Their signature product, MannaPack Rice, is another brilliant example of practical engineering. It’s a carefully formulated blend that provides all the basic nutrition a child needs, easy to store, easy to ship, and hard to spoil.

    It’s so efficient that I found myself wondering if I could buy some for myself. It’s like a “universal meal,” designed purely for function.

    A Place for Everyone

    What struck me most during my visit was the mix of people. Seniors, kids, entire families, all working together. Some came with churches or schools, others with coworkers. For a few hours, everyone is focused on a shared mission.

    And because the process is so streamlined, both in person and online, the barrier to entry is almost nonexistent. You just sign up, show up, and make a real impact. That’s what good technology should do.

    Why FMSC Works

    Feed My Starving Children doesn’t just rely on compassion. It designs for it. From the way they automate volunteer scheduling to how they communicate results, every part of the experience is intentional.

    It’s not just a nonprofit; it’s a tech-enabled movement built around human connection. And it works.

    Volunteering at FMSC has been quite a learning experience for me, and I hope other firms can replicate how FMSC runs its operation.

    I’ll definitely be going back. And I’ll be telling everyone I know to try it, not just because it feels good to help, but because it’s inspiring to see how smart design and engineering can turn goodwill into global change.