Tag: Productivity

  • We Built an AI Auto-Grader That Outperformed Humans. Here’s What It Means for the Future of College.

    As a Graduate Teaching Assistant at Arizona State University, my baseline job description was standard: teach labs, hold office hours, and grade. But if you put a team of software and AI engineers in a room with hundreds of repetitive worksheets, quizzes, and pre-labs, they’re going to do what they do best: build an automation engine.

    My team and I set out to build an end-to-end AI Auto-Grader System integrated directly with Canvas. By the time we were done, we hadn’t just saved ourselves hours of manual labor—we discovered that the AI actually graded better and more objectively than human graders ever could.

    But getting there required bypassing massive data privacy hurdles, re-engineering data pipelines, and confronting a glaring question about the fundamental value of a modern university degree.

    The Architecture: Canvas, OpenRouter, and Excalidraw

    The core premise was straightforward: pull student submissions from the Canvas LMS API, grade them using LLMs, and push the scores back to Canvas alongside detailed feedback.
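
    The round trip described above can be sketched against the Canvas submissions endpoints. This is a minimal illustration, not the system we shipped: the host name, the helper names, and the token are placeholders, and the functions only build the requests so nothing is sent until you choose to send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Canvas host; replace with your institution's domain.
BASE_URL = "https://canvas.example.edu"


def submissions_url(course_id: int, assignment_id: int) -> str:
    """URL of the submissions collection for one assignment."""
    return (
        f"{BASE_URL}/api/v1/courses/{course_id}"
        f"/assignments/{assignment_id}/submissions"
    )


def build_list_request(course_id: int, assignment_id: int, token: str) -> Request:
    """GET request that lists the submissions for an assignment."""
    return Request(
        submissions_url(course_id, assignment_id),
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def build_grade_request(
    course_id: int,
    assignment_id: int,
    user_id: int,
    score: float,
    feedback: str,
    token: str,
) -> Request:
    """PUT request that posts a grade and a text comment for one student."""
    body = urlencode(
        {
            "submission[posted_grade]": str(score),
            "comment[text_comment]": feedback,
        }
    ).encode("utf-8")
    return Request(
        f"{submissions_url(course_id, assignment_id)}/{user_id}",
        data=body,
        headers={"Authorization": f"Bearer {token}"},
        method="PUT",
    )
```

    Passing either request to `urllib.request.urlopen` would perform the call. Keeping request construction separate from sending makes the grading logic easy to test without touching a live course.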

    To maximize accuracy, we didn’t just throw raw prompts at an API. We built a structured grading schema engine. The system ingested the master answer key, examples of partially correct answers, and explicit rubrics on how to allocate partial credit. We primarily relied on OpenAI LLMs for inference, with a fallback routing mechanism to OpenRouter to handle rate limits and test alternative open-source models.
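
    A rough sketch of what such a schema and fallback router might look like follows. The class, field, and function names are illustrative assumptions rather than our production code, and the provider callables are stand-ins for real API clients.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class GradingSchema:
    """Everything the model needs to grade one question (hypothetical shape)."""

    question: str
    answer_key: str
    max_points: float
    rubric: list[str] = field(default_factory=list)
    # Pairs of (partially correct answer, points it should earn).
    partial_examples: list[tuple[str, float]] = field(default_factory=list)

    def to_prompt(self, student_answer: str) -> str:
        """Render the schema and the student's answer as a single prompt."""
        lines = [
            f"Question: {self.question}",
            f"Answer key: {self.answer_key}",
            f"Maximum points: {self.max_points}",
            "Rubric:",
        ]
        lines += [f"- {rule}" for rule in self.rubric]
        lines.append("Partially correct examples:")
        lines += [
            f"- {text!r} earns {points} points"
            for text, points in self.partial_examples
        ]
        lines.append(f"Student answer: {student_answer}")
        return "\n".join(lines)


class ProviderError(Exception):
    """Raised by a provider callable on a rate limit or outage."""


def grade_with_fallback(
    prompt: str, providers: Sequence[Callable[[str], str]]
) -> str:
    """Try each provider in order and return the first successful reply."""
    last_error = None
    for call in providers:
        try:
            return call(prompt)
        except ProviderError as err:
            last_error = err  # fall through to the next provider
    raise RuntimeError("all providers failed") from last_error
```

    In this sketch the first callable would wrap the primary model and the second would wrap the OpenRouter route, so a rate-limit error on the first simply falls through to the second.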

    To prevent the AI from being overly punitive, we also engineered a programmatic grading balancer. The function calculated the delta between the highest achieved mark and the maximum possible mark, automatically normalizing the curve across the cohort to ensure fair evaluation.
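
    One way to read that balancer is as an additive shift: take the gap between the best score in the cohort and the maximum possible score, then add that gap to everyone, capped at the maximum. The sketch below assumes that interpretation, and the function name is hypothetical. A multiplicative rescale would be an equally plausible reading.

```python
def balance_scores(scores: dict[str, float], max_points: float) -> dict[str, float]:
    """Shift every score up by the gap between the top score and max_points."""
    if not scores:
        return {}
    # Delta between the maximum possible mark and the highest achieved mark.
    delta = max(0.0, max_points - max(scores.values()))
    # Apply the same shift to the whole cohort, never exceeding max_points.
    return {
        student: min(max_points, score + delta)
        for student, score in scores.items()
    }
```

    With a top score of 9 out of 10, every student gains one point. If someone already earned full marks, the delta is zero and nothing changes.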

    Bypassing IT Hurdles Without Storing Data

    Our biggest bottleneck wasn’t the AI—it was compliance. The ASU IT software approval team enforces strict quality and privacy standards. The primary directive: The software absolutely could not store student information.

    Grading a student without holding state or keeping a database of their records forced us to build an ephemeral data pipeline. Initially, we had to resort to pulling raw PDFs from Canvas, running inference in memory, pushing the grades, and immediately wiping the data context.
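
    The shape of that pipeline can be sketched as a single function that holds the submission in a mutable in-memory buffer and clears it once the grade is pushed. The three callables are hypothetical stand-ins for the Canvas download, the LLM call, and the grade upload. The wipe is best effort only, since Python gives no guarantee that other copies of the bytes are gone.

```python
from typing import Callable


def grade_ephemerally(
    fetch_pdf: Callable[[], bytes],
    grade_pdf: Callable[[bytearray], float],
    push_grade: Callable[[float], None],
) -> None:
    """Fetch, grade, and push one submission without writing it anywhere."""
    # The submission lives only in this in-memory buffer: no file, no database.
    pdf = bytearray(fetch_pdf())
    try:
        score = grade_pdf(pdf)
        push_grade(score)
    finally:
        # Best-effort wipe: zero the buffer, then empty it, even on failure.
        # The immutable bytes returned by fetch_pdf are left to the garbage
        # collector, so this is not a cryptographic guarantee.
        pdf[:] = bytes(len(pdf))
        pdf.clear()
```

    Putting the wipe in `finally` means a failed inference call or a failed upload still clears the buffer.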

    To bypass the brittle nature of OCR on random student PDFs, we built a custom frontend. Students entered their answers directly into structured fields, which supported digital drawings via an integrated Excalidraw canvas. When a student hit submit, this data was cleanly embedded into a standardized PDF format and auto-pushed to Canvas, giving the LLM a pristine, structured document to evaluate in real time.

    The Discovery: Why AI Graded Better Than Us

    When we compared the AI’s performance against human grading, the results surprised our professors. The AI was objectively superior in three core areas:

    • Absolute Objectivity: Human graders are prone to fatigue, cognitive load, and accidental bias. An essay graded at 11:00 PM after a long day looks different than one graded at 9:00 AM. The AI evaluated the last paper with the exact same baseline logic as the first.
    • Hyper-Detailed Feedback: The bottleneck for human TAs is time. We can only write so many paragraphs of explanation per student. The AI, however, provided massive, highly nuanced, and descriptive feedback on why a mark was deducted and how to fix it.
    • An Actionable Feedback Loop: Because the comments were so detailed, students actually used them to improve on subsequent labs. It turned grading from a punitive metric into a genuine learning tool.

    The Existential Question: If the University is AI, Why Pay for the University?

    The success of this project was supported by ASU, and the faculty loved the efficiency. But as engineers building this reality, it forced us to look at the horizon.

    If an AI auto-grader can evaluate technical work more accurately and provide better mentorship via feedback than a human expert constrained by time, the role of the traditional educator changes fundamentally. Teachers and graders will either become prompt architects and supervisors, or find themselves increasingly obsolete in the administrative loop.

    This shifts the existential crisis down to the consumer—the student.

    If the primary value of higher education has historically been access to expert evaluation, structured feedback, and curriculum delivery, what happens when that entire stack can be run locally or via a cheap API? If a student can deploy an open-source agentic pipeline to guide them through a textbook, test them, grade them objectively, and explain their mistakes for pennies, why pay tens of thousands of dollars for a university degree?

    We built a tool to solve a logistics problem in a university lab. In doing so, we might have just caught a glimpse of how the traditional university model unbundles itself from the inside out.

  • Enhancing Attention

    Over the years, I’ve tried several things to improve my attention and focus, all to achieve higher productivity and output.

    Here I list several of those simple experiments and the results I’ve collected.

    Listening to Podcasts you don’t understand:

    You can achieve higher productivity by cutting out all social media, but I often found that too restrictive to maintain long term. It was much too easy for me to go back to binge-watching YouTube and anime, or doom-scrolling on Instagram. In an attempt to fix this, I tried listening to Ukrainian podcasts. I don’t speak Ukrainian, though. I understood 0% of these podcasts.

    But they did keep me at my workstation. I found it slightly hard to just walk away while a podcast was playing. I could always pause the podcast and walk away, but it felt wrong to leave something unfinished. I’ve always tried to finish the YouTube videos I’d started, even if I had to watch them at 2x or 3x speed. Perhaps that applies to Ukrainian podcasts as well.

    Colorful moving images are my Achilles’ heel. Pathetic!

    Two experiences taught me that I will watch just about anything.

    1. My ‘smol’ cousin brothers were watching cartoons on the TV, with talking cars and firetrucks. When my uncle turned the TV off, I found myself almost as irritated as the kids.
    2. I once switched my YouTube language settings to German. I didn’t really understand German, but I was looking to learn. Maybe I’ll find it boring and stop watching YouTube, I thought. Bro, I was watching those German videos as though they were in English. I understood almost 0% of the words, but enjoyed watching all the same.

    Watching Videos in Black and White:

    This kinda works: every video seems just a little duller and more boring. The information was still there, though, so it didn’t impede learning. I don’t know why I stopped. I’m going back to this. Your friends will complain frequently, though.

    Methylphenidate:

    Methylphenidate hydrochloride is an ADHD medication prescribed to improve attention and help you stay motivated for longer. This definitely works, but you do run the risk of working on things for too long.

    If you hate your job, that’s your brain telling you that something isn’t working for you. With methylphenidate, though, you can power through. You probably shouldn’t, but you can, and you likely will, and that’s not good.

    You can only get this with a doctor’s prescription.

    Another question I’ve had about methylphenidate: is it actually improving focus, or is it just keeping you awake? You’ll have trouble going to sleep while the drug is in effect. Working on something boring makes you sleepy, and ADHD meds can help with that. F*ck sleep, who needs it!

    But if it’s just drowsiness, there are other solutions.

    Reduced-Carb Diet:

    I’m pretty sure carbs are what make me drowsy. I can really tell since my present diet is almost devoid of carbs. Whenever I consume a significant amount of carbs, I immediately become drowsy and fall asleep.

    Since I started paying attention to this, I’ve noticed that those around me face the same issue.

    I frequently see my friends working, then eating carbs, slacking off and then taking naps. It doesn’t always go in that order, but it’s evident enough to be observed, especially among students, since no one’s paying for their time.

    I do worry about the long-term effects of such a diet, though.

    Coffee:

    I’ve sworn by coffee several times, but I don’t really know whether coffee has a true net positive effect on productivity. I couldn’t consistently observe one, is all. There is plenty of research suggesting coffee is excellent for productivity, however.

    Good Sleep:

    While I wasn’t able to collect decent data in support of getting good sleep, I did read some splendid research suggesting that decent sleep can be on par with using cognition-enhancing nootropics.

    Paper is linked here: https://onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2869.2005.00468.x

    Here is the take-away from the paper:

    All three tested stimulants have associated costs, particularly concerning side effects. However, caffeine is proposed as the reasonable ‘first line of defense’ due to its safety, proven effectiveness (enhanced by infrequent use), low cost, and wide availability. Modafinil should be the second option if caffeine is insufficient, as it is effective and has a good side-effect profile, though its scheduled status and cost limit initial use. Dextroamphetamine is strictly reserved as the third-line defense, only for acute use when caffeine or modafinil are expected to fail.

    However, the amount of caffeine in question is 600 mg. This is far higher than even a gym-bro would consume.

    Nootropics:

    Speaking of nootropics, some research indicates that they work best when they are used to restore cognitive prowess rather than enhance it. This could explain why some swear by their efficacy while others think they’re a sham.