Tutor Profile

James Okafor

BS Computer Science from Purdue University. Specializes in x86-64 assembly and ARM64 assembly.

620+ Assignments Delivered: across CSHH and prior tutoring
7 Years Tutoring: since first paid teaching role
3 Languages Covered: C, C++, Assembly
8 Documented Specialties: each with a diagnostic playbook

About the Tutor

About James

James finished his CS bachelor's at Purdue, then spent five years writing low-level performance code for a kernel-security shop. The day job was reading Intel and AMD optimization manuals end to end, writing assembly probes that measured pipeline behavior at single-cycle resolution, and reproducing the silicon errata the vendors themselves had documented. None of that lands on a typical undergraduate transcript. All of it lands on the kind of architecture assignments students get stuck on. Seven years into tutoring and 620+ assignments later, he still teaches the same way: from the instruction set up, not from the high-level language down.

His tutoring is heavy on x86-64 and ARM64 assembly because those are the two architectures students actually encounter. He covers RISC-V for courses that teach it. The recurring student frustration is calling-convention bugs: a function that compiles and links but corrupts the stack, returns garbage, or segfaults the moment another function is called. The bug is almost always a register-preservation violation. The student wrote a function that clobbers rbx or r12 (callee-saved under System V AMD64) without pushing them first, the caller relied on those registers still holding their values, and the corruption surfaces three calls later. James traces these by reading the disassembly with the ABI doc open. Students learn to do the same.

A representative case from last semester. A systems-course student submitted a hand-written assembly implementation of memcpy that passed the functional tests but mysteriously broke the test harness on the second invocation. The function preserved the right callee-saved registers, returned the right value, and handled alignment correctly. The bug was that the student had used the red zone (the 128 bytes below rsp under System V AMD64) for a temporary buffer. That is legal in a leaf function but not in a function that calls anything else: the call instruction pushes its return address just below rsp, and the callee's own frame then overwrites whatever was stored there. James found the bug by stepping through with gdb and noticing rsp had not been adjusted before a downstream call. The fix was three instructions to allocate proper stack space. The student went from a failing grade to full credit on the lab and understood for the first time why the red zone exists.

On the architecture side his teaching priority is the memory hierarchy. Most students learn that L1 is fast and DRAM is slow without ever measuring the gap. James walks them through a rough latency model: about 4 cycles for an L1 hit, 12 for L2, 40 for L3, and 200 for DRAM on a modern Intel chip, with the exact figures varying by microarchitecture. Then they implement matrix multiplication two ways: a naive triple loop that misses cache constantly and a blocked version that respects the L1 working set. The measured speedup is usually 4 to 10x on the same algorithm. Students who experience this once stop writing code that ignores the cache.
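
The two versions described above can be sketched in C roughly as follows. This is an illustrative sketch, not James's reference solution: the function names `matmul_naive` and `matmul_blocked` and the tile size `TILE` are assumptions made for the example, and the right tile size depends on the actual L1 size of the machine being measured. Matrices are assumed to be square and stored row-major.

```c
#include <stddef.h>
#include <string.h>

/* Assumed tile size: a 32x32 tile of doubles is 8 KiB, so three tiles
   (one each from a, b, c) fit in a typical 32 KiB L1 data cache. */
#define TILE 32

/* Naive triple loop: the inner loop reads b with a stride of n doubles,
   so for large n almost every access lands on a different cache line. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Blocked version: the same multiply-adds, grouped tile by tile so each
   tile is reused while it is still resident in cache. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    memset(c, 0, n * n * sizeof *c);   /* c accumulates partial sums */
    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t kk = 0; kk < n; kk += TILE) {
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = min_sz(ii + TILE, n);
                size_t k_end = min_sz(kk + TILE, n);
                size_t j_end = min_sz(jj + TILE, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        /* Innermost loop walks b and c along a row. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

To see the gap, call both functions on the same inputs with n large enough that the matrices do not fit in cache, and time each call. The two versions add the products in a different order, so floating-point results can differ in the last bits on general inputs.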

His CSHH workflow is methodical. When a brief arrives, he reads the ISA spec section relevant to the assignment first, drafts the assembly by hand on paper, then types it in and runs it under gdb, using set disassembly-flavor intel or set disassembly-flavor att for x86 depending on which syntax the course uses. Every instruction in the delivered solution has a comment explaining what it does at the architectural level: which register it reads, which it writes, which flags it sets, why it appears at that point in the code. A "good" student question for James is one where the student can show the disassembly of their compiled code and point at the specific instruction sequence they do not understand. With that, the lesson starts at the actual confusion instead of three layers above it. The pset gets done. The mental model also gets built, and that one carries forward to the next assignment.

Documented Specialties

What James Specializes In

x86-64 assembly

System V AMD64 calling convention, SSE/AVX vector instructions.

ARM64 assembly

AArch64 calling convention, NEON intrinsics.

RISC-V assembly

RV32I, RV64I, M and F extensions.

GDB disassembly workflow

layout asm, stepi, info registers, x/Nxw.

Memory hierarchy analysis

L1/L2/L3/DRAM, TLB, cache blocking.

Pipeline hazards (data, control, structural) and microarchitectural state

James handles pipeline hazards (data, control, structural) and microarchitectural state as a recurring CSHH workload, with documented patterns and reference solutions.

Sample Reviewed Code

Code James Has Reviewed

A representative snippet from James's workflow. Pulled from the diagnostic playbook James runs on incoming CSHH assignments in this language.

x86-64 Assembly callee_saved.s

          
# System V AMD64 callee-saved set: rbx, rbp, r12-r15, rsp.
# Push every callee-saved register you touch. Pop in reverse order.
# CMU 15-213 attacklab + bomblab top deduction: missing this prologue.

.globl my_function
my_function:
    pushq   %rbp                 # save caller's frame pointer
    movq    %rsp, %rbp           # establish our own
    pushq   %rbx                 # we will use rbx below
    pushq   %r12                 # and r12
    subq    $32, %rsp            # 32B local frame (NOT the red zone -
                                 # we call printf below, red zone is invalid)

    # ... function body uses rbx, r12 freely ...
    call    printf@PLT

    addq    $32, %rsp            # tear down local frame
    popq    %r12                 # restore in reverse push order
    popq    %rbx
    popq    %rbp
    ret

Course Matches

Courses James Specializes In

6.006 Massachusetts Institute of Technology

MIT 6.006: Introduction to Algorithms

MIT 6.006 introduces algorithms across 13 weeks with 26 lectures, 13 recitations, and 7 problem sets. The Spring 2020 redesign by Erik Demaine, Jason Ku, and Justin Solomon...

8 recurring assignments covered

FAQ

Frequently Asked Questions

Why does my hand-written assembly function corrupt the stack on return?
Almost always a callee-saved register violation. Under System V AMD64 (Linux, macOS) the callee must preserve rbx, rbp, r12-r15, plus rsp. If your function writes to any of those without pushing the original value first, the caller gets back garbage and may dereference a corrupted pointer three function calls later. Walk through the function and confirm every callee-saved register you touch has a matching push at entry and pop at exit.
What is the red zone and when is it safe to use?
The 128 bytes below rsp on System V AMD64. Leaf functions (functions that call nothing) may use it as scratch without adjusting rsp, and the ABI guarantees that signal handlers will not overwrite it. The moment your function calls anything, the red zone is no longer safe: the call pushes a return address into it and the callee builds its own frame on top. You must allocate proper stack space with sub rsp, N before the call. Violating this produces confusing bugs because the temporary looks correct right up until the first downstream call returns. Kernel and interrupt-handling code cannot rely on the red zone at all and is normally built with -mno-red-zone.
How do I read GDB disassembly output for x86 vs ARM?
For x86, run set disassembly-flavor intel inside gdb (or set disassembly-flavor att if your course uses AT&T syntax, which reverses the operand order). For ARM, gdb uses the architecture-native syntax. Use layout asm to open the disassembly window (layout split shows source and assembly together), stepi to advance one machine instruction at a time, info registers to dump the architectural state, and x/Nxw $rsp to inspect the stack as N 4-byte words (x/Ngx for 8-byte values; $sp works on either architecture).
Why does the same algorithm run 5x faster with a different loop order?
Cache behavior. Modern CPUs fetch memory a cache line at a time and prefetch sequential access patterns aggressively; large-stride patterns use one element per line and defeat the prefetcher, so far more accesses pay the trip to L3 or DRAM. Matrix multiplication is the canonical example. For row-major arrays, ijk order walks one matrix down a column in the inner loop, while ikj order walks both inner-loop operands along rows. Same data, same arithmetic, very different cache-line access patterns. Measure with perf stat -e cache-misses to confirm.
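
The same effect shows up in its simplest form when summing a 2D array in two traversal orders. The sketch below is illustrative only: the function names `sum_row_major` and `sum_col_major` are made up for this example, and the 64-byte cache line mentioned in the comments is an assumption that holds on most current x86-64 and ARM64 parts but should be checked for the target machine.

```c
#include <stddef.h>

/* Row-major walk over a row-major array: consecutive iterations touch
   adjacent addresses, so one (assumed) 64-byte line serves 8 doubles. */
double sum_row_major(size_t rows, size_t cols, const double *m)
{
    double total = 0.0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Column-major walk over the same array: each step jumps cols doubles
   ahead, so for wide rows every access lands on a different line. */
double sum_col_major(size_t rows, size_t cols, const double *m)
{
    double total = 0.0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Both functions read every element exactly once. Timing them on an array much larger than the last-level cache, or running each under perf stat -e cache-misses, is one way to observe the difference.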
Do you cover CMU 15-213 (CSAPP), CS61C, and CS107 lab formats?
Yes. The 15-213 bomblab, attacklab, and proxylab use specific autograders against compiled binaries; CS61C uses the Venus simulator for RISC-V assembly and Logisim for the digital-logic and CPU projects; CS107 is C-centric and built around course-provided starter code. I match the exact toolchain version, calling convention, and submission format the course uses.
When should I write inline assembly in C versus pure C with compiler intrinsics?
Inline assembly is the right call when you need a specific instruction the compiler will not emit on its own (rdtsc, cpuid, a lock prefix on a custom operation). For SIMD work, prefer compiler intrinsics (immintrin.h for x86 AVX, arm_neon.h for ARM NEON). The compiler can allocate registers for intrinsic calls and schedule them together with the surrounding C code; inline asm is opaque to the optimizer, which cannot see inside it to schedule or combine its instructions.
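
As a small illustration of the intrinsics route, the sketch below adds two float arrays four lanes at a time using SSE intrinsics from immintrin.h. The function name `add_floats` is hypothetical, and the `__SSE__` guard with a scalar fallback is an assumption added so the same file still compiles on non-x86 targets; a NEON version would use arm_neon.h instead.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <immintrin.h>
#endif

/* Element-wise add: out[i] = x[i] + y[i] for i in [0, n). */
void add_floats(size_t n, const float *x, const float *y, float *out)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Four floats per iteration. Unaligned loads and stores are used so
       callers do not have to guarantee 16-byte alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(out + i, _mm_add_ps(vx, vy));
    }
#endif
    /* Scalar tail, and the whole loop when SSE is unavailable. */
    for (; i < n; i++)
        out[i] = x[i] + y[i];
}
```

Because these are ordinary function-like calls from the compiler's point of view, register allocation and instruction scheduling are left to the optimizer rather than fixed by hand.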

Work With James?

Submit your assignment with James in mind. We will route the request to the best-fit tutor based on subject, language, and current load. Average first reply inside 30 minutes during business hours.