Tutor Profile

James Okafor

BS Computer Science. Specializes in x86-64 and ARM64 assembly.

620+ assignments delivered (across CSHH and prior tutoring)

7 years tutoring (since first paid teaching role)

3 languages covered: C, C++, Assembly

8 documented specialties, each with a diagnostic playbook

About the Tutor

About James

James finished his CS bachelor's, then spent five years writing low-level performance code for a kernel-security shop. The day job there was reading Intel and AMD optimization manuals end to end, writing assembly probes that measured pipeline behavior at single-cycle resolution, and reproducing silicon errata the vendors themselves had documented. None of that lands on a typical undergraduate transcript. All of it lands on the kind of architecture assignments students get stuck on. Seven years into tutoring and 620+ assignments later, he still teaches the same way: from the instruction set up, not from the high-level language down.

His tutoring is heavy on x86-64 and ARM64 assembly because those are the two architectures students actually encounter. He covers RISC-V for courses that teach it.

The most common student frustration is the calling-convention bug: a function that compiles and links but corrupts the stack, returns garbage, or segfaults the moment another function is called. The bug is almost always a register-preservation violation. Typically the student wrote a function that clobbers rbx or r12 (callee-saved under System V AMD64) without pushing them first. The caller relied on those registers still holding their values, and the corruption surfaces three calls later. James traces these by reading the disassembly with the ABI doc open, and students learn to do the same.

A representative case from last semester: a systems-course student submitted a hand-written assembly implementation of memcpy. It passed the functional tests but mysteriously broke the test harness on the second invocation. The function preserved the right callee-saved registers, returned the right value, and handled alignment correctly.

The bug was that the student had used the red zone (the 128 bytes below rsp under System V AMD64) for a temporary buffer. That is legal in a leaf function but illegal once the function calls anything else. Each call pushes a return address into that space, and the callee then builds its own frame over it. James found the bug by stepping through with gdb and noticing that rsp had not been adjusted before a downstream call. The fix was three instructions to allocate proper stack space. The student went from a failing grade to full credit on the lab and, for the first time, understood why the red zone exists.

On the architecture side, his teaching priority is the memory hierarchy. Most students learn that L1 is fast and DRAM is slow without ever measuring the gap. James walks them through a per-level latency model for a recent Intel core: roughly 4 cycles for an L1 hit, 12 for L2, 40 for L3, and 200 for DRAM. Then they implement matrix multiplication two ways:

- a naive triple loop that misses cache constantly;
- a blocked version that respects the L1 working set.

The measured speedup is usually 4 to 10x on the same algorithm. Students who experience this once stop writing code that ignores the cache.
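A minimal sketch of the blocked version in C, assuming square row-major `double` matrices. The function name `matmul_blocked` and the tile-size parameter `bs` are hypothetical. A tile size of 32 is an assumption about the target core: three 32x32 tiles of doubles take 24 KiB, which fits a 32 KiB L1 data cache. Tune it by measurement.

```c
#include <stddef.h>

/* Smaller of two sizes; clamps the last, possibly partial, tile. */
static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Hypothetical helper: C = A * B for n x n row-major matrices, processed
 * in bs x bs tiles so each tile of A, B, and C is reused while it is still
 * resident in L1. bs must be > 0 (32 is an assumed starting point). */
void matmul_blocked(size_t n, size_t bs,
                    const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;                       /* accumulate into a zeroed C */

    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs)
                /* multiply tile A[ii.., kk..] by tile B[kk.., jj..] */
                for (size_t i = ii; i < min_size(ii + bs, n); i++)
                    for (size_t k = kk; k < min_size(kk + bs, n); k++) {
                        double a = A[i * n + k];
                        /* unit-stride walk along a row of the B and C tiles */
                        for (size_t j = jj; j < min_size(jj + bs, n); j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

The comparison the paragraph describes is then a matter of timing `matmul_blocked(n, 32, A, B, C)` against a plain triple loop on the same inputs, with a large n. The speedup you see depends on the machine.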

His CSHH workflow is methodical. When a brief arrives, he:

1. reads the section of the ISA spec relevant to the assignment;
2. drafts the assembly by hand on paper;
3. types it in and runs it under gdb, using set disassembly-flavor intel or set disassembly-flavor att on x86, depending on which syntax the course uses.

Every instruction in the delivered solution has a comment explaining what it does at the architectural level: which registers it reads and writes, which flags it sets, and why it appears at that point in the code.

A "good" student question for James is one where the student can show the disassembly of their compiled code and point at the specific instruction sequence they do not understand. That way the lesson starts at the actual confusion instead of three layers above it. The pset gets done, and the mental model gets built too, which is what carries forward to the next assignment.

Documented Specialties

What James Specializes In

x86-64 assembly

System V AMD64 calling convention, SSE/AVX vector instructions.

ARM64 assembly

AArch64 calling convention, NEON intrinsics.

RISC-V assembly

RV32I, RV64I, M and F extensions.

GDB disassembly workflow

layout asm, stepi, info registers, x/Nxw.

Memory hierarchy analysis

L1/L2/L3/DRAM, TLB, cache blocking.

Pipeline hazards and microarchitectural state

Data, control, and structural hazards, handled as a recurring CSHH workload with documented patterns and reference solutions.

Sample Reviewed Code

Code James Has Reviewed

A representative snippet from the diagnostic playbook James runs on incoming CSHH x86-64 assembly assignments.

x86-64 Assembly: callee_saved.s

```asm
# System V AMD64 callee-saved set: rbx, rbp, r12-r15, rsp.
# Push every callee-saved register you touch. Pop in reverse order.
# Top deduction in binary-exploitation labs: a missing prologue.

        .text
        .globl  my_function
my_function:
        pushq   %rbp                 # save caller's frame pointer
        movq    %rsp, %rbp           # establish our own
        pushq   %rbx                 # we will use rbx below
        pushq   %r12                 # and r12
        subq    $32, %rsp            # 32B local frame (NOT the red zone -
                                     # we call printf below, so it is unsafe);
                                     # rsp is now 16-byte aligned for the call

        # ... function body uses rbx, r12 freely; before the call it must
        # load printf's arguments and set %al to the number of vector
        # registers used (printf is variadic) ...
        call    printf@PLT

        addq    $32, %rsp            # tear down local frame
        popq    %r12                 # restore in reverse push order
        popq    %rbx
        popq    %rbp
        ret
```

Coverage Map

Subjects and Languages James Covers

Subjects

Operating Systems, Data Structures

Languages

C, C++, Assembly

FAQ

Frequently Asked Questions

Why does my hand-written assembly function corrupt the stack on return?

Almost always a callee-saved register violation. Under System V AMD64 (Linux, macOS) the callee must preserve rbx, rbp, r12-r15, plus rsp. If your function writes to any of those without pushing the original value first, the caller gets back garbage and may dereference a corrupted pointer three function calls later. Walk through the function and confirm every callee-saved register you touch has a matching push at entry and pop at exit.
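The same rule applies to inline assembly in C. The sketch below assumes GCC or Clang extended asm on x86-64. Listing `rbx` in the clobber list tells the compiler the asm overwrites it, so the compiler itself saves and restores rbx around the function. The function `add_one_via_rbx` is hypothetical, and the non-x86-64 branch is only a fallback so the file builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical example: does its work in rbx, a callee-saved register.
 * Because "rbx" is declared clobbered, the compiler emits the push/pop
 * of rbx in this function's prologue/epilogue; omitting it from the
 * clobber list would silently corrupt a caller that keeps a value there. */
uint64_t add_one_via_rbx(uint64_t x)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t out;
    __asm__(
        "movq %1, %%rbx\n\t"   /* copy the input into rbx */
        "addq $1, %%rbx\n\t"   /* work in rbx (also modifies flags) */
        "movq %%rbx, %0"       /* copy the result out */
        : "=r"(out)            /* output operand */
        : "r"(x)               /* input operand */
        : "rbx", "cc");        /* registers and flags the asm overwrites */
    return out;
#else
    return x + 1;              /* portable fallback on other targets */
#endif
}
```

Compiling this with -O2 -S and reading the output is a quick way to see the compiler-generated push/pop of rbx that a hand-written function would need.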

What is the red zone and when is it safe to use?

The 128 bytes below rsp on System V AMD64. The ABI guarantees that signal and interrupt handlers will not modify this area. That is why leaf functions (functions that call nothing) may use it as scratch without adjusting rsp.

The moment your function calls anything, the red zone is no longer safe. The call instruction pushes its return address at rsp-8, and the callee builds its own frame in the same bytes. Allocate proper stack space with sub rsp, N before the call. Violating this produces bugs that are hard to trace, because what gets overwritten depends on how much stack the callee happens to use.

How do I read GDB disassembly output for x86 vs ARM?

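For x86, run set disassembly-flavor intel inside gdb. If your course uses AT&T syntax, run set disassembly-flavor att instead; that is gdb's default, and it puts source and destination operands in the opposite order from Intel. The flavor setting only applies to x86; for ARM, gdb uses the architecture-native syntax.

Useful commands:

- layout asm opens a live disassembly window; layout split shows source and assembly side by side.
- stepi advances one machine instruction at a time.
- info registers dumps the architectural state.
- x/Nxw $rsp inspects the stack. Use x/Nxg for 8-byte slots, and $sp instead of $rsp on ARM64.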

Why does the same algorithm run 5x faster with a different loop order?

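Cache behavior. Matrix multiplication is the canonical example. The ijk and ikj loop orders traverse the same data with the same arithmetic, but with completely different cache-line access patterns.

With row-major arrays, the ijk order's inner loop walks down a column of B. Once the matrix is larger than the cache, each iteration touches a new cache line (64 bytes on current x86) and uses only 8 bytes of it. The ikj order's inner loop walks along rows of B and C instead. Every byte of each fetched line gets used, and the hardware prefetcher can stay ahead of the unit-stride stream. Measure with perf stat -e cache-misses to confirm.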
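A minimal sketch of the two loop orders in C, assuming n x n row-major `double` matrices. The function names are hypothetical. Both compute the same product; only the stride of the inner loop changes.

```c
#include <stddef.h>

/* ijk: the inner loop reads B[k][j] for successive k, a stride of n
 * doubles, so it jumps to a different cache line on every iteration. */
void matmul_ijk(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* ikj: the inner loop reads B[k][j] and updates C[i][j] for successive j,
 * unit stride through both rows, so each fetched line is fully used. */
void matmul_ikj(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;                      /* ikj accumulates, so zero C first */
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double a = A[i * n + k];     /* hoisted: constant across the j loop */
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
}
```

Running each version on a large n under perf stat -e cache-misses is the comparison the answer describes. The exact ratio will vary by machine.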

Do you cover the standard systems and computer-architecture lab formats?

Yes. The classic systems labs (a binary-bomb exercise, an attack lab, a proxy server) use specific autograders against compiled binaries; computer-architecture labs use circuit simulators plus an assembler for the early MIPS work and bare-metal RISC-V for the later projects; the C-systems track ships a Unix-style starter codebase. I match the exact toolchain version, calling convention, and submission format the course uses.

When should I write inline assembly in C versus pure C with compiler intrinsics?

Inline assembly is the right call when you need a specific instruction the compiler will not emit (rdtsc, cpuid, lock prefix on a custom operation). For SIMD work, prefer compiler intrinsics (immintrin.h for x86 AVX, arm_neon.h for ARM NEON). The compiler can schedule intrinsic calls across surrounding C code; inline asm is a black box the optimizer cannot reorder around.
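A minimal sketch of the intrinsics route, assuming GCC or Clang. The hypothetical `add_floats` loop uses SSE intrinsics from immintrin.h on x86 and NEON intrinsics from arm_neon.h on ARM, with a scalar loop for the remainder. The compiler treats these as ordinary operations it can schedule and inline with the surrounding C, which an opaque asm block would prevent.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <immintrin.h>   /* x86: SSE/AVX intrinsics */
#elif defined(__ARM_NEON)
#include <arm_neon.h>    /* ARM: NEON intrinsics */
#endif

/* Hypothetical example: out[i] = a[i] + b[i] for n floats. */
void add_floats(size_t n, const float *a, const float *b, float *out)
{
    size_t i = 0;
#if defined(__SSE__)
    /* four floats per iteration; unaligned loads/stores need no alignment */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#elif defined(__ARM_NEON)
    /* same idea with 128-bit NEON registers */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    /* scalar tail (and the whole loop on targets with neither) */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```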

A Few More CSHH Tutors

Four tutors keep public profiles. The rest of the bench stays off the public site so student-tutor matches stay confidential.

Sarah C., PhD

PhD CS

graph algorithms (BFS, DFS, Dijkstra, Bellman-Ford, MST); dynamic programming (top-down memoization, bottom-up tabulation); PyTorch autograd debugging; +4 more

1,200+ assignments completed

Marcus W., MS CS

MS CS

C memory management (malloc/free discipline, valgrind traces); C++ RAII and modern ownership patterns (unique_ptr, shared_ptr, move semantics); pthreads concurrency (mutex, condvar, rwlock, race-condition isolation); +5 more

980+ assignments completed

Active Bench

20+ verified CS graduates

Behind the four named profiles is a wider matching bench. Submissions auto-route by subject, language, and timezone. The public profiles cover the most-requested specializations; the rest of the roster stays unpublished so student-tutor pairings stay private.

Work With James?

Submit your assignment with James in mind. We will route the request to the best-fit tutor based on subject, language, and current load. Average first reply inside 30 minutes during business hours.
