Paid Summer 2026 role building probes, receipts, and environments for compute, cyber, and agent-infrastructure claims.
Stipend: $7,500. Location: New York City or San Francisco preferred; otherwise, you should be willing to relocate for the summer.
Ashiba Research is building probes and receipts for meaningful technical claims: evidence systems that make compute, cyber, and agent-infrastructure claims specific, testable, and hard to overread.
We are looking for a strong CS undergraduate or early graduate student to help develop environments and probes for claims at the boundary of AI systems, cybersecurity telemetry, performance engineering, and distributed systems.
Claims we care about
"This tool action was authorized."
"This agent trace preserves the evidence needed for incident review."
"This kernel is faster than the baseline without violating correctness."
"This GPU node is safe to rerun on."
"This compute substrate behaves like the one the vendor claimed."
The work is to turn claims like these into concrete artifacts: logs, fixtures, checks, verdicts, probes, and bounded receipts.
How the work will feel
The work will be dynamic. We have a core direction, but the best project may move toward whichever research area produces the strongest artifact, user signal, or commercial wedge during the summer.
You will have wide latitude to work on commercially useful, potentially profitable areas of research, such as:
data and evidence artifacts for high-value operational claims;
RL/eval environments where agents must preserve or repair evidence;
policies and receipts for agents, robots, and other systems that take external actions;
GPU and compute probes for acceptance, health, performance, and settlement;
kernel-level correctness and performance experiments, including genuinely sick kernel tricks when they clarify a real claim.
The common thread is not a domain label. The common thread is turning an important technical claim into something measurable, bounded, and hard to fool.
You might work on
Agent/cyber telemetry environments for checking authorization, parser repair, prompt continuity, and hidden operational failures.
Compute probes for GPU health, kernel correctness, performance claims, and substrate behavior.
Deterministic scorers and validators for evidence bundles.
Small benchmark environments where agents learn to generate or evaluate probes.
Tools that turn messy logs into bounded receipts: supported, contradicted, unknown, or not applicable.
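To give a feel for what a bounded receipt could look like, here is a minimal Python sketch that checks the claim "This tool action was authorized" against a handful of log events. Everything in it is an assumption made for illustration, not a description of our actual tooling: the event schema (`kind`, `action_id`, `allowed`), the `Receipt` type, and the `check_authorized` function are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNKNOWN = "unknown"
    NOT_APPLICABLE = "not applicable"


@dataclass(frozen=True)
class Receipt:
    claim: str
    verdict: Verdict
    evidence: tuple  # the log events the verdict rests on


def check_authorized(action_id, events):
    """Check 'tool action <action_id> was authorized' against log events.

    Hypothetical schema: each event is a dict with 'kind' and 'action_id';
    'auth_decision' events also carry a boolean 'allowed'.
    """
    claim = f"tool action {action_id} was authorized"
    related = [e for e in events if e.get("action_id") == action_id]

    # If the action never ran, the claim does not apply to these logs.
    if not any(e.get("kind") == "tool_call" for e in related):
        return Receipt(claim, Verdict.NOT_APPLICABLE, ())

    # Only well-formed decisions count as evidence either way.
    decisions = [
        e for e in related
        if e.get("kind") == "auth_decision" and isinstance(e.get("allowed"), bool)
    ]
    if not decisions:
        # The action ran, but the logs say nothing about authorization.
        return Receipt(claim, Verdict.UNKNOWN, ())

    denials = [d for d in decisions if not d["allowed"]]
    if denials:
        return Receipt(claim, Verdict.CONTRADICTED, tuple(denials))
    return Receipt(claim, Verdict.SUPPORTED, tuple(decisions))


events = [
    {"kind": "tool_call", "action_id": "a1"},
    {"kind": "auth_decision", "action_id": "a1", "allowed": True},
    {"kind": "tool_call", "action_id": "a2"},
]
print(check_authorized("a1", events).verdict.value)  # supported
print(check_authorized("a2", events).verdict.value)  # unknown
```

The point of the sketch is the separation between "unknown" and "contradicted": missing evidence is reported as missing rather than silently counted for or against the claim.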
Good fit
Strong CS fundamentals.
Experience in AI/ML, systems, security, distributed systems, compilers, performance engineering, or competitive programming.
Comfort reading logs, writing tests, and being precise about what evidence does and does not prove.
Taste for small, sharp tools over vague demos.
A desire to work near frontier AI without just building another chatbot wrapper.
Nice to have
Python fluency.
Familiarity with GPUs, CUDA, PyTorch, Triton, distributed training, agent frameworks, evals, observability, or security telemetry.
Competitive programming, CTF, systems research, or serious open-source experience.
How to apply
Send a short note, a resume or GitHub link, and one example of a technical project you are proud of. Especially useful: something where you had to make a system measurable, debuggable, correct, or hard to fool.