Ahmet Baris Gunaydin
Senior full-stack consultant by day.
Independent researcher by night.
Consulting at Publicis Media (via bi:procsi / Azendo) on the dotstudio platform. Shipping research in the off-hours.
- 6 projects live
- 2 preprints
- 6 weeks, nights & weekends
Six weeks, six shipped — nights & weekends.
Day job is senior full-stack consulting. Everything below was built in the off-hours. Preprints on Zenodo, sites on Vercel, all source on GitHub — dates are first commits, not marketing.
-
markview first release
Embeddable markdown rendering stack goes public on GitHub: Next.js web app, React SDK, <mark-view> Web Component.
-
WebGPU benchmark codebase starts
First commits of what becomes gpubench.dev — browser-based GPU benchmark suite, one dispatch per test.
-
Kernel-fusion research code lands
Single-kernel fusion for sequential fitness evaluation — first working WGSL implementation, 720× over PyTorch baseline.
-
Preprints + sites — four things in one day
Two Zenodo preprints (720× kernel fusion, 458× transformer fusion) plus the kernelfusion.dev and gpubench.dev site launches — all first-committed the same day.
-
zero-tvm boots
Phi-3-mini chat working end-to-end at 27 tok/s — first WebLLM-compatible inference over hand-written WGSL.
-
safenpm first commit
Drop-in npm install replacement: sandboxed postinstall, supply-chain detection, decentralized alerts.
-
webgpu-dna monorepo opens
Geant4-DNA Monte Carlo ported to WebGPU — modular TypeScript + WGSL. CSDA within 2%, 46/46 unit tests pass.
-
This site
Hub goes live. Everything above shipped, indexed, cross-linked. Still shipping.
Six live projects.
Open source, open data, open DOIs. Everything shipped runs in the browser or as a drop-in tool.
kernelfusion.dev
Single-kernel GPU fusion. Two published preprints. 720× sequential, 458× transformer.
- 720× sequential
- 458× transformer
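Why fusing many sequential steps into one kernel can pay off is easiest to see with a toy cost model. The sketch below is only an illustration, not the method from the preprints: the function names and the launch and compute costs are hypothetical, and real speedups depend on memory traffic, occupancy, and the baseline framework.

```python
def unfused_time_us(steps: int, launch_us: float, compute_us: float) -> float:
    """Total time when every sequential step is its own kernel launch."""
    return steps * (launch_us + compute_us)


def fused_time_us(steps: int, launch_us: float, compute_us: float) -> float:
    """Total time when all steps run inside a single kernel launch."""
    return launch_us + steps * compute_us


def fusion_speedup(steps: int, launch_us: float, compute_us: float) -> float:
    """Ratio of unfused to fused time under this toy model."""
    return (unfused_time_us(steps, launch_us, compute_us)
            / fused_time_us(steps, launch_us, compute_us))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the trend:
    # launch overhead dominates when per-step compute is tiny.
    for steps in (1, 10, 100, 1000):
        print(steps, round(fusion_speedup(steps, launch_us=10.0, compute_us=0.1), 1))
```

In this model the speedup grows with the number of fused steps and saturates near (launch cost + compute cost) / compute cost, which is why workloads with many tiny sequential steps benefit most.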
gpubench.dev
Real WebGPU benchmarks, live and growing. 592 compute runs + 170 transformer-fusion runs across 7 GPU vendors. Updated with every submission.
- 592 devices
- 2,865× Apple avg
zerotvm.com
Phi-3-mini in the browser on 10 hand-written WGSL kernels across 27 files. No TVM, no WebLLM, no compiler.
- 10 kernels
- 27 WGSL files
webgpu-dna
Geant4-DNA radiobiology port to WebGPU. CSDA 0.985× Geant4 reference. 46/46 unit tests pass.
- 0.985× vs Geant4
- 46/46 tests
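The 0.985× ratio and the "within 2%" claim elsewhere on this page are the same statement. A short check of the arithmetic, with helper names that are purely illustrative:

```python
def relative_deviation(ratio_to_reference: float) -> float:
    """Fractional distance of a measured/reference ratio from 1.0."""
    return abs(ratio_to_reference - 1.0)


def within_tolerance(ratio_to_reference: float, tolerance: float) -> bool:
    """True if the ratio deviates from the reference by less than `tolerance`."""
    return relative_deviation(ratio_to_reference) < tolerance


if __name__ == "__main__":
    ratio = 0.985  # CSDA ratio vs the Geant4 reference, as quoted on this page
    print(f"deviation: {relative_deviation(ratio):.1%}")
    print("within 2%:", within_tolerance(ratio, 0.02))
```

A ratio of 0.985 is a 1.5% deviation, which sits inside the 2% band.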
markview.ai
Embeddable markdown rendering stack. React SDK, Web Component, native macOS app. Shiki · Mermaid · KaTeX · MCP.
- 3 distributions
- 4 engines
safenpm.dev
Drop-in npm install that sandboxes postinstall, detects supply-chain attacks, shares alerts across a decentralized threat-intel network.
- 0 dependencies
- ∞ postinstalls sandboxed
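For context on what there is to sandbox: npm runs the preinstall, install, and postinstall scripts declared in a dependency's package.json during installation. The sketch below only lists those hooks from a manifest. It is not safenpm's implementation, and the function name is hypothetical.

```python
import json

# Lifecycle hooks npm runs automatically while installing a package.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_scripts(package_json_text: str) -> dict:
    """Return the install-time lifecycle scripts declared in a package.json string."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    if not isinstance(scripts, dict):
        return {}  # malformed "scripts" field: nothing to report
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


if __name__ == "__main__":
    # Made-up manifest for illustration only.
    sample = json.dumps({
        "name": "example-package",
        "scripts": {"test": "node test.js", "postinstall": "node setup.js"},
    })
    print(install_scripts(sample))
```

A package with no install-time hooks returns an empty dict, so any non-empty result marks code that would execute on install.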
Two preprints, public.
Zenodo DOIs. Reviewed-as-you-read style — source, data, and reproducers linked from each.
Single-kernel GPU fusion: 720× over PyTorch on sequential compute
Evolutionary-compute benchmark, M2 Pro + T4. WebGPU is 159×, JAX 172×, Triton 27×. CUDA hits 720× when the full history fuses into one kernel.
read → doi:10.5281/zenodo.19344277
Transformer fusion: 458× inference speedup via fused kernel dispatch
End-to-end transformer inference in one compute-shader dispatch. 458× over PyTorch eager, 1.92× browser overhead vs native.
read →
The language doesn't matter. The result does.
Listing frameworks is a 2015 résumé. The work on this page spans TypeScript, WGSL, Python, Rust, CUDA, and shell — I don't care which one a problem wants. I care whether the benchmark runs, the tests pass, and the domain resolves.
Code → benchmark → paper → domain, same week.
Two Zenodo preprints, six live sites, eight GitHub repos — first commit to public URL is typically under a week. Research without a deployed artifact is a claim, not a result.
I own the whole surface.
DNS, CI, edge caching, preview deploys, SEO, analytics, observability, SSO, billing — I've wired it all on this stack of projects and in prior senior roles. Nothing here sits behind an "it works on my laptop" excuse.
Numbers on this page are reproducible.
592 compute runs + 170 transformer runs live on gpubench, growing by the hour. Apple Silicon averages 2,865× fusion speedup. 46/46 tests on webgpu-dna. CSDA within 2% of Geant4 reference. If it's on this site, the code is on GitHub and the benchmark runs in your browser.
Where I've been shipping.
Day-job progression: founding engineer → senior consultant at a Big Six media holding group. The research line above happens alongside — nights, weekends, DOIs.
Senior Full-Stack Consultant · dotstudio, Publicis Media
Consulting on the dotstudio platform at Publicis Media — one of the Big Six global media / advertising holding groups. Placed via bi:procsi; employed by Azendo. Full-stack engineering across a multi-tenant ad-tech platform.
Founding Engineer · Balance Cash (balancecash.io ↗)
First engineering hire at a seed-stage treasury fintech in San Francisco. Built SmartSweeps — automated cash sweeps into U.S. Treasury-backed, AAAm money-market funds for modern finance teams and CFOs. SEC-regulated investment advisor, SOC 2 Type II, SIPC-insured to $500k. Backend, frontend, infra, and the compliance surface that made all three audit-passable.
Let's build something — or write a paper together.
Currently consulting full-time at Publicis Media on dotstudio (via bi:procsi / Azendo). Open to research collaboration and the right senior staff / principal conversation — for everything else, I reply within a day.