From Autocomplete to Apprentice: Training AI to Work in Our Codebase
5 Jun 2025

I’ve spent the last year turning large‑language‑model agents into productive teammates. Around 80 % of the code I ship nowadays is written by an LLM, yet it still reflects the unspoken project rules, patterns, and habits our team has baked into the environment. This post documents the recipe that took me there.


From Tab Autocomplete to Autonomous Agents

It started with a curiosity.

July 2024 – I installed Cursor and quickly fell in love with the autocomplete. After a year on GitHub Copilot, the Tab‑Tab‑Tab feature felt like a step change. It was like having a sharp intern finish your thoughts. The suggestions were fast, helpful, and mostly correct—as long as you gave it enough local context to make an educated guess.

Early 2025 – I gave Chat a try on a personal project. For the first time, I saw what it could do across a whole file. I had to tightly manage the context and double‑check everything, but something clicked. It wasn’t just useful—it was promising, and I felt in control of the results produced by the model.

March 2025 – I turned on Agent Mode, and my process fell apart at once. The Sonnet 3.7 model would charge ahead, cheerfully rewriting parts of the system it barely understood and hallucinating non-existent APIs. It was chaos: the kind that feels overwhelming at first, but is also oddly instructive if you pay attention. Debugging became a game of whack-a-mole. Some days, I spent more time undoing changes than moving forward. But under the mess, I saw potential. I started to understand why the model failed, and it all boiled down to context.

April 2025 to now – That’s when I discovered Cursor Rules. One by one, I started adding bits of project‑specific context to the system: naming conventions, testing quirks, deployment rituals. And just like that, the agent stopped acting like a rogue junior developer with full access and no supervision. It started to feel like a teammate with some tenure and reasonable understanding of the system, capable of implementing large, complex changes end‑to‑end without much involvement from my side.


Why Cursor Rules Matter

LLMs arrive pre‑trained on the internet. They don’t know your domain language, naming conventions, or deployment rituals. Cursor rules are small Markdown files that pin that tribal knowledge right next to the code. Add them, and the agent’s context window is always seeded with the right cues, ensuring your LLM partner starts each task aligned with your preferences.


My Six‑Step Onboarding Recipe

Step 1 – Start With an Empty Rulebook

When our team onboards an agent into our application, we skip the generic rulepacks. Every mature codebase bends the rules somewhere, and starting with an existing ruleset cements someone else’s preferences into your agent’s behavior, making it harder to steer.

Step 2 – Dump Context Into Chat

Open a fresh chat with a capable model (o3, Gemini 2.5, Anthropic Opus). Brain‑dump everything you know:

  • project purpose
  • domain terminology
  • architectural quirks
  • links to docs (@docs/…), READMEs, dashboards

Keep talking until you run dry, and don’t worry about structure. At times I’ve spent over an hour speaking nonstop into MacWhisper.

Step 3 – Generate the 000‑project‑info Rule

Ask the model to condense that chat into .cursor/rules/000-project-info.mdc and mark it Always. I use the numeric prefix so @0 autocompletes it later.
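
As a rough sketch of what the resulting file might look like, the script below writes a skeleton rule. It assumes Cursor's `.mdc` front matter uses the `description`, `globs`, and `alwaysApply` fields (worth checking against the Cursor docs for your version); the helper names `render_rule` and `write_rule` and the sample text are made up for illustration.

```python
from pathlib import Path


def render_rule(description: str, body: str, always: bool) -> str:
    """Return the text of a rule file: front matter followed by Markdown."""
    front_matter = [
        "---",
        f"description: {description}",
        "globs:",  # left empty: this rule is not tied to file patterns
        f"alwaysApply: {'true' if always else 'false'}",  # assumed field name
        "---",
    ]
    return "\n".join(front_matter) + "\n\n" + body.strip() + "\n"


def write_rule(root: Path, name: str, description: str, body: str, always: bool) -> Path:
    """Write a rule under <root>/.cursor/rules/ and return its path."""
    rules_dir = root / ".cursor" / "rules"
    rules_dir.mkdir(parents=True, exist_ok=True)
    path = rules_dir / name
    path.write_text(render_rule(description, body, always), encoding="utf-8")
    return path


if __name__ == "__main__":
    created = write_rule(
        Path("."),
        "000-project-info.mdc",
        "Project purpose, domain terminology, and architectural quirks",
        "# Project info\n\n(paste the condensed chat summary here)",
        always=True,
    )
    print(f"wrote {created}")
```

Passing `always=False` with a meaningful `description` would give the on-demand variant, since the agent presumably decides whether to load such a rule based on that description.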

Step 4 – Keep a Living Knowledge Base

If you’re still onboarding yourself into the project, this is where AI shines. Ask it every question you can think of: what does this part do, how are things usually named, why is this structured that way? Every time you discover something new together that feels like it might help an agent make better decisions, capture it. Either update your main project info file or create a new rule file for it.

Here are some rules I have created in most of my projects:

  • 001-tech-guidelines.mdc – languages, frameworks, linters, dependency conventions.
  • 002-testing-guidelines.mdc – how to run all tests, a single file, or one example; test types; preferred TDD style.
  • 003-data-model.mdc (Agent‑requested) – list of models, relationships, invariants (generated by having the model parse schema.rb and the app/models folder).

Mark the first two Always and the rest Agent‑requested so they load on demand. Some other things I find useful to include (in Agent‑requested mode):

  • Show the agent a page with API docs for an obscure dependency, then ask it to generate a rule explaining how to use that API.
  • For any unusual pattern within the codebase (an internal abstraction layer for a database or an external service, an internal library, etc.), explain to the agent why the abstraction exists and how it is used, point it at important pieces of relevant code (both implementation and usage), then ask for a rule guiding an AI model in using that piece of technology.
  • Internal tooling: explain all the tools the agent has available to do its job, and when and how to use them. Think linters, code quality and coverage controls, different types of tests, and other ways to get feedback on the quality of the AI’s solution.

Step 5 – Let the Agent Struggle (Then Capture the Lesson)

Pick a trivial task you already know how to implement. Let the agent attempt it. When it stumbles, nudge it forward, not by coding but by asking questions and pointing to clues. After the fix ships, ask:

Look back at the history of our conversation and see if you notice any patterns that would be helpful for an AI coding agent to know next time we work on something similar. Update your existing cursor rules or create new ones to persist your findings.

Then review and commit the changes. This will help the model get better at solving problems similar to what you have just done.

Step 6 – Rinse, Repeat, Refine

Repeat this for a few weeks and the model will start making fewer obvious mistakes, build things in ways that match your expectations, and often pre‑empt your next move before you’ve even typed it out.


One surprising effect of this process is how much of my tacit knowledge I’ve had to put into words: decades of habits, intuition, and project‑specific judgment calls now live in Markdown files. That knowledge doesn’t just help my agent; it lifts the whole team. As we all work with our agents, we start seeing them act on rules someone else introduced, surfacing insights and patterns we hadn’t shared before. It’s low‑friction knowledge transfer, and it works.


The Apprenticeship Model

At some point while documenting this process, I realized what it resembled: an apprenticeship. You’re bringing on a new team member, and instead of throwing manuals at them, you teach by pairing on real tasks. You guide, correct, explain. The model’s pre-training is its education, sure — but adapting it to your environment, your tools, your expectations — that part is still on us. That’s the job of a mentor, and that’s how I see this work now. Our job is changing and we may all eventually become PMs managing teams of AI agents, but today we need to be mentors first.


Interview: Inside Shopify’s Modular Monolith
16 Jun 2024

This is my interview with Dr. Milan Milanovic, originally published in his newsletter Tech World With Milan, where we discussed Shopify’s architecture, tech stack, testing, culture, and more.

1.  Who is Oleksiy?

I have spent most of my career in technical operations (system administration, later called DevOps, nowadays encompassed by platform engineering and SRE disciplines). Along the way, I worked at Percona as a MySQL performance consultant and then operated some of the largest Ruby on Rails applications in the world, all the while following the incredible story of Shopify’s development and growth.

Finally, after decades of work in operations, when a startup I was at got acquired by Elastic, I decided to move into software engineering. After 5 years there, I needed a bigger challenge, which felt like the right moment to join Shopify.

I started with the Storefronts group (the team responsible for Storefront themes, all the related infrastructure, and the Storefront rendering infrastructure) at Shopify at the beginning of 2022. Two years later, I can confidently say that Shopify’s culture is unique. I enjoy working with the team here because of a talent density I had never encountered before. Every day, I am humbled by the caliber of people I can work with and the level of problems I get to solve.

2.  What is the role of the Principal Engineer at Shopify?

Before joining Shopify, I was excited about all the possibilities associated with the Principal Engineer role. I was immediately surprised by how diverse the Principal Engineering discipline is at the company. We have a range of engineers here, from extremely deep and narrow experts to amazing architects coordinating challenging projects across the company. Even more impressive is that you have a lot of agency over what kind of Principal Engineer you will be, provided that the work aligns with the overarching mission of making commerce better for everyone. After 2 years with the company, I have found a sweet spot: I spend ~75% of my time doing deep technical work across multiple areas of Storefronts infrastructure and the rest on project leadership, coordination, etc.

3.  The recent tweet by Shopify Engineering shows impressive results achieved by your system. What is Shopify’s overall architecture?

The infrastructure at Shopify was one of the most surprising parts of the company for me. I have spent my whole career building large, heavily loaded systems based on Ruby on Rails. Joining Shopify and knowing upfront a lot about the amount of traffic they handled during Black Friday, Cyber Monday (BFCM), and flash sales, I was half-expecting to find some magic sauce inside. But the reality turned out to be very different: the team here is extremely pragmatic when building anything. It comes from Shopify’s Founder and CEO Tobi Lütke himself: if something can be made simpler, we try to make it so. As a result, the whole system behind those impressive numbers is built on top of fairly common components: Ruby, Rails, MySQL/Vitess, Memcached/Redis, Kafka, Elasticsearch, etc., scaled horizontally.

Shopify Engineering Tweet about the amount of traffic they handled during Black Friday

What makes Shopify unique is the level of mastery the teams have built around those key components: we employ Ruby core contributors (who keep making Ruby faster), Rails core contributors (improving Rails), MySQL experts (who know how to operate MySQL at scale), and we contribute to and maintain all kinds of open-source projects that support our infrastructure. As a result, even the simplest components in our infrastructure tend to be deployed, managed, and scaled exceptionally well, leading to a system that can scale to many orders of magnitude over the baseline capacity and still perform well.

4.  What is Shopify’s tech stack?

Given that databases (and stateful systems in general) are the most complex components to scale, we focus our scaling on MySQL first. All shops on the platform are split into groups, each hosted on a dedicated set of database servers called a pod. Each pod is wholly isolated from the rest of the database infrastructure, limiting the blast radius of most database-related incidents to a relatively small group of shops. Some of the more prominent merchants get their own dedicated pods, which guarantee complete resource isolation.

Over the past year, some applications started relying on Vitess to help with the horizontal sharding of their data.

On top of the database layer is a reasonably standard Ruby on Rails stack: Ruby and Rails applications running on Puma, using Memcached for ephemeral storage needs and Elasticsearch for full-text search. Nginx + Lua is used for sophisticated tasks, from smart routing across multiple regions to rate limiting, abuse protection, etc.

This runs on top of Kubernetes hosted on Google Cloud in many regions worldwide, making the infrastructure extremely scalable and responsive to wild traffic fluctuations.

Check the full Shopify tech stack at Stackshare.

A Pods Architecture To Allow Shopify To Scale (Source: Shopify Engineering)

What are Pods exactly?

The idea behind pods at Shopify is to split all of our data into a set of completely independent database (MySQL) clusters using shop_id as the sharding key to ensure resource isolation between different tenants and localize the impact of a “noisy neighbor” problem across the platform. 

Only the databases are podded since they are the hardest component to scale. Everything else that is stateless is scaled automatically according to the incoming traffic levels and other load parameters using a custom Kubernetes autoscaler.
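
To make the idea concrete, here is a toy sketch of routing by `shop_id`. The interview only says that `shop_id` is the sharding key; the explicit lookup table, the `PodDirectory` class, and the shop and pod numbers are assumptions for illustration, not a description of Shopify's actual routing code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PodDirectory:
    """Maps each shop to exactly one pod (an isolated MySQL cluster)."""

    shop_to_pod: dict[int, int] = field(default_factory=dict)

    def assign(self, shop_id: int, pod_id: int) -> None:
        """Place a shop on a pod (hypothetical provisioning step)."""
        self.shop_to_pod[shop_id] = pod_id

    def pod_for(self, shop_id: int) -> int:
        """Every query for a shop goes to that shop's pod and nowhere else."""
        return self.shop_to_pod[shop_id]

    def shops_on(self, pod_id: int) -> set[int]:
        """Shops sharing a pod: the blast radius of an incident on that pod."""
        return {shop for shop, pod in self.shop_to_pod.items() if pod == pod_id}


if __name__ == "__main__":
    directory = PodDirectory()
    directory.assign(shop_id=101, pod_id=1)
    directory.assign(shop_id=102, pod_id=1)
    directory.assign(shop_id=900, pod_id=7)  # a large merchant on a dedicated pod

    print(directory.pod_for(102))
    print(directory.shops_on(1))
```

In this toy model, a problem on pod 1 can affect only shops 101 and 102, while the merchant on pod 7 shares its database resources with nobody.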

5. Is the monolith going to be broken into microservices?

Shopify fully embraces the idea of a Majestic Monolith: most user-facing functionality people tend to associate with the company is served by a single large Ruby on Rails application called “Shopify Core.” Internally, the monolith is split into multiple components focused on different business domains. A lot of custom (later open-sourced) machinery has been built to enforce coding standards, API boundaries between components, etc.

The rendering application behind all Shopify storefronts is completely separate from the monolith. This was one of the cases where it made perfect sense to split functionality from Core because the job is relatively simple: load data from a database, render Liquid code, and send the HTML back to the user. That covers the vast majority of the requests it handles. Given the amount of traffic on this application, even a small improvement in its efficiency results in enormous resource savings. So, when it was initially built, the team set several strict constraints on how the code is written, what features of Ruby we prefer to avoid, how we deal with memory usage, etc. This allowed us to build a pretty efficient application in a language we love while carefully controlling memory allocation and the resources we spend rendering storefronts.

Shopify application components

In parallel with this effort, the Ruby infrastructure team (working on YJIT, among other things) has made the language significantly faster with each release. Finally, in the last year, we started rewriting parts of this application in Rust to improve efficiency further.

Answering your question about the future of the monolith, I think outside of a few other localized cases, most of the functionality of the Shopify platform will probably be handled by the Core monolith for a long time, given how well it has worked for us so far using relatively standard horizontal scalability techniques.

6. How do you do testing?

Our testing infrastructure is a multi-layered set of checks that allows us to deploy hundreds of times daily while keeping the platform safe. It starts with a set of tests on each application: your typical unit/integration tests, etc. Those are required for a change to propagate into a deployment pipeline (based on the Shipit engine, created by Shopify and open-sourced years ago).

Shopify overall infrastructure

During the deployment, a very important step is canary testing: a change will be deployed onto a small subset of production instances, and automation will monitor a set of key health metrics for the platform. If any metrics move in the wrong direction, the change is automatically reverted and removed from production immediately, allowing developers to figure out what went wrong and try again when they fix the problem. Only after testing a change on canaries for some time does the deployment pipeline perform a full deployment. The same approach is used for significant schema changes, etc.
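
The canary step described above can be sketched as a simple gate. This is a minimal illustration under stated assumptions: the function names, the metric names, the 10% threshold, and the single metrics reading are all hypothetical (the real automation watches metrics over a period of time), and none of this is Shipit's actual API.

```python
from typing import Callable, Mapping

Metrics = Mapping[str, float]


def canary_is_healthy(baseline: Metrics, canary: Metrics, max_increase: float = 0.10) -> bool:
    """Return True if no canary metric exceeds its baseline by more than max_increase.

    All metrics are treated as "lower is better" (error rate, latency).
    """
    for name, base_value in baseline.items():
        if canary[name] > base_value * (1 + max_increase):
            return False
    return True


def run_canary(
    deploy_canary: Callable[[], None],
    read_canary_metrics: Callable[[], Metrics],
    promote: Callable[[], None],
    rollback: Callable[[], None],
    baseline: Metrics,
) -> str:
    """Deploy to canaries, then either roll out fully or revert."""
    deploy_canary()
    if canary_is_healthy(baseline, read_canary_metrics()):
        promote()  # full deployment happens only after the canaries look healthy
        return "promoted"
    rollback()  # a metric moved in the wrong direction: revert automatically
    return "rolled back"


if __name__ == "__main__":
    baseline = {"error_rate": 0.01, "p95_latency_ms": 200.0}
    outcome = run_canary(
        deploy_canary=lambda: print("deploying to canaries"),
        read_canary_metrics=lambda: {"error_rate": 0.05, "p95_latency_ms": 210.0},
        promote=lambda: print("rolling out everywhere"),
        rollback=lambda: print("reverting change"),
        baseline=baseline,
    )
    print(outcome)
```

Passing the deploy, promote, and rollback steps in as callables keeps the decision logic testable without touching real infrastructure.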

7. How do you do deployments?

All Shopify deployments are based on Kubernetes (running on GCP), so each application is a container (or a fleet of containers) somewhere in one of our clusters. Our deployment pipeline is built on the Shipit engine (created by Shopify and open-sourced years ago). Deployment pipelines can get pretty complex, but it mostly boils down to building an image, deploying it to canaries, waiting to ensure things are healthy, and gradually rolling out the change wider across the global fleet of Kubernetes clusters.

Shipit also maintains the deployment queue and merges multiple pull requests into a single deployment to increase the pipeline’s throughput.
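
A toy model of that batching idea is below. It only illustrates why merging several pull requests into one deployment raises throughput; the `DeployQueue` class, its batch-size limit, and the pull request names are invented for this sketch and are not how Shipit is implemented.

```python
from __future__ import annotations

from collections import deque


class DeployQueue:
    """FIFO queue of pull requests that are deployed in batches."""

    def __init__(self, max_batch_size: int = 3) -> None:
        self.max_batch_size = max_batch_size
        self._pending: deque[str] = deque()

    def enqueue(self, pull_request: str) -> None:
        """Add a pull request to the back of the queue."""
        self._pending.append(pull_request)

    def next_batch(self) -> list[str]:
        """Pop up to max_batch_size pull requests, oldest first."""
        batch: list[str] = []
        while self._pending and len(batch) < self.max_batch_size:
            batch.append(self._pending.popleft())
        return batch


if __name__ == "__main__":
    queue = DeployQueue(max_batch_size=3)
    for pr in ["pr-1", "pr-2", "pr-3", "pr-4"]:
        queue.enqueue(pr)

    # Four pull requests ship in two deployments instead of four.
    print(queue.next_batch())
    print(queue.next_batch())
```

The trade-off of batching is that a failed deployment implicates every change in the batch, which is one reason to cap the batch size.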

Shipit open-source deployment tool by Shopify (Source)

8. How do you handle failures in the system? 

The whole system is built with a lot of redundancy and horizontal auto-scaling (where possible), which helps prevent large-scale outages. But there are always big and small fires to handle. So, we have a dedicated site reliability team responsible for keeping the platform healthy in the face of constant change and adversarial problems like bots and DDoS attacks. They have built many automated tools to help us handle traffic flashes and, if needed, degrade gracefully. Some interesting examples: they have automated traffic analysis tools helping them scope ongoing incidents down to specific pods, shops, page types, or traffic sources; then the team can control the flow of traffic by pod or shop, re-route traffic between regions, block or slow down requests from specific parts of the world, prioritize particular types of traffic, and apply anti-adversarial measures across our network to mitigate attacks.

Finally, each application has an owner team (or a set of teams) that can be paged if their application gets unhealthy. They help troubleshoot and resolve incidents around the clock (being a distributed company helps a lot here since we have people across many time zones).

9. What challenges are you working on right now in your team?

We have just finished a large project to increase the global footprint of our Storefront rendering infrastructure, rolling out new regions in Europe, Asia, Australia, and North America. The project required coordination across many different teams (from networking to databases to operations, etc.) and involved building completely new tools for filtered database replication (since we cannot replicate all of our data into all regions due to cost and data residency requirements), making changes in the application itself to allow for rendering without having access to all data, etc. This large effort has reduced latency for our buyers worldwide and made their shopping experiences smoother.

Next on our radar are further improvements in Liquid rendering performance, database access optimization, and other performance-related work.


Edge Web Server Testing at Swiftype
28 Apr 2018

This article was originally posted on the Swiftype Engineering blog.


For any modern technology company, a comprehensive application test suite is an absolute necessity. Automated testing suites allow developers to move faster while avoiding any loss of code quality or system stability. Software development has benefited greatly from the adoption of automated testing frameworks and methodologies. However, the culture of automated testing has neglected one key area of the modern web application serving stack: web application edge routing and multiplexing rulesets.

From modern load balancer appliances that allow for Tcl-based rule sets, to local or remotely hosted Varnish VCL rules, to the power and flexibility that Nginx and OpenResty make available through Lua, edge routing rulesets have become a vital part of application serving controls.

Over the past decade or so, it has become possible to incorporate more and more logic into edge web server infrastructures. Almost every modern web server has support for scripting, enabling developers to make their edge servers smarter than ever before. Unfortunately, the application logic configured within web servers is often much harder to test than that hosted directly in application code, and thus too often software teams resort to manual testing, or worse, customers as testers, by shipping their changes to production without edge routing testing having been performed.

In this post, I would like to explain the approach Swiftype has taken to ensure that our test suites account for our use of complex edge web server logic to manage our production traffic flow, and thus that we can confidently deploy changes to our application infrastructure with little or no risk.

Read the rest of this entry

DbCharmer Development: I Give Up
14 Nov 2014

About 6 years ago (feels like an eternity in the Rails world), while working at Scribd, I started porting our codebase from some old version of Rails to a slightly newer one. That’s when I realized that there wasn’t a Ruby gem to help us manage MySQL connections for our vertically sharded databases (different models on different servers). I started hacking on some code to replace whatever we were using back then, finished the first version of the migration branch, and then decided to open the code for other people to use. That’s how the DbCharmer Ruby gem was born.

For the next few years, a lot of new functionality we needed was added to the gem, making it more complex and immensely more powerful. I enjoyed working on it, developing those features, and contributing to the community. But then I left Scribd, stopped being a user of DbCharmer, and the situation changed drastically. For quite some time (years) I kept fighting to make the code work with newer and newer versions of Rails, struggling to wrap my head around more and more (sometimes useless) abstractions the Rails Core team decided to throw into ActiveRecord.

Finally, in the last 2 years (while trying to make DbCharmer compatible with Rails 4.0), it has become more and more apparent that I simply do not want to do this anymore. I do not need DbCharmer to support Rails 4.0+, but it is very clear that many users do, and the constant nagging in the issues and on the mailing list asking for updates generated a lot of anxiety for me, anxiety I couldn’t do much about (the worst kind). As a result, since I simply do not see any good reason to keep fighting this uphill battle (and developing stuff like this for ActiveRecord IS a constant battle!), I officially give up.

Read the rest of this entry


Adding Custom Hive SerDe and UDF Libraries to Cloudera Hadoop 4.3
26 Jul 2013

Yet another small note about Cloudera Hadoop Distribution 4.3.

This time I needed to deploy some custom JAR files to our Hive cluster so that we wouldn’t need to run “ADD JAR” commands in every Hive job (especially useful when using the HiveServer API).

Here is the process of adding a custom SerDe or a UDF jar to your Cloudera Hadoop cluster:

  • First, we have built our JSON SerDe and got a json-serde-1.1.6.jar file.
  • To make this file available to Hive CLI tools, we need to copy it to /usr/lib/hive/lib on every server in the cluster (I have prepared an rpm package to do just that).
  • To make sure Hive map-reduce jobs would be able to read/write JSON tables, we needed to copy our JAR file to /usr/lib/hadoop/lib directory on all task tracker servers in the cluster (the same rpm does that).
  • And last, really important step: To make sure your TaskTracker servers know about the new jar, you need to restart your tasktracker services (we use Cloudera Manager, so that was just a few mouse clicks ;-))
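
The copy steps above can be sketched for a single host as follows. The two target directories come from the list above; the `install_jar` helper is hypothetical, and in the setup described here an rpm package did this job across the cluster. The script does not restart the TaskTracker service, so that last step still has to happen separately.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Directories named in the steps above.
HIVE_LIB = Path("/usr/lib/hive/lib")      # picked up by the Hive CLI tools
HADOOP_LIB = Path("/usr/lib/hadoop/lib")  # picked up by map-reduce tasks


def install_jar(jar: Path, target_dirs: list[Path]) -> list[Path]:
    """Copy the jar into each target directory and return the installed paths."""
    installed = []
    for directory in target_dirs:
        destination = directory / jar.name
        shutil.copy2(jar, destination)  # keeps timestamps and permission bits
        installed.append(destination)
    return installed


if __name__ == "__main__":
    # Needs write access to both directories (typically root).
    for path in install_jar(Path("json-serde-1.1.6.jar"), [HIVE_LIB, HADOOP_LIB]):
        print(f"installed {path}")
```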

And this is it for today.