How to migrate large projects with AI without context rot

Teaching AI coding assistants about giant projects without hitting context limits. Reframing from a prompt problem to a knowledge encapsulation problem through declarative contracts.

The Problem: Why Your LLM Can't Just "Read" Your Repo

You have a big project with features you want to replicate in a new app.

How do you teach an AI coding assistant (like Cursor, Gemini, or ChatGPT) to reproduce some of those features?

Suppose you take the whole project structure of the original and feed it into a prompt: you will hit the context limit and context rot.

Suppose you take a snippet of the feature's file and ask the AI to replicate just that file: you run into a couple of problems:

  • you would need to know the file and all of its dependencies.
  • you would also need to know all of the references for the file, in terms of file tree placement and data architecture.

Attempting to teach an AI coding assistant (like Cursor, Gemini, or ChatGPT) about a giant project by pasting the entire codebase, or even just large snippets, fails predictably due to three constraints:

  1. Context Limit: The AI hits the maximum token limit, leaving most of your project unseen.
  2. Context Rot (Semantic Drift): Even if the code fits, the volume dilutes the signal, making the AI lose focus on the core task.¹
  3. Dependency Hell: A single file snippet is useless without all its dependencies, references, architectural patterns, and file tree placement.

What to do?

Never dump raw project context (aka repo2txt) unless the repo is small enough, which is almost never the case.

The scalable, industry-grade method is to create declarative "contracts" of knowledge, not monolithic source code snapshots.


Part 1: Formalizing Architectural Awareness

The "teaching Cursor" step should be reframed from a prompt problem to a knowledge encapsulation problem.

What this means is you should define the system's intent and boundaries before showing the code. Let me show you what I mean by that:

1. Formalize the Feature Capsule

A feature should be taught to the AI as a high-level contract, not a raw implementation. This capsule carries the semantic boundaries without the raw code volume.

  • Interface Specification: the feature's inputs, outputs, and side effects. Analogy: an API specification.
  • Dependencies: what external modules, services, or libraries it calls. Analogy: the list of external services in a microservice contract.
  • Implementation Notes: a brief summary of why it behaves that way (e.g., "uses Redux for state," "implements optimistic locking"). Analogy: the design document.

The resulting artifact should be stored as plain text, like /ai_docs/features/login.md, ready for AI consumption.
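
As a rough sketch (every name and detail below is hypothetical; adapt the fields to your project), a login capsule might look like this:

# Feature Capsule: Login

Purpose: authenticate a user and establish a session.
Interface: signIn(email, password) returns a session token or an AuthError; emits an "auth:changed" event.
Dependencies: /src/core/api-client, /src/core/session-store.
Side effects: persists the token in local storage; redirects to /dashboard on success.
Implementation notes: uses Redux for auth state; retries once on network failure.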

2. Create a Code Map Manifest

OK, so how do you actually do that?

One way to do it is to use static analysis tools (e.g., depcruise for JS/TS) to generate a dependency graph out of your repo.

Then, pass the generated image to a multimodal AI that can handle images:

PROMPT:
Convert this graph into a simplified JSON manifest that summarizes:

* Which modules import which (the structural relationships).
* Which files are pure utilities versus those that touch the DB or UI.

This manifest gives the AI context on placement logic and dependency paths without needing the full source code.
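
The exact shape is up to you; with hypothetical file paths from a typical React project, a simplified manifest could look something like this:

{
  "modules": [
    { "file": "src/core/api-client.ts", "imports": [], "kind": "utility" },
    { "file": "src/features/login/session-store.ts", "imports": ["src/core/api-client.ts"], "kind": "data" },
    { "file": "src/features/login/useLogin.ts", "imports": ["src/features/login/session-store.ts"], "kind": "ui" }
  ]
}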

This is only an approximation: it will not capture every dependency, nor will it understand why each dependency matters for a given component or how it fits into your new project, but it is a start.

3. Extract Architectural Pattern Rules

Manually define your project's conventions in a single file, e.g., /architecture.md.

  • Example Rules:
    • UI components live in /src/components.
    • Business logic is isolated in /src/core.
    • All features expose useFeatureX() entrypoints.

This allows the AI to correctly infer where a new feature belongs, preventing long-term maintenance problems (the slow entropy that comes from ad hoc additions). An illustrative sketch of such a file follows.
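
For illustration only (the rules below are examples, not prescriptions), the file can be as simple as:

# architecture.md

- UI components live in /src/components; business logic is isolated in /src/core.
- Every feature exposes a single useFeatureX() entrypoint hook.
- Features never import each other directly; shared code goes through /src/core.
- Tests mirror the source tree under /__tests__.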

4. Encode Codegen Policies

Create a dedicated policy file (e.g., .cursorrules or .prompt-template) that injects declarative guidelines:

rules:
  - prefer functional components with hooks
  - follow existing folder conventions defined in architecture.md
  - all features must have matching tests under /__tests__
context:
  - architecture.md
  - deps.json

This transforms the AI from a code copier into a procedural generator that adheres to your established internal standards.


Part 2: Sequential Extraction for Scale

Generating the knowledge base is itself a scaling problem. You can summarize everything manually, breaking down each component for more control, but more experienced developers should look into automating the process of progressive decomposition.

1. Programmatic Boundary Detection

Instead of manual guesswork, use static analysis to detect cohesive module clusters. Basically, this goes back to the previously suggested idea of running depcruise on the repo.

npx depcruise src --output-type json > deps.json
# Run a script to cluster files based on intra-dependencies (e.g., 60% internal links).

Each resulting cluster represents a feature capsule candidate, marking a clear functional boundary.
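
A minimal sketch of such a clustering script, in TypeScript, is shown below. It assumes deps.json follows dependency-cruiser's JSON output (a modules array where each entry has a source path and a dependencies list with resolved paths), uses the first folder under src/ as a naive cluster boundary, and reuses the 60% figure from the comment above as the threshold. Treat it as a starting point, not a definitive implementation.

// clusterCapsules.ts: group files into candidate feature capsules and keep
// the groups that are mostly self-contained.
import { readFileSync } from "node:fs";

interface DepModule {
  source: string;                       // e.g. "src/auth/useLogin.ts"
  dependencies: { resolved: string }[]; // files this module imports
}

const { modules } = JSON.parse(readFileSync("deps.json", "utf8")) as {
  modules: DepModule[];
};

// Naive boundary: the first folder under src/ ("src/auth/x.ts" -> "src/auth").
const clusterOf = (file: string) => file.split("/").slice(0, 2).join("/");

const clusters = new Map<string, DepModule[]>();
for (const mod of modules) {
  const key = clusterOf(mod.source);
  const group = clusters.get(key) ?? [];
  group.push(mod);
  clusters.set(key, group);
}

for (const [name, mods] of clusters) {
  const files = new Set(mods.map((m) => m.source));
  let internal = 0;
  let total = 0;
  for (const mod of mods) {
    for (const dep of mod.dependencies) {
      total += 1;
      if (files.has(dep.resolved)) internal += 1;
    }
  }
  const ratio = total === 0 ? 1 : internal / total;
  // Clusters with >= 60% internal links are feature capsule candidates.
  if (ratio >= 0.6) {
    console.log(`${name}: ${files.size} files, ${Math.round(ratio * 100)}% internal`);
  }
}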

2. Auto-Summarize Each Cluster (The Sequential Loop)

For each detected cluster (a small, cohesive set of files), run an automated process:

  • Isolate: Load only the files belonging to the current cluster.

  • Extract: Pull top-level function signatures and module exports (the "interface surface").

  • Summarize: Feed the extracted interface surface and a fixed template into a powerful LLM:

Prompt:

Summarize this module group into a Feature Capsule. Include the Name, Purpose, Main Input/Output, Key Dependencies, Side Effects, and an Example Usage Pattern.

  • Store: Save the resulting markdown to /ai_docs/features/<slug>.md.
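
Roughly, the whole loop can be a small Node script. The sketch below (TypeScript) uses a crude regex to pull the export surface and leaves summarizeWithLLM as a placeholder for whichever model client you actually use; both the regex and the function are assumptions for illustration.

// updateFeatureCapsules.ts (sketch): turn one cluster of files into a
// Feature Capsule markdown file under /ai_docs/features.
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";

const CAPSULE_PROMPT =
  "Summarize this module group into a Feature Capsule. Include the Name, " +
  "Purpose, Main Input/Output, Key Dependencies, Side Effects, and an " +
  "Example Usage Pattern.";

// Crude "interface surface" extraction: keep only exported declarations.
function extractSurface(file: string): string {
  return readFileSync(file, "utf8")
    .split("\n")
    .filter((line) =>
      /^export\s+(default\s+)?(async\s+)?(function|const|class|interface|type)\b/.test(line)
    )
    .join("\n");
}

// Placeholder: wire this up to your LLM provider of choice.
async function summarizeWithLLM(prompt: string): Promise<string> {
  throw new Error("connect your model client here");
}

export async function buildCapsule(slug: string, files: string[]): Promise<void> {
  const surface = files
    .map((f) => `// ${f}\n${extractSurface(f)}`)
    .join("\n\n");
  const markdown = await summarizeWithLLM(`${CAPSULE_PROMPT}\n\n${surface}`);
  mkdirSync("ai_docs/features", { recursive: true });
  writeFileSync(`ai_docs/features/${slug}.md`, markdown);
}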

The intention is to let the LLM encode system knowledge without embedding the code itself. Generating code later is the easier part; the hard part is getting the system knowledge and design right, and doing it within context limits.

3. Continuous Maintenance and Indexing

  • Indexing: Generate an index.json file that maps each feature capsule ID to its corresponding source files and dependencies. This allows the AI tooling to quickly search and retrieve only the necessary capsules for any given task (a sketch of such an index follows the CI example below).

  • Verification: Manual verification passes are performed only on the most critical capsules (auth, billing). Mark these as verified: true in their frontmatter.

  • CI/CD Integration: Integrate the capsule refresh process into your Continuous Integration (CI/CD) pipeline:

on:
  push:
    branches: [main]

jobs:
  update-capsules:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Generate Dependency Graph
        run: npx depcruise src --output-type json > deps.json
      - name: Update Feature Capsules
        run: node updateFeatureCapsules.js
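
For the index mentioned above, a small index.json (field names here are only a suggestion) is enough for retrieval tooling to work with:

{
  "login": {
    "capsule": "ai_docs/features/login.md",
    "sources": ["src/auth/useLogin.ts", "src/auth/session-store.ts"],
    "dependsOn": ["src/core/api-client.ts"],
    "verified": true
  }
}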

This transforms your AI documentation layer into a dynamic knowledge graph that evolves automatically with your codebase, ready to be queried by an LLM at any time without overloading its context.


Disclaimer

AI was used to assist with the writing of this article. Its main purpose was to help me phrase descriptions more precisely for the audience of a technical article.

Certain practices here were suggested by AI (e.g., depcruise), and I have not personally run enough projects through them to know their technical limits. All I know is that the approach has helped me with my problem, and I'm writing this mainly as a reference for myself if I ever run into the same problem again.

Experiment to test out this method's power

Suppose you want to truly test the power and limits of this method; my suggestion is to:

  1. Go to YouTube and search for an app tutorial.
  2. Go to the final repo, download it, and follow the instructions suggested here to generate /ai_docs/features and architecture.md.
  3. From the AI-generated /ai_docs/features and architecture.md, attempt to reproduce the app WITHOUT referencing YouTube.
  4. Verify whether the vibe-coded app matches, in functionality and ease of maintenance, the app built by following YouTube step by step.

Footnotes

  1. If you ever had any doubts in the past, computer science theory has now confirmed that multi-tasking is as unproductive for bots as it is for humans.