The Squid Problem

I’m a polymer science PhD student. I started off in 2021 at an analytical chemistry company called AltaSci Labs. There I kept running into the same problem: I would perform an experiment, and months later I would stumble on suspiciously familiar data on a computer I hadn’t used before and realize the experiment had already been done years earlier, with the same result. The original record was buried in a pile of lab notebooks. Nobody has time at work to read every paper notebook and memorize every experiment that has ever been done. It was a failure of institutional memory in a high-turnover environment, and it wasted a ton of time and resources.

*Chiroteuthis veranyi*, a deep-sea squid. From Ernst Haeckel's *Kunstformen der Natur* (1904).

I call this the “squid problem” in business management. Over their lives, squid (more accurately, cuttlefish) reach the intelligence level of human children in some respects. They can use tools, and they can delay gratification. You would think a species so intelligent would have some sort of civilization under the sea, but we don’t see one: there is no Atlantis-style squid city. The lifecycle of a squid may be to blame. Reproducing is a death sentence for the parent, so it can never pass on knowledge to its offspring.

If your business operates this way, you are in serious trouble. Many can scrape by getting lucky with lifer employees, but in the modern age where job hopping is more prevalent than ever, you need to make institutional knowledge as accessible as possible.

The Journey

2021: Problem
2021: OneNote
2022: Grad School
2022: Notion
2022: Obsidian

OneNote: The First Attempt

I ended up creating a master lab notebook for the whole team in OneNote (using this paper), leveraging the company’s Microsoft 365 environment. Everyone transitioned to this ELN (electronic lab notebook), and we quickly reached the point where experiments were findable with Ctrl+F: what would have taken days of searching could be done in seconds. Suddenly instrument repairs and maintenance, experiment results, reports, and goals were all searchable and accessible. I left that company, and unfortunately left my lab notebook with it, but the new data practices kept going in my absence.

What Does the Perfect ELN Need?

I started grad school in 2022. I had my gripes with the OneNote system. The files are more or less locked into the OneNote cloud. You can export PDFs or local copies, but a lab notebook needs to be more robust than that. If Microsoft decides to shutter OneNote, the notebook as you use it is gone, and there is no drop-in alternative. Best-case scenario, they give you a generous grace period to move everything to new software, but that migration will be tedious and highly error-prone. I needed to come up with a system, and fast: something that would leave me with absolutely no regrets in 4.5 years when I’m writing my thesis, and ideally one that I could onboard companies to after I graduate.

So what does the perfect ELN have that OneNote doesn’t?

True Ownership: Files are 100% yours, zero vendor lock-in
Version Control: See who changed what, when, and why
Linking & Backlinks: Connect experiments, chemicals, papers
Extensibility: Plugins and customization
Cross-Platform: Works on mobile and desktop

Obsidian meets all five requirements (5/5).

Trying Notion

My first stop was Notion. It’s a beautiful piece of software, modern and very nice looking, and EDU users get a premium account for free. However, there were cracks. Notion suffers from its cloud-first approach. While you can manually export your data, there’s no officially supported way to do it automatically. There are scary stories online of people suddenly getting banned from Notion with little communication and losing all their notes in the process. Even if you follow the rules, if someone were to hack your account and violate Notion’s ToS, your entire notebook is at risk. Also, dormant notebooks are deleted after 5 years, which means the normal practice of archiving a notebook after an employee or student leaves is unsupported.

Why Obsidian

With Notion out, I looked to a budding personal notes application called Obsidian. Released in 2020, it met all of the criteria. It uses Markdown, which gives rich formatting that is easy to write and still fully readable in a text editor from Windows 98. It is local-first, cloud-optional. It supports linking, which is great for a lab notebook. It supports community plugins for extending functionality. And because a vault is just a folder of plain files, it works with Git for version control.

Architecture

I quickly got to work gaming this out and established a methodology.

Sync & Backup Architecture — v1

%%{init: {'theme':'dark'}}%%
flowchart LR
    iPhone[fa:fa-mobile-alt iPhone]:::device
    Mac[fa:fa-laptop-code MacBook]:::hub
    GitHub[fa:fa-code-branch GitHub]:::cloud
    NAS[fa:fa-server Group NAS]:::local

    iPhone <-->|iCloud| Mac
    Mac -->|git push| GitHub
    Mac -->|rsync| NAS

    classDef hub fill:#0EA5E9,stroke:#0284C7,color:#fff,font-weight:bold
    classDef device fill:#334155,stroke:#475569,color:#E2E8F0
    classDef cloud fill:#8B5CF6,stroke:#7C3AED,color:#fff
    classDef local fill:#059669,stroke:#047857,color:#fff

This architecture follows the 3-2-1 rule for data integrity: keep at least three copies of your data, on two different kinds of media, with one copy offsite. Followed properly, this makes your data very hard to lose.

In this case we had two cloud copies (iCloud + GitHub) and three local copies (Mac + phone + NAS). An important thing to remember here is that backup ≠ sync. If you have sync but no backup, any data issue will be synced to all of your devices and overwrite your valuable data.
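The 3-2-1 rule is easy to state and easy to get subtly wrong, so here is a minimal sketch of checking an inventory against it. The field names, the media labels, and the split into "sync" and "backup" roles are my own reading of the diagram (two-way arrows as sync, one-way arrows as backup), not something the tools enforce.

```javascript
// Hypothetical inventory of where the vault lives. Field names and media
// labels are illustrative assumptions, not taken from any tool.
const copies = [
  { name: "MacBook", medium: "laptop SSD", offsite: false, role: "working" },
  { name: "iPhone", medium: "phone flash", offsite: false, role: "sync" },
  { name: "Group NAS", medium: "NAS disks", offsite: false, role: "backup" },
  { name: "iCloud", medium: "cloud", offsite: true, role: "sync" },
  { name: "GitHub", medium: "cloud", offsite: true, role: "backup" },
];

// Report each leg of the rule separately so a failure is easy to locate.
function check321(inventory) {
  const media = new Set(inventory.map((c) => c.medium));
  return {
    threeCopies: inventory.length >= 3,
    twoMedia: media.size >= 2,
    oneOffsite: inventory.some((c) => c.offsite),
    // Sync-only copies mirror corruption, so count true backups on their own.
    independentBackups: inventory.filter((c) => c.role === "backup").length,
  };
}

function satisfies321(inventory) {
  const r = check321(inventory);
  return r.threeCopies && r.twoMedia && r.oneOffsite;
}

console.log(check321(copies));
```

Counting `independentBackups` separately is the point of the exercise: a setup can pass 3-2-1 on paper while every copy is a live mirror of the same corrupted file.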

This architecture has remained mostly unchanged. The only change I’ve made is subscribing to Obsidian Sync for convenience. The original setup predates iOS’s “keep on device” feature, so notes would get offloaded to the cloud, which caused issues on my then-aging iPhone 12 Pro Max 128GB.

Sync & Backup Architecture — v2 (Current)

%%{init: {'theme':'dark'}}%%
flowchart LR
    iPhone[fa:fa-mobile-alt iPhone]:::device
    Mac[fa:fa-laptop-code MacBook]:::hub
    GitHub[fa:fa-code-branch GitHub]:::cloud
    NAS[fa:fa-server Group NAS]:::local

    iPhone <-->|Obsidian Sync| Mac
    Mac -->|git push| GitHub
    Mac -->|rsync| NAS

    classDef hub fill:#8B5CF6,stroke:#7C3AED,color:#fff,font-weight:bold
    classDef device fill:#334155,stroke:#475569,color:#E2E8F0
    classDef cloud fill:#0EA5E9,stroke:#0284C7,color:#fff
    classDef local fill:#059669,stroke:#047857,color:#fff

Plugins

In terms of add-ons, a few work well for me:

Citations (Essential)

Hooks up my Zotero library for in-text citations. "I did it this way because of X paper" becomes a clickable link to the paper's metadata page.

Custom Attachment Location (Essential)

Organizes images and attachments into tidy folders per note. No more dumping everything into one chaotic folder.

Git (Essential)

The backbone of version control. Auto-commits every minute, so every change becomes a diff. The repo doubles as a minute-resolution journal of what I touched and when.

Homepage

Sets a specific note as the landing page when you open the vault. Powers my dashboard with live queries.

Image Toolkit

Click-to-zoom, easy copying for presentations. Makes working with SEM images and plots much smoother.

Various Complements (Essential)

IDE-like autocomplete for everything. Start typing [[Chem and it suggests all your chemical pages. Huge productivity unlock.

Templater (Essential)

Templates with JavaScript in them. My recipe calculator asks for a target batch volume and emits a scaled reagent table on the fly. My group meeting template prompts me to pick presenters from a roster, then generates one subsection per name.

Dataview (Essential)

SQL-like queries you embed directly in notes. The dashboard uses it to surface recently-modified projects; the group meeting digest uses it to aggregate tasks I've completed over the last N weeks.

Meta Bind

Buttons, toggles, and inputs that live inside notes. Paired with Templater, one click on the dashboard scaffolds a new meeting note, files it in the right folder, and opens it in the active tab.

Tasks

Extends Obsidian's - [ ] checkboxes with query syntax. The dashboard pulls every open task across the vault; the digest shows what I completed since the last group meeting.

The Dashboard

The Homepage plugin lets me build a landing page with live queries. Mine shows recent project activity, a to-do list pulled from meeting notes, and quick-create buttons for new meetings.

Dashboard.md

Recent Active Project Notes

GEL-014 Gelest silicone cure study (2026-02-10)
SYL-008 Sylgard thermal series (2026-02-07)
PA-023 Crosslink density (2026-02-05)
OO-061 Quick viscosity check (2026-02-03)

To-Do

Rerun DSC on GEL-014 samples (due 2026-02-14)
Order more Sylgard 184 (due 2026-02-12)
Send SEM images to collaborator

Recent Meetings

2026-02-07 Adamson Meeting (Adamson Meetings)
2026-02-05 Group Meeting (Group Meeting)
2026-01-31 Adamson Meeting (Adamson Meetings)

New Meeting Notes

[New Advisor Meeting] [New Group Meeting]

The two buttons at the bottom of that dashboard (New Advisor Meeting and New Group Meeting) automatically set up meeting notes in one click. Each one is about six lines of YAML:

type: templaterCreateNote
templateFile: "Templates/Advisor Meeting.tmpl.md"
folderPath: "Meetings/Advisor"
fileName: "temp-advisor"
openNote: true

That’s the Meta Bind plugin wiring a button to a Templater template. Click, the template fires, the new note lands in the right folder, gets auto-renamed to YYYY-MM-DD Advisor Meeting by logic inside the template, and opens in the active tab. Group meeting notes ask me who is talking and automatically set up per-person sections.
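The renaming and per-person scaffolding described above boil down to two small string-building steps. This is a sketch of that logic as plain functions, not my actual template: the function names are hypothetical, and in a real template the title would presumably be handed to Templater's `tp.file.rename` and the sections appended to `tR`.

```javascript
// Format a Date as YYYY-MM-DD using the local calendar date.
function isoDate(date) {
  const y = date.getFullYear();
  const m = String(date.getMonth() + 1).padStart(2, "0");
  const d = String(date.getDate()).padStart(2, "0");
  return `${y}-${m}-${d}`;
}

// Build the final note title, e.g. "2026-02-07 Advisor Meeting".
function meetingTitle(date, kind) {
  return `${isoDate(date)} ${kind} Meeting`;
}

// One Markdown subsection per presenter, each with an empty bullet to fill in.
function presenterSections(names) {
  return names.map((name) => `## ${name}\n\n- \n`).join("\n");
}

// Hypothetical inputs: month index 1 is February in JavaScript dates.
console.log(meetingTitle(new Date(2026, 1, 7), "Advisor"));
console.log(presenterSections(["Alice", "Bob"]));
```

Putting the date first in the title means a plain alphabetical sort of the Meetings folder is also a chronological one.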

I use Meta Bind + Templater for meeting notes, weekly reviews, and the recipe calculator.

Bases: Finding What I’ve Abandoned

Obsidian added a feature called Bases in 2025 that turns any folder of notes into a database. Filters, formulas, sortable tables. No plugin to install, no SQL to learn. I haven’t seen anyone writing about it in an ELN context yet, but it’s the single biggest addition to the app in a year besides the CLI.

I use it for exactly one job so far: surfacing work I’ve quietly abandoned.

The formula is a single line:

formulas:
  days_idle: '(today() - file.mtime).days'

That column gets added to every row. Sort descending and the top of the list is a lineup of experiments I started with enthusiasm and forgot about six weeks later. Manuscripts that haven’t moved. Side projects that never got prototyped.

Grad school is mostly a series of things you meant to come back to. No plugin will force discipline, but a dashboard that says “you haven’t touched this in 83 days” removes the excuse of not noticing.

The companion view is a plain project-activity table sorted by file.mtime. It answers the question your advisor will actually ask you: what have you been doing this month? Nothing else in Obsidian answers it cleanly.
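For anyone who wants the same ranking outside Obsidian, the `days_idle` idea is just a subtraction and a sort. The sketch below assumes you already have each note's modification time in milliseconds (for example from a file listing); the note names are borrowed from the folder examples in this post and the dates are made up for illustration.

```javascript
const MS_IN_DAY = 24 * 60 * 60 * 1000;

// Whole days between a file's last modification and "now".
function daysIdle(mtimeMs, nowMs) {
  return Math.floor((nowMs - mtimeMs) / MS_IN_DAY);
}

// Most neglected notes first, mirroring a descending sort on days_idle.
function rankByIdle(notes, nowMs) {
  return notes
    .map((n) => ({ name: n.name, daysIdle: daysIdle(n.mtimeMs, nowMs) }))
    .sort((a, b) => b.daysIdle - a.daysIdle);
}

// Hypothetical modification dates, expressed in UTC to avoid timezone drift.
const now = Date.UTC(2026, 3, 21);
const sampleNotes = [
  { name: "GEL-014 cure kinetics", mtimeMs: Date.UTC(2026, 3, 18) },
  { name: "HG-003 swelling test", mtimeMs: Date.UTC(2026, 0, 28) },
  { name: "PA-024 aging study", mtimeMs: Date.UTC(2026, 2, 1) },
];

for (const row of rankByIdle(sampleNotes, now)) {
  console.log(`${row.daysIdle} days idle: ${row.name}`);
}
```

One caveat: modification time moves whenever a file is touched, including by a bulk rename, so a reorganization makes everything look freshly active for a while.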

How the Vault Is Organized

I spent a lot of time refining the folder structure to be as organized as possible. Luckily both Obsidian and Git handle reorganization gracefully. The current layout:

Lab-notebook/

Active Projects/ (hot)
Where the magic happens. Each project gets its own folder with sub-series organized by chemistry or approach.
    One-off/
        OO-055 viscosity check.md
        OO-056 solvent test.md
        OO-057 quick cure.md
    Silicone-Composites/ (main thesis)
        SYL-008 thermal series.md
        SYL-009 filler loading.md
        GEL-014 cure kinetics.md
        GEL-015 rheology.md
    Polymer-Additives/
        PA-023 crosslink density.md
        PA-024 aging study.md
    SEM/ (microscopy)
        2026-02-03 SEM session.md
        2026-01-15 SEM session.md
    Ideas.md (scratchpad)

Secondary Projects/
Lower priority work, side projects, and industry collaborations that don't need daily attention.
    Hydrogels/
        HG-003 swelling test.md
    Industry-Collab/ (NDA work)
        ACME-001 formulation.md
        ACME-002 scale-up.md

Archived/ (completed)
Completed or discontinued work. Never deleted: you never know when old data becomes relevant again.
    Epoxy-Project/ (2023)
    Failed-Catalyst-Series/ (2024)
    Undergrad-Mentee-Work/

Manuscripts/
Paper drafts, figures, analysis notes. Also where I keep peer reviews of others' work.
    Thermal-Conductivity-Paper/
        Draft-v3.md
        Figure-notes.md
        Reviewer-responses.md
    Peer-Reviews/
        ACS-Macro-2025-review.md
        Polymer-2024-review.md

Meetings/
Every meeting gets a note. Templates auto-fill date, attendees, and action item sections.
    Advisor/
        2026-02-07 Advisor Meeting.md
        2026-01-31 Advisor Meeting.md
        2026-01-24 Advisor Meeting.md
    Group-Meeting/
        2026-02-05 Group Meeting.md
        2026-01-29 Group Meeting.md
    Seminars/
    To-Do.md (aggregated)

Literature/ (142 papers)
Zotero integration. Each paper gets a note with metadata, and linking lets me see everywhere I've cited it.
    @smithThermalConductivity2023.md
    @chenSiliconeComposites2022.md
    @patelCrosslinkedNetworks2024.md
    @jonesPolymerRheology2021.md
    @wangFillerDispersion2023.md
    ... 137 more

Atomic Concepts/ (Zettelkasten)
Reusable knowledge snippets. One concept per note, linked everywhere for emergent connections.
    Crosslink density.md
    Percolation threshold.md
    Glass transition.md
    Platinum cure mechanism.md
    Filler aspect ratio.md

Technical Info/
Reference library. Every chemical, instrument, and software tool I use has a dedicated page.
    Chemicals/
        Sylgard 184.md (+ aliases)
        Heptane.md
        Karstedt catalyst.md
        DMS-V21.md
    Instruments/
        DSC Q2000.md
        Blue oven.md (also "70C oven")
        Rheometer AR-G2.md
        SEM Quanta 250.md
    Software/
        ImageJ macros.md
        TRIOS tips.md

Lab Management/
Equipment inventory, group member pages, and lab inspection checklists.
    Equipment/
        Vacuum oven.md
        Speedmixer DAC 150.md
        Fume hood 3.md
    Group-Members/
        Alice (undergrad).md
        Bob (postdoc).md
    Safety inspection checklist.md

Templates/ (power user)
Pre-built templates that auto-fill metadata. The recipe calculator even scales formulations automatically.
    Advisor Meeting.tmpl.md
    Group Meeting.tmpl.md
    Experiment Note.tmpl.md
    Recipe Calculator.tmpl.md
    Chemical Page.tmpl.md

Dashboard.md (start here)
The landing page. Live queries show recent projects, to-dos, and quick-create buttons.

The general lifecycle of a project is to start in either Active or Secondary Projects, depending on priority. The project gets its own folder. Inside that folder are the different variants of the project, each identified by a sample code: a short letter prefix followed by a 3-digit number.

The prefixes are typically organized by chemistry, vibe, or how I’m seeing the project. For example, SYL is for Sylgard-based silicones and GEL is for Gelest.

Example sample codes: GEL-012 is the 12th in a series of Gelest silicone experiments. OO-055 is the 55th one-off experiment I’ve done.
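Because the sample code lives in the filename, picking the next number in a series can be automated. The sketch below is one way to do it, assuming prefixes of two to four capital letters (which covers OO, SYL, and ACME in the examples here); the function names and the pattern itself are my illustration, not part of any plugin.

```javascript
// Prefix of 2-4 capital letters, a hyphen, then exactly three digits.
const SAMPLE_CODE = /^([A-Z]{2,4})-(\d{3})\b/;

// Pull the prefix and sequence number out of a note name, or null if absent.
function parseSampleCode(noteName) {
  const match = SAMPLE_CODE.exec(noteName);
  if (!match) return null;
  return { prefix: match[1], number: parseInt(match[2], 10) };
}

// Next unused code in a series, zero-padded to three digits.
function nextSampleCode(noteNames, prefix) {
  let highest = 0;
  for (const name of noteNames) {
    const code = parseSampleCode(name);
    if (code && code.prefix === prefix) highest = Math.max(highest, code.number);
  }
  return `${prefix}-${String(highest + 1).padStart(3, "0")}`;
}

const existing = [
  "OO-055 viscosity check.md",
  "OO-056 solvent test.md",
  "OO-057 quick cure.md",
  "GEL-015 rheology.md",
  "Ideas.md",
];

console.log(nextSampleCode(existing, "OO"));
console.log(nextSampleCode(existing, "SYL"));
```

Taking the highest existing number rather than counting files means a deleted or archived note never causes a code to be reused.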

Anatomy of a Note

Notes can take many shapes and sizes in Obsidian. Some of my notes are quick and some are like books. If I make samples and test them in a variety of ways, the note will be very long, with multiple sections marked by Markdown headings (#) and separated with line breaks.

A quick one-off might look like this:

# OO-001 quick test

Testing effect of additive X on reaction Y

5.00g monomer
80mg initiator
~3 drops catalyst

Left in hood at RT at 2024-02-07 13:28

A common strategy I use is a formulation table. This can be trivially generated using an LLM and pasted into a notebook and filled out as ingredients are measured:

| Chemical      | Target (g) | Actual (g) |
| ------------- | ---------- | ---------- |
| Component A   | 5.00       |            |
| Component B   | 0.54       |            |
| Solvent       | 0.50       |            |
| DI Water      | 6.96       |            |

Linking

One of the benefits of Obsidian is linking. I can link not only related experiments to provide context, but chemicals as well. In my Technical Info folder I have a note for every chemical I use, with all the details, including a picture of the bottle. When I write an experiment note, I link [[Chemical X]] or [[Solvent Y]]. That lets me pull up, interactively, every single time I have used Chemical X or Solvent Y. If I wanted a list of every experiment in my graduate career where I used heptane, it would take me around 10 seconds to get it. It also provides provenance for future researchers reading my notebook: if they can see exactly which chemical I used and exactly where it came from, it might make their lives a lot easier.

Literature references work in a similar self-contained manner. My Zotero library syncs into Obsidian through the Citations plugin, and I can cite papers inline: “Added reagent X to modify the system as described in [[@authorPaperTitle2024]].” The paper has a page in the Literature folder with metadata including title, authors, and DOI. Clicking it shows me every note where I’ve referenced it.

Equipment also gets linked. For example, I have a page for our polymerization oven in the lab. Both “Blue oven” and “70C oven” link here, making it totally unambiguous what I’m writing about. A student in 20 years could read this note and see exactly what was meant by “blue oven” and see a photo of it.

Here is what the backlinks for one chemical look like, listing every experiment that ever used it:

Sylgard 184 (12 backlinks)

SYL-008 thermal series: "Mixed [[Sylgard 184]] base with 5wt% filler..."
SYL-009 filler loading: "Prepared [[Sylgard 184]] at 10:1 ratio..."
GEL-014 cure kinetics: "Control sample using [[Sylgard 184]]..."
OO-042 viscosity baseline: "Measured viscosity of [[Sylgard 184]]..."
OO-038 demolding test: "Cast [[Sylgard 184]] in aluminum mold..."
PA-019 adhesion study: "Compared [[Sylgard 184]] to..."

+ 6 more backlinks
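Backlinks are not magic: since every note is plain text, they can be recomputed by anyone with the files, even without Obsidian. A minimal sketch, assuming notes are held as a name-to-text map and that links are compared case-insensitively (both are my assumptions); it recognizes plain links, `|alias` links, and `#heading` links, but not aliases declared in a note's properties.

```javascript
// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Names of all notes whose text contains a wikilink to `target`.
function findBacklinks(notes, target) {
  // Matches [[Target]], [[Target|shown text]] and [[Target#Heading]].
  const link = new RegExp(
    `\\[\\[${escapeRegExp(target)}(?:[#|][^\\]]*)?\\]\\]`,
    "i"
  );
  return Object.entries(notes)
    .filter(([name, body]) => name !== target && link.test(body))
    .map(([name]) => name)
    .sort();
}

// Hypothetical vault contents, keyed by note name.
const vault = {
  "SYL-009 filler loading": "Prepared [[Sylgard 184]] at 10:1 ratio...",
  "OO-042 viscosity baseline": "Measured viscosity of [[Sylgard 184|Sylgard]]...",
  "OO-056 solvent test": "Swelled a cured film in [[Heptane]] overnight.",
  "GEL-015 rheology": "Plain mention of Sylgard 184 with no link.",
};

console.log(findBacklinks(vault, "Sylgard 184"));
console.log(findBacklinks(vault, "Heptane"));
```

The last entry in the sample vault shows why the link matters: a plain-text mention is invisible to this search, while a wikilink is not.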

Why I Don’t Do Zettelkasten

Every Obsidian tutorial on YouTube spends half its runtime selling you on the zettelkasten method: link everything to everything, build a ‘second brain’, watch the graph bloom. This is the wrong shape for a lab notebook.

Zettelkasten is for synthesis. You add ‘atomic concepts’ and tease connections out of them: for example, a note for ‘radical polymerization’ that explains it, and a note for ‘styrene’ that explains what styrene is. A lab notebook is a ledger: you write entries in the order you make them, you don’t go back and reorganize, and most entries will never be read again. Forcing the zettelkasten method onto a ledger ends up being a huge time sink. I know what radical polymerization is. I know what styrene is. There is no great utility in making a pretty graph of connected notes.

My graph has huge swaths of orphans and that’s fine. The things I actually link are the recurring ones: chemicals, instruments, papers. Those earn the wikilink because I’ll touch them again tomorrow. A specific experiment almost never gets linked back to, because almost nothing references it again unless there is some kind of iteration, e.g. ‘Modified version of [[SYN-007]] protocol, glycol instead of water’.

Tags have the same problem. Early on I tried tagging every concept and realized that tags operate the same as wikilinks but contain less information. A ‘heptane’ note is a lot more useful than a #heptane tag because I can record where the heptane was acquired, when it was purchased, what the purity was, etc. Folders and sample codes do the work instead. OO-055 tells you the project, the sequence number, and roughly when it happened. I didn’t have to pick any metadata for that; the filename did it.

The default Obsidian evangelism is written for note-taking hobbyists. Their goals are different from ours. Steal what you want, leave the rest.

Automation

One of the most annoying things about taking notes is writing the same thing over and over again. I solve this with Templater templates. I have templates for meeting notes that autofill date, time, attendees, and meeting type. The note pre-populates agenda, notes, and action items sections. Action items automatically go into my Tasks on my dashboard. I even have a recipe calculator that automatically scales a recipe to fill a specified volume. This is the template behind it, which prompts for a batch volume and generates a scaled formulation table:

<%*
/*──────────────── BASELINE FORMULATION ────────────────*/
const baseMass = {                   // grams
  "Silicone Base": 5.00,
  "Crosslinker": 0.55,
  "Solvent": 0.50,
  "DI Water": 7.00
};
const rho = {                        // densities, g mL⁻¹
  "Silicone Base": 0.97,
  "Crosslinker": 0.98,
  "Solvent": 0.68,
  "DI Water": 1.00
};
const filler_wt_pct = 7.0;           // wt % vs. dry polymer
/*───────────────────────────────────────────────────────*/

// Calculate baseline volume from mass and density
let V_base = 0;
for (const k of Object.keys(baseMass)) V_base += baseMass[k] / rho[k];

// Prompt user for target volume
const userInput = await tp.system.prompt(
  `Desired batch volume (mL)? [default: ${V_base.toFixed(2)}]`,
  V_base.toFixed(2)
);
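const parsed = parseFloat(userInput);
// Fall back to the baseline volume on blank, cancelled, or non-positive input
const V_target = parsed > 0 ? parsed : V_base;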

// Scale all components proportionally
const scale = V_target / V_base;
const dryPolymer = (baseMass["Silicone Base"] + baseMass["Crosslinker"]) * scale;
const m_filler = dryPolymer * filler_wt_pct / 100;

// Emit Markdown table directly into the note
tR += "| Chemical | Target (g) | Actual (g) |\n";
tR += "|----------|------------|------------|\n";
for (const [chem, mass] of Object.entries(baseMass)) {
  tR += `| ${chem} | ${(mass * scale).toFixed(2)} |  |\n`;
}
tR += `| Conductive Filler | ${m_filler.toFixed(2)} |  |\n`;
%>
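To sanity-check the arithmetic without opening Obsidian, the same scaling can be written as a plain function. This is a sketch that mirrors the template's logic rather than replacing it; `scaleRecipe` is a hypothetical name. Like the template, it assumes component volumes simply add and leaves the filler out of the volume calculation.

```javascript
// Scale a formulation to a target volume, mirroring the template's arithmetic.
function scaleRecipe(baseMass, rho, fillerWtPct, targetVolume) {
  // Baseline volume: sum of mass / density for each liquid component.
  let baseVolume = 0;
  for (const [chem, mass] of Object.entries(baseMass)) {
    baseVolume += mass / rho[chem];
  }

  const scale = targetVolume / baseVolume;
  const rows = Object.entries(baseMass).map(([chem, mass]) => [chem, mass * scale]);

  // Filler is dosed as wt % of the scaled dry polymer (base + crosslinker).
  const dryPolymer = (baseMass["Silicone Base"] + baseMass["Crosslinker"]) * scale;
  rows.push(["Conductive Filler", (dryPolymer * fillerWtPct) / 100]);

  return { baseVolume, scale, rows };
}

const baseMass = { "Silicone Base": 5.0, "Crosslinker": 0.55, "Solvent": 0.5, "DI Water": 7.0 };
const rho = { "Silicone Base": 0.97, "Crosslinker": 0.98, "Solvent": 0.68, "DI Water": 1.0 };

// Baseline volume is about 13.45 mL, so a 26.9 mL batch roughly doubles everything.
const batch = scaleRecipe(baseMass, rho, 7.0, 26.9);
for (const [chem, grams] of batch.rows) {
  console.log(`${chem}: ${grams.toFixed(2)} g`);
}
```

Keeping the calculation in a pure function also makes it easy to test a new baseline formulation before trusting it at the bench.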

Git Becomes the Journal

Obsidian Git auto-commits the vault every minute, with messages like vault backup: 2026-04-21 08:49:32. git log is now a minute-resolution record of when I was actually in the notebook. Every single change is version-tracked, and each push leaves a copy of the history with a third party. It is not strictly tamper-proof (commit timestamps are set locally, and history can be rewritten with a force push), but every commit is hashed against its parents, so altering an old entry changes every later hash. In practice, every discovery becomes part of a ledger that is very hard to fudge quietly.

Filter to a specific file and you get every time it was touched:

$ git log --oneline -- "**/*OO-023*"
0efaa11 vault backup: 2026-04-21 08:49:32
a3cf2d9 vault backup: 2026-04-16 11:22:08
b71ec2e vault backup: 2026-04-12 14:03:51
8ea1b77 vault backup: 2026-03-29 10:17:04

Every one of those commits is a diff. git diff tells me what changed between April 12 and April 16. git show reconstructs what a note looked like the day I submitted a manuscript.

Silent file damage is a real problem that any ELN system needs to solve. With a paper notebook, you can spill coffee on it, but that is immediate and noticeable. With an electronic notebook, a file issue can plausibly go unnoticed for months. If every single change is permanently recorded, hashed, and timestamped, noticing a file issue a year later no longer matters: the good version is still in the history.

The biggest gotcha is file size. GitHub blocks files larger than 100 MB. Git LFS exists, but it isn’t really a cost-effective backup solution. GitHub isn’t a backup provider. It’s primarily for code, and Git is at its best tracking changes in text files. That is perfect for this application, but you’re not going to be uploading videos. Pictures work nicely.

So you still need a real backup system. I have a cron job that automatically compresses my vault and rsyncs it to a backup server in a grandparent-parent-child rotation, giving me weekly backups for 4 weeks, monthly backups for 6 months, and yearly backups indefinitely.

That way I don’t lose anything ever.
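The rotation is the part people tend to get wrong, so here is a sketch of the pruning decision on its own. It is not my actual cron job: the cutoffs (28 days for the weekly tier, 183 days for the monthly tier) and the choice to keep the earliest backup in each month or year are assumptions made for illustration.

```javascript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Parse "YYYY-MM-DD" as midnight UTC so ages do not depend on the local timezone.
function utcMillis(dateStr) {
  return new Date(`${dateStr}T00:00:00Z`).getTime();
}

// Decide which dated backups survive a grandparent-parent-child rotation.
function selectBackupsToKeep(backupDates, today) {
  const now = utcMillis(today);
  const keep = new Set();
  const monthsCovered = new Set();
  const yearsCovered = new Set();

  // Oldest first, so the earliest backup in each month or year is retained.
  const sorted = [...backupDates].sort();

  for (const date of sorted) {
    const ageDays = (now - utcMillis(date)) / ONE_DAY_MS;
    const month = date.slice(0, 7);
    const year = date.slice(0, 4);

    if (ageDays <= 28) {
      keep.add(date); // child tier: every weekly backup
    } else if (ageDays <= 183) {
      if (!monthsCovered.has(month)) keep.add(date); // parent tier: one per month
      monthsCovered.add(month);
    } else {
      if (!yearsCovered.has(year)) keep.add(date); // grandparent tier: one per year
      yearsCovered.add(year);
    }
  }
  return sorted.filter((date) => keep.has(date));
}

// Hypothetical archive names reduced to their dates.
const archive = [
  "2024-01-07", "2024-06-02", "2025-03-02", "2025-11-02", "2025-11-09",
  "2026-01-04", "2026-01-11", "2026-03-29", "2026-04-05", "2026-04-12", "2026-04-19",
];
console.log(selectBackupsToKeep(archive, "2026-04-21"));
```

Anything not in the returned list is safe to delete under this policy. Running the selection as a dry run before wiring it to an actual delete is the cheap way to avoid pruning the wrong tier.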

Remember, backup IS NOT sync. If you only have a sync solution and all of your files become corrupted, you sync that corruption and lose everything.

Notes an AI Agent Can Actually Read

Claude Code can read a vault like this cover to cover. Whether that’s useful depends on how you wrote things down.

Two things I’ve added for this.

Provenance tagging. Claude will happily draft sections for you: literature summaries, analysis paragraphs, manuscript outlines. Six months later you won’t remember which paragraphs were yours. I mark AI-drafted text with a custom callout:

> [!claude] Claude Opus 4.7, 2026-04-21
> Based on the data in OO-055, the higher-loading samples
> show a sharp transition in conductivity between 8 and 12 wt%…

In Obsidian, that renders as:

Claude Opus 4.7, 2026-04-21

Based on the data in OO-055, the higher-loading samples show a sharp transition in conductivity between 8 and 12 wt%…

The CSS that produces it is about fifteen lines:

.callout[data-callout="claude"] {
    --callout-color: 204, 120, 50;
    background-color: rgba(204, 120, 50, 0.06);
    border-left-color: rgb(204, 120, 50);
}

.callout[data-callout="claude"] .callout-icon .svg-icon {
    display: none;
}

.callout[data-callout="claude"] .callout-icon {
    background-image: url("data:image/svg+xml,...");  /* Anthropic starburst */
    background-size: 20px 20px;
    background-repeat: no-repeat;
    background-position: center;
}

The full file (with the embedded starburst SVG) is here. Drop it in your vault’s .obsidian/snippets/ folder and toggle it on under Settings → Appearance → CSS snippets.

I can tell at a glance what came from a model and which model it was. When a future model reads the note, the callout signals that the section is interpretation rather than primary data.
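Because the callout is plain Markdown, a script can separate model-drafted text from my own just as easily as a reader can. A minimal sketch, assuming every AI block starts with a `> [!claude]` line and continues until the first line that does not start with `>`; the function name is hypothetical.

```javascript
// Split a note into human-written text and AI-drafted callout blocks.
function splitProvenance(markdown) {
  const human = [];
  const ai = [];
  let current = null; // the claude callout currently being collected, if any

  for (const line of markdown.split("\n")) {
    const header = /^>\s*\[!claude\]\s*(.*)$/i.exec(line);
    if (header) {
      current = { label: header[1].trim(), lines: [] };
      ai.push(current);
    } else if (current && line.startsWith(">")) {
      current.lines.push(line.replace(/^>\s?/, "")); // strip the quote marker
    } else {
      current = null; // any non-quoted line ends the callout
      human.push(line);
    }
  }

  return {
    human: human.join("\n"),
    ai: ai.map((c) => ({ label: c.label, text: c.lines.join(" ") })),
  };
}

// Hypothetical note mixing bench notes with one model-drafted paragraph.
const sampleNote = [
  "Conductivity measured on five loadings.",
  "",
  "> [!claude] Claude Opus 4.7, 2026-04-21",
  "> Based on the data in OO-055, the higher-loading samples",
  "> show a sharp transition in conductivity between 8 and 12 wt%…",
  "",
  "Repeat at 10 wt% next week.",
].join("\n");

console.log(splitProvenance(sampleNote));
```

Ordinary blockquotes and other callout types stay on the human side, since only the `[!claude]` header opens an AI block.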

A dedicated workspace folder. I keep a Claude/ directory with subfolders for Analysis, Drafts, Literature reviews, and scripts. Anything an agent writes at length lives there. The experimental notes stay what they are: records of what I did at the bench, not AI speculation commingled with data.

Andrej Karpathy described a related workflow on Twitter recently: Obsidian as the “IDE frontend” for an LLM-maintained wiki, agents researching and extending a directory of markdown files. His LLM is the primary author of the knowledge base. Mine is a marginal contributor to a human-authored one. Same substrate, different role. Plain markdown in a folder tree, versioned with git, viewed through Obsidian. That combination happens to be close to the best thing you can hand a modern agent.

Obsidian’s CEO Steph Ango (kepano) has published an official set of Claude skills for working with vaults: reading notes, editing markdown, running Bases queries, building canvases, extracting clean text from webpages. The app itself ships with an obsidian command-line tool that lets scripts and agents drive a live vault: run plugin commands, execute JavaScript in the app’s context, create and edit notes on demand. When the person running the product is wiring it into Claude himself, that’s a roadmap signal you can’t get from a press release.

The bigger point: RAG was always a hack. Context windows used to be too small to hold the data you wanted to ask questions about, so the industry built retrieval pipelines around the limit: chunking, embeddings, vector databases, reranking. Every one of those layers is a place to lose the right answer. Now that context windows are big enough, most of that machinery is being preserved by inertia alone.

Plain markdown, wikilinks, and git don’t make a good RAG corpus. They make a corpus you don’t need RAG for. Four years of lab notes in this format fit inside Claude’s context with room to spare. The format has been ready for decades. The tools caught up on March 13, 2026.

A Million Tokens, to Scale

A million tokens is about 750,000 words of novel-style prose. Scientific markdown is denser: my lab notes are 298,000 tokens at only 140,000 words — wikilinks, tables, and chemical names all cost more tokens per word than plain prose.
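The density claim is simple arithmetic on the figures above, and worth making explicit. The sketch below uses only numbers quoted in this post; the variable names are mine.

```javascript
// Tokens spent per word of text.
function tokensPerWord(tokens, words) {
  return tokens / words;
}

const prose = tokensPerWord(1_000_000, 750_000);  // novel-style prose
const labNotes = tokensPerWord(298_000, 140_000); // my lab notebook

// How much more expensive a word of lab notes is than a word of prose.
const densityRatio = labNotes / prose;

// Share of a 1M-token window the whole notebook occupies.
const windowShare = 298_000 / 1_000_000;

console.log(`prose: ${prose.toFixed(2)} tokens/word`);        // 1.33
console.log(`lab notes: ${labNotes.toFixed(2)} tokens/word`); // 2.13
console.log(`ratio: ${densityRatio.toFixed(2)}x`);            // 1.60x
console.log(`window used: ${(windowShare * 100).toFixed(1)}%`); // 29.8%
```

So a word of lab notes costs roughly 1.6 times as many tokens as a word of prose, and even at that rate four years of notes take up under a third of the window.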

Context, to scale

In one month, the ceiling went up 5×.

Before March 13, 2026, Claude could hold 200,000 tokens at once. After, one million. Here is what fits — and what doesn't.

A typical tweet: ≈ 70 tokens
This blog post: ≈ 10K tokens
My lab notebook (4 years): 298K tokens
War and Peace (full novel): 780K tokens
Harry Potter series (all 7 books): ~1.5M tokens (doesn't fit)

Bar widths scaled to 1.5M tokens. Token counts via o200k_base tokenizer on full source texts where available; Harry Potter total is word count × prose token ratio.

When the notebook is bigger than the context window, retrieval is a prompt engineering problem: you decide which documents to load. When the whole notebook fits, the model reads everything and finds what it needs. The work migrates from the prompt into the model.

What Does This Get You?

When I started drafting this post in February, the “what does this unlock?” section was speculative: one day you could point an AI agent at your notebook and have it find forgotten experiments. That day arrived while I was still writing.

Anthropic released 1M-token context generally on March 13, 2026. Following this release, I loaded years of my own notes into Claude Code in a single session and opened a conversation about a new project. Mid-conversation, unprompted, Claude surfaced SYN-007. Years ago I’d made that sample using a specific additive, and it came out conductive, which was a surprise at the time but didn’t seem worth pursuing. I logged the anomaly and moved on. Now I was revisiting that additive, and Claude had both the old note and this week’s question in the same context window. The match was obvious to Claude, but I had forgotten. Claude’s working memory reached back years through my own notes; mine didn’t. I hadn’t thought about SYN-007 in years. Even as Claude told me about it, I had literally no memory of it. To me, it might as well have been magic.

The squid problem, solved.

This is a small example of achieving superhuman memory over what I had been working on for the past few years.

Now scale that up. SYN-007 was one match in one conversation. At a company with hundreds of researchers and decades of archived notebooks, the matches are everywhere. The same additive gets screened in parallel by three groups because none of them knew the others had already tried it. A protocol gets re-derived from first principles because the person who figured it out in 2019 left. A toxicity test gets commissioned for a compound that was already tested and shelved six years ago. At scale, this is millions of dollars a year in duplicated work.

If you give an AI a god’s-eye view over every experiment a company has ever run, “I didn’t know” stops being an excuse. “Understandable you didn’t read a hundred old lab notebooks” becomes “you wasted company resources because you didn’t spend thirty seconds asking the model?” Negligence gets a new baseline and the bar is permanently raised.

Plenty of other things fall out of this. Thesis-writing gets dramatically easier when every experiment is one search away. You can hand Claude an open question, let it wander the vault for a few minutes, and come back to notes you would never have sat down to extract yourself. An instrument repair from three years ago is now findable by a group member who wasn’t even around when it happened. What is really unlocked here is a context-aware search. No matter what words were used when writing and what words are used to search, the LLM will make the connection.

But what really falls out of this is that the bar is moving up across the board. Forgotten experiments not making the cut on your thesis will cease to be acceptable. Not finding instrument maintenance logs from years ago will become gross incompetence. People who refuse to use AI due to their personal beliefs will be competing against people who have the rough equivalent of a personal research assistant that knows everything they’ve done and forgets nothing. The greatest incompetence of all will be taking research notes in a way that cannot be parsed by an LLM (i.e. in anything other than markdown files). Companies that refuse to adopt file-first electronic lab notebooks will be left behind. OCR will likely never work perfectly for handwriting, so handwritten lab notebooks will become a massive liability, even more so than they are today.

What About Privacy?

Loading your research into an AI raises a fair question: is Anthropic training on your data? Per their published policy, no. The API (which is what Claude Code uses) does not use your inputs or outputs for training unless you explicitly opt in. Enterprise and Team tiers layer additional contractual protection on top and let admins disable feedback submission entirely. Anthropic’s Trust Center lists the third-party audits that back the security substrate those policies run on: SOC 2 Type 2, ISO 27001, ISO 42001, HIPAA, NIST 800-171 among others.

This is all conditional on Anthropic being honest about their published policy. If you don’t extend that trust, the API tier isn’t for you. But the rest of the system still works. Plain markdown + git + Obsidian is entirely local. You’d swap Claude Code for a locally-hosted open-weights model (trading capability for full isolation) and keep everything else.

Your writing is likely already in Microsoft 365. That’s also a conditional trust relationship. If you’d put your data on OneDrive, the API trust question has roughly the same contours.

Caveats

This system is still not easy to set up. It’s not a turnkey solution for non-technical people. The best system is the one that gets used, and unfortunately this one is too complicated for someone who doesn’t want to touch a terminal.

There is a lot of untapped value here. If everything detailed here could be packaged up for someone in a way that a grandmother could use it, it would handily beat anything on the market. With added FDA 21 CFR Part 11 compliance, this would be THE product in the scientific space. Unfortunately, ELN providers are for the most part pay-for-play and are often locked in terms of access to your own files. There is no good solution. And unfortunately, my experience with researchers is that even if a perfect solution exists, is cheap/free, and turnkey, the “if it ain’t broke, don’t fix it” mentality will lead them to just do what they’ve always done.

Luckily, I’m in an environment where this level of experimentation is allowed. I’m confident that, if you’re willing to learn systems like Git, what I have demonstrated is one of the best ELN systems currently possible. If you take anything from this post, I hope it’s that it is your duty to solve the squid problem.

*Atlantis-style cuttlefish civilization.* AI-generated illustration, in the style of Ernst Haeckel's Kunstformen der Natur, depicting an imagined cuttlefish civilization with ornate underwater architecture and multiple cephalopods engaged in civic and scholarly activity. The plate Haeckel never painted, a civilization biology never permitted, now possible to at least picture.