How We Parse SQL in the Browser
SchemaLens runs entirely in your browser. No server receives your schema dumps. No data leaves your machine. That is not just a privacy promise—it is an architectural constraint that shaped every technical decision we made.
The heart of the tool is a SQL parser that turns CREATE TABLE statements into structured JavaScript objects. Those objects are diffed, migrated, and rendered without ever touching a network request. Here is how we built it.
Why not use an existing parser?
Our first instinct was to pull in node-sql-parser via CDN. It is mature, well-tested, and supports multiple dialects. We tried it. It worked—for simple schemas. Then we fed it a real production dump and watched it choke on:
- Composite primary keys with `CLUSTERED` hints
- Inline `FOREIGN KEY` references with `ON DELETE CASCADE`
- PostgreSQL `CREATE TYPE ... AS ENUM` statements
- MySQL `TINYINT(1)` rewritten from `BOOLEAN`
- SQL Server bracket-quoted identifiers like `[User Name]`
Each failure meant either patching the library (slow, fragile) or preprocessing the SQL (error-prone). Worse, the minified build was ~500 KB. For a tool whose entire value proposition is "paste and see results in 2 seconds," a 500 KB parser felt absurd.
Constraint: We only need to parse CREATE TABLE, CREATE INDEX, and CREATE TYPE. We do not need SELECT, JOIN, WINDOW, or any DML. A full SQL grammar is overkill.
So we wrote a custom parser. It is ~400 lines of vanilla JavaScript, zero dependencies, and handles everything we need.
The pipeline: from text to objects
Every schema dump goes through four stages:
- Strip comments — Remove `--` and `/* */` blocks so they do not confuse the tokenizer.
- Split statements — Break on semicolons, but respect semicolons inside string literals.
- Parse each statement — Turn a `CREATE TABLE` string into a structured table object.
- Normalize identifiers — Strip dialect-specific quotes so `` `users` ``, `"users"`, and `[users]` all become `users`.
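Stages 1–3 each get a code listing below; stage 4 is small enough to sketch here. The function name and exact regex are our illustration of the normalization step, not necessarily the shipped code:

```js
// Strip PostgreSQL double quotes, MySQL backticks, and SQL Server brackets
// so identifiers compare equal across dialects.
function normalizeIdentifier(id) {
  return id.replace(/["`\[\]]/g, '');
}
```

With this, `normalizeIdentifier('[User Name]')` and `normalizeIdentifier('"User Name"')` both yield `User Name`, so the diff engine never sees quoting differences.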
Stage 1: Strip comments
This is the simplest stage, but easy to get wrong. A naive regex like `/--.*$/gm` will break if a line contains a string with `--` inside it. Our approach is character-by-character:
```js
function stripComments(sql) {
  let out = '', i = 0;
  while (i < sql.length) {
    const ch = sql[i];
    if (ch === "'" || ch === '"') {
      // Copy string literals through verbatim, so a '--' or '/*' inside
      // a string is never mistaken for a comment.
      out += sql[i++];
      while (i < sql.length && sql[i] !== ch) out += sql[i++];
      if (i < sql.length) out += sql[i++]; // closing quote
    } else if (ch === '-' && sql[i + 1] === '-') {
      // Line comment: skip to end of line (the newline itself is kept)
      while (i < sql.length && sql[i] !== '\n') i++;
    } else if (ch === '/' && sql[i + 1] === '*') {
      // Block comment: skip to the matching */
      i += 2;
      while (i < sql.length && !(sql[i] === '*' && sql[i + 1] === '/')) i++;
      i += 2;
    } else {
      out += sql[i++];
    }
  }
  return out;
}
```
It is not elegant, but it is correct. Correct beats elegant when the input is an angry DBA's 3,000-line schema dump.
Stage 2: Split statements
Splitting on semicolons is also naive—what if a default value is a string containing a semicolon? We track whether we are inside a quoted string and only split when we are not:
```js
function splitStatements(sql) {
  const stmts = [];
  let current = '', inString = false, stringChar = null;
  for (let i = 0; i < sql.length; i++) {
    const ch = sql[i];
    if (!inString && (ch === "'" || ch === '"')) {
      inString = true; stringChar = ch;
    } else if (inString && ch === stringChar) {
      inString = false;
    } else if (!inString && ch === ';') {
      // Statement boundary; ignore stray semicolons between statements
      if (current.trim()) stmts.push(current.trim());
      current = '';
      continue;
    }
    current += ch;
  }
  if (current.trim()) stmts.push(current.trim());
  return stmts;
}
```
Stage 3: Parse CREATE TABLE
This is where most of the complexity lives. A CREATE TABLE statement has two parts: the table name and the body between the parentheses. We extract the body, then split it into column definitions and table-level constraints.
```js
function parseCreateTable(stmt, dialect) {
  const match = stmt.match(
    /CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?([\w"`\[\]]+(?:\.[\w"`\[\]]+)?)\s*\((.*)\)\s*[^)]*$/is
  );
  if (!match) return null;
  const tableName = match[1].replace(/["`\[\]]/g, ''); // strip dialect quoting
  const body = match[2];
  // ... split body, parse columns, parse constraints
}
```
The hardest part is splitting the body into individual columns. Commas inside default values (e.g., `DEFAULT ('a', 'b')`) or type definitions (e.g., `DECIMAL(10, 2)`) must not be treated as column separators. Our solution tracks parentheses nesting depth and only splits on commas at depth zero.
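A minimal sketch of that depth-tracking splitter, reusing the same string-tracking trick as `splitStatements` (the function name is ours; the shipped code may differ):

```js
// Split a CREATE TABLE body on commas, but only at parenthesis depth zero
// and outside string literals, so DECIMAL(10, 2) and DEFAULT ('a, b')
// stay in one piece.
function splitTopLevel(body) {
  const parts = [];
  let depth = 0, current = '', inString = false, stringChar = null;
  for (const ch of body) {
    if (!inString && (ch === "'" || ch === '"')) {
      inString = true; stringChar = ch;
    } else if (inString && ch === stringChar) {
      inString = false;
    } else if (!inString && ch === '(') {
      depth++;
    } else if (!inString && ch === ')') {
      depth--;
    } else if (!inString && ch === ',' && depth === 0) {
      parts.push(current.trim());
      current = '';
      continue;
    }
    current += ch;
  }
  if (current.trim()) parts.push(current.trim());
  return parts;
}
```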
Stage 4: Parse columns
Each column definition is a list of tokens. The first token is the name. Subsequent tokens form the type, followed by constraints:
```js
function parseColumn(tokens, dialect) {
  const col = {
    name: tokens[0].replace(/["`\[\]]/g, ''),
    type: '',
    nullable: true,
    defaultValue: null,
    primaryKey: false,
    unique: false,
    autoIncrement: false
  };
  // Collect type tokens until we hit a constraint keyword
  let i = 1;
  const typeTokens = [];
  const constraintKeywords = [
    'NOT', 'NULL', 'PRIMARY', 'KEY', 'UNIQUE',
    'DEFAULT', 'AUTO_INCREMENT', 'IDENTITY',
    'REFERENCES', 'CHECK', 'COLLATE', 'COMMENT'
  ];
  while (i < tokens.length) {
    const t = tokens[i].toUpperCase();
    if (constraintKeywords.includes(t)) break;
    typeTokens.push(tokens[i++]);
  }
  col.type = typeTokens.join(' ');
  // Parse constraints (NOT NULL, DEFAULT, PRIMARY KEY, etc.)
  while (i < tokens.length) {
    const t = tokens[i].toUpperCase();
    if (t === 'NOT' && tokens[i+1]?.toUpperCase() === 'NULL') {
      col.nullable = false; i += 2;
    } else if (t === 'PRIMARY' && tokens[i+1]?.toUpperCase() === 'KEY') {
      col.primaryKey = true; col.nullable = false; i += 2;
    } else if (t === 'DEFAULT') {
      i++;
      const def = [];
      while (i < tokens.length && !constraintKeywords.includes(tokens[i].toUpperCase())) {
        def.push(tokens[i++]);
      }
      col.defaultValue = def.join(' ') || null;
    } else {
      i++; // always advance; ... and so on for UNIQUE, AUTO_INCREMENT, REFERENCES, etc.
    }
  }
  return col;
}
```
Handling dialect quirks
Each database has its own syntax for the same concept. Rather than four separate parsers, we use a single parser with dialect-aware branches:
- PostgreSQL: `SERIAL`, `"quoted"` identifiers, `CREATE TYPE ... AS ENUM`
- MySQL: `AUTO_INCREMENT`, `` `backtick` `` quotes, `TINYINT(1)` as boolean
- SQLite: `INTEGER PRIMARY KEY AUTOINCREMENT`, limited `ALTER TABLE`
- SQL Server: `IDENTITY(1,1)`, `[bracket]` quotes, `CLUSTERED` hints
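One such branch, sketched. The function name and exact mapping are our illustration of the pattern, not the shipped code:

```js
// Map dialect-specific type spellings to a canonical type so the diff
// engine compares semantics rather than syntax.
function normalizeType(rawType, dialect) {
  const t = rawType.toUpperCase().replace(/\s+/g, ' ');
  if (dialect === 'mysql' && t === 'TINYINT(1)') return 'BOOLEAN';
  if (dialect === 'postgres' && t === 'SERIAL') return 'INTEGER'; // auto-increment tracked separately
  return t;
}
```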
The parser stores dialect on every parsed object, so the diff engine and migration generator can produce the correct output for the target database.
Diff engine: from objects to semantics
Once both schemas are parsed into object trees, diffing is straightforward:
- Compare table names. Any table in A but not B is "removed." Any table in B but not A is "added."
- For tables present in both, compare columns by name. Added columns, removed columns, and modified columns (type change, nullability change, default change) are flagged individually.
- Compare constraints structurally. A `PRIMARY KEY (id, org_id)` is parsed into `{ type: 'PRIMARY KEY', columns: ['id', 'org_id'] }`, so two constraints with different whitespace or ordering are still recognized as identical.
This structural diff is why SchemaLens beats a text diff. A line-based diff would scream at you for reformatting whitespace. A semantic diff tells you that `role_id` changed from `INT` to `BIGINT`—the thing that actually matters.
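The table-level pass can be sketched as follows, assuming parsed schemas shaped like `{ tables: { name: { ... } } }` (the function name and shape are our illustration):

```js
// Partition table names into added, removed, and common, per the
// comparison rules above. Column- and constraint-level diffs then run
// only on the common set.
function diffTables(a, b) {
  const diff = { added: [], removed: [], common: [] };
  for (const name of Object.keys(a.tables)) {
    if (name in b.tables) diff.common.push(name);
    else diff.removed.push(name); // in A but not B
  }
  for (const name of Object.keys(b.tables)) {
    if (!(name in a.tables)) diff.added.push(name); // in B but not A
  }
  return diff;
}
```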
Migration generation
The migration generator is essentially a big switch statement. For every detected change, it outputs the appropriate ALTER TABLE (or table recreation script for SQLite):
```sql
-- PostgreSQL
ALTER TABLE "users" ALTER COLUMN "role_id" TYPE BIGINT;

-- MySQL
ALTER TABLE `users` MODIFY COLUMN `role_id` BIGINT NULL;

-- SQL Server
ALTER TABLE [users] ALTER COLUMN [role_id] BIGINT;

-- SQLite: not supported. ALTER COLUMN does not exist; recreate the table with the new schema.
```
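The switch itself might look like this sketch (function name, dialect keys, and signature are our assumptions, not the shipped code):

```js
// Emit the dialect-appropriate statement for one column type change.
function alterColumnType(dialect, table, column, newType) {
  switch (dialect) {
    case 'postgres':
      return `ALTER TABLE "${table}" ALTER COLUMN "${column}" TYPE ${newType};`;
    case 'mysql':
      return `ALTER TABLE \`${table}\` MODIFY COLUMN \`${column}\` ${newType};`;
    case 'mssql':
      return `ALTER TABLE [${table}] ALTER COLUMN [${column}] ${newType};`;
    case 'sqlite':
      // No ALTER COLUMN in SQLite; the real generator emits a
      // create-copy-drop-rename script instead.
      return `-- SQLite does not support ALTER COLUMN. Recreate ${table} with the new schema.`;
  }
}
```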
Testing strategy
Because the parser runs in the browser, we test it with Node.js by mocking the DOM and extracting the script from app.html:
```js
const fs = require('fs');
const html = fs.readFileSync('app.html', 'utf8');
const script = html.match(/<script>([\s\S]*)<\/script>/)[1];
const fn = new Function(script + '; return { parseSQL, diffSchemas };');
const { parseSQL, diffSchemas } = fn();

// Now we can unit-test the parser headlessly
const schema = parseSQL('CREATE TABLE users (id INT PRIMARY KEY);', 'postgres');
console.assert(Object.keys(schema.tables).length === 1);
```
This lets us run CI-style tests without a browser environment. We validate parsing, diffing, and migration generation for all four supported dialects on every commit.
Performance
The parser processes ~1,000 lines of SQL in under 10 milliseconds on a modern laptop. For a tool that parses two schemas and diffs them, the total runtime is typically under 50 ms. There is no perceptible delay between clicking "Compare" and seeing results.
Because there are no network round trips, the entire experience—from paste to diff to migration copy—feels instantaneous. That is the advantage of doing everything client-side.
Tradeoffs and limitations
Our parser is intentionally narrow. It does not handle:
- `CREATE VIEW`, `CREATE TRIGGER`, or `CREATE FUNCTION`
- Partitioned tables
- Complex `CHECK` constraints with subqueries
- Stored procedures or PL/pgSQL blocks
For 95% of schema diff use cases, CREATE TABLE + CREATE INDEX + constraints is enough. When we hit the other 5%, we add support incrementally rather than pulling in a full grammar parser that would bloat the bundle and slow down the parse.
Philosophy: Parse what people actually paste. Not what the SQL standard says they should paste.
See the parser in action
Paste two SQL schemas and watch the diff engine work in real time. No install, no signup, no data leaves your browser.
Try SchemaLens Free

Written by the SchemaLens team. We build tools that make database migrations less terrifying.