HTML vs Markdown Blog Analysis | Generated by AI

Home 2026.06

Here’s my analysis of HTML vs Markdown for your blog, looking at your actual codebase:

The Anthropic Point

Anthropic observed that when Claude generates Artifacts, HTML produces significantly better visual output than Markdown. The reason is simple: Markdown has roughly 15 structural primitives (headings, lists, tables, blockquotes, code blocks, and so on). HTML is far more expressive: custom layouts, callout boxes, color, typography, embedded media, interactive elements. When an AI agent is designing for human eyes, HTML is the richer canvas.

Your Current Pipeline (What I Found)

Your note creation flow is:

Clipboard → LLM → Markdown file — ww/note/create_note_from_clipboard.py
Content goes through: clean_grok_tags() → clean_content() → generate_title() → format_front_matter() → write_note()
Output: .md files in notes/ with YAML frontmatter
Jekyll processes them via kramdown (GFM) through _layouts/post.html
The layout just places the rendered Markdown inside `<main id="content" class="main-content post-content" role="main">…</main>`, dumping it into the page as-is

Key files:

/Users/lzwjava/projects/ww/ww/note/create_note_utils.py — core utilities
/Users/lzwjava/projects/ww/ww/note/create_note_from_clipboard.py — main pipeline
/Users/lzwjava/projects/ww/ww/note/note_workflow.py — CLI workflow

The Tension You’re Feeling

Your notes use a Q&A format with tables, code blocks, lists, and blockquotes. Markdown handles that content adequately. But callout boxes, styled tables, expandable sections, and custom typography are out of reach unless you embed raw HTML in the Markdown.

The real problem: Jekyll’s pipeline is Markdown-first. The _config.yml has markdown: kramdown with input: GFM. The layout places the rendered output inside `<main id="content" class="main-content post-content" role="main">…</main>`. Frontmatter is YAML. The notes collection expects .md files.

Three Approaches (Ranked by Pragmatism)

Approach 1: HTML-in-Markdown (Minimal Change)

Jekyll already supports raw HTML inside .md files. kramdown passes it through. You can keep .md files, YAML frontmatter, and the entire pipeline — but have the LLM generate HTML content blocks instead of Markdown syntax.

Current note:

**Question:** How do I manage information sharing?

**Answer:**

## The Core Mental Model

| Tier | Close Friend | Spouse |
|------|-------------|--------|
| Salary | No | Yes |

HTML-enhanced note:

---
title: Managing Relationship Information Sharing
layout: post
...

<div class="qa-card">
<div class="question">How do I manage information sharing?</div>
<div class="answer">

<h2>The Core Mental Model</h2>

<table class="styled-table">
<thead><tr><th>Tier</th><th>Close Friend</th><th>Spouse</th></tr></thead>
<tbody><tr><td>Salary</td><td>❌ No</td><td>✅ Yes</td></tr></tbody>
</table>

<div class="callout info">Information flows inward freely, outward carefully.</div>

</div>
</div>

Pros: Zero pipeline changes. Keep frontmatter, .md extension, Jekyll collection, everything. Just add CSS classes to _sass/.

Cons: You’re writing HTML inside .md files — feels wrong semantically.
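
If the HTML block is assembled in code rather than written entirely by the LLM, it could look roughly like the sketch below. It assumes the class names from the example above (`qa-card`, `question`, `answer`, `styled-table`, `callout info`). The helper name `render_qa_card` and its parameters are hypothetical and not part of the existing pipeline.

```python
from html import escape


def render_qa_card(question, headers, rows, callout=None):
    """Render a Q&A note body as an HTML block (hypothetical helper)."""
    # Escape all text so stray '<' or '&' characters cannot break the markup.
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    parts = [
        '<div class="qa-card">',
        f'<div class="question">{escape(question)}</div>',
        '<div class="answer">',
        '<table class="styled-table">',
        f"<thead><tr>{head}</tr></thead>",
        f"<tbody>{body}</tbody>",
        "</table>",
    ]
    if callout:
        parts.append(f'<div class="callout info">{escape(callout)}</div>')
    parts += ["</div>", "</div>"]
    # Lines are joined without indentation, since indented lines in a
    # Markdown file risk being interpreted as code blocks.
    return "\n".join(parts)


card = render_qa_card(
    "How do I manage information sharing?",
    ["Tier", "Close Friend", "Spouse"],
    [["Salary", "No", "Yes"]],
    callout="Information flows inward freely, outward carefully.",
)
print(card)
```

The returned string can be placed directly under the YAML frontmatter of a .md note.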

Approach 2: Full HTML Files (Big Change)

Switch to .html files. Jekyll processes .html files too — it still runs Liquid templating and frontmatter on them. But:

create_filename() hardcodes .md extension (line 100 of create_note_utils.py)
notes_card.py does notes_path.glob("*.md") — would break
check_duplicate_notes.py likely globs *.md
_posts/ and notes/ both use .md
Every script that reads notes assumes markdown
The translation pipeline assumes .md
fix_liquid_raw_tags() in write_note() would need rethinking
fix_mathjax_in_file() and process_tables_in_file() in note_workflow.py operate on markdown

You’d need to change:

create_filename() — extension
format_front_matter() — still works (Jekyll frontmatter is the same)
write_note() — content generation
clean_content() — parsing logic
notes_card.py — glob pattern
check_duplicate_notes.py — glob + content extraction
note_workflow.py — post-processing pipeline
All translation scripts that read/write notes
PDF pipeline, audio pipeline

This is 15-20 files to change, plus all the content post-processing logic.
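
Much of that churn comes from hardcoded `*.md` globs. One way to soften it, sketched here under the assumption that notes live in a single flat directory, is a shared helper that accepts both extensions. The names `NOTE_SUFFIXES` and `iter_notes` are hypothetical and do not exist in the codebase.

```python
import tempfile
from pathlib import Path

# Assumption: these are the only two extensions worth supporting.
NOTE_SUFFIXES = {".md", ".html"}


def iter_notes(notes_path):
    """Return note files with any supported extension, sorted by path."""
    return sorted(
        p for p in Path(notes_path).iterdir()
        if p.is_file() and p.suffix in NOTE_SUFFIXES
    )


# Small demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a-en.md", "b-en.html", "ignore.txt"):
        (Path(tmp) / name).write_text("---\ntitle: x\n---\n", encoding="utf-8")
    found = [p.name for p in iter_notes(tmp)]
    print(found)
```

Scripts such as notes_card.py could call a helper like this instead of globbing for one extension, though each script's content parsing would still need its own review.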

Approach 3: Hybrid — Keep Markdown for Storage, Generate HTML Layouts via AI (Recommended)

The Anthropic insight isn’t “stop using Markdown files.” It’s “when AI generates visual content, HTML gives better results.” In your context:

Keep .md files with YAML frontmatter — don’t fight Jekyll
Change the LLM prompt in create_note_from_clipboard.py to generate HTML content blocks instead of Markdown syntax
Add CSS to _sass/ for the HTML classes you want
Update _layouts/post.html to handle both markdown and HTML content gracefully

The pipeline change is minimal:

# In create_note_from_clipboard.py, change the prompt:
# (use "\n", not "\\n", so the prompt contains real newlines)
full_title_prompt = lambda c: (
    f"Give a short English title (at most 6 words, no quotes, no explanation) for:\n{c}\n\nTitle:"
)
# Add a content formatting prompt:
format_prompt = lambda c: (
    "Format the following as clean semantic HTML with these CSS classes: "
    "`.qa-card`, `.question`, `.answer`, `.callout`, `.styled-table`. "
    "Do NOT include <html>, <head>, <body> tags. "
    f"Only return the content HTML:\n\n{c}"
)

Then in write_note(), after the LLM generates HTML content, write it directly:

def write_note(file_path, front_matter, content):
    # content is already HTML from the LLM
    with open(file_path, "w", encoding="utf-8") as file:
        file.write(front_matter + "\n\n" + content + "\n")
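
Because the LLM may ignore the prompt's instructions, it can help to check the returned fragment before writing it. The sketch below uses only the standard library's `html.parser`. The names `FragmentChecker` and `check_fragment` are hypothetical, and the check is deliberately strict: it treats every non-void tag as requiring an explicit closing tag, which valid HTML does not always need (for example `<p>` or `<li>`).

```python
from html.parser import HTMLParser

FORBIDDEN = {"html", "head", "body"}  # tags the prompt asks the LLM to omit
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}  # tags with no closing tag


class FragmentChecker(HTMLParser):
    """Collect problems found in an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag in FORBIDDEN:
            self.errors.append(f"forbidden tag <{tag}>")
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check_fragment(html_text):
    """Return a list of problems; an empty list means the fragment passed."""
    checker = FragmentChecker()
    checker.feed(html_text)
    checker.close()
    return checker.errors + [f"unclosed <{t}>" for t in checker.stack]


print(check_fragment('<div class="qa-card"><p>ok</p></div>'))
print(check_fragment("<body><div><p>broken</div></body>"))
```

A caller could skip write_note() or fall back to the Markdown version whenever the returned list is non-empty.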

My Honest Take

The Anthropic observation is about AI-generated visual artifacts — dashboards, interactive diagrams, styled documents. Your notes are textual Q&A content. For your use case:

Markdown is fine for 90% of your notes (headings, lists, code blocks, tables, blockquotes)
HTML adds value for: styled callout boxes, better table presentation, visual hierarchy, collapsible sections
The overhead of full HTML conversion is massive and touches 15+ files

My recommendation: Approach 3. Keep the pipeline, change the prompt to emit HTML content blocks, add CSS. This gives you the visual richness without rewriting the entire toolchain. The .md file extension is mostly a convention here: kramdown passes block-level HTML in .md files through largely untouched, so the rendered result is close to what an .html file would produce.

The one thing you’d lose: fix_liquid_raw_tags() and fix_mathjax_in_file() assume markdown fences. If the LLM generates HTML, you’d need to adjust those post-processing steps. But that’s 2-3 functions, not 20 files.
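
As an illustration of what an HTML-aware version of the Liquid step might involve, the sketch below wraps Liquid-looking sequences in Liquid's `{% raw %}` tag so Jekyll emits them literally instead of evaluating them. This is an assumption about the goal, not a description of what fix_liquid_raw_tags() actually does. The name `wrap_liquid_in_raw` is hypothetical.

```python
import re

# An existing raw block, which must be left alone to keep the function idempotent.
RAW_BLOCK = r"\{%-?\s*raw\s*-?%\}.*?\{%-?\s*endraw\s*-?%\}"
# Liquid output tags {{ ... }} and logic tags {% ... %}.
LIQUID_TAG = r"\{\{.*?\}\}|\{%.*?%\}"
PATTERN = re.compile(f"(?P<raw>{RAW_BLOCK})|(?P<tag>{LIQUID_TAG})", re.DOTALL)


def wrap_liquid_in_raw(text):
    """Protect Liquid-looking sequences from being evaluated by Jekyll."""
    def repl(match):
        if match.group("raw") is not None:
            return match.group(0)  # already protected, keep unchanged
        return "{% raw %}" + match.group(0) + "{% endraw %}"

    return PATTERN.sub(repl, text)


sample = "<p>The layout calls <code>{{ content }}</code> once.</p>"
protected = wrap_liquid_in_raw(sample)
print(protected)
```

The regular expression does not understand HTML structure, so an unclosed `{{` in the input could match further than intended. Treat this as a starting point rather than a drop-in replacement.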
