HTML vs Markdown Blog Analysis | Generated by AI
Now I have a complete picture. Let me give you the analysis.
Here’s my analysis of HTML vs Markdown for your blog, looking at your actual codebase:
The Anthropic Point
Anthropic observed that when Claude generates Artifacts, HTML produces significantly better visual output than Markdown. The reason is simple: Markdown has ~15 structural primitives (headings, lists, tables, blockquotes, code blocks). HTML has unlimited expressiveness — custom layouts, callout boxes, color, typography, embedded media, interactive elements. When an AI agent is designing for human eyes, HTML is the richer canvas.
Your Current Pipeline (What I Found)
Your note creation flow is:
- Clipboard → LLM → Markdown file: ww/note/create_note_from_clipboard.py
- Content goes through: clean_grok_tags() → clean_content() → generate_title() → format_front_matter() → write_note()
- Output: .md files in notes/ with YAML frontmatter
- Jekyll processes them via kramdown (GFM) through _layouts/post.html
- The layout just does `{{ content }}`, which dumps the rendered markdown into the page
Key files:
- /Users/lzwjava/projects/ww/ww/note/create_note_utils.py: core utilities
- /Users/lzwjava/projects/ww/ww/note/create_note_from_clipboard.py: main pipeline
- /Users/lzwjava/projects/ww/ww/note/note_workflow.py: CLI workflow
The Tension You’re Feeling
Your notes are Q&A format with tables, code blocks, lists, blockquotes. Markdown handles this adequately for content. But if you want callout boxes, styled tables, expandable sections, custom typography — Markdown can’t do it without raw HTML embedded in the Markdown.
The real problem: Jekyll's pipeline is Markdown-first. The _config.yml has markdown: kramdown with input: GFM. The layout does `{{ content }}`. Frontmatter is YAML. The notes collection expects .md files.
Three Approaches (Ranked by Pragmatism)
Approach 1: HTML-in-Markdown (Minimal Change)
Jekyll already supports raw HTML inside .md files. kramdown passes it through. You can keep .md files, YAML frontmatter, and the entire pipeline — but have the LLM generate HTML content blocks instead of Markdown syntax.
Current note:
**Question:** How do I manage information sharing?
**Answer:**
## The Core Mental Model
| Tier | Close Friend | Spouse |
|------|-------------|--------|
| Salary | No | Yes |
HTML-enhanced note:
---
title: Managing Relationship Information Sharing
layout: post
...
<div class="qa-card">
  <div class="question">How do I manage information sharing?</div>
  <div class="answer">
    <h2>The Core Mental Model</h2>
    <table class="styled-table">
      <thead><tr><th>Tier</th><th>Close Friend</th><th>Spouse</th></tr></thead>
      <tbody><tr><td>Salary</td><td>❌ No</td><td>✅ Yes</td></tr></tbody>
    </table>
    <div class="callout info">Information flows inward freely, outward carefully.</div>
  </div>
</div>
Pros: Zero pipeline changes. Keep frontmatter, .md extension, Jekyll collection, everything. Just add CSS classes to _sass/.
Cons: You’re writing HTML inside .md files — feels wrong semantically.
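If the HTML is assembled in code rather than written entirely by the LLM, the card markup above could be produced by a small helper. This is a minimal sketch, not part of the existing pipeline: render_qa_card and its parameters are hypothetical names, and it assumes the answer body is already trusted HTML while the question and callout are plain text that needs escaping.

```python
from html import escape
from typing import Optional


def render_qa_card(question: str, answer_html: str, callout: Optional[str] = None) -> str:
    """Wrap a plain-text question and pre-rendered answer HTML in qa-card markup."""
    parts = [
        '<div class="qa-card">',
        # The question is plain text, so escape <, >, & and quotes.
        f'<div class="question">{escape(question)}</div>',
        '<div class="answer">',
        # The answer is assumed to be HTML already and is inserted as-is.
        answer_html,
    ]
    if callout:
        parts.append(f'<div class="callout info">{escape(callout)}</div>')
    parts += ["</div>", "</div>"]
    # No blank lines between the parts, so the card stays one contiguous HTML block.
    return "\n".join(parts)
```

Calling render_qa_card("How do I manage information sharing?", "<h2>The Core Mental Model</h2>") returns a fragment that can be written below the frontmatter of a .md note.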
Approach 2: Full HTML Files (Big Change)
Switch to .html files. Jekyll processes .html files too — it still runs Liquid templating and frontmatter on them. But:
- create_filename() hardcodes the .md extension (line 100 of create_note_utils.py)
- notes_card.py does notes_path.glob("*.md"), which would break
- check_duplicate_notes.py likely globs *.md
- _posts/ and notes/ both use .md
- Every script that reads notes assumes markdown
- The translation pipeline assumes .md
- fix_liquid_raw_tags() in write_note() would need rethinking
- fix_mathjax_in_file() and process_tables_in_file() in note_workflow.py operate on markdown
You’d need to change:
- create_filename(): extension
- format_front_matter(): still works (Jekyll frontmatter is the same)
- write_note(): content generation
- clean_content(): parsing logic
- notes_card.py: glob pattern
- check_duplicate_notes.py: glob + content extraction
- note_workflow.py: post-processing pipeline
- All translation scripts that read/write notes
- PDF pipeline, audio pipeline
This is 15-20 files to change, plus all the content post-processing logic.
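Before committing to this approach, it may be worth measuring how many scripts actually hardcode the extension. The sketch below is one way to do that, under the assumption that the assumptions show up as quoted string literals such as "*.md" in Python files; find_md_assumptions is a hypothetical helper, not something in the repo.

```python
import re
from pathlib import Path

# Matches a quoted string literal ending in .md, e.g. "*.md" or '.md'.
MD_LITERAL = re.compile(r"""["'][^"'\n]*\.md["']""")


def find_md_assumptions(root: str) -> dict[str, list[int]]:
    """Map each Python file under root to the line numbers mentioning '.md' in a string."""
    hits: dict[str, list[int]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        matched = [n for n, line in enumerate(lines, start=1) if MD_LITERAL.search(line)]
        if matched:
            hits[str(path)] = matched
    return hits
```

Running find_md_assumptions on the project root and counting the keys gives a rough file count to compare against the estimate above. It will miss extensions built up dynamically, so treat the result as a lower bound.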
Approach 3: Hybrid — Keep Markdown for Storage, Generate HTML Layouts via AI (Recommended)
The Anthropic insight isn’t “stop using Markdown files.” It’s “when AI generates visual content, HTML gives better results.” In your context:
- Keep .md files with YAML frontmatter; don't fight Jekyll
- Change the LLM prompt in create_note_from_clipboard.py to generate HTML content blocks instead of Markdown syntax
- Add CSS to _sass/ for the HTML classes you want
- Update _layouts/post.html to handle both markdown and HTML content gracefully
The pipeline change is minimal:
# In create_note_from_clipboard.py, change the prompt:
full_title_prompt = lambda c: (
    f"Give a short English title (at most 6 words, no quotes, no explanation) for:\n{c}\n\nTitle:"
)
# Add a content formatting prompt:
format_prompt = lambda c: (
    "Format the following as clean semantic HTML with these CSS classes: "
    "`.qa-card`, `.question`, `.answer`, `.callout`, `.styled-table`. "
    "Do NOT include <html>, <head>, <body> tags. "
    f"Only return the content HTML:\n\n{c}"
)
Then in write_note(), after the LLM generates HTML content, write it directly:
def write_note(file_path, front_matter, content):
    # content is already HTML from the LLM
    with open(file_path, "w", encoding="utf-8") as file:
        file.write(front_matter + "\n\n" + content + "\n")
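LLMs do not always follow the "content HTML only" instruction: replies sometimes arrive wrapped in a Markdown code fence or a full document skeleton. A defensive cleanup step before write_note() might look like the sketch below. clean_llm_html is a hypothetical helper, and the regex approach is an assumption that suits simple replies, not a general HTML sanitizer.

```python
import re

FENCE_MARK = "`" * 3  # Markdown code fence delimiter

# A code fence wrapped around the whole reply, with an optional language tag.
_FENCE = re.compile(
    r"\A\s*" + FENCE_MARK + r"[a-zA-Z]*\s*\n(.*?)\n?" + FENCE_MARK + r"\s*\Z",
    re.DOTALL,
)
# A complete <head>...</head> element, contents included.
_HEAD_BLOCK = re.compile(r"<head\b.*?</head\s*>", re.IGNORECASE | re.DOTALL)
# The doctype plus opening/closing <html> and <body> tags (contents are kept).
_WRAPPER_TAGS = re.compile(r"<!DOCTYPE[^>]*>|</?(?:html|body)\b[^>]*>", re.IGNORECASE)


def clean_llm_html(raw: str) -> str:
    """Return the HTML fragment from an LLM reply, without fences or document wrappers."""
    match = _FENCE.match(raw)
    body = match.group(1) if match else raw
    body = _HEAD_BLOCK.sub("", body)
    body = _WRAPPER_TAGS.sub("", body)
    return body.strip()
```

The cleaned string can then be passed to write_note() as content. Tags such as header are left alone, since only html, head, and body are matched.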
My Honest Take
The Anthropic observation is about AI-generated visual artifacts — dashboards, interactive diagrams, styled documents. Your notes are textual Q&A content. For your use case:
- Markdown is fine for 90% of your notes (headings, lists, code blocks, tables, blockquotes)
- HTML adds value for: styled callout boxes, better table presentation, visual hierarchy, collapsible sections
- The overhead of full HTML conversion is massive and touches 15+ files
My recommendation: Approach 3. Keep the pipeline, change the prompt to emit HTML content blocks, add CSS. This gives you the visual richness without rewriting the entire toolchain. The .md file extension is mostly a convention here: kramdown passes block-level HTML in .md files through to the output, so as long as the content sits inside block-level elements, the rendered page should look the same as it would from an .html file.
The one thing you’d lose: fix_liquid_raw_tags() and fix_mathjax_in_file() assume markdown fences. If the LLM generates HTML, you’d need to adjust those post-processing steps. But that’s 2-3 functions, not 20 files.
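One low-effort way to handle that adjustment is to detect HTML bodies and skip the Markdown-only steps for them. The sketch below assumes the fix-ups can be expressed as string-to-string functions; looks_like_html and post_process are hypothetical names, and the real fix_liquid_raw_tags() and fix_mathjax_in_file() may have different signatures. Math or Liquid tags inside HTML would still need their own handling.

```python
import re
from typing import Callable, Iterable

# Heuristic: the body opens with a common block-level tag.
_BLOCK_TAG = re.compile(
    r"\A\s*<(?:div|section|article|table|h[1-6]|p|ul|ol|details)\b",
    re.IGNORECASE,
)


def looks_like_html(content: str) -> bool:
    """Guess whether a note body is an HTML fragment rather than Markdown."""
    return bool(_BLOCK_TAG.match(content))


def post_process(content: str, markdown_steps: Iterable[Callable[[str], str]]) -> str:
    """Apply the Markdown-only fix-up steps unless the body is already HTML."""
    if looks_like_html(content):
        return content
    for step in markdown_steps:
        content = step(content)
    return content
```

Existing Markdown notes keep going through every step, while notes produced by the new HTML prompt pass through untouched.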