AI-Powered Blog System: Architecture & Implementation Details

A comprehensive look at the fully automated blog generation system - from AI agents to GitHub Actions deployment, featuring intelligent deduplication and dual-domain support.

#meta #automation #AI #github-actions #astro #architecture

Project Overview

After extensive development spanning multiple iterations, this blog is now a fully automated, production-ready content platform that demonstrates modern software architecture principles. The system autonomously fetches news, generates original analysis, and deploys updates - all without manual intervention.

Core Architecture Principles

  1. Monorepo Structure: Shared code, independent packages, unified builds
  2. Event-Driven Automation: GitHub Actions orchestrates the entire pipeline
  3. Tiered AI Fallback: Cost optimization through intelligent provider selection
  4. Intelligent Deduplication: Multi-level filtering prevents duplicate content
  5. Environment-Based Configuration: Single codebase supports multiple deployment targets

System Architecture

Package Structure (4 Packages)

1. @personalBlog/core - Shared Foundation

Purpose: Reusable utilities and infrastructure

Key Components:

  • AI Client (src/ai/client.ts):

    • Multi-provider support (Gemini, Groq, OpenAI)
    • Tiered fallback chain (tier1 → tier2 → tier3)
    • Usage tracking and budget monitoring
    • Automatic error handling and retries
  • Config Loader (src/config/loader.ts):

    • YAML configuration with Zod validation
    • GitHub Secrets integration (runtime secrets.yaml)
    • Workspace root detection
    • Singleton pattern for performance
  • Content Utilities:

    • Markdown generation with frontmatter
    • Filename generation (YYYY-MM-DD-slug format)
    • Content validation
    • Type-safe schemas

Dependencies: @google/generative-ai, groq-sdk, openai, zod, yaml, pino
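
To make the content utilities listed above more concrete, here is a minimal sketch of filename and frontmatter generation. The helper names (slugify, makeFilename, renderFrontmatter), the exact slug rules, and the .md suffix are assumptions for illustration, not the actual exports of @personalBlog/core.

```typescript
// Hypothetical helpers; names and slug rules are assumptions, not the real API.
interface PostMeta {
  title: string;
  tags: string[];
  sources: string[];
}

// Lowercase, collapse runs of non-alphanumerics into "-", trim stray dashes.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// YYYY-MM-DD-slug.md, using the UTC calendar date.
function makeFilename(date: Date, title: string): string {
  const day = date.toISOString().slice(0, 10);
  return `${day}-${slugify(title)}.md`;
}

// Minimal frontmatter; JSON strings and arrays are valid YAML flow syntax,
// so JSON.stringify takes care of quoting colons and quotes in titles.
function renderFrontmatter(meta: PostMeta): string {
  const lines = [
    "---",
    `title: ${JSON.stringify(meta.title)}`,
    `tags: ${JSON.stringify(meta.tags)}`,
    `sources: ${JSON.stringify(meta.sources)}`,
    "---",
  ];
  return lines.join("\n") + "\n";
}
```

Using the UTC date keeps filenames stable regardless of the runner's time zone, which matters when the job runs on a schedule.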


2. @personalBlog/agent-news - News Aggregator

Purpose: Fetches and analyzes tech news, generates blog posts

Workflow:

1. Fetch (281+ articles from 7 sources)
   ├─ RSS Feeds (6): Hackaday, Embedded.com, CNX Software, etc.
   └─ Hacker News API: Top stories matching keywords

2. Deduplicate (Remove identical articles)
   └─ URL-based + content similarity

3. Rank by Relevance (AI scoring 0.0-1.0)
   └─ Filter: score ≥ 0.7

4. Filter Existing Posts
   ├─ Extract titles + source URLs from existing .md files
   ├─ Check exact URL match (most reliable)
   └─ Check title similarity (60% word overlap threshold)

5. Generate Blog Post (AI with context)
   ├─ Pass last 10 existing blog titles + URLs to AI
   ├─ AI analyzes with awareness of existing content
   └─ Generate 800-1200 word original analysis

6. Save & Validate
   ├─ Create frontmatter (title, tags, sources)
   ├─ Validate markdown structure
   └─ Save to website/src/content/news/
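
Step 3 of the workflow amounts to a threshold-and-sort over scored articles. The sketch below assumes the scores have already come back from the AI ranker; the ScoredArticle shape, the selectTopArticles name, and the default limit of 4 are illustrative rather than taken from the project's code.

```typescript
interface ScoredArticle {
  title: string;
  url: string;
  score: number; // relevance in [0.0, 1.0], assumed to come from the AI ranker
}

// Keep articles at or above the threshold, best first, capped at `limit`.
// filter() returns a new array, so the caller's input is not reordered.
function selectTopArticles(
  articles: ScoredArticle[],
  minScore = 0.7,
  limit = 4,
): ScoredArticle[] {
  return articles
    .filter((a) => a.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```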

Key Files:

  • src/index.ts: Main orchestrator
  • src/sources/: RSS fetcher, Hacker News client
  • src/processors/: Deduplicator, relevance ranker
  • src/generator/blog-generator.ts: AI-powered blog generation

Smart Deduplication:

// 1. URL-based (100% reliable)
if (existingUrls.has(article.url)) {
  skip();
}

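// 2. Title similarity (60% threshold); words of 3 characters or fewer are ignored
const articleWords = title.split(/\s+/).filter(w => w.length > 3);
const existingWords = existing.split(/\s+/).filter(w => w.length > 3);
// Count the article words that also appear in the existing title
const overlap = articleWords.filter(w => existingWords.includes(w)).length;
if (articleWords.length > 0 && overlap / articleWords.length > 0.6) {
  skip();
}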

// 3. Pass context to AI
const context = existingPosts.map(p => 
  `"${p.title}" (source: ${p.url})`
).join('\n');

3. @personalBlog/agent-diy-tutorials - Tutorial Generator

Purpose: Researches forums and creates educational content

Sources:

  • Reddit: r/embedded, r/arduino, r/raspberrypi
  • Stack Exchange: Electronics, Arduino, Raspberry Pi

Workflow (Multi-Phase Generation):

  1. Fetch forum posts by keywords
  2. Topic discovery (AI identifies tutorial-worthy topics)
  3. Tutorial outline generation (structure + sections)
  4. Content writing (detailed step-by-step guides)
  5. Save with difficulty level (beginner/intermediate/advanced)

Status: Framework ready, currently disabled


4. @personalBlog/scheduler - CLI Orchestrator

Purpose: Execute agents from command line or GitHub Actions

Commands:

# Run specific agent
node packages/scheduler/dist/cli.js run-agent news-aggregator

# Run all enabled agents
node packages/scheduler/dist/cli.js run-all

# List agents with status
node packages/scheduler/dist/cli.js list

# Show configuration
node packages/scheduler/dist/cli.js config

Registry System:

// packages/scheduler/src/registry.ts
export const agentRegistry: Record<string, AgentRunner> = {
  'news-aggregator': runNewsAgent,
  'news': runNewsAgent,  // Alias
  'diy-tutorials': runDIYAgent,
  'tutorials': runDIYAgent,  // Alias
};

Execution Flow:

  1. Load configuration (agents.yaml)
  2. Check if agent is enabled
  3. Run agent with error handling
  4. Report execution time and results
  5. Continue to next agent on failure (if running all)
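
The execution flow above can be sketched as a loop that records failures instead of rethrowing them. This is a simplified illustration: the RunResult shape and the runAll name are assumptions, and the real scheduler also reads agents.yaml to decide which agents are enabled.

```typescript
type AgentRunner = () => Promise<void>;

interface RunResult {
  name: string;
  ok: boolean;
  ms: number;
  error?: string;
}

// Run each enabled agent in turn. A failure is recorded, not rethrown,
// so the remaining agents still get a chance to run.
async function runAll(
  registry: Record<string, AgentRunner>,
  enabled: string[],
): Promise<RunResult[]> {
  const results: RunResult[] = [];
  for (const name of enabled) {
    const runner = registry[name];
    if (!runner) {
      results.push({ name, ok: false, ms: 0, error: "unknown agent" });
      continue;
    }
    const start = Date.now();
    try {
      await runner();
      results.push({ name, ok: true, ms: Date.now() - start });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      results.push({ name, ok: false, ms: Date.now() - start, error });
    }
  }
  return results;
}
```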

Frontend Stack

Framework: Astro 4.16.19

  • Zero-JS by default (static HTML)
  • React islands for interactivity (BlogFilter, FeedbackForm)
  • Content Collections with type-safe schemas

Styling: Tailwind CSS

  • Custom .blog-content classes (not prose utilities)
  • Dark mode support (system preference)
  • Responsive grid layouts

Key Features:

  1. Unified Blog View: Single page shows all posts (news + tutorials)
  2. Client-Side Filtering: Tag search + dropdown filter (instant)
  3. Base Path Handling: Supports both custom domain (/) and GitHub Pages (/personalBlog)

Components:

  • BlogFilter.tsx: Interactive filtering with React state
  • FeedbackForm.tsx: Contact form (FormSubmit.co integration)
  • Layout.astro: Navigation + footer
  • BlogPost.astro: Article layout with enhanced typography

AI Configuration

Tiered Fallback System:

globalSettings:
  aiModels:
    tier1:
      provider: google
      model: gemini-2.0-flash-exp
      costPer1MTokens: 0.10
      
    tier2:
      provider: groq
      model: llama-3.3-70b-versatile
      costPer1MTokens: 0.79
      
    tier3:
      provider: openai
      model: gpt-4o-mini
      costPer1MTokens: 0.15
  
  fallbackChain: [tier1, tier2, tier3]

Why This Works:

  • Tier 1 (Gemini): Fast, cheap, no requests-per-day (RPD) cap on the paid tier
  • Tier 2 (Groq): Fast, good quality, used on its free tier (fallback)
  • Tier 3 (OpenAI): Paid but reliable (emergency)
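
A minimal sketch of how a fallback chain like this can be walked follows. The Provider and Tier types and the generateWithFallback name are hypothetical; real Gemini, Groq, and OpenAI SDK calls would need to be wrapped to fit this shape, and retries within a tier are left out for brevity.

```typescript
// A provider is anything that turns a prompt into text (hypothetical shape).
type Provider = (prompt: string) => Promise<string>;

interface Tier {
  name: string;
  call: Provider;
}

// Try each tier in order and return the first successful response.
// Errors are collected so the final failure explains what went wrong.
async function generateWithFallback(
  chain: Tier[],
  prompt: string,
): Promise<{ tier: string; text: string }> {
  const errors: string[] = [];
  for (const tier of chain) {
    try {
      const text = await tier.call(prompt);
      return { tier: tier.name, text };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${tier.name}: ${message}`);
    }
  }
  throw new Error(`All tiers failed: ${errors.join("; ")}`);
}
```

Returning the tier name alongside the text makes it straightforward to log which provider actually served each request, which is useful for the usage tracking mentioned earlier.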

Prompt Engineering:

  • System prompt: “ANALYZE and INTERPRET, not copy”
  • User prompt: Includes article + existing blog context
  • Output format: Structured JSON (title, description, content, tags)
  • Required sections: Overview, Technical Implications, Real-World Applications, Key Takeaways, Conclusion
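
As an illustration of the user prompt described above, the sketch below assembles the article and the most recent existing posts into one string. The field names, the surrounding wording, and the buildUserPrompt helper are assumptions; only the quoted context format and the four output keys come from this post.

```typescript
interface PostRef {
  title: string;
  url: string;
}

interface SourceArticle {
  title: string;
  url: string;
  summary: string;
}

// Build the user prompt: the article plus the last `maxContext` existing posts
// (assumed ordered oldest first), so the model can avoid repeating an angle.
function buildUserPrompt(
  article: SourceArticle,
  existing: PostRef[],
  maxContext = 10,
): string {
  const recent = existing.slice(Math.max(0, existing.length - maxContext));
  const context = recent
    .map((p) => `"${p.title}" (source: ${p.url})`)
    .join("\n");
  return [
    `Article: ${article.title}`,
    `URL: ${article.url}`,
    `Summary: ${article.summary}`,
    "",
    "Existing posts (do not duplicate these):",
    context || "(none)",
    "",
    'Return JSON only, with keys: "title", "description", "content", "tags".',
  ].join("\n");
}
```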

Automation Pipeline

GitHub Actions Workflow (generate-content.yml):

Triggers:
  - Push to main (for testing)
  - Manual dispatch (select agent: all/news/tutorials)
  - Schedule: Daily 6 AM UTC (news)
  - Schedule: Monday 8 AM UTC (tutorials)

Steps (12 total):
  1. Checkout repository
  2. Setup Node.js 20
  3. Install pnpm 8
  4. Cache pnpm store (speeds up builds)
  5. Install dependencies
  6. Build TypeScript packages (pnpm -r run build)
  7. Create config/secrets.yaml from GitHub Secrets
  8. Run agents (node packages/scheduler/dist/cli.js)
  9. Check for new content (git diff)
  10. Build Astro website (with SITE_DOMAIN env var)
  11. Copy dist/ to docs/ + create .nojekyll
  12. Commit & push (.md + HTML in single commit)

Critical Details:

  • Working Directory: MUST run from workspace root (not packages/scheduler/)
  • Why: File paths are resolved relative to process.cwd(), so running from the wrong directory saves files to the wrong location
  • CNAME File: Created in docs/ for custom domain support
  • .nojekyll File: Prevents GitHub Pages from ignoring _astro/ folder (CSS/JS)

Domain Configuration

Dual-Domain Support (Environment Variable Pattern):

// website/astro.config.mjs
import { defineConfig } from 'astro/config';

const siteDomain = process.env.SITE_DOMAIN || 'blog.sarcasticrobo.online';
const isGitHubPages = siteDomain.includes('github.io');

export default defineConfig({
  site: isGitHubPages 
    ? `https://${siteDomain.split('/')[0]}` 
    : `https://${siteDomain}`,
  base: isGitHubPages && siteDomain.includes('/') 
    ? `/${siteDomain.split('/')[1]}` 
    : '/',
});

How It Works:

  1. Custom Domain (default): base = '/', full URLs like /news/slug
  2. GitHub Pages: SITE_DOMAIN=username.github.io/repo, base = '/repo', URLs like /repo/news/slug
  3. Workflow: Passes SITE_DOMAIN as env var during build
  4. CNAME: Hardcoded to blog.sarcasticrobo.online in workflow
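
To check the two cases without running a build, the same branching can be pulled into a pure function. This is a sketch for illustration (the resolveSite name and SiteConfig type are not part of the project); note that the real config falls back to the default on an empty string as well, whereas a default parameter only applies when the argument is undefined.

```typescript
interface SiteConfig {
  site: string;
  base: string;
}

// Mirrors the branching in astro.config.mjs as a pure function.
function resolveSite(siteDomain: string = "blog.sarcasticrobo.online"): SiteConfig {
  const isGitHubPages = siteDomain.includes("github.io");
  const [host, repo] = siteDomain.split("/");
  if (isGitHubPages) {
    // "username.github.io/repo" -> site on the host, base on the repo path
    return { site: `https://${host}`, base: repo ? `/${repo}` : "/" };
  }
  return { site: `https://${siteDomain}`, base: "/" };
}

console.log(resolveSite());
console.log(resolveSite("username.github.io/personalBlog"));
```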

Switching:

  • Set the SITE_DOMAIN GitHub Secret; the switch takes effect on the next workflow run
  • No code changes needed

Technical Challenges Solved

1. Working Directory Issues

Problem: Agents saved files to packages/scheduler/website/ instead of workspace root

Solution:

# Before (wrong)
cd packages/scheduler && pnpm agents:news

# After (correct)
node packages/scheduler/dist/cli.js run-agent news-aggregator

2. CSS Not Loading on GitHub Pages

Problem: _astro/ folder ignored by Jekyll

Solution: Create .nojekyll file in docs/

3. Navigation Link Paths

Problem: The template ${base}/news produced //news when base was /

Solution:

const basePath = base === '/' ? '' : base;
const url = `${basePath}/news`;

4. Duplicate Posts

Problem: AI generates different titles for same source article

Solution:

  • Check source URLs (exact match)
  • Pass existing blog context to AI
  • Title similarity with 60% threshold
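
Put together, the URL check and the title check might look like the self-contained sketch below. The function names are hypothetical, and lowercasing the titles before comparison is an assumption added here so that capitalization differences do not hide a match.

```typescript
interface KnownPost {
  title: string;
  url: string;
}

// Words longer than 3 characters, lowercased (lowercasing is an assumption).
function significantWords(title: string): string[] {
  return title.toLowerCase().split(/\s+/).filter((w) => w.length > 3);
}

// Fraction of the candidate's significant words found in the existing title.
function titleOverlap(candidate: string, existing: string): number {
  const words = significantWords(candidate);
  if (words.length === 0) return 0;
  const known = new Set(significantWords(existing));
  return words.filter((w) => known.has(w)).length / words.length;
}

function isDuplicate(
  article: KnownPost,
  posts: KnownPost[],
  threshold = 0.6,
): boolean {
  // 1. An exact source URL match is the most reliable signal.
  if (posts.some((p) => p.url === article.url)) return true;
  // 2. Otherwise fall back to title word overlap above the threshold.
  return posts.some((p) => titleOverlap(article.title, p.title) > threshold);
}
```

The overlap is measured against the candidate's word count, so a short new title that is fully contained in a longer existing one still counts as a match.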

5. AI JSON Parsing Failures

Problem: Gemini wraps JSON in markdown code blocks

Solution:

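// The regex is anchored with ^ and $, so trim first: stray whitespace around the fence would otherwise defeat the match
const codeBlockMatch = text.trim().match(/^```(?:json)?\s*\n?([\s\S]*?)\n?```$/);
if (codeBlockMatch) {
  text = codeBlockMatch[1].trim();
}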
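
For a fuller picture, the stripping step can be combined with parsing and a shape check, roughly as sketched here. The GeneratedPost type follows the output keys named earlier in this post; the function names and the hand-written validation are assumptions (the project lists Zod for validation, which is not used here so the sketch stays dependency-free).

```typescript
interface GeneratedPost {
  title: string;
  description: string;
  content: string;
  tags: string[];
}

// Built with repeat() so this sketch contains no literal triple backticks.
const FENCE = "`".repeat(3);
const FENCE_RE = new RegExp(`^${FENCE}(?:json)?\\s*\\n?([\\s\\S]*?)\\n?${FENCE}$`);

// Remove a surrounding markdown code fence if one is present.
function stripCodeFence(raw: string): string {
  const inner = raw.trim().match(FENCE_RE)?.[1];
  return (inner ?? raw).trim();
}

// Parse the response and check each field by hand before trusting it.
function parseGeneratedPost(raw: string): GeneratedPost {
  const data: unknown = JSON.parse(stripCodeFence(raw));
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const { title, description, content, tags } = data as Record<string, unknown>;
  if (
    typeof title !== "string" ||
    typeof description !== "string" ||
    typeof content !== "string" ||
    !Array.isArray(tags) ||
    !tags.every((t): t is string => typeof t === "string")
  ) {
    throw new Error("Missing or mistyped field in AI response");
  }
  return { title, description, content, tags };
}
```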

6. Rate Limiting

Problem: Hit Gemini 2K RPM limit

Solution: 5-second delay between API calls
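
A fixed delay between calls can be sketched as a small sequential runner. The runSpaced name and the injectable wait parameter are assumptions made for illustration; injecting the wait function lets the spacing be checked without actually sleeping.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Run tasks one at a time, pausing between consecutive calls.
// There is no pause before the first task or after the last one.
async function runSpaced<T>(
  tasks: Array<() => Promise<T>>,
  delayMs = 5000,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T[]> {
  const results: T[] = [];
  let first = true;
  for (const task of tasks) {
    if (!first) await wait(delayMs);
    first = false;
    results.push(await task());
  }
  return results;
}
```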


Performance & Metrics

Build Performance

  • TypeScript Compilation: ~10s (4 packages)
  • Website Build: ~45s (Astro)
  • Total Workflow Time: ~90s
  • Deployment: ~2 min (GitHub Pages)

Content Generation

  • Articles Fetched: 281+ per run
  • Articles Ranked: Top 4 (score ≥ 0.7)
  • Posts Generated: 1 per run (configured)
  • AI Tokens: ~1500 tokens/post
  • API Delay: 5s between requests

Cost Analysis

  • Gemini API: ~$0.10/month (1M tokens)
  • GitHub Actions: Free (40-100 min/month)
  • GitHub Pages: Free (soft bandwidth limit of 100 GB/month)
  • Custom Domain: $12/year (optional)
  • Total: ~$1.10/month
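
The total is simple arithmetic: the Gemini line assumes roughly 1M tokens per month at the tier1 rate of $0.10 per 1M tokens, and the optional domain is $12 per year spread over 12 months. A small sketch (the tokenCost helper is hypothetical):

```typescript
// Cost in USD for a given token count at a per-1M-token rate.
function tokenCost(tokens: number, costPer1MTokens: number): number {
  return (tokens / 1_000_000) * costPer1MTokens;
}

const geminiMonthly = tokenCost(1_000_000, 0.10); // ~1M tokens/month at the tier1 rate
const domainMonthly = 12 / 12;                    // $12/year spread over 12 months
const totalMonthly = geminiMonthly + domainMonthly;

console.log(totalMonthly.toFixed(2)); // prints "1.10"
```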

Security & Best Practices

Secrets Management

  • ✅ GitHub Secrets (encrypted at rest)
  • ✅ Runtime secrets.yaml (created during workflow)
  • ✅ .gitignore prevents accidental commits
  • ❌ No SOPS/age needed (simplified from original design)

Code Quality

  • TypeScript with strict mode
  • Zod schema validation (runtime type safety)
  • Pino structured logging
  • Error boundaries in agents

Git Workflow

  • Single commit for .md + HTML
  • Automatic rebasing on conflicts
  • Branch protection (GitHub Pages serves from main/docs)

Future Roadmap

Near-Term (1-2 weeks)

  • ✅ Unified blog view with filtering (Done!)
  • ✅ Dual domain support (Done!)
  • ✅ Context-aware AI generation (Done!)
  • 🔄 Test custom domain propagation
  • 📝 Add Giscus comments system

Mid-Term (1-2 months)

  • 🚀 Enable DIY tutorials agent
  • 📊 Analytics with Cloudflare Workers
  • 🔍 Search with Pagefind (static index)
  • 📰 RSS feed generation

Long-Term (3-6 months)

  • 🌍 Multi-language support (i18n)
  • 📧 Newsletter integration
  • 🖼️ Image optimization pipeline
  • 🤖 Additional AI agents (security alerts, hardware reviews)

Key Learnings

Architecture Decisions

  1. Monorepo over multi-repo: Shared dependencies, unified tooling
  2. TypeScript ESM: Modern standard, but requires .js extensions in imports
  3. GitHub Secrets over SOPS: Simpler for GitHub-first workflow
  4. Single workflow file: Easier to maintain than multiple workflows
  5. Content Collections: Type safety prevents runtime errors

AI Integration

  1. Always strip code blocks: LLMs love wrapping JSON in markdown
  2. Pass existing context: Prevents duplicate perspectives
  3. Structured prompts: JSON schema → predictable output
  4. Tiered fallback: Keeps generation available when one provider fails, with cost optimization
  5. Rate limiting: Better safe than sorry (5s delay works)

Deployment

  1. Base path handling: Environment variables > hardcoded values
  2. .nojekyll is critical: Jekyll breaks modern web apps
  3. CNAME in docs/: Required for custom domains
  4. Working directory matters: Always run from workspace root
  5. Single commit strategy: Simplifies git history

Project Stats (Current)

  • Total Files: 85+ (excluding node_modules)
  • TypeScript LOC: ~3,200 lines
  • Configuration: 2 YAML files (agents, secrets)
  • Content: 4 blog posts (1 manual, 3 AI-generated)
  • Uptime: 100% (GitHub Actions handles everything)
  • Cost: $0.10/month (AI only)

Try It Yourself

Want to build your own AI-powered blog? Everything is open source.

What You’ll Need:

  1. GitHub account (free)
  2. Gemini API key (free tier available)
  3. 15 minutes to configure
  4. Zero hosting costs (GitHub Pages)

Conclusion

This project demonstrates that modern AI, automation, and web technologies can create a fully autonomous content platform with:

  • ✅ Zero manual intervention
  • ✅ Professional quality output
  • ✅ Negligible operating costs
  • ✅ Production-ready reliability

The key was combining the right tools (Astro, TypeScript, GitHub Actions) with thoughtful architecture (monorepo, tiered AI, intelligent deduplication) and iterative problem-solving (navigation bugs, CSS loading, duplicate detection).

Most importantly: It actually works! The blog generates unique, analytical content daily, learns from existing posts, and deploys automatically. 🚀


Built with ❤️ and AI | Profile | GitHub

What We Built

This is not just a blog - it’s a complete automated content pipeline that runs on GitHub’s infrastructure for free:

  • AI News Aggregation: Fetches articles from 6+ RSS feeds (Hackaday, Embedded.com, CNX Software, Zephyr Project, EE Times, Interrupt/Memfault) plus Hacker News
  • Intelligent Ranking: Uses AI to score articles by relevance to embedded systems (threshold: 0.7)
  • Original Analysis: AI generates 800-1200 word articles with technical analysis, not summaries
  • Smart Deduplication: Checks existing posts by URL and title similarity to prevent duplicates
  • Automatic Publishing: Commits markdown files and rebuilds website via GitHub Actions
  • Zero-Cost Hosting: GitHub Pages serves the static site with custom styling

System Architecture

Frontend Stack

  • Framework: Astro 4.16.19 (static site generation)
  • Styling: Tailwind CSS with custom blog content classes
  • Components: React islands for interactive elements (feedback form)
  • Typography: Custom .blog-content CSS with enhanced formatting
  • Dark Mode: System preference detection
  • Deployment: GitHub Pages from docs/ folder

Backend & AI

  • Monorepo: pnpm workspaces with 4 packages

    • @personalBlog/core: AI client, config loader, utilities
    • @personalBlog/agent-news: RSS fetcher, Hacker News client, blog generator
    • @personalBlog/agent-diy-tutorials: Reddit/Stack Exchange clients (future feature)
    • @personalBlog/scheduler: CLI orchestrator for agent execution
  • AI Provider: Google Gemini 2.0 Flash (paid tier)

    • Quota: 2K RPM, 4M TPM, unlimited RPD
    • 5-second delay between requests to respect rate limits
    • Fallback chain: Gemini → Groq → OpenAI
  • Configuration: YAML-based agent config with tiered AI models

  • Secrets: GitHub Secrets → runtime secrets.yaml (no local SOPS needed)

Automation Workflow

The entire system runs on GitHub Actions with a single workflow (generate-content.yml):

  1. Triggers: Push to main, manual dispatch, scheduled (daily 6 AM UTC)
  2. Setup: Install Node.js, pnpm, dependencies, build TypeScript packages
  3. Generate Secrets: Creates config/secrets.yaml from GitHub Secrets
  4. Run Agents: Executes from workspace root to save files correctly
  5. Check Changes: Uses git add + git diff --cached to detect new posts
  6. Build Website: Runs pnpm --filter website build
  7. Prepare Deployment: Copies dist/ to docs/, creates .nojekyll file
  8. Commit & Push: Commits both .md sources and HTML to repository
  9. Deploy: GitHub Pages auto-deploys from docs/ folder

Technical Challenges Solved

1. Working Directory Issues

Problem: Agent was saving files to packages/scheduler/website/ instead of workspace root.

Solution: Changed workflow to run CLI from workspace root using node packages/scheduler/dist/cli.js instead of pnpm scripts.

2. CSS Not Loading on GitHub Pages

Problem: Tailwind CSS files in _astro/ folder were ignored by Jekyll.

Solution: Created .nojekyll file in docs/ to disable Jekyll processing.

3. Navigation Link Paths

Problem: Links were missing the / between the base path and routes (e.g., /personalBlognews instead of /personalBlog/news).

Solution: Used import.meta.env.BASE_URL with explicit slashes in all navigation links.
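
One way to avoid both the missing-slash and the double-slash variants of this bug is to normalize the join in a single helper. The withBase name is hypothetical; in an Astro page the base argument would be import.meta.env.BASE_URL, but it is a plain parameter here so the sketch runs anywhere.

```typescript
// Join a base path and a route with exactly one slash between them.
function withBase(base: string, route: string): string {
  const cleanBase = base.replace(/\/+$/, "");   // "/" -> "", "/personalBlog/" -> "/personalBlog"
  const cleanRoute = route.replace(/^\/+/, ""); // "/news" -> "news"
  return `${cleanBase}/${cleanRoute}`;
}

console.log(withBase("/", "/news"));            // prints "/news"
console.log(withBase("/personalBlog", "news")); // prints "/personalBlog/news"
```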

4. AI JSON Parsing Failures

Problem: Gemini sometimes wrapped JSON in markdown code blocks or returned invalid format.

Solution:

  • Strip markdown code blocks before parsing: /^```(?:json)?\s*\n?([\s\S]*?)\n?```$/
  • Added detailed error logging (2000 chars of raw response)
  • Improved prompt to explicitly request JSON-only output

5. Duplicate Post Detection

Problem: Agent posted same articles multiple times with different AI-generated titles.

Solution:

  • Check existing posts by source URL (exact match)
  • Also check title similarity (60% word overlap threshold)
  • Filter short words (3 characters or fewer) for better matching

6. Rate Limit Errors

Problem: Hit Gemini’s 2K RPM limit when generating 4 posts quickly.

Solution: Increased delay between API calls from 2s to 5s.

7. Blog Post Styling

Problem: Tailwind prose classes not rendering properly.

Solution: Created custom .blog-content CSS classes with explicit styling for headings, lists, code blocks, and typography.

Features Implemented

Content Generation

  • ✅ Fetches 281+ articles daily from multiple sources
  • ✅ Ranks by embedded systems relevance
  • ✅ Generates 1 post per run to control API costs
  • ✅ Creates original analysis (not summaries) with structured sections:
    • Overview
    • Technical Implications (hardware/firmware)
    • Real-World Applications
    • Key Takeaways (bullet points)
    • Conclusion

Website Features

  • ✅ Responsive homepage with latest news cards
  • ✅ Individual blog post pages with enhanced typography
  • ✅ Calendar icon, gradient tag badges
  • ✅ Source attribution with clickable links
  • ✅ AI-generated content disclosure
  • ✅ Contact form with FormSubmit.co integration
  • ✅ Profile link to profile.sarcasticrobo.online

Developer Experience

  • ✅ TypeScript monorepo with proper ESM imports
  • ✅ Comprehensive logging with Pino
  • ✅ Config validation (YAML checked against Zod schemas)
  • ✅ CLI with commands: run-agent, run-all, list, config
  • ✅ Hot reload during development (Astro)

Daily Workflow

Every day at 6 AM UTC, the system automatically:

  1. Fetches latest tech news from 7 sources
  2. Ranks articles by relevance to embedded systems
  3. Checks if article was already posted (by URL)
  4. Generates original analysis with AI (800-1200 words)
  5. Validates markdown frontmatter and content
  6. Commits new .md file to repository
  7. Builds Astro website to static HTML
  8. Commits HTML to docs/ folder
  9. GitHub Pages deploys updated site within 2-3 minutes

Cost: close to $0 (hosting and CI run on free tiers; Gemini paid-tier usage is about $0.10/month)

Configuration

AI Models

tier1: gemini-2.0-flash (paid tier, unlimited RPD)
tier2: llama-3.3-70b-versatile (Groq, free)
tier3: gpt-4o-mini (OpenAI, fallback)

GitHub Secrets Required

  • GEMINI_API_KEY: Google AI Studio API key
  • FORMSUBMIT_EMAIL: Email for feedback form

Content Settings

  • Max articles per run: 1
  • Deduplication window: 7 days
  • Min relevance score: 0.7
  • Content length: 800-1200 words

Future Enhancements

Planned features documented in TODO.md:

  • DIY Tutorials Agent: Reddit + Stack Exchange integration
  • Comments System: Giscus (GitHub Discussions)
  • Analytics: Cloudflare Workers Analytics
  • Search: Client-side search with Fuse.js
  • RSS Feed: Auto-generated XML feed
  • Image Optimization: Sharp + Astro Image
  • Multi-language: i18n support
  • Newsletter: Automated weekly digest

Project Stats

  • Total Packages: 4 (core, agent-news, agent-diy-tutorials, scheduler)
  • Dependencies: 25+ npm packages
  • TypeScript Files: 30+ source files
  • Lines of Code: ~3000 LOC
  • Workflow Steps: 12 automated steps
  • Build Time: ~60 seconds
  • Deploy Time: ~2 minutes

Try It Yourself

Want to build your own AI blog? The entire system is open source and ready to deploy in under 15 minutes with just a Gemini API key!

Conclusion

This project demonstrates the power of combining AI, automation, and modern web technologies. What started as a complex idea became a reality through systematic problem-solving, debugging, and iteration.

Key learnings: ESM imports require .js extensions, Jekyll ignores _astro folders, AI needs explicit JSON instructions, and deduplication requires URL checking, not just titles.

The blog is now live, automated, and nearly cost-free - a testament to what’s possible with GitHub Actions and AI in 2026! 🚀