AI-Powered Blog System: Architecture & Implementation Details

A comprehensive look at the fully automated blog generation system - from AI agents to GitHub Actions deployment, featuring intelligent deduplication and dual-domain support.

January 13, 2026

#meta #automation #AI #github-actions #astro #architecture

Project Overview

After extensive development spanning multiple iterations, this blog is now a fully automated, production-ready content platform that demonstrates modern software architecture principles. The system autonomously fetches news, generates original analysis, and deploys updates - all without manual intervention.

Core Architecture Principles

Monorepo Structure: Shared code, independent packages, unified builds
Event-Driven Automation: GitHub Actions orchestrates the entire pipeline
Tiered AI Fallback: Cost optimization through intelligent provider selection
Intelligent Deduplication: Multi-level filtering prevents duplicate content
Environment-Based Configuration: Single codebase supports multiple deployment targets

System Architecture

Package Structure (4 Packages)

1. @personalBlog/core - Shared Foundation

Purpose: Reusable utilities and infrastructure

Key Components:

AI Client (src/ai/client.ts):
- Multi-provider support (Gemini, Groq, OpenAI)
- Tiered fallback chain (tier1 → tier2 → tier3)
- Usage tracking and budget monitoring
- Automatic error handling and retries
Config Loader (src/config/loader.ts):
- YAML configuration with Zod validation
- GitHub Secrets integration (runtime secrets.yaml)
- Workspace root detection
- Singleton pattern for performance
Content Utilities:
- Markdown generation with frontmatter
- Filename generation (YYYY-MM-DD-slug format)
- Content validation
- Type-safe schemas

Dependencies: @google/generative-ai, groq-sdk, openai, zod, yaml, pino
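A minimal sketch of the filename and frontmatter utilities might look like the following. It is illustrative, not the code in packages/core. The names toSlug, toFilename and toMarkdown are hypothetical, and the frontmatter fields (title, date, tags, sources) are assumed from the fields the news agent writes.

```typescript
// Hypothetical helpers showing the YYYY-MM-DD-slug filename format and a
// markdown file with YAML frontmatter. Not the package's actual API.

interface PostFrontmatter {
  title: string;
  date: Date;
  tags: string[];
  sources: string[];
}

// Lowercase, collapse runs of non-alphanumerics into "-", trim edge dashes.
function toSlug(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Uses the UTC date so the name does not depend on the runner's timezone.
function toFilename(date: Date, title: string): string {
  const day = date.toISOString().slice(0, 10); // "YYYY-MM-DD"
  return `${day}-${toSlug(title)}.md`;
}

// JSON.stringify emits double-quoted strings, which are valid YAML scalars,
// so titles containing ":" or quotes cannot break the frontmatter.
function toMarkdown(fm: PostFrontmatter, body: string): string {
  const lines = [
    "---",
    `title: ${JSON.stringify(fm.title)}`,
    `date: ${fm.date.toISOString().slice(0, 10)}`,
    `tags: [${fm.tags.map((t) => JSON.stringify(t)).join(", ")}]`,
    "sources:",
    ...fm.sources.map((s) => `  - ${JSON.stringify(s)}`),
    "---",
    "",
    body.trim(),
    "",
  ];
  return lines.join("\n");
}
```

Formatting dates in UTC means a run on GitHub Actions and a local run produce the same filename for the same day.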

2. @personalBlog/agent-news - News Aggregator

Purpose: Fetches and analyzes tech news, generates blog posts

Workflow:

1. Fetch (281+ articles from 7 sources)
   ├─ RSS Feeds (6): Hackaday, Embedded.com, CNX Software, etc.
   └─ Hacker News API: Top stories matching keywords

2. Deduplicate (Remove identical articles)
   └─ URL-based + content similarity

3. Rank by Relevance (AI scoring 0.0-1.0)
   └─ Filter: score ≥ 0.7

4. Filter Existing Posts
   ├─ Extract titles + source URLs from existing .md files
   ├─ Check exact URL match (most reliable)
   └─ Check title similarity (60% word overlap threshold)

5. Generate Blog Post (AI with context)
   ├─ Pass last 10 existing blog titles + URLs to AI
   ├─ AI analyzes with awareness of existing content
   └─ Generate 800-1200 word original analysis

6. Save & Validate
   ├─ Create frontmatter (title, tags, sources)
   ├─ Validate markdown structure
   └─ Save to website/src/content/news/
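Steps 2 and 3 can be sketched as a small selection function. The sketch makes several assumptions:

- The AI ranker has already attached a 0.0-1.0 score to each article.
- normalizeUrl, dedupeByUrl and selectTop are hypothetical names.
- The URL normalisation rules are an assumption.
- The content-similarity half of step 2 is left out.

```typescript
// Sketch of steps 2-3: drop repeated URLs, then keep articles whose
// AI relevance score (assumed to be attached already) is >= 0.7.

interface ScoredArticle {
  title: string;
  url: string;
  score: number; // 0.0-1.0 relevance from the AI ranker
}

// Hypothetical normalisation: ignore the #fragment and a trailing slash.
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  u.hash = "";
  return u.toString().replace(/\/$/, "");
}

// Keep the first article seen for each normalised URL.
function dedupeByUrl<T extends { url: string }>(articles: T[]): T[] {
  const seen = new Set<string>();
  return articles.filter((a) => {
    const key = normalizeUrl(a.url);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Filter by threshold, sort best-first, cap the shortlist.
function selectTop(articles: ScoredArticle[], minScore = 0.7, limit = 4): ScoredArticle[] {
  return dedupeByUrl(articles)
    .filter((a) => a.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

The default limit of 4 mirrors the "Top 4" figure reported under Content Generation.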

Key Files:

src/index.ts: Main orchestrator
src/sources/: RSS fetcher, Hacker News client
src/processors/: Deduplicator, relevance ranker
src/generator/blog-generator.ts: AI-powered blog generation

Smart Deduplication:

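// 1. URL-based (exact match, most reliable)
// 2. Title similarity (60% word-overlap threshold)
// Words of 3 characters or fewer are ignored as noise.
const significantWords = (text: string): Set<string> =>
  new Set(text.split(/\s+/).filter(w => w.length > 3));

function isDuplicate(
  article: { title: string; url: string },
  existingUrls: Set<string>,
  existingTitles: string[],
): boolean {
  if (existingUrls.has(article.url)) return true; // 1. URL match

  const articleWords = significantWords(article.title);
  if (articleWords.size === 0) return false;
  return existingTitles.some(existing => {          // 2. title overlap
    const existingWords = significantWords(existing);
    const overlap = [...articleWords].filter(w => existingWords.has(w)).length;
    return overlap / articleWords.size > 0.6;
  });
}

// 3. Pass context to AI
const buildContext = (existingPosts: { title: string; url: string }[]): string =>
  existingPosts.map(p => `"${p.title}" (source: ${p.url})`).join('\n');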

3. @personalBlog/agent-diy-tutorials - Tutorial Generator

Purpose: Researches forums and creates educational content

Sources:

Reddit: r/embedded, r/arduino, r/raspberrypi
Stack Exchange: Electronics, Arduino, Raspberry Pi

Workflow (Multi-Phase Generation):

Fetch forum posts by keywords
Topic discovery (AI identifies tutorial-worthy topics)
Tutorial outline generation (structure + sections)
Content writing (detailed step-by-step guides)
Save with difficulty level (beginner/intermediate/advanced)

Status: Framework ready, currently disabled

4. @personalBlog/scheduler - CLI Orchestrator

Purpose: Execute agents from command line or GitHub Actions

Commands:

# Run specific agent
node packages/scheduler/dist/cli.js run-agent news-aggregator

# Run all enabled agents
node packages/scheduler/dist/cli.js run-all

# List agents with status
node packages/scheduler/dist/cli.js list

# Show configuration
node packages/scheduler/dist/cli.js config

Registry System:

// packages/scheduler/src/registry.ts
export const agentRegistry: Record<string, AgentRunner> = {
  'news-aggregator': runNewsAgent,
  'news': runNewsAgent,  // Alias
  'diy-tutorials': runDIYAgent,
  'tutorials': runDIYAgent,  // Alias
};

Execution Flow:

Load configuration (agents.yaml)
Check if agent is enabled
Run agent with error handling
Report execution time and results
Continue to next agent on failure (if running all)
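The execution flow above can be sketched as follows. The sketch rests on these assumptions:

- AgentConfig and RunResult are hypothetical shapes; the real agents.yaml schema may differ.
- AgentRunner is a no-argument async function, matching how the registry maps names to runner functions.

```typescript
// Sketch of "run-all": skip disabled agents, time each run, and keep
// going when one agent throws. All type names here are hypothetical.

type AgentRunner = () => Promise<void>;

interface AgentConfig {
  name: string;
  enabled: boolean;
}

interface RunResult {
  name: string;
  status: "ok" | "failed" | "skipped";
  ms: number;
  error?: string;
}

async function runAll(
  agents: AgentConfig[],
  registry: Record<string, AgentRunner>,
): Promise<RunResult[]> {
  const results: RunResult[] = [];
  for (const agent of agents) {
    if (!agent.enabled) {
      results.push({ name: agent.name, status: "skipped", ms: 0 });
      continue;
    }
    const start = Date.now();
    try {
      const runner = registry[agent.name];
      if (!runner) throw new Error(`Unknown agent: ${agent.name}`);
      await runner();
      results.push({ name: agent.name, status: "ok", ms: Date.now() - start });
    } catch (err) {
      // Record the failure and move on instead of aborting the whole run.
      const error = err instanceof Error ? err.message : String(err);
      results.push({ name: agent.name, status: "failed", ms: Date.now() - start, error });
    }
  }
  return results;
}
```

Calling runAll(configuredAgents, agentRegistry) returns one result per agent. A failure in one agent shows up in the summary without stopping the others.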

Frontend Stack

Framework: Astro 4.16.19

Zero-JS by default (static HTML)
React islands for interactivity (BlogFilter, FeedbackForm)
Content Collections with type-safe schemas

Styling: Tailwind CSS

Custom .blog-content classes (not prose utilities)
Dark mode support (system preference)
Responsive grid layouts

Key Features:

Unified Blog View: Single page shows all posts (news + tutorials)
Client-Side Filtering: Tag search + dropdown filter (instant)
Base Path Handling: Supports both custom domain (/) and GitHub Pages (/personalBlog)
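The filtering itself needs no server. A pure function over post metadata is enough, and BlogFilter.tsx can call it whenever the search text or dropdown changes. The sketch makes these assumptions:

- The dropdown selects the content type (news or tutorials).
- The PostSummary shape and the "all" option are hypothetical.

```typescript
// Sketch of the logic behind a tag search box plus a content-type dropdown.

interface PostSummary {
  title: string;
  tags: string[];
  collection: "news" | "tutorials";
}

function filterPosts(
  posts: PostSummary[],
  query: string,
  collection: "all" | "news" | "tutorials",
): PostSummary[] {
  const q = query.trim().toLowerCase();
  return posts.filter((p) => {
    if (collection !== "all" && p.collection !== collection) return false;
    if (q === "") return true;
    // Case-insensitive substring match against any tag.
    return p.tags.some((t) => t.toLowerCase().includes(q));
  });
}
```

Because the function is pure, a React island can wrap it in useMemo and the filter stays instant on the client.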

Components:

BlogFilter.tsx: Interactive filtering with React state
FeedbackForm.tsx: Contact form (FormSubmit.co integration)
Layout.astro: Navigation + footer
BlogPost.astro: Article layout with enhanced typography

AI Configuration

Tiered Fallback System:

globalSettings:
  aiModels:
    tier1:
      provider: google
      model: gemini-2.0-flash-exp
      costPer1MTokens: 0.10
      
    tier2:
      provider: groq
      model: llama-3.3-70b-versatile
      costPer1MTokens: 0.79
      
    tier3:
      provider: openai
      model: gpt-4o-mini
      costPer1MTokens: 0.15
  
  fallbackChain: [tier1, tier2, tier3]

Why This Works:

Tier 1 (Gemini): Fast, cheap, unlimited requests per day (RPD) on the paid tier
Tier 2 (Groq): Fast, free tier available, good quality (fallback)
Tier 3 (OpenAI): Paid but reliable (emergency)
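A minimal sketch of the fallback chain walks fallbackChain in order and returns the first tier that succeeds. The sketch makes these assumptions:

- The provider call is passed in as a function, so no specific SDK signatures are assumed.
- generateWithFallback and CompleteFn are hypothetical names.
- Retries within a single tier are omitted.

```typescript
// Sketch of a tiered fallback: try each configured tier in chain order
// and return the first successful completion.

interface TierConfig {
  provider: string;
  model: string;
  costPer1MTokens: number;
}

type CompleteFn = (tier: TierConfig, prompt: string) => Promise<string>;

async function generateWithFallback(
  tiers: Record<string, TierConfig>,
  chain: string[],
  prompt: string,
  complete: CompleteFn,
): Promise<{ tier: string; text: string }> {
  const errors: string[] = [];
  for (const name of chain) {
    const tier = tiers[name];
    if (!tier) {
      errors.push(`${name}: not configured`);
      continue;
    }
    try {
      const text = await complete(tier, prompt);
      return { tier: name, text };
    } catch (err) {
      // Remember why this tier failed, then fall through to the next one.
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${name} (${tier.provider}/${tier.model}): ${reason}`);
    }
  }
  throw new Error(`All tiers failed:\n${errors.join("\n")}`);
}
```

Injecting the call also makes the chain easy to test. A stub can simulate a rate-limited Gemini and confirm that Groq is tried next.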

Prompt Engineering:

System prompt: “ANALYZE and INTERPRET, not copy”
User prompt: Includes article + existing blog context
Output format: Structured JSON (title, description, content, tags)
Required sections: Overview, Technical Implications, Real-World Applications, Key Takeaways, Conclusion
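Before a post is saved, the model's JSON can be checked against the expected fields and required sections. The sketch makes these assumptions:

- validatePost is a hypothetical name.
- Sections appear as markdown headings (for example "## Overview").
- The 800-1200 word range comes from the generation settings described in this post.

Since Zod is already a dependency, the field checks could equally be a Zod schema.

```typescript
// Sketch of validating the model's structured output before saving.

interface GeneratedPost {
  title: string;
  description: string;
  content: string;
  tags: string[];
}

const REQUIRED_SECTIONS = [
  "Overview",
  "Technical Implications",
  "Real-World Applications",
  "Key Takeaways",
  "Conclusion",
];

function validatePost(data: unknown): { post?: GeneratedPost; problems: string[] } {
  const problems: string[] = [];
  const d = (typeof data === "object" && data !== null ? data : {}) as Record<string, unknown>;
  for (const field of ["title", "description", "content"]) {
    if (typeof d[field] !== "string" || (d[field] as string).trim() === "") {
      problems.push(`missing or empty "${field}"`);
    }
  }
  const tags = d["tags"];
  if (!Array.isArray(tags) || !tags.every((t) => typeof t === "string")) {
    problems.push(`"tags" must be an array of strings`);
  }
  if (problems.length > 0) return { problems };

  const post = d as unknown as GeneratedPost;
  for (const section of REQUIRED_SECTIONS) {
    // Look for a markdown heading line such as "## Overview".
    const heading = new RegExp(`^#{1,6}\\s+${section}\\b`, "m");
    if (!heading.test(post.content)) problems.push(`missing section "${section}"`);
  }
  const words = post.content.split(/\s+/).filter(Boolean).length;
  if (words < 800 || words > 1200) problems.push(`word count ${words} outside 800-1200`);
  return { post, problems };
}
```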

Automation Pipeline

GitHub Actions Workflow (generate-content.yml):

Triggers:
  - Push to main (for testing)
  - Manual dispatch (select agent: all/news/tutorials)
  - Schedule: Daily 6 AM UTC (news)
  - Schedule: Monday 8 AM UTC (tutorials)

Steps (12 total):
  1. Checkout repository
  2. Setup Node.js 20
  3. Install pnpm 8
  4. Cache pnpm store (speeds up builds)
  5. Install dependencies
  6. Build TypeScript packages (pnpm -r run build)
  7. Create config/secrets.yaml from GitHub Secrets
  8. Run agents (node packages/scheduler/dist/cli.js)
  9. Check for new content (git diff)
  10. Build Astro website (with SITE_DOMAIN env var)
  11. Copy dist/ to docs/ + create .nojekyll
  12. Commit & push (.md + HTML in single commit)

Critical Details:

Working Directory: MUST run from workspace root (not packages/scheduler/)
Why: File paths are resolved relative to process.cwd(), so running from the wrong directory saves files to the wrong location
CNAME File: Created in docs/ for custom domain support
.nojekyll File: Prevents GitHub Pages from ignoring _astro/ folder (CSS/JS)
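One way to make file paths independent of the launch directory is to resolve them against the workspace root instead of process.cwd(). The sketch makes these assumptions:

- The root is identified by pnpm's pnpm-workspace.yaml file.
- findWorkspaceRoot and newsOutputDir are hypothetical names.
- This is not necessarily how the config loader's root detection works.

```typescript
// Sketch of workspace-root detection: walk up from a starting directory
// until pnpm-workspace.yaml is found, then build output paths from there.
import { existsSync } from "node:fs";
import { dirname, join, resolve } from "node:path";

function findWorkspaceRoot(start: string = process.cwd()): string {
  let dir = resolve(start);
  for (;;) {
    if (existsSync(join(dir, "pnpm-workspace.yaml"))) return dir;
    const parent = dirname(dir);
    if (parent === dir) {
      // Reached the filesystem root without finding the marker file.
      throw new Error(`No pnpm-workspace.yaml found above ${start}`);
    }
    dir = parent;
  }
}

// The news output directory is then the same even when the CLI is
// launched from packages/scheduler/.
function newsOutputDir(start?: string): string {
  return join(findWorkspaceRoot(start), "website", "src", "content", "news");
}
```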

Domain Configuration

Dual-Domain Support (Environment Variable Pattern):

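// website/astro.config.mjs
import { defineConfig } from 'astro/config';

// SITE_DOMAIN is either a custom domain or "username.github.io/repo"
const siteDomain = process.env.SITE_DOMAIN || 'blog.sarcasticrobo.online';
const isGitHubPages = siteDomain.includes('github.io');

export default defineConfig({
  // Origin only: strip any "/repo" suffix
  site: isGitHubPages
    ? `https://${siteDomain.split('/')[0]}`
    : `https://${siteDomain}`,
  // Project pages are served under "/repo"; custom domains at the root
  base: isGitHubPages && siteDomain.includes('/')
    ? `/${siteDomain.split('/')[1]}`
    : '/',
});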

How It Works:

Custom Domain (default): base = '/', paths like /news/slug
GitHub Pages: SITE_DOMAIN=username.github.io/repo, base = '/repo', URLs like /repo/news/slug
Workflow: Passes SITE_DOMAIN as env var during build
CNAME: Hardcoded to blog.sarcasticrobo.online in workflow

Switching:

Add GitHub Secret SITE_DOMAIN → instant switch
No code changes needed

Technical Challenges Solved

1. Working Directory Issues

Problem: Agents saved files to packages/scheduler/website/ instead of workspace root

Solution:

# Before (wrong)
cd packages/scheduler && pnpm agents:news

# After (correct)
node packages/scheduler/dist/cli.js run-agent news-aggregator

2. CSS Not Loading on GitHub Pages

Problem: _astro/ folder ignored by Jekyll

Solution: Create .nojekyll file in docs/

3. Navigation Links Broken

Problem: ${base}/news produced //news when base was /

Solution:

const basePath = base === '/' ? '' : base;
const url = `${basePath}/news`;
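A small helper generalizes this fix so that every link works with a base of "/", "/personalBlog", or "/personalBlog/". Astro's BASE_URL may or may not end in a slash depending on configuration. withBase is a hypothetical name.

```typescript
// Sketch of a base-aware link builder that never produces "//" and never
// drops the separator between the base and the route.
function withBase(base: string, path: string): string {
  const trimmedBase = base.replace(/\/+$/, ""); // "/" -> "", "/repo/" -> "/repo"
  const trimmedPath = path.replace(/^\/+/, ""); // "/news" -> "news"
  return `${trimmedBase}/${trimmedPath}`;
}
```

In an Astro component this would be called as withBase(import.meta.env.BASE_URL, "news").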

4. Duplicate Posts

Problem: AI generates different titles for the same source article

Solution:

Check source URLs (exact match)
Pass existing blog context to AI
Title similarity with 60% threshold

5. AI JSON Parsing Failures

Problem: Gemini wraps JSON in markdown code blocks

Solution:

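function extractJson(raw: string): unknown {
  let text = raw.trim(); // the regex is anchored, so trim first
  const codeBlockMatch = text.match(/^```(?:json)?\s*\n?([\s\S]*?)\n?```$/);
  if (codeBlockMatch) {
    text = codeBlockMatch[1].trim();
  }
  return JSON.parse(text);
}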

6. Rate Limiting

Problem: Hit Gemini 2K RPM limit

Solution: 5-second delay between API calls
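The delay can live in a small helper that processes items one at a time. This is a sketch: mapWithDelay is a hypothetical name, and it assumes calls are made sequentially, as in this pipeline.

```typescript
// Sketch of spacing API calls: run fn on each item in order and wait
// delayMs between consecutive calls (5000 ms in this project).
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

async function mapWithDelay<T, R>(
  items: T[],
  fn: (item: T) => Promise<R>,
  delayMs = 5000,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i++) {
    if (i > 0) await sleep(delayMs); // no wait before the first call
    results.push(await fn(items[i]));
  }
  return results;
}
```

With a 5-second gap, a sequential loop makes at most roughly 12 requests per minute.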

Performance & Metrics

Build Performance

TypeScript Compilation: ~10s (4 packages)
Website Build: ~45s (Astro)
Total Workflow Time: ~90s
Deployment: ~2 min (GitHub Pages)

Content Generation

Articles Fetched: 281+ per run
Articles Ranked: Top 4 (score ≥ 0.7)
Posts Generated: 1 per run (configured)
AI Tokens: ~1500 tokens/post
API Delay: 5s between requests

Cost Analysis

Gemini API: ~$0.10/month (1M tokens)
GitHub Actions: Free (40-100 min/month)
GitHub Pages: Free (soft limit of 100 GB bandwidth per month)
Custom Domain: $12/year (optional)
Total: ~$1.10/month
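The monthly total is simple arithmetic on the figures above. This sketch reproduces it and makes it easy to re-run with different token volumes. monthlyCost is a hypothetical helper.

```typescript
// Token cost from the per-million rate, plus an optional domain fee
// amortised over 12 months.
function monthlyCost(tokensPerMonth: number, costPer1MTokens: number, domainPerYear = 0): number {
  const aiCost = (tokensPerMonth / 1_000_000) * costPer1MTokens;
  return aiCost + domainPerYear / 12;
}

// 1M tokens at $0.10 per 1M, plus a $12/year domain: 0.10 + 12 / 12
console.log(monthlyCost(1_000_000, 0.10, 12).toFixed(2)); // "1.10"
```

For comparison, generation alone at ~1,500 tokens per post and one post per day comes to about 45,000 tokens in a 30-day month.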

Security & Best Practices

Secrets Management

✅ GitHub Secrets (encrypted at rest)
✅ Runtime secrets.yaml (created during workflow)
✅ .gitignore prevents accidental commits
❌ No SOPS/age needed (simplified from original design)

Code Quality

TypeScript with strict mode
Zod schema validation (runtime type safety)
Pino structured logging
Error boundaries in agents

Git Workflow

Single commit for .md + HTML
Automatic rebasing on conflicts
Branch protection (GitHub Pages serves from main/docs)

Future Roadmap

Near-Term (1-2 weeks)

✅ Unified blog view with filtering (Done!)
✅ Dual domain support (Done!)
✅ Context-aware AI generation (Done!)
🔄 Test custom domain propagation
📝 Add Giscus comments system

Mid-Term (1-2 months)

🚀 Enable DIY tutorials agent
📊 Analytics with Cloudflare Workers
🔍 Search with Pagefind (static index)
📰 RSS feed generation

Long-Term (3-6 months)

🌍 Multi-language support (i18n)
📧 Newsletter integration
🖼️ Image optimization pipeline
🤖 Additional AI agents (security alerts, hardware reviews)

Key Learnings

Architecture Decisions

Monorepo over multi-repo: Shared dependencies, unified tooling
TypeScript ESM: Modern standard, but requires .js extensions in imports
GitHub Secrets over SOPS: Simpler for GitHub-first workflow
Single workflow file: Easier to maintain than multiple workflows
Content Collections: Type safety prevents runtime errors

AI Integration

Always strip code blocks: LLMs love wrapping JSON in markdown
Pass existing context: Prevents duplicate perspectives
Structured prompts: JSON schema → predictable output
Tiered fallback: 99.9% uptime with cost optimization
Rate limiting: Better safe than sorry (5s delay works)

Deployment

Base path handling: Environment variables > hardcoded values
.nojekyll is critical: Jekyll skips underscore-prefixed folders such as _astro/
CNAME in docs/: Required for custom domains
Working directory matters: Always run from workspace root
Single commit strategy: Simplifies git history

Project Stats (Current)

Total Files: 85+ (excluding node_modules)
TypeScript LOC: ~3,200 lines
Configuration: 2 YAML files (agents, secrets)
Content: 4 blog posts (1 manual, 3 AI-generated)
Uptime: 100% (GitHub Actions handles everything)
Cost: $0.10/month (AI only)

Try It Yourself

Want to build your own AI-powered blog? Everything is open source:

Repository: github.com/deepak4395/personalBlog
Live Site: blog.sarcasticrobo.online
Documentation: See DOCUMENTATION.md for complete setup guide

What You’ll Need:

GitHub account (free)
Gemini API key (free tier available)
15 minutes to configure
Zero hosting costs (GitHub Pages)

Conclusion

This project demonstrates that modern AI, automation, and web technologies can create a fully autonomous content platform with:

✅ Zero manual intervention
✅ Professional quality output
✅ Negligible operating costs
✅ Production-ready reliability

The key was combining the right tools (Astro, TypeScript, GitHub Actions) with thoughtful architecture (monorepo, tiered AI, intelligent deduplication) and iterative problem-solving (navigation bugs, CSS loading, duplicate detection).

Most importantly: It actually works! The blog generates unique, analytical content daily, learns from existing posts, and deploys automatically. 🚀


What We Built

This is not just a blog - it’s a complete automated content pipeline that runs on GitHub’s infrastructure for free:

AI News Aggregation: Fetches articles from 6+ RSS feeds (Hackaday, Embedded.com, CNX Software, Zephyr Project, EE Times, Interrupt/Memfault) plus Hacker News
Intelligent Ranking: Uses AI to score articles by relevance to embedded systems (threshold: 0.7)
Original Analysis: AI generates 800-1200 word articles with technical analysis, not summaries
Smart Deduplication: Checks existing posts by URL and title similarity to prevent duplicates
Automatic Publishing: Commits markdown files and rebuilds website via GitHub Actions
Zero-Cost Hosting: GitHub Pages serves the static site with custom styling

System Architecture

Frontend Stack

Framework: Astro 4.16.19 (static site generation)
Styling: Tailwind CSS with custom blog content classes
Components: React islands for interactive elements (feedback form)
Typography: Custom .blog-content CSS with enhanced formatting
Dark Mode: System preference detection
Deployment: GitHub Pages from docs/ folder

Backend & AI

Monorepo: pnpm workspaces with 4 packages
- @personalBlog/core: AI client, config loader, utilities
- @personalBlog/agent-news: RSS fetcher, Hacker News client, blog generator
- @personalBlog/agent-diy-tutorials: Reddit/Stack Exchange clients (future feature)
- @personalBlog/scheduler: CLI orchestrator for agent execution
AI Provider: Google Gemini 2.0 Flash (paid tier)
- Quota: 2K RPM, 4M TPM, unlimited RPD
- 5-second delay between requests to respect rate limits
- Fallback chain: Gemini → Groq → OpenAI
Configuration: YAML-based agent config with tiered AI models
Secrets: GitHub Secrets → runtime secrets.yaml (no local SOPS needed)

Automation Workflow

The entire system runs on GitHub Actions with a single workflow (generate-content.yml):

Triggers: Push to main, manual dispatch, scheduled (daily 6 AM UTC)
Setup: Install Node.js, pnpm, dependencies, build TypeScript packages
Generate Secrets: Creates config/secrets.yaml from GitHub Secrets
Run Agents: Executes from workspace root to save files correctly
Check Changes: Uses git add + git diff --cached to detect new posts
Build Website: Runs pnpm --filter website build
Prepare Deployment: Copies dist/ to docs/, creates .nojekyll file
Commit & Push: Commits both .md sources and HTML to repository
Deploy: GitHub Pages auto-deploys from docs/ folder

Technical Challenges Solved

1. Working Directory Issues

Problem: Agent was saving files to packages/scheduler/website/ instead of workspace root.

Solution: Changed workflow to run CLI from workspace root using node packages/scheduler/dist/cli.js instead of pnpm scripts.

2. CSS Not Loading on GitHub Pages

Problem: Tailwind CSS files in _astro/ folder were ignored by Jekyll.

Solution: Created .nojekyll file in docs/ to disable Jekyll processing.

3. Navigation Links Broken

Problem: Links were missing the / between the base path and the route (e.g., /personalBlognews instead of /personalBlog/news).

Solution: Used import.meta.env.BASE_URL with explicit slashes in all navigation links.

4. AI JSON Parsing Failures

Problem: Gemini sometimes wrapped JSON in markdown code blocks or returned malformed JSON.

Solution:

Strip markdown code blocks before parsing: /^```(?:json)?\s*\n?([\s\S]*?)\n?```$/
Added detailed error logging (2000 chars of raw response)
Improved prompt to explicitly request JSON-only output
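The three fixes can be combined in one parsing function. This is a sketch, not the project's code. parseModelJson is a hypothetical name. It reuses the code-block regex listed above and attaches the first 2000 characters of the raw response to the error.

```typescript
// Strip a surrounding markdown fence, parse, and on failure surface a
// truncated copy of the raw model output for debugging.
const FENCE = /^```(?:json)?\s*\n?([\s\S]*?)\n?```$/;

function parseModelJson(raw: string): unknown {
  const trimmed = raw.trim();
  const match = trimmed.match(FENCE);
  const text = match ? match[1].trim() : trimmed;
  try {
    return JSON.parse(text);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(
      `Model returned invalid JSON (${reason}). Raw response (first 2000 chars):\n${raw.slice(0, 2000)}`,
    );
  }
}
```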

5. Duplicate Post Detection

Problem: The agent posted the same articles multiple times under different AI-generated titles.

Solution:

Check existing posts by source URL (exact match)
Also check title similarity (60% word overlap threshold)
Ignore short words (3 characters or fewer) for better matching

6. Rate Limit Errors

Problem: Hit Gemini’s 2K RPM limit when generating 4 posts quickly.

Solution: Increased delay between API calls from 2s to 5s.

7. Blog Post Styling

Problem: Tailwind prose classes not rendering properly.

Solution: Created custom .blog-content CSS classes with explicit styling for headings, lists, code blocks, and typography.

Features Implemented

Content Generation

✅ Fetches 281+ articles daily from multiple sources
✅ Ranks by embedded systems relevance
✅ Generates 1 post per run to control API costs
✅ Creates original analysis (not summaries) with structured sections:
- Overview
- Technical Implications (hardware/firmware)
- Real-World Applications
- Key Takeaways (bullet points)
- Conclusion

Website Features

✅ Responsive homepage with latest news cards
✅ Individual blog post pages with enhanced typography
✅ Calendar icon, gradient tag badges
✅ Source attribution with clickable links
✅ AI-generated content disclosure
✅ Contact form with FormSubmit.co integration
✅ Profile link to profile.sarcasticrobo.online

Developer Experience

✅ TypeScript monorepo with proper ESM imports
✅ Comprehensive logging with Pino
✅ Config validation with Zod schemas for the YAML files
✅ CLI with commands: run-agent, run-all, list, config
✅ Hot reload during development (Astro)

Daily Workflow

Every day at 6 AM UTC, the system automatically:

Fetches latest tech news from 7 sources
Ranks articles by relevance to embedded systems
Checks if article was already posted (by URL)
Generates original analysis with AI (800-1200 words)
Validates markdown frontmatter and content
Commits new .md file to repository
Builds Astro website to static HTML
Commits HTML to docs/ folder
GitHub Pages deploys updated site within 2-3 minutes

Cost: Near zero (GitHub Actions and GitHub Pages are free; only Gemini API usage on the paid tier is billed)

Configuration

AI Models

tier1: gemini-2.0-flash (paid tier, unlimited RPD)
tier2: llama-3.3-70b-versatile (Groq, free)
tier3: gpt-4o-mini (OpenAI, fallback)

GitHub Secrets Required

GEMINI_API_KEY: Google AI Studio API key
FORMSUBMIT_EMAIL: Email for feedback form

Content Settings

Max articles per run: 1
Deduplication window: 7 days
Min relevance score: 0.7
Content length: 800-1200 words

Future Enhancements

Planned features documented in TODO.md:

DIY Tutorials Agent: Reddit + Stack Exchange integration
Comments System: Giscus (GitHub Discussions)
Analytics: Cloudflare Workers Analytics
Search: Client-side search with Fuse.js
RSS Feed: Auto-generated XML feed
Image Optimization: Sharp + Astro Image
Multi-language: i18n support
Newsletter: Automated weekly digest

Project Stats

Total Packages: 4 (core, agent-news, agent-diy-tutorials, scheduler)
Dependencies: 25+ npm packages
TypeScript Files: 30+ source files
Lines of Code: ~3000 LOC
Workflow Steps: 12 automated steps
Build Time: ~60 seconds
Deploy Time: ~2 minutes

Try It Yourself

Want to build your own AI blog? Check out:

Repository: github.com/deepak4395/personalBlog
Setup Guide: See GITHUB_SETUP.md for step-by-step instructions
Live Site: deepak4395.github.io/personalBlog

The entire system is open source and ready to deploy in under 15 minutes with just a Gemini API key!

Conclusion

This project demonstrates the power of combining AI, automation, and modern web technologies. What started as a complex idea became a reality through systematic problem-solving, debugging, and iteration.

Key learnings: ESM imports require .js extensions, Jekyll ignores _astro folders, AI needs explicit JSON instructions, and deduplication requires URL checking, not just titles.

The blog is now live, automated, and cost-free - a testament to what’s possible with GitHub Actions and AI in 2026! 🚀

Sources:

personalBlog Repository - GitHub
