AI-Powered Blog System: Architecture & Implementation Details
A comprehensive look at the fully automated blog generation system - from AI agents to GitHub Actions deployment, featuring intelligent deduplication and dual-domain support.
Project Overview
After extensive development spanning multiple iterations, this blog is now a fully automated, production-ready content platform that demonstrates modern software architecture principles. The system autonomously fetches news, generates original analysis, and deploys updates - all without manual intervention.
Core Architecture Principles
- Monorepo Structure: Shared code, independent packages, unified builds
- Event-Driven Automation: GitHub Actions orchestrates the entire pipeline
- Tiered AI Fallback: Cost optimization through intelligent provider selection
- Intelligent Deduplication: Multi-level filtering prevents duplicate content
- Environment-Based Configuration: Single codebase supports multiple deployment targets
System Architecture
Package Structure (4 Packages)
1. @personalBlog/core - Shared Foundation
Purpose: Reusable utilities and infrastructure
Key Components:
- AI Client (src/ai/client.ts):
  - Multi-provider support (Gemini, Groq, OpenAI)
  - Tiered fallback chain (tier1 → tier2 → tier3)
  - Usage tracking and budget monitoring
  - Automatic error handling and retries
- Config Loader (src/config/loader.ts):
  - YAML configuration with Zod validation
  - GitHub Secrets integration (runtime secrets.yaml)
  - Workspace root detection
  - Singleton pattern for performance
- Content Utilities:
  - Markdown generation with frontmatter
  - Filename generation (YYYY-MM-DD-slug format)
  - Content validation
  - Type-safe schemas
Dependencies: @google/generative-ai, groq-sdk, openai, zod, yaml, pino
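As an illustration of the YYYY-MM-DD-slug filename format, here is a minimal sketch. The helper names `slugify` and `postFilename` are hypothetical, and the `.md` extension and UTC date are assumptions; the actual utilities in @personalBlog/core may work differently.

```typescript
// Hypothetical helpers, not the actual @personalBlog/core API.

/** Turn a title into a lowercase, hyphen-separated slug. */
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each run of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

/** Build a YYYY-MM-DD-slug.md filename (date taken in UTC, an assumption). */
function postFilename(date: Date, title: string): string {
  const day = date.toISOString().slice(0, 10); // "YYYY-MM-DD"
  return `${day}-${slugify(title)}.md`;
}
```

Using the UTC date keeps filenames stable regardless of the time zone the workflow runner happens to use.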
2. @personalBlog/agent-news - News Aggregator
Purpose: Fetches and analyzes tech news, generates blog posts
Workflow:
1. Fetch (281+ articles from 7 sources)
├─ RSS Feeds (6): Hackaday, Embedded.com, CNX Software, etc.
└─ Hacker News API: Top stories matching keywords
2. Deduplicate (Remove identical articles)
└─ URL-based + content similarity
3. Rank by Relevance (AI scoring 0.0-1.0)
└─ Filter: score ≥ 0.7
4. Filter Existing Posts
├─ Extract titles + source URLs from existing .md files
├─ Check exact URL match (most reliable)
└─ Check title similarity (60% word overlap threshold)
5. Generate Blog Post (AI with context)
├─ Pass last 10 existing blog titles + URLs to AI
├─ AI analyzes with awareness of existing content
└─ Generate 800-1200 word original analysis
6. Save & Validate
├─ Create frontmatter (title, tags, sources)
├─ Validate markdown structure
└─ Save to website/src/content/news/
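The "Save & Validate" step could look roughly like the sketch below. The `PostMeta` shape and `toMarkdown` name are hypothetical; only the three frontmatter fields (title, tags, sources) come from the workflow above. Values are serialized with `JSON.stringify`, which yields quoted strings and bracketed arrays that YAML also accepts.

```typescript
// Hypothetical shape: only the field names come from the workflow description.
interface PostMeta {
  title: string;
  tags: string[];
  sources: string[];
}

/** Render frontmatter plus body as a single Markdown document. */
function toMarkdown(meta: PostMeta, body: string): string {
  return [
    "---",
    `title: ${JSON.stringify(meta.title)}`, // quoting protects colons and quotes in titles
    `tags: ${JSON.stringify(meta.tags)}`,
    `sources: ${JSON.stringify(meta.sources)}`,
    "---",
    "",
    body.trim(),
    "", // trailing newline at end of file
  ].join("\n");
}
```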
Key Files:
- src/index.ts: Main orchestrator
- src/sources/: RSS fetcher, Hacker News client
- src/processors/: Deduplicator, relevance ranker
- src/generator/blog-generator.ts: AI-powered blog generation
Smart Deduplication:
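// 1. URL-based (exact match, the most reliable check)
if (existingUrls.has(article.url)) {
  skip();
}
// 2. Title similarity (60% threshold), ignoring words of 3 characters or fewer
const articleWords = title.split(/\s+/).filter(w => w.length > 3);
const existingWords = new Set(existing.split(/\s+/).filter(w => w.length > 3));
// Count the article's words that also appear in the existing title
const overlap = articleWords.filter(w => existingWords.has(w)).length;
// Guard against dividing by zero when the title has no long words
if (articleWords.length > 0 && overlap / articleWords.length > 0.6) {
  skip();
}
// 3. Pass context to AI
const context = existingPosts.map(p =>
  `"${p.title}" (source: ${p.url})`
).join('\n');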
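For readers who want to experiment with the first two checks, here is a self-contained sketch. The `ExistingPost` shape and the function names are hypothetical. Lowercasing and punctuation stripping are additions of this sketch, not something the snippet above does.

```typescript
// Hypothetical shape for illustration only.
interface ExistingPost {
  title: string;
  url: string;
}

/** Lowercased words longer than 3 characters, split on non-word characters. */
function significantWords(title: string): Set<string> {
  return new Set(title.toLowerCase().split(/\W+/).filter((w) => w.length > 3));
}

/** Fraction of the candidate's significant words that also occur in `existing`. */
function titleOverlap(candidate: string, existing: string): number {
  const a = significantWords(candidate);
  const b = significantWords(existing);
  if (a.size === 0) return 0; // nothing to compare, avoid dividing by zero
  let shared = 0;
  a.forEach((w) => {
    if (b.has(w)) shared++;
  });
  return shared / a.size;
}

/** True when the URL matches exactly or the title overlap exceeds the threshold. */
function isDuplicate(
  article: { title: string; url: string },
  posts: ExistingPost[],
  threshold = 0.6,
): boolean {
  return posts.some(
    (p) => p.url === article.url || titleOverlap(article.title, p.title) > threshold,
  );
}
```

The URL check runs first because it is exact; the title check only catches the case where the same story arrives under a different URL.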
3. @personalBlog/agent-diy-tutorials - Tutorial Generator
Purpose: Researches forums and creates educational content
Sources:
- Reddit: r/embedded, r/arduino, r/raspberrypi
- Stack Exchange: Electronics, Arduino, Raspberry Pi
Workflow (Multi-Phase Generation):
- Fetch forum posts by keywords
- Topic discovery (AI identifies tutorial-worthy topics)
- Tutorial outline generation (structure + sections)
- Content writing (detailed step-by-step guides)
- Save with difficulty level (beginner/intermediate/advanced)
Status: Framework ready, currently disabled
4. @personalBlog/scheduler - CLI Orchestrator
Purpose: Execute agents from command line or GitHub Actions
Commands:
# Run specific agent
node packages/scheduler/dist/cli.js run-agent news-aggregator
# Run all enabled agents
node packages/scheduler/dist/cli.js run-all
# List agents with status
node packages/scheduler/dist/cli.js list
# Show configuration
node packages/scheduler/dist/cli.js config
Registry System:
// packages/scheduler/src/registry.ts
export const agentRegistry: Record<string, AgentRunner> = {
'news-aggregator': runNewsAgent,
'news': runNewsAgent, // Alias
'diy-tutorials': runDIYAgent,
'tutorials': runDIYAgent, // Alias
};
Execution Flow:
- Load configuration (agents.yaml)
- Check if agent is enabled
- Run agent with error handling
- Report execution time and results
- Continue to next agent on failure (if running all)
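The execution flow above can be sketched as a loop that isolates each agent's failure. This is a simplified illustration, not the scheduler's actual code: the zero-argument `AgentRunner` signature, the `enabled` map, and the `AgentResult` shape are all assumptions.

```typescript
// Assumed signature; the real AgentRunner type may take arguments.
type AgentRunner = () => Promise<void>;

interface AgentResult {
  name: string;
  ok: boolean;
  ms: number; // execution time in milliseconds
  error?: string;
}

/** Run every enabled agent in order, continuing past failures. */
async function runAll(
  registry: Record<string, AgentRunner>,
  enabled: Record<string, boolean>,
): Promise<AgentResult[]> {
  const results: AgentResult[] = [];
  for (const name of Object.keys(registry)) {
    if (!enabled[name]) continue; // skip agents that are disabled in config
    const start = Date.now();
    try {
      await registry[name]();
      results.push({ name, ok: true, ms: Date.now() - start });
    } catch (err) {
      // Record the failure and move on to the next agent.
      const error = err instanceof Error ? err.message : String(err);
      results.push({ name, ok: false, ms: Date.now() - start, error });
    }
  }
  return results;
}
```

Collecting results instead of throwing lets the caller report every agent's outcome before deciding on an exit code.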
Frontend Stack
Framework: Astro 4.16.19
- Zero-JS by default (static HTML)
- React islands for interactivity (BlogFilter, FeedbackForm)
- Content Collections with type-safe schemas
Styling: Tailwind CSS
- Custom .blog-content classes (not prose utilities)
- Dark mode support (system preference)
- Responsive grid layouts
Key Features:
- Unified Blog View: Single page shows all posts (news + tutorials)
- Client-Side Filtering: Tag search + dropdown filter (instant)
- Base Path Handling: Supports both custom domain (/) and GitHub Pages (/personalBlog)
Components:
- BlogFilter.tsx: Interactive filtering with React state
- FeedbackForm.tsx: Contact form (FormSubmit.co integration)
- Layout.astro: Navigation + footer
- BlogPost.astro: Article layout with enhanced typography
AI Configuration
Tiered Fallback System:
globalSettings:
  aiModels:
    tier1:
      provider: google
      model: gemini-2.0-flash-exp
      costPer1MTokens: 0.10
    tier2:
      provider: groq
      model: llama-3.3-70b-versatile
      costPer1MTokens: 0.79
    tier3:
      provider: openai
      model: gpt-4o-mini
      costPer1MTokens: 0.15
  fallbackChain: [tier1, tier2, tier3]
Why This Works:
- Tier 1 (Gemini): Fast, cheap, unlimited RPD (paid tier)
- Tier 2 (Groq): Fast, free, good quality (fallback)
- Tier 3 (OpenAI): Expensive but reliable (emergency)
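A fallback chain like this boils down to trying each tier in order and returning the first success. Here is a minimal sketch under stated assumptions: the `Tier` interface and `generateWithFallback` name are hypothetical, and the real AI client also handles retries and usage tracking, which are omitted here.

```typescript
// Hypothetical interface: each tier wraps one provider behind a common call.
interface Tier {
  name: string;
  generate: (prompt: string) => Promise<string>;
}

/** Try tiers in order; return the first successful response. */
async function generateWithFallback(
  tiers: Tier[],
  prompt: string,
): Promise<{ tier: string; text: string }> {
  const errors: string[] = [];
  for (const tier of tiers) {
    try {
      const text = await tier.generate(prompt);
      return { tier: tier.name, text };
    } catch (err) {
      // Remember why this tier failed, then fall through to the next one.
      errors.push(`${tier.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All tiers failed: ${errors.join("; ")}`);
}
```

Returning the tier name alongside the text makes it easy to log which provider actually served a request.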
Prompt Engineering:
- System prompt: “ANALYZE and INTERPRET, not copy”
- User prompt: Includes article + existing blog context
- Output format: Structured JSON (title, description, content, tags)
- Required sections: Overview, Technical Implications, Real-World Applications, Key Takeaways, Conclusion
Automation Pipeline
GitHub Actions Workflow (generate-content.yml):
Triggers:
- Push to main (for testing)
- Manual dispatch (select agent: all/news/tutorials)
- Schedule: Daily 6 AM UTC (news)
- Schedule: Monday 8 AM UTC (tutorials)
Steps (12 total):
1. Checkout repository
2. Setup Node.js 20
3. Install pnpm 8
4. Cache pnpm store (speeds up builds)
5. Install dependencies
6. Build TypeScript packages (pnpm -r run build)
7. Create config/secrets.yaml from GitHub Secrets
8. Run agents (node packages/scheduler/dist/cli.js)
9. Check for new content (git diff)
10. Build Astro website (with SITE_DOMAIN env var)
11. Copy dist/ to docs/ + create .nojekyll
12. Commit & push (.md + HTML in single commit)
Critical Details:
- Working Directory: MUST run from workspace root (not packages/scheduler/)
- Why: File paths are relative to process.cwd(), so running from the wrong directory saves files to the wrong location
- CNAME File: Created in docs/ for custom domain support
- .nojekyll File: Prevents GitHub Pages from ignoring the _astro/ folder (CSS/JS)
Domain Configuration
Dual-Domain Support (Environment Variable Pattern):
// website/astro.config.mjs
import { defineConfig } from 'astro/config';

// SITE_DOMAIN is either a custom domain or "username.github.io/repo"
const siteDomain = process.env.SITE_DOMAIN || 'blog.sarcasticrobo.online';
const isGitHubPages = siteDomain.includes('github.io');

export default defineConfig({
  site: isGitHubPages
    ? `https://${siteDomain.split('/')[0]}`
    : `https://${siteDomain}`,
  base: isGitHubPages && siteDomain.includes('/')
    ? `/${siteDomain.split('/')[1]}`
    : '/',
});
How It Works:
- Custom Domain (default): base = '/', full URLs like /news/slug
- GitHub Pages: SITE_DOMAIN=username.github.io/repo, base = '/repo', URLs like /repo/news/slug
- Workflow: Passes SITE_DOMAIN as env var during build
- CNAME: Hardcoded to blog.sarcasticrobo.online in workflow
Switching:
- Add GitHub Secret SITE_DOMAIN → instant switch
- No code changes needed
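To check how a given SITE_DOMAIN value maps to site and base, the same logic can be pulled into a pure function. `resolveSite` is a hypothetical name used only for this sketch; the expressions mirror the config shown above.

```typescript
/** Derive Astro's `site` and `base` from a SITE_DOMAIN value (hypothetical helper). */
function resolveSite(siteDomain: string): { site: string; base: string } {
  const isGitHubPages = siteDomain.includes("github.io");
  const parts = siteDomain.split("/"); // ["username.github.io", "repo"] for GitHub Pages
  return {
    site: isGitHubPages ? `https://${parts[0]}` : `https://${siteDomain}`,
    base: isGitHubPages && siteDomain.includes("/") ? `/${parts[1]}` : "/",
  };
}

const custom = resolveSite("blog.sarcasticrobo.online");
// custom.site === "https://blog.sarcasticrobo.online", custom.base === "/"

const pages = resolveSite("username.github.io/repo");
// pages.site === "https://username.github.io", pages.base === "/repo"
```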
Technical Challenges Solved
1. Working Directory Issues
Problem: Agents saved files to packages/scheduler/website/ instead of workspace root
Solution:
# Before (wrong)
cd packages/scheduler && pnpm agents:news
# After (correct)
node packages/scheduler/dist/cli.js run-agent news-aggregator
2. CSS Not Loading on GitHub Pages
Problem: _astro/ folder ignored by Jekyll
Solution: Create .nojekyll file in docs/
3. Navigation Links Broken
Problem: ${base}/news created //news when base was /
Solution:
const basePath = base === '/' ? '' : base;
const url = `${basePath}/news`;
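One way to make this fix harder to regress is a small join helper that normalizes slashes on both sides. `withBase` is a hypothetical name for this sketch, not a function from the repository.

```typescript
/** Join a base path and a route with exactly one slash between them. */
function withBase(base: string, path: string): string {
  const prefix = base.replace(/\/+$/, ""); // "/" becomes "", "/repo/" becomes "/repo"
  const suffix = path.replace(/^\/+/, ""); // "/news" becomes "news"
  return `${prefix}/${suffix}`;
}
```

With this, `withBase("/", "/news")` yields `/news` and `withBase("/personalBlog", "news")` yields `/personalBlog/news`, so neither the double-slash nor the missing-slash variant of the bug can occur.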
4. Duplicate Posts
Problem: AI generates different titles for same source article
Solution:
- Check source URLs (exact match)
- Pass existing blog context to AI
- Title similarity with 60% threshold
5. AI JSON Parsing Failures
Problem: Gemini wraps JSON in markdown code blocks
Solution:
const codeBlockMatch = text.match(/^```(?:json)?\s*\n?([\s\S]*?)\n?```$/);
if (codeBlockMatch) {
text = codeBlockMatch[1].trim();
}
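Put together with a basic shape check, the parsing step might look like the sketch below. The `GeneratedPost` fields follow the output format described under Prompt Engineering (title, description, content, tags). The function name and the error message are assumptions, and the fence marker is built at runtime only so this sample is easy to embed in Markdown.

```typescript
// Fields follow the structured JSON output format; the interface name is hypothetical.
interface GeneratedPost {
  title: string;
  description: string;
  content: string;
  tags: string[];
}

// Three backticks, built at runtime to keep this sample Markdown-friendly.
const FENCE = "`".repeat(3);
const CODE_BLOCK = new RegExp(
  "^" + FENCE + "(?:json)?\\s*\\n?([\\s\\S]*?)\\n?" + FENCE + "$",
);

/** Strip an optional Markdown code fence, parse the JSON, and check required fields. */
function parseModelJson(raw: string): GeneratedPost {
  let text = raw.trim();
  const match = text.match(CODE_BLOCK);
  if (match) text = match[1].trim();
  const data = JSON.parse(text) as Partial<GeneratedPost>;
  if (
    typeof data.title !== "string" ||
    typeof data.description !== "string" ||
    typeof data.content !== "string" ||
    !Array.isArray(data.tags)
  ) {
    throw new Error("Model response is missing required fields");
  }
  return data as GeneratedPost;
}
```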
6. Rate Limiting
Problem: Hit Gemini 2K RPM limit
Solution: 5-second delay between API calls
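A fixed delay between calls can be sketched as a sequential runner. `runSpaced` and `sleep` are hypothetical names; the 5000 ms default mirrors the 5-second delay described above.

```typescript
/** Resolve after `ms` milliseconds. */
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Run tasks one at a time, pausing `delayMs` between consecutive calls. */
async function runSpaced<T>(
  tasks: Array<() => Promise<T>>,
  delayMs = 5000,
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < tasks.length; i++) {
    if (i > 0) await sleep(delayMs); // wait between calls, not before the first
    results.push(await tasks[i]());
  }
  return results;
}
```

Running the calls sequentially also means at most one request is in flight at a time, which keeps the request rate easy to reason about.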
Performance & Metrics
Build Performance
- TypeScript Compilation: ~10s (4 packages)
- Website Build: ~45s (Astro)
- Total Workflow Time: ~90s
- Deployment: ~2 min (GitHub Pages)
Content Generation
- Articles Fetched: 281+ per run
- Articles Ranked: Top 4 (score ≥ 0.7)
- Posts Generated: 1 per run (configured)
- AI Tokens: ~1500 tokens/post
- API Delay: 5s between requests
Cost Analysis
- Gemini API: ~$0.10/month (1M tokens)
- GitHub Actions: Free (40-100 min/month)
- GitHub Pages: Free (unlimited bandwidth)
- Custom Domain: $12/year (optional)
- Total: ~$1.10/month
Security & Best Practices
Secrets Management
- ✅ GitHub Secrets (encrypted at rest)
- ✅ Runtime secrets.yaml (created during workflow)
- ✅ .gitignore prevents accidental commits
- ❌ No SOPS/age needed (simplified from original design)
Code Quality
- TypeScript with strict mode
- Zod schema validation (runtime type safety)
- Pino structured logging
- Error boundaries in agents
Git Workflow
- Single commit for .md + HTML
- Automatic rebasing on conflicts
- Branch protection (GitHub Pages serves from main/docs)
Future Roadmap
Near-Term (1-2 weeks)
- ✅ Unified blog view with filtering (Done!)
- ✅ Dual domain support (Done!)
- ✅ Context-aware AI generation (Done!)
- 🔄 Test custom domain propagation
- 📝 Add Giscus comments system
Mid-Term (1-2 months)
- 🚀 Enable DIY tutorials agent
- 📊 Analytics with Cloudflare Workers
- 🔍 Search with Pagefind (static index)
- 📰 RSS feed generation
Long-Term (3-6 months)
- 🌍 Multi-language support (i18n)
- 📧 Newsletter integration
- 🖼️ Image optimization pipeline
- 🤖 Additional AI agents (security alerts, hardware reviews)
Key Learnings
Architecture Decisions
- Monorepo over multi-repo: Shared dependencies, unified tooling
- TypeScript ESM: Modern standard, but requires .js extensions in imports
- GitHub Secrets over SOPS: Simpler for GitHub-first workflow
- Single workflow file: Easier to maintain than multiple workflows
- Content Collections: Type safety prevents runtime errors
AI Integration
- Always strip code blocks: LLMs love wrapping JSON in markdown
- Pass existing context: Prevents duplicate perspectives
- Structured prompts: JSON schema → predictable output
- Tiered fallback: 99.9% uptime with cost optimization
- Rate limiting: Better safe than sorry (5s delay works)
Deployment
- Base path handling: Environment variables > hardcoded values
- .nojekyll is critical: Jekyll breaks modern web apps
- CNAME in docs/: Required for custom domains
- Working directory matters: Always run from workspace root
- Single commit strategy: Simplifies git history
Project Stats (Current)
- Total Files: 85+ (excluding node_modules)
- TypeScript LOC: ~3,200 lines
- Configuration: 2 YAML files (agents, secrets)
- Content: 4 blog posts (1 manual, 3 AI-generated)
- Uptime: 100% (GitHub Actions handles everything)
- Cost: $0.10/month (AI only)
Try It Yourself
Want to build your own AI-powered blog? Everything is open source:
- Repository: github.com/deepak4395/personalBlog
- Live Site: blog.sarcasticrobo.online
- Documentation: See DOCUMENTATION.md for complete setup guide
What You’ll Need:
- GitHub account (free)
- Gemini API key (free tier available)
- 15 minutes to configure
- Zero hosting costs (GitHub Pages)
Conclusion
This project demonstrates that modern AI, automation, and web technologies can create a fully autonomous content platform with:
- ✅ Zero manual intervention
- ✅ Professional quality output
- ✅ Negligible operating costs
- ✅ Production-ready reliability
The key was combining the right tools (Astro, TypeScript, GitHub Actions) with thoughtful architecture (monorepo, tiered AI, intelligent deduplication) and iterative problem-solving (navigation bugs, CSS loading, duplicate detection).
Most importantly: It actually works! The blog generates unique, analytical content daily, learns from existing posts, and deploys automatically. 🚀
What We Built
This is not just a blog - it’s a complete automated content pipeline that runs on GitHub’s infrastructure for free:
- AI News Aggregation: Fetches articles from 6+ RSS feeds (Hackaday, Embedded.com, CNX Software, Zephyr Project, EE Times, Interrupt/Memfault) plus Hacker News
- Intelligent Ranking: Uses AI to score articles by relevance to embedded systems (threshold: 0.7)
- Original Analysis: AI generates 800-1200 word articles with technical analysis, not summaries
- Smart Deduplication: Checks existing posts by URL and title similarity to prevent duplicates
- Automatic Publishing: Commits markdown files and rebuilds website via GitHub Actions
- Zero-Cost Hosting: GitHub Pages serves the static site with custom styling
System Architecture
Frontend Stack
- Framework: Astro 4.16.19 (static site generation)
- Styling: Tailwind CSS with custom blog content classes
- Components: React islands for interactive elements (feedback form)
- Typography: Custom .blog-content CSS with enhanced formatting
- Dark Mode: System preference detection
- Deployment: GitHub Pages from docs/ folder
Backend & AI
- Monorepo: pnpm workspaces with 4 packages
  - @personalBlog/core: AI client, config loader, utilities
  - @personalBlog/agent-news: RSS fetcher, Hacker News client, blog generator
  - @personalBlog/agent-diy-tutorials: Reddit/Stack Exchange clients (future feature)
  - @personalBlog/scheduler: CLI orchestrator for agent execution
- AI Provider: Google Gemini 2.0 Flash (paid tier)
  - Quota: 2K RPM, 4M TPM, unlimited RPD
  - 5-second delay between requests to respect rate limits
  - Fallback chain: Gemini → Groq → OpenAI
- Configuration: YAML-based agent config with tiered AI models
- Secrets: GitHub Secrets → runtime secrets.yaml (no local SOPS needed)
Automation Workflow
The entire system runs on GitHub Actions with a single workflow (generate-content.yml):
- Triggers: Push to main, manual dispatch, scheduled (daily 6 AM UTC)
- Setup: Install Node.js, pnpm, dependencies, build TypeScript packages
- Generate Secrets: Creates config/secrets.yaml from GitHub Secrets
- Run Agents: Executes from workspace root to save files correctly
- Check Changes: Uses git add + git diff --cached to detect new posts
- Build Website: Runs pnpm --filter website build
- Prepare Deployment: Copies dist/ to docs/, creates .nojekyll file
- Commit & Push: Commits both .md sources and HTML to repository
- Deploy: GitHub Pages auto-deploys from docs/ folder
Technical Challenges Solved
1. Working Directory Issues
Problem: Agent was saving files to packages/scheduler/website/ instead of workspace root.
Solution: Changed workflow to run CLI from workspace root using node packages/scheduler/dist/cli.js instead of pnpm scripts.
2. CSS Not Loading on GitHub Pages
Problem: Tailwind CSS files in _astro/ folder were ignored by Jekyll.
Solution: Created .nojekyll file in docs/ to disable Jekyll processing.
3. Navigation Links Broken
Problem: Links missing / between base path and routes (e.g., /personalBlognews instead of /personalBlog/news).
Solution: Used import.meta.env.BASE_URL with explicit slashes in all navigation links.
4. AI JSON Parsing Failures
Problem: Gemini sometimes wrapped JSON in markdown code blocks or returned invalid format.
Solution:
- Strip markdown code blocks before parsing: /^```(?:json)?\s*\n?([\s\S]*?)\n?```$/
- Added detailed error logging (2000 chars of raw response)
- Improved prompt to explicitly request JSON-only output
5. Duplicate Post Detection
Problem: Agent posted same articles multiple times with different AI-generated titles.
Solution:
- Check existing posts by source URL (exact match)
- Also check title similarity (60% word overlap threshold)
- Filter short words (3 characters or fewer) for better matching
6. Rate Limit Errors
Problem: Hit Gemini’s 2K RPM limit when generating 4 posts quickly.
Solution: Increased delay between API calls from 2s to 5s.
7. Blog Post Styling
Problem: Tailwind prose classes not rendering properly.
Solution: Created custom .blog-content CSS classes with explicit styling for headings, lists, code blocks, and typography.
Features Implemented
Content Generation
- ✅ Fetches 281+ articles daily from multiple sources
- ✅ Ranks by embedded systems relevance
- ✅ Generates 1 post per run to control API costs
- ✅ Creates original analysis (not summaries) with structured sections:
- Overview
- Technical Implications (hardware/firmware)
- Real-World Applications
- Key Takeaways (bullet points)
- Conclusion
Website Features
- ✅ Responsive homepage with latest news cards
- ✅ Individual blog post pages with enhanced typography
- ✅ Calendar icon, gradient tag badges
- ✅ Source attribution with clickable links
- ✅ AI-generated content disclosure
- ✅ Contact form with FormSubmit.co integration
- ✅ Profile link to profile.sarcasticrobo.online
Developer Experience
- ✅ TypeScript monorepo with proper ESM imports
- ✅ Comprehensive logging with Pino
- ✅ Config validation with YAML schemas
- ✅ CLI with commands: run-agent, run-all, list, config
- ✅ Hot reload during development (Astro)
Daily Workflow
Every day at 6 AM UTC, the system automatically:
- Fetches latest tech news from 7 sources
- Ranks articles by relevance to embedded systems
- Checks if article was already posted (by URL)
- Generates original analysis with AI (800-1200 words)
- Validates markdown frontmatter and content
- Commits new .md file to repository
- Builds Astro website to static HTML
- Commits HTML to docs/ folder
- GitHub Pages deploys updated site within 2-3 minutes
Cost: $0 (everything runs on free tier)
Configuration
AI Models
tier1: gemini-2.0-flash (paid tier, unlimited RPD)
tier2: llama-3.3-70b-versatile (Groq, free)
tier3: gpt-4o-mini (OpenAI, fallback)
GitHub Secrets Required
- GEMINI_API_KEY: Google AI Studio API key
- FORMSUBMIT_EMAIL: Email for feedback form
Content Settings
- Max articles per run: 1
- Deduplication window: 7 days
- Min relevance score: 0.7
- Content length: 800-1200 words
Future Enhancements
Planned features documented in TODO.md:
- DIY Tutorials Agent: Reddit + Stack Exchange integration
- Comments System: Giscus (GitHub Discussions)
- Analytics: Cloudflare Workers Analytics
- Search: Client-side search with Fuse.js
- RSS Feed: Auto-generated XML feed
- Image Optimization: Sharp + Astro Image
- Multi-language: i18n support
- Newsletter: Automated weekly digest
Project Stats
- Total Packages: 4 (core, agent-news, agent-diy-tutorials, scheduler)
- Dependencies: 25+ npm packages
- TypeScript Files: 30+ source files
- Lines of Code: ~3000 LOC
- Workflow Steps: 12 automated steps
- Build Time: ~60 seconds
- Deploy Time: ~2 minutes
Try It Yourself
Want to build your own AI blog? Check out:
- Repository: github.com/deepak4395/personalBlog
- Setup Guide: See GITHUB_SETUP.md for step-by-step instructions
- Live Site: deepak4395.github.io/personalBlog
The entire system is open source and ready to deploy in under 15 minutes with just a Gemini API key!
Conclusion
This project demonstrates the power of combining AI, automation, and modern web technologies. What started as a complex idea became a reality through systematic problem-solving, debugging, and iteration.
Key learnings: ESM imports require .js extensions, Jekyll ignores _astro folders, AI needs explicit JSON instructions, and deduplication requires URL checking, not just titles.
The blog is now live, automated, and cost-free - a testament to what’s possible with GitHub Actions and AI in 2026! 🚀