The Practical Developer

A constructive and inclusive social network for software developers.


The Ultimate Prompt Strategy: How to Vibe Code Production-Ready Websites

2026-01-21 18:00:00

Snow white dusting a cupboard

Dusting off the room because it has been a minute or two since I was in here last!

Over the last year, I ran a lot of vibe-coded projects. Most were for writing demos; others were simply for the fun of it.
However, with each new vibe-coded project, I kept getting frustrated and stuck debugging the AI's badly written (spaghetti) code.

"Vibe Coding" has been the trend of the moment. The idea to me was basically, "Describe your app in plain English, and the AI handles the syntax." This was the approach I kept using that kept failing until now.

Why Vibe Coding is Ineffective

Vibe coding is ineffective because most people treat AI like it's magic. They ask it for a feature, paste the code, and hope for the best. Usually, they get a messy file structure, insecure code, and a maintenance nightmare. The application might work on localhost, but it lacks the rigor required for the real world.

The Goal
I wanted a technical blog that was "up to standard and safe." Coming from WordPress, where I built my blog (The Handy Developer's Guide) and lived for the better part of a year and a half, I wanted a platform I could own completely, built with modern engineering standards.

The Solution
I didn't just ask the AI for code; I managed it. I adopted the mindset of a Senior Architect and treated the AI as my junior developer.
By enforcing strict constraints and architectural patterns, I used vibe coding to build a secure, production-ready application.
The image below is where I started with Gemini. But it gets better down the line.

good AI prompt for vibe coding

Steps to Vibe Code a Production-Ready App

Step 1: Defining the Architecture of Your Project

Before writing a single line of code, I had to define the stack. A standard AI prompt might suggest a generic React app or a rigid site builder. That was not enough.

The Decision
I chose a Headless Architecture:

  • Frontend: Next.js 15 (App Router)

frontend choice

  • Backend: Sanity (Headless CMS)
  • Styling: Tailwind CSS (v4)

Why I used Sanity to Build My Blog

Separation of concerns is critical for long-term survival. With this architecture, I own the code, and I own the content.

  • Portability: If I want to change the design next year, I don't lose my posts. They live safely in Sanity's database.
  • Security: There is no exposed database or admin panel for hackers to target on the frontend.
  • Performance: Next.js allows for Static Site Generation (SSG), so pages pre-rendered from Sanity content load instantly.

Key Takeaway
I did not let the AI pick the stack; I picked the stack, then told the AI how to build it.

Step 2: The "System Prompt" Strategy

The quality of the output depends entirely on the constraints of the input. I didn't just say, "Make a blog." I assigned a role and a standard.

The Trick
I used a "System Prompt" strategy to set the ground rules before any code was written.

The Prompt

good prompt engineering for vibe coding

The idea was to have one tab of Gemini 3 acting as the senior developer/project manager, while another tab acted as the engineer/developer on the ground.
So, I got tab A to give me the high-level prompts after first explaining its role to it.

The Result
The AI didn't dump files in the root directory. It set up a professional folder structure (lib/, components/, types/) and automatically created a .env.local file for credentials. By explicitly banning the TypeScript any type, the AI was forced to write interface definitions for my Post and Author schemas, preventing runtime crashes later.
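For illustration, here is a minimal sketch of what such typed definitions might look like; the exact fields are my assumptions, since they depend on the Sanity schemas the AI generated:

// types/index.ts — illustrative shape only; actual fields depend on the Sanity schema
export interface Author {
  _id: string;
  name: string;
  slug: { current: string };
  image?: string;
}

export interface Post {
  _id: string;
  title: string;
  slug: { current: string };
  publishedAt: string; // ISO date string from Sanity
  excerpt?: string;
  mainImage?: string;
  author: Author;
}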

AI acting like a project manager

Step 3: The "Schema-First" Build

Initially, I spun up a standalone Sanity Studio. I quickly realized this created redundancy—I didn't want to manage two separate projects. I directed the AI to refactor the architecture, merging the CMS directly into the Next.js application using an Embedded Studio.
This is how we managed it.

I tell AI my mistake

AI fixes coding error

The Result
I had a working CMS living at /studio before I even had a homepage. This allowed me to write and structure content immediately, giving the frontend real data to fetch during development.

Step 4: Using AI to Fix the Errors it Generated

AI is not perfect. Even with a great prompt (I'd know), "hallucinations" happen. I had to do my fair share of debugging, but the errors were more minor than I remember vibe-coded errors being.
We hit two major roadblocks.

Bug 1: The Route Group Conflict
I moved my layout files into a (blog) route group to organize the code (this was totally my choice, by the way; even though the Project Manager tab suggested it, it said it was optional). Suddenly, "the internet broke." In my terminal, I got error messages about missing tags.

  • The Issue: The AI had created a layout hierarchy where the root layout.tsx was missing the essential <html> and <body> tags because I had moved them into the child group.
  • The Fix: We refactored the hierarchy. I established a "Root Layout" for the HTML shell and a "Blog Layout" for the Navbar and Footer.
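A minimal sketch of that split, assuming the default App Router conventions (the Navbar/Footer import paths are my assumption):

// app/layout.tsx — the root layout owns the <html> and <body> shell
import type { ReactNode } from "react";

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <html lang="en">
      <body>{children}</body>
    </html>
  );
}

// app/(blog)/layout.tsx — the route-group layout owns the site chrome
import type { ReactNode } from "react";
import Navbar from "@/components/Navbar";
import Footer from "@/components/Footer";

export default function BlogLayout({ children }: { children: ReactNode }) {
  return (
    <>
      <Navbar />
      <main>{children}</main>
      <Footer />
    </>
  );
}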

AI fixes

Bug 2: The "Broken Image" Saga
The homepage rendered, but every image was a broken icon. The URL looked correct, but the browser refused to load it.

  • The Issue: I already knew this was a security feature, not a bug. Next.js blocks external images by default to prevent malicious injection.
  • The Fix: I didn't panic. I just checked the configuration. I prompted the project manager tab to update next.config.ts to explicitly whitelist cdn.sanity.io. One server restart later, the images appeared.
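For reference, the relevant change looks roughly like this (a sketch of Next.js's standard image allow-list rather than my exact file):

// next.config.ts — allow next/image to load assets served from Sanity's CDN
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  images: {
    remotePatterns: [
      { protocol: "https", hostname: "cdn.sanity.io" },
    ],
  },
};

export default nextConfig;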

The Lesson
AI writes the code, but you have to check the config. And sometimes, you just have to turn it off and on again.

Step 5: Refining the UI in a Vibe Coding Project (current phase)

Design
We moved from a sort of skeleton UI to a professional UI. We implemented a "Glassmorphism" navbar with a blur effect and switched to a high-quality typography pairing (Inter for UI, Playfair Display for headings).

AI UI change prompt

How to Check If Your Blog is Up To Standard

SEO
"A blog that doesn't rank is a diary," said someone really famous.
I had the AI implement Dynamic Metadata.
We used the generateMetadata function to automatically pull the SEO title, description, and OpenGraph images from Sanity. Now, every link shared on social media looks professional.
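Roughly, the pattern looks like this; the GROQ query, field names, and the client helper import are illustrative assumptions rather than the generated code:

// app/(blog)/posts/[slug]/page.tsx — dynamic metadata pulled from Sanity (sketch)
import type { Metadata } from "next";
import { client } from "@/lib/sanity"; // assumed Sanity client helper

export async function generateMetadata(
  { params }: { params: Promise<{ slug: string }> }
): Promise<Metadata> {
  const { slug } = await params;
  const post = await client.fetch(
    `*[_type == "post" && slug.current == $slug][0]{ title, excerpt, "ogImage": mainImage.asset->url }`,
    { slug }
  );

  return {
    title: post?.title,
    description: post?.excerpt,
    openGraph: { images: post?.ogImage ? [post.ogImage] : [] },
  };
}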

Analytics
I wanted to know if people were reading, but I didn't want to invade their privacy, so we integrated Vercel Analytics, a privacy-friendly tracker that gives me the data I need without the cookie banners users hate.
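Wiring that in is essentially one import and one component in the root layout; a sketch assuming the @vercel/analytics package:

// app/layout.tsx — extend the root layout with the Analytics component
import type { ReactNode } from "react";
import { Analytics } from "@vercel/analytics/react";

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <html lang="en">
      <body>
        {children}
        <Analytics /> {/* privacy-friendly page-view tracking */}
      </body>
    </html>
  );
}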

The Proof
I ran a Google Lighthouse audit on the production build to verify our "Senior Architect" standards. The results spoke for themselves:

  • Accessibility: 100
  • Best Practices: 96
  • SEO: 100

Google lighthouse score

My project manager assured me that this was a good score, especially seeing as my blog is not yet live. Getting it live will increase the score.

Conclusion:

I haven't launched the blog yet because I still have some work to do on it. I haven't properly tested it yet.
Having recently been writing articles on Playwright, I have learnt how to do extensive testing, simulating different browser and network conditions.
In due time, though, the blog will be launched.
I wrote this article because I wanted to share an update on one of the things I have been working on so far and how AI has helped me.

Let me know what you think of my journey so far.
Do you have any Vibe coding best practices?
Do you think I am wasting my time and should learn actual programming skills?

No matter your opinions, I want to hear them!

Find me on LinkedIn.

BodySnatcher: How a Hardcoded Secret Led to Full ServiceNow Takeover (CVE-2025-12420)

2026-01-21 17:58:18

Imagine waking up to find a new "backdoor" admin account in your ServiceNow instance. No passwords were leaked, no MFA was bypassed, and no SSO was compromised. Instead, an attacker simply "asked" your AI agent to create it.

This isn't a hypothetical scenario. It’s the reality of CVE-2025-12420, a critical privilege escalation vulnerability nicknamed BodySnatcher.

In this post, we’ll break down how a classic security blunder, a hardcoded secret, combined with the power of AI agents to create a "perfect storm" for enterprise compromise.

The Vulnerability: A Two-Step Identity Crisis

The flaw lives in the interaction between the Virtual Agent API (sn_va_as_service) and Now Assist AI Agents (sn_aia).

The Virtual Agent API is the gateway for external platforms like Slack or Teams to talk to ServiceNow. To keep things secure, it performs two checks:

  1. Provider Auth: Is the external platform authorized?
  2. User Linking: Which ServiceNow user is sending the message?

Here is where things went wrong.

1. The Hardcoded "Master Key"

For provider authentication, ServiceNow used a method called Message Auth. The problem? The secret key was hardcoded to servicenowexternalagent across every customer environment.

If you knew that string, you could authenticate as a legitimate external provider on any vulnerable ServiceNow instance.

2. The Email-Only Identity Check

Once "authenticated" as a provider, the system needed to link the session to a user. It used a feature called Auto-Linking, which matched users based solely on their email address.

By combining these two flaws, an attacker could authenticate with the hardcoded secret and then claim to be [email protected]. The system would believe them without asking for a password or MFA.

Anatomy of the "BodySnatcher" Exploit

The exploit follows a simple but deadly three-step chain.

Step 1: Bypass API Auth

The attacker sends a POST request to the Virtual Agent endpoint using the hardcoded bearer token.

POST /api/sn_va_as_service/v1/virtualagent/message
Host: [your-instance].service-now.com
Authorization: Bearer servicenowexternalagent
Content-Type: application/json

{
    "user_id": "[email protected]",
    "message": "Hello!"
}

Step 2: Hijack the Admin Identity

The attacker swaps their ID for a target admin's email.

{
    "user_id": "[email protected]",
    "message": "Start a privileged workflow."
}

ServiceNow now treats this session as the administrator.

Step 3: Weaponize the AI Agent

Now the "Agentic Amplification" kicks in. The attacker sends a natural language command to the Record Management AI Agent.

  • Attacker: "I need to create a new user."
  • Agent: (Triggers the privileged workflow)
  • Agent Action: "Creating user 'backdoor' with 'admin' role..."

Because the agent executes actions in the context of the (hijacked) admin, it succeeds. The attacker now has a persistent backdoor.

Why This Matters: Agentic Amplification

BodySnatcher is a watershed moment for Agentic Security. In a traditional app, an auth bypass might let you see a dashboard. In an agentic system, the agent acts as a high-speed execution engine.

The agent’s ability to map natural language to high-privilege API calls drastically shortens the attack path. This is the Agentic Blast Radius: a single conversational command can now compromise entire enterprise systems.

How to Protect Your Instance

If you haven't already, patch immediately.

Component            | Affected Versions                | Fixed Versions
Now Assist AI Agents | 5.0.24 – 5.1.17, 5.2.0 – 5.2.18  | 5.1.18, 5.2.19
Virtual Agent API    | <= 3.15.1, 4.0.0 – 4.0.3         | 3.15.2, 4.0.4

Beyond the Patch: Best Practices

  1. Kill Hardcoded Secrets: Never use static credentials for API integrations. Use OAuth 2.0 or secure vaults.

  2. Identity is Not an Email: An email address is a public identifier, not a credential. Enforce MFA/SSO at the point of identity linking.

  3. Principle of Least Privilege (PoLP): Narrowly scope what your agents can do. If an agent only needs to file tickets, it shouldn't have the power to create admin users.

  4. Implement AI Guardrails: Use specialized security layers like NeuralTrust to monitor agent behavior in real-time and block anomalous administrative actions.

Final Reflections

The BodySnatcher vulnerability proves that the most dangerous flaws today aren't just in the AI itself, but in how we connect AI to our existing (and sometimes broken) infrastructure.

As we move toward an autonomous enterprise, we must treat every tool exposed to an AI agent as a high-risk endpoint.

Have you audited your AI agent permissions lately? Let’s discuss in the comments.

Solved: Migrating Confluence Pages to Markdown for Hugo/Jekyll Blog

2026-01-21 17:57:04

🚀 Executive Summary

TL;DR: This article addresses the challenge of Confluence vendor lock-in by providing a comprehensive Python script to automate the migration of Confluence pages to Markdown. It enables DevOps Engineers and System Administrators to easily publish their documentation to modern static site generators like Hugo or Jekyll, eliminating tedious manual reformatting.

🎯 Key Takeaways

  • Confluence content can be programmatically accessed using a personal API token, which should be securely managed via environment variables to prevent hardcoding.
  • The Python script leverages the requests library for Confluence REST API interaction and html2text for converting fetched HTML page bodies into Markdown format suitable for static site generators.
  • The migration process includes handling API pagination, slugifying page titles for clean URLs, and generating YAML front matter with essential metadata like title, dates, and categories for Hugo/Jekyll.
  • Common pitfalls include Confluence API rate limiting, imperfect conversion of complex HTML (e.g., embedded macros), and the script’s focus on text content without automatically downloading or re-linking embedded images and attachments.

Migrating Confluence Pages to Markdown for Hugo/Jekyll Blog

As DevOps Engineers and System Administrators, we often find ourselves wrestling with documentation challenges. Confluence is a powerful collaboration tool, widely used for team knowledge bases, project documentation, and technical articles. However, its proprietary format can quickly become a vendor lock-in dilemma. What if you want to publish your carefully crafted Confluence pages to a modern static site generator like Hugo or Jekyll, perhaps for a public-facing blog or a more lightweight internal knowledge base?

Manually copying and pasting content, then reformatting it into Markdown, is not only tedious but also prone to errors, especially for large volumes of pages. This article provides a comprehensive, step-by-step technical tutorial on how to automate the migration of your Confluence pages to Markdown, making them ready for publication on your Hugo or Jekyll blog.

Unlock your content, break free from proprietary formats, and embrace the versatility of Markdown and static site generators. Let’s get started.

Prerequisites

Before we dive into the migration process, ensure you have the following in place:

  • Confluence Cloud Instance: Access to a Confluence Cloud instance with sufficient permissions to view the pages you intend to migrate. You will need to create an API token.
  • Confluence API Token: A personal API token for authentication with the Confluence REST API. We will walk through how to generate this.
  • Python 3.x: Installed on your local machine.
  • pip: Python’s package installer, usually bundled with Python 3.x.
  • Basic Understanding: Familiarity with Python scripting, command-line interfaces, and the concept of static site generators (Hugo/Jekyll) will be beneficial.

Step-by-Step Guide

Step 1: Generate a Confluence API Token

To programmatically access your Confluence content, you need an API token. This acts as a secure password for API requests, tied to your Atlassian account.

  1. Log in to your Atlassian account (id.atlassian.com).
  2. Navigate to Security > Create and manage API tokens.
  3. Click Create API token.
  4. Give your token a descriptive Label (e.g., “Confluence Migrator”).
  5. Copy the generated token immediately. It will not be shown again.

Security Note: Treat your API token like a password. Do not hardcode it directly into scripts for production use; instead, use environment variables or a secure configuration management system. For this tutorial, we’ll use environment variables for demonstration.

Step 2: Set Up Your Python Environment

It’s good practice to work within a Python virtual environment to manage dependencies.

First, create a project directory and a virtual environment:

mkdir confluence-migrator
cd confluence-migrator
python3 -m venv venv
source venv/bin/activate # On Windows: .\venv\Scripts\activate

Next, install the necessary Python libraries. We’ll use requests for making HTTP calls to the Confluence API and html2text for converting the fetched HTML content into Markdown.

pip install requests html2text

Step 3: Write the Python Migration Script

Now, let’s craft the Python script that will fetch your Confluence pages, convert them, and save them as Markdown files. Create a file named migrate.py in your project directory.

3.1. Configure Authentication and API Endpoints

We’ll store sensitive information in environment variables. Set these in your shell before running the script (or add them to a .env file and use python-dotenv).

export CONFLUENCE_URL="https://your-domain.atlassian.net/wiki"
export CONFLUENCE_EMAIL="your-email@example.com"
export CONFLUENCE_API_TOKEN="YOUR_API_TOKEN_HERE"
export CONFLUENCE_SPACE_KEYS="SPACEKEY1,SPACEKEY2" # Comma-separated list of space keys

Your migrate.py script will read these variables.

import os
import requests
import html2text
import re
from datetime import datetime

# --- Configuration ---
CONFLUENCE_URL = os.getenv("CONFLUENCE_URL")
CONFLUENCE_EMAIL = os.getenv("CONFLUENCE_EMAIL")
CONFLUENCE_API_TOKEN = os.getenv("CONFLUENCE_API_TOKEN")
CONFLUENCE_SPACE_KEYS = os.getenv("CONFLUENCE_SPACE_KEYS", "").split(',')

if not all([CONFLUENCE_URL, CONFLUENCE_EMAIL, CONFLUENCE_API_TOKEN]):
    print("Error: CONFLUENCE_URL, CONFLUENCE_EMAIL, or CONFLUENCE_API_TOKEN not set.")
    exit(1)

HEADERS = {
    "Accept": "application/json"
}
AUTH = (CONFLUENCE_EMAIL, CONFLUENCE_API_TOKEN)
OUTPUT_DIR = "markdown_output"
os.makedirs(OUTPUT_DIR, exist_ok=True)

# --- Helper Functions ---
def slugify(text):
    text = re.sub(r'[^a-z0-9\s-]', '', text.lower())
    text = re.sub(r'[\s-]+', '-', text).strip('-')
    return text

def get_confluence_pages(space_key):
    print(f"Fetching pages for space: {space_key}")
    pages = []
    start = 0
    limit = 25  # Max 25 for v1 API
    while True:
        url = f"{CONFLUENCE_URL}/rest/api/content?spaceKey={space_key}&expand=body.view,version&start={start}&limit={limit}"
        response = requests.get(url, headers=HEADERS, auth=AUTH)
        response.raise_for_status() # Raises HTTPError for bad responses (4xx or 5xx)
        data = response.json()
        pages.extend(data['results'])

        if 'next' not in data['_links']:
            break
        start += limit
        print(f"  Fetched {len(pages)} pages. Continuing for more...")
    return pages

def get_page_content(page_id):
    url = f"{CONFLUENCE_URL}/rest/api/content/{page_id}?expand=body.view,version"
    response = requests.get(url, headers=HEADERS, auth=AUTH)
    response.raise_for_status()
    return response.json()

def convert_html_to_markdown(html_content):
    h = html2text.HTML2Text()
    h.ignore_images = False # Set to True if you don't want image links
    h.images_as_html = True # Keep images as HTML img tags (useful for Hugo/Jekyll shortcodes)
    h.body_width = 0        # Don't wrap lines
    markdown_content = h.handle(html_content)
    return markdown_content

def generate_front_matter(title, creation_date, update_date, slug, space_key):
    # Adjust for Hugo or Jekyll requirements
    # For Hugo:
    # ---
    # title: "My Confluence Page Title"
    # date: 2023-10-27T10:00:00Z
    # lastmod: 2023-10-27T14:30:00Z
    # draft: false
    # tags: ["confluence", "migration", "devops"]
    # categories: ["documentation", "tech"]
    # ---
    #
    # For Jekyll:
    # ---
    # layout: post
    # title: "My Confluence Page Title"
    # date: 2023-10-27 10:00:00 +0000
    # categories: [documentation, tech]
    # tags: [confluence, migration, devops]
    # ---

    # Dates come from Confluence as e.g. "2023-10-27T14:30:00.000Z"; strip fractional
    # seconds and any trailing 'Z' so both Confluence timestamps and the ISO string
    # passed in from main() parse cleanly.
    created = datetime.strptime(creation_date.split('.')[0].rstrip('Z'), "%Y-%m-%dT%H:%M:%S").isoformat() + "Z"
    updated = datetime.strptime(update_date.split('.')[0].rstrip('Z'), "%Y-%m-%dT%H:%M:%S").isoformat() + "Z"

    safe_title = title.replace('"', '\\"')  # escape quotes here; backslashes aren't allowed in f-string expressions before Python 3.12
    front_matter = f"""---
title: "{safe_title}"
date: {created}
lastmod: {updated}
draft: false
categories: ["{space_key.lower()}"]
tags: ["confluence", "migration"]
---

"""
    return front_matter

# --- Main Logic ---
def main():
    for space_key in CONFLUENCE_SPACE_KEYS:
        if not space_key:
            continue
        print(f"Processing space: {space_key}")
        pages_in_space = get_confluence_pages(space_key)

        for page_summary in pages_in_space:
            page_id = page_summary['id']
            page_title = page_summary['title']
            page_type = page_summary['type'] # Usually 'page' or 'blogpost'

            if page_type != 'page': # We might want to filter out blog posts or other content types
                print(f"  Skipping {page_type}: {page_title}")
                continue

            print(f"  Processing page: {page_title} (ID: {page_id})")

            try:
                full_page_data = get_page_content(page_id)
                html_content = full_page_data['body']['view']['value']

                # The v1 API's version['when'] is the last-modified timestamp. Getting the
                # original creation date would need an extra call to
                # /rest/api/content/{id}/history, so this script uses the migration time
                # for 'date' and version['when'] for 'lastmod'.
                current_time_iso = datetime.now().isoformat(timespec='seconds') + "Z"

                markdown_content = convert_html_to_markdown(html_content)

                slug = slugify(page_title)
                filename = os.path.join(OUTPUT_DIR, f"{slug}.md")

                # 'date' = time of this migration run; 'lastmod' = Confluence last-modified timestamp.
                front_matter = generate_front_matter(
                    page_title,
                    current_time_iso, # Or use page_summary['version']['when'] for original page creation if available easily
                    page_summary['version']['when'],
                    slug,
                    space_key
                )

                with open(filename, "w", encoding="utf-8") as f:
                    f.write(front_matter)
                    f.write(markdown_content)
                print(f"  Saved '{page_title}' to {filename}")

            except requests.exceptions.RequestException as e:
                print(f"  Error fetching page {page_id} ({page_title}): {e}")
            except Exception as e:
                print(f"  An unexpected error occurred for page {page_id} ({page_title}): {e}")

if __name__ == "__main__":
    main()

Logic Explanation:

  • The script starts by loading your Confluence URL, email, API token, and desired space keys from environment variables for security.
  • get_confluence_pages paginates through all pages within a specified Confluence space using the v1 REST API endpoint /rest/api/content, expanding body.view to get the rendered HTML content and version for modification dates.
  • get_page_content fetches the full content of a specific page ID.
  • convert_html_to_markdown utilizes the html2text library to transform the fetched HTML into Markdown. We configure it to retain images as HTML img tags, which often integrate better with static site generators.
  • generate_front_matter creates the YAML front matter expected by Hugo or Jekyll, including title, publication date (date), last modification date (lastmod), and categories/tags derived from the Confluence space.
  • The main function iterates through each specified Confluence space, fetches its pages, converts them, and saves them to individual .md files in the markdown_output directory.
  • The slugify function ensures filenames are clean and URL-friendly.

3.2. Run the Script

With your environment variables set and the script ready, execute it from your terminal:

python3 migrate.py

You should see output indicating pages being processed, and a new markdown_output directory will be populated with your Confluence content in Markdown format.

Step 4: Integrate with Hugo/Jekyll

The final step is to incorporate the generated Markdown files into your static site generator project.

  • Hugo: Copy the .md files from markdown_output into your Hugo project’s content/posts directory (or any other content section you prefer).
  • Jekyll: Place the .md files into your Jekyll project’s _posts directory. Remember Jekyll often expects filenames in the format YYYY-MM-DD-title.md. You might need to adjust the slugify and filename generation in the script to prepend the date.
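A minimal sketch of that adjustment, reusing the same date handling as generate_front_matter (the helper name is mine):

import os
from datetime import datetime

def jekyll_filename(page_summary, slug, output_dir):
    # Prefix the slug with the last-modified date so Jekyll treats the file as a post,
    # e.g. 2023-10-27-my-page-title.md (adapt if you prefer the original creation date).
    when = page_summary['version']['when']  # e.g. "2023-10-27T14:30:00.000Z"
    date_prefix = datetime.strptime(when.split('.')[0].rstrip('Z'),
                                    "%Y-%m-%dT%H:%M:%S").strftime("%Y-%m-%d")
    return os.path.join(output_dir, f"{date_prefix}-{slug}.md")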

After placing the files, run your static site generator’s local server (e.g., hugo server or bundle exec jekyll serve) to preview the migrated content and make any necessary style or formatting adjustments.

Common Pitfalls

  • API Rate Limiting: Confluence Cloud APIs have rate limits. If you’re migrating a very large number of pages, you might hit these limits, resulting in 429 Too Many Requests errors. Implement a retry mechanism with exponential backoff if this becomes an issue; a minimal retry sketch follows this list.
  • Complex HTML Conversion: While html2text is good, Confluence’s rich editor can generate highly complex HTML, including embedded macros, custom CSS, or specific table structures that might not convert perfectly to Markdown. Manual review and post-conversion cleanup of the Markdown files may be necessary, especially for heavily formatted pages.
  • Missing Attachments/Images: This script focuses on text content. Embedded images are converted to their <img> tags, but the images themselves are not downloaded. A more advanced script would need to identify image URLs, download them, and update Markdown references to point to local assets.
  • Authentication Errors: Double-check your CONFLUENCE_URL, CONFLUENCE_EMAIL, and CONFLUENCE_API_TOKEN values. A common mistake is using your regular password instead of an API token, or having typos in the URL.
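If you do hit 429s, a minimal retry wrapper along these lines (a sketch, not part of the script above) can stand in for the direct requests.get calls:

import time
import requests

def get_with_retry(url, max_retries=5, **kwargs):
    # Retry GET requests with exponential backoff on 429/5xx responses.
    delay = 2
    for attempt in range(max_retries):
        response = requests.get(url, **kwargs)
        if response.status_code not in (429, 500, 502, 503, 504):
            return response
        time.sleep(delay)  # back off before retrying
        delay *= 2
    return response  # give up after max_retries; the caller's raise_for_status() surfaces the error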

Conclusion

You’ve successfully automated the often daunting task of migrating Confluence pages to Markdown. This process empowers you to take control of your content, making it portable and future-proof. By leveraging static site generators, you gain benefits like improved performance, enhanced security, simplified hosting, and seamless integration with modern DevOps workflows.

Consider this script a solid foundation. Next steps could involve:

  • Automating image and attachment migration.
  • Implementing more sophisticated error handling and logging.
  • Adding support for Confluence blog posts or other content types.
  • Creating a continuous integration pipeline to periodically sync changes from Confluence to your static site.

Embrace the power of automation and keep your documentation agile and accessible!

Darian Vance

👉 Read the original article on TechResolve.blog

Support my work

If this article helped you, you can buy me a coffee:

👉 https://buymeacoffee.com/darianvance

Using Transient Tasks in HarmonyOS Next: A DownloadCenter Sample

2026-01-21 17:56:31

Read the original article: Using Transient Tasks in HarmonyOS Next: A DownloadCenter Sample


Photo by Stanislav on Unsplash

Introduction

In HarmonyOS Next, managing transient background tasks is crucial for delivering smooth user experiences without draining system resources or violating OS lifecycle constraints.

This article focuses on demonstrating how to use Transient Tasks with a simulated file download scenario. By simulating download progress, we learn how to keep tasks alive in the background, update the notification UI in real-time, and manage the lifecycle using the official BackgroundTasksKit API.

What are Transient Tasks?

In HarmonyOS Next, a transient task is a temporary background operation that the system allows to continue execution even if the app enters the background.

DownloadCenter App


App Preview

DownloadCenter is a simulated download manager in a HarmonyOS Next application that allows users to start, pause, resume, or cancel file downloads while displaying real-time progress. It leverages the BackgroundTasksKit to request transient background tasks, ensuring that the download process continues even when the app moves to the background. Additionally, it updates the user with live notifications using NotificationKit, providing a responsive and lifecycle-aware download experience.

HarmonyOS NEXT Kits Used in DownloadCenter and Implementation Overview

DownloadCenter uses several HarmonyOS NEXT kits:

  • AbilityKit: Used to manage the UIAbility context and access ability information such as bundle name and ability name for creating WantAgents.
  • NotificationKit: Handles publishing, canceling, and checking support for custom notification templates like download progress notifications.
  • BackgroundTasksKit: Manages transient background tasks via requestSuspendDelay and cancelSuspendDelay, enabling background execution during simulated downloads.
  • BasicServicesKit: Provides error handling through the BusinessError class used in background task management.

Now, let’s dive into the core implementation and explore how the DownloadCenter is built step-by-step to simulate a seamless download experience with background task support.

Required Permissions

To execute transient tasks while running in the background, the following permission must be added to module.json5:

"requestPermissions": [
      {
        "name": "ohos.permission.KEEP_BACKGROUND_RUNNING"
      }
]

Preparing Permissions and Notifications

When the UI is about to appear, the app requests notification permission, creates a WantAgent for handling notification clicks, and checks if the custom download notification template is supported.

aboutToAppear() {
  openNotificationPermission(this.context);
  createWantAgent(bundleName, abilityName).then(agent => {
    this.wantAgentObj = agent;
  });
  notificationManager.isSupportTemplate('downloadTemplate').then(support => {
    this.isSupport = support;
  });
}

Managing Download States

The app controls the download process with methods to start, pause, resume, and cancel downloads. These methods update the download status and progress, and handle UI updates and background task requests.

async start() {
  this.downloadStatus = DOWNLOAD_STATUS.DOWNLOADING;
  this.requestSuspend();
  this.download();
}

async pause() {
  this.downloadStatus = DOWNLOAD_STATUS.PAUSE;
  clearInterval(this.interval);
  this.cancelSuspend();
}

Handling Background Task Lifecycles

To keep the download running when the app is in the background, requestSuspend() requests a transient task from the system to delay suspension, and cancelSuspend() cancels this transient task request if needed. If the transient task is about to timeout, the provided callback gracefully cancels the download.

requestSuspend() {
  const info = backgroundTaskManager.requestSuspendDelay('File downloading', () => {
    this.cancel(); // Cancel if timeout
  });
  this.requestId = info.requestId;
}

cancelSuspend() {
  if (this.requestId !== -1) {
    backgroundTaskManager.cancelSuspendDelay(this.requestId);
    this.requestId = -1;
  }
}

Simulating Download and Updating Notifications

The download() method simulates download progress using setInterval, updates the progress state, and publishes a notification with the current progress until completion.

download() {
  this.interval = setInterval(async () => {
    if (this.downloadProgress >= CommonConstants.PROGRESS_TOTAL) {
      clearInterval(this.interval);
      this.downloadStatus = DOWNLOAD_STATUS.FINISHED;
      this.cancelSuspend();
    } else {
      this.downloadProgress += CommonConstants.PROGRESS_SPEED;
    }
    publishNotification(this.downloadProgress, this.notificationTitle, this.wantAgentObj);
  }, CommonConstants.UPDATE_FREQUENCY);
}

Now, let's take a quick look at the complete code implementation.

import { common, wantAgent } from '@kit.AbilityKit';
import { notificationManager } from '@kit.NotificationKit';
import { createWantAgent, publishNotification, openNotificationPermission } from '../common/utils/NotificationUtil';
import { getStringByRes } from '../common/utils/ResourseUtil';
import Logger from '../common/utils/Logger';
import CommonConstants, { DOWNLOAD_STATUS } from '../common/constants/CommonConstants';
import { backgroundTaskManager } from '@kit.BackgroundTasksKit';
import { BusinessError } from '@kit.BasicServicesKit';

@Entry
@Component
struct Index {
  private context = this.getUIContext().getHostContext() as common.UIAbilityContext;
  private requestId: number = -1;
  @State downloadStatus: number = DOWNLOAD_STATUS.INITIAL;
  @State downloadProgress: number = 0;
  private notificationTitle: string = '';
  private wantAgentObj: object = {} as wantAgent.WantAgentInfo;
  private interval: number = -1;
  private isSupport: boolean = true;

  aboutToAppear() {
    openNotificationPermission(this.context);
    const bundleName = this.context.abilityInfo.bundleName;
    const abilityName = this.context.abilityInfo.name;
    createWantAgent(bundleName, abilityName).then(agent => {
      this.wantAgentObj = agent;
    });

    notificationManager.isSupportTemplate('downloadTemplate').then(support => {
      this.isSupport = support;
    });
  }

  onBackPress() {
    this.cancel();
  }

  build() {
    Column() {
      Column() {
        Row() {
          Image($r('app.media.ic_image'))
            .objectFit(ImageFit.Fill)
            .width(24)
            .height(24)

          Text(CommonConstants.DOWNLOAD_FILE)
            .fontSize(12)
            .textAlign(TextAlign.Center)
            .fontColor(Color.Black)
            .margin({ left: 8 })
        }
        .width('100%')

        Progress({
          value: this.downloadProgress,
          total: CommonConstants.PROGRESS_TOTAL
        }).width('100%')

        Row() {
          if (this.downloadStatus === DOWNLOAD_STATUS.INITIAL) {
            this.customButton($r('app.string.button_download'), (): Promise<void> => this.start())
          } else if (this.downloadStatus === DOWNLOAD_STATUS.DOWNLOADING) {
            Row() {
              this.cancelButton()
              this.customButton($r('app.string.button_pause'), (): Promise<void> => this.pause())
            }
          } else if (this.downloadStatus === DOWNLOAD_STATUS.PAUSE) {
            Row() {
              this.cancelButton()
              this.customButton($r('app.string.button_resume'), (): Promise<void> => this.resume())
            }
          } else {
            Text('Download completed!')
              .fontSize(12)
          }
        }
        .width('100%')
        .justifyContent(FlexAlign.SpaceBetween)
      }
      .width('85%')
      .height(108)
      .backgroundColor(Color.White)
      .borderRadius(16)
      .justifyContent(FlexAlign.SpaceBetween)
      .padding(16)
    }
    .width('100%')
    .height('100%')
    .backgroundColor(Color.Grey)
    .justifyContent(FlexAlign.Center)
  }

  // Background Operations
  async start() {
    this.notificationTitle = await getStringByRes($r('app.string.notification_title_download'), this);
    this.downloadProgress = 0;
    this.downloadStatus = DOWNLOAD_STATUS.DOWNLOADING;

    this.requestSuspend();
    this.download();
  }

  async resume() {
    this.notificationTitle = await getStringByRes($r('app.string.notification_title_download'), this);
    this.downloadStatus = DOWNLOAD_STATUS.DOWNLOADING;
    this.requestSuspend();
    this.download();
  }

  async pause() {
    this.notificationTitle = await getStringByRes($r('app.string.notification_title_pause'), this);
    this.downloadStatus = DOWNLOAD_STATUS.PAUSE;
    clearInterval(this.interval);
    this.cancelSuspend();
    if (this.isSupport) {
      publishNotification(this.downloadProgress, this.notificationTitle, this.wantAgentObj);
    }
  }

  async cancel() {
    this.downloadProgress = 0;
    this.downloadStatus = DOWNLOAD_STATUS.INITIAL;
    clearInterval(this.interval);
    this.cancelSuspend();
    notificationManager.cancel(CommonConstants.NOTIFICATION_ID);
  }

  requestSuspend() {
    const reason = 'File downloading';
    try {
      const info = backgroundTaskManager.requestSuspendDelay(reason, () => {
        Logger.warn('⏱ Transient task about to timeout.');
        this.cancel();
      });
      this.requestId = info.requestId;
      Logger.info(`Transient task started with ID: ${this.requestId}`);
    } catch (err) {
      Logger.error(`requestSuspendDelay failed: ${(err as BusinessError).message}`);
    }
  }

  cancelSuspend() {
    try {
      if (this.requestId !== -1) {
        backgroundTaskManager.cancelSuspendDelay(this.requestId);
        Logger.info(`Transient task canceled (ID: ${this.requestId})`);
        this.requestId = -1;
      }
    } catch (err) {
      Logger.error(`cancelSuspendDelay failed: ${(err as BusinessError).message}`);
    }
  }

  download() {
    this.interval = setInterval(async () => {
      if (this.downloadProgress >= CommonConstants.PROGRESS_TOTAL) {
        clearInterval(this.interval);
        this.notificationTitle = await getStringByRes($r('app.string.notification_title_finish'), this);
        this.downloadStatus = DOWNLOAD_STATUS.FINISHED;
        this.cancelSuspend();
        Logger.info('Download finished.');
      } else {
        this.downloadProgress += CommonConstants.PROGRESS_SPEED;
        Logger.info(`Downloading... progress: ${this.downloadProgress}`);
      }

      if (this.isSupport) {
        publishNotification(this.downloadProgress, this.notificationTitle, this.wantAgentObj);
      }
    }, CommonConstants.UPDATE_FREQUENCY);
  }

  @Builder
  customButton(textResource: Resource, click: Function = () => {
  }) {
    Button(textResource)
      .fontSize(8)
      .backgroundColor($r('app.color.button_color'))
      .buttonsStyle()
      .onClick(() => {
        click();
      })
  }

  @Builder
  cancelButton() {
    Button($r('app.string.button_cancel'))
      .buttonsStyle()
      .backgroundColor($r('app.color.cancel_button_color'))
      .fontColor($r('app.color.button_color'))
      .margin({ right: 8 })
      .onClick(() => {
        this.cancel();
      })
  }
}

@Extend(Button)
function buttonsStyle() {
  .constraintSize({ minWidth: 64 })
  .height(24)
  .borderRadius(14)
  .fontSize(8)
}

Conclusion

This implementation demonstrates how to manage transient background tasks effectively to keep downloads running smoothly even when the app is in the background. By integrating notification updates and lifecycle management, it ensures a seamless user experience. Overall, it highlights the power of HarmonyOS BackgroundTasksKit in handling real-time background operations.

References

https://developer.huawei.com/consumer/en/doc/harmonyos-guides-V13/transient-task-V13?source=post_page-----27ce4c67494a---------------------------------------

Written by Emine Inan

Neural Portfolio: Where AI Meets Creativity (Built with Google Gemini & Cloud Run)

2026-01-21 17:56:00

This is a submission for the New Year, New You Portfolio Challenge Presented by Google AI

About Me

"Data is not just numbers; it's the heartbeat of the future."

Hi everyone, I'm Nguyen Tien Dung, an aspiring AI Engineer from HCMUT.

For years, I viewed portfolios as static digital business cards—functional, but lifeless. But as I delved deeper into the world of Artificial Intelligence and Neural Networks, my perspective shifted. I realized that connection is everything. Just as neurons fire to create thoughts, I wanted my digital presence to spark a connection with every visitor.

My goal with this project wasn't just to list my skills. I wanted to break the mold. I dreamed of creating a living, breathing digital space that truly represents my "Neural Network" mindset—a place where data connects, ideas flow fluently, and AI is not just a tool, but the very core of the experience.

This portfolio is my handshake to the world—a fusion of my logic as an engineer and my soul as a creator.

Portfolio

How I Built It

The Journey from "Hello World" to "Hello AI"

Building this portfolio felt less like coding and more like co-authoring a story with a brilliant partner. That partner was Antigravity (Google's agentic IDE).

We live in an era where we no longer have to code alone. Adopting an "AI-First" approach, I treated the development process as a dialogue.

🛠️ The Symphony of Tech

  • Google Gemini AI (The Soul): This is the beating heart of my "Neural Terminal." I didn't want a generic chatbot. I used Gemini to craft an assistant that understands context, speaks in my tone, and knows my resume better than I do. It turns a passive view into an active conversation.
  • Three.js (The Atmosphere): To visualize the concept of connection, I built an immersive 3D neural network background. It represents the infinite possibilities of AI.
  • Google Cloud Run (The Home): Deploying a static site is easy; securing it is an art. Cloud Run allowed me to containerize my dreams and ship them globally with a single command.
  • Vanilla JS/CSS (The Performance): No heavy frameworks, just pure, optimized code for that buttery-smooth 60fps experience.

💡 The Agentic Workflow

There were moments of frustration. CSS interactions broke, scroll animations lagged, and exposing API keys was a constant fear. But having Antigravity as my co-pilot changed everything. We pair-programmed through the night. It didn't just fix my bugs; it taught me why they happened. Specifically, moving from a local environment to a secure, Dockerized container on Google Cloud Run was a steep learning curve that we conquered together.

What I'm Most Proud Of

It’s the little details that whisper, not shout.

  1. The "Neural Terminal" Experience: It’s not just a chatbox; it’s a portal. When you type ai hello, you aren't querying a database; you're interacting with a digital extension of myself. Seeing it respond intelligently for the first time gave me goosebumps.
  2. The "Unbroken" Flow: I obsessed over the User Experience. The custom cursor isn't just a decoration; it’s a guide that dances with your movement. Fixing the specific overflow issues to ensure the site feels like one continuous, fluid journey was my biggest technical victory.
  3. Security Meets Simplicity: I am incredibly proud of implementing a production-grade deployment pipeline. Using Environment Variables on Cloud Run to protect my Gemini API keys proved to me that I can build systems that are not only beautiful but also robust and secure.

This project is more than code. It’s a statement that in 2026, we don't just build websites; we build experiences.

Thank you for visiting my world. 🚀

Solved: Backup All GitHub Repositories to S3 Bucket Automatically

2026-01-21 17:54:58

🚀 Executive Summary

TL;DR: This guide provides an automated, cost-effective solution to mitigate data loss risks by backing up all GitHub repositories. It leverages a Python script to clone repositories, archive them, and upload them to an AWS S3 bucket, orchestrated by GitHub Actions for scheduled execution.

🎯 Key Takeaways

  • The solution uses a Python script with requests for GitHub API interaction, subprocess for Git operations, shutil for archiving, and boto3 for S3 uploads.
  • Authentication requires a GitHub Personal Access Token (PAT) with repo scope and AWS IAM user credentials (Access Key ID, Secret Access Key) with s3:PutObject, s3:ListBucket, and s3:GetObject permissions on the target S3 bucket.
  • Automation is achieved via GitHub Actions, configured with a cron schedule for daily backups and workflow_dispatch for manual triggers, securely passing sensitive credentials as repository secrets.

Backup All GitHub Repositories to S3 Bucket Automatically

As a Senior DevOps Engineer and Technical Writer for TechResolve, I understand the critical importance of data resilience. In today’s cloud-native landscape, while platforms like GitHub offer high availability, relying solely on a single vendor for your invaluable source code can be a significant risk. Disasters, accidental deletions, or even account compromises can lead to irreparable data loss if not properly safeguarded.

Manual backups are tedious, error-prone, and often overlooked, especially in fast-paced development environments. The cost of specialized third-party backup solutions can also be prohibitive for many teams. This tutorial addresses these challenges by providing a robust, automated, and cost-effective solution to back up all your GitHub repositories directly to an Amazon S3 bucket.

By the end of this guide, you will have a fully automated system that regularly pulls all your GitHub repositories and archives them securely in S3, giving you peace of mind, improved disaster recovery capabilities, and full control over your code’s backups.

Prerequisites

Before we dive into the automation, ensure you have the following in place:

  • GitHub Account: Access to the repositories you wish to back up.
  • GitHub Personal Access Token (PAT): With appropriate scopes. For private repositories, the repo scope (all sub-options) is required. For public repositories only, public_repo is sufficient. We recommend generating a token specifically for this backup process.
  • AWS Account: With permissions to create S3 buckets and IAM users/roles.
  • AWS S3 Bucket: A bucket configured in your AWS account to store the backups.
  • AWS IAM User or Role: With programmatic access (Access Key ID and Secret Access Key) and permissions to perform s3:PutObject and s3:ListBucket actions on the designated S3 bucket.
  • Python 3.x: Installed on the system where the script will run (or implicitly available in a GitHub Actions runner).
  • pip: Python’s package installer.
  • Git: Command-line Git installed (also implicitly available in GitHub Actions runners).

Step-by-Step Guide: Automating Your GitHub Backups

Step 1: Create a GitHub Personal Access Token (PAT)

This token will allow our script to authenticate with GitHub’s API and clone your repositories. Treat it like a password.

  1. Go to your GitHub profile settings.
  2. Navigate to “Developer settings” > “Personal access tokens” > “Tokens (classic)”.
  3. Click “Generate new token” > “Generate new token (classic)”.
  4. Give it a descriptive name (e.g., “S3-Backup-Script”).
  5. Set an appropriate expiration (e.g., 90 days, 1 year, or no expiration if managed securely). Remember to rotate it regularly.
  6. Under “Select scopes”, check repo (all sub-options) to ensure it can access both public and private repositories. If you only have public repos to back up, public_repo will suffice.
  7. Click “Generate token”.
  8. IMPORTANT: Copy the token immediately. You will not be able to see it again. Store it securely.

Step 2: Configure AWS S3 Bucket and IAM Permissions

We need an S3 bucket to store the archives and an IAM entity with the necessary permissions.

  1. Create an S3 Bucket: If you don’t have one, navigate to the S3 service in your AWS Console and create a new bucket. Choose a unique name and region. For security, consider enabling server-side encryption and versioning on the bucket.
  2. Create an IAM User (or Role):

    1. Go to the IAM service in your AWS Console.
    2. Navigate to “Users” > “Add user”.
    3. Give the user a name (e.g., “github-backup-user”) and select “Programmatic access”.
    4. On the “Permissions” page, choose “Attach existing policies directly” and then “Create policy”.
    5. Use the JSON tab to define a policy like this, replacing your-s3-bucket-name with your actual bucket name:
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": [
                      "s3:PutObject",
                      "s3:ListBucket",
                      "s3:GetObject"
                  ],
                  "Resource": [
                      "arn:aws:s3:::your-s3-bucket-name/*",
                      "arn:aws:s3:::your-s3-bucket-name"
                  ]
              }
          ]
      }
    

    This policy grants permissions to upload objects, list the bucket content, and retrieve objects (for verification if needed).

    6. Save the policy, then attach it to your newly created IAM user.
    7. Complete the user creation process. Copy the Access Key ID and Secret Access Key. Store them securely.

Step 3: Develop the Backup Script (Python)

This Python script will fetch your repositories, clone them, create archives, and upload them to S3. Create a file named backup_github.py.

First, install the necessary Python libraries:

pip install requests boto3

Now, here’s the Python script:

import os
import requests
import subprocess
import shutil
import datetime
import boto3
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# --- Configuration from Environment Variables ---
GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')
AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY')
S3_BUCKET_NAME = os.getenv('S3_BUCKET_NAME')
AWS_REGION = os.getenv('AWS_REGION', 'us-east-1') # Default to us-east-1 if not set

# Directory to temporarily store cloned repos
TEMP_DIR = 'github_backup_temp'

# --- GitHub API Functions ---
def get_user_repos(token):
    headers = {'Authorization': f'token {token}'}
    repos = []
    page = 1
    while True:
        response = requests.get(f'https://api.github.com/user/repos?type=all&per_page=100&page={page}', headers=headers)
        response.raise_for_status() # Raise HTTPError for bad responses (4xx or 5xx)
        page_repos = response.json()
        if not page_repos:
            break
        repos.extend(page_repos)
        page += 1
    return repos

# --- Git Operations ---
def clone_repo(repo_url, local_path, token):
    try:
        # For private repos, embed the token in the URL
        # For public repos, the token is not strictly needed for cloning, but doesn't hurt.
        auth_repo_url = repo_url.replace('https://', f'https://oauth2:{token}@')
        logging.info(f"Cloning {repo_url} to {local_path}...")
        subprocess.run(['git', 'clone', auth_repo_url, local_path], check=True, capture_output=True)
        logging.info(f"Successfully cloned {repo_url}")
    except subprocess.CalledProcessError as e:
        logging.error(f"Failed to clone {repo_url}. Error: {e.stderr.decode().strip()}")
        raise

# --- Archiving ---
def create_archive(source_dir, output_filename):
    logging.info(f"Creating archive for {source_dir}...")
    # shutil.make_archive creates a .tar.gz by default on Unix-like systems, or .zip on Windows.
    # We explicitly specify 'zip' for cross-platform consistency.
    archive_name = shutil.make_archive(output_filename, 'zip', source_dir)
    logging.info(f"Archive created: {archive_name}")
    return archive_name

# --- S3 Operations ---
def upload_to_s3(file_path, bucket_name, s3_key, region):
    logging.info(f"Uploading {file_path} to s3://{bucket_name}/{s3_key}...")
    try:
        s3 = boto3.client(
            's3',
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            region_name=region
        )
        s3.upload_file(file_path, bucket_name, s3_key)
        logging.info(f"Successfully uploaded {file_path} to S3.")
    except Exception as e:
        logging.error(f"Failed to upload {file_path} to S3. Error: {e}")
        raise

# --- Main Backup Logic ---
def main():
    if not all([GITHUB_TOKEN, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET_NAME]):
        logging.error("Missing one or more required environment variables (GITHUB_TOKEN, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET_NAME). Exiting.")
        exit(1)

    # Clean up any previous temp directory before starting
    if os.path.exists(TEMP_DIR):
        shutil.rmtree(TEMP_DIR)
        logging.info(f"Cleaned up previous temporary directory: {TEMP_DIR}")
    os.makedirs(TEMP_DIR, exist_ok=True)

    try:
        repos = get_user_repos(GITHUB_TOKEN)
        logging.info(f"Found {len(repos)} repositories to back up.")

        backup_date = datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')

        for repo in repos:
            repo_name = repo['name']
            repo_clone_url = repo['clone_url']
            repo_owner = repo['owner']['login']
            local_repo_path = os.path.join(TEMP_DIR, repo_owner, repo_name)

            try:
                # Clone the repository
                clone_repo(repo_clone_url, local_repo_path, GITHUB_TOKEN)

                # Create a zip archive
                archive_base_name = f"{repo_owner}-{repo_name}-{backup_date}"
                archive_full_path = create_archive(local_repo_path, os.path.join(TEMP_DIR, archive_base_name))

                # Define S3 key (path in S3)
                s3_key = f"github-backups/{repo_owner}/{repo_name}/{os.path.basename(archive_full_path)}"

                # Upload to S3
                upload_to_s3(archive_full_path, S3_BUCKET_NAME, s3_key, AWS_REGION)

            except Exception as e:
                logging.error(f"An error occurred while processing repository {repo_name}: {e}")
            finally:
                # Clean up local clone and archive
                if os.path.exists(local_repo_path):
                    shutil.rmtree(local_repo_path)
                    logging.info(f"Cleaned up local clone of {repo_name}")
                if 'archive_full_path' in locals() and os.path.exists(archive_full_path):
                    os.remove(archive_full_path)
                    logging.info(f"Cleaned up local archive of {repo_name}")

    except requests.exceptions.HTTPError as e:
        logging.error(f"GitHub API Error: {e}. Check your GITHUB_TOKEN and its permissions.")
    except Exception as e:
        logging.error(f"An unexpected error occurred during the backup process: {e}")
    finally:
        # Final cleanup of the main temporary directory
        if os.path.exists(TEMP_DIR):
            shutil.rmtree(TEMP_DIR)
            logging.info(f"Final cleanup: Removed temporary directory {TEMP_DIR}")

if __name__ == '__main__':
    main()

Code Logic Explanation:

  • Environment Variables: The script relies on environment variables for sensitive credentials (GitHub Token, AWS Keys) and configuration (S3 Bucket, AWS Region). This is a best practice for security.
  • get_user_repos(token): Fetches all repositories associated with the authenticated GitHub user. It handles pagination to ensure all repositories are retrieved.
  • clone_repo(...): Uses the git clone command via Python’s subprocess module. For private repositories, the GitHub PAT is embedded in the clone URL (e.g., https://oauth2:<PAT>@github.com/owner/repo.git) for authentication (see the short sketch after this list).
  • create_archive(...): Utilizes Python’s shutil.make_archive to create a ZIP archive of the cloned repository. We use ZIP for broad compatibility.
  • upload_to_s3(...): Employs the boto3 library to connect to AWS S3 and upload the generated archive file. Credentials are passed directly when the client is initialized, which is useful in environments with no AWS CLI configuration, such as a GitHub Actions runner that doesn’t use a dedicated credentials-setup action.
  • Main Loop: Iterates through each fetched repository, clones it into a temporary directory, archives it, uploads the archive to S3 with a descriptive key (path), and then cleans up the temporary local files.
  • Error Handling & Logging: Includes basic error handling for API calls, Git operations, and S3 uploads, with informative logging messages to track progress and issues.
  • Cleanup: Ensures that local temporary directories and archives are removed after each repository is processed and at the end of the script run.
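
If you want to see the clone-URL trick in isolation, here is a minimal, self-contained sketch. The helper name build_authed_url is hypothetical, and the full clone_repo shown earlier in this article may structure things differently; the point is simply how the PAT is injected into the HTTPS URL before git runs.

# Minimal sketch of authenticated cloning (assumed helper names, not the full script).
import subprocess

def build_authed_url(clone_url, token):
    # https://github.com/owner/repo.git -> https://oauth2:<PAT>@github.com/owner/repo.git
    return clone_url.replace("https://", f"https://oauth2:{token}@", 1)

def clone_repo_sketch(clone_url, dest, token):
    authed_url = build_authed_url(clone_url, token)
    # check=True raises CalledProcessError on a non-zero git exit code,
    # so the caller's try/except can log the failure and move on.
    subprocess.run(["git", "clone", authed_url, dest], check=True)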

Step 4: Automate with GitHub Actions

GitHub Actions is an excellent choice for this automation as it’s tightly integrated with GitHub and provides a robust, serverless environment to run your script on a schedule.

  1. Store Secrets: Go to your GitHub repository (or organization) > “Settings” > “Secrets and variables” > “Actions” > “New repository secret”.

Add the following secrets:

  • USER_GITHUB_PAT: Your GitHub Personal Access Token created in Step 1. (Note: GitHub doesn’t allow secret names starting with GITHUB_, and GITHUB_TOKEN is reserved for the default token provided by Actions, so use a different name.)
  • AWS_ACCESS_KEY_ID: Your AWS Access Key ID from Step 2.
  • AWS_SECRET_ACCESS_KEY: Your AWS Secret Access Key from Step 2.
  • S3_BUCKET_NAME: The name of your S3 bucket.
  • AWS_REGION: The AWS region of your S3 bucket (e.g., us-east-1).
  2. Create Workflow File: In your GitHub repository, create a directory .github/workflows/ and inside it, create a file named backup.yml.
  3. Add Workflow Content: Paste the following YAML into .github/workflows/backup.yml:
   name: Daily GitHub Repo Backup to S3

   on:
     schedule:
       # Runs every day at 02:00 AM UTC
       - cron: '0 2 * * *'
     workflow_dispatch:
       # Allows manual trigger of the workflow

   jobs:
     backup_repositories:
       runs-on: ubuntu-latest

       env:
         GITHUB_TOKEN: ${{ secrets.USER_GITHUB_PAT }}
         AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
         AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
         S3_BUCKET_NAME: ${{ secrets.S3_BUCKET_NAME }}
         AWS_REGION: ${{ secrets.AWS_REGION }}

       steps:
       - name: Checkout repository (optional, if script is in this repo)
         uses: actions/checkout@v4

       - name: Set up Python
         uses: actions/setup-python@v5
         with:
           python-version: '3.x' # Specify your preferred Python version, e.g., '3.9'

       - name: Install Python dependencies
         run: |
           python -m pip install --upgrade pip
           pip install requests boto3

       - name: Run GitHub backup script
         run: |
           # If your script is directly in the repo root:
           python backup_github.py
           # If your script is in a subdirectory, e.g., 'scripts/':
           # python scripts/backup_github.py

Workflow Logic Explanation:

  • on: schedule: This defines when the workflow will run automatically. The cron: '0 2 * * *' expression means it will run daily at 02:00 AM UTC. You can adjust this to your needs.
  • workflow_dispatch: Adds a button to manually trigger the workflow from the GitHub Actions tab, useful for testing.
  • env:: All the secrets we defined earlier are passed into the job as environment variables, making them accessible to our Python script.
  • uses: actions/checkout@v4: This step is necessary if your backup_github.py script is located within the same GitHub repository where you’re setting up the Action.
  • uses: actions/setup-python@v5: Configures a Python environment on the runner.
  • pip install ...: Installs the required Python libraries.
  • python backup_github.py: Executes your backup script.

Commit this backup.yml file to your repository. The GitHub Action will now automatically run based on your schedule, backing up all your repositories to S3!

Common Pitfalls

  • GitHub API Rate Limits: If you have an extremely large number of repositories or run the script too frequently, you might hit GitHub’s API rate limits. The script surfaces HTTP errors via raise_for_status() but doesn’t retry on its own; for very high volume, consider adding exponential backoff (see the sketch after this list) or spacing out runs.
  • Authentication Errors (GitHub PAT or AWS Credentials):
    • GitHub: Double-check your USER_GITHUB_PAT secret. Ensure it has the correct repo scopes. If cloning fails for private repos, it’s almost always a token issue.
    • AWS: Verify your AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and the IAM policy. Ensure the policy grants s3:PutObject and s3:ListBucket on the specific S3 bucket.
    • S3 Bucket Name/Region: Ensure S3_BUCKET_NAME and AWS_REGION are exactly correct.
  • Large Repositories or Many Repos: Cloning large repositories or a very high number of repositories can take time and consume disk space on the GitHub Actions runner. GitHub Actions runners have ample space for typical use, but extreme cases might require optimization or a self-hosted runner.
  • Empty S3 Bucket (no backups appearing): Verify the S3 bucket exists and is correctly named in your AWS environment. If the bucket is in a different region than specified, boto3 will error out.
  • Timeout: The GitHub Actions job might time out if the backup takes longer than the allowed job duration (default 6 hours for standard runners). Consider breaking down the backup, optimizing the script, or using self-hosted runners for very extensive backups.
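
If you do start bumping into those rate limits, one option is to wrap the GitHub API calls in a small retry helper with exponential backoff. The sketch below is illustrative rather than part of the script above; github_get is a hypothetical wrapper around requests.get.

# Hypothetical retry wrapper with exponential backoff (not part of the script above).
import time
import requests

def github_get(url, token, params=None, max_retries=5):
    headers = {"Authorization": f"token {token}", "Accept": "application/vnd.github+json"}
    delay = 2
    for attempt in range(max_retries):
        try:
            response = requests.get(url, headers=headers, params=params, timeout=30)
            if response.status_code in (403, 429):
                # Likely rate-limited: wait and retry instead of failing immediately.
                time.sleep(delay)
                delay *= 2  # 2s, 4s, 8s, ...
                continue
            response.raise_for_status()
            return response
        except requests.exceptions.ConnectionError:
            time.sleep(delay)
            delay *= 2
    raise RuntimeError(f"Giving up on {url} after {max_retries} attempts")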

Conclusion

Congratulations! You’ve successfully set up an automated, cost-effective, and robust system to back up all your GitHub repositories to an AWS S3 bucket. This solution provides a vital layer of protection against data loss, ensuring your valuable source code is always safe and accessible, independent of GitHub’s operational status.

This automated process frees up your team from tedious manual tasks, allowing them to focus on innovation while enjoying the peace of mind that comes with a solid disaster recovery strategy.

What’s Next?

  • S3 Lifecycle Policies: Configure S3 lifecycle rules to automatically transition older backups to cheaper storage classes (like S3 Glacier) or expire them after a certain period to manage costs and data retention (a boto3 sketch follows this list).
  • Backup Encryption: While S3 offers server-side encryption by default, consider client-side encryption for an extra layer of security before uploading backups.
  • Organization Repositories: Modify the get_user_repos function to get_org_repos(org_name, token) using the https://api.github.com/orgs/{org_name}/repos endpoint if you need to back up repositories belonging to a GitHub organization.
  • Backup Other GitHub Data: Extend the script to back up other critical GitHub data such as Gists, wikis, or GitHub Issues (via their respective APIs).
  • Monitoring and Alerting: Set up AWS CloudWatch alarms on your S3 bucket (e.g., for object creation) or monitor GitHub Actions workflow runs to ensure your backups are consistently succeeding.
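
As a concrete example of the lifecycle idea above, here is a hedged boto3 sketch that transitions objects under the github-backups/ prefix to Glacier after 30 days and expires them after a year. The rule ID, prefix, and day counts are assumptions; adjust them to your own retention policy.

# One-off lifecycle setup sketch (assumed rule ID and day counts).
import boto3

def apply_backup_lifecycle(bucket_name, region):
    s3 = boto3.client("s3", region_name=region)
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "github-backup-retention",
                    "Filter": {"Prefix": "github-backups/"},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                    "Expiration": {"Days": 365},
                }
            ]
        },
    )

# Example usage (run once against your backup bucket):
# apply_backup_lifecycle("my-backup-bucket", "us-east-1")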

Stay vigilant, stay secure, and keep innovating with TechResolve!

Darian Vance

👉 Read the original article on TechResolve.blog

Support my work

If this article helped you, you can buy me a coffee:

👉 https://buymeacoffee.com/darianvance