<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[AUTOMATESTACK]]></title><description><![CDATA[AUTOMATESTACK]]></description><link>https://automatestack.dev</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1757075915677/62adfc81-7f22-4d75-92d4-ee1646ab4acd.png</url><title>AUTOMATESTACK</title><link>https://automatestack.dev</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 13:51:18 GMT</lastBuildDate><atom:link href="https://automatestack.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Reclaiming Flow: A Guide to Sustainable Productivity and Digital Sanity]]></title><description><![CDATA[If you looked at my calendar three years ago, you would have seen a mosaic of 30-minute meeting fragments, scattered Jira tickets, and "quick syncs" that effectively destroyed any chance of deep work. 
Like many in the tech industry, I wore my busynes...]]></description><link>https://automatestack.dev/reclaiming-flow-a-guide-to-sustainable-productivity-and-digital-sanity</link><guid isPermaLink="true">https://automatestack.dev/reclaiming-flow-a-guide-to-sustainable-productivity-and-digital-sanity</guid><category><![CDATA[digitalzen]]></category><category><![CDATA[app blocker]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[#digitaldetox]]></category><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Thu, 18 Dec 2025 14:46:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766068640986/a59d5b7f-23e1-4713-ba7a-d68dcaf1e606.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you looked at my calendar three years ago, you would have seen a mosaic of 30-minute meeting fragments, scattered Jira tickets, and "quick syncs" that effectively destroyed any chance of deep work. Like many in the tech industry, I wore my busyness like a badge of honor. I assumed that responding to Slack messages within 30 seconds was the definition of being "responsive" and "reliable."</p>
<p>The reality, however, was much grimmer. I was technically "working" 10 hours a day, but my actual output was suffering. Worse, my mental well-being was deteriorating under the weight of constant context switching.</p>
<p>Today, my setup looks very different. It is not just about having a faster processor or an ergonomic chair; it is about a philosophy of <em>defensive resource management</em>. Here is the honest breakdown of how I rebuilt my productivity workflow to prioritize focus over frenzy.</p>
<h2 id="heading-the-hardware-simplicity-as-a-feature">✨ The Hardware: Simplicity as a Feature</h2>
<p>I used to be obsessed with having the maximum screen real estate—three monitors, a tablet for metrics, and a phone always propped up. I eventually realized that more pixels often just meant more vectors for distraction.</p>
<p>My current physical setup is intentionally reductive:</p>
<ul>
<li><p><strong>Single Wide Monitor:</strong> It allows for side-by-side code and documentation without the neck strain of twisting between screens.</p>
</li>
<li><p><strong>Noise-Canceling Headphones:</strong> These are non-negotiable. They are my physical "Do Not Disturb" sign.</p>
</li>
<li><p><strong>Mechanical Keyboard:</strong> The tactile feedback helps induce a rhythm in typing, which can actually help trigger a flow state for me.</p>
</li>
</ul>
<p>But hardware is the easy part. The real battle is software and psychology.</p>
<h2 id="heading-the-deep-work-protocol">🧠 The "Deep Work" Protocol</h2>
<p>The core of my productivity philosophy is Cal Newport’s concept of "Deep Work." As cloud engineers and creators, we need long, uninterrupted blocks of time to load complex contexts into working memory or troubleshoot a critical issue under time pressure. A single notification can topple that mental house of cards.</p>
<p>To protect these blocks, I implemented a strict 4-hour "<strong><em>Deep Focus</em></strong>" window every morning. During this time:</p>
<ol>
<li><p><strong>Async First:</strong> I close MS Teams and Email. My team knows that unless the server room is literally on fire, I am unavailable until 1:00 PM.</p>
</li>
<li><p><strong>Task Batching:</strong> I group all administrative minutiae (updating tickets, code reviews, emails) into a "shallow work" block in the late afternoon when my cognitive energy is naturally lower.</p>
</li>
</ol>
<h2 id="heading-the-missing-link-enforcing-boundaries">🚧 The Missing Link: Enforcing Boundaries</h2>
<p>Here is the truthful part that most productivity gurus skip: <strong>Willpower is a finite resource.</strong></p>
<p>In the beginning, I tried to simply "promise myself" I wouldn't check Reddit while a script ran or scroll Twitter when I hit a logic error. I failed constantly. The dopamine loop of social media is engineered by some of the smartest minds in our industry to be irresistible. I needed a tool that was stronger than my own wavering discipline.</p>
<p>This is where 👉 🧘‍♂️ <a href="https://www.digitalzen.app/" target="_blank">DigitalZen.app</a> 🧘‍♂️ became the cornerstone of my digital hygiene.</p>
<p>I had tried other blockers before, but they were either too easy to bypass or too clunky to configure. DigitalZen integrated seamlessly into my workflow because it doesn't just "block sites"; it helps curate an environment. I set it up to whitelist only my essential dev tools (GitHub, StackOverflow, documentation sites) and blacklist the infinite-scroll traps (social media, news aggregators) during my deep-focus window.</p>
<p>It goes beyond websites, too: I can set it to block the Steam client, Discord, or even my email client during deep-work hours. My distractions aren't just browser tabs; they are other applications running on my system.</p>
<p>The feature I love most is the out-of-the-box focus templates, which make it seamless to block specific categories of websites such as social media or adult content.</p>
<p>The difference was immediate. When my brain instinctively reached for a distraction during a difficult coding problem, the block page was a gentle but firm reminder: <em>Not now. Stick with the problem.</em> It outsourced my self-discipline, preserving my mental energy for the actual work.</p>
<p>The DigitalZen feature page does an excellent job of explaining the platform's capabilities. 👉 <a href="https://www.digitalzen.app/#how-it-works" target="_blank">DigitalZen.app How it works</a> </p>
<h2 id="heading-mental-well-being-the-art-of-disconnecting">🧘‍♂️ Mental Well-being: The Art of Disconnecting</h2>
<p>Productivity is meaningless if you burn out in six months. A major part of my new setup involves rigid boundaries between "Online" and "Offline."</p>
<p>The "Always On" culture is a fast track to anxiety. By using 🧘‍♂️ <a href="https://www.digitalzen.app/" target="_blank">DigitalZen.app</a> 🧘‍♂️ to lock me out of work apps after 7:00 PM, I force myself to decompress. It sounds paradoxical to use software to stop using software, but that hard stop allows me to engage in analog hobbies—reading, cooking, or just walking without a podcast playing.</p>
<p>It has anti-tamper features too. You can’t just kill the process or uninstall it when you get the urge to slack off. It forces you to stick to the schedule you set when you were in a rational state of mind.</p>
<p>This downtime is not "wasted" time; it is recovery time. It is during these quiet moments that my brain processes the day's information, often leading to solutions for bugs that baffled me hours earlier.</p>
<h2 id="heading-the-result">🎉 The Result</h2>
<p>Since adopting this distraction-free architecture, my output hasn't just increased in volume; it has increased in <em>value</em>. I ship cleaner code, write better documentation, and actually enjoy the process of building &amp; troubleshooting again.</p>
<p>If you are finding yourself drowning in digital noise, stop trying to "try harder." Build a system that protects your attention. Whether it’s optimizing your physical desk or using tools like 🧘‍♂️ <a href="https://www.digitalzen.app/" target="_blank">DigitalZen.app</a> 🧘‍♂️ to guard your focus, the goal is the same: effortless consistency in a chaotic world.</p>
<p>Productivity isn't about doing more things; it's about doing the <em>right</em> things with your full attention.</p>
]]></content:encoded></item><item><title><![CDATA[Understanding Pod Priority and Preemption in Kubernetes: A Detailed Guide]]></title><description><![CDATA[Introduction
In Kubernetes, Pod Priority and Preemption is a powerful scheduling feature that ensures critical workloads are placed and maintained on your cluster, even when resources are scarce. With this mechanism, Kubernetes can automatically pree...]]></description><link>https://automatestack.dev/understanding-pod-priority-and-preemption-in-kubernetes-a-detailed-guide</link><guid isPermaLink="true">https://automatestack.dev/understanding-pod-priority-and-preemption-in-kubernetes-a-detailed-guide</guid><category><![CDATA[kubernetes pod priority]]></category><category><![CDATA[kubernetes Preemption]]></category><category><![CDATA[k8s]]></category><category><![CDATA[#k8scluster]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[kubernetes architecture]]></category><category><![CDATA[#kubernetes #container ]]></category><category><![CDATA[Kubernetes CKA Preparation]]></category><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Tue, 09 Sep 2025 06:35:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766082871149/6e7c81d4-495a-4211-8557-2395f1164ebb.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>In Kubernetes, <strong>Pod Priority and Preemption</strong> is a powerful scheduling feature that ensures critical workloads are placed and maintained on your cluster, even when resources are scarce. With this mechanism, Kubernetes can automatically <strong>preempt</strong> (evict) lower-priority pods to make room for higher-priority ones, helping orchestrate resource-efficient and reliable workload execution. Generally available since Kubernetes v1.14, this feature has become a staple of cluster operations.</p>
<h2 id="heading-1-what-is-pod-priority">1. What Is Pod Priority?</h2>
<p><strong>Pod Priority</strong> is an integer value assigned to a Pod, representing its importance relative to others. Higher values indicate higher importance in scheduling decisions.</p>
<ul>
<li><p>Pods without an explicit priority use a default value of 0.</p>
</li>
<li><p>Priorities are defined through <code>PriorityClass</code> objects, which are non-namespaced resources that map a name to an integer priority.</p>
</li>
</ul>
<pre><code class="lang-mermaid">flowchart TD
    %% Nodes
    A([📦 Pod Scheduled]) --&gt; B{⚖️ Cluster Under Pressure?}

    B -- "Yes" --&gt; C[🔥 Node Pressure Eviction]
    C --&gt; D[💀 Pod terminated on node]
    D --&gt; E{🧑‍✈️ Controlled by ReplicaSet/Deployment?}
    E -- "Yes" --&gt; F[♻️ Controller creates new Pod]
    F --&gt; G[🚀 Scheduler places on another node]
    E -- "No" --&gt; H[❌ Pod stays deleted]

    B -- "No, but Higher Priority Pod Pending" --&gt; I[⬆️ Preemption Triggered]
    I --&gt; J[⚔️ Lower priority pods evicted]
    J --&gt; D

    B -- "No pressure &amp; no higher priority pod" --&gt; K[✅ Pod keeps running]

    %% Styling
    classDef start fill:#2E86C1,color:#fff,stroke:#1B4F72,stroke-width:2px;
    classDef decision fill:#F4D03F,color:#000,stroke:#B7950B,stroke-width:2px;
    classDef danger fill:#E74C3C,color:#fff,stroke:#922B21,stroke-width:2px;
    classDef dead fill:#6E2C00,color:#fff,stroke:#641E16,stroke-width:2px;
    classDef controller fill:#27AE60,color:#fff,stroke:#145A32,stroke-width:2px;
    classDef running fill:#1ABC9C,color:#fff,stroke:#0E6251,stroke-width:2px;

    %% Assign classes
    A:::start
    B:::decision
    C:::danger
    I:::danger
    D:::dead
    J:::danger
    E:::decision
    F:::controller
    G:::controller
    H:::dead
    K:::running
</code></pre>
<h2 id="heading-2-defining-priority-priorityclass">2. Defining Priority: <code>PriorityClass</code></h2>
<p>A <code>PriorityClass</code> defines both the name and numerical value of a priority:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">scheduling.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">PriorityClass</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">high-priority-apps</span>
<span class="hljs-attr">value:</span> <span class="hljs-number">1000000</span>
<span class="hljs-attr">globalDefault:</span> <span class="hljs-literal">false</span>
<span class="hljs-attr">description:</span> <span class="hljs-string">"Pods critical to business logic."</span>
</code></pre>
<ul>
<li><p><code>value</code>: Higher numbers mean higher priority.</p>
</li>
<li><p><code>globalDefault</code>: If <code>true</code>, this is the default for pods without a specified <code>priorityClassName</code>—but only for pods created after the class exists.</p>
</li>
</ul>
<p>Kubernetes ships with two default system-critical classes:</p>
<ul>
<li><p><code>system-node-critical</code> (≈ 2,000,001,000)</p>
</li>
<li><p><code>system-cluster-critical</code> (≈ 2,000,000,000)</p>
</li>
</ul>
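<p>As a quick check in a live cluster, you can list the priority classes and inspect the numeric priority that was resolved for a pod (the pod name <code>nginx</code> below is a placeholder; these commands assume a working <code>kubectl</code> context):</p>
<pre><code class="lang-bash"># List all PriorityClass objects and their values
kubectl get priorityclasses

# Show the priority resolved for a pod at admission time
kubectl get pod nginx -o jsonpath='{.spec.priority}'
</code></pre>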
<h2 id="heading-3-scheduling-how-priority-influences-order">3. Scheduling: How Priority Influences Order</h2>
<p>Once Pod Priority is in place, the scheduler sorts pending pods by priority. <strong>High-priority pods</strong> are attempted first. If scheduling a high-priority pod fails due to resource constraints, the scheduler may then preempt lower-priority pods to make room.</p>
<h2 id="heading-4-preemption-making-space-for-what-matters">4. Preemption: Making Space for What Matters</h2>
<p>When a pending pod cannot be scheduled:</p>
<ol>
<li><p>The scheduler looks for nodes where evicting one or more lower-priority pods would free enough capacity.</p>
</li>
<li><p>It evicts the minimal necessary set of pods to schedule the higher-priority pod.</p>
</li>
<li><p>When a pod is <strong>evicted</strong> (whether due to <strong>preemption</strong> or <strong>node pressure eviction</strong>):</p>
<ol>
<li><p>The pod is <strong>terminated</strong> on the node where it is running.</p>
</li>
<li><p>The pod is deleted from the current node.</p>
</li>
<li><p>If the pod belongs to a <strong>controller</strong> (e.g., Deployment, StatefulSet, ReplicaSet, Job, etc.), that controller will notice the missing replica and create a <strong>new pod</strong>.</p>
</li>
<li><p>The scheduler will then place this <strong>new pod</strong> on another suitable node.</p>
</li>
</ol>
</li>
</ol>
<p>So effectively: a <strong>standalone Pod</strong> (not managed by a controller) is gone permanently once evicted.</p>
<p>A <strong>Pod managed by a controller</strong> is recreated, usually on another node, assuming resources are available.</p>
<h3 id="heading-scheduling-metadata">Scheduling metadata:</h3>
<p>The pending pod’s <code>status.nominatedNodeName</code> field indicates which node is targeted for preemption. However, the pod may ultimately be scheduled elsewhere if conditions change.</p>
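<p>To observe this while debugging, you can query the field directly (the pod name <code>my-pod</code> is a placeholder, and the output will be empty if no node has been nominated):</p>
<pre><code class="lang-bash"># Show which node the scheduler has nominated for the pending pod
kubectl get pod my-pod -o jsonpath='{.status.nominatedNodeName}'

# Or inspect the full pod description, including scheduling events
kubectl describe pod my-pod
</code></pre>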
<h3 id="heading-important-constraints">Important Constraints:</h3>
<ul>
<li><p><strong>Victim pods</strong> terminate using their graceful termination period (default ~30 seconds), which delays when space becomes available.</p>
</li>
<li><p><code>PodDisruptionBudget</code> (PDB) is respected on a best-effort basis but can be violated if no alternate victim set exists.</p>
</li>
<li><p><strong>Inter-pod affinity</strong>: If the pending pod requires co-location with lower-priority pods, preemption won't occur on that node.</p>
</li>
<li><p><strong>Cross-node preemption is not supported</strong>: The scheduler doesn’t preempt pods on other nodes to alleviate anti-affinity constraints.</p>
</li>
</ul>
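<p>As a sketch of how a <code>PodDisruptionBudget</code> enters this picture (names are illustrative), the following asks the scheduler to keep at least two replicas of a lower-priority service running when choosing preemption victims:</p>
<pre><code class="lang-yaml">apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: batch-workers-pdb
spec:
  minAvailable: 2          # best-effort floor during preemption
  selector:
    matchLabels:
      app: batch-workers
</code></pre>
<p>Remember this is honored on a best-effort basis only: if no alternate victim set exists, preemption can still violate it.</p>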
<h2 id="heading-5-non-preempting-priority-classes">5. Non-Preempting Priority Classes</h2>
<p>Generally available since Kubernetes v1.24, non-preempting classes let you define a <code>PriorityClass</code> with:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">preemptionPolicy:</span> <span class="hljs-string">Never</span>
</code></pre>
<p>This means pods with this class will:</p>
<ul>
<li><p>Queue ahead of lower-priority pods.</p>
</li>
<li><p><strong>Not</strong> preempt other pods.</p>
</li>
<li><p>Be preempted by even higher-priority pods.</p>
</li>
</ul>
<p>This is useful, for example, in ML or data science workflows where you want to ensure high scheduling priority without disrupting running services.</p>
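<p>Putting it together, a complete non-preempting class might look like this (the name and value are illustrative):</p>
<pre><code class="lang-yaml">apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting
value: 100000
preemptionPolicy: Never      # queue ahead of lower priorities, but never evict
globalDefault: false
description: "High scheduling priority without disrupting running pods."
</code></pre>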
<h2 id="heading-6-interplay-with-qos-and-eviction">6. Interplay with QoS and Eviction</h2>
<p>While Pod QoS classes (<code>Guaranteed</code>, <code>Burstable</code>, <code>BestEffort</code>) affect eviction precedence during node-pressure scenarios, they <strong>don’t influence scheduling preemption</strong>. The scheduler considers only priority values; QoS classes matter only during eviction, not scheduling.</p>
<p>At node pressure, pods are ranked for eviction by:</p>
<ol>
<li><p>Exceeding resource requests</p>
</li>
<li><p>Priority</p>
</li>
<li><p>Resource usage relative to requests</p>
</li>
</ol>
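<p>To make the distinction concrete, the pod below (illustrative names) has <strong>Burstable</strong> QoS because its requests and limits differ, while its scheduling priority comes solely from <code>priorityClassName</code>:</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  priorityClassName: high-priority-apps   # governs scheduling and preemption
  containers:
  - name: api
    image: nginx
    resources:
      requests:              # requests &lt; limits =&gt; Burstable QoS
        cpu: "250m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
</code></pre>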
<h2 id="heading-7-why-use-pod-priority-and-preemption">7. Why Use Pod Priority and Preemption?</h2>
<ul>
<li><p><strong>Reliability</strong>: Ensures critical workloads are scheduled promptly without over-provisioning clusters.</p>
</li>
<li><p><strong>Resource utilization</strong>: Hosts both mission-critical and lower-priority workloads together, evicting non-essential pods under pressure.</p>
</li>
<li><p><strong>Operational flexibility</strong>: You can finely control priority and preemption behavior using Policy, Preemption settings, and PDB nuances.</p>
</li>
</ul>
<h2 id="heading-8-sample-yaml-snippet">8. Sample YAML Snippet</h2>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">scheduling.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">PriorityClass</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">high-priority-apps</span>
<span class="hljs-attr">value:</span> <span class="hljs-number">1000000</span>
<span class="hljs-attr">globalDefault:</span> <span class="hljs-literal">false</span>
<span class="hljs-attr">description:</span> <span class="hljs-string">"Priority for critical services."</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">nginx</span>
  <span class="hljs-attr">priorityClassName:</span> <span class="hljs-string">high-priority-apps</span>
</code></pre>
<p>To create a non-preempting class via kubectl:</p>
<pre><code class="lang-bash">kubectl create priorityclass high-priority --value=1000 \
  --description=<span class="hljs-string">"High priority but non-preempting"</span> \
  --preemption-policy=<span class="hljs-string">"Never"</span>
</code></pre>
<h2 id="heading-9-best-practices-amp-troubleshooting">9. Best Practices &amp; Troubleshooting</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Scenario</th><th>Guidance</th></tr>
</thead>
<tbody>
<tr>
<td><strong>Unintended preemptions</strong></td><td>Ensure priority levels are correctly assigned; empty <code>priorityClassName</code> defaults to <code>0</code>.</td></tr>
<tr>
<td><strong>Pending pods not scheduling after preemption</strong></td><td>Another higher-priority pod may have taken precedence. This is expected.</td></tr>
<tr>
<td><strong>Higher-priority pods evicted first</strong></td><td>The scheduler may choose nodes where victims have the lowest priority or where PDB isn't violated.</td></tr>
<tr>
<td><strong>Affinity issues</strong></td><td>Avoid inter-pod affinity that ties a high-priority pod to a lower-priority pod, as it can block preemption.</td></tr>
<tr>
<td><strong>Termination latency in scheduling gap</strong></td><td>Set <code>terminationGracePeriodSeconds</code> to a smaller value on lower-priority pods to shorten the delay before capacity frees up.</td></tr>
</tbody>
</table>
</div>]]></content:encoded></item><item><title><![CDATA[How to Self-Host n8n for Free Forever on Oracle Cloud]]></title><description><![CDATA[n8n stands out as one of the most powerful open-source low-code AI workflow automation tools available. While cloud-hosted n8n can get expensive quickly, Oracle Cloud's Always Free tier offers an incredible opportunity to run n8n completely free, for...]]></description><link>https://automatestack.dev/self-host-n8n-for-free-for-life-on-oracle-cloud</link><guid isPermaLink="true">https://automatestack.dev/self-host-n8n-for-free-for-life-on-oracle-cloud</guid><category><![CDATA[host n8n]]></category><category><![CDATA[host n8n free]]></category><category><![CDATA[n8n]]></category><category><![CDATA[Oracle Cloud]]></category><category><![CDATA[Traefik]]></category><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Fri, 05 Sep 2025 06:25:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766082046311/0810fb5b-0b5f-4237-9dc9-13a18709e43f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>n8n stands out as one of the most powerful open-source low-code <strong>AI workflow automation</strong> tools available. While cloud-hosted n8n can get expensive quickly, Oracle Cloud's Always Free tier offers an incredible opportunity to run n8n completely free, forever.</p>
<p>In this comprehensive guide, I'll walk you through setting up n8n on Oracle Cloud Infrastructure (OCI) using their generous Always Free tier, which includes compute instances that never expire.</p>
<h2 id="heading-why-oracle-clouds-always-free-tier">💡Why Oracle Cloud's Always Free Tier?</h2>
<p>Oracle Cloud's Always Free tier is genuinely impressive:</p>
<ul>
<li><p><strong>2 AMD-based Compute VMs</strong> with 1/8 OCPU and 1 GB memory each</p>
</li>
<li><p><strong>Up to 4 Arm-based Ampere A1 cores</strong> and 24 GB of memory (can be used as one VM or split)</p>
</li>
<li><p><strong>200 GB total Block Volume storage</strong></p>
</li>
<li><p><strong>10 GB Object Storage</strong></p>
</li>
<li><p><strong>Always Free</strong> - no time limits, no credit expiration</p>
</li>
</ul>
<p>The Arm-based instances are particularly powerful for running n8n, offering excellent performance for automation workflows.</p>
<p>For a more detailed overview of the Oracle Free Tier, refer to <a target="_blank" href="https://www.oracle.com/cloud/free/">Oracle Free Tier</a> &amp; <a target="_blank" href="https://docs.oracle.com/en-us/iaas/Content/FreeTier/freetier_topic-Always_Free_Resources.htm">Always Free Resources</a>.</p>
<h2 id="heading-prerequisites">✅ Prerequisites</h2>
<p>Before we begin, you'll need:</p>
<ul>
<li><p>Oracle Cloud account (free sign-up)</p>
</li>
<li><p>A registered Domain name (recommended for HTTPS)</p>
</li>
</ul>
<h2 id="heading-architecture">🏗️ Architecture</h2>
<p>This setup demonstrates how to run <strong>n8n</strong> securely inside an <strong>OCI Compute VM</strong> using Docker and <strong>Traefik</strong> as the reverse proxy.</p>
<h3 id="heading-1-traffic-flow">1. Traffic Flow</h3>
<ul>
<li><p>A user accesses <a target="_blank" href="http://n8n.example.com"><code>n8n.example.com</code></a>, which resolves via DNS to the VM’s <strong>public IP</strong>.</p>
</li>
<li><p>Requests on <strong>ports 80/443</strong> reach the <strong>Traefik container</strong> inside the VM.</p>
</li>
<li><p>Traefik forwards HTTPS traffic through the <code>traefik-public</code> network to the <strong>n8n container</strong>.</p>
</li>
</ul>
<h3 id="heading-2-security-amp-certificates">2. Security &amp; Certificates</h3>
<ul>
<li><p>Traefik manages SSL/TLS certificates automatically with <strong>Let’s Encrypt CA</strong>.</p>
</li>
<li><p>Certificates are issued and renewed using the <strong>ACME TLS challenge</strong>.</p>
</li>
<li><p>All certificates are stored securely in <code>./letsencrypt/acme.json</code>.</p>
</li>
</ul>
<h3 id="heading-3-service-discovery">3. Service Discovery</h3>
<ul>
<li><p>Traefik integrates with the <strong>Docker Socket</strong> to dynamically discover running containers.</p>
</li>
<li><p>This eliminates manual configuration whenever services are added or updated.</p>
</li>
</ul>
<h3 id="heading-4-application-layer">4. Application Layer</h3>
<ul>
<li><p><strong>n8n container</strong> hosts the workflow automation platform.</p>
</li>
<li><p><strong>Postgres container</strong> provides persistent database storage, connected via the <code>n8n-network</code> (port 5432).</p>
</li>
</ul>
<h3 id="heading-5-infrastructure">5. Infrastructure</h3>
<ul>
<li><p>All components (Traefik, n8n, Postgres) run as <strong>Docker containers</strong> inside a single <strong>OCI Compute VM</strong>.</p>
</li>
<li><p>Networking is logically separated using Docker networks (<code>traefik-public</code> and <code>n8n-network</code>).</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757046237156/c118ecbc-c888-4bc6-94a7-225da66373c6.png" alt /></p>
<h2 id="heading-step-1-create-your-oracle-cloud-account">Step 1: 📝Create Your Oracle Cloud Account</h2>
<ol>
<li><p>Visit <a target="_blank" href="https://oracle.com/cloud/free">oracle.com/cloud/free</a></p>
</li>
<li><p>Sign up for a free account</p>
</li>
<li><p>Complete the verification process (requires credit card for verification, but won't be charged)</p>
</li>
<li><p>Wait for account activation</p>
</li>
</ol>
<h2 id="heading-step-2-set-up-your-compute-instance">Step 2: 🖥️✨ Set Up Your Compute Instance</h2>
<ol>
<li><p>Create a <strong>Virtual Cloud Network</strong></p>
<ul>
<li><p>Log into your OCI Console</p>
</li>
<li><p>Navigate to <strong>Networking → Virtual cloud networks</strong></p>
</li>
</ul>
</li>
</ol>
<p>    <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757001488272/d1245687-695f-4eee-8596-395f4385d636.png" alt /></p>
<ul>
<li>We will use a /16 CIDR block for the VCN, in this case 10.0.0.0/16</li>
</ul>
<p>    <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757001628743/285870e4-91a6-46fb-8ce0-eaf8431ee598.png" alt /></p>
<ol start="2">
<li><p>Create a /24 subnet inside the VCN where the VM instance will be connected</p>
<ul>
<li><p>10.0.1.0/24</p>
</li>
<li><p>Create it as a Public Subnet</p>
</li>
</ul>
</li>
</ol>
<p>    <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757002887270/d9c0476b-f880-4788-86ce-064b82920a50.png" alt /></p>
<ol start="3">
<li><p>Create an Internet gateway for the VCN</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757003072052/509b4363-77a0-4b83-8bee-3a3c0668cb1b.png" alt /></p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757003145894/13e112c6-9973-40ed-9aab-4bcecba5f509.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Create the VM Instance</p>
<ul>
<li><p>Navigate to <strong>Compute → Instances</strong></p>
</li>
<li><p>Click <strong>Create Instance</strong></p>
</li>
<li><p><strong>Image and Shape:</strong></p>
<ul>
<li><p><strong>Image:</strong> Ubuntu 22.04 LTS (Always Free-eligible)</p>
</li>
<li><p><strong>Shape:</strong> VM.Standard.A1.Flex (Arm-based)</p>
</li>
<li><p><strong>OCPU:</strong> 2 (or all 4 if you want maximum performance)</p>
</li>
<li><p><strong>Memory:</strong> 12 GB (or up to 24 GB)</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<p>        <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757004336628/e4af2700-a365-4bb1-8ebc-06ec9906ca35.png" alt class="image--center mx-auto" /></p>
<p>        <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757004388054/9527d8d6-9b7a-492d-9391-c29994f04a8a.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>In the Networking tab, select the VCN &amp; subnet we created earlier</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757005050754/f8d73a14-6122-49ce-9f6e-c164b0a5036c.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
<p>    <strong>SSH Keys:</strong></p>
<ul>
<li><p>Generate a new key pair and download both public and private keys</p>
</li>
<li><p>Keep the private key secure - you'll need it to access your server</p>
</li>
</ul>
<p>    Click <strong>Create</strong> and wait for the instance to provision</p>
<ol start="5">
<li><p>Configure Network Security</p>
<ol>
<li><p>Go to <strong>Networking → Virtual Cloud Networks</strong></p>
</li>
<li><p>Click on your VCN</p>
</li>
<li><p>Click on <strong>Security Lists</strong> → <strong>Default Security List</strong></p>
</li>
<li><p>Add ingress rules:</p>
<pre><code class="lang-basic"> Port <span class="hljs-number">22</span> (SSH): <span class="hljs-number">0.0.0.0</span>/<span class="hljs-number">0</span>
 Port <span class="hljs-number">80</span> (HTTP): <span class="hljs-number">0.0.0.0</span>/<span class="hljs-number">0</span>
 Port <span class="hljs-number">443</span> (HTTPS): <span class="hljs-number">0.0.0.0</span>/<span class="hljs-number">0</span>
 Port <span class="hljs-number">5678</span> (n8n): <span class="hljs-number">0.0.0.0</span>/<span class="hljs-number">0</span> (temporary; we'll remove this later)
</code></pre>
</li>
</ol>
</li>
</ol>
<h2 id="heading-step-3-connect-to-your-instance">Step 3: 🔑 Connect to Your Instance</h2>
<p>1. Note your instance's public IP address</p>
<p>2. Connect via SSH:</p>
<pre><code class="lang-bash">bash ssh -i /path/to/your/private-key ubuntu@YOUR_PUBLIC_IP
</code></pre>
<p>For Windows users with PuTTY, convert the private key to .ppk format first</p>
<h2 id="heading-step-4-prepare-the-server">Step 4: ⚙️ Prepare the Server</h2>
<p><strong>Update the System</strong></p>
<pre><code class="lang-bash">bash sudo apt update &amp;&amp; sudo apt upgrade -y
</code></pre>
<p><a target="_blank" href="https://docs.docker.com/engine/install/ubuntu/"><strong>Install Docker and Docker Compose</strong></a></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Add Docker's official GPG key:</span>
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

<span class="hljs-comment"># Add the repository to Apt sources:</span>
<span class="hljs-built_in">echo</span> \
  <span class="hljs-string">"deb [arch=<span class="hljs-subst">$(dpkg --print-architecture)</span> signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  <span class="hljs-subst">$(. /etc/os-release &amp;&amp; echo <span class="hljs-string">"<span class="hljs-variable">${UBUNTU_CODENAME:-<span class="hljs-variable">$VERSION_CODENAME</span>}</span>"</span>)</span> stable"</span> | \
  sudo tee /etc/apt/sources.list.d/docker.list &gt; /dev/null
sudo apt-get update

<span class="hljs-comment">#install the latest version</span>
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

<span class="hljs-comment"># Add your user to docker group</span>
sudo usermod -aG docker <span class="hljs-variable">$USER</span>
</code></pre>
<p>Log out and reconnect so the <code>docker</code> group membership takes effect.</p>
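<p>After reconnecting, it is worth confirming that Docker and the Compose plugin are installed and that your user can reach the daemon without <code>sudo</code>:</p>
<pre><code class="lang-bash"># All three should succeed without a permission error
docker --version
docker compose version
docker run --rm hello-world
</code></pre>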
<h2 id="heading-step-5-deploy-n8n-with-docker-compose">Step 5: 🐳 Deploy n8n with Docker Compose</h2>
<p><strong>Prepare the project directories and files on the VM</strong></p>
<pre><code class="lang-bash">~/n8n-oracle-cloud/
├── traefik/                 
│   └── acme.json            <span class="hljs-comment"># ACME storage for Let's Encrypt certificates</span>
└── prod/                    
    ├── .env                 <span class="hljs-comment"># Environment variables</span>
    └── docker-compose.yaml  <span class="hljs-comment"># Main Compose stack</span>
</code></pre>
<p>🚀 <strong>All the code is available on</strong> <a target="_blank" href="https://github.com/sumitsaz23/n8n-docker-traefik-postgres#"><strong>my GitHub Repo</strong></a><strong>!</strong><br />👉 Clone this repository to your VM to get <strong>all the necessary code</strong> 🖥️💻</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Example: clone repo to home</span>
<span class="hljs-built_in">cd</span> ~
git <span class="hljs-built_in">clone</span> https://github.com/sumitsaz23/n8n-docker-traefik-postgres.git n8n-oracle-cloud
<span class="hljs-built_in">cd</span> n8n-oracle-cloud
</code></pre>
<p><strong>Secure acme.json for Traefik</strong></p>
<p>Traefik requires <code>acme.json</code> to exist and to be readable and writable by the Traefik container, but with strict permissions (<code>600</code>).</p>
<pre><code class="lang-bash"><span class="hljs-comment"># create acme file and set permissions</span>
touch traefik/acme.json
chmod 600 traefik/acme.json
</code></pre>
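<p>You can double-check the mode before starting the stack; Let’s Encrypt registration data lands in this file, and Traefik refuses to use it if the permissions are looser than <code>600</code>:</p>
<pre><code class="lang-bash"># Prints the file mode in octal; expect 600
stat -c '%a' traefik/acme.json
</code></pre>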
<p>Create the <code>docker-compose.yaml</code> file:</p>
<pre><code class="lang-bash">
services:

  <span class="hljs-comment"># -------------------------</span>
  <span class="hljs-comment"># Postgres (self-managed)</span>
  <span class="hljs-comment"># -------------------------</span>
  postgres:
    image: postgres:16-alpine   <span class="hljs-comment"># Postgres 16 (lightweight alpine)</span>
    container_name: n8n_postgres
    restart: unless-stopped
    <span class="hljs-comment"># Named volume for persistent DB files</span>
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      - POSTGRES_USER=<span class="hljs-variable">${DB_POSTGRESDB_USER}</span>
      - POSTGRES_DB=<span class="hljs-variable">${DB_POSTGRESDB_DATABASE}</span>
      - POSTGRES_PASSWORD=<span class="hljs-variable">${DB_POSTGRESDB_PASSWORD}</span> 
      - PGDATA=/var/lib/postgresql/data/pgdata
    healthcheck:
      <span class="hljs-built_in">test</span>: [<span class="hljs-string">"CMD-SHELL"</span>, <span class="hljs-string">"pg_isready -U <span class="hljs-variable">${DB_POSTGRESDB_USER}</span> -d <span class="hljs-variable">${DB_POSTGRESDB_DATABASE}</span> || exit 1"</span>]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s
    networks:
      - n8n-network
    mem_limit: 2g
    cpus: 1.0

  <span class="hljs-comment"># -------------------------</span>
  <span class="hljs-comment"># n8n main (web UI + webhooks)</span>
  <span class="hljs-comment"># -------------------------</span>
  n8n:
    image: n8nio/n8n:latest
    container_name: n8n_main
    restart: unless-stopped
    depends_on:
      - postgres
    volumes:
      <span class="hljs-comment"># named volume for user-related files, credentials, workflows, logs, etc</span>
      - n8n_data:/home/node/.n8n
    environment:
      <span class="hljs-comment"># Database (Postgres) - prefer file-based secret usage</span>
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_PORT=<span class="hljs-variable">${DB_POSTGRESDB_PORT}</span>
      - DB_POSTGRESDB_DATABASE=<span class="hljs-variable">${DB_POSTGRESDB_DATABASE}</span>
      - DB_POSTGRESDB_USER=<span class="hljs-variable">${DB_POSTGRESDB_USER}</span>
      - DB_POSTGRESDB_PASSWORD=<span class="hljs-variable">${DB_POSTGRESDB_PASSWORD}</span>

      <span class="hljs-comment"># n8n app settings</span>
      - N8N_PORT=5678
      - N8N_PROTOCOL=https
      - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=<span class="hljs-literal">true</span>
      - N8N_REINSTALL_MISSING_PACKAGES=<span class="hljs-literal">true</span>
      - N8N_RUNNERS_ENABLED=<span class="hljs-literal">true</span>
      - WEBHOOK_URL=https://<span class="hljs-variable">${N8N_HOSTNAME}</span>             <span class="hljs-comment"># actual public URL; override in .env</span>
      - GENERIC_TIMEZONE=<span class="hljs-variable">${TZ}</span>                         <span class="hljs-comment"># e.g., "UTC" or "Asia/Kolkata"</span>

      <span class="hljs-comment"># Basic auth for UI - use secret file variant</span>
      - N8N_BASIC_AUTH_ACTIVE=<span class="hljs-variable">${N8N_BASIC_AUTH_ACTIVE}</span>
      - N8N_BASIC_AUTH_USER=<span class="hljs-variable">${N8N_BASIC_AUTH_USER}</span>
      - N8N_BASIC_AUTH_PASSWORD=<span class="hljs-variable">${N8N_BASIC_AUTH_PASSWORD}</span>

    networks:
      - n8n-network
      - traefik-public
    ports:
    <span class="hljs-comment"># bind to the loopback interface only; Traefik routes external traffic</span>
      - 127.0.0.1:5678:5678
    labels:
      - <span class="hljs-string">"traefik.enable=true"</span>
    <span class="hljs-comment"># Tell Traefik which network to use to connect to this service</span>
      - <span class="hljs-string">"traefik.docker.network=n8nstack_traefik-public"</span>

    <span class="hljs-comment"># --- HTTPS Router ---</span>
      - <span class="hljs-string">"traefik.http.routers.n8n.rule=Host(`<span class="hljs-variable">${N8N_HOSTNAME}</span>`)"</span>
      - <span class="hljs-string">"traefik.http.routers.n8n.entrypoints=websecure"</span>
      - <span class="hljs-string">"traefik.http.routers.n8n.tls.certresolver=letsencrypt"</span>

      <span class="hljs-comment"># Traefik headers middleware for better security</span>

      - traefik.http.routers.n8n.tls=<span class="hljs-literal">true</span>
      - traefik.http.middlewares.n8n.headers.SSLRedirect=<span class="hljs-literal">true</span>
      - traefik.http.middlewares.n8n.headers.STSSeconds=315360000
      - traefik.http.middlewares.n8n.headers.browserXSSFilter=<span class="hljs-literal">true</span>
      - traefik.http.middlewares.n8n.headers.contentTypeNosniff=<span class="hljs-literal">true</span>
      - traefik.http.middlewares.n8n.headers.forceSTSHeader=<span class="hljs-literal">true</span>
      - traefik.http.middlewares.n8n.headers.SSLHost=<span class="hljs-variable">${N8N_HOSTNAME}</span>
      - traefik.http.middlewares.n8n.headers.STSIncludeSubdomains=<span class="hljs-literal">true</span>
      - traefik.http.middlewares.n8n.headers.STSPreload=<span class="hljs-literal">true</span>
      - traefik.http.routers.n8n.middlewares=n8n@docker

    mem_limit: 4g
    cpus: 2

  <span class="hljs-comment"># -------------------------</span>
  <span class="hljs-comment"># Traefik (reverse proxy / TLS automation)</span>
  <span class="hljs-comment"># -------------------------</span>

  traefik:
    image: traefik:latest
    container_name: traefik
    restart: unless-stopped
    <span class="hljs-built_in">command</span>:
    - --api.dashboard=<span class="hljs-literal">true</span>
    - --api.insecure=<span class="hljs-literal">false</span>  <span class="hljs-comment"># Secure the dashboard</span>
    - --providers.docker=<span class="hljs-literal">true</span>
    - --providers.docker.exposedbydefault=<span class="hljs-literal">false</span>
    - --providers.docker.network=n8nstack_traefik-public  <span class="hljs-comment"># Specify network</span>
    - --entrypoints.web.address=:80
    - --entrypoints.websecure.address=:443
    <span class="hljs-comment"># Let's Encrypt configuration</span>
    - --certificatesresolvers.letsencrypt.acme.httpchallenge=<span class="hljs-literal">true</span>
    - --certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web
    - --certificatesresolvers.letsencrypt.acme.email=<span class="hljs-variable">${LETSENCRYPT_EMAIL}</span>
    - --certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json
    - --log.level=INFO  <span class="hljs-comment"># Use INFO instead of DEBUG for production</span>
    ports:
    - <span class="hljs-string">"80:80"</span>
    - <span class="hljs-string">"443:443"</span>
    - <span class="hljs-string">"8080:8080"</span>
    volumes:
    <span class="hljs-comment">#- traefik_data:/letsencrypt</span>
    - /home/ubuntu/traefik/acme.json:/letsencrypt/acme.json
    - /home/ubuntu/traefik/<span class="hljs-built_in">log</span>:/var/<span class="hljs-built_in">log</span>/traefik
    - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
    - n8n-network
    - traefik-public

    mem_limit: 512m
    cpus: 0.5
<span class="hljs-comment"># -------------------------</span>
<span class="hljs-comment"># Networks &amp; Volumes</span>
<span class="hljs-comment"># -------------------------</span>
networks:
  n8n-network:
    driver: bridge
  traefik-public:
    driver: bridge
volumes:
  pgdata:
    name: n8n_pgdata
  n8n_data:
    name: n8n_data
</code></pre>
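<p>Before launching anything, you can ask Compose to validate the file together with the <code>.env</code> interpolation (run this from the directory that holds both files):</p>
<pre><code class="lang-bash"># Exits non-zero and points at the offending line on YAML or variable errors
docker compose config --quiet
</code></pre>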
<p>Create the <code>.env</code> file:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># -------------------------</span>
<span class="hljs-comment"># General (non-sensitive)</span>
<span class="hljs-comment"># -------------------------</span>
COMPOSE_PROJECT_NAME=n8nstack    <span class="hljs-comment">#environment variable in Docker Compose used to define the project name for a set of Docker services</span>
TZ=Asia/Kolkata                  <span class="hljs-comment"># timezone for containers (set to your preferred timezone)</span>
N8N_HOSTNAME=n8n.example.com   <span class="hljs-comment"># &lt;-- Replace with your public domain (used for WEBHOOK_URL &amp; Traefik rule)</span>
LETSENCRYPT_EMAIL=email@example.com

<span class="hljs-comment"># -------------------------</span>
<span class="hljs-comment"># Postgres settings</span>
<span class="hljs-comment"># -------------------------</span>
DB_POSTGRESDB_HOST=postgres
DB_POSTGRESDB_PORT=5432
DB_POSTGRESDB_DATABASE=n8n
DB_POSTGRESDB_USER=n8nuser
DB_POSTGRESDB_PASSWORD=dbsupersecret <span class="hljs-comment"># in production, do not put real passwords in .env</span>

<span class="hljs-comment"># -------------------------</span>
<span class="hljs-comment"># n8n auth / behavior</span>
<span class="hljs-comment"># -------------------------</span>
N8N_BASIC_AUTH_PASSWORD=n8nsupersecret  <span class="hljs-comment"># in production, do not put real passwords in .env</span>
N8N_BASIC_AUTH_ACTIVE=<span class="hljs-literal">true</span>
N8N_BASIC_AUTH_USER=admin

<span class="hljs-comment"># Optional metrics/queue settings</span>
N8N_METRICS=<span class="hljs-literal">true</span>
N8N_METRICS_INCLUDE_QUEUE_METRICS=<span class="hljs-literal">true</span>

<span class="hljs-comment"># -------------------------</span>
<span class="hljs-comment"># Resource &amp; tuning (example values)</span>
<span class="hljs-comment"># -------------------------</span>
<span class="hljs-comment"># For Postgres: max connections ≈ (typical) 100 (adjust in postgres.conf if needed)</span>
DB_POOL_SIZE=20
</code></pre>
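<p>Rather than the placeholder passwords above, you can generate random ones locally, for example with <code>openssl</code>:</p>
<pre><code class="lang-bash"># Prints a random 44-character base64 string; use one per password variable
openssl rand -base64 32
</code></pre>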
<p>Launch the Docker Compose stack:</p>
<pre><code class="lang-bash">docker compose up -d
</code></pre>
<p>This will create the networks, volumes, and containers.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757047622748/ad297220-bb1e-40b5-832b-6ca6a7baaf05.png" alt /></p>
<h2 id="heading-step-5-verify">Step 6: 🔍 Verify</h2>
<p>Verify that n8n started successfully:</p>
<pre><code class="lang-bash">docker compose logs n8n
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757047905755/7927653b-7acc-4d46-aac4-5606e37ccf72.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Verify that Traefik's ACME client successfully obtained a certificate</span>
docker compose logs -f traefik
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757048792198/f52ec095-05d0-4025-84ab-3d4ddffed26f.png" alt class="image--center mx-auto" /></p>
<p>Now open the n8n portal in your browser:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757048947127/1c7db59f-1153-453e-aeb8-ce317a1be9e3.png" alt class="image--center mx-auto" /></p>
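<p>From the shell, you can also confirm that HTTPS and the security headers are in place (replace the hostname with your own):</p>
<pre><code class="lang-bash"># Expect an HTTP/2 200 (or 401 if basic auth is active)
# plus a Strict-Transport-Security header from the Traefik middleware
curl -sSI https://n8n.example.com | head -n 15
</code></pre>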
<h2 id="heading-issues-faced-amp-fixes">⚠️ Issues Faced &amp; Fixes 🛠️</h2>
<ol>
<li><h3 id="heading-traefik-only-shows-the-default-selfsigned-certificate">Traefik only shows the default self‑signed certificate</h3>
</li>
</ol>
<p><strong>Symptom:</strong> When you open <a target="_blank" href="https://n8n.example.com"><code>https://n8n.example.com</code></a>, your browser shows the Traefik default certificate instead of a valid Let’s Encrypt cert.</p>
<p><strong>Causes &amp; Fixes:</strong></p>
<ul>
<li><p><strong>Resolver name mismatch</strong>: The resolver name in your labels must match the resolver defined in Traefik’s command args.</p>
<ul>
<li><p>Traefik command:</p>
<pre><code class="lang-bash">  --certificatesresolvers.myresolver.acme.email=you@example.com
  --certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json
  --certificatesresolvers.myresolver.acme.tlschallenge=<span class="hljs-literal">true</span>
</code></pre>
</li>
<li><p>Label must match:</p>
<pre><code class="lang-bash">  - <span class="hljs-string">"traefik.http.routers.n8n.tls.certresolver=myresolver"</span>
</code></pre>
</li>
</ul>
</li>
<li><p><strong>acme.json permissions</strong>: Ensure the file exists and is writable by Traefik:</p>
<pre><code class="lang-bash">  touch ./traefik/acme.json
  chmod 600 ./traefik/acme.json
</code></pre>
</li>
<li><p><strong>Firewall/DNS</strong>: Ports 80/443 must be open, and <a target="_blank" href="http://n8n.example.com"><code>n8n.example.com</code></a> must resolve to your VPS IP.</p>
</li>
</ul>
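<p>To see exactly which certificate Traefik is serving (a real Let’s Encrypt one versus its self-signed default), you can query it with <code>openssl</code>; the hostname below is a placeholder:</p>
<pre><code class="lang-bash"># "issuer= ... Let's Encrypt" means the real certificate is live;
# "TRAEFIK DEFAULT CERT" means ACME has not succeeded yet
echo | openssl s_client -connect n8n.example.com:443 -servername n8n.example.com 2&gt;/dev/null \
  | openssl x509 -noout -issuer -subject
</code></pre>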
<ol start="2">
<li><h3 id="heading-traefik-fails-to-obtain-acme-certificate-when-n8neditorbaseurl-is-set">Traefik fails to obtain ACME certificate when <code>N8N_EDITOR_BASE_URL</code> is set</h3>
</li>
</ol>
<p><strong>Symptom:</strong> Traefik logs show certificate request failures, and n8n only loads behind the default cert. The issue appears right after setting <code>N8N_EDITOR_BASE_URL=</code><a target="_blank" href="https://n8n.example.com/"><code>https://n8n.example.com/</code></a>.</p>
<p><strong>Cause:</strong> With <strong>TLS‑ALPN challenge</strong>, Traefik passes the ACME validation request through to the backend. If n8n enforces HTTPS at this point, the validation breaks.</p>
<p><strong>Fixes:</strong></p>
<ul>
<li><p><strong>Option A:</strong> Deploy without <code>N8N_EDITOR_BASE_URL</code> until the cert is issued, then set it and restart n8n.</p>
</li>
<li><p><strong>Option B (better):</strong> Switch to <strong>HTTP‑01 challenge</strong>, which bypasses n8n entirely during validation.</p>
<pre><code class="lang-bash">--certificatesresolvers.le.acme.httpchallenge=true
--certificatesresolvers.le.acme.httpchallenge.entrypoint=web
</code></pre>
</li>
<li><p><strong>Option C:</strong> Use <strong>DNS‑01 challenge</strong> if your DNS provider supports it (best for Cloudflare/Route53).</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[How to Deploy Your First Proxmox Virtual Machine Using Terraform]]></title><description><![CDATA[🧠 Why Terraform for Proxmox?
While Proxmox has a great web UI, infrastructure-as-code lets you:

Automate repeatable VM deployments

Keep configurations under version control

Easily spin up multi-VM Setups

Reduce human error


👋 Quick heads-up!
T...]]></description><link>https://automatestack.dev/how-to-deploy-your-first-proxmox-virtual-machine-using-terraform</link><guid isPermaLink="true">https://automatestack.dev/how-to-deploy-your-first-proxmox-virtual-machine-using-terraform</guid><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Mon, 30 Jun 2025 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766936353545/f29d9644-9f52-4f41-9efd-5cb0aae8c33e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-why-terraform-for-proxmox">🧠 Why Terraform for Proxmox?</h2>
<p>While Proxmox has a great web UI, infrastructure-as-code lets you:</p>
<ul>
<li><p>Automate repeatable VM deployments</p>
</li>
<li><p>Keep configurations under version control</p>
</li>
<li><p>Easily spin up multi-VM setups</p>
</li>
<li><p>Reduce human error</p>
</li>
</ul>
<h2 id="heading-quick-heads-up"><strong>👋 Quick heads-up!</strong></h2>
<p>This guide is <strong>Part 2 of a multi-part series</strong>.</p>
<p>In this article, we’ll walk through how to deploy a virtual machine in Proxmox using Terraform.</p>
<p>👉 <a target="_blank" href="https://automatestack.dev/how-to-set-up-proxmox-with-terraform-a-step-by-step-guide"><strong>Jump to Part 1:</strong> How to Set Up Proxmox with Terraform <strong>→</strong></a></p>
<h2 id="heading-prerequisites">📦 Prerequisites</h2>
<p>Before you begin, make sure you have:</p>
<p>✅ A running <strong>Proxmox VE</strong> host or cluster<br />✅ A <strong>user with API access</strong> (e.g., <code>terraform-user@pve</code>)<br />✅ A <strong>cloud-init template VM</strong> ready to be cloned<br />✅ Terraform installed on your machine<br />✅ Installed the <strong>Telmate Proxmox Terraform provider</strong></p>
<h2 id="heading-project-directory-structure">🗂 Project Directory Structure</h2>
<p>Here’s the structure of the deployment repo we’ll use:</p>
<pre><code class="lang-bash">proxmox-vm-deploy/
├── main.tf
├── provider.tf
├── variables.tf
├── terraform.tfvars
</code></pre>
<p>Let’s go through each file.</p>
<h2 id="heading-providertf-connect-terraform-to-proxmox">🧩 provider.tf – Connect Terraform to Proxmox</h2>
<pre><code class="lang-bash">terraform {
  required_providers {
    proxmox = {
      <span class="hljs-built_in">source</span> = <span class="hljs-string">"Telmate/proxmox"</span>
      version = <span class="hljs-string">"3.0.2-rc01"</span> <span class="hljs-comment"># use the latest version available</span>
    }
  }
}


provider <span class="hljs-string">"proxmox"</span> {
  pm_api_url = var.proxmox_api_url <span class="hljs-comment"># the variable is defined in terraform.tfvars.This should match the URL in your Proxmox web interface, typically something like "https://&lt;proxmox-ip&gt;:8006/api2/json"</span>
  pm_parallel = 1
  pm_debug = <span class="hljs-literal">false</span>
  pm_tls_insecure = <span class="hljs-literal">true</span>
}
</code></pre>
<p>This sets up the connection to your Proxmox host. Make sure:</p>
<ul>
<li><p>The user exists in Proxmox (<code>pveum user add terraform-user@pve</code>)</p>
</li>
<li><p>A suitable role (e.g., <code>Terraform_Provisioner</code>) with VM permissions is assigned</p>
</li>
<li><p>Load the API token for connecting to Proxmox as environment variables</p>
</li>
</ul>
<p>👉 <a target="_blank" href="https://automatestack.dev/how-to-set-up-proxmox-with-terraform-a-step-by-step-guide"><strong>Check out part-1 of the series for a step-by-step guide on creating the required users, roles and tokens to connect via terraform</strong></a></p>
<h2 id="heading-variablestf-input-configuration">📜 variables.tf – Input Configuration</h2>
<pre><code class="lang-bash">
variable <span class="hljs-string">"vm_name"</span> {
  <span class="hljs-built_in">type</span> = string
}

variable <span class="hljs-string">"clone"</span> {
  <span class="hljs-built_in">type</span> = string
}

variable <span class="hljs-string">"ipconfig0"</span> {
  <span class="hljs-built_in">type</span> = string
}

variable <span class="hljs-string">"vmid"</span> {
  <span class="hljs-built_in">type</span> = number
}

variable <span class="hljs-string">"memory"</span> {
  <span class="hljs-built_in">type</span> = number
}

variable <span class="hljs-string">"cores"</span> {
  <span class="hljs-built_in">type</span> = number
}

variable <span class="hljs-string">"disk_size"</span> {
  <span class="hljs-built_in">type</span> = string
  description = <span class="hljs-string">"The size of the disk, should be at least as big as the disk in the template"</span>
  default = <span class="hljs-string">"20G"</span>

}

variable <span class="hljs-string">"storage"</span> {
  <span class="hljs-built_in">type</span> = string
  description = <span class="hljs-string">"the storage where the VM disk will be created"</span>

}

variable <span class="hljs-string">"ssh-public-key"</span> {
    <span class="hljs-built_in">type</span> = string
    description = <span class="hljs-string">"SSH public key for the VMs"</span>
    sensitive = <span class="hljs-literal">true</span>

}

variable <span class="hljs-string">"proxmox_api_url"</span> {
    <span class="hljs-built_in">type</span>        = string
    description = <span class="hljs-string">"Proxmox API URL"</span>
}

variable <span class="hljs-string">"target_node"</span> {
    <span class="hljs-built_in">type</span>        = string
    description = <span class="hljs-string">"The Proxmox node where the VM will be created"</span>

}

variable <span class="hljs-string">"nameserver"</span> {
    <span class="hljs-built_in">type</span>        = string
    description = <span class="hljs-string">"Nameserver for the VM"</span>
    default     = <span class="hljs-string">"1.1.1.1 8.8.8.8"</span>

}

variable <span class="hljs-string">"cicustom"</span> {
    <span class="hljs-built_in">type</span>        = string
    description = <span class="hljs-string">"Cloud-Init custom configuration"</span>
    default     = <span class="hljs-string">"vendor=local:snippets/qemu-guest-agent.yml"</span>

}

variable <span class="hljs-string">"cipassword"</span> {
    <span class="hljs-built_in">type</span>        = string
    description = <span class="hljs-string">"Cloud-Init password for the VM"</span>
    sensitive = <span class="hljs-literal">true</span>

}
</code></pre>
<p>These variables define how your VM will look—name, clone template, IP config, memory, CPU, etc.</p>
<h2 id="heading-terraformtfvars-your-custom-values">⚙️ terraform.tfvars – Your Custom Values</h2>
<pre><code class="lang-bash">ssh-public-key = <span class="hljs-string">"ssh-ed25519 AAAAC3NzaC1lZXXXXXXXXXXXXXXXXXXXXXXwSOCiZ/OkpPDR3bR2tK4STIm+gnJk"</span>
target_node = <span class="hljs-string">"proxmox-server-IP"</span> <span class="hljs-comment"># The Proxmox node name (as shown in the web UI) where the VM will be created</span>
cicustom = <span class="hljs-string">"value=local:snippets/install-packages.yml"</span> <span class="hljs-comment"># /var/lib/vz/snippets/install-packages.yml #</span>
cipassword = <span class="hljs-string">"ubuntu"</span>
ipconfig0 = <span class="hljs-string">"ip=dhcp"</span>
vmid = 1000 <span class="hljs-comment"># optional, if not set, Proxmox will assign a random VMID</span>
vm_name = <span class="hljs-string">"ubuntu-vm"</span>
<span class="hljs-built_in">clone</span> = <span class="hljs-string">"ubuntu-24-04-cloudinit-copy"</span> <span class="hljs-comment"># The template to clone from</span>
cores = 4 <span class="hljs-comment"># Number of CPU cores</span>
memory = 4096 <span class="hljs-comment"># Memory in MB</span>
nameserver = <span class="hljs-string">"1.1.1.1 8.8.8.8"</span>
disk_size = <span class="hljs-string">"20G"</span> <span class="hljs-comment"># The size of the disk, should be at least as big as the disk in the template</span>
storage = <span class="hljs-string">"hdd-vm-data"</span> <span class="hljs-comment"># The storage where the VM disk will be created</span>
</code></pre>
<p>This is your configuration layer—values you want to pass to variables. Keep this file out of version control (<code>.gitignore</code>) if it includes sensitive info.</p>
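<p>A minimal <code>.gitignore</code> for this layout might look like the following; the entries beyond <code>terraform.tfvars</code> are the usual Terraform files you also don’t want committed:</p>
<pre><code class="lang-bash"># .gitignore
terraform.tfvars
*.tfstate
*.tfstate.*
.terraform/
</code></pre>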
<h2 id="heading-maintf-create-the-virtual-machine">🏗 main.tf – Create the Virtual Machine</h2>
<pre><code class="lang-bash"><span class="hljs-comment">#create a new VM from a template with cloud-init enabled</span>
resource <span class="hljs-string">"proxmox_vm_qemu"</span> <span class="hljs-string">"ubuntu-vm"</span> {

  <span class="hljs-comment"># Basic VM configuration</span>
  vmid        = var.vmid
  name        = var.vm_name
  target_node = var.target_node <span class="hljs-comment"># The node where the VM will be created</span>
  agent       = 1 <span class="hljs-comment"># Enable the QEMU guest agent</span>
  cpu {
    cores = var.cores
    sockets = 1
    numa = <span class="hljs-literal">true</span>
    <span class="hljs-built_in">type</span> = <span class="hljs-string">"x86-64-v2-AES"</span>
  }
  memory      = var.memory <span class="hljs-comment"># Memory in MB</span>
  bios        = <span class="hljs-string">"ovmf"</span> <span class="hljs-comment"># Use OVMF for UEFI support</span>
  boot        = <span class="hljs-string">"order=scsi0"</span> <span class="hljs-comment"># has to be the same as the OS disk of the template</span>
  <span class="hljs-built_in">clone</span>       = var.clone <span class="hljs-comment"># The template to clone from</span>
  scsihw      = <span class="hljs-string">"virtio-scsi-single"</span> <span class="hljs-comment"># Use VirtIO SCSI controller</span>
  vm_state    = <span class="hljs-string">"running"</span> <span class="hljs-comment"># "running" or "stopped"</span>
  automatic_reboot = <span class="hljs-literal">true</span>

  <span class="hljs-comment"># Cloud-Init configuration</span>
  cicustom   = var.cicustom
  ciupgrade  = <span class="hljs-literal">true</span> <span class="hljs-comment"># it will upgrade the OS to the latest version</span>
  nameserver = var.nameserver
  ipconfig0  = var.ipconfig0
  skip_ipv6  = <span class="hljs-literal">true</span>
  ciuser     = <span class="hljs-string">"root"</span> <span class="hljs-comment"># The user to use for the cloud-init script</span>
  cipassword = var.cipassword <span class="hljs-comment"># Password for the cloud-init user</span>
  sshkeys    = var.ssh-public-key <span class="hljs-comment"># The SSH public key to be added to the VM</span>

  <span class="hljs-comment"># Most cloud-init images require a serial device for their display</span>
  serial {
    id = 0
  }

  <span class="hljs-comment"># EFI disk for UEFI boot</span>
  <span class="hljs-comment"># This is required for cloud-init images that use UEFI</span>
  <span class="hljs-comment"># If your template does not use UEFI, you can remove this block</span>
  efidisk {
    efitype = <span class="hljs-string">"4m"</span> 
    storage = <span class="hljs-string">"hdd-vm-data"</span>
  }

  <span class="hljs-comment"># Disk configuration</span>
  disks {
    scsi {
      scsi0 {
        <span class="hljs-comment"># We have to specify the disk from our template, else Terraform will think it's not supposed to be there</span>
        disk {
          storage = var.storage
          <span class="hljs-comment"># The size of the disk should be at least as big as the disk in the template. If it's smaller, the disk will be recreated</span>
          size    = var.disk_size
        }
      }

  <span class="hljs-comment"># Some images require a cloud-init disk on the IDE controller, others on the SCSI or SATA controller</span>
      scsi1 {
        cloudinit {
          storage = <span class="hljs-string">"hdd-vm-data"</span>
        }
      }
    }
  }

  network {
    id = 0
    bridge = <span class="hljs-string">"vmbr0"</span>
    model  = <span class="hljs-string">"virtio"</span>
  }
}
</code></pre>
<p>Here’s what it does:</p>
<ul>
<li><p>Clones an existing <strong>cloud-init-enabled VM template</strong></p>
</li>
<li><p>Assigns VM name, IP, memory, CPU, etc.</p>
</li>
<li><p>Injects <strong>cloud-init config</strong>, such as SSH key and default user</p>
</li>
<li><p>Enables <strong>QEMU guest agent</strong> to enhance functionality</p>
</li>
</ul>
<h2 id="heading-how-to-run-it">▶️ How to Run It</h2>
<p>Open a terminal inside the project directory and follow these steps:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># use single quotes for the API token ID because of the exclamation mark</span>
<span class="hljs-built_in">export</span> PM_API_TOKEN_ID=<span class="hljs-string">'terraform-user@pve!tf_token'</span>
<span class="hljs-built_in">export</span> PM_API_TOKEN_SECRET=<span class="hljs-string">"XXXXXX-XXXX-XXXXX-XXXX-XXXXXXXXXXX"</span>
</code></pre>
<pre><code class="lang-bash">
<span class="hljs-comment"># Initialize Terraform</span>
terraform init
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753675736255/8ec2c6e9-1606-41d3-91fe-67db3cd407e3.png" alt /></p>
<pre><code class="lang-bash">
<span class="hljs-comment"># Review execution plan</span>
terraform plan

<span class="hljs-comment"># Apply the configuration</span>
terraform apply
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753677529158/2fd2d130-891b-40b6-acb5-482f17396e32.png" alt /></p>
<h2 id="heading-verify-in-proxmox">✅ Verify in Proxmox</h2>
<ul>
<li><p>Go to your Proxmox Web UI</p>
</li>
<li><p>You’ll see the new VM</p>
</li>
<li><p>Confirm network and SSH access</p>
</li>
<li><p>Check if the VM booted from your template and has your custom config</p>
</li>
</ul>
<hr />
<h2 id="heading-troubleshooting-tips">🛠 Troubleshooting Tips</h2>
<ul>
<li><p><strong>SSH not working?</strong> Ensure cloud-init was enabled in your template and your public SSH key is valid.</p>
</li>
<li><p><strong>Error: Permission denied?</strong> Double-check the Proxmox user's role and permissions.</p>
</li>
<li><p><strong>Wrong IP?</strong> Validate your <code>ipconfig0</code> syntax (should follow <code>ip=x.x.x.x/xx,gw=x.x.x.x</code>).</p>
</li>
</ul>
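<p>For example, a static address (the values here are purely illustrative) would look like this in <code>terraform.tfvars</code> instead of <code>ip=dhcp</code>:</p>
<pre><code class="lang-bash">ipconfig0 = <span class="hljs-string">"ip=192.168.1.50/24,gw=192.168.1.1"</span>
</code></pre>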
<hr />
<h2 id="heading-whats-next">🧪 What’s Next?</h2>
<p>Once you get one VM working, you can:</p>
<ul>
<li><p>Create multiple VMs using <code>for_each</code></p>
</li>
<li><p>Automate post-deploy scripts via <code>null_resource</code> and <code>remote-exec</code></p>
</li>
<li><p>Turn this into a Kubernetes cluster or homelab infra stack!</p>
</li>
</ul>
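<p>As a rough sketch of the <code>for_each</code> idea — the names, VMIDs, and omitted arguments below are illustrative, not a complete working config:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Hypothetical: turn the single-VM resource into several instances</span>
variable <span class="hljs-string">"vms"</span> {
  <span class="hljs-built_in">type</span>    = map(number)   <span class="hljs-comment"># name =&gt; vmid</span>
  default = { <span class="hljs-string">"k8s-node-1"</span> = 101, <span class="hljs-string">"k8s-node-2"</span> = 102 }
}

resource <span class="hljs-string">"proxmox_vm_qemu"</span> <span class="hljs-string">"nodes"</span> {
  for_each    = var.vms
  name        = each.key
  vmid        = each.value
  target_node = var.target_node
  <span class="hljs-built_in">clone</span>       = var.clone
  <span class="hljs-comment"># ...remaining arguments as in the single-VM resource</span>
}
</code></pre>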
]]></content:encoded></item><item><title><![CDATA[How to Set Up Proxmox with Terraform: A Step-by-Step Guide]]></title><description><![CDATA[Whether you’re just dipping your toes into Terraform or you’ve been automating your infrastructure for years, Hooking up Terraform with Proxmox is a simple step that unlocks a lot of automation power.
By the end of this guide, you’ll have learned how...]]></description><link>https://automatestack.dev/how-to-set-up-proxmox-with-terraform-a-step-by-step-guide</link><guid isPermaLink="true">https://automatestack.dev/how-to-set-up-proxmox-with-terraform-a-step-by-step-guide</guid><category><![CDATA[Terraform]]></category><category><![CDATA[proxmox]]></category><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Mon, 30 Jun 2025 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766935051741/72faaadf-9777-454d-ad81-eed973df7ee8.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Whether you’re just dipping your toes into Terraform or you’ve been automating your infrastructure for years, Hooking up Terraform with Proxmox is a simple step that unlocks a lot of automation power.</p>
<p>By the end of this guide, you’ll have learned how to:</p>
<ol>
<li><p>Create a dedicated Proxmox role for Terraform</p>
</li>
<li><p>Create a service user and API token</p>
</li>
<li><p>Build a reusable Cloud‑Init template and snippets for quick VM creation</p>
</li>
</ol>
<h3 id="heading-quick-heads-up">👋 Quick heads-up!</h3>
<p>This guide is <strong>Part 1 of a multi-part series</strong>.</p>
<p>In this article, we’ll walk through setting up Proxmox for use with Terraform—creating roles, users, API tokens, and a Cloud-Init template.</p>
<blockquote>
<p><strong>In Part 2</strong>, we’ll use Terraform to <strong>deploy a real VM</strong> from this setup—complete with configuration and automation.</p>
</blockquote>
<p>👉 <a target="_blank" href="https://automatestack.dev/how-to-deploy-your-first-proxmox-virtual-machine-using-terraform"><strong>Jump to Part 2: Deploy Your First VM →</strong></a></p>
<h2 id="heading-create-a-terraform-friendly-role">Create a “Terraform-Friendly” Role</h2>
<p>Let’s give Terraform only the permissions it truly needs: allocating disks, cloning VMs, tweaking cloud‑init settings, and so on. This is called privilege separation—and it’s a best practice to keep your environment secure.</p>
<p>Log in to the Proxmox cluster or host over SSH:</p>
<ul>
<li><p>Create a new role <code>Terraform_Provisioner_role</code></p>
</li>
<li><p>Create the user <code>terraform-user@pve</code></p>
</li>
<li><p>Add the <code>Terraform_Provisioner_role</code> role to <code>terraform-user</code></p>
</li>
</ul>
<pre><code class="lang-bash"><span class="hljs-comment">#create the role</span>
pveum role add Terraform_Provisioner_role -privs <span class="hljs-string">"Datastore.AllocateSpace Datastore.AllocateTemplate Datastore.Audit Pool.Allocate Sys.Audit Sys.Console Sys.Modify VM.Allocate VM.Audit VM.Clone VM.Config.CDROM VM.Config.Cloudinit VM.Config.CPU VM.Config.Disk VM.Config.HWType VM.Config.Memory VM.Config.Network VM.Config.Options VM.Migrate VM.Monitor VM.PowerMgmt SDN.Use"</span>
<span class="hljs-comment">#create the user</span>
pveum user add terraform-user@pve --password &lt;password&gt;
<span class="hljs-comment">#assign the role to the user</span>
pveum aclmod / -user terraform-user@pve -role Terraform_Provisioner_role
</code></pre>
<h2 id="heading-generate-an-api-token">Generate an API Token</h2>
<p>Rather than embedding a password in plain text, we’ll use an API token. This is both safer and more portable—especially in CI/CD pipelines.</p>
<p>You’ll see your token’s ID and secret in the output—jot them down for the next step.</p>
<pre><code class="lang-bash">pveum user token add terraform-user@pve tf_token
</code></pre>
<p>Output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749611065706/7301e4df-7836-478b-9e34-51a4ee0f55e6.png" alt /></p>
<p>If you’re running Terraform from GitHub Actions, GitLab CI, or another platform, stash these in your project’s secret variables.</p>
<h3 id="heading-disable-privilege-separation">Disable Privilege Separation</h3>
<p>In our Terraform use case, we usually assign permissions to the <strong>user account</strong> (like <code>terraform-user@pve</code>), not directly to the <strong>token</strong> itself. So if "Privilege Separation" is <strong>enabled</strong>, the token might not have access to all the necessary resources—even though the user does!</p>
<p>To make sure the token works as expected, <strong>Privilege Separation must be disabled</strong>, so the token inherits the user's full role permissions.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749653696103/c53f386f-7d48-4b8d-bc5e-d6bb8959f807.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-connecting-from-terraform-to-proxmox"><strong>Connecting from Terraform to Proxmox</strong></h2>
<p>We are using the <a target="_blank" href="https://registry.terraform.io/providers/Telmate/proxmox/latest/docs">Telmate Proxmox Terraform provider</a>.</p>
<p>When using the Telmate Proxmox Terraform provider, you authenticate using <strong>environment variables</strong> that start with <code>PM_</code>: the token ID (which includes your username) and the token secret. Terraform uses these to connect to the Proxmox API. Here’s how you’d set them in your shell:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># use single quotes for the API token ID because of the exclamation mark</span>
<span class="hljs-built_in">export</span> PM_API_TOKEN_ID=<span class="hljs-string">'terraform-user@pve!tf_token'</span>
<span class="hljs-built_in">export</span> PM_API_TOKEN_SECRET=<span class="hljs-string">"XXXXXX-XXXX-XXXXX-XXXX-XXXXXXXXXXX"</span>
</code></pre>
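<p>Watch the quoting: in an interactive Bash session, an exclamation mark inside double quotes can trigger history expansion and silently corrupt the token ID. A quick way to confirm the variable survived intact (the host URL and secret below are placeholders, so substitute your own):</p>
<pre><code class="lang-bash"># placeholder values for illustration; substitute your own host and secret
export PM_API_URL='https://proxmox.example.com:8006/api2/json'
export PM_API_TOKEN_ID='terraform-user@pve!tf_token'
export PM_API_TOKEN_SECRET='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'

# the '!' must come through literally; single quotes guarantee that
echo "$PM_API_TOKEN_ID"
</code></pre>
<p>The Telmate provider also reads <code>PM_API_URL</code> from the environment, so setting it alongside the token variables keeps your Terraform provider block free of credentials.</p>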
<p>If you’re using GitHub Actions, you can store these in your repository’s secrets.</p>
<h2 id="heading-creating-a-cloud-init-template"><strong>Creating a Cloud Init Template</strong></h2>
<p>Terraform works best when you have a VM template ready. We’ll start with the Ubuntu 24.04 LTS cloud image.</p>
<ul>
<li>Download the Ubuntu cloud image. I chose the <a target="_blank" href="https://cloud-images.ubuntu.com/noble/current/">Ubuntu Server 24.04 LTS cloud image</a></li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749616743956/fd062e42-ec6b-4573-a291-b2ca7bb12a72.png" alt /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749616799755/958ce57a-4db2-4fd0-a91e-a5b3cde8baea.png" alt /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749616841923/ac2d84a4-141a-4982-a015-694f5345d56d.png" alt /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749617016960/6900f7c2-2ae3-4f44-a824-c3724bc3540f.png" alt /></p>
<ul>
<li><p>This will download the image under <code>/var/lib/vz/template/iso</code> on the Proxmox server</p>
</li>
<li><p>Install guest tools</p>
<ul>
<li>Log in to the Proxmox server over SSH</li>
</ul>
</li>
</ul>
<pre><code class="lang-bash"><span class="hljs-comment"># as root on your Proxmox host</span>

apt update
apt install -y libguestfs-tools
</code></pre>
<ul>
<li><p>Using the guest tools, we will customize the image to:</p>
<ul>
<li><p>Install <code>qemu-guest-agent</code></p>
</li>
<li><p>Clear the existing machine ID in the image by truncating the files to zero size, so that each cloned VM generates a new machine ID</p>
</li>
</ul>
</li>
</ul>
<pre><code class="lang-bash">sudo virt-customize -a ubuntu_24-04-server-cloudimg-amd64_copy.img \
  --install qemu-guest-agent \
  --run-command <span class="hljs-string">'systemctl enable qemu-guest-agent'</span>

sudo virt-customize -a ubuntu_24-04-server-cloudimg-amd64_copy.img \
  --run-command <span class="hljs-string">"truncate -s 0 /etc/machine-id /var/lib/dbus/machine-id"</span>
</code></pre>
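<p>A quick local illustration of what the machine-ID step does: truncating to zero bytes leaves the file present but empty, which is what signals the guest OS to generate a fresh machine ID on the clone’s first boot (simply deleting the file is not always treated the same way).</p>
<pre><code class="lang-bash"># demo with a throwaway file; the real targets are /etc/machine-id
# and /var/lib/dbus/machine-id inside the image
tmpfile=$(mktemp)
echo "0123456789abcdef0123456789abcdef" &gt; "$tmpfile"
truncate -s 0 "$tmpfile"
wc -c &lt; "$tmpfile"   # 0 bytes, but the file still exists
</code></pre>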
<h3 id="heading-create-the-vm-for-template">Create the VM for Template</h3>
<pre><code class="lang-bash"><span class="hljs-comment">#create the VM shell</span>
qm create 9000 --name ubuntu-24-04-cloudinit
</code></pre>
<pre><code class="lang-bash"><span class="hljs-comment">#import the cloud image to the VM disk</span>
qm <span class="hljs-built_in">set</span> 9000 --scsi0 local-lvm:0,import-from=/var/lib/vz/template/iso/noble-server-cloudimg-amd64.img
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749617702684/7f4cb807-d992-4b67-8f7d-f0f6b71ba532.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749617725107/6301aabe-2cc1-418f-9d20-da89b7acdf65.png" alt class="image--center mx-auto" /></p>
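<p>If you prefer the shell over the web UI shown in the screenshots above, the remaining template settings can be applied with <code>qm set</code> as well. These are the usual flags for cloud images; the storage name <code>local-lvm</code> is an assumption, so adjust it to your environment:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># add a Cloud-Init drive so Terraform can inject user/network config</span>
qm set 9000 --ide2 local-lvm:cloudinit
<span class="hljs-comment"># boot from the imported disk</span>
qm set 9000 --boot order=scsi0
<span class="hljs-comment"># cloud images expect a serial console</span>
qm set 9000 --serial0 socket --vga serial0
<span class="hljs-comment"># let Proxmox query the guest agent we baked into the image</span>
qm set 9000 --agent enabled=1
</code></pre>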
<h3 id="heading-convert-to-template">Convert to template</h3>
<pre><code class="lang-bash"><span class="hljs-comment">#convert to template</span>
qm template 9000
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749617765011/4375d35e-de91-4374-bbe4-af911ab6e8e2.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-template-created">Template created</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749618000713/29beb744-c47f-4ab6-88d5-eaf724e133e1.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-creating-a-snippet"><strong>Creating a Snippet</strong></h2>
<p>Snippets let you inject arbitrary cloud‑init YAML at VM creation time. Here’s how to set up a folder and add a quick example:</p>
<pre><code class="lang-bash">mkdir -p /var/lib/vz/snippets
</code></pre>
<p>Now that we have a place to store the snippet, we can create the snippet itself. The following command will create a snippet <code>install-packages.yml</code>:</p>
<pre><code class="lang-bash">tee /var/lib/vz/snippets/install-packages.yml &lt;&lt;EOF
<span class="hljs-comment">#cloud-config</span>
runcmd:
  - apt update
  - apt install -y curl jq
EOF
</code></pre>
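<p>Cloud-init will not apply a user-data file that doesn’t begin with the exact <code>#cloud-config</code> header, so it’s worth sanity-checking the snippet. A minimal local check (using a temp file here; on the Proxmox host the path would be <code>/var/lib/vz/snippets/install-packages.yml</code>):</p>
<pre><code class="lang-bash">snippet=$(mktemp)
tee "$snippet" &gt;/dev/null &lt;&lt;'EOF'
#cloud-config
runcmd:
  - apt update
  - apt install -y curl jq
EOF
# the first line must be exactly "#cloud-config"
head -n1 "$snippet"
</code></pre>
<p>Once the file is in place, a snippet like this can be attached to a VM with, for example, <code>qm set &lt;vmid&gt; --cicustom "user=local:snippets/install-packages.yml"</code>, assuming the <code>local</code> storage has the Snippets content type enabled.</p>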
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>And that’s it! You now have:</p>
<ul>
<li><p>A <strong>Terraform role</strong> limited to exactly what you need</p>
</li>
<li><p>A <strong>service user</strong> and <strong>token</strong> for secure API access</p>
</li>
<li><p>A <strong>cloud‑init template</strong> ready to spin up Ubuntu VMs</p>
</li>
<li><p>Handy <strong>snippets</strong> for customization</p>
</li>
</ul>
<hr />
<p>Next, learn how to use the template you just built to create real, working virtual machines with cloud-init, snippets, and more:</p>
<h2 id="heading-read-next-deploy-your-first-proxmox-vm-with-terraformhttpsautomatestackdevhow-to-deploy-your-first-proxmox-virtual-machine-using-terraform">👉 <a target="_blank" href="https://automatestack.dev/how-to-deploy-your-first-proxmox-virtual-machine-using-terraform"><strong>🚀 Read Next: Deploy Your First Proxmox VM with Terraform</strong></a></h2>
<hr />
]]></content:encoded></item><item><title><![CDATA[The Ultimate Guide to Right-Sizing CPU & Memory for Virtual Machines]]></title><description><![CDATA[In the world of virtualization, more CPU and memory doesn’t always mean better performance. Over-provisioning resources can lead to higher contention and even degraded application responsiveness.
Right-sizing ensures your workloads get exactly the re...]]></description><link>https://automatestack.dev/the-ultimate-guide-to-right-sizing-cpu-and-memory-for-virtual-machines</link><guid isPermaLink="true">https://automatestack.dev/the-ultimate-guide-to-right-sizing-cpu-and-memory-for-virtual-machines</guid><category><![CDATA[numa]]></category><category><![CDATA[vnuma]]></category><category><![CDATA[ Right Sizing]]></category><category><![CDATA[vm]]></category><category><![CDATA[vmware]]></category><category><![CDATA[vsphere]]></category><category><![CDATA[esxi]]></category><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Mon, 30 Jun 2025 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766935408273/9470fb15-cd10-458d-93d3-e3364c75fd0b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the world of virtualization, more CPU and memory doesn’t always mean better performance. Over-provisioning resources can lead to higher contention and even degraded application responsiveness.</p>
<p>Right-sizing ensures your workloads get exactly the resources they need—no more, no less—while maximizing efficiency on your ESXi hosts.</p>
<p>In this guide, we’ll explore <strong>practical CPU and memory sizing strategies</strong>, <strong>NUMA awareness</strong>, and <strong>performance optimization tips</strong> to help you make data-driven decisions.</p>
<h2 id="heading-why-right-sizing-matters"><strong>Why Right-Sizing Matters</strong></h2>
<p>Every ESXi host has finite CPU and memory resources. If one VM hoards them unnecessarily, other VMs suffer. The goal of right-sizing is to:</p>
<ul>
<li><p>Improve <strong>overall cluster performance</strong> by <a target="_blank" href="https://automatestack.dev/resource-contention-in-vsphere-identification-and-solutions">reducing contention.</a></p>
</li>
<li><p>Optimize <strong>hardware utilization</strong>.</p>
</li>
<li><p>Avoid <strong>VM-level performance degradation</strong> caused by unnecessary over-allocation.</p>
</li>
<li><p>Reduce <strong>software licensing costs</strong> for CPU-bound products like databases.</p>
</li>
</ul>
<h2 id="heading-cpu-right-sizing"><strong>CPU Right-Sizing</strong></h2>
<p>When it comes to CPU allocation, more isn’t always better. Virtual CPUs (vCPUs) introduce scheduling overhead—allocating more than necessary can slow things down.</p>
<h3 id="heading-best-practices"><strong>Best Practices</strong></h3>
<ol>
<li><p><strong>Start Small</strong></p>
<ul>
<li><p>Begin with the minimum number of vCPUs required for peak load.</p>
</li>
<li><p>Common starting point: <strong>2 vCPUs</strong> for general-purpose workloads.</p>
</li>
</ul>
</li>
<li><p><strong>Add CPUs Only When Needed</strong></p>
<ul>
<li><p>Monitor for <strong>CPU Ready Time</strong> or <strong>Co-stop</strong> events.</p>
</li>
<li><p>If they are consistently high, then consider adding more vCPUs.</p>
</li>
</ul>
</li>
<li><p><strong>Align with NUMA Topology</strong></p>
<ul>
<li><p>Configure vCPUs as <strong>Cores per Socket</strong> until:</p>
<ul>
<li><p>You exceed the physical core count of a NUMA node, or</p>
</li>
<li><p>You exceed the memory available in a NUMA node.</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>Avoid Odd vCPU Counts</strong></p>
<ul>
<li>Maintain even counts for better scheduling efficiency.</li>
</ul>
</li>
<li><p><strong>Disable vCPU Hot Add Unless Necessary</strong></p>
<ul>
<li>Enabling hot add disables <strong>vNUMA</strong>—critical for workloads over 8 vCPUs.</li>
</ul>
</li>
<li><p><strong>Mind Licensing Models</strong></p>
<ul>
<li><p><strong>Some software licensing schemes limit the usable socket count</strong>, so configuring the VM with a single socket may result in better performance</p>
<ul>
<li>For example, SQL Server Standard edition running on an 8-vCPU VM with 1 core per socket would be able to utilize only 4 vCPUs. But if the same VM were configured with 1 socket (that is, 8 cores per socket), then all 8 vCPUs would be leveraged.</li>
</ul>
</li>
</ul>
</li>
</ol>
<h2 id="heading-understanding-numa-amp-vnuma"><strong>Understanding NUMA &amp; vNUMA</strong></h2>
<p>Modern servers use <strong>NUMA (Non-Uniform Memory Access)</strong>, where CPU and memory are grouped into <strong>nodes</strong>. Accessing memory within the same node is faster (local memory) than accessing from another (remote memory).</p>
<ul>
<li><p>Let’s understand this with an example. Assume:</p>
<ul>
<li><p>Each physical server has <strong>2 CPU sockets</strong>.</p>
</li>
<li><p>Each socket contains <strong>24 physical cores</strong>.</p>
</li>
<li><p>This means <strong>one NUMA node = 1 socket = 24 cores + its own memory bank</strong>.</p>
</li>
</ul>
</li>
</ul>
<p>    If you create a VM with <strong>up to 24 vCPUs</strong>, you can assign them as <strong>1 socket × 24 cores</strong>.</p>
<ul>
<li>This keeps <strong>all vCPUs within the same NUMA node</strong>, so memory access stays local and fast.</li>
</ul>
<p>    If you create a VM with <strong>more than 24 vCPUs</strong> (say 32 vCPUs):</p>
<ul>
<li><p>The vCPUs will be split across <strong>two NUMA nodes</strong> (because a single node can’t handle more than 24).</p>
</li>
<li><p>This means some CPU threads may need to access memory from the other node, which is <strong>slower</strong>.</p>
</li>
<li><p>At that point, you should configure vNUMA so the VM and its OS are aware of this split.</p>
</li>
</ul>
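<p>Once a large VM is running, you can confirm from inside a Linux guest what topology it actually sees. If vNUMA is exposed correctly, a 32-vCPU VM split as described above should report two NUMA nodes:</p>
<pre><code class="lang-bash"># shows "NUMA node(s):" plus the CPU ranges assigned to each node
lscpu | grep -i numa
</code></pre>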
<h3 id="heading-numa-tips"><strong>NUMA Tips</strong></h3>
<ul>
<li><p>Keep vCPUs and memory allocations <strong>within a single NUMA node</strong> when possible.</p>
</li>
<li><p><strong>vNUMA</strong> becomes relevant when a VM’s vCPUs exceed a single NUMA node’s core count (often &gt;8 vCPUs).</p>
</li>
<li><p>vNUMA exposes NUMA topology to the guest OS so it can optimize memory locality.</p>
</li>
</ul>
<h2 id="heading-memory-right-sizing"><strong>Memory Right-Sizing</strong></h2>
<p>Just like CPU, over-allocating RAM can harm performance due to ballooning and swapping.</p>
<h3 id="heading-best-practices-1"><strong>Best Practices</strong></h3>
<ol>
<li><p><strong>Start with the Working Set</strong></p>
<ul>
<li><p>Measure actual memory use and allocate slightly above it.</p>
</li>
<li><p>Example: If a VM uses 8 GB, allocate ~10 GB.</p>
</li>
</ul>
</li>
<li><p><strong>Avoid Large Headroom</strong></p>
<ul>
<li>Unused RAM may be reclaimed by ballooning, which can cause performance drops.</li>
</ul>
</li>
<li><p><strong>When Ballooning Occurs</strong></p>
<ul>
<li><p>Check if the VM is swapping/paging.</p>
</li>
<li><p>Right-size VMs with over-allocated RAM to free resources for others.</p>
</li>
</ul>
</li>
</ol>
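<p>To see whether ballooning is actually happening, you can check from inside a Linux guest running open-vm-tools; a non-zero balloon value means the host is reclaiming memory from this VM. The command is guarded here so it’s a no-op on machines without VMware Tools:</p>
<pre><code class="lang-bash">if command -v vmware-toolbox-cmd &gt;/dev/null 2&gt;&amp;1; then
  balloon=$(vmware-toolbox-cmd stat balloon)
else
  balloon="n/a (open-vm-tools not installed)"
fi
echo "ballooned memory: $balloon"
</code></pre>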
<h2 id="heading-final-thoughts"><strong>Final Thoughts</strong></h2>
<p>Right-sizing isn’t about giving VMs the <strong>most</strong> resources—it’s about giving them the <strong>right</strong> amount. By starting small, monitoring real usage, and scaling based on evidence, you will boost performance &amp; <a target="_blank" href="https://automatestack.dev/resource-contention-in-vsphere-identification-and-solutions">reduce resource contention</a>.</p>
<p>Awareness of physical topology and NUMA limits is also key to making the most of your hardware.</p>
]]></content:encoded></item><item><title><![CDATA[Step-by-Step Guide: Upgrading vSphere 7 to 8]]></title><description><![CDATA[The support clock is ticking toward October 2, 2025—the official end-of-life date for vSphere 7.
If you still haven’t upgraded your vCenters yet, i hope you find this blog helpful in your journey.
vSphere 8 also brings many features improvements such...]]></description><link>https://automatestack.dev/step-by-step-guide-upgrading-vsphere-7-to-8</link><guid isPermaLink="true">https://automatestack.dev/step-by-step-guide-upgrading-vsphere-7-to-8</guid><category><![CDATA[vmware]]></category><category><![CDATA[vCenter]]></category><category><![CDATA[vcenter upgrade]]></category><category><![CDATA[vsphere]]></category><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Mon, 31 Mar 2025 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749527094843/cbf2f482-ebf7-4baa-94b2-51e6a565a538.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The support clock is ticking toward <strong>October 2, 2025</strong>—the official end-of-life date for vSphere 7.</p>
<p>If you still haven’t upgraded your vCenters, I hope you find this blog helpful in your journey.</p>
<p>vSphere 8 also brings many feature improvements, such as:</p>
<p>Distributed Services Engine (DPU Offload), Accelerated GPU Capabilities, Native Kubernetes Integration, Enhanced Life-cycle &amp; Cluster Management, Advanced DRS &amp; vMotion, vSAN Express Storage Architecture (ESA), vNUMA GUI Visualization &amp; Security Enhancements.</p>
<p>You can read more about these in <a target="_blank" href="https://blogs.vmware.com/vsphere/2022/08/introducing-vsphere-8-the-enterprise-workload-platform.html">details here</a>.</p>
<h2 id="heading-assessment-amp-planning">📊 Assessment &amp; Planning</h2>
<h3 id="heading-compatibility-checks">🔍 Compatibility Checks</h3>
<ul>
<li><p>Verify hardware (servers, NICs, storage) on VMware HCL (now <a target="_blank" href="https://compatibilityguide.broadcom.com/">Broadcom compatibility guide</a>)</p>
</li>
<li><p>Confirm interoperability of other VMware products like NSX datacenter, VMware Cloud Director, plugins etc using <a target="_blank" href="https://interopmatrix.broadcom.com/Interoperability">VMware Product Interoperability Matrix</a></p>
</li>
<li><p>Validate third-party tools and backup/DR systems.</p>
</li>
<li><p>Decide on the target build numbers of the vCenter Server &amp; the ESXi hosts for your environment</p>
<ul>
<li>In my environment, I decided to go with <code>vCenter 8U3e</code> &amp; <code>ESXi 8U3d</code></li>
</ul>
</li>
</ul>
<h3 id="heading-upgrade-sequence">🔼 Upgrade Sequence</h3>
<ul>
<li><p>If you need to update multiple products in your environment, start with updating the product with the lowest sequence number from the table below.</p>
</li>
<li><p>If a product is not present in your environment, update the subsequent product.</p>
</li>
<li><p>If a product is managed by vRealize Suite Lifecycle Manager, the minimum version may be dictated by vRealize Suite Lifecycle Manager.</p>
</li>
<li><p>If upgrading from vSphere 6.7 with NSX, NVDS to CVDS migration is required. To migrate from NVDS to CVDS, you must first upgrade to vSphere 7.0 Update 2 or higher along with NSX 3.1.x or higher.</p>
</li>
</ul>
<p>Refer this <a target="_blank" href="https://knowledge.broadcom.com/external/article/308161/update-sequence-for-vmware-vsphere-80-an.html">KB Article</a> for more details.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>🔢 <strong>Sequence</strong></td><td>🧩 <strong>Component</strong></td></tr>
</thead>
<tbody>
<tr>
<td>🔵 1</td><td>🛠️ vRealize Suite Lifecycle Manager</td></tr>
<tr>
<td>🟢 2</td><td>👤 Identity Manager</td></tr>
<tr>
<td>🟡 3</td><td>📊 vRealize Log Insight</td></tr>
<tr>
<td>🟡 3</td><td>📈 vRealize Operations Manager</td></tr>
<tr>
<td>🟠 4</td><td>🌐 vRealize Network Insight</td></tr>
<tr>
<td>🟣 5</td><td>🤖 vRealize Automation</td></tr>
<tr>
<td>🔴 6</td><td>💾 VADP Backup Solution</td></tr>
<tr>
<td>🔵 7</td><td>🔄 vSphere Replication</td></tr>
<tr>
<td>🔵 7</td><td>🚨 Site Recovery Manager</td></tr>
<tr>
<td>🟢 8</td><td>🕸️ NSX</td></tr>
<tr>
<td>🟡 9</td><td>🧠 vCenter Server</td></tr>
<tr>
<td>🟠 10</td><td>🖥️ ESXi</td></tr>
<tr>
<td>🟣 11</td><td>🧰 VMware Tools</td></tr>
<tr>
<td>🔴 12</td><td>🧱 Virtual Hardware</td></tr>
<tr>
<td>🔴 12</td><td>📦 vSAN On-disk Format</td></tr>
</tbody>
</table>
</div><h2 id="heading-mandatory-pre-checks">⚙️ Mandatory Pre-checks</h2>
<ul>
<li><p>Upgrading to vCenter Server 8.0 requires an additional pre-check for certificates with weak signature algorithms</p>
<ul>
<li>The <a target="_blank" href="https://knowledge.broadcom.com/external/article/313460/upgrading-vcenter-server-or-esxi-80-fail.html">pre-check script from vmware</a> ensures that vCenter Server is not using certificates with weak signature algorithms</li>
</ul>
</li>
<li><p>Verify and resolve any expired vCenter Server certificates</p>
<ul>
<li>The <a target="_blank" href="https://knowledge.broadcom.com/external/article/385107/vcert-scripted-vcenter-expired-certific.html">vCert tool</a> from vmware can be used to ease the management capability for most vCenter Server certificate-related operations</li>
</ul>
</li>
<li><p>Use the VCF <a target="_blank" href="https://knowledge.broadcom.com/external/article/344917/using-the-vcf-diagnostic-tool-for-vspher.html">Diagnostic Tool for vSphere</a> <a target="_blank" href="https://knowledge.broadcom.com/external/article/344917/using-the-vcf-diagnostic-tool-for-vspher.html">(VDT)</a> directly on a vCenter Server appliance to execute a series of checks on the system configuration and report user-friendly <code>PASS/FAIL/WARN</code> results for known configuration issues</p>
</li>
</ul>
<h2 id="heading-running-the-pre-checks">🚦 Running the pre-checks</h2>
<p>I copied the VDT tool, the pre-check script (vsphere8_upgrade_certificate_checks), &amp; the vCert tool to the vCenter appliance using SCP, at <code>/tmp/vcenter_8_upgrade_checks</code></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749493225343/b7f41a57-bc6e-4dc3-8e40-76d11455942e.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-running-vsphere8upgradecertificatecheckspy">✅ Running <code>vsphere8_upgrade_certificate_checks.py</code></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749493665441/803e5ea3-f18b-4128-893b-bd111a38f562.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-running-vdt-tool">✅ Running VDT tool</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749493811448/ad6f698e-a525-436c-87e1-1417c9b586e2.png" alt class="image--center mx-auto" /></p>
<p>The tool generates a <code>PASS/FAIL/WARN</code> report on screen. It also stores the reports &amp; VDT logs at <code>/var/log/vmware/vdt/</code></p>
<p>In my environment, I received 2 <code>FAIL</code> results:</p>
<h3 id="heading-vmdir-domain-functional-level">🛑 VMDIR Domain functional level</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749494116200/88add466-0a95-49af-b193-edd42a288351.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>I first validated the domain functional level using <code>/usr/lib/vmware-vmafd/bin/dir-cli domain-functional-level get</code></p>
</li>
<li><p>A vCenter that has been upgraded since version 6.5 will have a DFL of 1.</p>
</li>
<li><p>For vCenter version 7 and above, the domain functional level should be 4.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749494598614/54c36177-f571-4ff7-b41c-f93b84718324.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>To fix this, I followed this VMware <a target="_blank" href="https://knowledge.broadcom.com/external/article?legacyId=92962">KB article</a> and used the command below</p>
</li>
<li><p><mark>Ensure that you have a snapshot before proceeding</mark></p>
</li>
</ul>
<pre><code class="lang-bash">$ /usr/lib/vmware-vmafd/bin/dir-cli domain-functional-level <span class="hljs-built_in">set</span> --level 4 --login Administrator@vsphere.local --domain-name vsphere.local

$ service-control --restart vmdird
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749494677748/8f928a29-e764-43c3-8c7f-7b8cf9559d6f.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-one-of-the-certificate-in-the-trusted-root-is-not-a-certificate-authority">🛑 One of the Certificate in the trusted root is not a Certificate authority</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749494196443/08cb3753-e545-4913-a212-c72821b20021.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>I fixed this by removing cert from VMware directory using the <strong>vCert tool</strong></p>
</li>
<li><p><mark>Ensure that you have a snapshot before proceeding</mark></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749495027121/d44258fd-7d9b-4076-ba5f-f7ed999724a8.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749495059632/d8c71c49-3ef6-421c-bc45-a1b844857960.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>I re-ran the VDT tool to validate that the issues were fixed &amp; the checks passed</p>
</li>
<li><p>Rebooted the vCenter to make sure there were no issues after the changes</p>
</li>
</ul>
<h2 id="heading-upgrade-procedure">🚀🔁 <strong>Upgrade Procedure</strong></h2>
<h3 id="heading-preparations">📋 Preparations</h3>
<ul>
<li><p>Temporary IP for the upgrade process</p>
<ul>
<li><p>The upgrade installer deploys a new vCenter 8 appliance alongside the old one. A temporary IP lets the installer access and configure it without disrupting existing services</p>
</li>
<li><p>After data migration, the new appliance shuts down the old one and adopts its original IP</p>
</li>
<li><p>Ensure the temporary IP belongs to the same VLAN/subnet as the existing vCenter and is reachable on ports 443 &amp; 5480 from the system running the vCenter installer.</p>
</li>
</ul>
</li>
<li><p>For upgrading vCenter in a High Availability Environment, remove vCenter HA</p>
</li>
<li><p>Reboot 🔁 the vCenter to make sure there is no pending reboot. Verify that all services are up using <code>service-control --status --all</code></p>
</li>
<li><p>Backup 💾 - Make sure to take file based backup from VAMI (<code>https://&lt;vcenter&gt;:5480</code>)</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749495585182/03e0ebb7-f32a-426e-855f-4808a408792f.png" alt class="image--center mx-auto" /></p>
<p>  Confirm that the backup is successful</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749495647100/19e4c867-a28c-4178-a38a-5da37b4e39d1.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Snapshot 📸 of the vCenter appliance VM.</p>
<p>  <em>If the vCenter is part of an Enhanced Linked Mode (ELM), </em><strong><em>all vCenters in ELM must be powered off simultaneously</em></strong><em>. Snapshots should be taken only after all vCenters are fully powered off.</em></p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749495530281/ee599d40-9c9a-4334-a949-212a73fac07c.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>If the vCenter resides on a cluster with DRS set to <strong>Fully Automated</strong>, change the DRS mode to <strong>Partially Automated</strong> or <strong>Manual</strong> to prevent automatic load balancing during the upgrade.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749495482108/d788876d-8073-41cb-9582-48d8349380a3.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
<h3 id="heading-start-upgrade">🏁 Start Upgrade</h3>
<p>📀 Mount the vCenter 8 appliance ISO &amp; start the installer</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749496570702/b99af8fe-965c-4259-9401-a1d6b205e892.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-stage-1-deployment">🧭 Stage 1: Deployment</h3>
<p>During Stage 1, the installer deploys a new vCenter 8 appliance alongside the old one. The temporary IP is assigned to it, and the installer will prepare and configure the vCenter services.</p>
<ul>
<li>Choose the upgrade option</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749523980284/92ddf4db-2860-4eaa-a0ce-5e82b3d42327.png" alt class="image--center mx-auto" /></p>
<ul>
<li>Provide the connection details of the vCenter you want to upgrade, along with the credentials for the vCenter or ESXi host where the existing vCenter appliance is currently registered and managed.</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749524105768/588b8678-4b79-41b6-86a5-922eeac2af91.png" alt class="image--center mx-auto" /></p>
<ul>
<li>Specify the target vCenter where you want to deploy the new vCenter 8 appliance.</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749524140970/3d592b24-4f08-4f36-a236-18a17838950b.png" alt class="image--center mx-auto" /></p>
<ul>
<li>Specify the VM name &amp; credentials for the new appliance</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749524183348/beb37f38-d2a2-4e1b-ba0d-67df1c55ab47.png" alt class="image--center mx-auto" /></p>
<ul>
<li>Choose the deployment size</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749524214620/50c0c85a-8582-4b14-b7fb-21f005b382b3.png" alt class="image--center mx-auto" /></p>
<ul>
<li>Review all the information &amp; hit FINISH</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749524968580/4e58e6f7-4d01-430e-959f-69f97d4f3073.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749524990213/ab87e09d-c9bc-48a4-81ce-94f310354564.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-stage-2-data-migration">🧭 Stage 2: Data Migration</h3>
<p>Once Stage 1 is complete, the installer connects to <code>https://&lt;temporary_ip&gt;:5480</code> to continue with Stage 2.</p>
<ul>
<li>Pre-upgrade checks being run</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749525393835/872fb8cd-cf97-44ec-bb88-ea33b617eeab.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>If any <code>errors</code> are encountered, they must be resolved before proceeding.</p>
</li>
<li><p><code>Warnings</code> should be reviewed to determine their impact or necessity for action, and addressed accordingly before moving forward.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749525552081/cb889877-a9b4-451e-b81b-ef809c9193fd.png" alt class="image--center mx-auto" /></p>
<ul>
<li>Choose the appropriate data set that you need to copy over</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749525594336/8ee7b9c3-3a10-4d03-8ed6-f7477b252c34.png" alt class="image--center mx-auto" /></p>
<ul>
<li>Review all the information and hit FINISH</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749525679974/68561fc8-0187-48ff-8024-f9f114d62bf7.png" alt class="image--center mx-auto" /></p>
<ul>
<li>This begins the data copy process. Once the data is copied, the source vCenter appliance will be shut down automatically.</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749525748565/f3932668-2084-4c06-8c41-2995143ef099.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749525778400/f5c44914-43c4-4e11-83ef-929ecca9b53c.png" alt class="image--center mx-auto" /></p>
<ul>
<li>The vCenter 8 appliance will come up with the original IP</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749525829854/708fa9d1-d038-4c2a-96b9-cbce8765d3f6.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-verification">✅ Verification</h3>
<ul>
<li>Verify all vCenter services are up</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749526026270/b3651ef2-feb3-40cb-9368-1bcc181ddd74.png" alt class="image--center mx-auto" /></p>
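<p>The same service check can be scripted on the appliance shell. A minimal sketch, assuming SSH access to the VCSA: it filters the output of <code>service-control --status --all</code> for anything listed under the "Stopped" heading (a captured sample is parsed here for illustration):</p>
<pre><code class="lang-bash"># Print services reported as stopped.
# On a live appliance, replace the sample with: service-control --status --all
sample_output=$'Running:\n vmware-vpxd vmware-sts\nStopped:\n vmware-vsan-health'
echo "$sample_output" | awk '/^Stopped:/{flag=1;next}/^[A-Z]/{flag=0}flag'
</code></pre>
<p>An empty result means nothing is stopped; any printed service names need attention before proceeding.</p>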
<ul>
<li>You can re-run the VDT tool to verify that all tests pass. This can help identify any post-upgrade issues</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749526071826/fa8d6813-838c-4a82-bf0c-ee5e99089edc.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>Verify in the vCenter web UI that all inventory data, configurations, and permissions have been correctly migrated.</p>
</li>
<li><p>Ensure that the new vCenter Server 8.0 is functioning as expected.</p>
</li>
<li><p>Check for any visible errors in the events section.</p>
</li>
<li><p>Verify that any plugins are re-deployed.</p>
</li>
<li><p>Check that all hosts are healthy and connected.</p>
</li>
<li><p>Ensure DRS and vMotion work as expected.</p>
</li>
<li><p>Check if the vSAN, NFS, and VMFS datastores are connected to the hosts.</p>
</li>
<li><p>Check in NSX Manager to ensure the cluster shows as healthy and NSX does not report any issues related to this vCenter.</p>
</li>
<li><p>Verify that other VMware services, like VMware Cloud Director, can connect successfully to the vCenter.</p>
</li>
</ul>
<p>Now we can move on to the next step: upgrading the ESXi hosts to version 8.</p>
]]></content:encoded></item><item><title><![CDATA[Enable Hibernate mode on Ubuntu 25.04 (Plucky Puffin): A Step-by-Step Guide]]></title><description><![CDATA[Hibernation is not enabled by-default in Ubuntu.

If you try to hibernate using “sudo systemctl hibernate”, you will run into the below error:


call to Hibernate failed: Not enough suitable swap space for hibernation available on compatible block de...]]></description><link>https://automatestack.dev/enable-hibernate-mode-on-ubuntu-25-04-plucky-puffin-a-step-by-step-guide-37a9b4ed3f63</link><guid isPermaLink="true">https://automatestack.dev/enable-hibernate-mode-on-ubuntu-25-04-plucky-puffin-a-step-by-step-guide-37a9b4ed3f63</guid><category><![CDATA[ubuntu 25.04]]></category><category><![CDATA[Plucky Puffin]]></category><category><![CDATA[Ubuntu]]></category><category><![CDATA[hibernate]]></category><category><![CDATA[Gnome]]></category><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Wed, 12 Mar 2025 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766936051172/d30c6de8-ab73-48a5-a8d9-b5d4429ecca0.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<ul>
<li><p>Hibernation is not enabled by default in Ubuntu.</p>
</li>
<li><p>If you try to hibernate using <code>sudo systemctl hibernate</code>, you will run into the following error:</p>
</li>
</ul>
<blockquote>
<p>call to Hibernate failed: Not enough suitable swap space for hibernation available on compatible block devices and file systems</p>
</blockquote>
<ul>
<li><p>Hibernation requires a swap space at least equal to your system’s RAM.</p>
</li>
<li><p>If the swap size is less than your RAM, you’ll need to increase it.</p>
</li>
</ul>
<h3 id="heading-check-current-swap-usage">Check current swap usage:</h3>
<pre><code class="lang-bash">swapon --show

free -h
</code></pre>
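<p>The RAM-versus-swap requirement can also be checked with a short script that reads <code>/proc/meminfo</code> directly:</p>
<pre><code class="lang-bash"># Compare total swap against total RAM (both reported in kB by /proc/meminfo).
ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
if [ "$swap_kb" -ge "$ram_kb" ]; then
  echo "Swap (${swap_kb} kB) meets the hibernation requirement."
else
  echo "Swap (${swap_kb} kB) is smaller than RAM (${ram_kb} kB); enlarge it first."
fi
</code></pre>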
<p>Depending on your configuration, you might see either:</p>
<ul>
<li><p>A swap partition:</p>
</li>
<li><p>OR a swap file:</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748487090475/fe7590f1-7083-4128-baaf-d8e195bcf523.png" alt /></p>
<blockquote>
<p>If you don’t have a swap partition, create a dedicated swap partition as described in the next section.</p>
</blockquote>
<h3 id="heading-creating-a-dedicated-swap-partition">Creating a Dedicated Swap Partition</h3>
<ul>
<li><p>Make sure the swap partition is equal to or larger than your RAM size.</p>
</li>
<li><p><strong>Resize</strong> an existing partition with GParted and create a Linux swap partition.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748487091611/0bfd647a-0925-4b69-b490-52547d70dfd2.png" alt /></p>
<h3 id="heading-format-and-enable-the-swap">Format and enable the swap:</h3>
<ul>
<li><p>Identify the correct device name and use the commands below to enable the swap:</p>
</li>
<li><p>In my case, the partition is <code>/dev/nvme0n1p3</code>.</p>
</li>
</ul>
<pre><code class="lang-bash">sudo mkswap /dev/nvme0n1p3
sudo swapon /dev/nvme0n1p3
</code></pre>
<h3 id="heading-find-the-uuid-of-the-partition">Find the UUID of the partition:</h3>
<pre><code class="lang-bash">sudo blkid /dev/nvme0n1p3
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748487092915/26c373ac-8b02-4f6f-9ad1-077193e2c2e8.png" alt /></p>
<h3 id="heading-add-to-etcfstab">Add to <code>/etc/fstab</code>:</h3>
<pre><code class="lang-bash">UUID=<span class="hljs-string">"XXXXXX-XXXXX-XXX-XXX-XXXXXXXX"</span> none swap sw 0 0
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748487094337/a84583ab-2293-4401-8a64-4e474734f713.png" alt /></p>
<ul>
<li>Reboot and verify the new swap is active</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748487095673/6c5fa088-9242-4663-b7b0-85a6d25c72ff.png" alt /></p>
<h3 id="heading-edit-grub-configuration">Edit GRUB configuration:</h3>
<pre><code class="lang-bash">sudo nano /etc/default/grub
</code></pre>
<ul>
<li>Find the line starting with <code>GRUB_CMDLINE_LINUX_DEFAULT</code> and add the resume parameter:</li>
</ul>
<pre><code class="lang-bash">GRUB_CMDLINE_LINUX_DEFAULT=<span class="hljs-string">"quiet splash resume=UUID=your-swap-uuid"</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748487097549/bd88740e-3436-49eb-a495-301b5f3a13ce.png" alt /></p>
<p>Replace <code>your-swap-uuid</code> with the UUID you noted earlier.</p>
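<p>If you prefer not to copy the UUID by hand, it can be extracted from <code>blkid</code>-style output with <code>sed</code>. The sample line below is illustrative; on your system, feed in the real output of <code>sudo blkid /dev/nvme0n1p3</code>:</p>
<pre><code class="lang-bash"># Pull the UUID out of a blkid-style line and print the GRUB resume parameter.
sample='/dev/nvme0n1p3: UUID="1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d" TYPE="swap"'
uuid=$(echo "$sample" | sed -n 's/.*UUID="\([^"]*\)".*/\1/p')
echo "resume=UUID=$uuid"
</code></pre>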
<ul>
<li>Update GRUB:</li>
</ul>
<pre><code class="lang-bash">sudo update-grub
</code></pre>
<ul>
<li>Rebuild initramfs to include resume info &amp; reboot</li>
</ul>
<pre><code class="lang-bash">sudo update-initramfs -u

sudo reboot
</code></pre>
<h3 id="heading-enable-hibernate-in-policykit">Enable Hibernate in PolicyKit:</h3>
<pre><code class="lang-bash">sudo nano /etc/polkit-1/rules.d/10-enable-hibernate.rules
</code></pre>
<ul>
<li>Add the following content to the config:</li>
</ul>
<pre><code class="lang-javascript">polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.login1.hibernate" ||
        action.id == "org.freedesktop.login1.hibernate-multiple-sessions" ||
        action.id == "org.freedesktop.upower.hibernate" ||
        action.id == "org.freedesktop.login1.handle-hibernate-key" ||
        action.id == "org.freedesktop.login1.hibernate-ignore-inhibit") {
        return polkit.Result.YES;
    }
});
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748487099102/e70d37a3-7aa9-46e2-a4d7-21247a6b404f.png" alt /></p>
<h3 id="heading-test-hibernation">Test Hibernation:</h3>
<pre><code class="lang-bash">sudo systemctl hibernate
</code></pre>
<h3 id="heading-gui-hibernate-button">GUI Hibernate button:</h3>
<p>You can also add a Hibernate button to the status menu using this GNOME extension:</p>
<p><a target="_blank" href="https://extensions.gnome.org/extension/755/hibernate-status-button/">https://extensions.gnome.org/extension/755/hibernate-status-button/</a></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748487100241/80bc5e8a-6122-4f10-8986-9cb69e4c54c2.png" alt /></p>
]]></content:encoded></item><item><title><![CDATA[CPU Shares in VMware vSphere : A Complete Guide to Prioritizing Your VMs]]></title><description><![CDATA[In virtualized environments, not all workloads are created equal. Some applications require guaranteed CPU access under heavy contention, while others can tolerate delays. vSphere’s CPU Shares lets you assign relative priority weights to virtual mach...]]></description><link>https://automatestack.dev/cpu-shares-in-vmware-vsphere-a-complete-guide-to-prioritizing-your-vms</link><guid isPermaLink="true">https://automatestack.dev/cpu-shares-in-vmware-vsphere-a-complete-guide-to-prioritizing-your-vms</guid><category><![CDATA[cpu contention]]></category><category><![CDATA[VM CPU contention]]></category><category><![CDATA[cpu utilization]]></category><category><![CDATA[CPU optimization]]></category><category><![CDATA[CPU Scheduling]]></category><category><![CDATA[vmware]]></category><category><![CDATA[esxi]]></category><category><![CDATA[vsphere]]></category><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Fri, 28 Feb 2025 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p>In virtualized environments, not all workloads are created equal. Some applications require guaranteed CPU access under heavy contention, while others can tolerate delays. vSphere’s <strong>CPU Shares</strong> lets you assign relative priority weights to virtual machines (VMs) so that critical workloads run smoothly when CPU resources are scarce. In this post, you’ll learn:</p>
<ul>
<li><p>What CPU Shares are and how they work</p>
</li>
<li><p>The scope of shares within resource pools and clusters</p>
</li>
<li><p>How to calculate and set custom share values per vCPU</p>
</li>
</ul>
<h2 id="heading-what-are-cpu-shares">What Are CPU Shares?</h2>
<ul>
<li><p>CPU Shares are a <em>relative</em> metric that the ESXi scheduler uses to allocate CPU time when demand exceeds supply.</p>
<p>  Basically, Shares tell ESXi “how much cake” each VM gets when it’s time to cut slices. Your CPUs still run at full GHz; you’re just deciding which VM gets more turns at the table when everyone’s hungry.</p>
</li>
<li><p>Under contention, the ESXi scheduler divides CPU time slices based on each VM’s <strong>total share weight.</strong></p>
</li>
<li><p>VMware provides three preset levels:</p>
<ul>
<li><p>Low (500 shares/vCPU)</p>
</li>
<li><p>Normal (1,000 shares/vCPU)</p>
</li>
<li><p>High (2,000 shares/vCPU)</p>
</li>
</ul>
</li>
<li><p>By default, every VM is set to <strong>Normal</strong>, meaning 1,000 shares for each virtual processor it has. These default values can be altered per-VM or per-resource-pool in the VM’s “Edit Settings” → “CPU Shares” section.</p>
</li>
</ul>
<h2 id="heading-how-cpu-shares-work-under-contention">How CPU Shares Work Under Contention</h2>
<ol>
<li><p><strong>No Contention:</strong> If the ESXi host has sufficient idle CPU capacity, VMware honors VM reservations and immediately grants CPU time as requested, without considering shares.</p>
</li>
<li><p><strong>Contention Phase:</strong> Once CPU demand exceeds supply—after all reservations are met—ESXi calculates each VM’s <em>ResourceUsagePerShare</em> and schedules CPU time in order of decreasing share entitlement.</p>
</li>
<li><p><strong>Relative Allocation:</strong> Suppose three VMs have 2,000, 4,000, and 8,000 shares respectively; under contention, they receive CPU in a 1:2:4 ratio.</p>
</li>
</ol>
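<p>The 1:2:4 split above can be reproduced with a few lines of shell arithmetic (integer-rounded percentages):</p>
<pre><code class="lang-bash"># Each VM's slice of contended CPU is its share count divided by the total.
shares="2000 4000 8000"
total=0
for s in $shares; do total=$((total + s)); done
for s in $shares; do
  echo "VM with $s shares: $((100 * s / total))% of contended CPU"
done
</code></pre>
<p>With 14,000 total shares this prints roughly 14%, 28%, and 57%, i.e. the 1:2:4 ratio.</p>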
<h2 id="heading-cpu-shares-are-scoped-to-their-parent-container">CPU Shares Are Scoped to Their Parent Container</h2>
<h3 id="heading-resource-pool-level-scope">Resource Pool-Level Scope</h3>
<ul>
<li><p>When you set shares on a <strong>resource pool</strong>, you control how much of that pool’s entitled CPU capacity each of its <strong>direct children</strong> receives.</p>
</li>
<li><p><strong>Sibling Pools:</strong> If you have two child pools under the same parent pool (including the cluster’s root pool), their share values dictate what fraction of the parent’s CPU resources each pool may consume under contention.</p>
</li>
<li><p><strong>VMs Within a Pool:</strong> Inside that resource pool, each VM’s own share settings then determine how the pool’s allocated CPU is further split among those VMs.</p>
</li>
</ul>
<h3 id="heading-cluster-level-scope-via-the-root-resource-pool">Cluster-Level Scope via the Root Resource Pool</h3>
<ul>
<li><p>Every DRS cluster automatically has a <strong>root resource pool</strong> that represents 100% of the cluster’s CPU and memory resources.</p>
</li>
<li><p>All top‑level resource pools you create in the cluster are children of this root pool. Therefore, by setting shares on those top‑level pools, you are effectively prioritizing CPU access across the <strong>entire cluster</strong> under contention.</p>
</li>
<li><p>Conversely, shares set on a <strong>nested</strong> pool only affect that subtree— they do <strong>not</strong> influence sibling pools in other branches of the hierarchy.</p>
</li>
</ul>
<h3 id="heading-example">Example</h3>
<ul>
<li><p><strong>Cluster Root Pool</strong> (entitled to 100% of cluster CPU)</p>
<ul>
<li><p><strong>Pool A</strong>: High shares (8,000)</p>
</li>
<li><p><strong>Pool B</strong>: Low shares (2,000)</p>
</li>
<li><p>Result: Under contention, Pool A gets 80% of cluster CPU, Pool B gets 20%</p>
</li>
</ul>
</li>
<li><p><strong>Inside Pool A</strong></p>
<ul>
<li><p><strong>VM A1</strong>: 1,000 shares</p>
</li>
<li><p><strong>VM A2</strong>: 2,000 shares</p>
</li>
<li><p>Result: Pool A’s 80% CPU is split 1:2 between A1 and A2 (≈26.7% vs. 53.3% of total cluster CPU)</p>
</li>
</ul>
</li>
<li><p><strong>Inside Pool B</strong></p>
<ul>
<li>VMs share Pool B’s 20% entitlement according to their own share settings, without affecting Pool A or its VMs.</li>
</ul>
</li>
</ul>
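<p>The whole hierarchy above can be tallied with a quick <code>awk</code> sketch (numbers match the example):</p>
<pre><code class="lang-bash">pool_a=8000; pool_b=2000   # sibling pool shares under the cluster root
vm_a1=1000; vm_a2=2000     # VM shares inside Pool A
awk -v pa="$pool_a" -v pb="$pool_b" -v a1="$vm_a1" -v a2="$vm_a2" 'BEGIN {
  apct = 100 * pa / (pa + pb)            # Pool A entitlement of cluster CPU
  printf "Pool A: %.1f%% of cluster CPU\n", apct
  printf "VM A1: %.1f%%  VM A2: %.1f%%\n", apct * a1 / (a1 + a2), apct * a2 / (a1 + a2)
}'
</code></pre>
<p>This prints 80.0% for Pool A, and 26.7% / 53.3% of total cluster CPU for A1 and A2.</p>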
<h2 id="heading-prioritizing-a-smaller-vm-over-a-larger-one">Prioritizing a Smaller VM over a Larger One</h2>
<p>To make a <strong>4‑vCPU VM</strong> outrank a <strong>16‑vCPU VM</strong>:</p>
<ol>
<li><strong>Boost the 4‑vCPU VM’s Shares</strong></li>
</ol>
<ul>
<li><p>Set to Custom = 20,000 shares total (via UI or PowerCLI).</p>
</li>
<li><p>It now has higher entitlement than the 16‑vCPU VM’s 16,000 shares.</p>
</li>
</ul>
<ol start="2">
<li><strong>Or Lower the 16‑vCPU VM’s Shares</strong></li>
</ol>
<ul>
<li>Change it to Low (16 × 500 = 8,000 shares). Note that this only brings it level with the 4‑vCPU VM set to High (4 × 2,000 = 8,000 shares), so combine it with a higher custom value on the 4‑vCPU VM if you need a clear margin.</li>
</ul>
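<p>The share arithmetic behind both options takes only a few lines (preset values per vCPU: Low 500, Normal 1,000, High 2,000); note that option 2 by itself brings the two VMs to parity at 8,000 shares each:</p>
<pre><code class="lang-bash">vcpus_small=4; vcpus_big=16
echo "Defaults (Normal): small=$((vcpus_small * 1000)) big=$((vcpus_big * 1000))"
echo "Option 1 (custom on small): small=20000 big=$((vcpus_big * 1000))"
echo "Option 2 (High vs Low): small=$((vcpus_small * 2000)) big=$((vcpus_big * 500))"
</code></pre>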
<h2 id="heading-conclusion">Conclusion</h2>
<p>CPU Shares in VMware vCenter are a powerful yet underutilized tool for workload prioritization. By understanding how shares work—within resource pools and across clusters—and by defining a clear baseline-per-vCPU policy, you can ensure that your mission-critical VMs always get the CPU cycles they need, even under heavy contention.</p>
]]></content:encoded></item><item><title><![CDATA[Resource Contention in vSphere : Identification and Solutions]]></title><description><![CDATA[VMware vSphere remains the platform of choice for many organizations seeking flexibility, scalability, and performance. However, as VM density rises and workloads become more varied, performance bottlenecks can surface. To maintain a healthy and resp...]]></description><link>https://automatestack.dev/resource-contention-in-vsphere-identification-and-solutions</link><guid isPermaLink="true">https://automatestack.dev/resource-contention-in-vsphere-identification-and-solutions</guid><category><![CDATA[vm performance]]></category><category><![CDATA[vmware]]></category><category><![CDATA[esxi]]></category><category><![CDATA[VM CPU contention]]></category><category><![CDATA[Performance Optimization]]></category><category><![CDATA[performance]]></category><category><![CDATA[vsphere]]></category><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Fri, 31 Jan 2025 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p>VMware vSphere remains the platform of choice for many organizations seeking flexibility, scalability, and performance. However, as VM density rises and workloads become more varied, performance bottlenecks can surface. To maintain a healthy and responsive environment, we administrators must understand the key performance metrics &amp; the story they tell.</p>
<p>In this blog, we'll dive into some of the critical performance metrics that affect VM performance in a vSphere environment and explore the symptoms of contention.</p>
<ul>
<li><p>CPU Ready (<code>%RDY</code>)</p>
</li>
<li><p>CPU Wait (<code>%WAIT</code> and <code>%VMWAIT</code>)</p>
</li>
<li><p>CPU Co-Stop (<code>%CSTP</code>)</p>
</li>
<li><p>Memory Ballooning</p>
</li>
</ul>
<h1 id="heading-cpu-contention">CPU Contention</h1>
<h2 id="heading-rethinking-the-vcpu-to-pcpu-ratio">Rethinking the vCPU to pCPU Ratio</h2>
<p>Before we dive into specific metrics like <code>%RDY</code> or <code>%CSTP</code>, we must address one of the most fundamental questions in virtualization: <strong>What is the right vCPU to pCPU ratio?</strong></p>
<p>For years, we administrators relied on general rules of thumb like 4:1 or even 10:1. These static ratios, however, were born in an era when many virtual workloads were largely idle. In such an environment, over-committing physical CPUs made sense. In today's world of resource-intensive applications &amp; dynamic workloads, such a fixed ratio can lead to performance bottlenecks and unhappy users.</p>
<h2 id="heading-drive-by-contention">Drive by Contention</h2>
<p>Instead of focusing on a static ratio, the modern approach is to "drive by contention." This means:</p>
<ul>
<li><p>Actively monitor your environment for signs of CPU stress (like high Ready and Co-Stop times, which we'll cover next).</p>
</li>
<li><p>Expand your resource pools or adjust VM sizing based on real-world data.</p>
<p>  This approach ensures that your applications have the resources they need, when they need them, without being constrained by an arbitrary ratio.</p>
</li>
<li><p>A conservative, safe starting point is a <strong>1:1 vCPU to pCPU ratio</strong> (not counting hyper-threading), which is the most predictable.</p>
</li>
<li><p>This eliminates the risk of contention by dedicating a physical core to every virtual CPU. As your monitoring and operational processes mature, you can cautiously oversubscribe based on observed performance.</p>
</li>
<li><p>Ultimately, the optimal ratio is unique to your environment’s workloads and hardware, and should be based on your own needs and observations.</p>
</li>
</ul>
<h2 id="heading-cpu-metrics">CPU Metrics</h2>
<ol>
<li><h3 id="heading-cpu-ready-rdy"><strong>CPU Ready (</strong><code>%RDY</code><strong>)</strong></h3>
</li>
</ol>
<p><strong>What Is CPU Ready?</strong></p>
<p>CPU Ready measures the percentage of time a virtual CPU (vCPU) is ready to execute instructions but must wait in a queue for a physical CPU core to become available. In a heavily contended environment (one where the vCPU-to-pCPU ratio exceeds what the host can comfortably serve), vCPUs may wait longer before being scheduled, resulting in application slowdowns.</p>
<p><strong>Impact on Performance</strong></p>
<ul>
<li><p><strong>Increased Latency:</strong> Applications experience higher response times due to these micro-pauses in scheduling.</p>
</li>
<li><p><strong>Reduced Throughput:</strong> The overall work processed per unit of time drops, affecting both batch and transactional workloads.</p>
</li>
</ul>
<p><strong>How to Identify CPU Ready</strong></p>
<ul>
<li><p><strong>vSphere Client (vCenter)</strong></p>
<ul>
<li><p>Monitor the "Ready" &amp; the “Readiness“ metric under each VM's CPU performance chart.</p>
</li>
<li><p>A good rule of thumb is to investigate when the ready time consistently exceeds <code>5%</code> per vCPU</p>
</li>
<li><p>For example, a 4-vCPU VM could tolerate up to <code>20%</code> total ready time before showing significant degradation.</p>
</li>
</ul>
</li>
<li><p><strong>esxtop:</strong> In the ESXi shell, run <code>esxtop</code> and press <code>c</code> for the CPU view. Observe the <code>%RDY</code> column for each VM.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755160890420/ed31f00c-96e7-4629-b03b-d2a603476f01.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>VMware Aria Operations</strong></p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755161985519/b4efaa75-5403-455f-b03c-92cf75b228f8.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755162375576/6becec66-a7a9-4a37-8883-5ed0df0ec7a1.png" alt /></p>
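<p>To turn vCenter's raw "Ready" counter into the percentage these thresholds refer to, the usual conversion divides the summation value (milliseconds) by the sample interval and multiplies by 100; real-time charts sample every 20 seconds. A sketch with hypothetical values:</p>
<pre><code class="lang-bash">ready_ms=1500; interval_ms=20000; vcpus=4   # hypothetical real-time chart sample
awk -v r="$ready_ms" -v i="$interval_ms" -v v="$vcpus" 'BEGIN {
  total = 100 * r / i                 # total ready %, summed across all vCPUs
  printf "CPU Ready: %.1f%% total, %.2f%% per vCPU\n", total, total / v
}'
</code></pre>
<p>Here 1,500 ms of ready time in a 20,000 ms interval is 7.5% total, or under 2% per vCPU, below the 5%-per-vCPU investigation threshold.</p>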
<p><strong>Best Practices to Reduce CPU Ready</strong></p>
<ul>
<li><p><a target="_blank" href="https://automatestack.dev/the-ultimate-guide-to-right-sizing-cpu-and-memory-for-virtual-machines"><strong>Right-Size vCPU Count</strong></a><strong>:</strong> This is the most effective solution. Avoid over-provisioning vCPUs. Assign the minimum number required by the workload inside the guest OS.</p>
</li>
<li><p><strong>Use Affinity Rules Sparingly:</strong> CPU affinity rules restrict the scheduler's flexibility, which can increase ready time. Use them only for specific, well-understood licensing or application requirements.</p>
</li>
<li><p><strong>Resource Pools and Shares:</strong> Allocate CPU shares, reservations, and limits thoughtfully to prioritize critical VMs and prevent "noisy neighbors" from consuming all available resources.</p>
</li>
<li><p><strong>Cluster Sizing:</strong> Ensure your cluster has enough physical cores to support the peak requirements of its running VMs.</p>
</li>
</ul>
<ol start="2">
<li><h3 id="heading-cpu-wait-time-wait-amp-vmwait"><strong>CPU Wait Time (</strong><code>%WAIT</code> &amp; <code>%VMWAIT</code>)</h3>
</li>
</ol>
<p><strong>What Is CPU Wait Time?</strong></p>
<p>This is one of the most misunderstood metrics. CPU Wait (<code>%WAIT</code> in esxtop) measures the time a vCPU is in a stopped state, waiting for an event. A high <code>%WAIT</code> value is not always a problem. It is composed of two key metrics:</p>
<ul>
<li><p><strong>Idle Time (</strong><code>%IDLE</code>): Time the guest OS intentionally put the vCPU in a halt state because it had no work to do. This is normal and expected for a non-busy VM.</p>
</li>
<li><p><strong>VMWait Time (</strong><code>%VMWAIT</code>): Time the vCPU was forced to wait for a hypervisor event to complete, most commonly a storage I/O or network I/O operation. <strong>This is the metric that indicates a potential problem.</strong></p>
</li>
</ul>
<p>The formula is simple: <code>%WAIT = %IDLE + %VMWAIT</code>.</p>
<p><strong>Impact on Performance</strong></p>
<ul>
<li><p>A high <code>%WAIT</code> driven by <strong>high</strong> <code>%IDLE</code> has no negative performance impact; it simply means the VM is idle.</p>
</li>
<li><p>A high <code>%WAIT</code> driven by <strong>high</strong> <code>%VMWAIT</code> indicates a genuine infrastructure bottleneck, causing application stalls, slow data access, and poor user experience.</p>
</li>
</ul>
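<p>The distinction is simple subtraction, which makes triage easy once you have the two numbers (hypothetical esxtop readings shown):</p>
<pre><code class="lang-bash"># %VMWAIT = %WAIT - %IDLE: the part of the wait that is NOT idleness.
wait_pct=95; idle_pct=93
vmwait_pct=$((wait_pct - idle_pct))
echo "%VMWAIT = ${vmwait_pct}% (wait caused by I/O, not by an idle guest)"
</code></pre>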
<p><strong>How to Identify Real CPU Wait Issues</strong></p>
<ol>
<li><p>In the ESXi shell, run <code>esxtop</code> and press <code>c</code> for the CPU view.</p>
</li>
<li><p>Observe the <code>%WAIT</code> column. If it's high, proceed to the next step.</p>
</li>
<li><p>Press <code>f</code> to change fields, navigate to the <code>VMWAIT</code> metric, and press the spacebar to add it to the view.</p>
</li>
<li><p>Analyze the results: If <code>%VMWAIT</code> is high, you have confirmed a bottleneck, likely related to storage or network latency. If <code>%VMWAIT</code> is low, the VM is simply idle, and no action is needed.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755189306991/c972e6db-abe5-4057-b835-e78c080339e2.png" alt class="image--center mx-auto" /></p>
<p><strong>Best Practices to Reduce High</strong> <code>%VMWAIT</code></p>
<ul>
<li><p><strong>Optimize Storage Paths:</strong> Ensure multipathing is correctly configured and all paths are active.</p>
</li>
<li><p><strong>Upgrade Storage Tiers:</strong> Move latency-sensitive workloads to faster storage (e.g., NVMe, SSD-backed datastores).</p>
</li>
<li><p><strong>Check Network Latency:</strong> Investigate network device performance if storage appears healthy.</p>
</li>
<li><p><strong>Adjust Queue Depths:</strong> Tune HBA and storage array queue depths to handle your workload's I/O profile.</p>
</li>
</ul>
<ol start="3">
<li><h3 id="heading-cpu-co-stop-cstp"><strong>CPU Co-Stop (</strong><code>%CSTP</code>)</h3>
</li>
</ol>
<p><strong>What Is CPU Co-Stop?</strong></p>
<p>CPU Co-Stop (<code>%CSTP</code> in esxtop) measures the time a vCPU is forcibly stopped by the hypervisor to allow its sibling vCPUs within the same VM to catch up. This occurs in Symmetric Multi-Processing (SMP) VMs when the hypervisor cannot schedule all of the VM's vCPUs on physical cores simultaneously. It is a direct symptom of CPU over-contention, especially for "wide" VMs (those with many vCPUs).</p>
<p><strong>Impact on Performance</strong></p>
<ul>
<li><p><strong>Synchronization Overhead:</strong> Multi-threaded applications suffer from added latency as some threads are paused, waiting for others.</p>
</li>
<li><p><strong>Unpredictable Performance:</strong> Co-stop spikes lead to performance "jitter" in CPU-intensive workloads.</p>
</li>
</ul>
<p><strong>How to Identify CPU Co-Stop</strong></p>
<ul>
<li><p><strong>esxtop:</strong> In the CPU view (<code>c</code>), press <code>f</code> to change fields and add the <code>%CSTP</code> column. Any value consistently above <code>3%</code> is a cause for concern.</p>
</li>
<li><p><strong>vRealize Operations:</strong> Advanced analytics can track and alert on <code>%CSTP</code> anomalies over time.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755189419917/e2f1a26e-6110-4c63-a5bd-008f6629772f.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
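<p>When reviewing an esxtop batch capture, a short pipeline can flag offenders against the ~3% rule of thumb (the VM names and values below are hypothetical, rounded to whole percents):</p>
<pre><code class="lang-bash">threshold=3
printf 'vm-web 0\nvm-db 5\nvm-app 1\n' | while read -r vm cstp; do
  if [ "$cstp" -gt "$threshold" ]; then
    echo "$vm: ${cstp}% co-stop - consider reducing its vCPU count"
  fi
done
</code></pre>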
<p><strong>Best Practices to Mitigate CPU Co-Stop</strong></p>
<ul>
<li><p><strong>Minimize vCPU Count:</strong> The primary solution is to right-size VMs with the fewest vCPUs they truly need. A 2 vCPU VM is far less likely to experience co-stop than an 8-vCPU VM.</p>
</li>
<li><p><strong>NUMA Awareness:</strong> Align VM vCPU and memory sizing with the host's physical NUMA topology to avoid performance penalties from cross-node memory access.</p>
</li>
<li><p><strong>Avoid Heavy Oversubscription:</strong> Keep the overall vCPU-to-physical-core ratio on the host within reasonable bounds (e.g., a 4:1 ratio is a common starting point, but this depends heavily on the workload).</p>
</li>
</ul>
<h2 id="heading-memory-metrics">Memory Metrics</h2>
<ol>
<li><h3 id="heading-memory-ballooning"><strong>Memory Ballooning</strong></h3>
</li>
</ol>
<p><strong>What Is Memory Ballooning?</strong></p>
<p>Memory ballooning is a memory reclamation technique used by the ESXi hypervisor when a host is under memory pressure. A balloon driver (<code>vmmemctl</code>) inside the guest OS "inflates" by requesting memory from the guest. This forces the guest OS to use its own memory management (e.g., its page/swap file) to free up pages, which the hypervisor can then reclaim and allocate to another VM.</p>
<p><strong>Impact on Performance</strong></p>
<ul>
<li><p><strong>Guest-Level Paging:</strong> When ballooning is active, the guest OS is forced to swap memory to its own virtual disk. This disk I/O is thousands of times slower than accessing RAM, severely degrading application performance.</p>
</li>
<li><p><strong>Increased Disk I/O:</strong> Guest OS swap activity generates additional storage load, which can compound existing I/O bottlenecks.</p>
</li>
</ul>
<p><strong>How to Identify Ballooning</strong></p>
<ul>
<li><p><strong>vSphere Client:</strong> In the VM’s "Memory" performance chart, monitor the “Ballooned memory” metric. Any sustained non-zero value indicates the host is or was recently under memory pressure.</p>
</li>
<li><p><strong>esxtop:</strong> In <code>esxtop</code>, press <code>m</code> for memory view. Check the <code>MCTLSZ</code> column for the amount of memory being reclaimed by the balloon driver.</p>
</li>
</ul>
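<p>Summing the <code>MCTLSZ</code> column across VMs gives a quick view of total reclamation pressure on the host (a hypothetical two-column capture of VM name and ballooned MB is parsed here):</p>
<pre><code class="lang-bash">printf 'vm-web 0.00\nvm-db 512.00\nvm-app 128.00\n' |
  awk '{ total += $2 } END { printf "Total ballooned: %.0f MB\n", total }'
</code></pre>
<p>Any sustained non-zero total means the host is reclaiming guest memory and is worth investigating.</p>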
<p><strong>Best Practices to Minimize Ballooning</strong></p>
<ul>
<li><p><a target="_blank" href="https://automatestack.dev/the-ultimate-guide-to-right-sizing-cpu-and-memory-for-virtual-machines"><strong>Right-Size VM Memory</strong></a><strong>:</strong> Allocate only the memory the application truly needs. Over-allocating RAM to idle VMs "traps" that memory, making it unavailable to other VMs.</p>
</li>
<li><p><strong>Monitor Host Memory Usage:</strong> Ensure hosts have sufficient free memory to avoid contention. Use vSphere DRS to balance memory load across a cluster.</p>
</li>
<li><p><strong>Use Reservations for Critical VMs:</strong> If a VM must never have its memory reclaimed, set a memory reservation.</p>
</li>
<li><p><strong>Leverage vSphere Host Cache:</strong> Configure swap-to-host-cache on a fast SSD to mitigate the performance impact when host-level swapping is unavoidable.</p>
</li>
</ul>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Effective VMware vSphere performance tuning depends on a deep understanding of these metrics.</p>
<p>By correctly interpreting CPU Ready, Wait, and Co-Stop, we can distinguish between an idle VM and one genuinely struggling with contention.</p>
<p><a target="_blank" href="https://automatestack.dev/the-ultimate-guide-to-right-sizing-cpu-and-memory-for-virtual-machines">Right-sizing resources</a>, optimizing infrastructure, and continuous monitoring are key to a high-performing virtual environment.</p>
<p>Implement continuous performance monitoring using vRealize Operations or native vCenter dashboards.</p>
]]></content:encoded></item><item><title><![CDATA[Effortlessly Import Your VirtualBox/VMware VMs to Proxmox Using Bash Script]]></title><description><![CDATA[📖 Description:
This script streamlines the process of provisioning a VM in Proxmox from an OVA file. It performs the following steps:

Extracts the specified OVA archive.

Converts the contained virtual disk (VMDK/VHD) to QCOW2 format.

Creates a ne...]]></description><link>https://automatestack.dev/effortlessly-import-your-virtualboxvmware-vms-to-proxmox-using-bash-script</link><guid isPermaLink="true">https://automatestack.dev/effortlessly-import-your-virtualboxvmware-vms-to-proxmox-using-bash-script</guid><category><![CDATA[proxmox]]></category><category><![CDATA[Bash]]></category><category><![CDATA[bash scripting]]></category><category><![CDATA[VirtualBox ]]></category><category><![CDATA[vmware]]></category><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Sat, 30 Nov 2024 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748802859852/a367617a-0eed-4383-976f-d6fed2de8f17.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-description"><strong>📖 Description:</strong></h1>
<p>This script streamlines the process of provisioning a VM in Proxmox from an OVA file. It performs the following steps:</p>
<ol>
<li><p>Extracts the specified OVA archive.</p>
</li>
<li><p>Converts the contained virtual disk (VMDK/VHD) to QCOW2 format.</p>
</li>
<li><p>Creates a new Proxmox VM with basic resources.</p>
</li>
<li><p>Imports the QCOW2 disk into a specified Proxmox storage.</p>
</li>
<li><p>Attaches the imported disk as a SCSI device to the created VM.</p>
</li>
</ol>
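<p>For orientation, the five steps map onto the following commands (a sketch only; VM ID <code>9001</code>, the file names, and the storage are hypothetical, and <code>DRYRUN=echo</code> previews each command instead of executing it on a Proxmox host):</p>
<pre><code class="lang-bash">DRYRUN=echo   # unset on a real Proxmox host to actually run the commands
$DRYRUN tar -xvf /var/lib/vz/import/vm01.ova -C /var/lib/vz/import   # 1. extract the OVA (a tar archive)
$DRYRUN qemu-img convert -O qcow2 vm01-disk1.vmdk vm01.qcow2         # 2. convert the disk to QCOW2
$DRYRUN qm create 9001 --name vm01 --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0   # 3. create the VM
$DRYRUN qm importdisk 9001 vm01.qcow2 local-lvm                      # 4. import the disk into storage
$DRYRUN qm set 9001 --scsi0 local-lvm:vm-9001-disk-0                 # 5. attach it as a SCSI device
</code></pre>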
<h1 id="heading-prerequisites"><strong>🔧 Prerequisites</strong></h1>
<ul>
<li><p>Proxmox VE with <code>qm</code> and <code>qemu-img</code> utilities installed and accessible in your PATH.</p>
</li>
<li><p>Appropriate permissions to create VMs and access storage on the Proxmox host.</p>
</li>
<li><p>The OVA file and working directory must be readable/writable by the script user.</p>
</li>
</ul>
<h1 id="heading-usage"><strong>🚀 Usage</strong></h1>
<pre><code class="lang-bash">./import_ova.sh &lt;OVA_FILE&gt; &lt;VM_ID&gt; [STORAGE] [TEMPLATE_DIR]
</code></pre>
<h2 id="heading-ova-import-parameters"><strong>📋 OVA Import Parameters</strong></h2>
<p><code>&lt;OVA_FILE&gt;</code><br />Full OVA filename (including the <code>.ova</code> extension)</p>
<p><code>&lt;VM_ID&gt;</code><br />Proxmox VM ID to create and import the disk into</p>
<p><code>[STORAGE]</code> <em>(Optional)</em><br />Proxmox storage target for the disk<br /><strong>Default</strong>: <code>local-lvm</code></p>
<p><code>[TEMPLATE_DIR]</code> <em>(Optional)</em><br />Directory containing the OVA file and generated images<br /><strong>Default</strong>: <code>/var/lib/vz/import</code></p>
<h1 id="heading-upload-the-ova-to-proxmox-server"><strong>📤 Upload the OVA to Proxmox server</strong></h1>
<ul>
<li><p>Web interface:</p>
<ul>
<li>You can upload the <code>.ova</code> file from the Import section of the local storage using the Proxmox web interface.</li>
</ul>
</li>
<li><p><img src="https://miro.medium.com/v2/resize:fit:700/1*0_rUcxIkya9235AyBv0upA.png" alt /></p>
<ul>
<li>This puts the file at <code>/var/lib/vz/import/</code></li>
</ul>
</li>
</ul>
<p>    <img src="https://miro.medium.com/v2/resize:fit:573/1*lWI_WzeM9cVSbSyY88D0dA.png" alt /></p>
<ul>
<li><p>Using SCP:</p>
<pre><code class="lang-bash">  scp vm01.ova root@&lt;proxmox server ip&gt;:/var/lib/vz/template/
</code></pre>
<p>  <img src="https://miro.medium.com/v2/resize:fit:700/1*bmGNFexlEHIKDu0h64GvJw.png" alt /></p>
<h1 id="heading-run-the-script"><strong>▶️ Run the script:</strong></h1>
<p>  Get the <a target="_blank" href="https://github.com/sumitsaz23/proxmox-scripts/tree/main/import_ova">script</a> from the <a target="_blank" href="https://github.com/sumitsaz23/proxmox-scripts/tree/main/import_ova">GitHub repository</a>.</p>
<h2 id="heading-locally"><strong>Locally:</strong></h2>
<pre><code class="lang-bash">  git <span class="hljs-built_in">clone</span> https://github.com/sumitsaz23/proxmox-scripts.git

  <span class="hljs-built_in">cd</span> proxmox-scripts/import_ova/

  ./import_ova.sh my-vm.ova 123 my-storage /var/lib/vz/template
</code></pre>
<h2 id="heading-download-amp-run-directly-using-curl"><strong>Download &amp; Run Directly using</strong> <code>curl:</code></h2>
<pre><code class="lang-bash">  curl -sSL \
    https://raw.githubusercontent.com/sumitsaz23/proxmox-scripts/main/import_ova/import_ova.sh \
    | bash -s -- my-vm.ova 123 my-storage /var/lib/vz/template
</code></pre>
<p>  <img src="https://miro.medium.com/v2/resize:fit:700/1*Ejn2RSdX9WDcAiaKARSYEg.png" alt /></p>
<p>  <img src="https://miro.medium.com/v2/resize:fit:700/1*3qxm3-GWsVUW6AaK6eHAHA.png" alt /></p>
<h2 id="heading-vm-successfully-booting-up"><strong>VM Successfully booting up</strong></h2>
<p>  <img src="https://miro.medium.com/v2/resize:fit:700/1*M1bN6he4yuJZRnDPeEDO2g.png" alt /></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Azure VM : Custom Data vs. User Data]]></title><description><![CDATA[If you’ve ever provisioned a Virtual Machine (VM) in Azure, you’ve likely stared at the "Advanced" tab during creation and wondered, "Should I put my script in Custom Data? Or is User Data the way to go?"
While both features allow you to inject data ...]]></description><link>https://automatestack.dev/azure-vm-custom-data-vs-user-data</link><guid isPermaLink="true">https://automatestack.dev/azure-vm-custom-data-vs-user-data</guid><category><![CDATA[Azure]]></category><category><![CDATA[Azure VM Instances]]></category><category><![CDATA[az-104]]></category><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Wed, 31 Jul 2024 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765261909323/899f85ed-a9a6-4662-8ab3-af48067a87f5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you’ve ever provisioned a Virtual Machine (VM) in Azure, you’ve likely stared at the "Advanced" tab during creation and wondered, "Should <strong>I put my script in Custom Data? Or is User Data the way to go?"</strong></p>
<p>While both features allow you to inject data into your VM, they serve very different phases of the VM's lifecycle. Confusing them can lead to automation failures or security gaps.</p>
<p>In this post, we’ll break down what can be used in each, the critical differences, and—most importantly—how to secure them.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765261526000/c6c3fdde-9985-46ae-902d-a19547bb4844.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-1-custom-data-the-bootstrapper">1. Custom Data: The "Bootstrapper"</h2>
<p><strong>Custom Data</strong> is the classic, tried-and-true method for bootstrapping Azure VMs. Think of it as the "instruction manual" you hand to the VM the very first time it wakes up.</p>
<h3 id="heading-what-is-it-used-for">What is it used for?</h3>
<p>It is primarily designed for <strong>provisioning</strong>. It tells the VM how to configure itself immediately after creation.</p>
<h3 id="heading-what-can-be-used-in-it">What can be used in it?</h3>
<ul>
<li><p><strong>Cloud-init files (Linux):</strong> This is the most common use case. You pass a YAML file that creates users, installs packages (like Nginx or Docker), and writes configuration files.</p>
</li>
<li><p><strong>Shell Scripts (Bash):</strong> Simple startup scripts for Linux.</p>
</li>
<li><p><strong>PowerShell Scripts (Windows):</strong> <em>While you can put them here, Windows does not execute them automatically by default.</em></p>
</li>
<li><p><strong>Configuration Files:</strong> Any base64-encoded file (config, JSON, XML).</p>
</li>
</ul>
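<p>For example, a minimal <code>cloud-config</code> file passed as Custom Data could install and start Nginx on first boot (the package and command are illustrative):</p>
<pre><code class="lang-yaml">#cloud-config
package_update: true
packages:
  - nginx
runcmd:
  - systemctl enable --now nginx
</code></pre>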
<h3 id="heading-how-it-works">How it works</h3>
<ul>
<li><p><strong>Linux:</strong> If your image uses <code>cloud-init</code> (standard on Ubuntu, RHEL, CentOS, etc.), it automatically detects, decodes, and executes the Custom Data during the <strong>first boot only</strong>.</p>
</li>
<li><p><strong>Windows:</strong> Azure places the data in a binary file at <code>%SYSTEMDRIVE%\AzureData\CustomData.bin</code>. It sits there passively. To run it, you must use a separate tool (like the <strong>Custom Script Extension</strong>) or have a scheduled task pre-baked into your image to look for and execute this file.</p>
</li>
</ul>
<h2 id="heading-2-user-data-the-persistent-store">2. User Data: The "Persistent Store"</h2>
<p><strong>User Data</strong> is a newer feature designed to offer a persistent data store that stays with the VM throughout its life.</p>
<h3 id="heading-what-is-it-used-for-1">What is it used for?</h3>
<p>It is designed for <strong>runtime configuration</strong> and <strong>metadata</strong> that your application might need to check periodically. Unlike custom data, it is meant to be accessible easily via standard APIs from within the VM.</p>
<h3 id="heading-what-can-be-used-in-it-1">What can be used in it?</h3>
<ul>
<li><p><strong>Environment Flags:</strong> e.g., <code>ENV=Production</code>, <code>ClusterID=12345</code>.</p>
</li>
<li><p><strong>Version Pins:</strong> e.g., <code>AppVersion=2.1.0</code>.</p>
</li>
<li><p><strong>Bootstrapping Scripts:</strong> Modern versions of <code>cloud-init</code> (21.2+) <em>can</em> consume User Data for provisioning if Custom Data is empty.</p>
</li>
<li><p><strong>Custom Config Blobs:</strong> A JSON blob containing connection strings (non-sensitive ones!) or feature toggles.</p>
</li>
</ul>
<h3 id="heading-how-it-works-1">How it works</h3>
<ul>
<li><p><strong>Persistence:</strong> User Data persists for the lifetime of the VM. You can even update it while the VM is running (though the VM won't know unless it polls for changes).</p>
</li>
<li><p><strong>Accessibility:</strong> It is available via the <strong>Azure Instance Metadata Service (IMDS)</strong>. Any process inside the VM can retrieve it by querying a local endpoint.<br />  IMDS is a REST API that's available at a well-known, non-routable IP address (<code>169.254.169.254</code>). You can only access it from within the VM. Communication between the VM and IMDS never leaves the host.</p>
<pre><code class="lang-powershell">  <span class="hljs-comment">## For Windows ##</span>
  <span class="hljs-built_in">Invoke-RestMethod</span> <span class="hljs-literal">-Headers</span> <span class="hljs-selector-tag">@</span>{<span class="hljs-string">"Metadata"</span>=<span class="hljs-string">"true"</span>} <span class="hljs-literal">-Method</span> GET <span class="hljs-literal">-NoProxy</span> <span class="hljs-literal">-Uri</span> <span class="hljs-string">"http://169.254.169.254/metadata/instance?api-version=2025-04-07"</span> | <span class="hljs-built_in">ConvertTo-Json</span> <span class="hljs-literal">-Depth</span> <span class="hljs-number">64</span>
</code></pre>
<pre><code class="lang-bash">  <span class="hljs-comment">## For Linux ##</span>
  curl -s -H Metadata:<span class="hljs-literal">true</span> --noproxy <span class="hljs-string">"*"</span> <span class="hljs-string">"http://169.254.169.254/metadata/instance?api-version=2025-04-07"</span> | jq
</code></pre>
</li>
</ul>
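<p>User Data itself lives under the <code>compute/userData</code> path of IMDS and is returned base64-encoded, so you decode it after fetching. A sketch for Linux (the API version shown is an assumption; use whichever version your region supports):</p>
<pre><code class="lang-bash"># Fetch only the userData field and decode it
curl -s -H Metadata:true --noproxy "*" \
  "http://169.254.169.254/metadata/instance/compute/userData?api-version=2021-02-01&amp;format=text" \
  | base64 --decode
</code></pre>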
<h2 id="heading-3-the-comparison-table">3. The Comparison Table</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>Custom Data</td><td>User Data</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Primary Goal</strong></td><td>Initial Boot/Provisioning</td><td>Persistent Configuration/Metadata</td></tr>
<tr>
<td><strong>Execution</strong></td><td><strong>Automatic</strong> (Linux/cloud-init)</td><td><strong>Passive</strong> (Data store)*</td></tr>
<tr>
<td><strong>Persistence</strong></td><td>Available at boot; hard to retrieve later</td><td>Available anytime via API (IMDS)</td></tr>
<tr>
<td><strong>Updateable?</strong></td><td>No (Static after creation)</td><td>Yes (Can be updated anytime)</td></tr>
<tr>
<td><strong>Retrieval Method</strong></td><td>File on disk (<code>ovf-env.xml</code> / <code>.bin</code>)</td><td>HTTP Request (IMDS)</td></tr>
<tr>
<td><strong>Size Limit</strong></td><td>64 KB</td><td>64 KB</td></tr>
</tbody>
</table>
</div><p><em>Note: While User Data is passive by default, modern cloud-init can be configured to execute it.</em></p>
<h2 id="heading-4-security">4. Security</h2>
<p>Security is the most critical differentiator. Because both methods involve passing data to a VM, it is tempting to dump secrets (passwords, API keys) here. <strong>Do not do this.</strong></p>
<h3 id="heading-why-is-it-unsafe">Why is it unsafe?</h3>
<h4 id="heading-1-custom-data-risks-the-file-system-risk">1. Custom Data Risks (The File System Risk)</h4>
<ul>
<li><p><strong>Exposure:</strong> Custom Data is stored as a file on the VM's disk.</p>
<ul>
<li><p><strong>Linux:</strong> It often resides in <code>/var/lib/waagent/ovf-env.xml</code> or <code>/var/lib/cloud/instance/</code>. Any user with read access to these directories (typically root/sudo) can read it.</p>
</li>
<li><p><strong>Windows:</strong> It sits in <code>%SYSTEMDRIVE%\AzureData\CustomData.bin</code>.</p>
</li>
</ul>
</li>
<li><p><strong>Logging:</strong> If your script prints secrets to the console (stdout/stderr) during execution, those secrets might end up in system logs (<code>/var/log/cloud-init-output.log</code> or Azure Boot Diagnostics logs), which are viewable from the Azure Portal.</p>
</li>
</ul>
<h4 id="heading-2-user-data-risks-the-imds-risk">2. User Data Risks (The IMDS Risk)</h4>
<ul>
<li><p><strong>Open Access:</strong> User Data is served via the <strong>Instance Metadata Service (IMDS)</strong>, a local HTTP server at <code>169.254.169.254</code>.</p>
</li>
<li><p><strong>No Authentication:</strong> By default, <strong>any process</strong> running on that VM (not just root/admin) can query this URL and retrieve the data. If an attacker manages to run a simple script on your VM (or exploits a web app vulnerability like SSRF), they can easily read your User Data.</p>
</li>
<li><p><strong>Clear Text:</strong> The API returns the data in base64, which is trivially easy to decode. It is effectively clear text.</p>
</li>
</ul>
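<p>To see why base64 offers no protection, note that anyone who can read the value can reverse it with a single command and no key. A quick illustration (the "secret" is obviously made up):</p>
<pre><code class="lang-bash"># base64 is an encoding, not encryption: it round-trips with no key
encoded=$(printf 'DB_PASSWORD=hunter2' | base64)
printf '%s\n' "$encoded" | base64 --decode
# Prints: DB_PASSWORD=hunter2
</code></pre>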
<h3 id="heading-best-practices-for-security">Best Practices for Security</h3>
<p>If you can't put secrets in Custom/User Data, how do you get them into the VM?</p>
<ol>
<li><p>Instead of passing the <em>password</em> in Custom Data, pass the <strong>instruction</strong> to get the password.</p>
<ul>
<li><p>Enable a <a target="_blank" href="https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview"><strong>System Assigned Managed Identity</strong></a> on the VM.</p>
</li>
<li><p>Grant that identity access to an <strong>Azure Key Vault</strong>.</p>
</li>
<li><p>Use Custom Data to run a script (using Azure CLI or PowerShell) that logs in using the Managed Identity (<code>az login --identity</code>) and fetches the secret from the Key Vault.</p>
</li>
</ul>
</li>
<li><p><strong>Restrict IMDS Access (Defense in Depth):</strong> If you use User Data, ensure you are not running untrusted code on the VM. You can also use local OS firewalls (iptables/Windows Firewall) to restrict which users or processes can talk to <code>169.254.169.254</code>.</p>
</li>
<li><p><strong>Assume Visibility:</strong> Always assume that anyone with access to the VM (even low-level access) can read everything in Custom Data and User Data. Treat these fields as "<strong>public</strong>" relative to the VM's internal environment.</p>
</li>
</ol>
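<p>Putting the Managed Identity pattern together, the bootstrap script you pass in Custom Data might look like the sketch below. The vault name <code>myVault</code> and secret name <code>dbPassword</code> are placeholders, and it assumes the Azure CLI is present on the image:</p>
<pre><code class="lang-bash">#!/bin/bash
# Log in as the VM's system-assigned managed identity (no credentials on disk)
az login --identity

# Fetch the secret from Key Vault at boot instead of embedding it in Custom Data
DB_PASSWORD=$(az keyvault secret show \
  --vault-name myVault \
  --name dbPassword \
  --query value -o tsv)

# Use the value in memory only; never echo it to stdout,
# or it may end up in cloud-init / boot diagnostics logs
</code></pre>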
]]></content:encoded></item><item><title><![CDATA[Deploying a Python Web App with on Kubernetes and Persistent NFS Storage]]></title><description><![CDATA[You’ll learn how to :
- Build a simple Flask web app that Displays an image on a web page and allows users to upload and replace the image
- Containerizing it using Docker
- Pushing the image to a Docker hub registry
- Deploying it in Kubernetes - Cr...]]></description><link>https://automatestack.dev/deploying-a-python-web-app-with-on-kubernetes-and-persistent-nfs-storage-5d3373286132</link><guid isPermaLink="true">https://automatestack.dev/deploying-a-python-web-app-with-on-kubernetes-and-persistent-nfs-storage-5d3373286132</guid><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Fri, 31 May 2024 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748693236118/5636c648-7a00-4a2d-a669-fa9dd83f8845.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-youll-learn-how-to">You’ll learn how to :</h3>
<h3 id="heading-build-a-simple-flask-web-app-that-displays-an-image-on-a-web-page-and-allows-users-to-upload-and-replace-the-image">- Build a simple Flask web app that Displays an image on a web page and allows users to upload and replace the image</h3>
<h3 id="heading-containerizing-it-using-docker">- Containerizing it using Docker</h3>
<h3 id="heading-pushing-the-image-to-a-docker-hub-registry">- Pushing the image to a Docker hub registry</h3>
<h3 id="heading-deploying-it-in-kubernetes-create-kubernetes-secrets-deployments-service-init-container">- Deploying it in Kubernetes - Create Kubernetes secrets, deployments, service, init container</h3>
<h3 id="heading-setting-up-persistent-volumes-with-an-nfs-volume-using-static-pvspvcs">- Setting up persistent volumes with an NFS volume using static PVs/PVCs</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748487105155/e6b35dba-3dc7-4462-87aa-5ac29d33cde0.png" alt /></p>
<h3 id="heading-folder-structure">Folder Structure</h3>
<p>Below is a typical folder structure for this project.</p>
<p>This structure separates the application logic, HTML templates, and static assets, making it easy to maintain and deploy.</p>
<pre><code class="lang-bash">my_app/
├── app.py <span class="hljs-comment"># Flask application code</span>
├── Dockerfile <span class="hljs-comment"># Dockerfile for containerization</span>
├── requirements.txt <span class="hljs-comment"># (Optional) Python dependencies</span>
├── templates/
│ └── index.html <span class="hljs-comment"># HTML template for the web app</span>
└── static/
├── uploads/ <span class="hljs-comment"># Directory to store uploaded images (persistent)</span>
└── images/
└── logo.png <span class="hljs-comment"># logo used in the header</span>
</code></pre>
<h3 id="heading-1-setting-up-your-development-environment">1. Setting Up Your Development Environment</h3>
<h4 id="heading-installing-python-and-flask">Installing Python and Flask</h4>
<p>Before starting, ensure you have <strong>Python 3</strong> installed. You can install it using:</p>
<pre><code class="lang-bash">sudo apt update &amp;&amp; sudo apt install python3 python3-pip -y <span class="hljs-comment"># Ubuntu/Debian</span>
</code></pre>
<pre><code class="lang-bash">brew install python3 <span class="hljs-comment"># macOS</span>
</code></pre>
<p>For Windows, download and install Python from <a target="_blank" href="https://www.python.org/downloads/">python.org</a>.</p>
<p>Next, install Flask:</p>
<pre><code class="lang-bash">pip3 install flask
</code></pre>
<h3 id="heading-2-building-the-flask-app">2. Building the Flask App</h3>
<p>Flask is a lightweight and flexible Python web framework for building web applications quickly. It is minimalist yet powerful.</p>
<p><strong>How It Works:</strong></p>
<ul>
<li><p>The app initially displays <code>static/uploads/current.jpg</code>.</p>
</li>
<li><p>Users can upload an image via a form.</p>
</li>
<li><p>The uploaded image replaces the old one.</p>
</li>
</ul>
<p>Let’s add the code to <code>app.py</code> for the application and <code>index.html</code> for the web interface.</p>
<ul>
<li><h3 id="heading-apppy"><strong><em>app.py</em></strong></h3>
</li>
</ul>
<pre><code class="lang-python">from flask import Flask, render_template, request, redirect, url_for
import os

app = Flask(__name__)
UPLOAD_FOLDER = 'static/uploads/'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER

# Ensure the upload folder exists
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

# Default image shown before any upload
DEFAULT_IMAGE = 'default.jpg'
image_path = os.path.join(UPLOAD_FOLDER, DEFAULT_IMAGE).replace("\\", "/")

@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        if "file" not in request.files:
            return redirect(request.url)

        file = request.files["file"]
        if file.filename == "":
            return redirect(request.url)

        if file:
            # Save the upload as current.jpg, replacing the previous image
            filepath = os.path.join(app.config['UPLOAD_FOLDER'], "current.jpg").replace("\\", "/")
            file.save(filepath)

    return render_template("index.html", image_url="static/uploads/current.jpg")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000, debug=True)
</code></pre>
<ul>
<li><em>templates/index.html</em></li>
</ul>
<pre><code class="lang-xml"><span class="hljs-meta">&lt;!DOCTYPE <span class="hljs-meta-keyword">html</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">html</span> <span class="hljs-attr">lang</span>=<span class="hljs-string">"en"</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">head</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">charset</span>=<span class="hljs-string">"UTF-8"</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"viewport"</span> <span class="hljs-attr">content</span>=<span class="hljs-string">"width=device-width, initial-scale=1.0"</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">title</span>&gt;</span>Image Upload<span class="hljs-tag">&lt;/<span class="hljs-name">title</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">style</span>&gt;</span><span class="css">
    <span class="hljs-comment">/* Set the background to light blue */</span>
    <span class="hljs-selector-tag">body</span> {
      <span class="hljs-attribute">background-color</span>: lightblue;
      <span class="hljs-attribute">margin</span>: <span class="hljs-number">0</span>;
      <span class="hljs-attribute">font-family</span>: Arial, sans-serif;
    }
    <span class="hljs-comment">/* Header for the Kubernetes logo at the top left */</span>
    <span class="hljs-selector-tag">header</span> {
      <span class="hljs-attribute">position</span>: fixed;
      <span class="hljs-attribute">top</span>: <span class="hljs-number">0</span>;
      <span class="hljs-attribute">left</span>: <span class="hljs-number">0</span>;
      <span class="hljs-attribute">padding</span>: <span class="hljs-number">10px</span>;
      <span class="hljs-attribute">z-index</span>: <span class="hljs-number">1000</span>; <span class="hljs-comment">/* Ensure header stays on top */</span>
    }
    <span class="hljs-selector-tag">header</span> <span class="hljs-selector-tag">img</span> {
      <span class="hljs-attribute">height</span>: <span class="hljs-number">70px</span>; <span class="hljs-comment">/* Adjust size as needed */</span>
    }
    <span class="hljs-comment">/* Container for main content with padding to avoid header overlap */</span>
    <span class="hljs-selector-class">.content</span> {
      <span class="hljs-attribute">padding-top</span>: <span class="hljs-number">50px</span>;
    }
    <span class="hljs-comment">/* Style for the green upload button */</span>
    <span class="hljs-selector-class">.upload-button</span> {
      <span class="hljs-attribute">background-color</span>: green;
      <span class="hljs-attribute">border</span>: none;
      <span class="hljs-attribute">color</span>: <span class="hljs-built_in">rgb</span>(<span class="hljs-number">213</span>, <span class="hljs-number">213</span>, <span class="hljs-number">213</span>);
      <span class="hljs-attribute">padding</span>: <span class="hljs-number">10px</span> <span class="hljs-number">20px</span>;
      <span class="hljs-attribute">text-align</span>: center;
      <span class="hljs-attribute">text-decoration</span>: none;
      <span class="hljs-attribute">display</span>: inline-block;
      <span class="hljs-attribute">font-size</span>: <span class="hljs-number">16px</span>;
      <span class="hljs-attribute">margin</span>: <span class="hljs-number">4px</span> <span class="hljs-number">2px</span>;
      <span class="hljs-attribute">cursor</span>: pointer;
      <span class="hljs-attribute">border-radius</span>: <span class="hljs-number">4px</span>;
    }
    <span class="hljs-comment">/* Style to center and enlarge the image */</span>
    <span class="hljs-selector-class">.centered-image</span> {
      <span class="hljs-attribute">display</span>: block;
      <span class="hljs-attribute">margin</span>: <span class="hljs-number">20px</span> auto;
      <span class="hljs-attribute">max-width</span>: <span class="hljs-number">80%</span>;
      <span class="hljs-attribute">height</span>: auto;
    }
    <span class="hljs-comment">/* Center-align headings and form */</span>
    <span class="hljs-selector-class">.center</span> {
      <span class="hljs-attribute">text-align</span>: center;
    }
  </span><span class="hljs-tag">&lt;/<span class="hljs-name">style</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">head</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">body</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">header</span>&gt;</span>
    <span class="hljs-comment">&lt;!-- Update the src to point to your logo file --&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">img</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"static/images/logo.png"</span> <span class="hljs-attr">alt</span>=<span class="hljs-string">"Logo"</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">header</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"content"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">h2</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"center"</span>&gt;</span>Upload an Image<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">form</span> <span class="hljs-attr">method</span>=<span class="hljs-string">"POST"</span> <span class="hljs-attr">enctype</span>=<span class="hljs-string">"multipart/form-data"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"center"</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"file"</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"file"</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"submit"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"upload-button"</span>&gt;</span>Upload<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">form</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">h3</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"center"</span>&gt;</span>Current Image:<span class="hljs-tag">&lt;/<span class="hljs-name">h3</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">img</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"{{ image_url }}"</span> <span class="hljs-attr">alt</span>=<span class="hljs-string">"Uploaded Image"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"centered-image"</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">body</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">html</span>&gt;</span>
</code></pre>
<h2 id="heading-running-the-app-on-your-local-system">Running the App on your local system</h2>
<pre><code class="lang-bash">python3 app.py
</code></pre>
<p>Visit <code>http://127.0.0.1:5000/</code> in your browser to access the app.</p>
<h1 id="heading-3-containerizing-with-docker"><strong>3. Containerizing with Docker</strong></h1>
<p>To install Docker on your system, follow the official <a target="_blank" href="https://docs.docker.com/engine/install/ubuntu"><strong><em>Docker installation guide</em></strong></a>.</p>
<p>Let’s create the Dockerfile:</p>
<ul>
<li><em>Dockerfile</em></li>
</ul>
<pre><code class="lang-dockerfile"># Use the official Python image as a base
FROM python:3.9

# Set the working directory
WORKDIR /app

# Copy all files to the container
COPY . .

# Install dependencies
RUN pip install flask

# Expose the port Flask runs on
EXPOSE 5000

# Run the application
CMD ["python", "app.py"]
</code></pre>
<p>Building &amp; Running the Docker Container</p>
<ul>
<li>Build the Docker Image:</li>
</ul>
<pre><code class="lang-bash">docker build -t my_app .
</code></pre>
<ul>
<li>Run the Container in Detached Mode:</li>
</ul>
<pre><code class="lang-bash">docker run -d -p 5000:5000 -v $(<span class="hljs-built_in">pwd</span>)/static/uploads:/app/static/uploads --name mywebapp my_app
</code></pre>
<p>Explanation of Flags:</p>
<ul>
<li><p><code>-d</code> → Runs the container in <strong>detached mode</strong> (background).</p>
</li>
<li><p><code>-p 5000:5000</code> → Maps <strong>port 5000</strong> of the container to <strong>port 5000</strong> on the node.</p>
</li>
<li><p><code>-v $(pwd)/static/uploads:/app/static/uploads</code> → Mounts the upload directory so files persist.</p>
</li>
<li><p><code>--name mywebapp</code>→ Assigns the container a custom name (<code>mywebapp</code>).</p>
</li>
<li><p><code>my_app</code> → The name of your Docker image.</p>
</li>
</ul>
<p>Access the Application:</p>
<p>Open your browser at <code>http://&lt;Docker_host_IP&gt;:5000</code></p>
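<p>If you prefer the terminal to a browser, a quick smoke test from the Docker host itself could look like this (assuming a local <code>test.jpg</code> exists):</p>
<pre><code class="lang-bash"># Check the index page responds
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:5000/

# Upload a replacement image through the form endpoint
curl -s -o /dev/null -w '%{http_code}\n' -F 'file=@test.jpg' http://localhost:5000/
</code></pre>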
<h1 id="heading-4-pushing-the-image-to-a-registry"><strong>4. Pushing the Image to a Registry</strong></h1>
<p>Now that we have tested the application in a Docker container, let’s push the image we built to a registry.</p>
<p>I am using <a target="_blank" href="https://hub.docker.com/"><strong><em>Docker Hub</em></strong></a>, but you can use any other cloud-based registry such as <a target="_blank" href="https://github.blog/news-insights/product-news/introducing-github-container-registry/"><strong><em>GitHub Container Registry</em></strong></a>, or a self-hosted one such as <a target="_blank" href="https://www.docker.com/blog/how-to-use-your-own-registry-2/"><strong><em>Docker Registry</em></strong></a> or <a target="_blank" href="https://goharbor.io/"><strong><em>Harbor</em></strong></a>.</p>
<p>If your docker hub repository is a private one, then you will need to authenticate.</p>
<blockquote>
<p><em>Use</em> <strong><em>docker login</em></strong> <em>command on your docker host and follow the instructions on the screen</em></p>
</blockquote>
<ul>
<li>Login to registry</li>
</ul>
<pre><code class="lang-bash">root@docker:~<span class="hljs-comment"># docker login</span>

USING WEB-BASED LOGIN

i Info → To sign <span class="hljs-keyword">in</span> with credentials on the <span class="hljs-built_in">command</span> line, use <span class="hljs-string">'docker login -u &lt;username&gt;'</span>


Your one-time device confirmation code is: XXXX-YYYY
Press ENTER to open your browser or submit your device code here: https://login.docker.com/activate

Waiting <span class="hljs-keyword">for</span> authentication <span class="hljs-keyword">in</span> the browser…
</code></pre>
<ul>
<li>Tag the Image</li>
</ul>
<pre><code class="lang-bash">docker tag my_app &lt;docker_hub-repo_name&gt;/python_picture_webapp:v1
</code></pre>
<ul>
<li>Push the Image</li>
</ul>
<pre><code class="lang-bash">docker push &lt;docker_hub-repo_name&gt;/python_picture_webapp:v1
</code></pre>
<h1 id="heading-5-deploying-in-kubernetes"><strong>5. Deploying in Kubernetes</strong></h1>
<ul>
<li><strong>Kubernetes manifests used in this project</strong></li>
</ul>
<pre><code class="lang-bash">apps/python_picture_webapp/
├── deployment.yaml        <span class="hljs-comment"># Deployment resource for the Flask app</span>
├── service.yaml           <span class="hljs-comment"># Service to expose the app</span>
├── persistent-volume.yaml <span class="hljs-comment"># NFS PersistentVolume (PV)</span>
├── persistent-claim.yaml  <span class="hljs-comment"># PersistentVolumeClaim (PVC)</span>
├── secret.yaml            <span class="hljs-comment"># Secret for pulling private images from dockerhub</span>
└── namespace.yaml         <span class="hljs-comment"># Namespace definition (if organizing workloads)</span>
</code></pre>
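<p>The <em>namespace.yaml</em> is listed above but not shown; a minimal sketch could look like this (the namespace name is illustrative, pick your own):</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Namespace
metadata:
  name: python-picture-webapp   # illustrative name
</code></pre>
<p>If you do use a dedicated namespace, remember to add <code>-n &lt;namespace&gt;</code> (or set <code>metadata.namespace</code>) on the remaining resources and commands.</p>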
<ul>
<li><strong>Create a Secret for accessing the private docker hub</strong></li>
</ul>
<p>Let's use <code>kubectl</code> to create the Secret:</p>
<pre><code class="lang-bash">kubectl create secret docker-registry mycred \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=&lt;your-username&gt; \
  --docker-password=&lt;your-password&gt; \
  --docker-email=&lt;your-email&gt;
</code></pre>
<p>This command creates a Secret of type <em>kubernetes.io/dockerconfigjson</em>.</p>
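<p>Equivalently, the <em>secret.yaml</em> from the manifest list above can declare the same Secret; the payload below is a placeholder, not real data:</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Secret
metadata:
  name: mycred
type: kubernetes.io/dockerconfigjson
data:
  # base64 of a .docker/config.json-style "auths" document (placeholder)
  .dockerconfigjson: &lt;base64-encoded-docker-config-json&gt;
</code></pre>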
<p>Retrieve the <code>.data.dockerconfigjson</code> field from that new Secret and decode the data:</p>
<pre><code class="lang-bash">kubectl get secret mycred -o jsonpath=<span class="hljs-string">"{.data.\.dockerconfigjson}"</span> | base64 --decode

<span class="hljs-comment">#Output :</span>

{<span class="hljs-string">"auths"</span>:{<span class="hljs-string">"https://index.docker.io/v1/"</span>:{<span class="hljs-string">"username"</span>:<span class="hljs-string">"test-user"</span>,<span class="hljs-string">"password"</span>:<span class="hljs-string">"your-pass"</span>,<span class="hljs-string">"email"</span>:<span class="hljs-string">"test@acme.example"</span>,<span class="hljs-string">"auth"</span>:<span class="hljs-string">"TlJFeG1pY25mMw=="</span>}}}
</code></pre>
<blockquote>
<p><em>Caution:</em></p>
<p><em>The</em> <code>auth</code> value there is base64 encoded; it is obscured but not secret. Anyone who can read that Secret can learn the registry access bearer token.</p>
</blockquote>
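<p>To see that the <code>auth</code> field is merely encoded, not encrypted, you can round-trip the example credentials above through <code>base64</code> (these are dummy values, not real secrets):</p>
<pre><code class="lang-bash"># The "auth" field is base64("username:password") -- demo with dummy values
auth=$(printf 'test-user:your-pass' | base64)
echo "$auth"                            # the obscured form stored in the Secret
printf '%s' "$auth" | base64 --decode   # recovers username:password
</code></pre>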
<ul>
<li><strong>Create a static PersistentVolume &amp; claim using NFS-backed storage</strong></li>
</ul>
<p>The NFS export must allow <strong><em>rw,sync</em></strong> access:</p>
<pre><code class="lang-bash">root@ovm-nfs:~<span class="hljs-comment"># exportfs -v</span>
/export/nfs_python_pictures_app
                <span class="hljs-number">192.168</span><span class="hljs-number">.1</span><span class="hljs-number">.0</span>/<span class="hljs-number">24</span>(sync,wdelay,hide,no_subtree_check,rw,secure,no_root_squash,no_all_squash)
</code></pre>
<ul>
<li><em>pv_python-pictures-app.yaml</em></li>
</ul>
<pre><code class="lang-yaml">
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">PersistentVolume</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nfs-python-pictures-app</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">nfs</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">python-picture-webapp</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">capacity:</span>
    <span class="hljs-attr">storage:</span> <span class="hljs-string">1Gi</span>
  <span class="hljs-attr">accessModes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">ReadWriteMany</span>
  <span class="hljs-attr">persistentVolumeReclaimPolicy:</span> <span class="hljs-string">Retain</span>
  <span class="hljs-attr">nfs:</span>
    <span class="hljs-attr">server:</span> <span class="hljs-number">192.168</span><span class="hljs-number">.1</span><span class="hljs-number">.110</span>         <span class="hljs-comment"># Replace with your NFS server hostname/IP</span>
    <span class="hljs-attr">path:</span> <span class="hljs-string">"/export/nfs_python_pictures_app"</span>        <span class="hljs-comment"># Replace with your exported directory</span>
</code></pre>
<ul>
<li><em>pvc_python-pictures-app.yaml</em></li>
</ul>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">PersistentVolumeClaim</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">pvc-python-pictures-app</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">accessModes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">ReadWriteMany</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">storage:</span> <span class="hljs-string">1Gi</span>
</code></pre>
<ul>
<li><strong>Apply the persistent volume &amp; claim manifests</strong></li>
</ul>
<pre><code class="lang-bash">root@controller01:~<span class="hljs-comment"># kubectl apply -f pv_python-pictures-app.yaml</span>
persistentvolume/nfs-python-pictures-app created
root@controller01:~<span class="hljs-comment"># kubectl apply -f pvc_python-pictures-app.yaml</span>
persistentvolumeclaim/pvc-python-pictures-app created
</code></pre>
<pre><code class="lang-bash">root@controller01:~<span class="hljs-comment"># kubectl get pv,pvc</span>
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                             STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
persistentvolume/nfs-python-pictures-app   1Gi        RWX            Retain           Bound    default/pvc-python-pictures-app                  &lt;<span class="hljs-built_in">unset</span>&gt;                          9d

NAME                                            STATUS   VOLUME                    CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/pvc-python-pictures-app   Bound    nfs-python-pictures-app   1Gi        RWX                           &lt;<span class="hljs-built_in">unset</span>&gt;                 9d
</code></pre>
<p>Let's now put together the <strong><em>deployment.yaml</em></strong>.</p>
<ul>
<li><em>deployment.yaml</em></li>
</ul>
<pre><code class="lang-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-picture-webapp-v1-1
spec:
  replicas: 4  <span class="hljs-comment"># Number of pods</span>
  selector:
    matchLabels:
      app: python-picture-webapp
  template:
    metadata:
      labels:
        app: python-picture-webapp
        color: blue  
    spec:
      imagePullSecrets:
      - name: mycred   <span class="hljs-comment"># use the secret created in the beginning</span>
      initContainers:  <span class="hljs-comment"># this init container is used to copy the logo.png to the nfs share</span>
      - name: init-static-images
        image: sumitsur74/python_picture_webapp:v1.1   <span class="hljs-comment"># your image from your registry</span>
        <span class="hljs-built_in">command</span>: [<span class="hljs-string">'sh'</span>, <span class="hljs-string">'-c'</span>, <span class="hljs-string">'cp -r /app/static/images/* /mnt/static/images/'</span>]
        volumeMounts:
        - name: nfs-python-pictures-app
          mountPath: /mnt/static
      containers:
      - name: python-picture-webapp
        image: sumitsur74/python_picture_webapp:v1.1  <span class="hljs-comment"># your image from your registry</span>
        ports:
        - containerPort: 5000
        volumeMounts:
        - name: nfs-python-pictures-app
          mountPath: /app/static  <span class="hljs-comment"># Mount the static folder</span>
      volumes:
      - name: nfs-python-pictures-app
        persistentVolumeClaim:
          claimName: pvc-python-pictures-app
</code></pre>
<p>When deploying the app, you might encounter an issue where the <em>logo.png</em> kept under the <code>/static/images</code> directory is not copied to the Persistent Volume (PV). This happens because, in Kubernetes, mounting a volume onto a directory within a container hides that directory's existing contents. Consequently, any files baked into the Docker image at that path become inaccessible once the volume is mounted.</p>
<ul>
<li><strong>Use an Init Container to Populate the PV:</strong></li>
</ul>
<p>An Init Container can be employed to copy the necessary files from the Docker image to the Persistent Volume before the main application container starts. This Init Container copies the contents from <code>/app/static/images/</code> (within the Docker image) to <code>/mnt/static/images/</code>, which is the mounted Persistent Volume.</p>
<p>The main application container then mounts the same Persistent Volume at <code>/app/static</code>, ensuring that the <code>/app/static/images/</code> directory contains the files copied to the PV.</p>
<pre><code class="lang-yaml">initContainers:  <span class="hljs-comment"># this init container is used to copy the logo.png to the nfs share</span>
      - name: init-static-images
        image: sumitsur74/python_picture_webapp:v1.1   <span class="hljs-comment"># your image from your registry</span>
        <span class="hljs-built_in">command</span>: [<span class="hljs-string">'sh'</span>, <span class="hljs-string">'-c'</span>, <span class="hljs-string">'cp -r /app/static/images/* /mnt/static/images/'</span>]
        volumeMounts:
        - name: nfs-python-pictures-app
          mountPath: /mnt/static
</code></pre>
<p>Let's prepare the <strong><em>service.yaml</em></strong> to handle the application's networking.</p>
<p>If a client makes a request to the node at <code>http://&lt;NodeIP&gt;:32000</code>, it will:</p>
<p>🡲Hit port <code>32000</code> on the node.</p>
<p>🡲Be forwarded to the service on <code>port 80</code>.</p>
<p>🡲 The service will route the request to a pod on <code>port 5000</code>.</p>
<ul>
<li><em>service.yaml</em></li>
</ul>
<pre><code class="lang-yaml">apiVersion: v1
kind: Service
metadata:
  name: python-picture-service
spec:
  selector:
    app: python-picture-webapp
    color: blue  
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000
      nodePort: 32000
  <span class="hljs-built_in">type</span>: NodePort
</code></pre>
<ul>
<li>Apply the deployment &amp; service</li>
</ul>
<pre><code class="lang-bash">root@controller01:~/git_codebase/k8s_homelab/apps/python_picture_webapp<span class="hljs-comment"># kubectl apply -f deployment-v1.1-persistent_init_container.yaml</span>
deployment.apps/python-picture-webapp-v1-1 created

root@controller01:~/git_codebase/k8s_homelab/apps/python_picture_webapp<span class="hljs-comment"># kubectl apply -f service.yaml</span>
service/python-picture-service created
</code></pre>
<ul>
<li>Verify the pods &amp; service</li>
</ul>
<pre><code class="lang-bash">root@controller01:~<span class="hljs-comment"># kubectl get pods</span>
NAME                                         READY   STATUS    RESTARTS   AGE
python-picture-webapp-v1-1-8f94986fc-47bh9   1/1     Running   0          2m28s
python-picture-webapp-v1-1-8f94986fc-6chdm   1/1     Running   0          2m28s
python-picture-webapp-v1-1-8f94986fc-88w5q   1/1     Running   0          2m28s
python-picture-webapp-v1-1-8f94986fc-h9lch   1/1     Running   0          2m28s
</code></pre>
<pre><code class="lang-bash">root@controller01:~<span class="hljs-comment"># kubectl get service -o wide</span>
NAME                     TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE   SELECTOR
kubernetes               ClusterIP   10.96.0.1     &lt;none&gt;        443/TCP        76d   &lt;none&gt;
python-picture-service   NodePort    10.96.5.206   &lt;none&gt;        80:32000/TCP   17d   app=python-picture-webapp,color=blue
</code></pre>
<ul>
<li>Access the application at <code>http://&lt;NodeIP&gt;:32000</code></li>
</ul>
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*TgO9F-1O2qgIklurOYORuw.png" alt="app.png" /></p>
]]></content:encoded></item><item><title><![CDATA[Hosting Your Own DNS with Unbound & Docker on Raspberry Pi]]></title><description><![CDATA[I've been steadily building a self-sufficient environment for my infrastructure experiments. One milestone was setting up my own DNS server—local and fully controlled.
I had a Raspberry Pi lying around, so I decided to use it to host my DNS as a week...]]></description><link>https://automatestack.dev/hosting-your-own-dns-with-unbound-and-docker-on-raspberry-pi</link><guid isPermaLink="true">https://automatestack.dev/hosting-your-own-dns-with-unbound-and-docker-on-raspberry-pi</guid><category><![CDATA[unbound dns]]></category><category><![CDATA[dns on docker]]></category><category><![CDATA[dns server on docker]]></category><category><![CDATA[dns container]]></category><category><![CDATA[dns on raspberrypi]]></category><category><![CDATA[dns resolver]]></category><category><![CDATA[dns]]></category><category><![CDATA[dns-records]]></category><category><![CDATA[dns server]]></category><category><![CDATA[Raspberry Pi]]></category><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Fri, 31 May 2024 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p>I've been steadily building a self-sufficient environment for my infrastructure experiments. One milestone was setting up my own DNS server—local and fully controlled.</p>
<p>I had a Raspberry Pi lying around, so I decided to use it to host my DNS as a weekend project.</p>
<p>This post will walk you through how I achieved that using <strong>Unbound</strong>, <strong>Docker</strong>, and <strong>configuration management</strong> on my <strong>Raspberry Pi</strong>.</p>
<h2 id="heading-the-project-code"><code>🗃️</code> The project code</h2>
<ul>
<li><a target="_blank" href="https://github.com/sumitsaz23/unbound-dns-container">Link to the Github Repo</a></li>
</ul>
<h2 id="heading-what-we-are-building">📦 What We Are Building</h2>
<ul>
<li><p>A self-hosted, lightweight <strong>Unbound DNS resolver</strong></p>
</li>
<li><p>Deployed via <strong>Docker Compose</strong></p>
</li>
<li><p>Config stored in <strong>Git</strong> and auto-applied on change</p>
</li>
<li><p>Auto-start on Raspberry Pi boot</p>
</li>
<li><p>Designed for private use with domain: <code>home.lab</code></p>
</li>
<li><p>Resilient to reboots and container crashes</p>
</li>
</ul>
<h2 id="heading-prerequisites">🧰 Prerequisites</h2>
<ul>
<li><p>Installed Ubuntu 24.04 LTS server on <a target="_blank" href="https://www.raspberrypi.com/">Raspberry Pi</a> using <a target="_blank" href="https://www.raspberrypi.com/software/">Raspberry Pi Imager</a></p>
</li>
<li><p>Installed <code>Docker</code> &amp; <code>Docker Compose</code> on the Pi (<a target="_blank" href="https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository">installation guide</a>)</p>
</li>
<li><p><code>Git</code> installed and a <code>private repo</code> created to track Unbound configs</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749921031862/1e4160eb-b3e0-4f89-b511-af7ee712317f.png" alt class="image--center mx-auto" /></p>
<ul>
<li><code>inotifywait</code> installed for the watchdog</li>
</ul>
<pre><code class="lang-bash">sudo apt update
sudo apt install inotify-tools
</code></pre>
<h2 id="heading-project-structure">🗂️ Project Structure</h2>
<pre><code class="lang-bash">~/unbound-gitops/
├── docker-compose.yml
├── Dockerfile
├── unbound/
│   ├── unbound.conf
│   ├── root.hints
│   └── a-records.conf
├── watch-and-restart.sh
└── git-pull.sh (optional)
</code></pre>
<h2 id="heading-docker-compose-file">🐳 Docker Compose File</h2>
<pre><code class="lang-yaml"><span class="hljs-attr">services:</span>
  <span class="hljs-attr">unbound:</span>
    <span class="hljs-attr">build:</span> <span class="hljs-string">.</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">unbound</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">unless-stopped</span>  <span class="hljs-comment">#Persistent on reboot. So it automatically starts after reboot</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"53:53/udp"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"53:53/tcp"</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./unbound/unbound.conf:/etc/unbound/unbound.conf:ro</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./unbound/a-records.conf:/etc/unbound/a-records.conf:ro</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./unbound/root.hints:/etc/unbound/root.hints:ro</span>
</code></pre>
<h2 id="heading-dockerfile-for-custom-unbound-image">🛠️ Dockerfile for Custom Unbound Image</h2>
<pre><code class="lang-Dockerfile"><span class="hljs-keyword">FROM</span> alpine:latest

<span class="hljs-keyword">RUN</span><span class="bash"> apk add --no-cache unbound libcap wget</span>

<span class="hljs-keyword">COPY</span><span class="bash"> unbound/unbound.conf /etc/unbound/unbound.conf</span>
<span class="hljs-keyword">COPY</span><span class="bash"> unbound/a-records.conf /etc/unbound/a-records.conf</span>
<span class="hljs-keyword">COPY</span><span class="bash"> unbound/root.hints /etc/unbound/root.hints</span>

<span class="hljs-keyword">RUN</span><span class="bash"> unbound-checkconf</span>

<span class="hljs-keyword">EXPOSE</span> <span class="hljs-number">53</span>/udp <span class="hljs-number">53</span>/tcp

<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"unbound"</span>, <span class="hljs-string">"-d"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"/etc/unbound/unbound.conf"</span>]</span>
</code></pre>
<h2 id="heading-sample-unboundconf">⚙️ Sample <code>unbound.conf</code></h2>
<pre><code class="lang-bash">server:
    logfile: <span class="hljs-string">"/var/log/unbound/unbound.log"</span> <span class="hljs-comment"># Log file path</span>
    verbosity: 1 <span class="hljs-comment"># Set verbosity level (0-4)</span>

    <span class="hljs-comment"># Disable logging for performance, enable if you need to debug</span>
    log-queries: no
    log-replies: no
    log-tag-queryreply: no  


    interface: 0.0.0.0 <span class="hljs-comment"># Listen on all interfaces</span>
    port: 53

    <span class="hljs-comment"># Enable IPv4, UDP, and TCP</span>
    do-ip4: yes
    do-udp: yes
    do-tcp: yes

    <span class="hljs-comment"># Disable IPv6 if not needed on your network</span>
    do-ip6: no
    prefer-ip6: no

    <span class="hljs-comment"># Access control list. By default, refuse all.</span>
    <span class="hljs-comment"># Then, allow specific networks.</span>
    access-control: 127.0.0.1/32 allow
    access-control: 192.168.1.0/24 allow
    access-control: 172.17.0.0/16 allow
    access-control: 0.0.0.0/0 deny <span class="hljs-comment"># Deny all other IPs</span>
    <span class="hljs-comment">#access-control: 0.0.0.0/0 allow # Allow all IPs to query</span>
    <span class="hljs-comment"># Uncomment above line to allow all IPs</span>

    <span class="hljs-comment">#root hints file &amp; records file</span>
    root-hints: <span class="hljs-string">"/etc/unbound/root.hints"</span>
    include: <span class="hljs-string">"/etc/unbound/a-records.conf"</span>

    <span class="hljs-comment"># Harden DNS security settings</span>
    hide-identity: yes 
    hide-version: yes
    harden-glue: yes
    harden-dnssec-stripped: yes 

    use-caps-for-id: yes
    prefetch: yes
    rrset-roundrobin: yes
    cache-max-ttl: 86400
    cache-min-ttl: 3600

remote-control:
<span class="hljs-comment"># Enable remote control interface with unbound-control</span>
    control-enable: no
</code></pre>
<h2 id="heading-add-a-records-example-a-recordsconf">📝 Add A Records (Example <code>a-records.conf</code>)</h2>
<pre><code class="lang-bash"><span class="hljs-comment"># Unbound DNS Configuration for Home Lab</span>
<span class="hljs-comment"># This file contains local DNS records for the home lab environment.</span>
<span class="hljs-comment"># It is included in the main unbound configuration file.</span>

<span class="hljs-comment"># Local Zone</span>
  local-zone: <span class="hljs-string">"home.lab."</span> static

<span class="hljs-comment"># A Records</span>
  local-data: <span class="hljs-string">"pve01.home.lab. IN A 192.168.1.190"</span>
  local-data: <span class="hljs-string">"pi.home.lab. IN A 192.168.1.10"</span>
  local-data: <span class="hljs-string">"nfs.home.lab. IN A 192.168.1.110"</span>
  local-data: <span class="hljs-string">"k8scontrol01.home.lab. IN A 192.168.1.100"</span>
  local-data: <span class="hljs-string">"k8snode01.home.lab. IN A 192.168.1.101"</span>
  local-data: <span class="hljs-string">"k8snode02.home.lab. IN A 192.168.1.102"</span>

<span class="hljs-comment"># PTR Record</span>
  local-data-ptr: <span class="hljs-string">"192.168.1.190 pve01.home.lab"</span>
  local-data-ptr: <span class="hljs-string">"192.168.1.10 pi.home.lab"</span>
  local-data-ptr: <span class="hljs-string">"192.168.1.110 nfs.home.lab"</span>
  local-data-ptr: <span class="hljs-string">"192.168.1.100 k8scontrol01.home.lab"</span>
  local-data-ptr: <span class="hljs-string">"192.168.1.101 k8snode01.home.lab"</span>
  local-data-ptr: <span class="hljs-string">"192.168.1.102 k8snode02.home.lab"</span> 

<span class="hljs-comment"># CNAME Record</span>
  local-data: <span class="hljs-string">"storage.home.lab. IN CNAME nfs.home.lab."</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749921665506/fd8b0bdf-e60d-40ab-aec7-21151d3da67c.png" alt /></p>
<h2 id="heading-configure-raspberry-pis-dns-resolver">🧠 Configure Raspberry Pi’s DNS Resolver</h2>
<p>Currently, <code>systemd-resolved</code> is the DNS resolver for the host system, i.e. the RPi.</p>
<p>We need to stop <code>systemd-resolved</code> to free up <code>port 53</code>, so the Unbound container can use it.</p>
<pre><code class="lang-bash">sudo systemctl <span class="hljs-built_in">disable</span> --now systemd-resolved
sudo rm -f /etc/resolv.conf
sudo tee /etc/resolv.conf &gt; /dev/null &lt;&lt;EOF
nameserver 192.168.1.10
options edns0 trust-ad
search home.lab
EOF
</code></pre>
<h2 id="heading-testing">🧪 Testing</h2>
<p>Build the Docker image and start Docker Compose</p>
<pre><code class="lang-bash">git pull
docker compose build
docker compose up -d
</code></pre>
<p>Check the container is up</p>
<pre><code class="lang-bash">docker ps
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749920275010/3834186c-4e6d-41fa-aaaf-d72b9cc98415.png" alt class="image--center mx-auto" /></p>
<p>Test DNS resolution</p>
<pre><code class="lang-bash">$ dig pve01.home.lab

; &lt;&lt;&gt;&gt; DiG 9.20.4-3ubuntu1.1-Ubuntu &lt;&lt;&gt;&gt; pve01.home.lab
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 16775
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;pve01.home.lab.            IN    A

;; ANSWER SECTION:
pve01.home.lab.        975    IN    A    192.168.1.190

;; Query time: 0 msec
;; SERVER: 127.0.0.53<span class="hljs-comment">#53(127.0.0.53) (UDP)</span>
;; WHEN: Sat Jun 14 22:25:18 IST 2025
;; MSG SIZE  rcvd: 59
</code></pre>
<p>You should see <code>status: NOERROR</code> and an <code>ANSWER SECTION</code> with your configured IP</p>
<p>Every time a new DNS record is committed to Git, log in to the RPi and run:</p>
<pre><code class="lang-bash">git pull
docker compose restart &lt;unbound_service_name&gt;
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750322627762/54dc616b-b871-4cf0-8243-d9625cc67f06.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-auto-restart-container-on-config-change">🔄 Auto-Restart Container on Config Change</h2>
<p>This step ensures that any update you commit and pull triggers a container restart.</p>
<h2 id="heading-pull-git-updates">Pull Git Updates</h2>
<p><code>git-pull.sh</code></p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/bash</span>
<span class="hljs-built_in">cd</span> ~/unbound-dns
git pull origin main
</code></pre>
<p>Set up a cron job (e.g. via <code>crontab -e</code>) to automate the pull every 5 minutes:</p>
<pre><code class="lang-bash">*/5 * * * * /home/pi/unbound-dns/git-pull.sh
</code></pre>
<h2 id="heading-setup-the-watchdog">Set up the watchdog</h2>
<p><code>watch-and-restart.sh</code></p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/bash</span>
CONFIG_DIR=<span class="hljs-string">"./unbound"</span>

inotifywait -m -r -e modify,create,delete --format <span class="hljs-string">'%w%f'</span> <span class="hljs-string">"<span class="hljs-variable">$CONFIG_DIR</span>"</span> | <span class="hljs-keyword">while</span> <span class="hljs-built_in">read</span> file; <span class="hljs-keyword">do</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"[INFO] Change detected in <span class="hljs-variable">$file</span>"</span>
    docker compose restart unbound
<span class="hljs-keyword">done</span>
</code></pre>
<p>Make it executable:</p>
<pre><code class="lang-bash">chmod 700 watch-and-restart.sh
</code></pre>
<p>Start the watchdog</p>
<pre><code class="lang-bash">
nohup ./watch-and-restart.sh &gt; ~/watcher.log 2&gt;&amp;1 &amp;
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749926147711/4d870197-91b6-4721-9270-df9e385b2a9e.png" alt class="image--center mx-auto" /></p>
<p>To make the <code>watch-and-restart.sh</code> script <strong>run in the background on reboot</strong>, a reliable approach is to set it up as a <code>systemd</code> service.</p>
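<p>A sketch of such a unit might look like this (the unit name, paths, and user are assumptions based on the project layout above, adjust them to your setup):</p>
<pre><code class="lang-bash"># /etc/systemd/system/unbound-watchdog.service (sketch; paths/user are assumed)
[Unit]
Description=Restart Unbound container on config change
After=docker.service
Requires=docker.service

[Service]
User=pi
WorkingDirectory=/home/pi/unbound-dns
ExecStart=/home/pi/unbound-dns/watch-and-restart.sh
Restart=always

[Install]
WantedBy=multi-user.target
</code></pre>
<p>Then enable it with <code>sudo systemctl enable --now unbound-watchdog.service</code>.</p>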
<p>But I want to completely scrap the watchdog and implement a pipeline-based solution that is more robust and independent. This approach allows for automated workflows that can be triggered by events such as code commits or pull requests.</p>
<p><strong>I am working on a solution using GitHub Actions &amp; a container-based ARM self-hosted runner, which I will share here very soon.</strong></p>
]]></content:encoded></item><item><title><![CDATA[Step-by-Step Guide to Enabling AWS CLI Autocomplete installed via Snap]]></title><description><![CDATA[I installed the aws-cli via snap using
sudo snap install aws-cli --classic

If you’ve also installed AWS CLI using Snap on Ubuntu or other Linux distributions, you may have stumbled upon the same issue that the aws_completer is not placed in the usua...]]></description><link>https://automatestack.dev/step-by-step-guide-to-enabling-aws-cli-autocomplete-installed-via-snap</link><guid isPermaLink="true">https://automatestack.dev/step-by-step-guide-to-enabling-aws-cli-autocomplete-installed-via-snap</guid><category><![CDATA[snapcraft]]></category><category><![CDATA[AWS]]></category><category><![CDATA[awscli]]></category><category><![CDATA[aws cli]]></category><category><![CDATA[Ubuntu]]></category><category><![CDATA[Autocomplete]]></category><dc:creator><![CDATA[Sumit Sur]]></dc:creator><pubDate>Sun, 31 Mar 2024 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749385182930/6e5e492f-8c54-4164-885f-52dd0ca3dd61.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I installed the aws-cli via snap using</p>
<pre><code class="lang-bash">sudo snap install aws-cli --classic
</code></pre>
<p>If you’ve also installed AWS CLI using <strong>Snap</strong> on Ubuntu or another Linux distribution, you may have stumbled upon the same issue: the <code>aws_completer</code> binary is not placed at the usual <code>/usr/local/bin/aws_completer</code>.</p>
<h2 id="heading-locate-the-aws-completer">🔗 <strong>Locate the AWS completer</strong></h2>
<p>Let's try to locate the path of <code>aws_completer</code>.</p>
<p><code>which aws_completer</code> returns nothing, so locate it with <code>find</code>:</p>
<pre><code class="lang-bash">$ find / -name aws_completer 2&gt;/dev/null

/snap/aws-cli/1441/aws/dist/aws_completer
/snap/aws-cli/1441/bin/aws_completer
/snap/aws-cli/1443/aws/dist/aws_completer
/snap/aws-cli/1443/bin/aws_completer
</code></pre>
<p>Snap packages use internal versioned folders (e.g. <code>1441</code>, <code>1443</code>) that change on update, making hard-coded paths impractical. This makes configuring autocomplete tricky.</p>
<p>However, Snap maintains a <code>/snap/aws-cli/current</code> symlink to the active version. Link this to a location in your <code>$PATH</code> instead.</p>
<h2 id="heading-1-create-a-stable-symlink-in-your-path">🛠️ 1. Create a stable symlink in your PATH</h2>
<p>Use the snap's <code>current</code> alias to avoid version-specific paths.</p>
<p>This ensures autocomplete continues working even after AWS CLI updates.</p>
<pre><code class="lang-bash">sudo ln -sf /snap/aws-cli/current/bin/aws_completer /usr/<span class="hljs-built_in">local</span>/bin/aws_completer
</code></pre>
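<p>To see why this survives updates, here is a throwaway-directory simulation of the <code>current</code> indirection (the paths are stand-ins for <code>/snap</code>, not real snap directories):</p>
<pre><code class="lang-bash">tmp=$(mktemp -d)
mkdir -p "$tmp/aws-cli/1443/bin" "$tmp/aws-cli/1444/bin"
touch "$tmp/aws-cli/1443/bin/aws_completer" "$tmp/aws-cli/1444/bin/aws_completer"

ln -s "$tmp/aws-cli/1443" "$tmp/aws-cli/current"                     # snap manages this link
ln -s "$tmp/aws-cli/current/bin/aws_completer" "$tmp/aws_completer"  # our stable link

readlink -f "$tmp/aws_completer"      # resolves through "current" to .../1443/bin/aws_completer

ln -sfn "$tmp/aws-cli/1444" "$tmp/aws-cli/current"                   # simulate a snap refresh
readlink -f "$tmp/aws_completer"      # same stable link now resolves to .../1444/bin/aws_completer
</code></pre>
<p>When Snap repoints <code>current</code> at the next revision, the link in <code>/usr/local/bin</code> keeps resolving correctly without any changes on your side.</p>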
<h2 id="heading-step-2-add-completion-setup-to-your-shell">✍️ Step 2: Add completion setup to your shell</h2>
<h3 id="heading-for-bash-bashrc">For Bash (<code>~/.bashrc</code>):</h3>
<pre><code class="lang-bash">complete -C <span class="hljs-string">'/usr/local/bin/aws_completer'</span> aws
</code></pre>
<h3 id="heading-for-zsh-zshrc">For Zsh (<code>~/.zshrc</code>):</h3>
<pre><code class="lang-bash"><span class="hljs-built_in">autoload</span> bashcompinit &amp;&amp; bashcompinit
<span class="hljs-built_in">autoload</span> -Uz compinit &amp;&amp; compinit
complete -C <span class="hljs-string">'/usr/local/bin/aws_completer'</span> aws
</code></pre>
<h2 id="heading-step-3-reload-your-shell">🔄 Step 3: Reload your shell</h2>
<p>Apply changes:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">source</span> ~/.bashrc  <span class="hljs-comment"># or source ~/.zshrc</span>
</code></pre>
<p>Press <strong>Tab</strong> after typing a partial <code>aws</code> command:</p>
<pre><code class="lang-bash">aws s3 &lt;TAB&gt;&lt;TAB&gt;
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749384291235/ffc94c25-2a77-4996-aa71-c50ebbfaddd3.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-step-4-optional-activate-aws-cli-v2-autoprompt">🤖 Step 4 (Optional): Activate AWS CLI v2 Auto‑Prompt</h2>
<p>Beyond tab completion, AWS CLI v2 offers an interactive <strong>auto‑prompt</strong> that guides your input after you press <strong>Enter</strong>.</p>
<p>Enable it permanently (default profile):</p>
<pre><code class="lang-bash">aws configure <span class="hljs-built_in">set</span> cli_auto_prompt on-partial
</code></pre>
<p>Or for just your session:</p>
<pre><code class="lang-bash">export AWS_CLI_AUTO_PROMPT=on-partial
</code></pre>
<p>This mode is especially helpful when you're exploring less familiar commands.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749384577530/9b92fe56-0645-4594-adc6-1e99fbea1b38.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item></channel></rss>