<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Prémices Kamasuwa]]></title><description><![CDATA[Full-stack Software Engineer.]]></description><link>https://nkpremices.com/</link><image><url>https://nkpremices.com/favicon.png</url><title>Prémices Kamasuwa</title><link>https://nkpremices.com/</link></image><generator>Ghost 4.48</generator><lastBuildDate>Wed, 18 Mar 2026 09:20:30 GMT</lastBuildDate><atom:link href="https://nkpremices.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Building a Cron Job Scheduler Using Redis and Node.js]]></title><description><![CDATA[Discover how to build an advanced event-driven scheduler using Redis’ key expiration feature for real-time task management.]]></description><link>https://nkpremices.com/building-a-cron-job-scheduler-using-redis-and-node-js/</link><guid isPermaLink="false">66d2fa91fe48690568456873</guid><category><![CDATA[2024]]></category><category><![CDATA[Blog]]></category><category><![CDATA[Tech]]></category><category><![CDATA[SystemDesign]]></category><category><![CDATA[Typescript]]></category><category><![CDATA[Javascript]]></category><category><![CDATA[Redis]]></category><category><![CDATA[NodeJS]]></category><category><![CDATA[Scheduling]]></category><category><![CDATA[Cron]]></category><category><![CDATA[TaskQueue]]></category><dc:creator><![CDATA[Prémices N. 
Kamasuwa]]></dc:creator><pubDate>Sat, 31 Aug 2024 12:54:51 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2024/08/Screenshot-2024-08-31-at-15.45.59.png" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2024/08/Screenshot-2024-08-31-at-15.45.59.png" alt="Building a Cron Job Scheduler Using Redis and Node.js"><p>Authors: &#xA0;<a href="https://github.com/truestbyheart">Daniel Charles Mwangila</a> and <a href="https://nkpremices.com/about">Pr&#xE9;mices Kamasuwa </a></p><!--kg-card-begin: html--><div style="display: flex; flex-direction: row; gap: 1.5rem;">
  <div class="github-card" data-github="truestbyheart" data-width="400" data-height="151" data-theme="default"></div>
<script src="//cdn.jsdelivr.net/github-cards/latest/widget.js"></script>
<div class="github-card" data-github="nkpremices" data-width="400" data-height="151" data-theme="default"></div>
</div><!--kg-card-end: html--><h2 id="introduction">Introduction</h2><p>In this article, we&apos;ll explore how to build a robust and flexible scheduling system by combining the strengths of Redis and Node.js. This approach not only replicates the functionality of traditional cron jobs but also enhances it by leveraging Redis to manage task scheduling through expiration events. Redis, known for its high-speed in-memory data storage, becomes an optimal solution for real-time applications that demand precise and efficient task execution.</p><p>Throughout this guide, we&apos;ll walk through the development of an event-driven scheduling system that utilizes Redis&apos; key expiration feature to trigger tasks at predefined intervals. By the end, you will have a deeper understanding of how Redis can be harnessed for more complex real-time scenarios, enabling you to build systems that go beyond basic task scheduling.</p><h2 id="prerequisites"><strong>Prerequisites</strong></h2><p>To follow along with this tutorial, ensure you have the following:</p><ul><li><strong>Node.js</strong> is installed on your local machine. If not, follow <a href="https://nodejs.org/en/">this guide to install Node.js</a>.</li><li><strong>Redis</strong> is installed locally or accessible via a remote server. You can find installation instructions <a href="https://redis.io/download">here</a>.</li><li>Familiarity with <strong>TypeScript</strong> and <strong>Node.js</strong>. 
If you need to brush up, consider reviewing <a href="https://www.typescriptlang.org/docs/">the TypeScript documentation</a> and <a href="https://nodejs.org/en/docs/">Node.js documentation</a>.</li><li>A code editor, such as <a href="https://code.visualstudio.com/download">Visual Studio Code</a>.</li><li>Basic knowledge of Redis operations, particularly how to set and get keys and configure keyspace notifications.</li></ul><h2 id="step-1-%E2%80%94-setting-up-your-nodejs-application"><strong>Step 1 &#x2014; Setting Up Your Node.js Application</strong></h2><p>Let&apos;s start by setting up the Node.js environment for our project.</p><h3 id="11-create-a-new-project-directory"><strong>1.1. Create a New Project Directory</strong></h3><p>First, create a directory for your project and navigate into it:</p><!--kg-card-begin: markdown--><pre><code class="language-shell">mkdir redis-scheduler
cd redis-scheduler
</code></pre>
<!--kg-card-end: markdown--><h3 id="12-initialize-a-new-nodejs-project"><strong>1.2. Initialize a New Node.js Project</strong></h3><p>Initialize a new Node.js project using the following command:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">npm init -y
</code></pre>
<!--kg-card-end: markdown--><p>This command creates a <code>package.json</code> file with default settings.</p><h3 id="13-install-necessary-nodejs-packages"><strong>1.3. Install Necessary Node.js Packages</strong></h3><p>Install the required packages for this tutorial:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">npm install redis dotenv date-fns
</code></pre>
<!--kg-card-end: markdown--><p>These packages include:</p><ul><li><strong>redis</strong>: For interacting with Redis.</li><li><strong>dotenv</strong>: To manage environment variables.</li><li><strong>date-fns</strong>: For date manipulation.</li></ul><p>Next, install the development dependencies:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">npm install @types/node ts-node typescript
</code></pre>
<!--kg-card-end: markdown--><p>These packages will help you work with TypeScript and Node.js more effectively.</p><h3 id="14-initialize-a-typescript-project"><strong>1.4. Initialize a TypeScript Project</strong></h3><p>Initialize a TypeScript configuration file:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">tsc --init
</code></pre>
<!--kg-card-end: markdown--><p>This command creates a <code>tsconfig.json</code> file in your project root, enabling TypeScript support.</p><h3 id="15-project-structure"><strong>1.5. Project Structure</strong></h3><p>To organize your code effectively, set up the following directory structure:</p><pre><code class="language-bash">redis-scheduler/
&#x2502;
&#x251C;&#x2500;&#x2500; src/
&#x2502;   &#x251C;&#x2500;&#x2500; interfaces/
&#x2502;   &#x2502;   &#x2514;&#x2500;&#x2500; Task.ts
&#x2502;   &#x251C;&#x2500;&#x2500; utils/
&#x2502;   &#x2502;   &#x2514;&#x2500;&#x2500; redisClient.ts
&#x2502;   &#x251C;&#x2500;&#x2500; handlers/
&#x2502;   &#x2502;   &#x251C;&#x2500;&#x2500; classHandler.ts
&#x2502;   &#x2502;   &#x2514;&#x2500;&#x2500; functionHandler.ts
&#x2502;   &#x251C;&#x2500;&#x2500; scheduler.ts
&#x2502;   &#x2514;&#x2500;&#x2500; listener.ts
&#x2502;
&#x251C;&#x2500;&#x2500; .env
&#x251C;&#x2500;&#x2500; package.json
&#x251C;&#x2500;&#x2500; tsconfig.json
&#x2514;&#x2500;&#x2500; README.md</code></pre><p>This structure helps keep your project organized, making it easier to maintain and extend in the future.</p><h2 id="step-2-%E2%80%94-implementing-the-task-interface"><strong>Step 2 &#x2014; Implementing the Task Interface</strong></h2><p>Create an interface to define the structure of a task. This interface ensures that every task follows a consistent format, which is crucial for the scheduler to function correctly.</p><p>Create a new file <code>src/interfaces/Task.ts</code>:</p><!--kg-card-begin: markdown--><pre><code class="language-javascript">export interface ExecutionPath {
    file_path: string;   // The path to the file containing the task&apos;s function
    class_name: string;  // The class name where the function resides
    function_name: string; // The specific function to execute
}

export interface Task {
    task_id: string;     // Unique identifier for the task
    title: string;       // Human-readable name for the task
    interval: string;    // The interval at which the task should run (e.g., &apos;5m&apos;, &apos;1h&apos;)
    last_run?: Date;     // The last time the task was executed
    next_run: Date;      // The next scheduled run time
    execution_path: ExecutionPath; // Details on where and what to execute
}
</code></pre>
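To make the shape concrete, here is a hypothetical task object that satisfies this interface (the handler file and class names below are placeholders for illustration, not files the tutorial has created yet; the interfaces are repeated so the snippet is self-contained):

```typescript
interface ExecutionPath {
    file_path: string;
    class_name: string;
    function_name: string;
}

interface Task {
    task_id: string;
    title: string;
    interval: string;
    last_run?: Date;
    next_run: Date;
    execution_path: ExecutionPath;
}

const exampleTask: Task = {
    task_id: "3f2c8e1a-1111-2222-3333-444455556666", // placeholder UUID
    title: "Example Task",
    interval: "5m", // run every 5 minutes
    next_run: new Date(),
    execution_path: {
        file_path: "exampleHandler",  // placeholder file in src/handlers
        class_name: "ExampleHandler", // placeholder class
        function_name: "execute",     // placeholder method
    },
};

// Tasks are stored in Redis as JSON strings, so they must survive a round trip
const serialized = JSON.stringify(exampleTask);
const revived = JSON.parse(serialized);
console.log(revived.interval); // -> "5m"
```

Note that `Date` fields serialize to ISO strings, so `next_run` comes back from Redis as a string; this is harmless here because the scheduler recomputes run times from the `interval` string.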
<!--kg-card-end: markdown--><h3 id="explanation-of-task-interface-fields"><strong>Explanation of Task Interface Fields:</strong></h3><ul><li><strong>task_id</strong>: A unique identifier for the task, ensuring that each task can be tracked separately.</li><li><strong>title</strong>: A descriptive name for the task. Useful for logging and debugging.</li><li><strong>interval</strong>: Specifies how frequently the task should run. This could be in seconds (<code>s</code>), minutes (<code>m</code>), hours (<code>h</code>), days (<code>d</code>), or weeks (<code>w</code>).</li><li><strong>last_run</strong>: An optional field that records the last execution time, useful for monitoring.</li><li><strong>next_run</strong>: The next time the task is scheduled to run.</li><li><strong>execution_path</strong>: Specifies where to find the function to execute, including file path, class, and function names.</li></ul><h2 id="step-3-%E2%80%94-implementing-the-redis-client"><strong>Step 3 &#x2014; Implementing the Redis Client</strong></h2><p>To interact with Redis, we need to set up a client. This client will be used to set and get tasks and listen for expiration events.</p><p>Create a new file <code>src/utils/redisClient.ts</code>:</p><!--kg-card-begin: markdown--><pre><code class="language-javascript">import { createClient } from &quot;redis&quot;;
import dotenv from &quot;dotenv&quot;;
dotenv.config();

// The Redis connection URI is read from the REDIS_URI environment variable
export const redisClient = async () =&gt; {
    const client = createClient({ url: process.env.REDIS_URI })
        .on(&apos;error&apos;, (err) =&gt; console.error(`Redis Client Error: ${err}`))
        .on(&apos;connect&apos;, () =&gt; console.info(&apos;Connected to Redis&apos;))
        .on(&apos;ready&apos;, () =&gt; console.info(&apos;Redis is ready&apos;));

    await client.connect();
    return client;
};

// Function to retrieve data from Redis
export const getFromRedis = async (key: string): Promise&lt;{ [key: string]: any } | null&gt; =&gt; {
    const client = await redisClient();
    try {
        const value = await client.get(key);
        if (value) return JSON.parse(value);
        return null;
    } catch (error) {
        console.error(`Error getting key: ${key} from Redis: ${error}`);
        return null;
    } finally {
        await client.disconnect();
    }
}

// Function to store data in Redis with an optional expiration time
export const setToRedis = async (key: string, value: string, expireIn?: number): Promise&lt;void&gt; =&gt; {
    const client = await redisClient();
    try {
        await client.set(key, value);
        if (expireIn) {
            await client.expire(key, expireIn);
        }
    } catch (error) {
        console.error(`Error setting key: ${key} to Redis: ${error}`);
    } finally {
        await client.disconnect();
    }
}
</code></pre>
<!--kg-card-end: markdown--><h3 id="explanation"><strong>Explanation:</strong></h3><ul><li><strong>redisClient</strong>: Initializes a connection to Redis and manages connection events.</li><li><strong>getFromRedis</strong>: Fetches a value from Redis by key and parses it from JSON.</li><li><strong>setToRedis</strong>: Sets a value in Redis with an optional expiration time, which is crucial for scheduling tasks to run at specific intervals.</li></ul><h2 id="step-4-%E2%80%94-implementing-the-scheduler-class"><strong>Step 4 &#x2014; Implementing the Scheduler Class</strong></h2><p>The scheduler class will handle scheduling logic, including calculating the next run time and executing tasks.</p><p>Create a new file <code>src/scheduler.ts</code>:</p><!--kg-card-begin: markdown--><pre><code class="language-javascript">import { addSeconds, addMinutes, addHours, addDays, addWeeks } from &apos;date-fns&apos;;
import path from &apos;path&apos;;
import { setToRedis, getFromRedis } from &apos;./utils/redisClient&apos;;
import { Task } from &apos;./interfaces/Task&apos;;

export class Scheduler {

    // Calculate the next run time based on the interval string
    private calculateNextRun(interval: string): { next_run: Date; interval: number } {
        const now = new Date();
        const match = interval.match(/^(\d+)(s|m|h|d|w)$/);

        if (!match) {
            throw new Error(&apos;Invalid interval format&apos;);
        }

        const value = parseInt(match[1], 10);
        const unit = match[2];
        let next_run;
        const seconds_per_unit = { s: 1, m: 60, h: 3600, d: 86400, w: 604800 };

        switch (unit) {
            case &apos;s&apos;: next_run = addSeconds(now, value); break;
            case &apos;m&apos;: next_run = addMinutes(now, value); break;
            case &apos;h&apos;: next_run = addHours(now, value); break;
            case &apos;d&apos;: next_run = addDays(now, value); break;
            case &apos;w&apos;: next_run = addWeeks(now, value); break;
            default: throw new Error(&apos;Unsupported time unit&apos;);
        }

        return {
            next_run: next_run,
            interval: value * seconds_per_unit[unit],
        };
    }

    // Update the next run time in Redis
    private async updateNextRun(task: Task) {
        const { next_run, interval } = this.calculateNextRun(task.interval);
        task.last_run = task.next_run;
        task.next_run = next_run;
        // Keep the persistent task record current, then re-arm the expiring shadow key
        await setToRedis(task.task_id, JSON.stringify(task));
        await setToRedis(`SCHEDS:${task.task_id}`, JSON.stringify(task), interval);
    }

    // Execute a function dynamically based on its path
    private async executeFunctionFromPath(task: Task) {
        const { file_path, class_name, function_name } = task.execution_path;
        try {
            const modulePath = path.resolve(&apos;./src/handlers&apos;, `${file_path}`);
            console.info(`Importing module from path: ${modulePath}`);
            const module = await import(modulePath);

            // Get the class and create an instance
            const instance = new module[class_name]();
            if (typeof instance[function_name] !== &apos;function&apos;) {
                throw new Error(`Function ${function_name} not found in class ${class_name}`);
            }

            // Execute the function
            console.info(&apos;Executing scheduled task function...&apos;);
            await instance[function_name]();
            console.info(&apos;Function executed successfully&apos;);
        } catch (error) {
            console.error(&apos;Error executing function from path:&apos;, error);
        }
    }

    // Main function to run the scheduler
    async runScheduler(scheds_key: string) {
        const task_id = scheds_key.split(&apos;:&apos;)[1];
        // The shadow key has just expired and is gone from Redis, so the task
        // must be read from its persistent key (the bare task id)
        const task = await getFromRedis(task_id) as Task;

        if (!task) {
            console.error(`Task with id: ${task_id} not found`);
        } else {
            await this.executeFunctionFromPath(task);
            await this.updateNextRun(task);
        }
    }
}

export const scheduler = new Scheduler();
</code></pre>
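The interval-to-seconds conversion is the part of `calculateNextRun` that feeds Redis' `expire`. Here is the same parsing rule as a standalone sketch (a hypothetical helper, written without date-fns so it can run in isolation):

```typescript
const SECONDS_PER_UNIT: { [unit: string]: number } = {
    s: 1, m: 60, h: 3600, d: 86400, w: 604800,
};

// Parse strings like "5m" or "2h" into a TTL in seconds
function intervalToSeconds(interval: string): number {
    const match = interval.match(/^(\d+)(s|m|h|d|w)$/);
    if (!match) {
        throw new Error("Invalid interval format");
    }
    return parseInt(match[1], 10) * SECONDS_PER_UNIT[match[2]];
}

console.log(intervalToSeconds("5m")); // 300
console.log(intervalToSeconds("2h")); // 7200
```

This TTL is what makes the shadow key expire at the right moment, which in turn drives the whole event loop.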
<!--kg-card-end: markdown--><h3 id="explanation-1"><strong>Explanation:</strong></h3><ul><li><strong>calculateNextRun</strong>: Calculates the next run time based on the given interval (e.g., &quot;5m&quot; for 5 minutes). It also calculates the interval in seconds for Redis&#x2019; <code>expire</code> function.</li><li><strong>updateNextRun</strong>: Updates the task&apos;s next run time in Redis to ensure it triggers again at the correct interval.</li><li><strong>executeFunctionFromPath</strong>: Dynamically loads and executes a function from a specified file and class path, providing flexibility in what tasks can be scheduled.</li><li><strong>runScheduler</strong>: The main function that orchestrates the retrieval and execution of scheduled tasks, then updates their next run time.</li></ul><h2 id="step-5-%E2%80%94-implementing-a-sample-task-handler"><strong>Step 5 &#x2014; Implementing a Sample Task Handler</strong></h2><p>For demonstration purposes, let&apos;s create a simple handler that outputs &quot;Hello from the handler&quot; to the console after a 5-second delay.</p><p>Create a new file <code>src/handlers/classHandler.ts</code>:</p><!--kg-card-begin: markdown--><pre><code class="language-javascript">export class HelloHandler {
    // Simulate a long-running task: wait 5 seconds, then log a message
    async execute(): Promise&lt;void&gt; {
        await new Promise((resolve) =&gt; setTimeout(resolve, 5000));
        console.log(&apos;Hello from the handler&apos;);
    }
}
</code></pre>
<!--kg-card-end: markdown--><h3 id="explanation-2"><strong>Explanation:</strong></h3><ul><li><strong>HelloHandler</strong>: A class with an <code>execute</code> method that simulates a delayed task by waiting 5 seconds before logging a message to the console.</li></ul><h2 id="step-6-%E2%80%94-implementing-the-redis-listener"><strong>Step 6 &#x2014; Implementing the Redis Listener</strong></h2><p>The Redis listener monitors key expiration events and triggers the scheduler when a key expires.</p><p>Create a new file <code>src/listener.ts</code>:</p><!--kg-card-begin: markdown--><pre><code class="language-javascript">import { redisClient } from &apos;./utils/redisClient&apos;;
import { scheduler } from &apos;./scheduler&apos;;

(async () =&gt; {
    const client = await redisClient();

    // Enable keyspace notifications for expired-key events
    await client.configSet(&apos;notify-keyspace-events&apos;, &apos;Ex&apos;);

    const sub = client.duplicate();
    await sub.connect();
    const expired_subKey = &apos;__keyevent@0__:expired&apos;;

    // Listen for expired-key events in database 0
    await sub.pSubscribe(expired_subKey, async (key) =&gt; {
        console.info(`[i] Key expired: ${key}`);
        await scheduler.runScheduler(key);
    });

    console.info(&apos;Redis listener set up and waiting for events...&apos;);
})();
</code></pre>
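If you prefer not to change server configuration from application code (the `configSet` call above requires the client to be allowed to run `CONFIG`), the same setting can be applied with redis-cli or persisted in `redis.conf`. In the `Ex` flag, `E` enables key-event channel notifications and `x` enables expired-key events, which is exactly what the `__keyevent@0__:expired` channel carries:

```shell
# Enable expired-key event notifications (equivalent to the configSet call in listener.ts)
redis-cli config set notify-keyspace-events Ex

# Confirm the setting took effect
redis-cli config get notify-keyspace-events
```

Note that keyspace notifications are off by default and are not persisted across restarts unless set in `redis.conf`.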
<!--kg-card-end: markdown--><h3 id="explanation-3"><strong>Explanation:</strong></h3><ul><li><strong>Redis Keyspace Notifications</strong>: Configures Redis to notify when keys expire. This is essential for our scheduling logic as Redis uses the <code>EXPIRE</code> command to manage key expiration.</li><li><strong>Listener Setup</strong>: Subscribes to expiration events and triggers the <code>runScheduler</code> function, which handles executing the scheduled task.</li></ul><h2 id="step-7-%E2%80%94-running-and-testing-the-scheduler"><strong>Step 7 &#x2014; Running and Testing the Scheduler</strong></h2><h3 id="71-configure-environment-variables"><strong>7.1. Configure Environment Variables</strong></h3><p>Ensure you have your Redis connection properly configured. Create a <code>.env</code> file in the project root if you haven&apos;t done so already, and add your Redis URI:</p><pre><code>REDIS_URI=redis://localhost:6379

</code></pre><p>Make sure to replace <code>localhost:6379</code> with the appropriate host and port if your Redis instance is hosted elsewhere.</p><h3 id="72-schedule-a-task-using-typescript-code"><strong>7.2. Schedule a Task Using TypeScript Code</strong></h3><p>Instead of using the Redis CLI, we will use a TypeScript script to add a task to Redis. This approach provides more flexibility and demonstrates how to automate task scheduling programmatically.</p><h3 id="step-by-step-guide-to-scheduling-a-task-via-typescript"><strong>Step-by-Step Guide to Scheduling a Task via TypeScript:</strong></h3><ol><li><strong>Create a TypeScript Script to Add a Task</strong>:</li></ol><p>Create a new file named <code>src/scheduleTask.ts</code> to automate the process of scheduling tasks in Redis:</p><!--kg-card-begin: markdown--><pre><code class="language-javascript">import { v4 as uuidv4 } from &apos;uuid&apos;;
import { setToRedis } from &apos;./utils/redisClient&apos;;
import { Task } from &apos;./interfaces/Task&apos;;

// Function to schedule a new task
const scheduleTask = async () =&gt; {
    // Generate a unique UUID for the task
    const taskId = uuidv4();

    // Create a new task object
    const newTask: Task = {
        task_id: taskId,
        title: &apos;Hello World Task&apos;,
        interval: &apos;5s&apos;,  // Adjust as needed for your testing
        next_run: new Date(),
        execution_path: {
            file_path: &apos;classHandler&apos;,  // This corresponds to src/handlers/classHandler.ts
            class_name: &apos;HelloHandler&apos;,
            function_name: &apos;execute&apos;,
        }
    };

    // Convert the task object to a JSON string
    const taskJSON = JSON.stringify(newTask);

    try {
        // Store the task in Redis without expiration
        await setToRedis(taskId, taskJSON);

        // Set a shadow key with expiration
        const intervalInSeconds = 5; // Example interval in seconds; modify as needed
        await setToRedis(`SCHEDS:${taskId}`, taskId, intervalInSeconds);

        console.info(`Task scheduled successfully with ID: ${taskId}`);
    } catch (error) {
        console.error(&apos;Error scheduling task:&apos;, error);
    }
};

// Execute the function to schedule the task
scheduleTask();
</code></pre>
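Before starting the listener, you can verify what the script wrote to Redis directly (this assumes a local Redis instance; replace `<uuid>` with the task ID the script logged):

```shell
# The persistent task record, stored as JSON under the bare UUID
redis-cli get "<uuid>"

# Remaining time-to-live of the shadow key; should count down from 5
redis-cli ttl "SCHEDS:<uuid>"
```

A `ttl` of `-2` means the shadow key has already expired, which is the event the listener reacts to.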
<!--kg-card-end: markdown--><h3 id="explanation-of-the-script"><strong>Explanation of the Script:</strong></h3><ul><li><strong>UUID Generation</strong>: We use the <code>uuid</code> package to generate a unique identifier (<code>task_id</code>) for each task, ensuring no two tasks have the same ID.</li><li><strong>Task Object Creation</strong>: A <code>Task</code> object is created with all the necessary details, including the interval and execution path.</li><li><strong>Storing the Task in Redis</strong>:<ul><li>The task itself is stored under its bare UUID without an expiration time, so it remains available as the persistent record.</li><li>A shadow key, <code>SCHEDS:&lt;uuid&gt;</code>, is created with an expiration time corresponding to the task&apos;s interval. This shadow key will expire and trigger the listener.</li></ul></li><li><strong>Expiration Handling</strong>: The <code>intervalInSeconds</code> specifies when the shadow key expires, causing Redis to emit an event that our listener can capture.</li></ul><ol start="2"><li><strong>Install the <code>uuid</code> Package</strong>:</li></ol><p>If you haven&apos;t already installed the <code>uuid</code> package, you can add it to your project by running:</p><pre><code class="language-bash">npm install uuid

</code></pre><p>Also, install the TypeScript types for <code>uuid</code>:</p><pre><code class="language-bash">npm install @types/uuid --save-dev

</code></pre><ol start="3"><li><strong>Run the TypeScript Script</strong>:</li></ol><p>Execute the script to add the task to Redis:</p><pre><code class="language-bash">ts-node src/scheduleTask.ts

</code></pre><h3 id="73-start-the-redis-listener"><strong>7.3. Start the Redis Listener</strong></h3><p>With the task now scheduled, start the Redis listener in a separate terminal window to monitor for expiration events:</p><pre><code class="language-bash">ts-node src/listener.ts

</code></pre><h3 id="74-observe-the-output"><strong>7.4. Observe the Output</strong></h3><p>After the interval specified in the shadow key (<code>SCHEDS:&lt;uuid&gt;</code>) expires, you should see the task execution output:</p><pre><code>[i] Key expired: SCHEDS:&lt;uuid&gt;
Importing module from path: &lt;path-to-classHandler&gt;
Hello from the handler

</code></pre><p>This output confirms that the Redis listener detected the key expiration and successfully executed the task associated with that key.</p><h3 id="conclusion"><strong>Conclusion</strong></h3><p>Using a TypeScript script to add tasks to Redis programmatically gives you greater flexibility and control over task scheduling. This method demonstrates how to integrate Redis-based scheduling into your applications, leveraging UUIDs for unique task identification and shadow keys for triggering task execution. You can now expand on this foundation to build more complex scheduling systems or integrate it into a broader application framework.</p>]]></content:encoded></item><item><title><![CDATA[Ensuring Data Integrity in Real-time Synchronization: A Phoenix LiveView Tale]]></title><description><![CDATA[Tackling duplicate webhook calls in #Phoenix with a Deduplicator! Learn how we ensure data integrity and system reliability with Elixir's GenServer and ETS for efficient deduplication. Dive into my journey from challenge to solution. #ElixirLang #WebDevelopment]]></description><link>https://nkpremices.com/eliminating-duplicate-requests-in-phoenix-a-journey-to-idempotent-webhooks/</link><guid isPermaLink="false">65cfa1effe48690568456672</guid><category><![CDATA[Blog]]></category><category><![CDATA[2024]]></category><category><![CDATA[Distributed Systems]]></category><category><![CDATA[Elixir]]></category><category><![CDATA[Phoenix]]></category><category><![CDATA[GenServer]]></category><category><![CDATA[SystemDesign]]></category><dc:creator><![CDATA[Prémices N. 
Kamasuwa]]></dc:creator><pubDate>Fri, 16 Feb 2024 19:27:43 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2024/02/4f5b9baf-2153-41e9-ad61-bdd6b5a14d09.webp" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2024/02/4f5b9baf-2153-41e9-ad61-bdd6b5a14d09.webp" alt="Ensuring Data Integrity in Real-time Synchronization: A Phoenix LiveView Tale"><p>In the dynamic landscape of web applications, particularly those dealing with real-time data synchronization between systems, the challenge of handling duplicate requests is not merely theoretical. </p><p>This blog post delves into a complex issue I encountered while working on a Phoenix application designed for bidirectional data synchronization between two online enterprise platforms. My journey through identifying and solving the problem of duplicate webhook calls illustrates the importance of idempotent operations in maintaining data integrity and system reliability.</p><h2 id="the-challenge-of-synchronization">The Challenge of Synchronization</h2><p>The Phoenix-based OTP application was designed to facilitate seamless real-time data synchronization between two systems. Leveraging Elixir&apos;s GenServers for asynchronous data processing, I encountered an unexpected hurdle: <strong>duplicate webhook calls</strong>, threatening the central database&apos;s integrity by risking duplicate records.</p><h3 id="encountering-duplication">Encountering Duplication</h3><p>The discovery of webhook events firing multiple times for the same data underscored a significant threat to data consistency. My initial architecture, though efficient, lacked a robust mechanism to prevent the processing of duplicate requests.</p><h2 id="crafting-the-solution">Crafting the Solution</h2><p>The solution required a blend of creativity and technical expertise. 
My goal was to implement a deduplication mechanism that could reliably identify duplicate requests, ensuring that each unique piece of data was processed exactly once. So I created a module that I will be calling <code>Deduplicator</code> moving forward, just for the sake of reference.</p><h3 id="the-genesis-of-the-deduplicator-module">The Genesis of the Deduplicator module</h3><p><code>Deduplicator</code> emerged from the necessity to intercept and evaluate incoming webhook calls before proceeding with any data manipulation. The module&apos;s design was centered around generating unique identifiers for each request based on the request&apos;s payload. By serializing the <code>entity</code> part of the request and hashing it, I could create a distinctive fingerprint for each operation.</p><h4 id="tutorial-implementing-the-deduplicator-module">Tutorial: Implementing the Deduplicator module</h4><p>Here&apos;s how I brought <code>Deduplicator</code> to life, step by step:</p><ol><li><strong>Unique Identifier Generation</strong>: </li></ol><p>For each incoming request, serialize the <code>entity</code> payload into a JSON string and generate a SHA-256 hash. This hash serves as a unique identifier, encapsulating the essence of the request.<br>Below is a code snippet for the unique identifier generator function:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">defp generate_unique_id(entity) do
    encoded_entity =
      entity
      |&gt; Jason.encode!()

    :crypto.hash(:sha256, encoded_entity)
    |&gt; Base.encode16()
end
</code></pre>
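The useful property here is determinism: the same decoded payload always collapses to the same fingerprint. The sketch below illustrates this inline (for illustration only; in the module the logic lives inside the private `generate_unique_id/1`):

```elixir
# Deterministic fingerprinting: the same term in yields the same hash out
entity = %{"id" => 42, "name" => "Order"}

id1 = :crypto.hash(:sha256, Jason.encode!(entity)) |> Base.encode16()
id2 = :crypto.hash(:sha256, Jason.encode!(entity)) |> Base.encode16()

id1 == id2  # => true: identical payloads produce one identifier
```

Because the hash is computed over Jason's serialization of the decoded map rather than over the raw request body, two deliveries of the same payload produce the same identifier even if the raw JSON differs in whitespace.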
<!--kg-card-end: markdown--><p>2. <strong>Implementation of the <code>Deduplicator</code> Module</strong>:</p><p>This module, leveraging Elixir&apos;s GenServer and ETS (Erlang Term Storage), is designed to ensure idempotent operations, preventing duplicate data processing. Here&apos;s a deeper dive into its implementation and integration within our Phoenix application. </p><p><strong><u>The GenServer Foundation</u></strong></p><p>The <code>Deduplicator</code> module begins its life as a GenServer, a cornerstone of Elixir applications for maintaining state and executing background work asynchronously. Using GenServer allows <code>Deduplicator</code> to run continuously in the background, monitoring for duplicate requests.</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">defmodule Deduplicator do
  use GenServer

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  end
end
</code></pre>
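For the GenServer to run continuously it has to be started somewhere, typically under the application's supervision tree. A minimal sketch, assuming a standard Phoenix application module (`MyApp` and `MyAppWeb.Endpoint` are placeholder names):

```elixir
# In MyApp.Application (placeholder app name), add Deduplicator to the children:
children = [
  MyAppWeb.Endpoint,
  Deduplicator
]

Supervisor.start_link(children, strategy: :one_for_one)
```

`use GenServer` generates a default `child_spec/1`, so listing the module name is enough for the supervisor to call `start_link/1`.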
<!--kg-card-end: markdown--><p><strong><u>The <code>init/1</code> function</u></strong></p><p>It serves as the foundational setup for the <code>Deduplicator</code> module. Upon the GenServer&apos;s initialization, this function is called to perform essential setup tasks crucial for the module&apos;s operation.</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">def init(_) do
  :ets.new(:dedup_table, [:set, :public, :named_table])
  {:ok, %{}}
end
</code></pre>
<!--kg-card-end: markdown--><p><em>The line <code>:ets.new(:dedup_table, [:set, :public, :named_table])</code> is instrumental in establishing an Erlang Term Storage (ETS) table named <code>:dedup_table</code>. This table is configured with a few options:</em></p><ul><li><em><code>:set</code>: This option ensures that the table behaves as a set, meaning each entry is unique based on its key. This is crucial for our deduplication logic, as it allows us to store each request&apos;s unique identifier without duplicates.</em></li><li><em><code>:public</code>: This option makes the table accessible to all processes, enabling different parts of the application to query or update the deduplication status of requests.</em></li><li><em><code>:named_table</code>: This allows the table to be referenced by its name, <code>:dedup_table</code>, facilitating easier access throughout the application.</em></li></ul><p><strong><u>Marking Requests as Processed</u></strong></p><p>When a request is processed, its unique identifier is stored in the ETS table along with the current system time. This marks the request as processed, preventing future duplications.</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">def mark_as_processed(unique_id) do
  :ets.insert(:dedup_table, {unique_id, :erlang.system_time()})
end
</code></pre>
<!--kg-card-end: markdown--><p>The <code>mark_as_processed/1</code> function is a pivotal part of the <code>Deduplicator</code> module, encapsulating the mechanism that records the processing of requests to prevent duplicate handling. This function demonstrates an effective use of Elixir&apos;s Erlang Term Storage (ETS) to maintain the idempotency of operations within our application.</p><p>At the heart of this function lies the <code>:ets.insert/2</code> call, which adds a new record into the <code>:dedup_table</code> ETS table. Each record is a tuple consisting of two elements: the <code>unique_id</code> of the request and the current system time captured by <code>:erlang.system_time()</code>.</p><ul><li><em><strong>Unique Identifier</strong>: The <code>unique_id</code> serves as the key for the record. It is a hash derived from the request&apos;s payload, ensuring that each request can be uniquely identified based on its content. This uniqueness is crucial for detecting and preventing duplicate processing of the same request.</em></li><li><em><strong>Timestamp</strong>: The inclusion of the current system time as the second element of the tuple serves a dual purpose. First, it timestamps when the request was processed, providing traceability. Second, it facilitates the cleanup process, allowing the system to determine which records are old and should be removed based on their age.</em></li></ul><h4 id="checking-for-duplicates"><u>Checking for Duplicates</u></h4><p>Before processing any request, <code>Deduplicator</code> checks the ETS table to see if the request&apos;s unique identifier already exists, indicating it has been processed.</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">def already_processed?(unique_id) do
  case :ets.lookup(:dedup_table, unique_id) do
    [{^unique_id, _timestamp}] -&gt; true
    _ -&gt; false
  end
end
</code></pre>
<!--kg-card-end: markdown--><p>The <code>already_processed?/1</code> function is a critical component of the Deduplicator module, serving as the gatekeeper in the deduplication strategy. This function scrutinizes requests to determine if they have been processed before, thus preventing redundant operations on the same data. </p><p><em>Here&apos;s a closer look at its implementation and significance:</em></p><ul><li><em><strong>ETS Lookup</strong>: The function begins with an <code>:ets.lookup/2</code> call, querying the <code>:dedup_table</code> ETS table for a record matching the provided <code>unique_id</code>. This <code>unique_id</code> is a hash derived from the request&apos;s payload, ensuring each request can be uniquely identified.</em></li><li><em><strong>Match Found</strong>: If the lookup returns a tuple matching the <code>unique_id</code>, the function interprets this as the request having been processed before. The presence of this record in the table indicates that the specific data payload associated with this <code>unique_id</code> has already been handled, signaling the function to return <code>true</code>.</em></li><li><em><strong>No Match Found</strong>: Conversely, if no matching record is found in the ETS table, the function concludes that the request has not been processed previously and returns <code>false</code>. This outcome indicates that it is safe to proceed with processing the request, as there is no risk of duplicating effort or data.</em></li></ul><h4 id="cleaning-up-ets-to-save-up-on-memory-usage"><u>Cleaning up ETS to reduce memory usage</u></h4><p>In any application that relies on in-memory storage for rapid data access and manipulation, managing memory usage efficiently is paramount. This is particularly true for our Deduplicator module, which utilizes Erlang Term Storage (ETS) to keep track of processed requests and prevent duplicates. 
However, without proper management, the memory consumed by the ETS table could grow indefinitely, potentially degrading system performance over time. To address this concern, we&apos;ve implemented a cleanup mechanism designed to periodically remove old entries from the ETS table, thereby conserving memory and maintaining optimal performance.</p><h4 id="implementing-periodic-cleanup">Implementing Periodic Cleanup</h4><p>The cleanup process is orchestrated through two primary functions: <code>schedule_cleanup/0</code> and <code>handle_info/2</code>. Here&apos;s how they work together to ensure the ETS table remains efficient and does not grow unbounded:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">defp schedule_cleanup do
  Process.send_after(self(), :cleanup, @cleanup_interval)
end
</code></pre>
<!--kg-card-end: markdown--><ul><li><em><strong>Scheduling Cleanup</strong>: The <code>schedule_cleanup/0</code> function leverages <code>Process.send_after/3</code> to schedule a message (<code>:cleanup</code>) to be sent to the GenServer itself after a predefined interval (<code>@cleanup_interval</code>). This periodic messaging acts as a trigger for the cleanup operation, ensuring that the process is automatically repeated at regular intervals.</em></li></ul><h4 id="handling-the-cleanup-process">Handling the Cleanup Process</h4><p>When the GenServer receives the <code>:cleanup</code> message, it triggers the <code>handle_info/2</code> function, which is responsible for the actual cleanup logic:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">def handle_info(:cleanup, state) do
  current_time = :erlang.system_time(:millisecond)

  :ets.tab2list(:dedup_table)
  |&gt; Enum.each(fn
    {id, timestamp} when current_time - timestamp &gt; @ttl -&gt;
      :ets.delete(:dedup_table, id)

    _ -&gt;
      :noop
  end)

  schedule_cleanup()
  {:noreply, state}
end
</code></pre>
<!--kg-card-end: markdown--><ul><li><em><strong>Executing Cleanup</strong>: Upon receiving the <code>:cleanup</code> message, this function retrieves all entries from the <code>:dedup_table</code> ETS table and iterates over them. Each entry is assessed to determine if its timestamp (indicating when it was added to the table) is older than the allowed time-to-live (<code>@ttl</code>). If an entry is found to be older, it is removed from the table, freeing up the memory it consumed.</em></li><li><em><strong>Recurrence of Cleanup</strong>: After performing the cleanup, the function calls <code>schedule_cleanup/0</code> again to ensure that the cleanup operation continues to run at regular intervals, thus maintaining the ongoing efficiency of the ETS table.</em></li></ul><h4 id="integrating-cleanup-with-initialization">Integrating Cleanup with Initialization</h4><p>To kickstart the cleanup process when the <code>Deduplicator</code> GenServer is initialized, we include the <code>schedule_cleanup/0</code> call within the <code>init/1</code> function:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">def init(_) do
  :ets.new(:dedup_table, [:set, :public, :named_table])
  schedule_cleanup()
  {:ok, %{}}
end
</code></pre>
<!--kg-card-end: markdown--><ul><li><em><strong>Ensuring Immediate Effectiveness</strong>: By invoking <code>schedule_cleanup/0</code> during initialization, we ensure that the cleanup mechanism is active right from the start, preventing the ETS table from ever becoming a memory concern.</em></li></ul><p>Now, the final version of the <code>Deduplicator</code> module:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">defmodule Deduplicator do
  use GenServer

  @cleanup_interval :timer.minutes(1)
  @ttl :timer.hours(1)

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  end

  def init(_) do
    :ets.new(:dedup_table, [:set, :public, :named_table])
    schedule_cleanup()
    {:ok, %{}}
  end

  def mark_as_processed(unique_id) do
    :ets.insert(:dedup_table, {unique_id, :erlang.system_time(:millisecond)})
  end

  def already_processed?(unique_id) do
    case :ets.lookup(:dedup_table, unique_id) do
      [{^unique_id, _timestamp}] -&gt; true
      _ -&gt; false
    end
  end

  defp schedule_cleanup do
    Process.send_after(self(), :cleanup, @cleanup_interval)
  end

  def handle_info(:cleanup, state) do
    current_time = :erlang.system_time(:millisecond)

    :ets.tab2list(:dedup_table)
    |&gt; Enum.each(fn
      {id, timestamp} when current_time - timestamp &gt; @ttl -&gt;
        :ets.delete(:dedup_table, id)

      _ -&gt;
        :noop
    end)

    schedule_cleanup()
    {:noreply, state}
  end
  
  def generate_unique_id(entity) do
    encoded_entity =
      entity
      |&gt; Jason.encode!()

    :crypto.hash(:sha256, encoded_entity)
    |&gt; Base.encode16()
  end
end
</code></pre>
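Before wiring `Deduplicator` into an application, its core idea can be sanity-checked in isolation. The sketch below is an illustrative stand-alone script, not part of the module: it replays the hash-then-lookup-then-insert flow using only built-ins, substituting `:erlang.term_to_binary/1` for `Jason.encode!/1` so it runs without third-party dependencies, and the example map is an assumed payload.

```elixir
# Illustrative stand-alone dedup flow (assumed payload; term_to_binary replaces Jason).
:ets.new(:dedup_table, [:set, :public, :named_table])

encoded = :erlang.term_to_binary(%{user: 1, action: "pay"})
unique_id = Base.encode16(:crypto.hash(:sha256, encoded))

# First sighting: the table has no entry for this id, so mark it as processed.
[] = :ets.lookup(:dedup_table, unique_id)
:ets.insert(:dedup_table, {unique_id, :erlang.system_time(:millisecond)})

# Second sighting: the lookup now matches, flagging the request as a duplicate.
[{^unique_id, _timestamp}] = :ets.lookup(:dedup_table, unique_id)
```

The bare pattern matches double as assertions here: if either lookup behaved unexpectedly, the script would raise a `MatchError`.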
<!--kg-card-end: markdown--><h2 id="utilizing-the-solution-in-a-phoenix-liveview-application">Utilizing the Solution in a Phoenix LiveView application</h2><h3 id="integrating-mydeduplicator-with-a-supervisor">Integrating Deduplicator with a Supervisor</h3><p>To ensure <code>Deduplicator</code>&apos;s resilience and reliability, it&apos;s integrated into my application&apos;s supervision tree. This guarantees that <code>Deduplicator</code> is automatically restarted in case of failures, maintaining the application&apos;s robustness.</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">defmodule Deduplicator.Supervisor do
  use Supervisor

  def start_link(arg) do
    Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
  end

  @impl true
  def init(_arg) do
    children = [
      {Deduplicator, []}
    ]

    # Define the restart strategy
    opts = [strategy: :one_for_one, name: DeduplicatorSupervisor]
    Supervisor.init(children, opts)
  end
end
</code></pre>
<!--kg-card-end: markdown--><p><em>This Supervisor oversees <code>Deduplicator</code>, utilizing the <code>:one_for_one</code> strategy, which specifies that if the GenServer crashes, it will be the only process to be restarted.</em></p><h3 id="adding-deduplicator-to-the-phoenix-applications-supervision-tree">Adding <code>Deduplicator</code> to the Phoenix Application&apos;s Supervision Tree</h3><p>Integrating <code>Deduplicator</code> into the application&apos;s main supervision tree ensures it&apos;s started at launch, ready to deduplicate requests from the get-go. This is achieved by modifying the application&apos;s root supervisor to include <code>Deduplicator</code>&apos;s Supervisor as a child.</p><h3 id="utilizing-deduplicator-in-the-controller">Utilizing <code>Deduplicator</code> in the Controller</h3><p>With <code>Deduplicator</code> operational, we modify our Phoenix controller to leverage it for handling potential duplicate requests. Before processing any data, we check if it has already been processed, ensuring idempotency.</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">def handle_duplicates(%{assigns: _assigns} = conn, %{} = params) do
  entity = params[&quot;entity&quot;]

  unique_id = Deduplicator.generate_unique_id(entity)

  if Deduplicator.already_processed?(unique_id) do
    respond(conn)
  else
    Deduplicator.mark_as_processed(unique_id)

    process_entity(entity)
    respond(conn)
  end
end
</code></pre>
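For completeness on the supervision-tree step mentioned above: attaching `Deduplicator.Supervisor` to the application's root supervisor is what actually starts it at launch. A minimal sketch of a hypothetical `lib/my_app/application.ex` (the `MyApp` names are assumptions, not taken from the article):

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # The Phoenix endpoint and other children would normally sit alongside it.
      Deduplicator.Supervisor
    ]

    # :one_for_one - if Deduplicator.Supervisor crashes, only it is restarted.
    opts = [strategy: :one_for_one, name: MyApp.RootSupervisor]
    Supervisor.start_link(children, opts)
  end
end
```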
<!--kg-card-end: markdown--><h2 id="lessons-learned-and-concluding-thoughts">Lessons Learned and Concluding Thoughts</h2><p>This journey illuminated the critical role of idempotency in ensuring data integrity across distributed systems. The development of <code>Deduplicator</code> not only solved my immediate challenge but also enriched my architectural approach, emphasizing resilience and reliability.</p><p>As I move forward, the insights gained from this experience will inform my future architectures, emphasizing the power of Elixir and Phoenix in building robust, fault-tolerant applications. For fellow engineers navigating similar challenges, I hope this account serves as both a guide and an inspiration.</p><p>Ciao &#x1F44B;</p>]]></content:encoded></item><item><title><![CDATA[Stop right there and think a bit!!]]></title><description><![CDATA[As software developers, it's easy to get caught up in following the latest trends and trying to do too much at once. We see articles and tweets from people who claim to have "made it," and we feel pressured to keep up with the latest technologies and approaches. ]]></description><link>https://nkpremices.com/stop-right-there-and-think-a-bit/</link><guid isPermaLink="false">617638fa8d3d0705e2977a3b</guid><category><![CDATA[Inspiration]]></category><category><![CDATA[2022]]></category><category><![CDATA[Blog]]></category><dc:creator><![CDATA[Prémices N. Kamasuwa]]></dc:creator><pubDate>Sun, 01 Jan 2023 23:28:02 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2023/01/im-going-to-stop-you-right-there-stop.gif" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2023/01/im-going-to-stop-you-right-there-stop.gif" alt="Stop right there and think a bit!!"><p>As software developers, it&apos;s easy to get caught up in following the latest trends and trying to do too much at once. 
We see articles and tweets from people who claim to have &quot;made it,&quot; and we feel pressured to keep up with the latest technologies and approaches. </p><p>But at a certain point, it&apos;s important to stop and think about what we really want from our careers. Instead of trying to do everything that is popular or in demand, it&apos;s essential for software developers to find their own path and focus on what interests them. In this blog post, we&apos;ll discuss the importance of finding your own path as a software developer and offer some tips for doing so.</p><p><strong>Problems with following trends </strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://nkpremices.com/content/images/2023/01/TrendsWord.jpg" class="kg-image" alt="Stop right there and think a bit!!" loading="lazy" width="600" height="400" srcset="https://nkpremices.com/content/images/2023/01/TrendsWord.jpg 600w"><figcaption>trends image</figcaption></figure><p>Every day, there are new articles, blogs, and social media posts proclaiming the next big thing in our industry. It&apos;s natural to want to stay up-to-date and be a part of the excitement. But the problem with trying to follow every trend and do everything that is popular is that it can be overwhelming and ultimately unfulfilling.</p><p>By trying to do too much at once, we risk spreading ourselves too thin and not being able to fully commit to any one thing. We may find ourselves constantly jumping from one trend to the next, never really mastering anything. This can lead to burnout and a lack of direction in our careers.</p><p>Furthermore, following trends can lead us to ignore our own interests and strengths. We may end up working on things that are popular, but not necessarily what we are passionate about. 
This can lead to boredom and a sense of disconnection from our work.</p><p>So instead of trying to do everything that is popular or in demand, it&apos;s important for us software developers to take a step back and think about what truly interests us and what we want to achieve in our careers. By focusing on our own passions and strengths, we can build fulfilling and successful careers that are truly our own.</p><p><strong>The importance of finding your own path</strong></p><figure class="kg-card kg-image-card"><img src="https://nkpremices.com/content/images/2023/01/image-1.png" class="kg-image" alt="Stop right there and think a bit!!" loading="lazy" width="700" height="467" srcset="https://nkpremices.com/content/images/size/w600/2023/01/image-1.png 600w, https://nkpremices.com/content/images/2023/01/image-1.png 700w"></figure><p>It&apos;s very important to find our own path and focus on what interests us. This can help us build fulfilling and successful careers that are truly our own. By understanding our own passions and strengths, we can choose projects and technologies that align with these interests and allow us to make a real impact.</p><p>Finding your own path also allows you to stand out in the industry. Instead of following the same trends as everyone else, you can showcase your unique perspective and skills. This can make you a valuable asset to any team or organization, and increase your chances of success in the long run.</p><p>Furthermore, focusing on what interests you can help you stay motivated and engaged in your work. 
When you are working on something that you are truly passionate about, it&apos;s easier to put in the extra effort and dedication that is needed to succeed.</p><p>So instead of trying to do everything that is popular or in demand, take the time to think about what truly interests you as a software developer.</p><p><strong>Conclusion</strong></p><p>It&apos;s important for us software developers to stop and think about what we want to achieve and focus on what truly interests us. </p><p>Here are some tips for finding your own path as a software developer:</p><ol><li><strong>Network with others in the industry</strong>: Talk to other software developers and learn about their experiences and insights. This can help you get a sense of what different career paths are available and what might be a good fit for you.</li><li><strong>Seek out mentors</strong>: Find someone who has been in the industry for a while and is willing to mentor you. They can provide valuable guidance and advice as you navigate your career.</li><li><strong>Try out different technologies and approaches</strong>: Don&apos;t be afraid to experiment and try out new things. This can help you discover what you are truly interested in and what you are good at.</li><li><strong>Take on side projects</strong>: Use side projects to explore your interests and try out new technologies. This can be a great way to get hands-on experience and find out what you really enjoy doing.</li><li><strong>Keep learning</strong>: Stay up-to-date with the latest technologies and trends, but be selective about what you choose to learn. Focus on things that align with your interests and career goals.</li></ol><p>By following these tips, you can start to find your own path as a software developer and focus on what truly interests you. 
This can lead to a fulfilling and successful career that is truly your own.</p>]]></content:encoded></item><item><title><![CDATA[Understanding Distributed Systems, my comments]]></title><description><![CDATA[As a software engineer, I've always been fascinated by the complexity and power of distributed systems. These systems, which operate across multiple devices and locations, are at the heart of many of the technologies we rely on every day, from...]]></description><link>https://nkpremices.com/understanding-distributed-systems-my-comments/</link><guid isPermaLink="false">62ce83991932300594a74f9a</guid><category><![CDATA[2022]]></category><category><![CDATA[Blog]]></category><category><![CDATA[Book]]></category><category><![CDATA[Distributed Systems]]></category><dc:creator><![CDATA[Prémices N. Kamasuwa]]></dc:creator><pubDate>Tue, 27 Dec 2022 01:14:00 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2023/01/shutterstock_2057691140.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2023/01/shutterstock_2057691140.jpg" alt="Understanding Distributed Systems, my comments"><p>As a software engineer, I&apos;ve always been fascinated by the complexity and power of distributed systems. These systems, which operate across multiple devices and locations, are at the heart of many of the technologies we rely on every day, from the internet and cloud computing to social networks and online marketplaces. Recently, I decided to dive deeper into the subject by reading the book &apos;<strong>Understanding Distributed Systems</strong>&apos; by Roberto Vitillo. In this post, I&apos;ll share my thoughts on the book and how it has helped me better understand the fundamental concepts and principles of distributed systems. 
I&apos;ll also provide some examples of how these systems are used in the real world and offer some tips for those interested in working with distributed systems.</p><hr><p><strong>The book</strong> is generally well-regarded as a comprehensive and accessible introduction to the field of distributed systems. It covers a wide range of topics in detail and includes numerous examples and illustrations to help readers understand the concepts and ideas presented. Many readers have found the book to be a useful resource for learning about distributed systems and for gaining a deeper understanding of the fundamental principles that underlie these systems. Some reviewers have noted that the book can be dense and technical at times, and may be more suitable for readers with some background in computer science or a related field. </p><hr><p>One of the things I appreciated most about this book was the clear and thorough explanation of key concepts and terminology. The book covers a wide range of topics, including<em><strong> distributed algorithms, network communication protocols, fault tolerance, and security</strong></em>. Each concept is introduced in a way that is easy to understand, and the book provides numerous examples and illustrations to help readers grasp the material.</p><p>As I read the book, I found myself jotting down notes and diagrams to help me visualize the different components and processes involved in distributed systems. The book does a great job of breaking down complex ideas into bite-sized chunks and explaining them in a way that is accessible to readers with a wide range of backgrounds. Whether you are new to the field of distributed systems or have some experience under your belt, you&apos;ll find plenty of valuable information and insights in this book.</p><p>One of the things I found most interesting as I read &apos;<strong>Understanding Distributed Systems</strong>&apos; was learning about the different types of distributed systems that exist. 
The book distinguishes between three main types: <em><strong>peer-to-peer systems, client-server systems, </strong>and<strong> cloud computing systems</strong></em>.</p><p><strong>Peer-to-peer (P2P) systems are decentralized networks in which each device acts as both a client and a server</strong>. These systems are often used for file sharing and other forms of data exchange. Examples of P2P systems include BitTorrent and Napster.</p><p><strong>Client-server systems, on the other hand, consist of a central server that manages data and resources</strong>, and a set of clients that request and receive information from the server. These systems are commonly used for web-based applications, where the server handles the logic and data storage, and the clients are web browsers that display the information to users.</p><p><strong>Cloud computing systems are large-scale distributed systems that provide on-demand access to a shared pool of computing resources</strong>, such as servers, storage, and networking. These systems are often used for storing and processing big data, and for running complex algorithms and applications. Examples of cloud computing systems include Amazon Web Services and Microsoft Azure.</p><p>Understanding the differences between these types of distributed systems is important for designing and implementing effective and efficient systems. The book does a great job of explaining the key characteristics and trade-offs of each type, and provides examples of when each might be most appropriate.</p><hr><p>As I worked my way through the book, I found myself tempted to apply what I was learning to a proof-of-concept side project: <strong>a clone of the popular file-sharing service Dropbox, built using Rust, Go, Elixir, and TypeScript</strong>. 
Building a distributed system like this can be a complex and challenging endeavor, as it requires you to consider a wide range of factors, including scalability, reliability, performance, and security.</p><p>One of the things that I found most useful as I tackled this project was the book&apos;s discussion of the common challenges that arise when designing and implementing distributed systems. The book covers topics such as <strong>concurrent access to shared data, networking and communication protocols, and fault tolerance</strong>, and provides examples of how these challenges can be addressed in real-world systems.</p><p>I also appreciated the book&apos;s emphasis on the importance of testing and monitoring distributed systems. As I built my Dropbox clone, I made sure to include a suite of unit and integration tests, as well as monitoring tools to help me identify and resolve issues as they arose. Working with a variety of languages allowed me to gain experience with different tools and approaches, and helped me to better understand the trade-offs and benefits of each. Although I have to say, it wasn&apos;t an easy thing to do.</p><p>Overall, trying to build my own distributed system helped me to gain a deeper understanding of the concepts and principles covered in the book. It was a challenging but rewarding experience and one that I would recommend to anyone interested in working with distributed systems.</p><p><strong>Conclusion</strong><br><br>In conclusion, the book <strong>Understanding Distributed Systems</strong> has been a valuable resource for me as I delve into the world of distributed systems. 
It has helped me to gain a deeper understanding of the fundamental concepts and principles that underlie these systems and has provided me with a wealth of practical insights and examples to draw upon.</p><p>One of the ways I&apos;ve put this knowledge into practice is by building a proof-of-concept clone of the popular file-sharing service Dropbox using Rust, Go, Elixir, and TypeScript. This project has been a challenging but rewarding experience and has given me a firsthand appreciation of the complexity and power of distributed systems.</p><p>Though my Dropbox clone is still a work in progress, I&apos;m excited to see what 2023 holds. With the knowledge and skills I&apos;ve gained from reading this book and working on this project, I&apos;m looking forward to continuing to explore the field of distributed systems and perhaps even writing clear documentation of how far I am in building my own system.<br><br>___</p><p>I hope this post has provided you with a helpful overview of the book &apos;<strong>Understanding Distributed Systems</strong>&apos; and has given you some insight into the world of distributed systems. Whether you are just starting out in this field or are a seasoned pro, I think you&apos;ll find this book to be a valuable resource for understanding and working with these complex and powerful systems.</p><p>Ciao &#x1F44B;&#x1F3FE;<br></p>]]></content:encoded></item><item><title><![CDATA[Creating a cronjob micro-service using Elixir]]></title><description><![CDATA[If you&apos;ve ever needed to automate a task or ensure that an important job gets done on schedule, you&apos;ve probably used a cronjob. 
Simply put, a cronjob is a tool that allows you to schedule tasks to run automatically at a predetermined time or...]]></description><link>https://nkpremices.com/create-a-cronjob-micro-service-using-elixir/</link><guid isPermaLink="false">62bde1958d3d0705e2977ed3</guid><category><![CDATA[2022]]></category><category><![CDATA[Blog]]></category><category><![CDATA[Elixir]]></category><category><![CDATA[Phoenix Framework]]></category><category><![CDATA[Tech]]></category><dc:creator><![CDATA[Prémices N. Kamasuwa]]></dc:creator><pubDate>Tue, 20 Dec 2022 00:45:00 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2023/01/what-you-should-know-about-elixir-part1.png" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2023/01/what-you-should-know-about-elixir-part1.png" alt="Creating a cronjob micro-service using Elixir"><p>If you&apos;ve ever needed to automate a task or ensure that an important job gets done on schedule, you&apos;ve probably used a cronjob. Simply put, a cronjob is a tool that allows you to schedule tasks to run automatically at a predetermined time or interval. Whether it&apos;s sending out a daily report or backing up a database, cronjobs are a convenient way to automate repetitive or time-consuming tasks.</p><p>Recently, I was working on a side project where I wanted to explore the benefits and trade-offs of using multiple languages in a system. I decided to create a cronjob micro-service using <strong>Elixir</strong>, <em>a functional programming language that is well-suited for building scalable and fault-tolerant systems</em>.</p><p>In this blog post, I&apos;ll document my experience building the service, and share some of the benefits and challenges I encountered along the way. 
If you&apos;re interested in using Elixir to create your own cronjob service, or if you&apos;re just curious about what&apos;s involved, I hope you&apos;ll find this post helpful.</p><p><strong>Why Elixir for a Cronjob Service?</strong></p><p>When it came to choosing a language for my cronjob service, I knew I wanted something that was fast, scalable, and fault-tolerant. Elixir checked all of those boxes and then some.</p><p>For those unfamiliar with Elixir, it is a functional programming language that runs on the Erlang virtual machine. One of the key benefits of Elixir is its support for concurrency, which allows it to easily handle multiple tasks concurrently. This makes it well-suited for building scalable and fault-tolerant systems.</p><p>In addition to its good performance and concurrency support, I was also drawn to Elixir&apos;s functional nature. Elixir encourages a functional programming style, which can make it easier to reason about code and write tests. All of these factors made Elixir an appealing choice for my cronjob service.</p><p><strong>Using Phoenix for the Web Interface</strong></p><p>To build the web interface for my cronjob service, I decided to use the Phoenix framework. Phoenix is a popular web framework for Elixir that makes it easy to build scalable and reliable web applications. It offers a variety of features that made it a good fit for my cronjob service, including support for web sockets, channels, and live view.</p><p>One of the key benefits of Phoenix is its use of the actor model for concurrency. In Phoenix, each web request is handled by its own Elixir process, which makes it easy to scale the application by adding more processes. This makes Phoenix well-suited for building a cronjob service, which may need to handle a large number of concurrent tasks.</p><p><strong>Overall Architecture</strong></p><p>The overall architecture of my cronjob service is designed to be scalable and fault-tolerant. 
It is composed of multiple Elixir processes that communicate with each other using the actor model. Each process is responsible for a specific task, such as scheduling a job to run or executing a job.</p><p>To ensure that the service can recover from failures, I used Elixir&apos;s built-in process supervision to monitor the health of the service. If any process fails, the supervisor will restart it, ensuring that the service stays up and running.</p><p><strong>The Role of the Database</strong></p><p>In my cronjob service, I used a database to store information about the tasks that are scheduled to run and the status of those tasks. This made it easy to track the progress of the service and ensure that tasks were being run as expected.</p><p>Integrating a database with an Elixir-based service can sometimes be a challenge, but I found that Elixir&apos;s <strong>Ecto</strong> library made it relatively straightforward. Ecto is a database library for Elixir that provides a simple interface for querying and updating a database.</p><p><strong>Setting up the Development Environment</strong></p><p>Before I could start building my cronjob service, I needed to set up a development environment. This involved installing Elixir, the Phoenix framework, and any other dependencies that were needed.</p><p>If you&apos;re new to Elixir, the first step is to install the Elixir runtime and build tools. You can find instructions for installing Elixir on the Elixir website. Once Elixir is installed, you&apos;ll also need to install the Phoenix framework. You can do this by running the following command:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">mix archive.install hex phx_new
</code></pre>
<!--kg-card-end: markdown--><p>Next, you&apos;ll need to set up a database for your cronjob service. I chose to use <strong>PostgreSQL</strong>, but you could use any database that is supported by Elixir&apos;s <strong>Ecto library</strong>. Once you have a database set up, you&apos;ll need to configure your development environment to use it. This typically involves creating a database and a user, and then updating your Phoenix configuration to use the correct database credentials.</p><p><strong>Defining the Tasks</strong></p><p>Once I had my development environment set up, I was ready to start defining the tasks that my cronjob service would run. I used a cron-style scheduling function, <code>cron/4</code>, to specify the schedule for each task (Elixir itself has no built-in cron scheduler; this kind of function is provided by a scheduling library such as Quantum). For example, if I wanted to run a task every hour, I would use the following code:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">cron(&quot;0 * * * *&quot;, MyApp.TaskScheduler, :run_task, [])
</code></pre>
<!--kg-card-end: markdown--><p>In this example, the <code>cron/4</code> function takes four arguments: a cron expression, a module, a function, and a list of arguments. The cron expression specifies the schedule for the task, and the module and function specify the code that should be run when the task is triggered.</p><p><strong>Implementing the Tasks</strong></p><p>Once I had defined the tasks that my cronjob service would run, I needed to implement the code that would actually perform the work. This involved writing Elixir functions that would be called by the <code>cron/4</code> function when the tasks were triggered.</p><p>For example, let&apos;s say I had defined a task to send a daily report by email. The implementation of this task might look something like this:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">defmodule MyApp.TaskScheduler do
  def run_task do
    # Generate the report
    report = generate_report()

    # Send the report by email
    send_email(report)
  end

  defp generate_report do
    # code to generate the report goes here
  end

  defp send_email(report) do
    # code to send the email goes here
  end
end
</code></pre>
<!--kg-card-end: markdown--><p>In this example, the <code>run_task/0</code> function is the entry point for the task. It calls the <code>generate_report/0</code> and <code>send_email/1</code> functions to perform the work of generating and sending the report.</p><p><strong>Configuring the Cronjob Service</strong></p><p>Once I had implemented the tasks that my cronjob service would run, I needed to configure the service to run on a predetermined schedule and to give users a web interface for viewing and managing the scheduled tasks. To do this, I used the Phoenix framework to set up routes and controllers for the service. For example, I might create a route like this:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">scope &quot;/tasks&quot;, MyApp do
  pipe_through :api
  resources &quot;/&quot;, TaskController
end
</code></pre>
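<p>The route above maps requests to a controller. A minimal sketch of such a controller, with an illustrative action body (the module name and JSON shape are assumptions, not from the original service):</p>

```elixir
defmodule MyApp.TaskController do
  use Phoenix.Controller

  # GET /tasks: list the scheduled tasks as JSON
  def index(conn, _params) do
    # Fetching the tasks from the database via Ecto would go here
    json(conn, %{tasks: []})
  end
end
```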
<!--kg-card-end: markdown--><p>This route would allow users to access the <code>tasks</code> resource at the <code>/tasks</code> URL. I could then create a <code>TaskController</code> to handle requests to this resource.</p><p>To make it easy for users to view and manage the tasks that were scheduled to run, I used Phoenix&apos;s LiveView feature. LiveView allows you to build real-time, interactive interfaces with minimal coding. For example, I might create a live view like this:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">defmodule MyApp.TaskLive do
  use Phoenix.LiveView
  
  def mount(_params, _session, socket) do
    tasks = fetch_tasks()
    {:ok, assign(socket, tasks: tasks)}
  end
  
  def render(assigns) do
    # code to render the live view goes here
  end
  
  def handle_event(&quot;add_task&quot;, %{&quot;name&quot; =&gt; name, &quot;schedule&quot; =&gt; schedule}, socket) do
    # code to handle the &quot;add_task&quot; event goes here
  end
  
  defp fetch_tasks do
    # code to fetch the tasks from the database goes here
  end
end
</code></pre>
<!--kg-card-end: markdown--><p>In this example, the <code>mount/3</code> function is called when the live view is first rendered. It fetches the tasks from the database and assigns them to the <code>tasks</code> variable. The <code>render/1</code> function is then called to render the live view, and the <code>handle_event/3</code> function is called to handle events that are sent from the client (such as an &quot;add_task&quot; event).</p><p>Using live view made it easy for me to create a real-time, interactive interface for my cronjob service. Users could view and manage the tasks that were scheduled to run, and they could see the changes in real time as they were made.</p><p><strong>External communication with the service</strong></p><p>Here is an example of how I used a message queue (in this case, RabbitMQ) to communicate with the cronjob service:</p><p>First, I needed to set up a RabbitMQ server and install the <code>amqp</code> library, which is an Elixir client library for RabbitMQ. You can do this by adding the following dependencies to your <code>mix.exs</code> file:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">defp deps do
  [
    {:amqp, &quot;~&gt; 3.0&quot;}
  ]
end
</code></pre>
<!--kg-card-end: markdown--><p>Next, I needed to create a connection to the RabbitMQ server and set up a channel for sending and receiving messages. We can do this in our application&apos;s startup code:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">def start(_type, _args) do
  # Connect to the RabbitMQ server
  {:ok, conn} = AMQP.Connection.open(...)

  # Open a channel
  {:ok, chan} = AMQP.Channel.open(conn)

  # Set up queues and exchanges
  {:ok, _} = AMQP.Queue.declare(chan, &quot;tasks&quot;, durable: true)
  :ok = AMQP.Exchange.direct(chan, &quot;tasks&quot;, durable: true)
  :ok = AMQP.Queue.bind(chan, &quot;tasks&quot;, &quot;tasks&quot;, routing_key: &quot;&quot;)

  # Start the task scheduler process
  {:ok, _scheduler} = TaskScheduler.start_link(chan)

  # Start the Phoenix endpoint
  {:ok, _pid} = MyAppWeb.Endpoint.start_link()
end
</code></pre>
<!--kg-card-end: markdown--><p>With the connection and channel set up, we can start using RabbitMQ to send and receive messages. The <code>amqp</code> library delivers consumed messages to the consumer process&apos;s mailbox, so the <code>TaskScheduler</code> GenServer handles them in <code>handle_info/2</code>:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">defmodule TaskScheduler do
  use GenServer

  def start_link(chan) do
    GenServer.start_link(__MODULE__, chan, name: __MODULE__)
  end

  def init(chan) do
    # Declare an exclusive queue and bind it to the &quot;tasks&quot; exchange
    {:ok, %{queue: queue}} = AMQP.Queue.declare(chan, &quot;&quot;, exclusive: true)
    :ok = AMQP.Queue.bind(chan, queue, &quot;tasks&quot;, routing_key: &quot;&quot;)

    # Register this process as a consumer; deliveries arrive as messages
    {:ok, _consumer_tag} = AMQP.Basic.consume(chan, queue)

    # Keep the channel as the process state
    {:ok, chan}
  end

  # RabbitMQ confirms the consumer registration
  def handle_info({:basic_consume_ok, _meta}, chan), do: {:noreply, chan}

  # Each delivery is parsed and turned into a scheduled task
  def handle_info({:basic_deliver, payload, %{delivery_tag: tag}}, chan) do
    {:ok, %{&quot;name&quot; =&gt; name, &quot;schedule&quot; =&gt; schedule}} = Jason.decode(payload)
    schedule_task(name, schedule)

    # Acknowledge the message
    AMQP.Basic.ack(chan, tag)
    {:noreply, chan}
  end

  def schedule_task(_name, _schedule) do
    # Code to schedule the task goes here
    :ok
  end
end
</code></pre>
<!--kg-card-end: markdown--><p>This <code>TaskScheduler</code> process uses RabbitMQ to set up a consumer that listens for messages on the &quot;tasks&quot; exchange. When a message is received, it parses the message and calls the <code>schedule_task/2</code> function to schedule the task.</p><p>To send a message to the cronjob service, you can use the <code>AMQP.Basic.publish/4</code> function. For example:</p><!--kg-card-begin: markdown--><pre><code class="language-elixir">def create_task(name, schedule) do
  # Connect to the RabbitMQ server
  {:ok, conn} = AMQP.Connection.open(...)

  # Open a channel
  {:ok, chan} = AMQP.Channel.open(conn)

  # Encode the message (encode!/1 raises on failure and returns the JSON binary)
  message = Jason.encode!(%{&quot;name&quot; =&gt; name, &quot;schedule&quot; =&gt; schedule})

  # Publish the message to the &quot;tasks&quot; exchange
  :ok = AMQP.Basic.publish(chan, &quot;tasks&quot;, &quot;&quot;, message)

  # Close the channel and connection
  AMQP.Channel.close(chan)
  AMQP.Connection.close(conn)
end
</code></pre>
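<p>With <code>create_task/2</code> in place, any part of the application (or any other service that can reach RabbitMQ) can enqueue a job with a single call. The task name and cron expression below are purely illustrative:</p>

```elixir
# Schedule a (hypothetical) daily report for 08:00 every morning
create_task("daily_report", "0 8 * * *")
```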
<!--kg-card-end: markdown--><p><strong>Conclusion</strong><br><br>Creating a cronjob micro-service using Elixir was a fun and interesting project. I enjoyed the process of setting up a development environment, defining and implementing tasks, and configuring the service to run on a predetermined schedule.</p><p>One of the biggest challenges I faced was getting used to the syntax and concepts of Elixir, which was new to me. However, once I got the hang of it, I found that Elixir was a powerful and expressive language that made it easy to build the cronjob service.</p><p>If you&apos;re interested in creating your own cronjob service using Elixir, I recommend checking out the following resources:</p><ul><li>The Elixir documentation: <a href="https://elixir-lang.org/docs/stable/elixir/">https://elixir-lang.org/docs/stable/elixir/</a></li><li>The Phoenix framework documentation: <a href="https://hexdocs.pm/phoenix/index.html">https://hexdocs.pm/phoenix/index.html</a></li><li>The Ecto library documentation: <a href="https://hexdocs.pm/ecto/index.html">https://hexdocs.pm/ecto/index.html</a></li></ul><p>I hope this blog post has been helpful and gives you an idea of what&apos;s involved in creating a cronjob service using Elixir. If you have any questions or comments, I&apos;d love to hear from you!</p><p></p><p><em>If you find anything wrong, or anything that needs correction, please feel free to leave a comment and let me know and I will make sure to check it out and address it.</em><br><br>Ciao &#x1F44B;&#x1F3FE;</p><hr><p>References <br><br>1. <a href="https://github.com/quantum-elixir/quantum-core">https://github.com/quantum-elixir/quantum-core</a></p><p>2. <a href="https://www.phoenixframework.org/">https://www.phoenixframework.org/</a></p><p>3. <a href="https://blog.kalvad.com/write-your-own-cron-with-with-elixir/">https://blog.kalvad.com/write-your-own-cron-with-with-elixir/</a></p><p>4. 
<a href="https://wrgoldstein.github.io/2017/02/20/phoenix-rabbitmq.html">https://wrgoldstein.github.io/2017/02/20/phoenix-rabbitmq.html</a></p>]]></content:encoded></item><item><title><![CDATA[Serving an unsupported third-party middleware to the NestJs dependency injection layer]]></title><description><![CDATA[Have you ever found yourself in the middle of a project and realized that the tool you need to use is not supported by your framework of choice? That's exactly what happened to me recently when I was working on a NestJS application and wanted to integrate Cloudinary for image hosting.]]></description><link>https://nkpremices.com/serving-an-unsupported-third-party-middleware-to-the-nestjs-dependency-injection-layer/</link><guid isPermaLink="false">62ce80eb1932300594a74f76</guid><category><![CDATA[2022]]></category><category><![CDATA[NestJs]]></category><category><![CDATA[Typescript]]></category><category><![CDATA[Blog]]></category><dc:creator><![CDATA[Prémices N. Kamasuwa]]></dc:creator><pubDate>Sat, 17 Dec 2022 23:57:00 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2023/01/nESTjs2.JPG" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2023/01/nESTjs2.JPG" alt="Serving an unsupported third-party middleware to the NestJs dependency injection layer"><p>Have you ever found yourself in the middle of a project and realized that the tool you need to use is not supported by your framework of choice? That&apos;s exactly what happened to me recently when I was working on a <strong>NestJS</strong> application and wanted to integrate <strong>Cloudinary</strong> for image hosting.</p><p><strong>NestJS</strong> is a powerful framework for building server-side applications with Node.js, and it offers a variety of built-in modules and middleware for common tasks such as logging, validation, and routing. 
However, when it comes to working with third-party services, there may be times when you need to add a custom solution to the mix.</p><p>In this blog post, I&apos;ll share my experience of serving an unsupported third-party middleware (<strong>Cloudinary</strong>) to the <strong>NestJS</strong> dependency injection layer. I&apos;ll explain the options I considered and the approach I ultimately took to solve this problem. I hope that by reading this post, you&apos;ll be able to apply these techniques to your own NestJS projects and take your skills to the next level.</p><p><strong>Dependency injection</strong></p><p><strong>&quot;Dependency injection&quot;</strong> is a design pattern that helps to decouple parts of a system and make it more flexible and easier to test. In <strong>NestJS</strong>, the dependency injection system is based on the inversion of control (IoC) principle, which means that the framework is responsible for creating and supplying the dependencies required by a module or component.</p><p>To use dependency injection in <strong>NestJS</strong>, you first need to define a provider, which is a class or a function that returns an object or a value. This provider can then be injected into a module, controller, or service using the <code>@Injectable()</code> decorator.</p><p>For example, let&apos;s say you have a <code>LoggerService</code> that you want to use in multiple places throughout your application. You can define the <code>LoggerService</code> as a provider and then inject it wherever it is needed by using the <code>@Inject()</code> decorator:</p><!--kg-card-begin: markdown--><pre><code class="language-javascript">@Injectable()
export class LoggerService {
  log(message: string) {
    console.log(message);
  }
}

@Controller()
export class SomeController {
  constructor(@Inject(LoggerService) private logger: LoggerService) {}

  @Get()
  doSomething() {
    this.logger.log(&apos;Doing something...&apos;);
  }
}
</code></pre>
<!--kg-card-end: markdown--><p><strong>Adding third-party middleware to NestJS</strong></p><p>In <strong>NestJS</strong>, adding third-party middleware to your application is typically a straightforward process. First, you need to install the npm package for the middleware you want to use. Then, you can apply the middleware to specific routes or to the entire application by configuring the <code>MiddlewareConsumer</code> in a module&apos;s <code>configure()</code> method.</p><p>For example, let&apos;s say you want to add the <code>cors</code> middleware to your <strong>NestJS</strong> application to enable cross-origin resource sharing (CORS). You can install the <code>cors</code> package using npm or yarn:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">npm install cors
</code></pre>
<!--kg-card-end: markdown--><p>Then, you can apply the <code>cors</code> middleware to the routes of a specific controller by configuring the <code>MiddlewareConsumer</code> in that controller&apos;s module:</p><!--kg-card-begin: markdown--><pre><code class="language-typescript">import { MiddlewareConsumer, Module, NestModule } from &apos;@nestjs/common&apos;;
import * as cors from &apos;cors&apos;;
import { SomeController } from &apos;./some.controller&apos;;

@Module({
  controllers: [SomeController],
})
export class SomeModule implements NestModule {
  configure(consumer: MiddlewareConsumer): void {
    consumer.apply(cors()).forRoutes(SomeController);
  }
}
</code></pre>
<!--kg-card-end: markdown--><p>Or, you can apply the <code>cors</code> middleware to the entire application by applying it to every route (<code>&apos;*&apos;</code>) in the root module:</p><!--kg-card-begin: markdown--><pre><code class="language-typescript">import { MiddlewareConsumer, Module, NestModule } from &apos;@nestjs/common&apos;;
import * as cors from &apos;cors&apos;;

@Module({})
export class AppModule implements NestModule {
  configure(consumer: MiddlewareConsumer): void {
    consumer.apply(cors()).forRoutes(&apos;*&apos;);
  }
}
</code></pre>
<!--kg-card-end: markdown--><p>This is the normal process for adding third-party middleware to NestJS, but what do you do when the middleware you want to use is not officially supported by the framework? That&apos;s the topic of the next section, where we&apos;ll discuss the options for serving unsupported third-party middleware to the NestJS dependency injection layer.</p><p><strong>Serving unsupported third-party middleware</strong></p><p>As mentioned earlier, I recently ran into the challenge of integrating <strong>Cloudinary</strong> in a <strong>NestJS</strong> codebase. Cloudinary is a popular cloud-based image hosting and manipulation service, but it is not officially supported by <strong>NestJS</strong>. This meant that I had to find a way to serve the Cloudinary middleware to the <strong>NestJS</strong> dependency injection layer.</p><p>After researching different options, I decided to create a custom provider that wrapped the Cloudinary middleware and made it available for injection. This approach involved creating a class or function that returned the middleware as an object or a function and then decorating it with the <code>@Injectable()</code> decorator. Here is an example of the <code>CloudinaryMiddleware</code> provider I created:</p><!--kg-card-begin: markdown--><pre><code class="language-typescript">import { Injectable } from &apos;@nestjs/common&apos;;
import * as cloudinary from &apos;cloudinary&apos;;

@Injectable()
export class CloudinaryMiddleware {
  getMiddleware() {
    return cloudinary.v2.uploader.upload;
  }
}
</code></pre>
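<p>One step the snippet above leaves implicit: a provider can only be injected if it is registered in a module. A minimal sketch, assuming a dedicated <code>CloudinaryModule</code> (the module name is illustrative):</p>

```typescript
import { Module } from '@nestjs/common';
import { CloudinaryMiddleware } from './cloudinary.middleware';

@Module({
  providers: [CloudinaryMiddleware],
  // Export it so other modules that import CloudinaryModule can inject it
  exports: [CloudinaryMiddleware],
})
export class CloudinaryModule {}
```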
<!--kg-card-end: markdown--><p>The <code>cloudinary.v2.uploader.upload</code> function is the main method for uploading images to Cloudinary. By wrapping it in an injectable provider, I was able to inject it into any controller and call it from a route handler in my NestJS application:</p><!--kg-card-begin: markdown--><pre><code class="language-typescript">import { Controller, Get } from &apos;@nestjs/common&apos;;
import { CloudinaryMiddleware } from &apos;./cloudinary.middleware&apos;;

@Controller()
export class SomeController {
  constructor(private readonly cloudinaryMiddleware: CloudinaryMiddleware) {}

  @Get()
  doSomething() {
    // getMiddleware() returns cloudinary.v2.uploader.upload
    const upload = this.cloudinaryMiddleware.getMiddleware();
    // call upload(...) with a file path or stream here
  }
}
</code></pre>
<!--kg-card-end: markdown--><p>The <code>CloudinaryMiddleware</code> provider is just a wrapper around the <code>cloudinary.v2.uploader.upload</code> function, which is the main method for uploading images to Cloudinary.</p><p>To actually use this middleware to upload files, you would need to do the following:</p><ol><li>Install the Cloudinary npm package: <code>npm install cloudinary</code></li><li>Set up your Cloudinary account and obtain your API key, API secret, and cloud name. You can find these details in the Cloudinary dashboard.</li><li>Configure the Cloudinary npm package with your API key, API secret, and cloud name:</li></ol><!--kg-card-begin: markdown--><pre><code class="language-javascript">import * as cloudinary from &apos;cloudinary&apos;;

cloudinary.v2.config({
  api_key: &apos;YOUR_API_KEY&apos;,
  api_secret: &apos;YOUR_API_SECRET&apos;,
  cloud_name: &apos;YOUR_CLOUD_NAME&apos;
});
</code></pre>
<!--kg-card-end: markdown--><p>4. Inject the <code>CloudinaryMiddleware</code> provider into a controller and call the upload function it exposes from within a route handler to upload an image to Cloudinary.</p><p>Here is an example of how you might use the <code>CloudinaryMiddleware</code> provider to upload an image from a NestJS route:</p><!--kg-card-begin: markdown--><pre><code class="language-typescript">import { Controller, Get } from &apos;@nestjs/common&apos;;
import { CloudinaryMiddleware } from &apos;./cloudinary.middleware&apos;;

@Controller()
export class SomeController {
  constructor(private readonly cloudinaryMiddleware: CloudinaryMiddleware) {}

  @Get()
  async doSomething() {
    const imagePath = &apos;/path/to/image.jpg&apos;;
    const result = await this.cloudinaryMiddleware.getMiddleware()(imagePath);
    console.log(result); // logs the uploaded image details
  }
}
</code></pre>
<!--kg-card-end: markdown--><p><strong>Conclusion</strong></p><p>In this blog post, we looked at the problem of serving an unsupported third-party middleware (<strong>Cloudinary</strong>) to the <strong>NestJS</strong> dependency injection layer. We discussed two options for solving this problem: creating a custom provider, and applying the middleware directly to the underlying HTTP server.</p><p>Using a custom provider, we were able to wrap the Cloudinary uploader and make it available for injection into a module or controller. This allowed us to call the Cloudinary upload function from a specific route or controller in our NestJS application.</p><p>Alternatively, we could have retrieved the underlying Express instance (for example through Nest&apos;s <code>HttpAdapterHost</code>) and registered the middleware globally with <code>app.use()</code>. This approach may be useful if you want to apply the middleware to the entire application.</p><p>I hope that by reading this blog post, you&apos;ve gained a better understanding of how to serve unsupported third-party middleware to the NestJS dependency injection layer. 
Whether you&apos;re working with Cloudinary or another service, these techniques can help you extend the capabilities of your NestJS applications and take your skills to the next level.</p><hr><p><strong>References</strong>:</p><ul><li>NestJS documentation: <a href="https://docs.nestjs.com/">https://docs.nestjs.com/</a></li><li>NestJS middleware tutorial: <a href="https://docs.nestjs.com/middleware">https://docs.nestjs.com/middleware</a></li><li>NestJS dependency injection documentation: <a href="https://docs.nestjs.com/fundamentals/dependency-injection">https://docs.nestjs.com/fundamentals/dependency-injection</a></li><li>Cloudinary documentation: <a href="https://cloudinary.com/documentation">https://cloudinary.com/documentation</a></li><li>Cloudinary npm package: <a href="https://www.npmjs.com/package/cloudinary">https://www.npmjs.com/package/cloudinary</a></li></ul>]]></content:encoded></item><item><title><![CDATA[Brainstorming Ideas for Exposing a Postgres 9.6 Server for Remote Access on a Custom Domain with Nginx Reverse Proxy (Ubuntu 18.04)]]></title><description><![CDATA[Exposing a local Postgres server for remote access on a custom domain using Nginx as a reverse proxy can be a useful configuration for a variety of scenarios, such ...]]></description><link>https://nkpremices.com/setting-up-nginx-as-revers-proxy-for-postgres-ubuntu/</link><guid isPermaLink="false">61dacda18d3d0705e2977e92</guid><category><![CDATA[2022]]></category><category><![CDATA[Blog]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Postgres]]></category><category><![CDATA[Ubuntu]]></category><category><![CDATA[Nginx]]></category><dc:creator><![CDATA[Prémices N. 
Kamasuwa]]></dc:creator><pubDate>Fri, 16 Dec 2022 23:01:00 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2023/01/databaseadministration.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2023/01/databaseadministration.jpg" alt="Brainstorming Ideas for Exposing a Postgres 9.6 Server for Remote Access on a Custom Domain with Nginx Reverse Proxy (Ubuntu 18.04)"><p>Exposing a local Postgres server for remote access on a custom domain using Nginx as a reverse proxy can be a useful configuration for a variety of scenarios, such as hosting a database for a web application or enabling remote access for database administration tasks. In this blog post, we&apos;ll explore some ideas for setting up this configuration on a server running Ubuntu 18.04.</p><p>Throughout this post, we&apos;ll brainstorm different approaches to setting up a Postgres server for remote access on a custom domain with Nginx reverse proxy, considering factors such as security, performance, and maintenance. Whether you&apos;re a seasoned Postgres administrator or just getting started, we hope that this post will provide some useful insights and ideas for setting up this configuration.</p><h4 id="setting-up-nginx-as-a-reverse-proxy">Setting up Nginx as a reverse proxy</h4><p>To set up Nginx as a reverse proxy for a Postgres server, you will need to perform the following steps:</p><ol><li>Install Nginx on your server: To install Nginx on a server running Ubuntu 18.04, you can use the following command:</li></ol><!--kg-card-begin: markdown--><pre><code class="language-bash">sudo apt-get update
sudo apt-get install nginx
</code></pre>
<!--kg-card-end: markdown--><p>2. Configure Nginx as a reverse proxy: Postgres speaks its own binary protocol over TCP rather than HTTP, so the proxying has to happen in Nginx&apos;s <code>stream</code> module instead of an <code>http</code> <code>server</code> block (depending on your Nginx build, the stream module may need to be installed or loaded as a dynamic module). Create a configuration file, for example <code>postgres.conf</code>, with the following contents and include it from the top-level <code>stream</code> context of <code>nginx.conf</code>:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">stream {
    server {
        listen 5432;
        proxy_pass 127.0.0.1:5432;
    }
}
</code></pre>
<!--kg-card-end: markdown--><p>This configuration will cause Nginx to listen for incoming TCP connections on port 5432 and forward them to the Postgres server on <code>127.0.0.1:5432</code>. If Nginx runs on the same machine as Postgres, keep Postgres listening on <code>localhost</code> only and bind Nginx to the server&apos;s public IP address (for example <code>listen 1.2.3.4:5432;</code>) so the two do not conflict. Because the <code>stream</code> module forwards raw TCP and never sees a hostname, using a custom domain such as <code>example.com</code> is simply a matter of pointing a DNS record at the server&apos;s IP address.</p><p>3. Restart Nginx: After saving the configuration file, you will need to restart Nginx to apply the changes. You can do this by running the following command:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">sudo systemctl restart nginx
</code></pre>
<!--kg-card-end: markdown--><p>Test the configuration: To test the configuration, you can use a tool such as <code>psql</code> to connect to the Postgres server through Nginx. For example:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">psql -h example.com -U postgres
</code></pre>
<!--kg-card-end: markdown--><p>If the configuration is working correctly, you should receive a response from the Postgres server.</p><p><strong>Allowing remote connections to the Postgres server</strong></p><p>By default, Postgres is configured to only listen for connections from localhost. To allow remote connections to the Postgres server, you will need to perform the following steps:</p><ol><li>Edit the <code>postgresql.conf</code> file: Open the <code>postgresql.conf</code> file in a text editor and locate the <code>listen_addresses</code> parameter. Set this parameter to <code>&apos;*&apos;</code> to allow Postgres to listen for connections from any host.</li><li>Edit the <code>pg_hba.conf</code> file: Open the <code>pg_hba.conf</code> file in a text editor and add a line to allow connections from the IP address of the server where Nginx is running. For example:</li></ol><p><code>host &#xA0; &#xA0;all &#xA0; &#xA0; &#xA0; &#xA0; &#xA0; &#xA0; all &#xA0; &#xA0; &#xA0; &#xA0; &#xA0; &#xA0; 1.2.3.4/32 &#xA0; &#xA0; &#xA0; &#xA0; &#xA0; &#xA0;md5</code></p><p>Replace <code>1.2.3.4</code> with the actual IP address of the server where Nginx is running.</p><p>3. Restart the Postgres server: After making these changes, you will need to restart the Postgres server to apply the changes. You can do this by running the following command:</p><p><code>sudo service postgresql restart</code></p><p>4. Test the configuration: To test the configuration, you can use a tool such as <code>psql</code> to connect to the Postgres server from a remote location. For example:</p><p><code>psql -h example.com -U postgres</code></p><p>If the configuration is working correctly, you should be able to connect to the Postgres server from a remote location.<br><br><strong>Configuring SSL/TLS for HTTPS</strong></p><p>To secure the connection between the client and the Postgres server using HTTPS, you will need to obtain an SSL/TLS certificate and configure Nginx to use it. 
There are two main options for obtaining a certificate:</p><ol><li>Obtain a certificate from a trusted certificate authority (CA): One option is to obtain a certificate from a trusted CA such as Let&apos;s Encrypt or DigiCert. These CAs offer free and low-cost certificates that are widely recognized as trusted by web browsers and other clients. To obtain a certificate from a CA, you will need to follow the CA&apos;s specific instructions for generating and installing a certificate.</li><li>Use a self-signed certificate: Another option is to generate a self-signed certificate for testing or development purposes. While self-signed certificates are not trusted by web browsers and other clients by default, they can be useful for testing or prototyping. To generate a self-signed certificate, you can use the <code>openssl</code> tool. For example:</li></ol><!--kg-card-begin: markdown--><pre><code class="language-bash">openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout example.key -out example.crt
</code></pre>
<!--kg-card-end: markdown--><p>This command will generate a self-signed certificate and a private key, which you can use to configure Nginx.</p><p>To configure Nginx to use an SSL/TLS certificate, you will need to modify the Nginx configuration file that you created earlier. Specifically, you will need to add the following lines to the <code>server</code> block:</p><!--kg-card-begin: markdown--><pre><code>listen 443 ssl;
    ssl_certificate /path/to/example.crt;
    ssl_certificate_key /path/to/example.key;
</code></pre>
<!--kg-card-end: markdown--><p>Make sure to replace <code>/path/to/example.crt</code> and <code>/path/to/example.key</code> with the actual paths to the certificate and private key files.</p><p>After making these changes, you will need to restart Nginx to apply the changes. You can do this by running the following command:</p><p><code>sudo systemctl restart nginx</code></p><p>To test the configuration, you can use a tool such as <code>curl</code> to send a request to the Postgres server over HTTPS. For example:</p><p><code>curl --insecure <a href="https://example.com">https://example.com</a></code></p><p>If the configuration is working correctly, you should receive a response from the Postgres server.</p><h4 id="testing-the-setup">Testing the setup</h4><p>To test the setup and ensure that everything is working as expected, you can use a tool such as <code>psql</code> to connect to the Postgres server from a remote location.</p><p>To connect to the Postgres server using <code>psql</code>, you will need to specify the hostname of the server (e.g., <code>example.com</code>) and the username of a Postgres user that has the necessary privileges to connect to the server.</p><p>For example:</p><!--kg-card-begin: markdown--><pre><code>psql -h example.com -U postgres
</code></pre>
<!--kg-card-end: markdown--><p>If the setup is working correctly, you should be able to connect to the Postgres server and perform database tasks such as creating tables and inserting data.</p><p>You can also use other tools such as <code>pgadmin</code> or a web-based administration tool to connect to the Postgres server and perform database tasks.</p><h4 id="conclusion">Conclusion</h4><p>In this blog post, we explored some ideas for setting up a Postgres server for remote access on a custom domain with Nginx as a reverse proxy. We considered factors such as security, performance, and maintenance, and brainstormed different approaches to configuring Nginx and Postgres to allow remote connections.</p><p>Please note that these ideas are for brainstorming purposes only and have not been tested. If you have tried any of these approaches and encountered any issues, or if you have any suggestions for improving the setup, please leave a comment below. We&apos;d love to hear from you!</p><p>Whether you are a seasoned Postgres administrator or just getting started, we hope that you found this post helpful and informative. If you have any further questions or need more assistance with this topic, please don&apos;t hesitate to reach out. 
We&apos;d be happy to help!</p><hr><p><strong>References</strong></p><ol><li><a href="https://www.postgresql.org/docs/">https://www.postgresql.org/docs/</a></li><li><a href="https://nginx.org/en/docs/">https://nginx.org/en/docs/</a></li><li><a href="https://letsencrypt.org/">https://letsencrypt.org/</a></li><li><a href="https://www.openssl.org/docs/">https://www.openssl.org/docs/</a></li><li><a href="https://www.postgresql.org/docs/current/app-psql.html">https://www.postgresql.org/docs/current/app-psql.html</a></li><li><a href="https://wiki.postgresql.org/">https://wiki.postgresql.org/</a></li><li><a href="https://www.pgadmin.org/docs/">https://www.pgadmin.org/docs/</a></li></ol>]]></content:encoded></item><item><title><![CDATA[Why Software Developers should make time for side projects]]></title><description><![CDATA[Why Software Developers should make time for side projects]]></description><link>https://nkpremices.com/why-software-developers-should-make-time-for-side-projects/</link><guid isPermaLink="false">63a5887644126d0512887fa8</guid><category><![CDATA[Inspiration]]></category><category><![CDATA[2022]]></category><category><![CDATA[Blog]]></category><category><![CDATA[Tech]]></category><dc:creator><![CDATA[Prémices N. Kamasuwa]]></dc:creator><pubDate>Fri, 02 Dec 2022 13:55:00 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2022/12/cq5dam.web.3840.3840.jpeg" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2022/12/cq5dam.web.3840.3840.jpeg" alt="Why Software Developers should make time for side projects"><p>As a software developer, it&apos;s easy to get caught up in the daily grind of working on client projects or tasks assigned by your employer. 
While it&apos;s important to focus on your job and deliver quality work, it&apos;s also essential to make time for side projects - those little side ventures that allow you to explore new technologies, improve your skills, and work on something that is purely for your own enjoyment.</p><p>Many successful developers have credited their side projects as a crucial factor in their growth and career advancement. In this blog post, we&apos;ll explore the benefits of side projects and why it&apos;s important to never underestimate their potential. So let&apos;s get started!</p><p><strong>The benefits of side projects</strong></p><p>Side projects are a crucial aspect of the software development world, and they should never be underestimated.</p><p>There are countless benefits to pursuing side projects as a software developer. For one, side projects are a great way to learn new technologies and expand your knowledge set. Whether you want to try out a new programming language, explore a new framework, or experiment with a new library, side projects provide the perfect opportunity to do so. Not only will you have the freedom to try out new technologies and approaches, but you will also have the chance to apply your learning in a real-world context.</p><p>Another benefit of side projects is that they can help you keep your skills up to date. In the fast-paced world of software development, it is essential to stay current with the latest trends and best practices. Side projects provide the perfect opportunity to do so, as they allow you to experiment with new technologies and approaches that you might not have the chance to explore in your day-to-day work.</p><p>In addition to learning and improving your skills, side projects are also a great way to work on something you are passionate about. Whether you are interested in building a new product, solving a specific problem, or just exploring your own interests, side projects provide the perfect opportunity to do so. 
By pursuing a side project that you are passionate about, you can find personal fulfillment and enjoyment in your work, which can be especially rewarding in times when your day job might feel monotonous or unfulfilling.</p><p>Finally, side projects can also be a great way to make your resume stand out. By demonstrating your ability to take initiative and complete projects on your own, you can show potential employers that you are a self-starter who is capable of handling complex tasks and delivering results.</p><p><strong>My personal story </strong><br><br>As a software developer, I know firsthand the value of side projects. In 2019, I started learning NestJs as a way to expand my knowledge and have fun. As a result, in 2020 I was able to use this knowledge to get a new position as a backend engineer at Data Systems in Kigali, Rwanda, where I built the backend of a huge edTech platform that was planned to be used by multiple schools in the country as a management tool.</p><p>One reason I enjoyed working with NestJs was that, at my previous company (Andela), many of my colleagues hated working with the front-end framework Angular. Personally, I have a drive to tackle the things that others avoid. So, I decided to learn Angular and even built an application to help manage mentorship internally at Andela. This side project helped me gain attention from my manager, who recommended me to Data Systems.</p><p>In 2020, a group of friends and I came together to build a side project that could potentially help us manage our community, share job posts, and connect with each other. This project ended up being a key factor in my next interview, with DEJ Technology GmbH. During the interview, I talked about the project and did some technical tests, and I was offered the job.</p><p>While working at DEJ, my friends and I created an application that manages hospitals using NextJs and NestJs. 
We decided to add a blockchain layer to the project as a way to stand out in the market if we decided to sell the product one day. This project allowed me the flexibility to implement all the ideas I had for a project, so I built microservices in Elixir, Typescript, Python Django, and Golang just for fun. Little did I know that this would help me get my current job, where I work on multiple products in various languages.</p><p>This year, I started building a clone of Dropbox using multiple languages as a proof of concept for distributed systems. I want to invest my future in this area of the industry, and I know that this project will help me get my next job.<br><br><strong>Conclusion</strong><br><br>Whether you&apos;re looking to learn a new tech stack, expand your knowledge set, keep your skills up to date, work on something you&apos;re passionate about, or even earn some passive income, side projects can be a valuable asset. </p><p>To get started with your own side project, consider the following tips:</p><ul><li>Find a project idea that aligns with your interests and goals.</li><li>Set clear goals and a timeline for your project.</li><li>Break your project into manageable chunks to make it more achievable.</li><li>Find resources and support to help you along the way.</li><li>Don&apos;t be afraid to pivot or adjust your project as you go.</li><li>Most importantly, have fun and enjoy the process!</li></ul>]]></content:encoded></item><item><title><![CDATA[Django: Testing, Factories, and Data Seeding (Pytest, Mixer)]]></title><description><![CDATA[If you're a Django developer, you know how important it is to have a solid suite of tests to ensure that your code is working as expected. But what do you do when you inherit a huge codebase that has zero tests? 
That was the situation I found myself in earlier this year (2022)...]]></description><link>https://nkpremices.com/graphene-testing-factories-and-data-seeding-in-django-pytest-mixer/</link><guid isPermaLink="false">62bde1718d3d0705e2977ece</guid><category><![CDATA[2022]]></category><category><![CDATA[Blog]]></category><category><![CDATA[Python]]></category><category><![CDATA[Testing]]></category><dc:creator><![CDATA[Prémices N. Kamasuwa]]></dc:creator><pubDate>Thu, 01 Dec 2022 22:17:00 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2023/01/1_hCepk89R2z0Jp3OyWdvJ8Q.jpeg" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2023/01/1_hCepk89R2z0Jp3OyWdvJ8Q.jpeg" alt="Django: Testing, Factories, and Data Seeding (Pytest, Mixer)"><p>If you&apos;re a Django developer, you know how important it is to have a solid suite of tests to ensure that your code is working as expected. But what do you do when you inherit a huge codebase that has zero tests? That was the situation I found myself in at some point in 2022. </p><p>Coming from a project that was built using Elixir/Phoenix Live view and React, I was familiar with the concept of factories and the benefits they provide for test data management. I decided to bring this approach over to my Django project and was pleased with the results. </p><p>In this blog post, I&apos;ll explain how I set up Pytest for my Django project, defined factories using the Mixer library, and used them in my tests to create test data and run assertions. I&apos;ll also cover some advanced techniques for using factories, such as seeding the database. If you&apos;re looking for a more efficient and organized way to manage test data in your Django projects, read on to learn more about using factories with Pytest and Mixer.</p><ol><li><strong>Setting up Pytest</strong></li></ol><p>Before we can start using factories in our Django tests, we need to set up Pytest as our testing framework. 
Pytest is a powerful, feature-rich testing tool that is well-suited for testing Django applications. It&apos;s easy to install and has a number of plugins and features that can make testing Django projects a breeze.</p><p>To get started with Pytest in a Django project, you&apos;ll need to install the pytest and pytest-django packages. You can do this using pip:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">pip install pytest pytest-django
</code></pre>
<!--kg-card-end: markdown--><p>If your team uses flake8 and black, you might need some extra configuration, because pytest can also run flake8 and black checks (via the <code>pytest-flake8</code> and <code>pytest-black</code> plugins) to catch linting issues. To do that, first install the tools and plugins with the following command:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">pip install flake8 black pytest-flake8 pytest-black
</code></pre>
<!--kg-card-end: markdown--><p>Then create a file named <code>pytest.ini</code>. Here is an example:</p><!--kg-card-begin: markdown--><pre><code class="language-ini">[pytest]
filterwarnings =
    error
    ignore::UserWarning
    ignore:function ham\(\) is deprecated:DeprecationWarning
DJANGO_SETTINGS_MODULE = server.settings
flake8-max-line-length = 120
</code></pre>
<!--kg-card-end: markdown--><p>This file provides a few configs for <code>pytest</code>: the <code>filterwarnings</code> setting controls which warnings are escalated or ignored (please read the official docs linked under references to understand it in depth), <code>DJANGO_SETTINGS_MODULE</code> points to the <code>settings.py</code> file that Django uses for global configs (in our case a file under a directory called <code>server</code>), and <code>flake8-max-line-length</code> sets the maximum line length for flake8.</p><p>In general, this should be enough. You can simply run <code>pytest</code> from the command line to execute your tests.<br><br>By default, Pytest will discover and run all tests within the <code>tests</code> directory and its subdirectories. If you want to run a specific test or group of tests, you can use the <code>-k</code> flag to specify a test function or method name:<br></p><!--kg-card-begin: markdown--><pre><code class="language-bash">pytest -k test_function_name
</code></pre>
<!--kg-card-end: markdown--><p>2. <strong>Creating factories</strong></p><p>Now that we have Pytest set up in our Django project, let&apos;s look at how we can use factories to create test data using Pytest fixtures. A fixture is a function that returns test data and can be used in multiple tests. It allows us to easily create realistic, customizable test data without having to manually set up complex data structures or write repetitive test setup code.</p><p>To use factories as Pytest fixtures in a Django project, we can use the Mixer library. Mixer is a powerful and easy-to-use library that allows us to define factories for Django models, forms, views, and other objects. It also has a number of features for customizing factory data and creating relationships between factories.</p><p>To install Mixer, you can use pip:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">pip install mixer
</code></pre>
<!--kg-card-end: markdown--><p>Once Mixer is installed, we can start defining factories for our Django models. Here is an example of a factory for a <code>Person</code> model with a <code>first_name</code> and <code>last_name</code> field:<br></p><!--kg-card-begin: markdown--><pre><code class="language-python">from mixer.backend.django import mixer

def person_factory(**kwargs):
    return mixer.blend(
        &apos;server.app.models.Person&apos;,
        first_name=mixer.sequence(lambda n: f&apos;first_name_{n}&apos;),
        last_name=mixer.sequence(lambda n: f&apos;last_name_{n}&apos;),
        **kwargs,
    )
</code></pre>
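<p>To make the behaviour of <code>mixer.sequence</code> concrete, here is a minimal, self-contained stand-in. This is purely illustrative (the real helper is a special value consumed internally by <code>blend</code>); it only demonstrates the counting idea behind it:</p>

```python
import itertools

# Illustrative stand-in for mixer.sequence: wraps a lambda around an
# incrementing counter so every call produces the next unique value.
def sequence(func):
    counter = itertools.count()
    return lambda: func(next(counter))

first_name = sequence(lambda n: f"first_name_{n}")
print(first_name())  # first_name_0
print(first_name())  # first_name_1
```

This is why the assertions later in the post can rely on deterministic values like <code>first_name_0</code>.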
<!--kg-card-end: markdown--><p>This factory uses Mixer&apos;s <code>blend</code> function to create a new <code>Person</code> object with default values for the <code>first_name</code> and <code>last_name</code> fields. The <code>mixer.sequence</code> function generates unique values for these fields using a given lambda function. We can also pass additional keyword arguments to the factory function to override the default values for any field.</p><p>We can use this factory as a Pytest fixture by decorating it with the <code>@pytest.fixture</code> decorator:</p><!--kg-card-begin: markdown--><pre><code class="language-python">import pytest

@pytest.fixture
def person(db):
    return person_factory()
</code></pre>
<!--kg-card-end: markdown--><p>This fixture returns a new <code>Person</code> object created by the <code>person_factory</code> function. The <code>db</code> fixture provided by Pytest-Django is used to ensure that the <code>Person</code> object is saved to the database. We can then use this fixture in our Pytest tests like this:</p><!--kg-card-begin: markdown--><pre><code class="language-python">def test_person_model(person):
    assert person.first_name == &apos;first_name_0&apos;
    assert person.last_name == &apos;last_name_0&apos;
</code></pre>
<!--kg-card-end: markdown--><p>This test uses the <code>person</code> fixture to get a <code>Person</code> object and then runs assertions on its fields.</p><p>By using Mixer factories as Pytest fixtures, we can easily create test data for our tests and reuse it across multiple tests. In the next section, we&apos;ll look at more advanced techniques for using factories in tests.<br><br><strong>3</strong>. <strong>Advanced techniques</strong></p><p>In the previous sections, we looked at how to set up Pytest in a Django project and how to use Mixer factories to create test data. In this section, we&apos;ll explore some advanced techniques for using factories in tests, including how to refactor our factory code to support multiple keyword arguments and how to create custom Pytest fixtures for creating test data.</p><p>First, let&apos;s refactor our <code>Person</code> factory to support multiple keyword arguments. This will allow us to easily override the default values for any field in our <code>Person</code> model:</p><!--kg-card-begin: markdown--><pre><code class="language-python">from mixer.backend.django import mixer

def person_factory(**kwargs):
    defaults = {
        &apos;first_name&apos;: mixer.sequence(lambda n: f&apos;first_name_{n}&apos;),
        &apos;last_name&apos;: mixer.sequence(lambda n: f&apos;last_name_{n}&apos;),
    }
    defaults.update(kwargs)
    return mixer.blend(&apos;server.app.models.Person&apos;, **defaults)
</code></pre>
<!--kg-card-end: markdown--><p>This updated version of the <code>person_factory</code> function defines default values for the <code>first_name</code> and <code>last_name</code> fields and then updates them with any keyword arguments passed to the factory. This allows us to easily create a <code>Person</code> object with custom values for any field by calling the factory with keyword arguments like this:</p><!--kg-card-begin: markdown--><pre><code class="language-python">person = person_factory(first_name=&apos;John&apos;, last_name=&apos;Doe&apos;)
</code></pre>
<!--kg-card-end: markdown--><p>Next, let&apos;s create a custom Pytest fixture for creating test data using our <code>person_factory</code>. Let&apos;s call it <code>insert</code>. This fixture will allow us to easily create multiple <code>Person</code> objects with different values for any field by calling it with keyword arguments.</p><p>First, we need a container for all factories; here is how you can create one:</p><!--kg-card-begin: markdown--><pre><code class="language-python">model_factories = []
</code></pre>
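<p>One simple way to populate this container is a small registration decorator. This is a sketch: the name <code>register_factory</code> and the dict-based stand-in factory below are illustrative, not part of the original code (in the real project the registered function would be the mixer-based <code>person_factory</code> from above):</p>

```python
# Registry of factory functions; the insert fixture looks factories up
# here by name, so each one must be named <model>_factory.
model_factories = []

def register_factory(factory):
    """Append a factory function to the registry and return it unchanged."""
    model_factories.append(factory)
    return factory

# Dict-based stand-in for the mixer-based person_factory defined earlier
@register_factory
def person_factory(**kwargs):
    return {"first_name": "first_name_0", "last_name": "last_name_0", **kwargs}
```

With this in place, any module that defines a factory only has to decorate it, and the <code>insert</code> fixture below can find it by name.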
<!--kg-card-end: markdown--><p>Here is an example of the <code>insert</code> fixture:</p><!--kg-card-begin: markdown--><pre><code class="language-python">import pytest

@pytest.fixture
def insert(db):
    def _insert(model_name, count, persist=True, **kwargs):
        model_factory = next(f for f in model_factories if f.__name__ == f&quot;{model_name.split(&apos;.&apos;)[-1].lower()}_factory&quot;)
        objects = [model_factory(**kwargs) for _ in range(count)]
        if persist:
            for obj in objects:
                obj.save()
        return objects
    return _insert
</code></pre>
<!--kg-card-end: markdown--><p>This fixture takes three arguments:</p><ul><li><code>model_name</code>: The name of the model as a string.</li><li><code>count</code>: The number of objects to return in a list.</li><li><code>persist</code>: A boolean indicating whether the objects should be saved to the database (defaults to <code>True</code>).</li></ul><p>It also supports additional keyword arguments that will be passed to the model factory.</p><p>Here is an example of how we can use the <code>insert</code> fixture to create multiple <code>Person</code> objects with different values for the <code>first_name</code> field:</p><!--kg-card-begin: markdown--><pre><code class="language-python">def test_person_model(insert, db):
    persons = insert(
        &apos;app.Person&apos;,
        count=3,
        first_name=mixer.sequence(lambda n: f&apos;first_name_{n}&apos;),
    )
    assert persons[0].first_name == &apos;first_name_0&apos;
    assert persons[1].first_name == &apos;first_name_1&apos;
    assert persons[2].first_name == &apos;first_name_2&apos;
</code></pre>
<!--kg-card-end: markdown--><p>This test uses the <code>insert</code> fixture to create three <code>Person</code> objects with unique <code>first_name</code> values and then runs assertions on the <code>first_name</code> fields of the objects. The <code>db</code> fixture provided by Pytest-Django is used to ensure that the <code>Person</code> objects are saved to the database.<br><br>And that is how you can easily seed dynamic data directly in tests using pytest and mixer.<br><br><strong>4. Bonus</strong></p><p>As developers, we often join projects where we are not provided with a test database dump to use for local development. In these situations, it can be useful to create a Django management command that generates dummy data for use in local development.</p><p>To create a management command for generating dummy data, we can reuse our factory functions together with the Faker library. Faker is a library that generates fake data, such as names, addresses, and phone numbers, for use in test environments. To install Faker, you can use <code>pip</code>:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">$ pip install Faker
</code></pre>
<!--kg-card-end: markdown--><p>Here is an example of a Django management command called <code>db_seed</code> that generates dummy data using our factory functions. Note that Pytest fixtures cannot be called outside of a test run, so instead of asking Pytest for a fixture we look the factory up in the <code>model_factories</code> container directly (adjust the import path to wherever you defined it):</p><!--kg-card-begin: markdown--><pre><code class="language-python">import logging

import faker
from django.core.management.base import BaseCommand

# model_factories is the container of factory functions defined earlier
from server.app.factories import model_factories

class Command(BaseCommand):
    help = &apos;Seeds the database with fake data&apos;

    def add_arguments(self, parser):
        parser.add_argument(&apos;--quantity&apos;, type=int, default=1, help=&apos;Number of items to create&apos;)
        parser.add_argument(&apos;--model&apos;, type=str, required=True, help=&apos;Model to use for creating data&apos;)
        parser.add_argument(&apos;--attributes&apos;, type=str, help=&apos;Model attributes&apos;)

    def handle(self, *args, **options):
        fake = faker.Faker()  # available for handcrafted fake values if needed
        quantity = options[&apos;quantity&apos;]
        model_name = options[&apos;model&apos;].split(&apos;.&apos;)[-1].lower()

        # Turn the raw --attributes string into a dict of keyword arguments
        attributes = {}
        if options[&apos;attributes&apos;]:
            for pair in options[&apos;attributes&apos;].split(&apos;,&apos;):
                key, _, value = pair.strip().partition(&apos;=&apos;)
                attributes[key] = value.strip(&apos;\&apos;&quot;&apos;)

        # Look up the factory function by name in the container
        model_factory = next(
            f for f in model_factories if f.__name__ == f&apos;{model_name}_factory&apos;
        )

        for _ in range(quantity):
            # mixer.blend already persists each object to the database
            model_factory(**attributes)
        logging.info(f&apos;Successfully seeded {quantity} {model_name} objects&apos;)
</code></pre>
<!--kg-card-end: markdown--><p>This management command takes three arguments:</p><ul><li><code>quantity</code>: The number of items to create</li><li><code>model</code>: The name of the model to use for creating data. This argument is required.</li><li><code>attributes</code>: A string of Python-like keyword arguments to pass to the model factory.</li></ul><p>It uses the factory functions to create <code>quantity</code> number of dummy objects for the specified model, using the specified attributes. The objects are then saved to the database.</p><p>To use this management command, you would run the following command from the command line:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">$ python manage.py db_seed --quantity=5 --model=app.Person --attributes=&apos;first_name=&quot;John&quot;, last_name=&quot;Doe&quot;&apos;
</code></pre>
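<p>Note that the <code>--attributes</code> value arrives as a single string, so the command has to turn it into a keyword-argument dict before calling the factory. A minimal sketch of such a parser (the function name <code>parse_attributes</code> is mine, not part of Django):</p>

```python
def parse_attributes(raw):
    """Turn 'first_name="John", last_name="Doe"' into a kwargs dict."""
    attributes = {}
    for pair in raw.split(","):
        key, _, value = pair.strip().partition("=")
        if key:
            # Drop surrounding quotes so values can be quoted on the CLI
            attributes[key] = value.strip("\"'")
    return attributes

print(parse_attributes('first_name="John", last_name="Doe"'))
# {'first_name': 'John', 'last_name': 'Doe'}
```

All parsed values stay strings; for typed fields (integers, dates) you would convert them before handing them to the factory.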
<!--kg-card-end: markdown--><p>This command would create five <code>Person</code> objects with the <code>first_name</code> of &quot;John&quot; and the <code>last_name</code> of &quot;Doe&quot;.</p><p>The <code>db_seed</code> management command uses the built-in Python logging module to log a message when the seeding is complete. You can configure the logging module to output log messages to different places, such as a file or the console, by setting up a logging configuration in your Django settings.</p><p>Furthermore, for safety, you can wrap all database operations in a transaction (for example with Django&apos;s <code>transaction.atomic</code>). That way you preserve the integrity of the data and protect the CLI from leaving partial writes behind after unexpected errors.</p><p><strong>5. Conclusion</strong><br><br>We have explored how to use Pytest, Mixer, and Faker to create factories for generating dummy data in Django tests. We have covered how to create a simple model factory, how to use Pytest fixtures to create test data, and how to create a Django management command for generating dummy data for local development.</p><p>Using these techniques, you can easily create test data for your Django applications, making it easier to write and run tests that rely on data being present in the database. This can help you ensure that your code is working correctly and reduce the risk of regressions as you make changes to your codebase.</p><p>I hope you have found this blog post helpful, and that you have learned some useful techniques for testing Django applications. If you have any questions or suggestions, please feel free to leave a comment below.</p><hr><p><strong>References</strong>:</p><!--kg-card-begin: markdown--><ol>
<li><a href="https://pypi.org/project/Faker/">https://pypi.org/project/Faker/</a></li>
<li><a href="https://pypi.org/project/mixer/">https://pypi.org/project/mixer/</a></li>
<li><a href="https://pypi.org/project/pytest/">https://pypi.org/project/pytest/</a></li>
<li><a href="https://stackoverflow.com/questions/67932110/mocking-model-user-using-mixer-throughs-error">https://stackoverflow.com/questions/67932110/mocking-model-user-using-mixer-throughs-error</a></li>
</ol>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Uploading files to Digital Ocean Spaces, NestJs]]></title><description><![CDATA[Uploading files to Digital Ocean Spaces, NestJs]]></description><link>https://nkpremices.com/uploading-files-to-digital-ocean-spaces-nestjs/</link><guid isPermaLink="false">61a0c2ca8d3d0705e2977bba</guid><category><![CDATA[Blog]]></category><category><![CDATA[Tech]]></category><category><![CDATA[Typescript]]></category><category><![CDATA[NestJs]]></category><category><![CDATA[DigitalOcean]]></category><dc:creator><![CDATA[Prémices N. Kamasuwa]]></dc:creator><pubDate>Mon, 01 Nov 2021 12:15:00 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2021/11/spaces.png" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2021/11/spaces.png" alt="Uploading files to Digital Ocean Spaces, NestJs"><p>If you&apos;ve ever used the AWS SDK to upload files to S3, you know how convenient it can be. However, not everyone has the option to leverage AWS services, and that&apos;s where DigitalOcean&apos;s Spaces comes in. Spaces is an excellent alternative that offers similar functionality and is just as easy to use.</p><p>While uploading files to Spaces using a Node.js server is well-documented, resources covering the process with a NestJS server are harder to come by. In this article, I&apos;ll walk you through how to seamlessly upload files to DigitalOcean Spaces using the AWS SDK within a NestJS application.</p><p>Before we dive in, let&apos;s clarify a few assumptions: I&apos;m assuming you&apos;re already familiar with NestJS, have experience with DigitalOcean Spaces, and have set up a Spaces instance with the necessary API keys. Additionally, you should have a working NestJS project ready to go.</p><p>With those basics covered, let&apos;s move on to creating a simple controller that accepts a file from a form-data request body and uploads it to DigitalOcean Spaces. 
This should be a straightforward exercise and a great way to get comfortable with the process. Let&apos;s get started!</p><h3 id="1-creating-a-service-to-handle-file-uploads">1. Creating a Service to Handle File Uploads</h3><p>One of <strong>NestJS</strong>&apos;s standout features is its strong emphasis on the <strong>Dependency Injection (DI)</strong> design pattern. <strong>DI</strong> simplifies managing dependencies in a TypeScript codebase by resolving them automatically based on their types.</p><p>For our file upload functionality, we&apos;ll create a custom service. Unlike built-in services that NestJS can automatically resolve, custom services require a bit more setup. We&apos;ll need to create a <a href="https://docs.nestjs.com/fundamentals/custom-providers#custom-providers">custom provider</a> &#xA0;to ensure our service is correctly instantiated and injected where needed.</p><p>Before we dive into the code, let&apos;s start by installing the AWS SDK. You can do so with one of the following commands:</p><!--kg-card-begin: markdown--><p>in case you are using npm:</p>
<pre><code class="language-bash"># npm
npm install aws-sdk
</code></pre>
<pre><code class="language-bash"># yarn
yarn add aws-sdk
</code></pre>
<!--kg-card-end: markdown--><p>In our codebase, under <code>src</code>, let&apos;s create a directory called <code><strong>SpacesModule</strong></code> containing a directory called <strong><code>SpacesService</code></strong> with two files, <code>index.ts</code> and <code>doSpacesService.ts</code>. <code>index.ts</code> will contain the provider, and <code>doSpacesService.ts</code> will be the actual service.</p><!--kg-card-begin: markdown--><pre><code class="language-typescript">// index.ts
import * as AWS from &apos;aws-sdk&apos;;
import { Provider } from &apos;@nestjs/common&apos;;

// Unique identifier of the service in the dependency injection layer
export const DoSpacesServiceLib = &apos;lib:do-spaces-service&apos;;

// Creation of the value that the provider will always be returning.
// An actual AWS.S3 instance
const spacesEndpoint = new AWS.Endpoint(&apos;fra1.digitaloceanspaces.com&apos;);

const S3 = new AWS.S3({
  endpoint: spacesEndpoint.href,
  credentials: new AWS.Credentials({
    accessKeyId: &apos;&lt;put-your-digital-ocean-spaces-key-here&gt;&apos;,
    secretAccessKey: &apos;&lt;put-your-digital-ocean-spaces-secret-here&gt;&apos;,
  }),
});

// Now comes the provider
export const DoSpacesServicerovider: Provider&lt;AWS.S3&gt; = {
  provide: DoSpacesServiceLib,
  useValue: S3,
};

// This is just a simple interface that represents an uploaded file object 
export interface UploadedMulterFileI {
  fieldname: string;
  originalname: string;
  encoding: string;
  mimetype: string;
  buffer: Buffer;
  size: number;
}

</code></pre>
<!--kg-card-end: markdown--><p>Now, let&apos;s create the service with a method called <code>uploadFile</code>:</p><!--kg-card-begin: markdown--><pre><code class="language-typescript">// doSpacesService.ts
import { Inject, Injectable } from &apos;@nestjs/common&apos;;
import * as AWS from &apos;aws-sdk&apos;;
import { DoSpacesServiceLib, UploadedMulterFileI } from &apos;./index&apos;;

// Typical nestJs service
@Injectable()
export class DoSpacesService {
  constructor(@Inject(DoSpacesServiceLib) private readonly s3: AWS.S3) {}

  async uploadFile(file: UploadedMulterFileI) {
    // Precaution to avoid having 2 files with the same name
    const fileName = `${Date.now()}-${file.originalname}`;

    // Return a promise that resolves only when the file upload is complete
    return new Promise((resolve, reject) =&gt; {
      this.s3.putObject(
        {
          Bucket: &apos;&lt;put-here-the-name-of-your-spaces-bucket&gt;&apos;,
          Key: fileName,
          Body: file.buffer,
          ACL: &apos;public-read&apos;,
        },
        (error: AWS.AWSError) =&gt; {
          if (!error) {
            resolve(`&lt;put-here-the-public-link-to-your-spaces-instance&gt;/${fileName}`);
          } else {
            reject(
              new Error(
                `DoSpacesService_ERROR: ${error.message || &apos;Something went wrong&apos;}`,
              ),
            );
          }
        },
      );
    });
  }
}

</code></pre>
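<p>The core trick in <code>uploadFile</code> is wrapping the callback-based <code>putObject</code> call in a Promise so callers can simply <code>await</code> it. Stripped of the AWS specifics, the pattern looks like this (<code>putObjectMock</code> is an illustrative stand-in, not part of the AWS SDK):</p>

```typescript
type NodeCallback = (error: Error | null) => void;

// Stand-in for s3.putObject: a Node-style API that reports success or
// failure through a callback instead of returning a Promise.
function putObjectMock(params: { Key: string }, callback: NodeCallback): void {
  callback(params.Key ? null : new Error("missing key"));
}

// Wrap the callback API in a Promise, resolving with the public URL on
// success and rejecting with a descriptive error otherwise.
function upload(fileName: string, baseUrl: string): Promise<string> {
  return new Promise((resolve, reject) => {
    putObjectMock({ Key: fileName }, (error) => {
      if (!error) {
        resolve(`${baseUrl}/${fileName}`);
      } else {
        reject(new Error(`DoSpacesService_ERROR: ${error.message}`));
      }
    });
  });
}
```

The service above follows exactly this shape, with the real <code>this.s3.putObject</code> in place of the mock.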
<!--kg-card-end: markdown--><h3 id="2-as-the-last-pieces-of-the-puzzle-lets-create-a-module-to-wrap-everything-and-the-controller">2. As the last pieces of the puzzle, let&apos;s create a module to wrap everything, and the controller</h3><p>Under the <code>SpacesModule</code> directory, let&apos;s create two files, <code>spaces.module.ts</code> and <code>spaces.controller.ts</code>. At this point, our <code>SpacesModule</code> directory looks like this:</p><!--kg-card-begin: markdown--><pre><code>src
    |
    |SpacesModule
                |
                |-SpacesService
                |             |
                |             | doSpacesService.ts
                |             | index.ts
                | spaces.controller.ts
                | spaces.module.ts
                

</code></pre>
<!--kg-card-end: markdown--><p>In <code>spaces.controller.ts</code>, let&apos;s have the following:</p><!--kg-card-begin: markdown--><pre><code class="language-typescript">import {
  Controller,
  UploadedFile,
  UseInterceptors,
  Post
} from &apos;@nestjs/common&apos;;
import { FileInterceptor } from &apos;@nestjs/platform-express&apos;;
import { DoSpacesService } from &apos;./SpacesService/doSpacesService&apos;;
import { UploadedMulterFileI } from &apos;./SpacesService&apos;;

// just a typical nestJs controller
@Controller(&apos;/api/v1/do&apos;)
export class SpacesController {
  constructor(
    private readonly doSpacesService: DoSpacesService,
  ) {}

  @UseInterceptors(FileInterceptor(&apos;file&apos;))
  @Post(&apos;spaces&apos;)
  async uploadFile(@UploadedFile() file: UploadedMulterFileI) {
    const url = await this.doSpacesService.uploadFile(file);

    return {
      url,
    };
  }
}

</code></pre>
<!--kg-card-end: markdown--><p>In <code>spaces.module.ts</code>, let&apos;s have the following:</p><!--kg-card-begin: markdown--><pre><code class="language-typescript">import { Module } from &apos;@nestjs/common&apos;;
import { SpacesController } from &apos;./spaces.controller&apos;;
import { DoSpacesService } from &apos;./SpacesService/doSpacesService&apos;;
import { DoSpacesServicerovider } from &apos;./SpacesService&apos;;

@Module({
  imports: [],
  controllers: [SpacesController],
  // provide both the service and the custom provider
  providers: [DoSpacesServicerovider, DoSpacesService],
})
export class SpacesModule {}


</code></pre>
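<p>To wire everything up before testing, the new module has to be registered in the root module. A minimal sketch, assuming the default <code>app.module.ts</code> generated by the Nest CLI (adjust the import path to your layout):</p>

```typescript
// app.module.ts (sketch; assumes SpacesModule lives under src/SpacesModule)
import { Module } from "@nestjs/common";

import { SpacesModule } from "./SpacesModule/spaces.module";

@Module({
  // Registering SpacesModule makes its controller routes available
  imports: [SpacesModule],
})
export class AppModule {}
```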
<!--kg-card-end: markdown--><p>We&apos;ve covered all the steps necessary to handle file uploads to Digital Ocean Spaces using a NestJS server and the AWS SDK. All that&apos;s left to do now is to add the module we created to the main app module and then send a POST request to the <code>/api/v1/do/spaces</code> endpoint with the file attached as a form field named <code>file</code>. If everything is set up correctly, you should receive a URL back in the response, and you can check the file on Digital Ocean Spaces to confirm that the upload was successful.</p><p><strong>References</strong></p><ol><li>NestJs Custom Providers: <a href="https://docs.nestjs.com/fundamentals/custom-providers#custom-providers">https://docs.nestjs.com/fundamentals/custom-providers#custom-providers</a></li><li>AWS NPM SDK: <a href="https://www.npmjs.com/package/aws-sdk">https://www.npmjs.com/package/aws-sdk</a></li><li>Digital Ocean Spaces: <a href="https://www.digitalocean.com/products/spaces/">https://www.digitalocean.com/products/spaces/</a></li></ol>]]></content:encoded></item><item><title><![CDATA[Boilerplate of A Desktop App With Electron &amp; React/Typescript (For busy developers)]]></title><description><![CDATA[Boilerplate for A Desktop App With Electron &amp; React/Typescript ]]></description><link>https://nkpremices.com/creating-a-boilerplate-eletron-react-ts/</link><guid isPermaLink="false">61060334eb869117ff36e851</guid><category><![CDATA[React]]></category><category><![CDATA[Typescript]]></category><category><![CDATA[Javascript]]></category><category><![CDATA[Electron]]></category><category><![CDATA[Blog]]></category><dc:creator><![CDATA[Ghost]]></dc:creator><pubDate>Tue, 12 Oct 2021 02:13:00 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2021/10/Screen-Shot-2021-10-13-at-00.55.21.png" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2021/10/Screen-Shot-2021-10-13-at-00.55.21.png" alt="Boilerplate of A Desktop App With
Electron &amp; React/Typescript (For busy developers)"><p><strong>Hey busy developers!</strong> Looking for a quick and straightforward way to get started with Electron, React, and TypeScript? You&apos;re in the right place! In this article, I&apos;ll guide you through creating a boilerplate for a desktop app using these technologies.</p><p>Now, you might be thinking: <em>&quot;But I&apos;ve never worked with Electron before, and I don&apos;t have time to read the documentation!&quot;</em> Don&apos;t worry&#x2014;that&apos;s exactly why this tutorial exists. I&apos;ve designed it to be as simple and streamlined as possible, so you can have your app up and running in no time.</p><p>It&apos;s important to note that this isn&apos;t the only way to set up an Electron app. However, it&apos;s one of the fastest methods, perfect for those short on time who just want to get things up and running quickly.</p><p>So, let&apos;s get started!</p><p>Before diving in, it&apos;s essential to understand the basic structure of an Electron app. Essentially, it&apos;s a web application embedded within a Chromium/Electron/Node.js framework. 
To help illustrate this, check out the image below, which shows the build process of an Electron app:</p><figure class="kg-card kg-image-card"><img src="https://nkpremices.com/content/images/2021/10/Screen-Shot-2021-10-13-at-02.31.37.png" class="kg-image" alt="Boilerplate of A Desktop App With Electron &amp; React/Typescript (For busy developers)" loading="lazy" width="1282" height="235" srcset="https://nkpremices.com/content/images/size/w600/2021/10/Screen-Shot-2021-10-13-at-02.31.37.png 600w, https://nkpremices.com/content/images/size/w1000/2021/10/Screen-Shot-2021-10-13-at-02.31.37.png 1000w, https://nkpremices.com/content/images/2021/10/Screen-Shot-2021-10-13-at-02.31.37.png 1282w" sizes="(min-width: 720px) 720px"></figure><p>Let&apos;s start by creating a basic react app using create-react-app with the command</p><!--kg-card-begin: markdown--><p><code>npx create-react-app ./ --template typescript</code></p>
<!--kg-card-end: markdown--><p>This will create an empty react app in the current directory.</p><!--kg-card-begin: markdown--><p>Now let&apos;s install electron with the command</p>
<pre><code class="language-javascript">npm install electron
</code></pre>
<p>and then create a directory called electron with the following files:</p>
<ul>
<li>index.ts</li>
<li>preload.ts</li>
<li>tsconfig.json</li>
</ul>
<!--kg-card-end: markdown--><ol><li><strong>index.ts</strong></li></ol><p>The index.ts file is the entry point for Electron. It contains all the necessary instructions to start a basic Electron browser window and load the index.html file from the build folder. This file, in turn, loads all the bundles created by the build script of React-scripts. The content of the index.ts file includes inline comments that provide explanations for each step of the process. <br><br>Here is the code:</p><!--kg-card-begin: markdown--><pre><code class="language-javascript">import { app, BrowserWindow } from &apos;electron&apos;;
import * as path from &apos;path&apos;;

let mainWindow: Electron.BrowserWindow | null;

function createWindow() {
  // Create the browser window.
  mainWindow = new BrowserWindow({
    webPreferences: {
      preload: path.join(__dirname, &apos;preload.js&apos;),
    },
  });

  // and load the index.html of the app.
  mainWindow.loadFile(path.join(__dirname, &apos;index.html&apos;));

  // Open the DevTools.
  // mainWindow.webContents.openDevTools();

  // Emitted when the window is closed.
  mainWindow.on(&apos;closed&apos;, () =&gt; {
    // Dereference the window object, usually you would store windows
    // in an array if your app supports multi windows, this is the time
    // when you should delete the corresponding element.
    mainWindow = null;
  });
  mainWindow.maximize();
}

// This method will be called when Electron has finished
// initialization and is ready to create browser windows.
// Some APIs can only be used after this event occurs.
app.on(&apos;ready&apos;, () =&gt; {
  createWindow();
});

// Quit when all windows are closed.
app.on(&apos;window-all-closed&apos;, () =&gt; {
  // On OS X it is common for applications and their menu bar
  // to stay active until the user quits explicitly with Cmd + Q
  if (process.platform !== &apos;darwin&apos;) {
    app.quit();
  }
});

app.on(&apos;activate&apos;, () =&gt; {
  // On OS X it&apos;s common to re-create a window in the app when the
  // dock icon is clicked and there are no other windows open.
  if (mainWindow === null) {
    createWindow();
  }
});

</code></pre>
<!--kg-card-end: markdown--><p><strong>2. preload.ts</strong></p><p>You can use <code>preload.ts</code>, which <strong>will be loaded before other</strong> scripts run on the main page. This script always has access to the Electron APIs, the Node.js APIs (and also the browser APIs), no matter whether node integration is turned on or off.</p><p>Since the app that we are building is pretty basic, it will not have much content. </p><!--kg-card-begin: markdown--><pre><code class="language-javascript">// All of the Node.js APIs are available in the preload process
// it has the same sandbox as a Chrome extension
window.addEventListener(&apos;DOMContentLoaded&apos;, () =&gt; {});

export {};
</code></pre>
<!--kg-card-end: markdown--><p><strong>3. tsconfig.json</strong></p><p>If you&apos;ve worked with TypeScript before, you&apos;re probably familiar with the tsconfig.json file. For those who aren&apos;t, tsconfig.json is a configuration file that specifies the root-level files and compiler options needed to compile a TypeScript project. The presence of this file in a directory marks that directory as the root of a TypeScript project. You can find more information about tsconfig.json and its various properties in the TypeScript documentation, or <a href="https://dzone.com/articles/what-is-the-tsconfigjson-configuration-file#:~:text=Like (2)-,The tsconfig.,is the TypeScript project root.&amp;text=json %2C it&apos;s various properties%2C and how to extend it.">here</a>.</p><p>In our case, the content should look like the following:</p><!--kg-card-begin: markdown--><pre><code class="language-json">{
  &quot;compilerOptions&quot;: {
    &quot;module&quot;: &quot;commonjs&quot;,
    &quot;noImplicitAny&quot;: true,
    &quot;sourceMap&quot;: true,
    &quot;outDir&quot;: &quot;../build&quot;,
    &quot;baseUrl&quot;: &quot;.&quot;,
    &quot;paths&quot;: {
      &quot;*&quot;: [&quot;node_modules/*&quot;]
    }
  },
  &quot;include&quot;: [
    &quot;**/*&quot;
  ]
}
</code></pre>
<!--kg-card-end: markdown--><p>Now the last thing that we have to do is configure the scripts in <code>package.json</code> &#xA0;exactly as the image at the beginning of this post describes it.</p><!--kg-card-begin: markdown--><pre><code class="language-json">   {
      &quot;scripts&quot;: {
         &quot;build:web&quot;: &quot;PUBLIC_URL=./ react-scripts build&quot;,
         &quot;build:desktop&quot;: &quot;tsc -p electron/tsconfig.json&quot;,
         &quot;start:desktop&quot;: &quot;npm run build:web &amp;&amp; npm run build:desktop &amp;&amp; electron ./build/index.js&quot;
     }
   }
</code></pre>
<!--kg-card-end: markdown--><p>Now that we have everything set up, we can run the command <code>npm run start:desktop</code> and see the app running in an Electron frame.</p><p> And that&apos;s it, you have created a desktop app with Electron &amp; React/Typescript.</p><h2 id="bonus">Bonus </h2><ol><li><strong>Creating executables for all platforms (Windows, Linux, macOS) with electron-builder</strong></li></ol><p>There are multiple packages for creating executables for an Electron app. There are lots of tutorials online on how to use them, but I found electron-builder quite easy to set up and understand. Install it first with <code>npm install --save-dev electron-builder</code>.</p><p>Let&apos;s start by adding the configuration to the package.json file. It&apos;s a <code>build</code> property added to the root of the JSON file. &#xA0;</p><!--kg-card-begin: markdown--><pre><code class="language-json">  &quot;build&quot;: {
    &quot;extraMetadata&quot;: {
      &quot;homepage&quot;: &quot;./&quot;,
      &quot;main&quot;: &quot;build/index.js&quot;
    },
    &quot;productName&quot;: &quot;my-app-name&quot;,
    &quot;appId&quot;: &quot;my-app-id-or-version&quot;,
    &quot;files&quot;: [
      &quot;build/**/*&quot;,
      &quot;node_modules/**/*&quot;
    ],
    &quot;mac&quot;: {
      &quot;category&quot;: &quot;public.app-category.productivity&quot;,
      &quot;target&quot;: [
        &quot;dmg&quot;,
        &quot;zip&quot;
      ],
      &quot;icon&quot;: &quot;src/Assets/img/unix.icns&quot;
    },
    &quot;linux&quot;: {
      &quot;maintainer&quot;: &quot;john@doe.com&quot;,
      &quot;target&quot;: [
        &quot;tar.gz&quot;,
        &quot;deb&quot;
      ],
      &quot;icon&quot;: &quot;src/Assets/img/unix.icns&quot;
    },
    &quot;win&quot;: {
      &quot;target&quot;: [
        &quot;zip&quot;,
        &quot;dir&quot;
      ],
      &quot;icon&quot;: &quot;src/Assets/img/windows.ico&quot;
    }
  }
</code></pre>
<!--kg-card-end: markdown--><p>Now, let&apos;s add scripts to trigger the creation of the files for all operating systems</p><!--kg-card-begin: markdown--><pre><code class="language-json">    &quot;prebuild:package&quot;: &quot;npm run build:web &amp;&amp; npm run build:desktop&quot;,
    &quot;build:package:windows&quot;: &quot;npm run prebuild:package &amp;&amp; electron-builder --win&quot;,
    &quot;build:package:linux&quot;: &quot;npm run prebuild:package &amp;&amp; electron-builder --linux&quot;,
    &quot;build:package:mac&quot;: &quot;npm run prebuild:package &amp;&amp; electron-builder --mac&quot;,
    &quot;build:package:all&quot;: &quot;npm run prebuild:package &amp;&amp; electron-builder --win --linux --mac&quot;
 
</code></pre>
<!--kg-card-end: markdown--><p>And there you have it. The names of the scripts speak for themselves, so no further explanation is needed.</p><p><strong>2. Automatic updates of the electron app in production</strong></p><p>Automatic updates of an Electron app in production can be achieved in many ways, but the most popular one is doing it via an &quot;update server&quot;. That topic deserves a whole blog post of its own, which I will publish in the future if I have time.</p><p><strong>References:</strong></p><ol><li><a href="https://dev.to/achuthhadnoor/getting-started-with-electron-typescript-react-and-webpack-3ik4">https://dev.to/achuthhadnoor/getting-started-with-electron-typescript-react-and-webpack-3ik4</a></li><li><a href="https://awsm.page/electron/how-to-use-preload-script-in-electron/">https://awsm.page/electron/how-to-use-preload-script-in-electron/</a></li><li><a href="https://www.electronjs.org/docs/latest/tutorial/updates">https://www.electronjs.org/docs/latest/tutorial/updates</a></li></ol>]]></content:encoded></item><item><title><![CDATA[Dynamically Set Angular Environment Variables in Docker]]></title><description><![CDATA[Dynamically Set Angular Environment Variables in Docker]]></description><link>https://nkpremices.com/dynamically-set-angular-env-variables-in-docker/</link><guid isPermaLink="false">61060334eb869117ff36e85d</guid><category><![CDATA[Blog]]></category><category><![CDATA[Tech]]></category><category><![CDATA[Angular]]></category><category><![CDATA[Javascript]]></category><category><![CDATA[Typescript]]></category><category><![CDATA[Docker]]></category><dc:creator><![CDATA[Ghost]]></dc:creator><pubDate>Tue, 12 Oct 2021 02:13:00 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2021/10/Screen-Shot-2021-10-13-at-00.25.32.png" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2021/10/Screen-Shot-2021-10-13-at-00.25.32.png" alt="Dynamically Set Angular Environment 
Variables in Docker"><p>When using Angular, environment variables are baked into the application bundles at build time, which means they are not meant to be changed afterward. </p><p>In a recent project, I encountered a challenge when trying to manage multiple environment settings in an Angular app running in a Docker container. </p><p>I will try here to solve the problem in a step-by-step, self-explanatory way.</p><h2 id="problem-description">Problem description</h2><p>The application is a single-page app with a .Net Core backend API. One example of a variable that has to vary from one environment to another is the backend URL. &#xA0;</p><!--kg-card-begin: markdown--><p>Here is what the <code>environment.ts</code> file looks like:</p>
<pre><code class="language-javascript">export const environment = {
  production: false,
  backendBaseUrl: &apos;http://localhost:5151&apos;
};
</code></pre>
<!--kg-card-end: markdown--><p>As you might have guessed, <code>environment.backendBaseUrl</code> is the part that will be changing.</p><!--kg-card-begin: markdown--><ul>
<li>Locally we could have <code>http://localhost:5151</code></li>
<li>On staging we could have <code>staging.some-app-domain.com</code></li>
<li>And on production: <code>some-app-domain.com</code></li>
</ul>
<!--kg-card-end: markdown--><p>According to the Twelve-Factor App, <a href="https://12factor.net/config">configuration should be stored in the environment</a>. By default, this is not possible with Angular&apos;s built-in environment variables, so we need to feed them from an external source.</p><h2 id="solution-feeding-angular-env-variables-from-an-external-source">Solution: Feeding Angular Env Variables from an external source</h2><p>Let&apos;s consider the following:</p><ol><li>After a successful build, the default <code>environment.ts</code> gets compiled into a nearly un-editable JavaScript bundle. One way around this is to externalize the configuration from the compiled app bundle files.</li><li>The content of the &#xA0;<code>/assets</code> directory never gets changed. It just gets copied into the build directory.</li></ol><p>In a Docker environment, we can take advantage of behavior number 2 and create a volume on the assets directory. In that scenario, we just add an &quot;extra environment file&quot; that the main environment files of the Angular app can read from. Whenever we change the config file in the <code>/assets</code> &#xA0;directory (through the volume), the whole application adapts. That way, we have flexibility across environments.</p><p>Furthermore, we can make it a little better by adding a Docker command that fills in placeholder values in that file every time the container starts. (This does not even require a volume.)</p><p>Okay &#x1F913;, no more talking, let&apos;s make that happen.</p><p>Let&apos;s create a new &#xA0;<code>env.js</code> &#xA0; file in the &#xA0;<code>/assets</code> directory with the following content.</p><!--kg-card-begin: markdown--><pre><code class="language-javascript">(function (window) {
  window[&apos;env&apos;] = window[&apos;env&apos;] || {};

  // Environment variables
  window[&apos;env&apos;][&apos;backendBaseUrl&apos;] = &apos;https://dot-net-backend.com/api/v1&apos;;
})(this);
</code></pre>
<!--kg-card-end: markdown--><p>The JavaScript function that we just created defines our future environment variables. It won&apos;t be compiled or bundled but simply copied to the <code>/dist</code> directory, since it is part of the <code>/assets</code> folder, &#xA0;and it can be edited in clear text later.</p><p>Now, let&apos;s run the function at application startup by loading it from the <code>index.html</code> file:</p><!--kg-card-begin: markdown--><pre><code class="language-html">&lt;!DOCTYPE html&gt;
&lt;html lang=&quot;en&quot;&gt;
  &lt;head&gt;
    &lt;!-- ... --&gt;

    &lt;!-- Load environment variables --&gt;
    &lt;script src=&quot;assets/env.js&quot;&gt;&lt;/script&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;app-root&gt;&lt;/app-root&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
<!--kg-card-end: markdown--><p>Now let&apos;s feed the env variables from the new &#xA0;<code>env.js</code> &#xA0;into the normal Angular &#xA0;<code>environment.*.ts</code> &#xA0;files.</p><!--kg-card-begin: markdown--><pre><code class="language-javascript">export const environment = {
  production: false,
  backendBaseUrl:
    window[&apos;env&apos;][&apos;backendBaseUrl&apos;] || &apos;http://localhost:5151/api/v1&apos;
};
</code></pre>
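The fallback logic above is the whole trick: prefer whatever `env.js` put on `window['env']` at startup, and only fall back to the compile-time default. A minimal framework-free sketch of that pattern (the names `EnvMap` and `readEnv` are illustrative, not part of the project):

```typescript
// Sketch of the fallback pattern environment.ts relies on: prefer the
// value injected at runtime by assets/env.js, else a compile-time default.
type EnvMap = Record<string, string | undefined>;

function readEnv(env: EnvMap, key: string, fallback: string): string {
  return env[key] || fallback;
}

// What env.js would have put on window['env'] at startup:
const runtimeEnv: EnvMap = { backendBaseUrl: 'https://dot-net-backend.com/api/v1' };

const backendBaseUrl = readEnv(runtimeEnv, 'backendBaseUrl', 'http://localhost:5151/api/v1');
console.log(backendBaseUrl);
```

Because the lookup happens when the bundle evaluates `environment.ts`, the same compiled bundle picks up different values per environment without a rebuild.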
<!--kg-card-end: markdown--><p>Let&apos;s now create a template file for our environment variables. </p><p>Let&apos;s create an &#xA0;<code>env.sample.js</code> &#xA0;in the <code>/assets</code> directory:</p><!--kg-card-begin: markdown--><pre><code class="language-javascript">(function (window) {
  window[&apos;env&apos;] = window[&apos;env&apos;] || {};

  // Environment variables
  window[&apos;env&apos;][&apos;backendBaseUrl&apos;] = &apos;${BACKEND_BASE_URL}&apos;;
})(this);
</code></pre>
<!--kg-card-end: markdown--><p>The <code>${PLACEHOLDER}</code> variables can now be overwritten when the Docker container starts. We are going to use the &#xA0;<code>envsubst</code> &#xA0;shell command for that.</p><p>What the command basically does is copy the content of <code>env.sample.js</code> into &#xA0;<code>env.js</code>, replacing the <code>${PLACEHOLDER}</code> values with the corresponding variables from the standard environment.</p><p>Here is the code:</p><!--kg-card-begin: markdown--><pre><code class="language-docker"># Dockerfile to build and serve the Angular application


###############
### STAGE 1: Build app
###############
FROM node:14-alpine as build

WORKDIR /usr/local/app
# Add the source code to app
COPY ./ /usr/local/app/
# Install all the dependencies
RUN npm install
# Generate the build of the application
RUN npm run build

###############
### STAGE 2: Serve app with nginx ###
###############
FROM nginx:1.19.3-alpine
COPY  --from=build /usr/local/app/dist /usr/share/nginx/html

# Expose port 80
EXPOSE 80

# When the container starts, replace the env.js with values from environment variables
CMD [&quot;/bin/sh&quot;,  &quot;-c&quot;,  &quot;envsubst &lt; /usr/share/nginx/html/assets/env.sample.js &gt; /usr/share/nginx/html/assets/env.js &amp;&amp; exec nginx -g &apos;daemon off;&apos;&quot;]
</code></pre>
<!--kg-card-end: markdown--><p>There we have everything set up and ready to fly!!!!</p><p>I&apos;ve come to like &#xA0; <code>docker-compose</code> because of the way it helps manage multiple services while also easily controlling the environment. I would suggest that we use it.</p><p>Let&apos;s create a &#xA0;<code>docker-compose.yml</code> file that will build our image, start its container, and make it listen on port 4200. </p><!--kg-card-begin: markdown--><pre><code class="language-yml">version: &apos;3.5&apos;

services:
  web:
    env_file: &quot;.env&quot;
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - &apos;4200:80&apos;
</code></pre>
<!--kg-card-end: markdown--><p>With the <code>.env</code> file having the following content:</p><!--kg-card-begin: markdown--><pre><code>BACKEND_BASE_URL=http://the-backend-url.com/api/v1
</code></pre>
<!--kg-card-end: markdown--><p>Now if you just run &#xA0;<code>docker-compose up --build</code> &#xA0;at the root of the project, you will find the app on port 4200 with the right environment variable that you set in the .env file.</p><p>Thanks for taking the time to read.</p><p>Ciao &#x1F44B;&#x1F3FD;</p><p><strong>References:</strong></p><ol><li><a href="https://medium.com/@wkrzywiec/build-and-run-angular-application-in-a-docker-container-b65dbbc50be8">https://medium.com/@wkrzywiec/build-and-run-angular-application-in-a-docker-container-b65dbbc50be8</a></li><li><a href="https://www.youtube.com/watch?v=2nqkIDNkVfY">https://www.youtube.com/watch?v=2nqkIDNkVfY</a></li><li><a href="https://skofgar.ch/dev/2020/08/how-to-quickly-replace-environment-variables-in-a-file/">https://skofgar.ch/dev/2020/08/how-to-quickly-replace-environment-variables-in-a-file/</a></li><li><a href="https://angular.io/guide/build">https://angular.io/guide/build</a></li></ol>]]></content:encoded></item><item><title><![CDATA[Investigating the TMDB movie dataset, part 2]]></title><description><![CDATA[Investigating the TMDB movie dataset, part 2]]></description><link>https://nkpremices.com/investigating-the-tmdb-movie-dataset-part-2/</link><guid isPermaLink="false">61089a2073780805b945cb8c</guid><category><![CDATA[Python]]></category><category><![CDATA[Data Analysis]]></category><category><![CDATA[Blog]]></category><dc:creator><![CDATA[Prémices N. Kamasuwa]]></dc:creator><pubDate>Tue, 03 Aug 2021 01:25:16 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2021/08/mih10uhu1464fx1kr0by-2.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2021/08/mih10uhu1464fx1kr0by-2.jpg" alt="Investigating the TMDB movie dataset, part 2"><p>This blog post is the second part of a series. I recommend reading the first part to fully understand the context of this one. 
In this post, we&apos;ll focus on data cleaning, using the results from the first part as our foundation.</p><h1 id="data-cleaning">Data Cleaning</h1><h5 id="step-1-remove-columns-with-excessive-null-values">Step 1. Remove Columns with Excessive Null Values.</h5><!--kg-card-begin: markdown--><p><code>df.head(1)</code></p>
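The actual column removal is not shown in this excerpt; based on part 1 (where the tagline and keywords columns were singled out for excessive nulls), it presumably looked something like this toy sketch:

```python
import pandas as pd

# Toy frame standing in for the TMDb data
df = pd.DataFrame({'id': [1, 2], 'tagline': ['a', 'b'], 'keywords': ['x', 'y']})

# Presumed drop of the columns flagged in part 1 for excessive null values
df.drop(['tagline', 'keywords'], axis=1, inplace=True)
print(list(df.columns))
```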
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583942632/Screen_Shot_2020-03-11_at_18.03.05_czezk0.png" class="kg-image" alt="Investigating the TMDB movie dataset, part 2" loading="lazy"></figure><h5 id="step-2-remove-duplicated-data">Step 2. Remove duplicated data</h5><blockquote><code>df.drop_duplicates(inplace=True)</code></blockquote><h5 id="step-3-eliminate-rows-with-null-values-in-essential-columns">Step 3. Eliminate Rows with Null Values in Essential Columns</h5><blockquote><code>df.dropna(subset = [&apos;cast&apos;, &apos;director&apos;, &apos;genres&apos;], how=&apos;any&apos;, inplace=True)</code></blockquote><p>Let&apos;s check if there are still null values</p><blockquote><code>df.isnull().sum()</code></blockquote><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583940342/Screen_Shot_2020-03-11_at_17.19.32_ei0xld.png" class="kg-image" alt="Investigating the TMDB movie dataset, part 2" loading="lazy"></figure><h5 id="step-4-replace-zero-values-with-null-values-in-the-budget-and-revenue-column">Step 4. Replace zero values with null values in the budget and revenue columns.</h5><!--kg-card-begin: markdown--><p><code>df[&apos;budget&apos;] = df[&apos;budget&apos;].replace(0, np.NaN)</code><br><code>df[&apos;revenue&apos;] = df[&apos;revenue&apos;].replace(0, np.NaN)</code><br><code>df.info()</code></p>
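On a toy frame, the zero-to-null replacement from Step 4 behaves like this (a self-contained sketch, not the course notebook):

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame to illustrate the zero-to-NaN replacement
df = pd.DataFrame({'budget': [0, 150000], 'revenue': [0, 900000]})

# Zeros become NaN so they no longer distort describe() statistics
df['budget'] = df['budget'].replace(0, np.nan)
df['revenue'] = df['revenue'].replace(0, np.nan)
print(df['budget'].isnull().sum())
```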
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583940343/Screen_Shot_2020-03-11_at_17.19.42_jinvxt.png" class="kg-image" alt="Investigating the TMDB movie dataset, part 2" loading="lazy"></figure><h5 id="step-5-drop-the-runtime-column">Step 5. Drop rows with zero runtime.</h5><!--kg-card-begin: markdown--><p><code>df.query(&apos;runtime == 0&apos;)</code></p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583940342/Screen_Shot_2020-03-11_at_17.19.57_wqlokv.png" class="kg-image" alt="Investigating the TMDB movie dataset, part 2" loading="lazy"></figure><blockquote><code>df.info()</code></blockquote><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583940343/Screen_Shot_2020-03-11_at_17.20.04_tfcs5d.png" class="kg-image" alt="Investigating the TMDB movie dataset, part 2" loading="lazy"></figure><blockquote><code>df.describe()</code></blockquote><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583943336/Screen_Shot_2020-03-11_at_18.15.18_vstnox.png" class="kg-image" alt="Investigating the TMDB movie dataset, part 2" loading="lazy"></figure><p>From the table above, we can see that replacing the zeros with null values made the budget and revenue distributions look healthier. We can also see that the minimum values now make more sense.</p><hr><p>This is the end of the second part. If you enjoyed the read, stay tuned: I will post the third part soon.</p><p>Thank you for reading.</p>]]></content:encoded></item><item><title><![CDATA[Investigating the TMDB movie dataset]]></title><description><![CDATA[Investigating the TMDB movie dataset]]></description><link>https://nkpremices.com/investigating-the-tmdb-movie-dataset/</link><guid isPermaLink="false">610894093455141cbf9ee82b</guid><category><![CDATA[Python]]></category><category><![CDATA[Data Analysis]]></category><category><![CDATA[Blog]]></category><dc:creator><![CDATA[Prémices N. 
Kamasuwa]]></dc:creator><pubDate>Tue, 03 Aug 2021 01:19:50 GMT</pubDate><media:content url="https://nkpremices.com/content/images/2021/08/mih10uhu1464fx1kr0by-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://nkpremices.com/content/images/2021/08/mih10uhu1464fx1kr0by-1.jpg" alt="Investigating the TMDB movie&#xA0;dataset"><p>I participated in the <a href="https://medium.com/r/?url=https%3A%2F%2Fwww.udacity.com%2Fcourse%2Fdata-analyst-nanodegree--nd002">Data analyst nanodegree program from Udacity</a> where I worked on a number of projects. In the coming weeks, I will be writing blog posts to share my experiences and insights from these projects.</p><blockquote><em><code>Note:</code> </em>This blog post is the first part of a series where I analyze a dataset. The goal is to demonstrate how straightforward data analysis can be.</blockquote><h1 id="introduction">Introduction</h1><p>Are you curious about what makes a movie successful? In this series of blog posts, we&apos;ll use data from The Movie Database (TMDb) to explore the factors that contribute to a film&apos;s popularity, ratings, and revenue. Our dataset includes information on over 10,000 movies, covering aspects like budget, cast, director, keywords, runtime, genres, production companies, release date, and more.</p><p>In this first post, we&apos;ll take a closer look at the TMDb movie data and introduce some of the questions we&apos;ll be addressing in the coming weeks, such as:</p><ul><li>How has movie popularity changed over the years?</li><li>How does revenue vary across different ratings and popularity levels?</li><li>What characteristics are associated with high-popularity movies?</li><li>How many movies are released each year?</li><li>What are the keyword trends by generation?</li></ul><p>Using tools like Numpy, Pandas, and Matplotlib, we&apos;ll dive into the data to uncover valuable insights. 
But before we begin, let&apos;s introduce the dataset and discuss its contents.</p><p><strong>LET&apos;S GO!!</strong></p><p>First, let&apos;s import the necessary packages.</p><!--kg-card-begin: markdown--><pre><code class="language-python">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter

%matplotlib inline
</code></pre>
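The snippets that follow operate on a DataFrame named `df`. The loading step is not shown in the post; it presumably used `pd.read_csv` on the TMDb export (the `tmdb-movies.csv` filename is an assumption). A self-contained sketch, with a tiny inline sample standing in for the real file:

```python
import io

import pandas as pd

# In the notebook this would be: df = pd.read_csv('tmdb-movies.csv')
# (filename assumed). An inline sample keeps this sketch runnable.
sample = io.StringIO(
    "id,budget,revenue,runtime,release_year\n"
    "1,0,100000,120,1999\n"
    "2,2000000,0,0,2005\n"
)
df = pd.read_csv(sample)
df.info()
```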
<!--kg-card-end: markdown--><h1 id="data-wrangling">Data Wrangling</h1><h5 id="general-properties">General Properties</h5><p>Let&apos;s look at the general info of the dataset</p><!--kg-card-begin: markdown--><p><code>df.info()</code></p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583940342/Screen_Shot_2020-03-11_at_17.17.33_uuhhm8.png" class="kg-image" alt="Investigating the TMDB movie&#xA0;dataset" loading="lazy"></figure><p>The TMDB movie data includes 10866 entries and 21 columns, with data types including integers, floats, and strings. A significant number of columns have null values, as indicated by the number of entries per column. In the next step, we will examine the exact number of null records per column.</p><blockquote><code>list(df.isnull().sum().items())</code></blockquote><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583940342/Screen_Shot_2020-03-11_at_17.18.13_b4rymb.png" class="kg-image" alt="Investigating the TMDB movie&#xA0;dataset" loading="lazy"></figure><p>After examining the null values in the TMDB movie data, we found that several columns contain null records, including cast, homepage, director, tagline, keywords, overview, genres, and production companies. In particular, the homepage, tagline, keywords, and production_companies columns have a large number of null records. In order to move forward with our analysis, we decided to remove the tagline and keywords columns, which had a high number of null values. <br><br>Next, we will try to gather more descriptive information from the dataset.</p><blockquote><code>df.describe()</code></blockquote><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583940342/Screen_Shot_2020-03-11_at_17.18.30_zzuejp.png" class="kg-image" alt="Investigating the TMDB movie&#xA0;dataset" loading="lazy"></figure><p>After examining the popularity column in the TMDB movie data, we observed some outliers that appear to be valid data points. 
Therefore, we decided to retain the original data rather than remove these outliers.</p><p>We also noticed that the budget, revenue, and runtime columns contain many zero values. Initially, we considered the possibility that these movies were not released, but upon examining the release_year column, we found that the minimum value (1996) is a valid year and that there were no null values. This suggests that these movies were indeed released, but may have missing data for budget, revenue, and runtime. In order to determine the cause of these zero values, we will closely examine these records and try to gather more information about them.</p><!--kg-card-begin: markdown--><p><code>df_budget_zero.head(3)</code></p>
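The helper frames `df_budget_zero` and `df_revenue_zero` are not defined in this excerpt; they were presumably built by filtering on zero values, along these lines (toy frame for illustration):

```python
import pandas as pd

# Toy frame; in the notebook these filters would run on the full TMDb df
df = pd.DataFrame({'budget': [0, 5000], 'revenue': [100, 0]})

# Presumed construction of the helper frames inspected below
df_budget_zero = df.query('budget == 0')
df_revenue_zero = df.query('revenue == 0')
print(len(df_budget_zero), len(df_revenue_zero))
```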
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583940342/Screen_Shot_2020-03-11_at_17.18.49_nqxyhy.png" class="kg-image" alt="Investigating the TMDB movie&#xA0;dataset" loading="lazy"></figure><p>Then for the revenue</p><!--kg-card-begin: markdown--><p><code>df_revenue_zero.head(3)</code></p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583940342/Screen_Shot_2020-03-11_at_17.18.58_j3sjwd.png" class="kg-image" alt="Investigating the TMDB movie&#xA0;dataset" loading="lazy"></figure><p>After investigating the records with zero values for budget and revenue, we found that these values were likely missing data rather than indicating that the movies were not released. However, we also discovered that some of these records had other inconsistencies or missing data that could potentially affect the results of our analysis. As a result, we decided to drop these records rather than impute the missing values or set them to zero.</p><p>Next, we will check the number of null values in the dataset to determine whether we should drop or impute these values as well.</p><p>First for the budget zero values</p><!--kg-card-begin: markdown--><p><code>df_budget_0count.head(2)</code></p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583941289/Screen_Shot_2020-03-11_at_17.40.40_tus7bz.png" class="kg-image" alt="Investigating the TMDB movie&#xA0;dataset" loading="lazy"></figure><p>As the results suggest, there are far more zero values than non-zero values. Dropping them would distort the results, so it is better to set them to null instead.</p><p>Then for the revenue zero values</p><!--kg-card-begin: markdown--><p><code>df_revenue_0count.head(2)</code></p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583941289/Screen_Shot_2020-03-11_at_17.40.48_kb0egw.png" class="kg-image" alt="Investigating the TMDB movie&#xA0;dataset" loading="lazy"></figure><p>Same situation: set them to null.</p><p>Finally for the runtime</p><figure class="kg-card kg-image-card"><img src="https://res.cloudinary.com/premices/image/upload/v1583941289/Screen_Shot_2020-03-11_at_17.40.58_udrblh.png" class="kg-image" alt="Investigating the TMDB movie&#xA0;dataset" loading="lazy"></figure><p>The number of zeroes is negligible, so those rows can simply be dropped.</p><h1 id="summary">Summary</h1><p>In this first part of the series, we focused on preparing the TMDB movie data for analysis. We removed some columns that had a lot of null values or were not necessary for answering our research questions, and we also dropped duplicated data. We then removed null values in certain columns and replaced zero values with null values in the budget and revenue columns. Finally, we dropped any rows with a runtime of zero.</p><hr><p>Thank you for reading! In the next part of the series, we will continue with data cleaning and begin to explore the data in more depth. Stay tuned!</p><p>Ciao &#x1F44B;&#x1F3FE;</p>]]></content:encoded></item></channel></rss>