The Ultimate Guide to Web Workers

In this tutorial, we’ll introduce web workers and demonstrate how you can use them to address execution speed issues.

Contents:

  1. JavaScript Non-blocking I/O Event-loop
  2. Long-running JavaScript Functions
  3. Web Workers
  4. Browser Worker Demonstration
  5. Server-side Web Worker Demonstration
  6. Alternatives to Node.js Workers
  7. Conclusion

JavaScript programs in browsers and on the server run on a single processing thread. This means that the program can do one thing at a time. In simplistic terms, your new PC may have a 32-core CPU, but 31 of those are sitting idle when your JavaScript application runs.

JavaScript’s single thread avoids complex concurrency situations. What would happen if two threads attempted to make incompatible changes at the same time? For example, a browser could be updating the DOM while another thread redirects to a new URL and wipes that document from memory. Node.js, Deno, and Bun inherit the same single-thread engine from browsers.

This isn’t a JavaScript-specific restriction. Most languages are single-threaded, but server-side languages such as PHP and Python typically run behind a web server which launches a separate instance of the interpreter on a new thread for every user request. This is resource-intensive, so Node.js apps usually define their own web server, which runs on a single thread and asynchronously handles every incoming request.

The Node.js approach can be more efficient at handling higher traffic loads, but long-running JavaScript functions will negate efficiency gains.

Before we demonstrate how you can address execution speed issues with web workers, we’ll first examine how JavaScript runs and why long-running functions are problematic.

JavaScript Non-blocking I/O Event-loop

You might think that doing one thing at a time would cause performance bottlenecks, but JavaScript is asynchronous, and this averts most single-thread processing problems, because:

  • There’s no need to wait for a user to click a button on a web page.

    The browser raises an event which calls a JavaScript function when the click occurs.

  • There’s no need to wait for a response to an Ajax request.

    The browser raises an event which calls a JavaScript function when the server returns data.

  • There’s no need for a Node.js application to wait for the result of a database query.

    The runtime calls a JavaScript function when data is available.

JavaScript engines run an event loop. Once the last statement of code has finished executing, the runtime loops back and checks for outstanding timers, pending callbacks, and data connections before executing callbacks as necessary.

Other OS processing threads are responsible for calls to input/output systems such as HTTP requests, file handlers, and database connections. These calls don’t block the event loop, which can continue and execute the next JavaScript function waiting on the queue.

In essence, JavaScript engines have a single responsibility: to run JavaScript code. The operating system handles all other I/O operations, which may result in the engine calling a JavaScript function when something occurs.
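For example, in a browser (a minimal sketch; the URL is a placeholder), neither the timer nor the network request below blocks the thread, so the final console.log runs before either callback:

console.log('start');

// schedule a callback: it runs after at least 500ms, once the call stack is empty
setTimeout(() => console.log('timer fired'), 500);

// the browser handles the HTTP request on another thread and
// calls the .then() handlers when the response arrives
fetch('https://example.com/data.json')
  .then(res => res.json())
  .then(data => console.log('data received', data))
  .catch(err => console.error(err));

// this line runs immediately because nothing above blocked the thread
console.log('end');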

Long-running JavaScript Functions

JavaScript functions are often triggered by an event. They’ll do some processing, output some data and, most of the time, will complete within milliseconds so the event loop can continue.

Unfortunately, some long-running functions can block the event loop. Let’s imagine you were developing your own image processing function (such as sharpening, blurring, grayscaling, and so on). Asynchronous code can read (or write) millions of bytes of pixel data from (or to) a file — and this will have little impact on the JavaScript engine. However, the JavaScript code which processes the image could take several seconds to calculate every pixel. The function blocks the event loop — and no other JavaScript code can run until it completes (a simplified example follows the list below).

  • In a browser, the user wouldn’t be able to interact with the page. They’d be unable to click, scroll, or type, and may see an “unresponsive script” error with an option to stop processing.

  • The situation for a Node.js server application is worse. It can’t respond to other requests while the function executes. If it took ten seconds to complete, every user accessing the application at that point would have to wait up to ten seconds — even when they’re not processing an image.
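As a rough sketch of the problem (the function and pixel data are hypothetical), a fully synchronous loop like the one below holds the thread until it returns, so no clicks, timers, or requests can be handled in the meantime:

// hypothetical example: process every pixel in a single synchronous pass
function processImageSync(imageFn, imageIn) {

  const imageOut = [];

  // with millions of pixels, this loop could run for several seconds
  // and nothing else can execute until it completes
  for (let i = 0; i < imageIn.length; i++) {
    imageOut[i] = imageFn(imageIn[i]);
  }

  return imageOut;

}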

You could solve the problem by splitting the calculation into smaller sub-tasks. The following code processes no more than 1,000 pixels (from an array) using a passed imageFn function. It then calls itself with a setTimeout delay of 1 millisecond. The event loop blocks for a shorter period so the JavaScript engine can handle other incoming events between iterations:



function processImage( callback, imageFn = chunk => chunk, imageIn = [] ) {

  // number of pixels to process in each iteration
  const chunkSize = 1000;

  let
    imageOut = [],
    pointer = 0;

  processChunk();

  // process the next chunk of pixel data
  function processChunk() {

    const pointerEnd = pointer + chunkSize;

    // apply the image function to this chunk
    imageOut = imageOut.concat(
      imageFn( imageIn.slice( pointer, pointerEnd ) )
    );

    if (pointerEnd < imageIn.length) {

      // more data to process: schedule the next chunk
      pointer = pointerEnd;
      setTimeout(processChunk, 1);

    }
    else if (callback) {

      // complete: return the processed image
      callback( null, imageOut );

    }

  }

}

This can prevent unresponsive scripts, but it’s not always practical. The single execution thread still does all the work, even though the CPU may have capacity to do far more. To solve this problem, we can use web workers.

Web Workers

Web workers allow a script to run as a background thread. A worker runs with its own engine instance and event loop separate from the main execution thread. It executes in parallel without blocking the main event loop and other tasks.

To use a worker script:

  1. The main thread posts a message with all necessary data.
  2. An event handler in the worker executes and starts the computations.
  3. On completion, the worker posts a message back to the main thread with returned data.
  4. An event handler in the main thread executes, parses the incoming data, and takes necessary actions.

[Diagram: how the main thread and a web worker exchange messages during processing]

The main thread — or any worker — can spawn any number of workers. Multiple threads could process separate chunks of data in parallel to determine a result faster than a single background thread. That said, each new thread has a start-up overhead, so finding the best balance can require some experimentation.
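As an illustrative sketch (the pixels array, worker script name, and chunking scheme are assumptions rather than part of the demonstrations below), the main thread could divide an array between several workers and merge the results as each one replies:

// hypothetical example: split pixel data across four workers
const
  workerCount = 4,
  chunkSize = Math.ceil(pixels.length / workerCount),
  results = [];

let completed = 0;

for (let w = 0; w < workerCount; w++) {

  const worker = new Worker('./chunkworker.js');

  worker.onmessage = e => {

    // store this worker's result and count the replies
    results[w] = e.data;
    completed++;

    // every worker has replied, so merge the chunks
    if (completed === workerCount) {
      console.log('done', results.flat().length);
    }

  };

  // send this worker its slice of the data
  worker.postMessage(pixels.slice(w * chunkSize, (w + 1) * chunkSize));

}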

All browsers, Node.js 10+, Deno, and Bun support workers with a similar syntax, although the server runtimes can offer more advanced options.

Browser Worker Demonstration

The following demonstration shows a digital clock with milliseconds, which updates up to 60 times per second. At the same time, you can launch a dice emulator which throws any number of dice any number of times. By default, it throws ten six-sided dice ten million times and records the frequency of totals.

You can view this demo on CodeSandbox.

Click start throwing and watch the clock; it will pause while the calculation runs. Slower devices and browsers may throw an “unresponsive script” error.

Now check the use web worker checkbox and start throwing again. The clock continues to run during the calculation. The process can take a little longer, because the web worker must launch, receive data, run the calculation, and return results. This will be less evident as calculation complexity or iterations increase. At some point, the worker should be faster than the main thread.

Dedicated vs shared workers

Browsers provide two worker options:

  • dedicated workers: a single script launched, used, and terminated by another script

  • shared workers: a single script accessible to multiple scripts in different windows, iframes, or workers

Each script communicating with a shared worker passes a unique port, which the shared worker must use to pass data back. However, shared workers aren’t supported in IE or most mobile browsers, which makes them impractical for typical web projects.
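If you do target browsers that support them, the communication pattern looks something like this minimal sketch (the script name is an assumption):

// in any script that connects to the shared worker
const shared = new SharedWorker('./shared-worker.js');

shared.port.onmessage = e => console.log('received:', e.data);
shared.port.postMessage('hello');

The shared worker receives an onconnect event for every connecting script and replies on that script’s port:

// shared-worker.js
onconnect = e => {

  // each connecting script gets its own port
  const port = e.ports[0];

  port.onmessage = event => {
    port.postMessage(`echo: ${ event.data }`);
  };

};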

Client-side worker limitations

A worker runs in isolation from the main thread and other workers: it can’t access data in other threads unless that data is explicitly passed to it. Data sent to a worker is copied; internally, JavaScript uses its structured clone algorithm to serialize the data. It can include native types such as strings, numbers, Booleans, arrays, and plain objects, but not functions or DOM nodes.
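For example (a small sketch using a hypothetical worker script), plain data clones successfully, but attempting to post a function throws a DataCloneError:

const worker = new Worker('./src/worker.js');

// strings, numbers, arrays, and plain objects are cloned and sent
worker.postMessage({ effect: 'blur', pixels: [255, 128, 64] });

// functions (and DOM nodes) can't be cloned, so this throws a DataCloneError
try {
  worker.postMessage({ effect: x => x * 2 });
}
catch (err) {
  console.error(err.name); // "DataCloneError"
}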

Browser workers can use APIs such as console, Fetch, XMLHttpRequest, WebSocket, and IndexedDB. They can’t access the document object, DOM nodes, localStorage, or some parts of the window object, since this could lead to the concurrency conflicts JavaScript’s single thread was designed to avoid — such as a DOM change occurring at the same time as a redirect.

IMPORTANT: workers are best used for CPU-intensive tasks. They don’t benefit intensive I/O work, because that’s offloaded to the browser and runs asynchronously.

How to use a client-side web worker

The following demonstration defines src/index.js as the main script, which starts the clock and launches the web worker when a user clicks the start button. It defines a Worker object with the name of the worker script at src/worker.js (relative to the HTML file):


const worker = new Worker("./src/worker.js");

An onmessage event handler follows. This runs when the worker sends data back to the main script — typically when the calculation is complete. The data is available in the event object’s data property, which it passes to the endDiceRun() function:


worker.onmessage = function(e) {
  endDiceRun(e.data);
};

The main script launches the worker using its postMessage() method to send data (an object named cfg):


worker.postMessage(cfg);

The src/worker.js file defines the worker code. It imports src/dice.js using importScripts() — a global worker method which synchronously imports one or more scripts into the worker. The file reference is relative to the worker’s location:

importScripts('./dice.js');

src/dice.js defines a diceRun() function to calculate the throwing statistics:


function diceRun(runs = 1, dice = 2, sides = 6) {
  const stat = [];

  while (runs > 0) {
    let sum = 0;

    for (let d = dice; d > 0; d--) {
      sum += Math.floor(Math.random() * sides) + 1;
    }
    stat[sum] = (stat[sum] || 0) + 1;
    runs--;
  }

  return stat;
}

Note that this is not an ES module (see below).

src/worker.js then defines a single onmessage() event handler. This runs when the main calling script (src/index.js) sends data to the worker. The event object has a .data property which provides access to the message data. In this case, it’s the cfg object with the properties .throws, .dice, and .sides, which get passed as arguments to diceRun():

onmessage = function(e) {

  // extract the configuration sent by the main script
  const cfg = e.data;
  const stat = diceRun(cfg.throws, cfg.dice, cfg.sides);

  // return the result to the main script
  postMessage(stat);

};

A postMessage() function sends the result back to the main script. This calls the worker.onmessage handler shown above, which runs endDiceRun().

In summary, threaded processing occurs by sending messages between the main script and the worker:

  1. The main script defines a Worker object and calls postMessage() to send data.
  2. The worker script executes an onmessage handler which starts a calculation.
  3. The worker calls postMessage() to send data back to the main script.
  4. The main script executes an onmessage handler to receive the result.

[Diagram: the sequence of postMessage() calls and onmessage handlers between the main script and the worker]

Web worker error handling

Developer tools in modern browsers support web worker debugging and console logging much like any standard script.

The main script can call a .terminate() method to end the worker at any time. This may be necessary if a worker fails to respond within a specific time. For example, this code terminates an active worker if it hasn’t received a response within ten seconds:


const worker = new Worker('./src/worker.js');

// terminate the worker if it hasn't responded within ten seconds
const workerTimer = setTimeout(() => worker.terminate(), 10000);

worker.onmessage = function(e) {

  // the worker responded in time, so cancel the termination timer
  clearTimeout(workerTimer);

};

// start the worker
worker.postMessage({ somedata: 1 });

Worker scripts can use standard error handling techniques such as validating incoming data, try, catch, finally, and throw to gracefully handle issues as they arise and report back to the main script if required.
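For example, a worker could catch its own failures and return them to the main script as ordinary data (a sketch; doHeavyCalculation() is a hypothetical function, not part of the demo):

// inside the worker script
onmessage = function(e) {

  try {

    const result = doHeavyCalculation(e.data);
    postMessage({ ok: true, result });

  }
  catch (err) {

    // report the failure back to the main script as data
    postMessage({ ok: false, error: err.message });

  }

};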

You can detect unhandled worker errors in the main script using these event handlers:

  • onmessageerror: fired when the worker receives data it cannot deserialize

  • onerror: fired when a JavaScript error occurs in the worker script

The returned event object provides error details in the .filename, .lineno, and .message properties:


worker.onerror = function(err) {
  console.log(`${ err.filename }, line ${ err.lineno }: ${ err.message }`);
};

Client-side web workers and ES modules

By default, browser web workers are not able to use ES modules (modules that use the export and import syntax).

The src/dice.js file defines a single function imported into the worker:

importScripts('./dice.js');

Somewhat unusually, the src/dice.js code is also used by the main src/index.js script, so the same function can run in both worker and non-worker modes. src/index.js loads as an ES module, so it can’t import the non-module src/dice.js code, but it can load it as an HTML <script> element so the function becomes available within the module:

// load src/dice.js as a classic script so diceRun() becomes available globally
const diceScript = document.createElement('script');
diceScript.src = './src/dice.js';
document.head.appendChild(diceScript);

This scenario isn’t likely to happen in most applications unless you need to share code libraries between the main and worker scripts.

It’s possible to use ES modules in workers by passing a { type: "module" } option to the Worker constructor:

const worker = new Worker('./src/worker.js', { type: 'module' });

You can then export the diceRun() function in src/dice.js:

export function diceRun(runs = 1, dice = 2, sides = 6) {
  // ...calculate and return the throwing statistics...
}

You then import it in the worker.js module using a fully qualified or relative URL reference:

import { diceRun } from './dice.js';

In theory, ES modules are a great choice, but at the time of writing they’re only supported in workers by Chromium-based browsers from version 80 (released in 2020). You can’t rely on them in Firefox or Safari, which makes them impractical for the example code shown here.

A better option is to use a bundler such as esbuild or rollup.js. These can resolve ES module references and pack them into a single worker (and main) JavaScript file. This simplifies coding and has the benefit of making workers noticeably faster, because they don’t need to resolve imports before execution.
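For example, a minimal esbuild build script (the file names are assumptions) could bundle the main and worker entry points into self-contained output files:

// build.mjs (run as an ES module, e.g. with `node build.mjs`)
import { build } from 'esbuild';

// bundle each entry point, resolving and inlining its ES module imports
await build({
  entryPoints: ['src/index.js', 'src/worker.js'],
  bundle: true,
  outdir: 'dist'
});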

Client-side service workers

Service workers are special web workers used by Progressive Web Apps to offer offline functionality, background data synchronization, and web notifications. They can:

  • act as a proxy between the browser and the network to manage cached files
  • run in the background even when a browser or page isn’t loaded to update data and receive incoming messages

Like web workers, service workers run on a separate processing thread and can’t use APIs such as the DOM. However, that’s where the similarities end:

  • The main thread can declare the availability of a service worker, but there isn’t any direct communication between the two. The main thread doesn’t necessarily know that a service worker is running.

  • Service workers are not typically used for CPU-intensive calculations. They may indirectly improve performance by caching files and making other network optimizations.

  • A specific domain/path can use many web workers for different tasks, but it can only register one service worker.

  • Service workers must be served over HTTPS and are registered against a specific domain and path scope, while web workers can run over plain HTTP.

Service workers are beyond the scope of this article, but you can find more information in MDN’s Service Worker API documentation.

Server-side Web Worker Demonstration

Node.js is the most-used server JavaScript runtime, and it has offered workers from version 10.

Node.js isn’t the only server runtime:

  • Deno replicates the Web Worker API, so the syntax is identical to browser code. It also offers a compatibility mode which polyfills Node.js APIs if you want to use that runtime’s worker thread syntax.

  • Bun is in beta, although the intention is to support both browser and Node.js worker APIs.

  • You may be using JavaScript serverless services such as AWS Lambda, Azure Functions, Google Cloud Functions, Cloudflare Workers, or Netlify Edge Functions. These may provide web worker-like APIs, although there’s less benefit, because each user request launches a separate isolated instance.

The following demonstration shows a Node.js process which writes the current time to the console every second. (You can open the Node.js demonstration in a new browser tab.)

A dice throwing calculation then launches on the main thread. This pauses the current time being output:

  timer process 12:33:18 PM
  timer process 12:33:19 PM
  timer process 12:33:20 PM
NO THREAD CALCULATION STARTED...
┌─────────┬──────────┐
│ (index) │  Values  │
├─────────┼──────────┤
│    2    │ 2776134  │
│    3    │ 5556674  │
│    4    │ 8335819  │
│    5    │ 11110893 │
│    6    │ 13887045 │
│    7    │ 16669114 │
│    8    │ 13885068 │
│    9    │ 11112704 │
│   10    │ 8332503  │
│   11    │ 5556106  │
│   12    │ 2777940  │
└─────────┴──────────┘
processing time: 2961ms
NO THREAD CALCULATION COMPLETE

timer process 12:33:24 PM

Once complete, the same calculation launches on a worker thread. In this case, the clock continues to run while dice processing occurs:

WORKER CALCULATION STARTED...
  timer process 12:33:27 PM
  timer process 12:33:28 PM
  timer process 12:33:29 PM
┌─────────┬──────────┐
│ (index) │  Values  │
├─────────┼──────────┤
│    2    │ 2778246  │
│    3    │ 5556129  │
│    4    │ 8335780  │
│    5    │ 11114930 │
│    6    │ 13889458 │
│    7    │ 16659456 │
│    8    │ 13889139 │
│    9    │ 11111219 │
│   10    │ 8331738  │
│   11    │ 5556788  │
│   12    │ 2777117  │
└─────────┴──────────┘
processing time: 2643ms
WORKER CALCULATION COMPLETE

  timer process 12:33:30 PM
  timer process 12:33:31 PM
  timer process 12:33:32 PM

The worker process is often a little faster than the main thread.

How to use a server-side web worker

The demonstration defines src/index.js as the main script, which starts a timer process (if it’s not already running) when the server receives a new HTTP request:


// output the current time every second
timer = setInterval(() => {
  console.log(`  timer process ${ intlTime.format(new Date()) }`);
}, 1000);

The runWorker() function defines a Worker object with the name of the worker script at src/worker.js (relative to the project root). It passes a workerData variable as a single value which, in this case, is an object with three properties:

const worker = new Worker("./src/worker.js", {
  workerData: { throws, dice, sides }
});

Unlike browser web workers, this starts the script. There’s no need to run worker.postMessage(), although you can use that to run the parentPort.on("message") event handler defined in the worker.
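If you do want that pattern, the worker can attach a "message" handler to parentPort (a sketch rather than part of the demonstration code):

// inside the worker script
import { parentPort } from "node:worker_threads";

parentPort.on("message", data => {
  // runs each time the main thread calls worker.postMessage(data)
  parentPort.postMessage(`received: ${ JSON.stringify(data) }`);
});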

The src/worker.js code calls diceRun() with the workerData values and passes the result back to the main thread using parentPort.postMessage():


import { workerData, parentPort } from "node:worker_threads";
import { diceRun } from "./dice.js";

// run the dice calculation with the data passed from the main thread
const stat = diceRun(workerData.throws, workerData.dice, workerData.sides);

// return the result to the main thread
parentPort.postMessage(stat);

This raises a "message" event in the main src/index.js script, which receives the result:


worker.on("message", result => {
  console.table(result);
});

The worker terminates after sending the message, which raises an "exit" event:

worker.on("exit", code => {
  
});

You can define other error and event handlers as necessary (a short sketch follows the list):

  • messageerror: fired when the worker receives data it can’t deserialize
  • online: fired when the worker thread starts to execute
  • error: fired when a JavaScript error occurs in the worker script
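For example:

worker.on("online", () => console.log("worker thread started"));

worker.on("messageerror", err => console.error("could not deserialize message:", err));

worker.on("error", err => console.error("worker error:", err.message));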

Inline worker scripts

A single script file can contain both main and worker code. The code can check whether it’s running on the main thread using isMainThread, then call itself as a worker (using import.meta.url as the file reference in an ES module, or __filename in CommonJS):

import { Worker, isMainThread, workerData, parentPort } from "node:worker_threads";

if (isMainThread) {

  // main thread:
  // launch this same file as a worker and pass the calculation data
  const worker = new Worker(import.meta.url, {
    workerData: { throws, dice, sides }
  });

  worker.on("message", msg => {});
  worker.on("exit", code => {});

}
else {

  // worker thread: run the calculation and return the result
  const stat = diceRun(workerData.throws, workerData.dice, workerData.sides);
  parentPort.postMessage(stat);

}

Personally, I prefer to separate the files, since the main and worker threads probably require different modules. Inline workers could be an option for simple, one-script projects.

Server-side worker limitations

Server workers still run in isolation and receive limited copies of data as they do in the browser.

Server-side worker threads in Node.js, Deno, and Bun have fewer API restrictions than browser workers, because there’s no DOM. You could have issues when two or more workers attempt to write data to the same file at the same time, but that’s unlikely to occur in most apps.

You won’t be able to pass and share complex objects such as database connections, since most will have methods and functions which can’t be cloned. However, you could do one of the following:

  • Asynchronously read database data in the main thread and pass the resulting data to the worker.

  • Create another connection object in the worker. This will have a start-up cost, but may be practical if your function requires further database queries as part of the calculation.

IMPORTANT: please remember that workers are best used for CPU-intensive tasks. They don’t benefit intensive I/O work, because that’s offloaded to the OS and runs asynchronously.

Sharing data between threads

Communication between the main and worker threads shown above results in cloned data on both sides. It’s possible to share data between threads using a SharedArrayBuffer object representing fixed-length raw binary data. The following main thread defines 100 numeric elements from 0 to 99, which it sends to a worker:

import { Worker } from "node:worker_threads";

// create a 100-element Int32Array backed by shared memory
const
  buffer = new SharedArrayBuffer(100 * Int32Array.BYTES_PER_ELEMENT),
  value = new Int32Array(buffer);

// fill the array with the values 0 to 99
value.forEach((v, i) => value[i] = i);

const worker = new Worker("./worker.js");

// send the shared array to the worker
worker.postMessage({ value });

The worker can receive the value object:

import { parentPort } from 'node:worker_threads';

parentPort.on("message", ({ value }) => {
  // changing an element here also changes it in the main thread
  value[0] = 100;
});

At this point, either the main or worker threads can change elements in the value array, and it’s changed on both sides.

This technique results in some efficiency gains, because it’s not necessary to serialize data in either thread. There are downsides:

  • You can only share the binary data a typed array can represent, such as integers, rather than arbitrary objects.
  • It’s still necessary to send a message to indicate that data has changed.
  • There’s a risk two threads could change the same value at the same time and lose synchronization (see the Atomics sketch below).
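The Atomics API can make individual reads and writes of shared array elements indivisible, which helps with that last point. A minimal sketch, using the value array from the example above:

// either thread can update the shared array safely with Atomics
Atomics.add(value, 0, 1);               // atomically increment element 0
const current = Atomics.load(value, 0); // atomically read element 0
Atomics.store(value, 0, 100);           // atomically write element 0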

That said, the process could benefit high-performance games which need to process a high quantity of image or other data.

Alternatives to Node.js Workers

Not every Node.js application needs or can use a worker. A simple web server app may not have complex calculations. It continues to run on a single processing thread and will become less responsive as the number of active users increases. The device may have considerably more processing power, with multiple CPU cores which remain unused.

The following sections describe generic multi-threading options.

Node.js child processes

Node.js supported child processes before workers, and both Deno and Bun have similar facilities.

In essence, they can launch another application (not necessarily in JavaScript), pass data, and receive a result. They operate in a similar way to workers but are generally less efficient and more process-intensive.

Workers are best used when you’re running complex JavaScript functions — probably within the same project. Child processes become necessary when you’re launching another application, such as a Linux or Python command.
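For example, a Node.js script could launch an external command using the child_process module (the command and arguments are placeholders):

import { execFile } from "node:child_process";

// run an external program and capture its output
execFile("python3", ["analyse.py", "--input", "data.csv"], (err, stdout, stderr) => {

  if (err) {
    console.error("child process failed:", err.message);
    return;
  }

  console.log("child process output:", stdout);

});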

Node.js clustering

Node.js clusters allow you to fork any number of identical processes to handle loads more efficiently. The initial primary process can fork itself — perhaps once for each CPU returned by os.cpus(). It can also handle restarts when an instance fails and broker communication messages between forked processes.

The cluster standard library offers properties and methods including:

  • .isPrimary: returns true for the main primary process (the older .isMaster is also supported)

  • .fork(): spawns a child worker process

  • .isWorker: returns true for worker processes

This example starts a web server worker process for each CPU/core available on the device. A 4-core machine will spawn four instances of the web server so it can handle up to four times the processing load. It also restarts any process which fails, to make the application more robust:


import cluster from 'node:cluster';
import process from 'node:process';
import { cpus } from 'node:os';
import http from 'node:http';

const numCpus = cpus().length;

if (cluster.isPrimary) {

  console.log(`Started primary process: ${ process.pid }`);

  // fork one worker process per CPU/core
  for (let i = 0; i < numCpus; i++) {
    cluster.fork();
  }

  // restart any worker process which fails
  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${ worker.process.pid } failed`);
    cluster.fork();
  });

}
else {

  // worker process: start a web server
  http.createServer((req, res) => {

    res.writeHead(200);
    res.end('Hello!');

  }).listen(8080);

  console.log(`Started worker process:  ${ process.pid }`);

}

All processes share port 8080 and any of them can handle an incoming HTTP request. The log when running the application shows something like:

$ node app.js
Started primary process: 1001
Started worker process:  1002
Started worker process:  1003
Started worker process:  1004
Started worker process:  1005

...etc...

worker 1002 failed
Started worker process:  1006

Few Node.js developers attempt clustering. The example above is simple and works well, but code can become increasingly complex as you attempt to handle messages, failures, and restarts.

Process managers

A Node.js process manager can help run multiple instances of a Node.js application without having to manually write cluster code. The most well known is PM2. The following command starts an instance of your application for every CPU/core and restarts any when they fail:

pm2 start ./app.js -i max

The app instances start in the background, so it’s ideal for use on a live server. You can examine which processes are running by entering pm2 status (abridged output shown):

$ pm2 status

┌────┬──────┬───────────┬─────────┬─────────┬──────┬────────┐
│ id │ name │ namespace │ version │ mode    │ pid  │ uptime │
├────┼──────┼───────────┼─────────┼─────────┼──────┼────────┤
│ 1  │ app  │ default   │ 1.0.0   │ cluster │ 1001 │ 4D     │
│ 2  │ app  │ default   │ 1.0.0   │ cluster │ 1002 │ 4D     │
└────┴──────┴───────────┴─────────┴─────────┴──────┴────────┘

PM2 can also run non-Node.js applications written in Deno, Bun, Python, or any other language.

Container managers

Clusters and process managers bind an application to a specific device. If your server or an OS dependency fails, your application will fail regardless of the number of running instances.

Containers are a similar concept to virtual machines, except that, rather than emulating a full hardware device, they emulate an operating system. A container is a lightweight wrapper around a single application with all necessary OS, library, and executable files. It provides an isolated instance of Node.js (or any other runtime) and your application. A single device can run many containers, so there’s less need for clustering or process management.

Containers are beyond the scope of this article, but well known solutions include Docker and Kubernetes. They can launch and monitor any number of containers across any number of devices, even in different locations, while distributing incoming traffic.

Conclusion

JavaScript workers can improve application performance on both the client and server by running CPU-intensive calculations in parallel threads. Server-side workers can also make applications more robust by running more dangerous functions in separate threads and terminating them when processing times exceed certain limits.

Using workers in JavaScript is straightforward, but:

  • Workers can’t access all APIs, such as the browser DOM. They’re best used for long-running calculation tasks.

  • Workers are less necessary for intensive but asynchronous I/O tasks such as HTTP requests and database queries.

  • Starting a worker has an overhead, so some experimentation may be necessary to ensure they improve performance.

  • Process and container management may be better options than server-side multi-threading.

That said, workers are a useful tool to consider when you encounter performance issues.

