ZetaCloud Documentation

Overview

ZetaCloud is a command-line tool for training and fine-tuning machine learning models on remote GPU clusters. A few commands let you start tasks, check their status, and stop them, on the GPU type and cloud provider you choose. This documentation covers the ZetaCloud CLI from installation through advanced usage.

Table of Contents

  1. Installation
  2. ZetaCloud CLI
     • Options
  3. Basic Usage
     • Example 1: Starting a Task
     • Example 2: Stopping a Task
     • Example 3: Checking Task Status
  4. Advanced Usage
     • Example 4: Cluster Selection
     • Example 5: Choosing the Cloud Provider
  5. Additional Information
  6. References

1. Installation

Getting started with ZetaCloud is quick and straightforward. Follow these steps to set up ZetaCloud on your machine:

  1. Open your terminal or command prompt.

  2. Install the zetascale package using pip:

pip install zetascale

  3. After a successful installation, you can access the ZetaCloud CLI by running the following command:

zeta -h

This command will display a list of available options and basic usage information for ZetaCloud.
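
If you would rather confirm the installation from a script, the check can be sketched in Python as below. This is a minimal sketch that only looks for an executable named zeta on your PATH; the helper name zeta_cli_available is hypothetical and is not part of ZetaCloud.

```python
import shutil


def zeta_cli_available(executable: str = "zeta") -> bool:
    """Return True if an executable with the given name is found on PATH."""
    # shutil.which returns the full path to the executable, or None if absent.
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if zeta_cli_available():
        print("zeta found on PATH")
    else:
        print("zeta not found; try 'pip install zetascale' and reopen your terminal")
```

Note that this only tells you the command exists; running zeta -h remains the way to see the options your installed version supports.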

2. ZetaCloud CLI

The ZetaCloud Command-Line Interface (CLI) provides options for starting, stopping, and monitoring tasks on GPU clusters. The available options are listed below:

Options

  • -h, --help: Display the help message and exit.
  • -t TASK_NAME, --task_name TASK_NAME: Specify the name of your task.
  • -c CLUSTER_NAME, --cluster_name CLUSTER_NAME: Specify the name of the cluster you want to use.
  • -cl CLOUD, --cloud CLOUD: Choose the cloud provider (e.g., AWS, Google Cloud, Azure).
  • -g GPUS, --gpus GPUS: Specify the type and number of GPUs required for your task (e.g., A100:8 for 8 A100 GPUs).
  • -f FILENAME, --filename FILENAME: Provide the filename of your Python script or code.
  • -s, --stop: Use this flag to stop a running task.
  • -d, --down: Use this flag to terminate a cluster.
  • -sr, --status_report: Check the status of your task.

3. Basic Usage

ZetaCloud's basic usage covers essential tasks such as starting, stopping, and checking the status of your tasks. Let's explore these tasks with examples.

Example 1: Starting a Task

To start a task, you need to specify the Python script you want to run and the GPU configuration. Here's an example command:

zeta -f train.py -g A100:8

In this example:

  • -f train.py indicates that you want to run the Python script named train.py.
  • -g A100:8 specifies that you require 8 NVIDIA A100 GPUs for your task.
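
The GPU value in this example follows a TYPE:COUNT pattern. The sketch below shows one way to split and validate such a string before passing it to -g. It assumes that every GPU specification has exactly this form, which this documentation shows only for A100:8; the helper parse_gpu_spec is hypothetical and is not a ZetaCloud function.

```python
from typing import Tuple


def parse_gpu_spec(spec: str) -> Tuple[str, int]:
    """Split a 'TYPE:COUNT' string such as 'A100:8' into ('A100', 8)."""
    # Assumption: the format is TYPE:COUNT, as in the documented example.
    gpu_type, separator, count_text = spec.partition(":")
    if not gpu_type or not separator:
        raise ValueError(f"expected TYPE:COUNT, got {spec!r}")
    count = int(count_text)  # raises ValueError if COUNT is not an integer
    if count < 1:
        raise ValueError(f"GPU count must be at least 1, got {count}")
    return gpu_type, count


if __name__ == "__main__":
    print(parse_gpu_spec("A100:8"))
```

Checking the string locally catches typos such as a missing count before a task is submitted.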

Example 2: Stopping a Task

If you need to stop a running task, you can use the following command:

zeta -s

This command will stop the currently running task.

Example 3: Checking Task Status

To check the status of your task, use the following command:

zeta -sr

This command will provide you with a detailed status report for your active task.

4. Advanced Usage

ZetaCloud also offers options that control where your task runs, such as a specific cluster or a specific cloud provider.

Example 4: Cluster Selection

You can select a specific cluster for your task by providing the cluster name with the -c option:

zeta -f train.py -g A100:8 -c my_cluster

This command will run your task on the cluster named my_cluster.

Example 5: Choosing the Cloud Provider

ZetaCloud supports multiple cloud providers. You can specify your preferred cloud provider using the -cl option:

zeta -f train.py -g A100:8 -cl AWS

This command will execute your task on AWS infrastructure.
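
If you launch tasks from a Python script, the commands in Examples 1, 4, and 5 can be assembled programmatically. The following is a minimal sketch: the helper build_zeta_command is hypothetical, it uses only the flags documented above, and whether -c and -cl may be combined in a single invocation is not stated in this documentation.

```python
import shlex
from typing import List, Optional


def build_zeta_command(
    filename: str,
    gpus: str,
    cluster_name: Optional[str] = None,
    cloud: Optional[str] = None,
) -> List[str]:
    """Return the argument list for a zeta invocation using documented flags."""
    command = ["zeta", "-f", filename, "-g", gpus]
    if cluster_name is not None:
        command += ["-c", cluster_name]  # Example 4: cluster selection
    if cloud is not None:
        command += ["-cl", cloud]  # Example 5: cloud provider
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_zeta_command("train.py", "A100:8", cloud="AWS")))
```

The returned list can be passed to subprocess.run(command, check=True) to actually start the task. Building a list rather than a single string avoids shell quoting problems with unusual file or cluster names.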

5. Additional Information

  • ZetaCloud simplifies the process of utilizing GPU clusters, allowing you to focus on your machine learning tasks rather than infrastructure management.

  • You can easily adapt ZetaCloud to various cloud providers, making it a versatile tool for your machine learning needs.