Most of the datasets, e.g. cifar100, oxford_iiit_pet or imagenet_v2 can be fully automatically downloaded and prepared by running

Each configuration file contains a comment at the top with a COMMAND snippet to run it, and some hint of expected runtime and results. See below for more details, but generally speaking, running on a GPU machine involves calling python -m COMMAND while running on TPUs, including multi-host, involves

Since 1979, Rokinon has been committed to value. Our phenomenal reputation doesn’t just stem from our lens and optic products’ performance—we are just as dedicated to providing our customers with excellent service. For assistance in your search for a Rokinon lens for Nikon F mounts or any of our other premium optic products, please reach out to our team. We are always happy to help.

To train your own big_vision models on a large dataset, e.g. imagenet2012 (prepare the TFDS dataset), run the following command line.

Specifically, the seven TFDS datasets used during evaluations will be generated under ~/tensorflow_datasets on TPU machine with this command:

The first release contains the core part of pre-training, transferring, and evaluating classification models at scale on Cloud TPU VMs.

Some datasets, like imagenet2012 or imagenet2012_real, require the data to be downloaded manually and placed into $TFDS_DATA_DIR/downloads/manual/, which defaults to ~/tensorflow_datasets/downloads/manual/. For example, for imagenet2012 and imagenet2012_real one needs to place the official ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar files in that directory and then run python3 -m big_vision.tools.download_tfds_datasets imagenet2012 imagenet2012_real (which may take ~1 hour).

Many datasets can be downloaded and preprocessed automatically when used for the first time. Nevertheless, we intentionally disable this feature and recommend doing dataset preparation step separately, ahead of the first run. It will make debugging easier if problems arise and some datasets, like imagenet2012, require manually downloaded data.

big_vision supports flexible parameter and model sharding strategies. Currently, we support a popular FSDP sharding via a simple config change, see this config example. For example, to run FSDP finetuning of a pretrained ViT-L model, run the following command (possible adjusting batch size depending on your hardware):

This codebase is designed for training large-scale vision models using Cloud TPU VMs or GPU machines. It is based on Jax/Flax libraries, and uses tf.data and TensorFlow Datasets for scalable and reproducible input pipelines.

Training jobs are robust to interruptions and will resume seamlessly from the last saved checkpoint (assuming a user provides the correct --workdir path).

Finally, after installing all python dependencies and preparing tfds data, the user can run the job using config of their choice, e.g. to train ViT-S/16 model on ImageNet data, one should run the following command:

Note that big_vision is quite dynamic codebase and, while we intend to keep the core code fully-functional at all times, we can not guarantee timely updates of the project-specific code that lives in the .../proj/... subfolders. However, we provide a table with last known commits where specific projects were known to work.

Unless explicitly noted otherwise, everything in the big_vision codebase (including models and colabs) is released under the Apache2 license. See the LICENSE file for the full license text.

You may need a different jax package, depending on CUDA and cuDNN libraries installed on your machine. Please consult official jax documentation for more information.

If you use Google Cloud and, TPUs in particular, you can then upload the preprocessed data (stored in $TFDS_DATA_DIR) to "Google Cloud Bucket" and use the bucket on any of your (TPU) virtual machines to access the data.

We recommend preparing tfds data locally as described above and then uploading the data to Google Cloud bucket. However, if you prefer, the datasets which do not require manual downloads can be prepared automatically using a TPU machine as described below. Note that TPU machines have only 100 GB of disk space, and multihost TPU slices do not allow for external disks to be attached in a write mode, so the instructions below may not work for preparing large datasets. As yet another alternative, we provide instructions on how to prepare tfds data on CPU-only GCP machine.

For unified and reproducible access to standard datasets we opted to use the tensorflow_datasets (tfds) library. It requires each dataset to be downloaded, preprocessed and then to be stored on a hard drive (or, if you use "Google Cloud", preferably stored in a "GCP bucket".).

At Rokinon, we have a wide selection of Nikon F mounts for you to choose from. To modify your search for a Rokinon lens for Nikon F mounts, simply reference the left-sided column to filter by collections, format, focal length, and more.

The last known commit where the specific project code is expected to work. The core code and configs are expected to work at head.

For the full list of pre-trained models check out the load function defined in the same module as the model code. And for example config on how to use these models, see configs/transfer.py.

Next, ssh to the machine gcloud compute ssh $NAME_CPU_HOST --zone=$ZONE and follow instructions to format and mount the disk. Let's assume it was mounted to /mnt/disks/tfds.

By default we write checkpoints and logfiles. The logfiles are a list of JSON objects, and we provide a short and straightforward example colab to read and display the logs and checkpoints.

We provide a well-tuned ViT-S/16 baseline in the config file named vit_s16_i1k.py. It achieves 76.5% accuracy on ImageNet validation split in 90 epochs of training, being a strong and simple starting point for research on the ViT models.

big_vision aims to support research projects at Google. We are unlikely to work on feature requests or accept external contributions, unless they were pre-approved (ask in an issue first). For a well-supported transfer-only codebase, see also vision_transformer.

To create a single machine with 8 TPU cores, follow the following Cloud TPU JAX document: https://cloud.google.com/tpu/docs/run-calculation-jax

One of the many benefits of Nikon F mounts is the ability to customize your perception with interchangeable lenses. With this feature, users have many options for capturing images.

We first discuss how to setup and run big_vision on a (local) GPU machine, and then discuss the setup for Cloud TPUs. Note that data preparation step for (local) GPU setup can be largely reused for the Cloud TPU setup. While the instructions skip this for brevity, we highly recommend using a virtual environment when installing python dependencies.

Finally, prepare the dataset (e.g. coco_captions) using the utility script and copy the result to you google cloud bucket:

Project-specific code resides in the .../proj/... namespace. It is not always possible to keep project-specific in sync with the core big_vision libraries, Below we provide the last known commit for each project where the project code is expected to work.

We have a powerful configuration system, with the configs living in the configs/ directory. Custom trainers and modules can directly extend/modify the configuration options.

All models, evaluators and preprocessing operations live in the corresponding subdirectories and can often be reused between different projects. We encourage compatible APIs within these directories to facilitate reusability, but it is not strictly enforced, as individual projects may need to introduce their custom APIs.

To support large-scale vision research, more cores with multiple hosts are recommended. Below we provide instructions on how to do it.

The main entry point is a trainer module, which typically does all the boilerplate related to creating a model and an optimizer, loading the data, checkpointing and training/evaluating the model inside a loop. We provide the canonical trainer train.py in the root folder. Normally, individual projects within big_vision fork and customize this trainer.