Managing local and online repositories
The Repository class is a helper class that wraps git and git-lfs commands. It provides tooling adapted for managing repositories, which can be very large.
It is the recommended tool as soon as any git operation is involved, or when collaboration on the repository itself is a focus.
The Repository class
class huggingface_hub.Repository
< source >( local_dir: typing.Union[str, pathlib.Path] clone_from: typing.Optional[str] = None repo_type: typing.Optional[str] = None token: typing.Union[bool, str] = True git_user: typing.Optional[str] = None git_email: typing.Optional[str] = None revision: typing.Optional[str] = None skip_lfs_files: bool = False client: typing.Optional[huggingface_hub.hf_api.HfApi] = None )
Helper class to wrap the git and git-lfs commands.
The aim is to facilitate interacting with huggingface.co hosted model or dataset repos, though not a lot here (if any) is actually specific to huggingface.co.
__init__
< source >( local_dir: typing.Union[str, pathlib.Path] clone_from: typing.Optional[str] = None repo_type: typing.Optional[str] = None token: typing.Union[bool, str] = True git_user: typing.Optional[str] = None git_email: typing.Optional[str] = None revision: typing.Optional[str] = None skip_lfs_files: bool = False client: typing.Optional[huggingface_hub.hf_api.HfApi] = None )
Parameters
- local_dir (str or Path): path (e.g. 'my_trained_model/') to the local directory where the Repository will be initialized.
- clone_from (str, optional): either a repository URL or a repo_id. Examples: "https://huggingface.co/philschmid/playground-tests" or "philschmid/playground-tests".
- repo_type (str, optional): to set when cloning a repo from a repo_id. Default is model.
- token (bool or str, optional): a valid authentication token (see https://huggingface.co/settings/token). If None or True and the machine is logged in (through huggingface-cli login or login()), the token will be retrieved from the cache. If False, the token is not sent in the request header.
- git_user (str, optional): overrides the git config user.name for committing and pushing files to the hub.
- git_email (str, optional): overrides the git config user.email for committing and pushing files to the hub.
- revision (str, optional): revision to check out after initializing the repository. If the revision doesn't exist, a branch will be created with that revision name from the default branch's current HEAD.
- skip_lfs_files (bool, optional, defaults to False): whether to skip git-LFS files or not.
- client (HfApi, optional): instance of HfApi to use when calling the HF Hub API. A new instance will be created if this is left as None.
Raises
- EnvironmentError if the remote repository set in clone_from does not exist.
Instantiate a local clone of a git repo.
If clone_from is set, the repo will be cloned from an existing remote repository. If the remote repo does not exist, an EnvironmentError exception will be thrown. Please create the remote repo first using create_repo().
Repository uses the local git credentials by default. If explicitly set, the token or the git_user/git_email pair will be used instead.
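The rules for the token parameter can be summarized in a few lines. The following is a minimal sketch only: resolve_token and its cached_token argument are hypothetical names used for illustration, not part of huggingface_hub, and the library's real lookup of the cached token is not reproduced here.

```python
from typing import Optional, Union


def resolve_token(token: Union[bool, str, None], cached_token: Optional[str]) -> Optional[str]:
    """Hypothetical helper mirroring the documented behavior of `token`.

    `cached_token` stands in for whatever token a previous login stored;
    how the library actually reads it is outside the scope of this sketch.
    """
    if token is False:
        # False: never send a token in the request header.
        return None
    if token is None or token is True:
        # None/True: fall back to the cached login token, if there is one.
        return cached_token
    # Any other value is taken to be the token string itself.
    return token
```

Calling resolve_token(True, "hf_example") returns the cached value, while resolve_token(False, "hf_example") returns None, matching the two cases described in the parameter list.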
Returns the current checked out branch.
add_tag
< source >( tag_name: str message: typing.Optional[str] = None remote: typing.Optional[str] = None )
Add a tag at the current head and push it.
If remote is None, the tag will only be updated locally.
If no message is provided, the tag will be lightweight. If a message is provided, the tag will be annotated.
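The difference between lightweight and annotated tags maps onto two forms of the git tag command. As a rough sketch (tag_command is a hypothetical helper, and the exact arguments the library passes to git are an assumption), the choice could look like this:

```python
from typing import List, Optional


def tag_command(tag_name: str, message: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build the `git tag` invocation for a tag.

    Without a message, git creates a lightweight tag (a bare reference).
    With `-a` and `-m`, git creates an annotated tag object carrying the message.
    """
    if message is None:
        return ["git", "tag", tag_name]
    return ["git", "tag", "-a", tag_name, "-m", message]
```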
auto_track_binary_files
< source >(
pattern: str = '.'
)
→
List[str]
Automatically track binary files with git-lfs.
auto_track_large_files
< source >(
pattern: str = '.'
)
→
List[str]
Automatically track large files (files larger than 10MB) with git-lfs.
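To make the size rule concrete, here is a small standard-library sketch that lists files above a threshold. It is not the library's implementation: find_large_files is a hypothetical name, and treating 10MB as 10 * 1024 * 1024 bytes is an assumption, since the text above does not say which definition of MB is used.

```python
from pathlib import Path
from typing import List, Union

# Assumption: 10MB interpreted as 10 * 1024 * 1024 bytes.
LARGE_FILE_THRESHOLD = 10 * 1024 * 1024


def find_large_files(folder: Union[str, Path], threshold: int = LARGE_FILE_THRESHOLD) -> List[str]:
    """Hypothetical helper: return paths (relative to `folder`) of files larger than `threshold` bytes."""
    root = Path(folder)
    large_files = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip git's own metadata directory.
        if ".git" in relative.parts:
            continue
        if path.is_file() and path.stat().st_size > threshold:
            large_files.append(relative.as_posix())
    return large_files
```

The returned names are the kind of list that could then be handed to lfs_track with filename=True.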
Checks that git and git-lfs can be run.
clone_from
< source >( repo_url: str token: typing.Union[bool, str, NoneType] = None )
Parameters
- repo_url (str): the URL from which to clone the repository.
- token (Union[str, bool], optional): whether to use the authentication token. It can be:
  - a string, which is the token itself;
  - False, which would not use the authentication token;
  - True, which would fetch the authentication token from the local folder and use it (you should be logged in for this to work);
  - None, which would retrieve the value of self.huggingface_token.
Clone from a remote. If the folder already exists, will try to clone the repository within it.
If this folder is a git repository with linked history, will try to update the repository.
Raises the following errors:
- ValueError if an organization token (starts with “api_org”) is passed. You must use your own personal access token (see https://hf.co/settings/tokens).
- EnvironmentError if you are trying to clone the repository in a non-empty folder, or if the git operations raise errors.
commit
< source >( commit_message: str branch: typing.Optional[str] = None track_large_files: bool = True blocking: bool = True auto_lfs_prune: bool = False )
Parameters
- commit_message (str): message to use for the commit.
- branch (str, optional): the branch on which the commit will appear. This branch will be checked out before any operation.
- track_large_files (bool, optional, defaults to True): whether to automatically track large files or not. Will do so by default.
- blocking (bool, optional, defaults to True): whether the function should return only when the git push has finished.
- auto_lfs_prune (bool, optional, defaults to False): whether to automatically prune files once they have been pushed to the remote.
Context manager utility to handle committing to a repository. This automatically tracks large files (>10MB) with git-lfs. Set the track_large_files argument to False if you wish to disable that behavior.
Examples:
>>> import json
>>> from huggingface_hub import Repository
>>> with Repository(
...     "text-files",
...     clone_from="<user>/text-files",
...     token=True,
... ).commit("My first file :)"):
...     with open("file.txt", "w+") as f:
...         f.write(json.dumps({"hey": 8}))

>>> import torch
>>> model = torch.nn.Transformer()
>>> with Repository(
...     "torch-model",
...     clone_from="<user>/torch-model",
...     token=True,
... ).commit("My cool model :)"):
...     torch.save(model.state_dict(), "model.pt")
delete_tag
< source >(
tag_name: str
remote: typing.Optional[str] = None
)
→
bool
Delete a tag, both local and remote, if it exists
git add
Setting the auto_lfs_track parameter to True will automatically track files that are larger than 10MB with git-lfs.
git_checkout
< source >( revision: str create_branch_ok: bool = False )
git checkout a given revision
Setting create_branch_ok to True will create a branch named after the given revision if that revision doesn’t exist.
git_commit
< source >( commit_message: str = 'commit files to HF hub' )
git commit
git_config_username_and_email
< source >( git_user: typing.Optional[str] = None git_email: typing.Optional[str] = None )
Sets git username and email (only in the current repo).
Sets the git credential helper to store
Get URL to last commit on HEAD. We assume it’s been pushed, and the url scheme is the same one as for GitHub or HuggingFace.
Get commit sha on top of HEAD.
git_pull
< source >( rebase: bool = False lfs: bool = False )
Parameters
- rebase (bool, optional, defaults to False): whether to rebase the current branch on top of the upstream branch after fetching.
- lfs (bool, optional, defaults to False): whether to fetch the LFS files too. This option only changes the behavior when a repository was cloned without fetching the LFS files; calling repo.git_pull(lfs=True) will then fetch the LFS files from the remote repository.
git pull
git_push
< source >( upstream: typing.Optional[str] = None blocking: bool = True auto_lfs_prune: bool = False )
Parameters
- upstream (str, optional): upstream to which this should push. If not specified, will push to the last defined upstream or to the default one (origin main).
- blocking (bool, optional, defaults to True): whether the function should return only when the push has finished. Setting this to False will return a CommandInProgress object which has an is_done property. This property will be set to True when the push is finished.
- auto_lfs_prune (bool, optional, defaults to False): whether to automatically prune files once they have been pushed to the remote.
git push
If blocking is left at its default (True), returns the URL of the commit on the remote repo. If used with blocking=False, returns a tuple containing the URL of the commit and the command object to follow for information about the process.
Get URL to origin remote.
Returns whether the git status is clean.
HF-specific. This enables upload support of files >5GB.
lfs_prune
< source >( recent = False )
Parameters
- recent (bool, optional, defaults to False): whether to prune files even if they were referenced by recent commits. See the git lfs prune documentation for more information.
git lfs prune
lfs_track
< source >( patterns: typing.Union[str, typing.List[str]] filename: bool = False )
Tell git-lfs to track files according to a pattern.
Setting the filename argument to True will treat the arguments as literal filenames, not as patterns. Any special glob characters in the filename will be escaped when writing to the .gitattributes file.
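The distinction between a pattern and a literal filename matters as soon as a name contains glob characters. The sketch below illustrates the idea with Python's own fnmatch and glob.escape. This is an analogy only: .gitattributes uses git's pattern syntax and git-lfs applies its own escaping, neither of which is reproduced here, and matches is a hypothetical helper.

```python
import fnmatch
import glob


def matches(tracked: str, path: str, filename: bool = False) -> bool:
    """Hypothetical helper: does `path` match the tracked entry?

    With filename=True the entry is escaped first, so characters such as
    `[`, `*` and `?` are compared literally instead of acting as wildcards.
    """
    pattern = glob.escape(tracked) if filename else tracked
    return fnmatch.fnmatchcase(path, pattern)
```

Treated as a pattern, "model[1].bin" matches the file model1.bin but not a file literally named model[1].bin; once escaped, the situation is reversed.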
lfs_untrack
< source >( patterns: typing.Union[str, typing.List[str]] )
Tell git-lfs to untrack those files.
list_deleted_files
< source >(
)
→
List[str]
Returns
List[str]
A list of files that have been deleted in the working directory or index.
Returns a list of the files that are deleted in the working directory or index.
push_to_hub
< source >( commit_message: str = 'commit files to HF hub' blocking: bool = True clean_ok: bool = True auto_lfs_prune: bool = False )
Parameters
- commit_message (str): message to use for the commit.
- blocking (bool, optional, defaults to True): whether the function should return only when the git push has finished.
- clean_ok (bool, optional, defaults to True): if True, this function will return None if the repo is untouched. Default behavior is to fail because the git command fails.
- auto_lfs_prune (bool, optional, defaults to False): whether to automatically prune files once they have been pushed to the remote.
Helper to add, commit, and push files to remote repository on the HuggingFace Hub. Will automatically track large files (>10MB).
tag_exists
< source >(
tag_name: str
remote: typing.Optional[str] = None
)
→
bool
Check if a tag exists or not.
Blocking method: blocks all subsequent execution until all commands have been processed.
Helper methods
huggingface_hub.repository.is_git_repo
< source >(
folder: typing.Union[str, pathlib.Path]
)
→
bool
Check if the folder is the root or part of a git repository
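One way to picture "the root or part of a git repository" is a walk up the directory tree looking for a .git entry. The following sketch does exactly that with the standard library. It is an approximation under stated assumptions: looks_like_git_repo is a hypothetical name, and the library may well ask git itself rather than inspect the filesystem.

```python
from pathlib import Path
from typing import Union


def looks_like_git_repo(folder: Union[str, Path]) -> bool:
    """Hypothetical helper: True if `folder` or any parent contains a `.git` entry."""
    current = Path(folder).resolve()
    for candidate in (current, *current.parents):
        # `.git` is a directory in a regular clone and a file in worktrees or submodules.
        if (candidate / ".git").exists():
            return True
    return False
```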
huggingface_hub.repository.is_local_clone
< source >(
folder: typing.Union[str, pathlib.Path]
remote_url: str
)
→
bool
Check if the folder is a local clone of the remote_url
huggingface_hub.repository.is_tracked_with_lfs
< source >(
filename: typing.Union[str, pathlib.Path]
)
→
bool
Check if the file passed is tracked with git-lfs.
huggingface_hub.repository.is_git_ignored
< source >(
filename: typing.Union[str, pathlib.Path]
)
→
bool
Check if file is git-ignored. Supports nested .gitignore files.
huggingface_hub.repository.files_to_be_staged
< source >(
pattern: str = '.'
folder: typing.Union[str, pathlib.Path, NoneType] = None
)
→
List[str]
Returns a list of filenames that are to be staged.
huggingface_hub.repository.is_tracked_upstream
< source >(
folder: typing.Union[str, pathlib.Path]
)
→
bool
Check if the current checked-out branch is tracked upstream.
huggingface_hub.repository.commits_to_push
< source >(
folder: typing.Union[str, pathlib.Path]
upstream: typing.Optional[str] = None
)
→
int
Check the number of commits that would be pushed upstream.
- upstream (str, optional): the name of the upstream repository with which the comparison should be made.
Following asynchronous commands
The Repository utility offers several methods which can be launched asynchronously:
- git_push
- git_pull
- push_to_hub
- The commit context manager
See below for utilities to manage such asynchronous methods.
class huggingface_hub.Repository
Returns the asynchronous commands that failed.
Returns the asynchronous commands that are currently in progress.
Blocking method: blocks all subsequent execution until all commands have been processed.
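A possible way to combine these pieces is sketched below. It assumes an installed version of huggingface_hub that still ships Repository, and it assumes that the three descriptions above correspond to the commands_failed and commands_in_progress properties and the wait_for_commands() method; those names do not appear in the text above. push_in_background, the local directory and the repo id are placeholders.

```python
def push_in_background(local_dir: str, repo_id: str) -> None:
    """Sketch: push without blocking, then follow the pending commands."""
    # Imported lazily so this block can be loaded even where the library is absent.
    from huggingface_hub import Repository

    repo = Repository(local_dir=local_dir, clone_from=repo_id)

    # With blocking=False the call returns while `git push` is still running.
    repo.push_to_hub(commit_message="Add files", blocking=False)

    # Assumed property name: commands that have not finished yet.
    for command in repo.commands_in_progress:
        print("still running:", command.title)

    # Assumed method name: block until every launched command has finished.
    repo.wait_for_commands()

    # Assumed property name: commands that finished with an error.
    for command in repo.commands_failed:
        print("failed:", command.title)
```

The function is only defined here, not called, since running it requires network access and write permission on the target repository.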
class huggingface_hub.repository.CommandInProgress
< source >( title: str is_done_method: typing.Callable status_method: typing.Callable process: Popen post_method: typing.Optional[typing.Callable] = None )
Utility to follow commands launched asynchronously.
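The general idea of following an asynchronous command can be shown with a plain subprocess. The class below is a simplified, hypothetical stand-in written for illustration; it does not reproduce CommandInProgress or its constructor arguments, and only borrows the idea of an is_done property.

```python
import subprocess
import sys
import time
from typing import Iterable


class AsyncCommand:
    """Hypothetical, simplified tracker for a process started in the background."""

    def __init__(self, title: str, process: subprocess.Popen):
        self.title = title
        self._process = process

    @property
    def is_done(self) -> bool:
        # poll() returns None while the process is still running.
        return self._process.poll() is not None

    @property
    def failed(self) -> bool:
        return self.is_done and self._process.returncode != 0


def wait_for(commands: Iterable[AsyncCommand], poll_interval: float = 0.05) -> None:
    """Block until every command in `commands` has finished."""
    pending = list(commands)
    while any(not command.is_done for command in pending):
        time.sleep(poll_interval)


# Demo: a trivial Python subprocess stands in for a long-running `git push`.
process = subprocess.Popen([sys.executable, "-c", "pass"])
command = AsyncCommand("demo push", process)
wait_for([command])
```

Polling keeps the caller free to do other work between checks, which is the point of launching a push with blocking=False in the first place.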