
Harbor - A Complete Cloud Native Registry


Getup


This article provides an overview of the knowledge, information, and instructions on how to implement Harbor in a Docker container environment orchestrated with Kubernetes.



Introduction



It is interesting to note what is said on the official Harbor website:



“Our mission is to be the most secure, performant, scalable, and available cloud native repository for Kubernetes”



Acting as a Docker registry is one of Harbor's most mature and complete features, but its capabilities go beyond that. According to the documentation, you can also use it as a repository for Helm Charts (ChartMuseum), replicate images to registry-as-a-service offerings from public cloud providers, schedule automated jobs for image deletion and garbage collection, manage access with RBAC (permissions, users, groups, and project structure), integrate with Active Directory/LDAP/OIDC, run vulnerability scanning (Clair), and integrate with various security solutions (OPA, Aqua Security, HashiCorp Vault, among others).



Pretty cool, huh? 🙃



Architecture



Of all the diagrams I've seen, I believe the one below best describes Harbor's current architecture (version 1.10 at the time of writing).





As the figure shows, Harbor is built from several open source components, and these components can be decoupled and consumed “as a service” from cloud providers.



Installation



Harbor provides an official Helm Chart for its installation. You can add its repository to your local Helm client with the command below:



helm repo add harbor https://helm.goharbor.io



You can find the default values.yaml file in the Chart repository, but I will discuss the most important fields and configurations for an initial installation below.



Service Exposure



You can expose Harbor using the usual Kubernetes strategies (ingress, clusterIP, nodePort, loadBalancer). By default, it uses ingress. In that case, you need to supply settings that match the Ingress Controller already running in the environment, or the equivalent cloud provider service. Adjust these settings in the “expose:{}” block according to the chosen strategy.



If you do not use the “ingress” option, the chart deploys its own Nginx service as a reverse proxy in front of Harbor (it is not an Ingress Controller). You can customize this service in the “nginx:{}” block; I recommend using the default values.yaml as a reference.
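As a rough illustration of the “expose:{}” and “nginx:{}” blocks, the fragment below sketches a nodePort setup. The key names follow the chart's default values.yaml as I remember it for the chart version contemporary with Harbor 1.10, and the port numbers are assumptions; check both against the values.yaml of the chart version you actually install.

```yaml
# values.yaml (fragment): expose Harbor through a NodePort service.
# Port numbers are illustrative assumptions; TLS settings are omitted here.
expose:
  type: nodePort
  nodePort:
    name: harbor
    ports:
      http:
        port: 80          # service port for HTTP
        nodePort: 30002   # port opened on every node for HTTP
      https:
        port: 443         # service port for HTTPS
        nodePort: 30003   # port opened on every node for HTTPS

# With a non-ingress type, the chart's own Nginx service fronts Harbor.
nginx:
  replicas: 1
```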



URLs



Another fundamental point is to set a value for “externalURL”. According to the documentation, this URL will be used by the Harbor core service to:



1) populate the docker/helm commands showed on portal
2) populate the token service URL returned to docker/notary client



The format is quite standard for URLs:



protocol://domain[:port]



Best practices indicated in the documentation:



1) If “expose.type” is “ingress”, the “domain” must have the value of “expose.ingress.hosts.core”
2) If “expose.type” is “clusterIP”, the “domain” must have the value of “expose.clusterIP.name”
3) If “expose.type” is “nodePort”, the “domain” must be the IP address of the k8s node.



Note: if Harbor is installed behind a proxy, configure “domain” as the proxy URL or IP.



You also need to specify the hostnames for the Core and Notary services. I will say more about these services later; for now, set the hostnames in “expose.ingress.hosts.core” and “expose.ingress.hosts.notary”.
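Putting the URL rules above together for the ingress case, a minimal sketch could look like the fragment below. The hostnames harbor.example.com and notary.example.com are placeholders I made up for illustration.

```yaml
# values.yaml (fragment): ingress exposure with a matching externalURL.
# harbor.example.com and notary.example.com are placeholder hostnames.
expose:
  type: ingress
  ingress:
    hosts:
      core: harbor.example.com
      notary: notary.example.com

# Format: protocol://domain[:port]
# For ingress, the domain must equal expose.ingress.hosts.core.
externalURL: https://harbor.example.com
```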



TLS



Configuring TLS is essential, especially for the Registry service. If your Ingress Controller does not already provide TLS, or if you use a specific domain for Harbor, you need to create a secret containing the certificate and private key (the “tls.crt” and “tls.key” entries) and set its name in the “expose.tls.secretName” field.
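The two pieces involved can be sketched as follows: a standard Kubernetes TLS secret and the values.yaml fragment that points to it. The secret name harbor-tls is hypothetical, the data values are placeholders, and the “expose.tls” key names are the ones used by the chart version I tested, so newer charts may organize them differently.

```yaml
# File 1: Kubernetes Secret manifest, created in the same namespace as Harbor.
# "harbor-tls" is a hypothetical name; replace the placeholders with real data.
apiVersion: v1
kind: Secret
metadata:
  name: harbor-tls
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded private key>
---
# File 2: values.yaml (fragment) referencing the secret above.
expose:
  tls:
    enabled: true
    secretName: harbor-tls
```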



Persistence



Harbor supports several types of storage, which you can configure in the “persistence:{}” block by setting the PVC options for each service that requires persistence (Redis, Registry, ChartMuseum, etc.).



Data Storage (PVCs/PVs)



Note that, by default, the “persistence.resourcePolicy” field is set to “keep”, so PVCs are not removed when the Helm chart is deleted. This matters for data preservation: in my tests, the default “keep” did retain the PVCs when the chart was deleted, but reinstalling the chart then failed, with Helm reporting that the volumes already exist. The strategy I adopted was to create the PVCs during installation (or manually) and specify their respective names in values.yaml. This way, reinstallation works without issues and the data remains intact.
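A minimal sketch of that strategy is shown below, assuming the PVCs already exist in the release namespace. The claim names are hypothetical, and the “existingClaim” field is the one I recall from the chart's default values.yaml; confirm it for your chart version.

```yaml
# values.yaml (fragment): reuse pre-created PVCs instead of letting the
# chart create new ones. All claim names below are hypothetical.
persistence:
  enabled: true
  resourcePolicy: "keep"
  persistentVolumeClaim:
    registry:
      existingClaim: harbor-registry-data
    database:
      existingClaim: harbor-database-data
    redis:
      existingClaim: harbor-redis-data
```

Because the chart no longer owns these claims, deleting and reinstalling the release should not collide with volumes left over from the previous install.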



Other Types of Backends



You can also use other storage backends in addition to the default, “filesystem”, which relies on PVCs. Under “persistence.imageChartStorage:{}” you will find the fields for configuring backends such as s3, azure, gcs, swift, and others.
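For example, an S3 backend might be sketched like this. The region and bucket name are made-up values, the credentials are placeholders, and the field names are the ones I remember from the chart's default values.yaml.

```yaml
# values.yaml (fragment): store images and charts in S3 instead of a PVC.
# Region and bucket are hypothetical; credentials are placeholders.
persistence:
  imageChartStorage:
    type: s3
    s3:
      region: us-east-1
      bucket: my-harbor-bucket
      accesskey: <access key>
      secretkey: <secret key>
```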



Database (PostgreSQL / Redis)



Harbor needs to store metadata related to its objects, such as projects, users, roles, replication policies, tag retention policies, scanners, charts, and images. It uses PostgreSQL for this. The chart installs its own PostgreSQL image (“goharbor/harbor-db”) as a StatefulSet. You can customize its settings in the “database:{}” block, and you can also use an external database by setting “database.type” to “external” and passing the access details in the “database.external:{}” block.



You can do the same with Redis, inside the “redis:{}” block.
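As a hedged sketch, pointing both services at external instances could look like the fragment below. The hostnames, user, and database name are hypothetical, the passwords are placeholders, and the field names under “external” are the ones I recall from the chart version I used; newer chart versions may name some of them differently, especially for Redis.

```yaml
# values.yaml (fragment): use external PostgreSQL and Redis.
# Hostnames and credentials are hypothetical placeholders.
database:
  type: external
  external:
    host: postgres.example.internal
    port: "5432"
    username: harbor
    password: <password>
    coreDatabase: registry
    sslmode: require

redis:
  type: external
  external:
    host: redis.example.internal
    port: "6379"
    password: <password>
```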



Core Services



As I mentioned in the architecture section, Harbor has some internal services; they are responsible for specific routines and have configuration blocks inside the values.yaml.



Core is the service responsible for authentication, configuration management, quotas, projects, and many other features. However, none of this can be configured via the Helm Chart, only in the interface after Harbor is installed. This worries me a bit from a resilience standpoint: the settings are stored in the database, which therefore requires extra care.



You can perform basic configurations regarding core deployment inside the “core:{}” block in the values.yaml.



Just like core, the same goes for the other services. The configurations possible in the values.yaml are basic, aimed only at deploying the applications.
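To give an idea of how basic these settings are, a “core:{}” block might contain little more than the fragment below. The replica count and resource figures are arbitrary assumptions, not recommendations.

```yaml
# values.yaml (fragment): deployment-level settings for the core service.
# Replica count and resource requests are arbitrary example values.
core:
  replicas: 2
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
```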



User Interface



One of the most interesting aspects of Harbor is providing a very comprehensive graphical user interface, which gives users greater autonomy and confidence while using the features.







Harbor and Docker Images





Interaction with Harbor via CLI works like any Docker registry (Docker Hub, for example), with the traditional push and pull operations. What sets Harbor apart are the management and security features for the images hosted on it.



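Once an image is added to Harbor, it can go through security scans that map open CVEs to the vulnerabilities found in the image (scans can run automatically on push when that option is enabled for the project, or be triggered manually). Harbor relies on scanners that ship with the installation, such as Clair and/or Trivy.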





Vulnerability Scans (Clair / Trivy)



Clair is an Open Source project from CoreOS for the static vulnerability analysis of appc and docker containers.



Trivy (“tri” pronounced like trigger, “vy” pronounced like envy) is a simple and comprehensive vulnerability scanner for containers. Currently, Trivy is part of the Aqua Security stack, but you can use it on its own or embedded in solutions like Harbor.



Both Clair and Trivy detect vulnerabilities in OS packages (Alpine, RHEL, CentOS etc.); Trivy additionally scans application dependencies (Bundler, Composer, npm, yarn etc.).
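In the Helm Chart, the scanners are toggled per component. The fragment below is only a sketch: whether each toggle exists depends on the chart version (as far as I know, Trivy arrived with the Harbor 2.0 charts and Clair was dropped in later releases), so check the values.yaml of your version first.

```yaml
# values.yaml (fragment): enable the bundled vulnerability scanners.
# Availability of each key depends on the chart version.
clair:
  enabled: true
trivy:
  enabled: true
```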



On Harbor's official channel, there is a demonstration on using these Scans.





Signed Docker Images (Notary)



An important strategy is “Docker Content Trust”, which the Docker documentation describes in combination with Notary. Notary is a tool for publishing and managing collections of trusted content. The premise is that an image is digitally signed, so anyone consuming it can verify the integrity and origin of the content. It is built on a straightforward key management and signing interface used to create signed collections and configure trusted publishers.



The most basic operation is listing the signed tags in a repository. The Docker documentation shows a basic example where it is possible to relate some images present on Docker Hub with signatures that already exist on the public Notary server.





You can understand the architecture of this service better through the documentation; it's a good read.



Conclusion



There are many interesting solutions for image repositories, but Harbor presents itself as one of the most complete. Its structure, installation, and the tools that compose it reflect a Cloud Native nature, without reinventing the wheel, utilizing consolidated market solutions for storage and security tasks. This helps enormously in charting a management and high-availability strategy on top of Harbor.



 Author: Bruno S. Brasil




