Introduction
This is Part 3 of a blog series about Google Cloud Professional Cloud Architect certification. Part 1 focuses on my experience—how I prepared and what the exam was like the first time. Part 2 covers my renewal experience with the AI-centred exam. Part 4 includes my notes on case studies.
These are my personal study notes for the Google Cloud Professional Cloud Architect certification, based on public Google Cloud documentation and learning resources. They are rewritten in my own words and do not contain real exam questions. Some are just bullet points, while others are more detailed. They're not exhaustive, but I think they are mostly sufficient. As I mentioned in Part 1, I primarily worked through example questions and case scenarios, identified my blind spots, read the documentation, and created these notes. Before starting Part 2, I reviewed them and added additional content on AI. For now it's a copy of my Notion doc, but I think I will do some formatting.
Compute and Storage (VMs, GKE, Cloud Storage, Filestore...)
Compute Engine (VMs)
- To change CPU/RAM you need to STOP the VM; to increase the size of a disk, no stop is needed
- Access scopes are used only with the Compute Engine default service account
- A Compute Engine instance is associated with only one service account
- Preemptible (Spot) VM instances are cheaper, but can be stopped by Google -> perfect for batch jobs that can be rerun
- 2 vCPUs equal 1 physical core (each vCPU is one hyperthread)
- MIG (managed instance group) - identical VMs, autoscaling, autohealing, zonal or regional, automatic updates, flexible range of rollout scenarios (rolling updates and canary updates). Based on the configuration you specify in an instance template and an optional stateful configuration (preserves each instance's unique state - instance name, attached persistent disks, and metadata - on machine restart, recreation, autohealing, and update events)
- Managed instance group health checks proactively signal to delete and recreate instances that become UNHEALTHY. Not the same as load balancing health checks, which direct traffic away from non-responsive instances and toward healthy ones; those health checks do not cause Compute Engine to recreate instances (and should be less aggressive)
- Limits: 2,000 VMs in a regional MIG and 1,000 VMs in a zonal MIG
- Depending on the type of load balancer you choose, you add instance groups to a target pool or to a backend service
- Scale the group based on CPU utilization, load balancing capacity, Cloud Monitoring metrics, schedules, or, for zonal MIGs, a queue-based workload like Pub/Sub
- The instance template defines the VPC network and subnet that member instances use
- Unmanaged instance groups do not offer autoscaling, autohealing, rolling updates, multi-zone support, or instance templates, and are not a good fit for highly available, scalable workloads. Use them if you need to load-balance heterogeneous instances or manage the instances yourself. You can add up to 2,000 VMs to a group
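Target-based autoscaling (the CPU utilization mode above) boils down to simple arithmetic: pick the smallest group size that brings average utilization back to the target. A minimal sketch, my own illustration rather than Google's actual implementation:

```python
import math

def recommended_size(current_size: int, observed_utilization: float,
                     target_utilization: float) -> int:
    """Smallest group size that brings average utilization to the target."""
    if current_size == 0:
        return 0
    # Total demand, expressed in "instances' worth" of utilization.
    total_load = observed_utilization * current_size
    return math.ceil(total_load / target_utilization)

# Average CPU at 90% across 4 VMs with a 60% target -> grow to 6 VMs.
print(recommended_size(4, 0.9, 0.6))
```

This is also why a too-aggressive target makes the group flap: small utilization changes cross the ceil() boundary and resize the group.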
GKE
- Autopilot: Google manages the infrastructure:
  - Google provisions, configures, scales, and manages the cluster's compute nodes
  - Google secures the cluster with best practices, for example restricting low-level node access from privileged pods
  - Google manages updates
  - you pay for the resources requested by pods (per-pod billing), not for a pool of nodes
- GKE Enterprise edition offers all of the above, plus management, governance, security, and configuration for multiple teams and clusters, all with a unified console experience and integrated service mesh
- GKE supports GPUs and TPUs and makes it easy to run ML, GPGPU, and HPC workloads
- GKE Enterprise (Anthos) offers a set of capabilities that helps you and your organization (from infrastructure operators and workload developers to security and network engineers) manage clusters, infrastructure, and workloads, on Google Cloud and across public cloud and on-premises environments
- Backup for GKE
- Anthos Service Mesh, Google's fully supported distribution of Istio: manage, observe, and secure your services, and simplify traffic management and monitoring with a fully managed service mesh
- Integrated logging and monitoring
- Auto-upgrade, auto-repair, autoscaling
Cloud Run (next-gen App Engine)
- Kubernetes without the configuration overhead
- Web apps, APIs, microservices
- Multi-region load balancing, committed use discounts (CUDs)
- Any container, automatic scaling
- Cloud Run Jobs - runs a container until completion (not request-driven): batch jobs / background processing
App Engine
- Platform as a Service (PaaS); the generation before Cloud Run
- web apps, APIs, microservices
- standard: sandboxed runtime, can scale to zero; flexible: container-based, does not scale to zero
Cloud Functions
- Event-driven serverless functions, paid per execution
- Real-time stream processing, AI, IoT, ETL, webhooks, lightweight workloads
- Gen 2 is based on Cloud Run: better scaling, higher concurrency
Storage
Disks
- zonal or regional
- Persistent Disk: standard (HDD, cheap, slow, backups); balanced (SSD, general usage); SSD (high-performance, for DBs and analytics); extreme (highest performance, large DBs)
- Hyperdisk (new generation): Balanced or Extreme; fully customizable, including IOPS and throughput; performance is independent of provisioned capacity
- snapshots and machine images
Filestore (99.99% SLA, regional)
- fully managed Network Attached Storage (NAS). Connect multiple VMs to a shared file system for unstructured data; > 100,000 IOPS, 10 GB/s throughput, up to 100 TBs. Used for high performance computing (HPC), media processing, electronics design automation (EDA), application migrations, web content management, life science data analytics, and more
Cloud Storage - object storage; not a file system, objects cannot be updated in place
- storage classes and minimum storage durations: STANDARD (none), NEARLINE (30 days), COLDLINE (90), ARCHIVE (365)
- Lifecycle Management example: after 30 days move to NEARLINE, after a year delete → reduce costs; keep the 3 most recent versions
- use CMEK (Customer-Managed Encryption Keys) if you want to manage your own keys
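The lifecycle example above maps directly onto the JSON configuration Cloud Storage accepts. A sketch of building it (the bucket and file names are placeholders):

```python
import json

# Lifecycle config mirroring the example above: move objects to NEARLINE
# after 30 days, delete after a year, and keep only the 3 newest versions.
lifecycle = {
    "rule": [
        {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
         "condition": {"age": 30}},
        {"action": {"type": "Delete"},
         "condition": {"age": 365}},
        {"action": {"type": "Delete"},
         "condition": {"numNewerVersions": 3}},
    ]
}

with open("lifecycle.json", "w") as f:
    json.dump(lifecycle, f, indent=2)
```

You would then apply it with `gsutil lifecycle set lifecycle.json gs://my-bucket` (or the equivalent `gcloud storage buckets update` flag).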
Data Transfer
- Storage Transfer Service (> 1 TB) - from other cloud providers, online resources, or local data (for example: S3, Azure Blob, Data Lake, on-premises file systems). Incremental transfers, data encryption and validation, metadata preservation
- Transfer Appliance (> 20 TB and up to 1 petabyte) - a hardware appliance you can use to migrate large volumes of data to Google Cloud without disrupting business operations. Request an appliance, receive it by post, load your data, ship it back, and Google uploads it
- Image Import lets you import virtual disks from your on-premises environment, with the software and configurations you need (a.k.a. golden disks or golden images), into Google Cloud and use the resulting image to create virtual machines. The tool supports most virtual disk file formats, including VMDK and VHD
DBs (Cloud SQL, Spanner, BigQuery, Bigtable, Firestore, Memorystore)
Relational
- Cloud SQL (CRM, web, SaaS) - managed MySQL, PostgreSQL, and SQL Server with rich support for extensions, configuration flags, and popular developer tools. 99.99% availability SLA, regional. Database Migration Service (DMS) migrates databases securely and with minimal downtime. Automatic replication, backup, and failover. Cloud SQL for MySQL also leverages flash memory on your database instance to lower read latency and improve throughput by intelligently caching data across memory and high-speed storage. Configure where your data is stored to comply with data residency requirements. Using the Cloud SQL Auth Proxy is good practice
- Spanner (gaming, banking, retail, supply chain/inventory management) - > 100,000 ops, up to 3 billion requests per second, scales horizontally to petabytes, globally; expensive. Automatic sharding, replication, dynamic schema, strongly consistent (TrueTime timestamps), ACID transactions at any scale. Use Google Standard SQL or a PostgreSQL interface. 99.999% SLA
- BigQuery - serverless data warehouse up to petabytes. Analytics. Built-in ML. Columnar. Multi-regional. SQL interface. Sub-second latency. Partition (to reduce cost) and cluster (for performant filtering)
Key Value:
- Bigtable (HBase- or Cassandra-like). Wide-column store. 5 billion RPS, petabytes. Personalisation, adtech, recommendations, fraud detection, IoT. Zonal. Integrates with Hadoop, Dataflow, Dataproc. Analytical and operational workloads. Real-time analytics like personalization and recommendations; powers Spotify's Discover Weekly
Document:
- Firestore - next gen of Datastore. Multi-regional. ACID. Can replace MongoDB (but less queryable). Strong consistency. User profiles, catalogs...
In-memory (gaming, caching, leaderboards, social chat, news feeds):
- Memorystore (Redis Cluster, Redis, and Memcached). Redis Cluster provides zero-downtime scaling up to 250 nodes, terabytes of keyspace, and 60x more throughput with microsecond latencies. Memorystore for Redis standalone read replicas, together with Redis 7, let applications scale to more than a million read requests per second. Memorystore for Redis and Memcached enable scaling on demand with minimal downtime
The networking notes are a little dense, with too many details.
Networks (VPC, Load Balancing, Network Connectivity...)
Network
- Subnets cross zones: VMs in different zones on the same subnet can communicate and share firewall rules
- Google reserves 4 IP addresses in every subnet → the first two and the last two:
  - Network - first address in the CIDR range (x.x.x.0)
  - Default gateway - second address in the CIDR range (x.x.x.1)
  - Second-to-last - reserved for future use (x.x.x.254)
  - Broadcast - last address in the CIDR range (x.x.x.255)
- Private IP space can't overlap in peered VPCs or with connected on-prem networks
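The four reserved addresses are easy to compute with Python's stdlib `ipaddress` module; a quick sketch:

```python
import ipaddress

def reserved_addresses(cidr: str) -> dict:
    """The four addresses Google reserves in a subnet's primary IPv4
    range: the first two and the last two."""
    net = ipaddress.ip_network(cidr)
    return {
        "network": str(net.network_address),
        "default_gateway": str(net.network_address + 1),
        "second_to_last": str(net.broadcast_address - 1),
        "broadcast": str(net.broadcast_address),
    }

print(reserved_addresses("10.0.1.0/24"))
# -> {'network': '10.0.1.0', 'default_gateway': '10.0.1.1',
#     'second_to_last': '10.0.1.254', 'broadcast': '10.0.1.255'}
```

So a /24 subnet gives you 252 usable addresses, not 254.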
VPC
- networks, including their associated routes and firewall rules, are global resources (not associated with any particular region or zone)
- auto and custom mode. Auto mode: created on project creation (can be disabled); creates a subnet in every region from 10.128.0.0/9 with default firewall rules (ssh, internal, rdp, icmp). Limits: the subnets of every auto mode VPC network use the same predefined range of IP addresses, so two auto mode VPC networks cannot be peered because their subnet ranges overlap (10.128.0.0/9). Auto mode VPC networks do not support dual-stack subnets, so no IPv6 ranges
- RFC 1918 private ranges: 10.0.0.0 - 10.255.255.255 (10/8 prefix, class A); 172.16.0.0 - 172.31.255.255 (172.16/12 prefix, class B); 192.168.0.0 - 192.168.255.255 (192.168/16 prefix, class C)
- routes and firewall rules
- routes define the paths that network traffic takes from a VM. A route consists of a single destination prefix in CIDR format and a single next hop
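The overlap constraint behind "two auto mode networks cannot be peered" can be checked mechanically. A sketch using stdlib `ipaddress` (the subnet ranges below are illustrative):

```python
import ipaddress

def can_peer(subnets_a, subnets_b):
    """VPC Network Peering requires that no subnet range in one network
    overlaps any subnet range in the other."""
    return not any(
        ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))
        for a in subnets_a for b in subnets_b
    )

# Two auto mode networks draw all subnets from 10.128.0.0/9 -> always overlap.
auto_a = ["10.128.0.0/20"]   # e.g. a us-central1 subnet in network A
auto_b = ["10.128.0.0/20"]   # the same predefined range in network B
custom = ["192.168.0.0/24"]  # a custom mode subnet outside 10.128.0.0/9

print(can_peer(auto_a, auto_b))   # False
print(can_peer(custom, auto_a))   # True
```

The same check explains why a custom mode network can peer with an auto mode one only if none of its subnets fall inside 10.128.0.0/9.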
VPC Service Controls
- reduces data exfiltration risks, ensures sensitive data can only be accessed from authorized networks, restrict resource access to allowed IP addresses, identities, and trusted client devices, control which Google Cloud services are accessible from a VPC network.
Network Connectivity Center
- offers the unique ability to easily connect your on-premises, Google Cloud, and other cloud enterprise networks and manage them as spokes through a single, centralized, logical hub on Google Cloud.
Connect networks
- Shared VPC - one host project shares its VPC with service projects in the same organization. Uses private IP. Centralized firewall rules, routes, and NAT live in the host project. Service projects cannot create subnets
- VPC Network Peering - connects VPCs, including projects of different organisations. Uses internal IP only. No overlapping CIDRs. Does not provide transitive routing (A ↔ B and B ↔ C does NOT allow A ↔ C), and you can't connect two auto mode VPC networks. You can connect a custom mode VPC network to an auto mode VPC network as long as the custom mode network doesn't have any subnet IP address ranges that fit within 10.128.0.0/9. VPC Network Peering doesn't exchange VPC firewall rules or firewall policies. Peering uses Google's private backbone network; traffic never goes over the public internet
- Network connectivity - connect a GCP VPC with an on-premises network or another cloud (hybrid)
  - Cloud VPN - low cost, lower bandwidth, 1.5–3.0 Gbps over an encrypted IPsec tunnel over the public internet; static or dynamic routing
    - HA (high availability) VPN (IPsec tunnel), 99.99% or 99.9% SLA, over an internet connection
    - Classic VPN
  - Cloud Interconnect:
    - Dedicated Interconnect - highest bandwidth (10 or 100 Gbps); requires a colocation facility
    - Partner Interconnect - less than 10 Gbps; you don't need to install and maintain routing equipment in a colocation facility
    - Cross-Cloud Interconnect - private connectivity to AWS, Azure, OCI; no public internet path
  - Cloud Router - dynamic routing using Border Gateway Protocol (BGP) over Cloud Interconnect and VPN gateways (required for Interconnect + HA VPN)
  - Peering - connecting to Google Workspace and Google APIs (services); no SLA; outside of GCP; does not provide direct access to VPC network resources that have only internal IP addresses
    - Direct Peering - via a Google edge connection (100 locations in 33 countries; physical connection to Google at an IXP; requires data center presence)
    - Carrier Peering - via a provider ISP (public service access, not private VPC; no private IP access)
  - Connecting sites through GCP, using Google's network as a WAN
  - CDN Interconnect - populate a third-party CDN from GCP
  - Private Service Connect (PSC) - expose one private app to another org; no public IP; service-level only. Allows private access to services across VPCs, organizations, or Google services using private IP
| Scenario | Solution |
|---|---|
| Same org, many projects | Shared VPC |
| Different orgs, private | VPC Peering / PSC |
| Hybrid low latency | Interconnect |
| Hybrid quick & cheap | Cloud VPN |
| Only Google APIs | Peering |
| Global WAN | NCC |
Cloud Load Balancing
No pre-warming needed. Distribute load-balanced compute resources across single or multiple regions, close to users, while meeting high-availability requirements.
Google Cloud HTTP(S) load balancing is implemented at the edge of Google's network in Google's points of presence (POP) around the world. User traffic directed to an HTTP(S) load balancer enters the POP closest to the user and is then load-balanced over Google's global network to the closest backend that has sufficient available capacity.
Features
- Single anycast IP address, users connect to the nearest Google edge POP automatically.
- Software-defined load balancing - distributed and not locked to device
- Autoscaling – no pre-warming, scale from zero fast
- Layer 4 and Layer 7 load balancing. Use Layer 4-based load balancing to direct traffic based on data from network and transport layer protocols such as TCP, UDP, ESP, GRE, ICMP, and ICMPv6. Use Layer 7-based load balancing to add request routing decisions based on attributes such as the HTTP header and the uniform resource identifier.
- External and internal load balancing.
- Global (HTTP(S), SSL Proxy, TCP Proxy) and regional load balancing (Internal L7, Internal TCP/UDP, External passthrough NLB)
- Advanced feature support: IPv6 global load balancing, source IP-based traffic steering, weighted load balancing, WebSockets, user-defined request headers, and protocol forwarding for private virtual IP addresses (VIPs).
Integrations with:
- Cloud CDN for cached content delivery at Google edge POPs. Cloud CDN is supported with the global external Application LB and the classic Application LB.
- Google Cloud Armor to protect your infrastructure from distributed denial-of-service (DDoS) attacks and other targeted application attacks. Always-on DDoS protection is available for the global external Application Load Balancer, the classic Application Load Balancer, the external proxy Network Load Balancer, and the external passthrough Network Load Balancer. Additionally, Google Cloud Armor supports advanced network DDoS protection only for external passthrough Network Load Balancers. For more information, see Configure advanced network DDoS protection.
Types
- Application Load Balancers - proxy-based Layer 7, external and internal (you specify how traffic is routed with URL maps)
- Network Load Balancers - Layer 4 load balancers that can handle TCP, UDP, or other IP protocol traffic
  - Proxy Network Load Balancers - external and internal; connections are terminated at the proxy
  - Passthrough Network Load Balancers - distribute traffic among backends in the same region as the load balancer. Load-balanced packets are received by backend VMs with the packet's source and destination IP addresses, protocol, and, if the protocol is port-based, the source and destination ports unchanged. Connections are terminated at the backends, and responses from the backend VMs go directly to the clients, not back through the load balancer. The industry term for this is direct server return (DSR)
Cloud CDN
Content delivery for websites and applications served out of Compute Engine by leveraging Google's globally distributed edge caches. Cloud CDN lowers network latency, offloads origins, and reduces serving costs. Once you've set up an Application Load Balancer, simply enable Cloud CDN with a single checkbox. Cloud CDN is the right choice for workloads that tend to deliver very small objects at high rates, such as adtech and e-commerce platforms. Its strength is serving static web content, such as JavaScript, CSS, fonts, and inline images.
Media CDN
Serve streaming video, downloads, or other high-throughput content. Do not use a CDN for Real-Time Messaging Protocol (RTMP), WebRTC, or WebSockets, and do not serve sensitive workloads through it.
Cloud NAT.
Allows VMs without external IPs to access the internet for updates or downloads. Outbound only. Managed service. Works with Cloud Router.
AUTHs (IAM, Roles, Principals, IAP)
Authentication methods at Google
Authorization – IAM
- who (a principal) has access (a role, following the roles/service.roleName pattern, e.g. roles/viewer, roles/pubsub.publisher, roles/appengine.appAdmin), where a role contains a set of permissions (appengine.applications.listRuntimes, appengine.applications.create), on what resource
- Types of principals:
  - user:user@gmail.com
  - group:admins@example.com
  - serviceAccount:sa@project.iam.gserviceaccount.com
  - domain:example.com
  - allAuthenticatedUsers
  - allUsers
- Types of roles:
  - basic (viewer, editor, owner) - legacy, too broad, not recommended
  - predefined (roles/pubsub.publisher) - recommended
  - custom - when you cannot use a predefined role
- It is recommended to grant roles to groups rather than to individual users
- Allow policies grant permissions. Deny policies explicitly block permissions even if another role grants them
- IAM permissions inherit down the resource hierarchy: Organization -> Folder -> Project -> Resource
- Example binding: "role": "roles/storage.objectAdmin", "members": [ "user:ali@example.com", "serviceAccount:my-other-app@appspot.gserviceaccount.com", "group:admins@example.com", "domain:google.com" ]
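A binding like this is one entry in an allow policy of the shape returned by `gcloud projects get-iam-policy --format=json`. A small sketch of walking that structure (the helper name is my own):

```python
# An allow policy is a list of bindings; each binding maps one role to
# a list of members (principals).
policy = {
    "bindings": [
        {
            "role": "roles/storage.objectAdmin",
            "members": [
                "user:ali@example.com",
                "serviceAccount:my-other-app@appspot.gserviceaccount.com",
                "group:admins@example.com",
                "domain:google.com",
            ],
        }
    ]
}

def members_with_role(policy: dict, role: str) -> list:
    """All principals that hold the given role in this policy."""
    return [m for b in policy.get("bindings", [])
            if b["role"] == role for m in b["members"]]

print(members_with_role(policy, "roles/storage.objectAdmin"))
```

Note that a member can appear in several bindings, and a role never lists permissions directly in the policy; the role-to-permissions mapping lives in IAM itself.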
Cloud Identity is the identity provider (IdP) for Google Cloud. It also is the Identity-as-a-Service (IDaaS) solution that powers Google Workspace.
Identity-Aware Proxy (IAP) lets you establish a central authorization layer for applications accessed by HTTPS. Controls access to applications based on user identity instead of network location. Users authenticate with Google identity before reaching the application.
Cloud Armor - Layer 7 security: protects against DDoS, XSS, and SQL injection; IP filtering and WAF rules.
OPs (Logs, Metrics, Traces)
Google Cloud's operations suite (formerly Stackdriver) – maintains application reliability and meets your current objectives for availability and resilience to failures.
- Metrics (Monitoring) - health or performance measured at regular intervals over time (CPU, request latency): alerts, SLO definition, visualisation. System metrics from GCP services and VMs (with the Ops Agent), user-defined metrics, log-based metrics, external metrics, Prometheus
- SLO = Service Level Objective
- SLI = Service Level Indicator
- SLA = Service Level Agreement
- Logs - records of system or application events over time; view them with the right IAM permissions
  - Types: platform, component, security (Cloud Audit Logs), user-written (via the Ops Agent or the Logging agent, the Cloud Logging API, or Cloud Logging client libraries)
  - Interfaces: Logs Explorer for individual log entries, or the Log Analytics interface for statistics over logs, e.g. average latency of HTTP requests to a specific URL over time
  - log-based alerts and log-based metrics (convert log entries into Monitoring metrics, e.g. a count of HTTP 500s)
  - The logging agent is integrated by default in Google Kubernetes Engine, the App Engine flexible environment, and Cloud Functions
  - Outside of Google Cloud: BindPlane by observIQ
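As a toy illustration of what a log-based counter metric computes, here is counting HTTP 500 responses per minute from structured log entries (the entries and field names are invented):

```python
from collections import Counter

# Structured log entries, as a log sink might deliver them.
entries = [
    {"timestamp": "2024-05-01T10:00:12Z", "status": 500},
    {"timestamp": "2024-05-01T10:00:48Z", "status": 200},
    {"timestamp": "2024-05-01T10:01:03Z", "status": 500},
    {"timestamp": "2024-05-01T10:01:05Z", "status": 500},
]

# Bucket matching entries by minute: truncate to YYYY-MM-DDTHH:MM.
per_minute = Counter(
    e["timestamp"][:16]
    for e in entries if e["status"] == 500
)
print(per_minute)  # a time series you could alert on for error spikes
```

Cloud Logging does the equivalent continuously: a filter selects matching entries and Monitoring stores the counts as a metric you can chart and alert on.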
- Traces - trace the path of a request across the parts of your distributed application. Useful for microservices architectures
- Other data - context for traces or logs (severity of an alert, customer ID)
Other Services (Pub/Sub, Artifact Registry, Cloud Composer, Dataflow, Dataproc ...)
- Pub/Sub - combines the horizontal scalability of Apache Kafka and Pulsar with features found in messaging middleware such as Apache ActiveMQ and RabbitMQ
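The publish/subscribe model itself is easy to sketch in-process. This toy ignores everything that makes Pub/Sub hard (durability, acknowledgements, at-least-once delivery across machines); it only shows the fan-out: every subscription gets its own copy of each message.

```python
from collections import defaultdict, deque

class MiniPubSub:
    """Toy in-process sketch of the publish/subscribe model."""
    def __init__(self):
        # topic -> {subscription name: queue of undelivered messages}
        self.subs = defaultdict(dict)

    def subscribe(self, topic, sub):
        self.subs[topic][sub] = deque()

    def publish(self, topic, msg):
        # Fan out: each subscription receives its own copy.
        for queue in self.subs[topic].values():
            queue.append(msg)

    def pull(self, topic, sub):
        queue = self.subs[topic][sub]
        return queue.popleft() if queue else None

bus = MiniPubSub()
bus.subscribe("orders", "billing")
bus.subscribe("orders", "analytics")
bus.publish("orders", {"id": 1})
print(bus.pull("orders", "billing"), bus.pull("orders", "analytics"))
```

The decoupling is the point: the publisher never knows how many subscribers exist, which is what lets you bolt a new consumer onto an event stream without touching producers.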
Migration tools:
- Migrate to Virtual Machines (formerly Migrate for Compute Engine) - VM-to-VM lift and shift
- Migrate to Containers - replatform
- Database Migration Service - MySQL, PostgreSQL, and Oracle → Cloud SQL, AlloyDB
Artifact Registry
- extends the capabilities of Container Registry; a fully managed service with support for both container images and non-container artifacts
- you can store multiple artifact formats, including OS packages for Debian and RPM, as well as language packages for popular languages like Python, Java, and Node
- granular permission model with Cloud IAM
Anthos Service Mesh
- manage, observe, and secure your services without having to change your application code
- Google's implementation of the Istio open-source project
- commonly used with k8s, including multi-cluster Kubernetes environments
Cloud Composer (Apache Airflow)
- managed workflow orchestration service (data orchestration). Use Python to schedule and monitor pipelines spanning hybrid and multi-cloud environments, with a strong focus on finite, batch-oriented tasks. A workflow represents a series of tasks for ingesting, transforming, analyzing, or utilizing data; in Airflow, workflows are created using directed acyclic graphs (DAGs). To run workflows, you first create an environment: Airflow depends on many microservices, so Cloud Composer provisions Google Cloud components to run your workflows, collectively known as a Cloud Composer environment. Environments are self-contained Airflow deployments based on Google Kubernetes Engine, and they work with other Google Cloud services using connectors built into Airflow
Dataflow (Apache Beam; comparable to Flink and Spark)
- serverless, fast, and cost-effective service that supports both stream and batch processing. You write pipelines with the Apache Beam libraries (Java or Python), and Dataflow removes operational overhead from your data engineering teams by automating infrastructure provisioning and cluster management. Once you have created your pipeline, you use Dataflow to deploy and execute it; this is called a Dataflow job. Dataflow assigns worker virtual machines to execute the data processing, and you can customize the shape and size of these machines. If your traffic pattern is spiky, Dataflow autoscaling automatically increases or decreases the number of worker instances required to run your job. The Dataflow streaming engine separates compute from storage and moves parts of pipeline execution out of the worker VMs and into the Dataflow service backend, which improves autoscaling and data latency. Dataflow also guarantees exactly-once processing of every record. Typical use cases for Dataflow include the following:
- Data movement: Ingesting data or replicating data across subsystems.
- ETL (extract-transform-load) workflows that ingest data into a data warehouse such as BigQuery.
- Powering BI dashboards.
- Applying ML in real time to streaming data.
- Processing sensor data or log data at scale.
Dataproc (Apache Spark, Hadoop, Hive, Pig, Presto)
- Fully managed Apache Spark and Hadoop service on Google Cloud.
- Raw data → Dataproc processing (Spark/Hadoop) → Output to BigQuery / Cloud Storage
- Used for big data processing and large-scale data analytics.
- Run familiar open-source tools without managing cluster infrastructure manually.
Dataprep
- Visual data preparation and transformation tool (Excel-like UI). Connects to BigQuery, Cloud Storage, Google Sheets, and hundreds of other cloud applications and traditional databases so you can transform and clean any data you want. In the UI, visually explore, clean, and prepare big data. Uses Dataflow or BigQuery under the hood, enabling you to process structured or unstructured datasets of any size with clicks, not code
- Raw data → Dataprep cleaning → Structured dataset → BigQuery analytics
Transcoder API
- convert video files and package them for optimized delivery to web, mobile, and connected TVs; create consumer streaming formats like MPEG-4 (MP4), Dynamic Adaptive Streaming over HTTP (DASH, also known as MPEG-DASH), and HTTP Live Streaming (HLS). Encode or decode a digital data stream with multiple techniques (codecs)
The notes below were added when renewing the certification and concentrate on AI. Not exhaustive, but they give an overview.
AI (Vertex AI, Endpoints, Registry, Vector Search etc.)
Vertex AI → Vai (abbreviation used in these notes)
- AI platform to ingest, train, register, deploy, scale, monitor
- Generative AI (generate content) and Traditional ML (classification, clustering)
- Serverless, managed, secured, integrated with GCP (BigQuery, Storage, etc.)
- Components:
- Vai Workbench:
- Jupyter notebooks, managed or user-managed for exploration, experimentation, prototyping and testing. Hosted on configurable VM.
- input: data from GCP (BigQuery, Storage, streams) → output code, exploratory results, pipeline definitions.
- Colab Enterprise
- Easy, managed collaborative notebook space for teamwork, minimal setup
- Vai Training
- train without infrastructure management
- AutoML or custom (your own code):
  - AutoML: higher cost per node-hour; dataset limits (on the order of 100M rows); might not support specialized model architectures. Train models without writing ML code. Best for structured data, images, text, and video
  - custom: Vai Hyperparameter Tuning, metrics to Vai TensorBoard, can be part of Vai Pipelines
- input: data from GCP (BigQuery, Storage, streams) → output: model artifacts, a Vai "model", or an entry in the Model Registry
- Vai Model Registry
- artifact store for models (stored in Storage), hub between training and serving.
- version, organise, deploy
- inputs: trained models → output: model with metadata
- integrated with Vai Endpoints, Vai Pipelines, Vai Experiments, Data Catalog
- Vai Endpoints
- real-time serving, online prediction (real-time inference), scale, load balance, versioning.
- logging, prediction logging, explanation (feature scores), IAM-based or VPC integration (private endpoint, or Private Service Connect), canary deployment, A/B testing
- input: HTTPS, RPC prediction request → response
- Vai Batch Prediction
- job-based, offline prediction for large datasets.
- models from AutoML, custom training, or BigQuery ML
- input as usual (BigQuery, Storage, streams) → output to Cloud Storage (CSV/JSON with model outputs) or BigQuery
- Vai Feature Store
- manage feature lifecycle and consistency (training and serving)
- for online serving (backed by Bigtable) or batch serving (backed by BigQuery)
- Vai ML Metadata
- Stores metadata about ML workflows: datasets, models, pipelines, experiments, artifacts, executions, contexts of lifecycle to reproduce
- a managed version of the ML Metadata (MLMD) library
- Vai Experiments
- user-friendly layer on top of ML Metadata
- Tracks model iterative development and compares results.
- input: training job runs
- Vai Pipelines
- serverless runner
- Automates ML workflows such as data prep → training → evaluation → deployment.
- Kubeflow Pipelines (KFP) SDK or TensorFlow Extended (TFX) DSL
- runs and artifacts are recorded in ML Metadata automatically
- Pipelines are not interactive; runs are stateless
- Vai Model Monitoring
- detects training-serving skew (features differ between training and serving, e.g. age not normalised, or missing default values), drift (a feature's statistical properties change over time), accuracy metrics, and data quality issues
- alerts
- input: stream of predictions from Vertex Endpoint → alerts
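A naive sketch of what a drift detector computes: compare a serving-time feature distribution against its training distribution and alert past a threshold. Real monitoring uses proper distribution distances (e.g. Jensen-Shannon divergence); the feature values and threshold below are invented.

```python
import statistics

def drift_score(train_values, serving_values):
    """Shift of the serving mean, measured in training standard
    deviations - a crude stand-in for a distribution distance."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    return abs(statistics.mean(serving_values) - mu) / sigma

# Hypothetical "age" feature: training distribution vs. two serving windows.
train = [34, 41, 29, 38, 45, 31, 36, 40]
ok = [35, 39, 30, 37]        # similar distribution -> no alert
drifted = [62, 70, 65, 68]   # ages skewed upward at serving time

ALERT_THRESHOLD = 2.0
print(drift_score(train, ok) > ALERT_THRESHOLD)       # False
print(drift_score(train, drifted) > ALERT_THRESHOLD)  # True
```

Skew detection is the same comparison, but between the training feature pipeline and the serving feature pipeline rather than between two points in time.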
- Vai Model Garden & Foundation Models - catalog of pre-trained and foundation models
  - Gemini, Imagen, Veo
  - Model-as-a-Service (MaaS) APIs
  - Vertex AI prompt editor UI for testing prompts
  - prototype AI apps
- Vai Agent Builder & Agent Engine
- use the Agent Development Kit (ADK) to orchestrate LLM "thought processes", similar to LangChain
- Vai Vector Search
- High-performance similarity search for embeddings used in RAG systems.
- objects → embeddings → vector database → retrieval
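The retrieval step is just nearest-neighbour search over embedding vectors. A toy sketch with cosine similarity (the 3-dimensional vectors and document names are invented; real embeddings have hundreds of dimensions and sit behind an approximate index):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Tiny made-up "vector database": document id -> embedding.
index = {
    "doc_pricing":  [0.9, 0.1, 0.0],
    "doc_security": [0.1, 0.9, 0.2],
    "doc_billing":  [0.8, 0.2, 0.1],
}

def nearest(query_vec, index, k=2):
    """The k documents most similar to the query embedding."""
    return sorted(index, key=lambda d: cosine(query_vec, index[d]),
                  reverse=True)[:k]

print(nearest([1.0, 0.0, 0.0], index))  # ['doc_pricing', 'doc_billing']
```

In a RAG system the top-k documents are then stuffed into the LLM prompt as grounding context; Vector Search replaces this exact scan with an approximate index so it scales to billions of vectors.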
- Responsible AI and security features: Model Armor, content filtering, safety controls, prompt injection protection
- Video Intelligence API (Video AI)
  - Recognize 20,000 objects, places, and actions in stored and streaming video. Extract rich metadata at the video, shot, or frame level
  - stored video: face, person, and logo detection, speech transcription; streaming: label, shot, and explicit content detection, object tracking; ~ $0.10–0.15/min
  - content moderation, content recommendations, media archives, contextual ads
- Vision API
- logo, face, landmark, text, document text, safe content detection
- $1.50 for 1000 units
- Natural Language API
- entity, sentiment, entity sentiment, syntax, classification, text moderation
- Vai Vision:
  - platform to stream, analyse, and store video
- Ray on Vertex AI
- managed distributed computing service integrated into Google Cloud Vertex AI that lets you run Ray clusters for scalable Python-based distributed machine learning and data workloads. Ray itself is an open-source Python framework for distributed computing — useful for distributed model training, hyperparameter tuning, data processing, reinforcement learning, serving models, and more.