
Ocean AWS K8s Cloud Cluster Overview Dashboard

The Cloud Cluster Overview dashboard provides enhanced functionality to analyze the Ocean Autoscaler actions with high granularity and gain deeper insights into its behavior. This topic delves into this dashboard's various components and sections, offering a detailed exploration of its capabilities.

Ocean continuously analyzes the utilization of your nodes in the cloud infrastructure. It automatically scales compute resources to optimize utilization and availability. It achieves this by intelligently combining Spot, Reserved, and On-Demand compute instances.

The Ocean Cloud Cluster Overview dashboard is divided into the following panels:

  • Ocean Savings panel: Shows the amount of money, CPU, memory, and GPU compute resources saved when you utilize Ocean to manage your Kubernetes cluster. Specifically, these are savings from running spot instances, bin packing, and reverting to lower-cost nodes.

  • Autoscaling Activity panel: The Ocean Autoscaler automatically performs scale-up, scale-down, and autohealing, and reverts nodes to lower-cost nodes, spot instances, and commitments based on resources or availability. This panel shows a summary of these autoscaling activities as widgets.

  • Interactive Autoscaling Activity graph: This graph displays autoscaling activity over time, providing insights into the Ocean infrastructure/applications and cluster activity.

Access the Ocean Cloud Cluster Overview

To access the Ocean Cluster Overview dashboard:

  1. In the left main menu, click Ocean, and click Cloud Clusters.

  2. Select a cluster from the list of clusters.

Ocean Savings Panel


The Ocean Savings panel contains a set of savings widgets (displayed as tabs), which show your savings according to Ocean’s main autoscaling processes for cluster optimization in a selected time range:

  • Running on Spot: Savings from running on spot nodes instead of on-demand (OD) nodes.

  • Reverting to Lower-Cost: Savings from replacing nodes whose underutilized compute resources cannot be scaled down from the cluster's set of nodes with lower-cost nodes, to optimize cluster utilization and reduce costs.

  • Right-sizing: Ocean savings from right-sizing your workloads.

  • Commitments: This process prioritizes available reserved instances and savings plan commitments over spot or full-priced on-demand instances.

In this panel:

  • You can select a time range for displayed data from the drop-down list to the right of the savings widgets. Available options are:

    • Last 30 Days
    • Last 7 Days
    • This Month
  • For each savings widget, click How is it calculated? to see an explanation of how the displayed savings are computed.

Ocean Savings from Running on Spot

The Ocean autoscaler intelligently leverages spot instances when appropriate, reducing costs while maintaining high availability. This tab lets you see the cost benefits of using spot instances in your cluster.

Ocean savings from running on spot are calculated as the difference between the price of on-demand and spot nodes within a specified time range. This calculation considers the number of CPUs, memory, and GPUs running as spot nodes.

To see these savings, click the Running on Spot tab, which shows:

  • Total estimated cluster savings = Total price if the running instances were on-demand – total price for running on spot nodes.

  • Pie chart: VNG (Virtual Node Group) percentage savings breakdown for nodes.

  • Nodes: If one or more nodes are running as spot instances, the number of nodes appears. Click to see a listing of the nodes running on spot, with their types and costs, in the selected time range.

    The nodes list shows:

    • Instance ID.
    • Instance type.
    • $ Spot hourly cost.
    • $ On-demand hourly cost.
    • Running hours on the spot node.
    • $ Savings for running on the spot node.
    • VNG (click the link to see the VNG details in the cluster overview VNG tab).
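The calculation described above can be sketched as follows. This is an illustrative sketch only, not Ocean's actual implementation; the prices and running hours are hypothetical example values.

```python
# Illustrative sketch of the spot-savings formula described above:
# savings = (on-demand hourly price - spot hourly price) * running hours,
# summed over all spot nodes. All values are hypothetical examples.

def spot_savings(nodes):
    """Estimated savings from running nodes on spot instead of on-demand."""
    return sum(
        (n["od_hourly"] - n["spot_hourly"]) * n["running_hours"]
        for n in nodes
    )

nodes = [
    {"od_hourly": 0.192, "spot_hourly": 0.058, "running_hours": 720},
    {"od_hourly": 0.096, "spot_hourly": 0.031, "running_hours": 360},
]
print(f"${spot_savings(nodes):.2f}")  # total estimated cluster savings
```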

Ocean Savings from Reverting to a Lower Cost Node

This applies to nodes with underutilized compute resources that cannot be scaled down from the cluster's set of nodes. For example, suppose a pod was initially placed on a more expensive node due to resource constraints. In that case, Ocean autoscaler can replace it with a less expensive node when it becomes available, consequently saving costs. This tab lets you see how much you save from this dynamic resource allocation.

To see these savings, click the Revert to Lower Cost tab, which shows:

  • Total estimated cluster savings = Running hours on the new instance x (hourly price for old instance - hourly price for new instance). For more information, see Revert to Lower Cost Node.

  • Pie chart: VNG (Virtual Node Group) percentage savings breakdown for nodes.

  • Resource savings: from reverting to lower cost:

    • CPU resources saved in vCPU Hours.
    • Memory resources saved in GiB.
    • GPU resources saved in GPU Hours.
  • Nodes: If one or more nodes were reverted to lower cost, the number of reverted nodes appears. Click to see a listing of the nodes reverted to lower cost, and their types and costs, in the selected time range.

    The nodes list shows an entry for each reverted node:

    • Instance ID: for the original and reverted nodes.
    • Instance Type: for the original and reverted nodes.
    • Lifecycle: for the original and reverted nodes.
    • $ Hourly cost: for the original and reverted nodes.
    • Running hours: on the new node.
    • $ Savings: from reverting to a lower cost node.
    • VNG (click the link to see the VNG details in the cluster overview VNG tab).
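The savings formula above can be sketched as a short calculation. This is an illustrative sketch with hypothetical prices, not Ocean's actual implementation.

```python
# Sketch of the revert-to-lower-cost savings formula shown above:
# savings = running hours on the new node
#           * (old hourly price - new hourly price).
# The prices and hours below are hypothetical examples.

def revert_savings(hours_on_new, old_hourly, new_hourly):
    """Estimated savings from replacing a node with a cheaper one."""
    return hours_on_new * (old_hourly - new_hourly)

# e.g. a node reverted from a $0.192/h instance to a $0.096/h instance,
# then running 100 hours on the new node:
print(f"${revert_savings(100, 0.192, 0.096):.2f}")
```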

Ocean Savings from Right-Sizing Your Workloads

note

This tab will only be available if the optional Ocean automatic right-sizing feature is installed and enabled in your deployment.

These are your actual savings from using the Ocean automatic right-sizing feature.

note

If no workloads have been optimized with automatic right-sizing, create and attach right-sizing rules to your workloads in the right-sizing Advanced Optimization tab.

To see these savings, click the Right-Sizing tab (unless already displayed), which shows:

  • Total estimated cluster savings from down-sizing recommendations in the selected time range = costs for workloads based on original requests – costs for workloads based on Ocean-optimized requests.

  • Resource savings: from automatic right-sizing:

    • CPU resources in vCPU Hours.
    • Memory resources in GiB.
    • GPU resources in GPU Hours.
  • Savings per namespace: Namespace percentage savings breakdown for workloads.

  • Workloads: If one or more workloads were optimized for right-sizing, the number of workloads appears. Click to see a listing of the optimized workloads, and their types and costs, in the selected time range.

    • Workload Name.
    • Namespace.
    • Workload Type (if the workload was deleted, it is marked as Deleted).
    • CPU:
      • With Ocean (green color): The sum of all the CPU requests that Ocean recommended and applied automatically.
      • Without Ocean: The sum of all CPU requests in your original workloads.
    • Memory:
      • With Ocean (green color): The sum of all the memory requests that Ocean recommended and applied automatically.
      • Without Ocean: The sum of all the memory requests in your original workloads.
    • Right-sizing savings ($) for the workload.

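The right-sizing savings formula above can be sketched as the cost difference between original and optimized requests. This is an illustrative sketch; the per-resource rates are hypothetical and are not Ocean's pricing.

```python
# Sketch of the right-sizing savings formula above: cost based on original
# requests minus cost based on Ocean-optimized requests. The per-vCPU-hour
# and per-GiB-hour rates below are hypothetical examples.

def workload_cost(vcpu, mem_gib, hours, vcpu_rate=0.021, gib_rate=0.0029):
    """Request cost accumulated over the time range."""
    return (vcpu * vcpu_rate + mem_gib * gib_rate) * hours

original = workload_cost(vcpu=4, mem_gib=16, hours=720)   # original requests
optimized = workload_cost(vcpu=2, mem_gib=8, hours=720)   # optimized requests
print(f"${original - optimized:.2f}")  # right-sizing savings
```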

For more information, see the automatic right-sizing documentation.

Ocean Savings from Commitments

These are savings achieved by utilizing reservations that you already purchased and applying them to workloads running in Ocean. The savings are in accordance with the Ocean commitment mechanism, which prioritizes available reserved instances and savings plan commitments over spot or full-priced on-demand instances.

To see these savings, click the Revert to Commitments tab (unless already displayed), which shows:

  • Total estimated cluster savings = Hourly price of the on-demand instance leveraging RI/SP x Total running hours of the on-demand instance that leverages RI/SP.

    • Pie chart: VNG (Virtual Node Group) percentage savings breakdown for nodes.
  • Nodes (instances that leveraged unused commitments): If one or more nodes were reverted using reserved capacity (RI) and savings plans (SP), the number of reverted nodes appears. Click to see a listing of the nodes reverted to commitments, and their types and costs, in the selected time range.

    • Instance ID
    • Instance Type
    • $ Hourly cost for the on-demand instance.
    • Running hours for the on-demand instance.
    • VNG (click on the link to see the VNG details in the cluster overview - VNG tab).
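The commitments savings formula above can be sketched as follows. This is an illustrative sketch with hypothetical values; it assumes the commitments are already paid for, so covering on-demand usage with them saves the full on-demand price, as in the formula above.

```python
# Sketch of the commitments savings formula above: for each on-demand
# instance covered by an already-purchased RI/SP, savings = on-demand
# hourly price * hours covered by the commitment. Values are hypothetical.

def commitment_savings(instances):
    """Estimated savings from covering on-demand usage with RIs/SPs."""
    return sum(i["od_hourly"] * i["covered_hours"] for i in instances)

instances = [
    {"od_hourly": 0.192, "covered_hours": 300},
    {"od_hourly": 0.096, "covered_hours": 100},
]
print(f"${commitment_savings(instances):.2f}")
```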

Autoscaling Activity Panel

The Autoscaling Activity panel contains the following widgets:

  • Scale Up widget.
  • Set of Continuous Optimization widgets.
  • Autoscaler graph.

Scale Up

This widget displays scaling-up events, which are instances when the Ocean Autoscaler increases the resources (CPU, memory, storage) in your cluster to accommodate higher workloads. A scale-up event is triggered when there are 'unschedulable' pods, i.e., pods that cannot be placed due to insufficient resources like CPU, memory, or GPU on the existing nodes. In response, Ocean Autoscaler identifies the optimal nodes to place these pods and increases the resources accordingly. This process ensures efficient resource allocation, improved performance, and shorter processing times.
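The "unschedulable" trigger described above can be sketched from a pod's PodScheduled condition, as reported by the Kubernetes API. The dictionary shapes below mirror the Kubernetes API object layout and are an assumption for illustration; this is not the Ocean API.

```python
# Sketch (assumed pod dict shape mirroring the Kubernetes API, not the
# Ocean API): a pod is "unschedulable" when its PodScheduled condition is
# False with reason Unschedulable.

def is_unschedulable(pod):
    for cond in pod.get("status", {}).get("conditions", []):
        if (cond.get("type") == "PodScheduled"
                and cond.get("status") == "False"
                and cond.get("reason") == "Unschedulable"):
            return True
    return False

pending = {"status": {"conditions": [
    {"type": "PodScheduled", "status": "False", "reason": "Unschedulable"}]}}
scheduled = {"status": {"conditions": [
    {"type": "PodScheduled", "status": "True"}]}}
print(is_unschedulable(pending), is_unschedulable(scheduled))
```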

Scaling up nodes also enables allocating more resources to specific tasks, improving performance and reducing processing time.

The Scale Up widget shows the number of scale-up events and scaled-up nodes in the selected time range. Tracking these events on the Autoscaling Activity panel allows you to monitor how effectively your cluster responds to workload demands.

For more information, see Scale Up.

Continuous Optimization

Ocean Autoscaler performs several automatic processes to optimize cluster resources.

Scale Down

Scaling down nodes refers to reducing the resources (such as CPU, memory, GPU) of individual nodes in a cluster.

Ocean autoscaler scales down nodes when the cluster is over-provisioned and does not utilize the full capacity of the nodes. In this case, scaling down right-sizes the resources to match the workload. In addition, if specific nodes in a cluster are consistently underutilized, Ocean autoscaler scales down to redistribute the resources to other nodes or consolidate the workload onto fewer nodes.

For more information, see Scale Down.

The Scale Down widget shows the number of scale-down events and the number of scaled-down nodes in the selected time range. By monitoring these events on the Autoscaling Activity panel, you can track how your cluster optimizes resources and reduces costs.

Revert to Spots

Suppose a node was launched as OD due to the unavailability of spot nodes in the market. In that case, Ocean continuously scans the market for an available spot node and reverts promptly upon finding one.

The Revert to Spots widget shows the number of events for which an OD node was reverted to a spot node in the selected time range.

By monitoring these events on the Autoscaling Activity panel, you can see when and how often your cluster takes advantage of cost-saving opportunities by utilizing Spot nodes.

Revert to Commitments

If a node was launched as spot due to the absence of available commitments (Reserved Instances or Savings Plans) to utilize, Ocean persistently scans the market for an available commitment and promptly reverts upon finding one.

The Revert to Commitments widget shows the number of events for which a spot node was reverted to a commitment in the selected time range.

By monitoring these events on the Autoscaling Activity panel, you can understand how your cluster is dynamically adapting to changes in the cloud market to maintain optimal operation and taking advantage of cost-saving opportunities by utilizing commitments.

For more information, see Reserved Instances.

Revert to Lower Cost

Ocean Autoscaler searches for nodes with underutilized compute resources that cannot be scaled down from the cluster's set of nodes and reverts these nodes to lower-cost nodes.

The Revert to Lower Cost widget shows the number of events for which a node was reverted to a lower-cost node in the selected time range. For more information, see Revert to Lower Cost Node.

Tracking these events on the Autoscaling Activity panel lets you see when and how often your cluster is optimizing for cost savings.

Dynamic Commitments

Ocean reverts nodes away from reserved instances and savings plans when those commitments can be applied to other workloads in your AWS account(s), increasing the overall coverage of your commitments.

Ocean continuously tracks commitments and initiates proactive replacements to increase the account’s commitment coverage and decrease excessive on-demand node usage. Ocean does this by reverting to a different allocation plan or potentially using spot instances based on risk configuration to provide ongoing optimal adjustments. For more information, see Dynamic Commitments.

Autohealing

Autohealing monitors the status of each Kubernetes node by examining the node's condition object, which provides information about various aspects of the node. Specifically, autohealing focuses on the Ready condition. If the Ready condition indicates a status of False or Unknown, the instance is deemed unhealthy, prompting the initiation of a replacement process.

The Autohealing widget shows the number of events for which AWS autohealing replaced an unhealthy instance with a healthy instance. By monitoring these autohealing events on the Autoscaling Activity panel, you can track how effectively Ocean autoscaler maintains your cluster's health and resilience to failures.
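The health check described above can be sketched from a node's Ready condition. The dictionary shapes below mirror the Kubernetes API object layout and are an assumption for illustration; this is not Ocean's actual implementation.

```python
# Sketch (assumed node dict shape mirroring the Kubernetes API): a node is
# unhealthy when its Ready condition reports False or Unknown.

def is_unhealthy(node):
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") in ("False", "Unknown")
    return True  # no Ready condition reported at all

healthy = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}
failed = {"status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}}
print(is_unhealthy(healthy), is_unhealthy(failed))
```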

Autoscaling Activity Graph

The Autoscaling Activity Graph provides intuitive insights into the interaction between the Ocean infrastructure and the applications it supports. It also provides cluster activity insights at a granular level so you can see why the Ocean autoscaler triggered a specific scale event within the cluster.

The graph displays a breakdown by lifecycle, which you can view by vCPU, Memory, or GPU, together with workload and headroom requests.

The upper graph displays the autoscaling activity to analyze. By default, it displays a full set of activity curves. Click an item on the legend above the graph to turn its associated curve on or off.

These are the curves:

  • Lifecycle Types:

    • Spot.
    • On-demand.
    • Reserved.
    • Savings Plans.
  • Workload Requests: includes running pod and pending pod requests.

  • Workloads with Headroom: includes running pods, running Headroom pods, pending pods, and pending Headroom pods.

note

The pending pods include all the pods in the Kubernetes cluster (pods that will be scheduled on Ocean nodes and those that won’t).

To set the zoom level for the graph:
  1. On the top-right of the screen, select to zoom by 1 hour / 12 hours / 7 days.

  2. Use the lower graph to zoom in or out of the selected period by dragging the sizing handles left or right.

    tip

    Set the period on the graph to 12 hours or less to view autoscaling events, which are displayed as dots on the graph.

To display the autoscaling activity at a specific point on the graph:
  1. Mouse over the upper graph to view the main details.
  2. Click Cluster State Details to view more information.

This is the information displayed:

  • Autoscaling events (displayed for a period of 12 hours or less): One or more autoscaling events are displayed as blue rectangles. These can be any of the Ocean autoscaling events displayed as widgets above the graph, for example, Scale Up, Scale Down, Dynamic Commitments, or Autohealing.

For vCPU/memory(GiB)/GPU:

  • Nodes Allocation (vCPU/GiB/GPU):
    • Total node allocation.
    • Split according to lifecycle types.
  • vCPU/GiB/GPU of Workload:
    • DaemonSets requested.
    • Pods Requested.
    • Headroom requested.
    • Total workload allocation percentage (includes only running pods, running headroom pods, and running DaemonSets pods that are scheduled on Ocean nodes).

Total Allocation Calculation:

  • Total allocation of vCPU/Memory/GPU = (running pods + running headroom + running DaemonSets requests) / total node allocation.
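The total allocation calculation can be sketched as a percentage, computed separately per resource. The request values below are hypothetical vCPU figures for illustration.

```python
# Sketch of the total allocation calculation above (hypothetical vCPU
# values): allocation % = (running pod requests + running headroom requests
# + running DaemonSets requests) / total node allocation.

def total_allocation_pct(pod_req, headroom_req, daemonset_req, node_alloc):
    """Percentage of node capacity consumed by running requests."""
    return 100.0 * (pod_req + headroom_req + daemonset_req) / node_alloc

print(f"{total_allocation_pct(10, 2, 4, 32):.1f}%")  # 16 of 32 vCPUs
```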
To display detailed information about a specific autoscaling event
  • Click on an event displayed in a blue rectangle, for example, for a Scale Down event.

    note

    The information shown in the box depends on the type of event.

    The above example shows the pre- and post-scale resource utilization for the scale-down action. Ocean simulated two running pods that could be moved to other nodes for better node utilization.

    Each pod was rescheduled on a different target node. All the terminated instances were spots.
