Kiali can be run directly on your machine without being installed into a Kubernetes cluster. It uses your kubeconfig to connect to your cluster(s). If needed, it can port-forward into the cluster to connect to your external services (prometheus, tracing, istio, grafana).

Running Kiali locally is currently experimental. Functionality may change between releases.

Download the Kiali binary from the Kiali GitHub releases page for your OS and Arch.

Start Kiali which runs the backend server on localhost and opens your default browser to the Kiali UI.

kiali run

To see the full list of options

kiali run --help

If the cluster name in your kubeconfig does not match the cluster name in Istio you can override this with --cluster-name-overrides kubeconfig-name=istio-cluster-name. The flag is a comma separated list so you can override as many names as you need.

Install Kiali

You can quickly install Kiali into your cluster via one of the following two methods.

These instructions are not recommended for production environments. Find more detailed information on installing Kiali, see the installation guide.

Before you install Kiali you must have already installed Istio along with its telemetry storage addon (i.e. Prometheus). You might also consider installing Istio’s optional tracing addon (i.e. Jaeger) and optional Grafana addon but those are not required by Kiali. Refer to the Istio documentation for details.

Install via Istio Addons

If you downloaded Istio, the easiest way to install and try Kiali is by running:

kubectl apply -f ${ISTIO_HOME}/samples/addons/kiali.yaml

To uninstall:

kubectl delete -f ${ISTIO_HOME}/samples/addons/kiali.yaml --ignore-not-found

Install via Helm

Only Helm v3 has been tested. Previous Helm versions may or may not work.

To install the latest version of Kiali Server using Helm, run the following command:

helm install \
  --namespace istio-system \
  --set auth.strategy="anonymous" \
  --repo https://kiali.org/helm-charts \
  kiali-server \
  kiali-server

If you get a validation error, you may have to pass the option --disable-openapi-validation (this is needed on some versions of OpenShift, for example).

To uninstall:

helm uninstall --namespace istio-system kiali-server

Access to the UI

Run the following command:

kubectl port-forward svc/kiali 20001:20001 -n istio-system

Then, access Kiali by visiting https://localhost:20001/ in your preferred web browser.

1.2 - Installation Guide

Installing Kiali for production.

This section describes the production installation methods available for Kiali.

The recommended way to deploy Kiali is via the Kiali Operator, either using Helm Charts or OperatorHub.

The Kiali Operator is a Kubernetes Operator and manages your Kiali installation. It watches the Kiali Custom Resource (Kiali CR), a YAML file that holds the deployment configuration.

It is only necessary to install the Kiali Operator once. After the operator is installed you only need to create or edit the Kiali CR. Never manually edit resources created by the Kiali Operator.

If you previously installed Kiali via a different mechanism, you must first uninstall Kiali using the original mechanism’s uninstall procedures. There is no migration path between older installation mechanisms and the install mechanisms explained in this documentation.

1.2.1 - Prerequisites

Hardware and Software compatibility and requirements.

Istio

Before you install Kiali you must have already installed Istio along with its telemetry storage addon (e.g. Prometheus). You might also consider installing Istio’s optional tracing addon (e.g. Tempo) and optional Grafana addon but those are not required by Kiali. Refer to the Istio documentation for details.

Optionally Enable the Debug Interface

Like istioctl, Kiali can make use of Istio’s port 8080 “Debug Interface” API. Despite the naming, this is required for accessing the status of the proxies.

The ENABLE_DEBUG_ON_HTTP setting controls the relevant API access. Istio suggests to disable this for security, but Kiali requires ENABLE_DEBUG_ON_HTTP=true, which is the default.

If you prefer not to enable the Istio API then certain Kiali features will be unavailable. If disabled, set spec.external_services.istio.istio_api_enabled: false in the Kiali CR.

For more information, see the Istio documentation.

Version Compatibility

It is always recommended that users run a supported version of Istio. The Istio news page posts end-of-support (EOL) dates. Supported Kiali versions include only the Kiali versions associated with supported Istio versions.

Starting with Kiali v2.4, each Kiali release is tested against the currently supported Istio releases. Unless otherwise noted, a Kiali release will be compatible with those releases. Older, untested Istio versions may also be compatible. Known incompatibilities will be noted in the table below. Prior to Kiali v2.4, compatibility is guaranteed only against the latest Istio release at the time. Although compatibility may be fine with other versions.

Istio	Tested Kiali Versions	Notes
1.28	2.17 and higher
1.27	2.12 and higher
1.26	2.9 and higher
1.25	2.5-2.16	Istio 1.25 is out of support.
1.24	2.0-2.13	Istio 1.24 is out of support.
1.23	1.87, 2.4-2.8	Istio 1.23 is out of support. Kiali v2 requires migration from Kiali v1 non-default namespace management (i.e. accessible_namespaces) to Discovery Selectors.
1.22	1.87, 2.4-2.5	Istio 1.22 is out of support. Kiali v1.86 is the recommended minimum for Istio Ambient users. Starting with Kiali v1.86,.1 Istio v1.22 is required.
1.21	1.81	Istio 1.21 is out of support.
1.20	1.78	Istio 1.20 is out of support.
1.19	1.75	Istio 1.19 is out of support.
1.18	1.73	Istio 1.18 is out of support.
1.17	1.66	Istio 1.17 is out of support. Avoid 1.63.0,1.63.1 due to a regression.
1.16	1.63	Istio 1.16 is out of support. Avoid 1.62.0,1.63.0,1.63.1 due to a regression.
1.15	1.59	Istio 1.15 is out of support.
1.14	1.54	Istio 1.14 is out of support.
1.13	1.49	Istio 1.13 is out of support.
1.12	1.44	Istio 1.12 is out of support.
1.11	1.41	Istio 1.11 is out of support.
1.10	1.37	Istio 1.10 is out of support.
1.9	1.33	Istio 1.9 is out of support.
1.8	1.28	Istio 1.8 is out of support. It removes all support for mixer/telemetry V1, as does Kiali 1.26.0. Use earlier versions of Kiali for mixer.
1.7	1.25	Istio 1.7 is out of support. Istioctl no longer installs Kiali. Use the Istio samples/addons for quick demo installs.
1.6	1.21	Istio 1.6 is out of support. Kiali 1.17 is recommended for Istio < 1.6.

OpenShift Service Mesh Version Compatibility

OpenShift

If you are running Red Hat OpenShift Service Mesh (OSSM), use only the bundled, supported version of Kiali.

OSSM	Kiali	Notes
3.2	2.17
3.1	2.11
3.0	2.4
2.6	1.73
2.5	1.73	OSSM 2.5 is out of support
2.4	1.65	OSSM 2.4 is out of support
2.3	1.57	OSSM 2.3 is out of support
2.2	1.48	OSSM 2.2 is out of support

OpenShift Console Plugin (OSSMC) Version Compatibility

Kiali server with the same version of OSSMC plugin must be installed previously in your OpenShift cluster.

OpenShift	OSSMC Min	OSSMC Max	Notes
4.19+	2.20
4.15+	1.84	2.19
4.12-4.18	1.73	1.83	All OSSMC versions from v1.73 to v1.83 are only compatible with Kiali server v1.73

Maistra Version Compatibility

Maistra	SMCP CR	Kiali	Notes
2.6	2.6	1.73	Using Maistra 2.6 to install service mesh control plane 2.6 requires Kiali Operator v1.73. Other versions are not compatible.
2.6	2.5	1.73	Using Maistra 2.6 to install service mesh control plane 2.5 requires Kiali Operator v1.73. Other versions are not compatible.
2.6	2.4	1.65	Using Maistra 2.6 to install service mesh control plane 2.4 requires Kiali Operator v1.73. Other versions are not compatible.
2.5	2.5	1.73	Using Maistra 2.5 to install service mesh control plane 2.5 requires Kiali Operator v1.73. Other versions are not compatible.
2.5	2.4	1.65	Using Maistra 2.5 to install service mesh control plane 2.4 requires Kiali Operator v1.73. Other versions are not compatible.
2.4	2.4	1.65	Using Maistra 2.4 to install service mesh control plane 2.4 requires Kiali Operator v1.65. Other versions are not compatible.
n/a	2.3	n/a	Service mesh control plane 2.3 is out of support.
n/a	2.2	n/a	Service mesh control plane 2.2 is out of support.
n/a	2.1	n/a	Service mesh control plane 2.1 is out of support.
n/a	2.0	n/a	Service mesh control plane 2.0 is out of support.
n/a	1.1	n/a	Service mesh control plane 1.1 is out of support.
n/a	1.0	n/a	Service mesh control plane 1.0 is out of support.

Browser Compatibility

Kiali requires a modern web browser and supports the last two versions of Chrome, Firefox, Safari or Edge.

Hardware Requirements

Any machine capable of running a Kubernetes based cluster should also be able to run Kiali.

However, Kiali tends to grow in resource usage as your cluster grows. Usually the more namespaces and workloads you have in your cluster, the more memory you will need to allocate to Kiali.

Platform-specific requirements

OpenShift

If you are installing on OpenShift, you must grant the cluster-admin role to the user that is installing Kiali. If OpenShift is installed locally on the machine you are using, the following command should log you in as user system:admin which has this cluster-admin role:

$ oc login -u system:admin

For most commands listed on this documentation, the Kubernetes CLI command kubectl is used to interact with the cluster environment. On OpenShift you can simply replace kubectl with oc, unless otherwise noted.

Google Cloud Private Cluster

Private clusters on Google Cloud have network restrictions. Kiali needs your cluster’s firewall to allow access from the Kubernetes API to the Istio Control Plane namespace, for both the 8080 and 15000 ports.

To review the master access firewall rule:

gcloud compute firewall-rules list --filter="name~gke-${CLUSTER_NAME}-[0-9a-z]*-master"

To replace the existing rule and allow master access:

gcloud compute firewall-rules update <firewall-rule-name> --allow <previous-ports>,tcp:8080,tcp:15000

Istio deployments on private clusters also need extra ports to be opened. Check the Istio installation page for GKE to see all the extra installation steps for this platform.

1.2.2 - Install via Helm

Using Helm to install the Kiali Operator or Server.

Introduction

Helm is a popular tool that lets you manage Kubernetes applications. Applications are defined in a package named Helm chart, which contains all of the resources needed to run an application.

Kiali has a Helm Charts Repository at https://kiali.org/helm-charts. Two Helm Charts are provided:

The kiali-operator Helm Chart installs the Kiali operator which in turn installs Kiali when you create a Kiali CR.
The kiali-server Helm Chart installs a standalone Kiali without the need of the Operator nor a Kiali CR.

The kiali-server Helm Chart does not provide all the functionality that the Kiali Operator provides. Some features you read about in the documentation may only be available if you install the Kiali Server using the Kiali Operator (see this FAQ for details). Therefore, although the kiali-server Helm Chart is actively maintained, it is not recommended and is only provided for convenience. If using Helm, the recommended method is to install the kiali-operator Helm Chart and then create a Kiali CR to let the Operator deploy Kiali.

Make sure you have the helm command available by following the Helm installation docs.

Helm version 3.10 is the minimum required Helm version. Older versions will not work. Newer versions have not been tested.

Adding the Kiali Helm Charts repository

Add the Kiali Helm Charts repository with the following command:

$ helm repo add kiali https://kiali.org/helm-charts

All helm commands in this page assume that you added the Kiali Helm Charts repository as shown.

If you already added the repository, you may want to update your local cache to fetch latest definitions by running:

$ helm repo update

Installing Kiali using the Kiali operator

This installation method gives Kiali access to existing namespaces as well as namespaces created later. See Namespace Management for more information.

Once you’ve added the Kiali Helm Charts repository, you can install the latest Kiali Operator along with the latest Kiali server by running the following command:

$ helm install \
    --set cr.create=true \
    --set cr.namespace=istio-system \
    --set cr.spec.auth.strategy="anonymous" \
    --namespace kiali-operator \
    --create-namespace \
    kiali-operator \
    kiali/kiali-operator

The --namespace kiali-operator and --create-namespace flags instructs to create the kiali-operator namespace (if needed), and deploy the Kiali operator on it. The --set cr.create=true and --set cr.namespace=istio-system flags instructs to create a Kiali CR in the istio-system namespace. Since the Kiali CR is created in advance, as soon as the Kiali operator starts, it will process it to deploy Kiali. After Kiali has started, you can access Kiali UI through ‘http://localhost:20001’ by executing kubectl port-forward service/kiali -n istio-system 20001:20001 because of --set cr.spec.auth.strategy="anonymous". But realize that anonymous mode will allow anyone to be able to see and use Kiali. If you wish to require users to authenticate themselves by logging into Kiali, use one of the other auth strategies.

The Kiali Operator Helm Chart is configurable. Check available options and default values by running:

$ helm show values kiali/kiali-operator

You can pass the --version X.Y.Z flag to the helm install and helm show values commands to work with a specific version of Kiali.

The kiali-operator Helm Chart mirrors all settings of the Kiali CR as chart values that you can configure using regular --set flags. For example, the Kiali CR has a spec.server.web_root setting which you can configure in the kiali-operator Helm Chart by passing --set cr.spec.server.web_root=/your-path to the helm install command.

For more information about the Kiali CR, see the Creating and updating the Kiali CR page.

Operator-Only Install

To install only the Kiali Operator, omit the --set cr.create and --set cr.namespace flags of the helm command previously shown. For example:

$ helm install \
    --namespace kiali-operator \
    --create-namespace \
    kiali-operator \
    kiali/kiali-operator

This will omit creation of the Kiali CR, which you will need to create later to install Kiali Server. This option is good if you plan to do large customizations to the installation.

Installing Multiple Instances of Kiali

By installing a single Kiali operator in your cluster, you can install multiple instances of Kiali by simply creating multiple Kiali CRs. For example, if you have two Istio control planes in namespaces istio-system and istio-system2, you can create a Kiali CR in each of those namespaces to install a Kiali instance in each control plane.

If you wish to install multiple Kiali instances in the same namespace, or if you need the Kiali instance to have different resource names than the default of kiali, you can specify spec.deployment.instance_name in your Kiali CR. The value for that setting will be used to create a unique instance of Kiali using that instance name rather than the default kiali. One use-case for this is to be able to have unique Kiali service names across multiple Kiali instances in order to be able to use certain routers/load balancers that require unique service names.

Since the spec.deployment.instance_name field is used for the Kiali resource names, including the Service name, you must ensure the value you assign this setting follows the Kubernetes DNS Label Name rules. If it does not, the operator will abort the installation. And note that because Kiali uses this as a prefix (it may append additional characters for some resource names) its length is limited to 40 characters.

Standalone Kiali installation

To install the Kiali Server without the operator, use the kiali-server Helm Chart:

$ helm install \
    --namespace istio-system \
    kiali-server \
    kiali/kiali-server

The kiali-server Helm Chart mirrors all settings of the Kiali CR as chart values that you can configure using regular --set flags. For example, the Kiali CR has a spec.server.web_fqdn setting which you can configure in the kiali-server Helm Chart by passing the --set server.web_fqdn flag as follows:

$ helm install \
    --namespace istio-system \
    --set server.web_fqdn=example.com \
    kiali-server \
    kiali/kiali-server

Upgrading Helm installations

If you want to upgrade to a newer Kiali version (or downgrade to older versions), you can use the regular helm upgrade commands. For example, the following command should upgrade the Kiali Operator to the latest version:

$ helm upgrade \
    --namespace kiali-operator \
    --reuse-values \
    kiali-operator \
    kiali/kiali-operator

WARNING: No migration paths are provided. However, Kiali is a stateless application and if the helm upgrade command fails, please uninstall the previous version and then install the new desired version.

By upgrading the Kiali Operator, existent Kiali Server installations managed with a Kiali CR will also be upgraded once the updated operator starts.

Managing configuration of Helm installations

After installing either the kiali-operator or the kiali-server Helm Charts, you may be tempted to manually modify the created resources to modify the installation. However, we recommend using helm upgrade to update your installation.

For example, assuming you have the following installation:

$ helm list -n kiali-operator
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
kiali-operator  kiali-operator  1               2021-09-14 18:00:45.320351026 -0500 CDT deployed        kiali-operator-1.40.0   v1.40.0

Notice that the current installation is version 1.40.0 of the kiali-operator. Let’s assume you want to use your own mirrors of the Kiali Operator container images. You can update your installation with the following command:

$ helm upgrade \
    --namespace kiali-operator \
    --reuse-values \
    --set image.repo=your_mirror_registry_url/owner/kiali-operator-repo \
    --set image.tag=your_mirror_tag \
    --version 1.40.0 \
    kiali-operator \
    kiali/kiali-operator

Make sure that you specify the --reuse-values flag to take the configuration of your current installation. Then, you only need to specify the new settings you want to change using --set flags.

Make sure that you specify the --version X.Y.Z flag with the version of your current installation. Otherwise, you may end up upgrading to a new version.

Uninstalling

Removing the Kiali operator and managed Kialis

If you used the kiali-operator Helm chart, first you must ensure that all Kiali CRs are deleted. For example, the following command will agressively delete all Kiali CRs in your cluster:

$ kubectl delete kiali --all --all-namespaces

The previous command may take some time to finish while the Kiali operator removes all Kiali installations.

Then, remove the Kiali operator using a standard helm uninstall command. For example:

$ helm uninstall --namespace kiali-operator kiali-operator
$ kubectl delete crd kialis.kiali.io

You have to manually delete the kialis.kiali.io CRD because Helm won’t delete it.

If you fail to delete the Kiali CRs before uninstalling the operator, a proper cleanup may not be done.

Known problem: uninstall hangs (unable to delete the Kiali CR)

Typically this happens if not all Kiali CRs are deleted prior to uninstalling the operator. To force deletion of a Kiali CR, you need to clear its finalizer. For example:

$ kubectl patch kiali kiali -n istio-system -p '{"metadata":{"finalizers": []}}' --type=merge

This forces deletion of the Kiali CR and will skip uninstallation of the Kiali Server. Remnants of the Kiali Server may still exist in your cluster which you will need to manually remove.

Removing standalone Kiali

If you installed a standalone Kiali by using the kiali-server Helm chart, use the standard helm uninstall commands. For example:

$ helm uninstall --namespace istio-system kiali-server

1.2.3 - Install via OperatorHub

Using OperatorHub to install the Kiali Operator.

Introduction

The OperatorHub is a website that contains a catalog of Kubernetes Operators. Its aim is to be the central location to find Operators.

The OperatorHub relies in the Operator Lifecycle Manager (OLM) to install, manage and update Operators on any Kubernetes cluster.

The Kiali Operator is being published to the OperatorHub. So, you can use the OLM to install and manage the Kiali Operator installation.

Installing the Kiali Operator using the OLM

Go to the Kiali Operator page in the OperatorHub: https://operatorhub.io/operator/kiali.

You will see an Install button at the right of the page. Press it and you will be presented with the installation instructions. Follow these instructions to install and manage the Kiali Operator installation using OLM.

Afterwards, you can create the Kiali CR to install Kiali.

Installing the Kiali Operator in OpenShift

The OperatorHub is bundled in the OpenShift console. To install the Kiali Operator, simply go to the OperatorHub in the OpenShift console and search for the Kiali Operator. Then, click on the Install button and follow the instruction on the screen.

Afterwards, you can create the Kiali CR to install Kiali.

1.2.4 - The Kiali CR

Creating and updating the Kiali CR.

The Kiali Operator watches the Kiali Custom Resource (Kiali CR), a custom resource that contains the Kiali Server deployment configuration. Creating, updating, or removing a Kiali CR will trigger the Kiali Operator to install, update, or remove Kiali.

If you want the operator to re-process the Kiali CR (called “reconciliation”) without having to change the Kiali CR’s spec fields, you can modify any annotation on the Kiali CR itself. This will trigger the operator to reconcile the current state of the cluster with the desired state defined in the Kiali CR, modifying cluster resources if necessary to get them into their desired state. Here is an example illustrating how you can modify an annotation on a Kiali CR:

$ kubectl annotate kiali my-kiali -n istio-system --overwrite kiali.io/reconcile="$(date)"

The Operator provides comprehensive defaults for all properties of the Kiali CR. Hence, the minimal Kiali CR does not have a spec:

apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali

Assuming you saved the previous YAML to a file named my-kiali-cr.yaml, and that you are installing Kiali in the same default namespace as Istio, create the resource with the following command:

$ kubectl apply -f my-kiali-cr.yaml -n istio-system

Often, but not always, Kiali is installed in the same namespace as Istio, thus the Kiali CR is also created in the Istio namespace.

Once created, the Kiali Operator should shortly be notified and will process the resource, performing the Kiali installation. You can wait for the Kiali Operator to finish the reconcilation by using the standard kubectl wait command and ask for it to wait for the Kiali CR to achieve the condition of Successful. For example:

kubectl wait --for=condition=Successful kiali kiali -n istio-system

You can check the installation progress by inspecting the status attribute of the created Kiali CR:

$ kubectl describe kiali kiali -n istio-system
Name:         kiali
Namespace:    istio-system
Labels:       <none>
Annotations:  <none>
API Version:  kiali.io/v1alpha1
Kind:         Kiali

  (...some output is removed...)

Status:
  Conditions:
    Last Transition Time:  2021-09-15T17:17:40Z
    Message:               Running reconciliation
    Reason:                Running
    Status:                True
    Type:                  Running
  Deployment:
    Instance Name:  kiali
    Namespace:      istio-system
  Environment:
    Is Kubernetes:       true
    Kubernetes Version:  1.27.3
    Operator Version:    v1.89.0
  Progress:
    Duration:    0:00:16
    Message:     5. Creating core resources
  Spec Version:  default
Events:        <none>

Never manually edit resources created by the Kiali Operator; only edit the Kiali CR.

You may want to check the example install page to see some examples where the Kiali CR has a spec and to better understand its structure. Most available attributes of the Kiali CR are described in the pages of the Installation and Configuration sections of the documentation. For a complete list, see the Kiali CR Reference.

It is important to understand the spec.deployment.cluster_wide_access setting in the CR. See the Namespace Management page for more information.

Once you created a Kiali CR, you can manage your Kiali installation by editing the resource using the usual Kubernetes tools:

$ kubectl edit kiali kiali -n istio-system

To confirm your Kiali CR is valid, you can utilize the Kiali CR validation tool.

1.2.5 - The OSSMConsole CR

Creating and updating the OSSMConsole CR.

OpenShift ServiceMesh Console (aka OSSMC) provides a Kiali integration with the OpenShift Console; in other words it provides Kiali functionality within the context of the OpenShift Console. OSSMC is applicable only within OpenShift environments.

The main component of OSSMC is a plugin that gets installed inside the OpenShift Console. Prior to installing this plugin, you are required to have already installed the Kiali Operator and Kiali Server in your OpenShift environment. Please the Installation Guide for details.

There are no helm charts available to install OSSMC. You must utilize the Kiali Operator to install it. Installing the Kiali Operator on OpenShift is very easy due to the Operator Lifecycle Manager (OLM) functionality that comes with OpenShift out-of-box. Simply elect to install the Kiali Operator from the Red Hat or Community Catalog from the OperatorHub page in OpenShift Console.

The Kiali Operator watches the OSSMConsole Custom Resource (OSSMConsole CR), a custom resource that contains the OSSMC deployment configuration. Creating, updating, or removing a OSSMConsole CR will trigger the Kiali Operator to install, update, or remove OSSMC.

Never manually edit resources created by the Kiali Operator, only edit the OSSMConsole CR.

Creating the OSSMConsole CR to Install the OSSMC Plugin

With the Kiali Operator and Kial Server installed and running, you can install the OSSMC plugin in one of two ways - either via the OpenShift Console or via the “oc” CLI. Both methods are described below. You choose the method you want to use.

You should specify the spec.version field of the OSSMConsole CR, and its value must be the same version as that of the Kiali Server (i.e. it must match the spec.version of the Kiali Server’s Kiali CR). Normally, you can just set spec.version to default which tells the Kiali Operator to install OSSMC whose version is the same as that of the operator itself. Alternatively, you may specify one of the supported versions in the format vX.Y.

Installing via OpenShift Console

From the Kiali Operator details page in the OpenShift Console, create an instance of the “OpenShift Service Mesh Console” resource. Accept the defaults on the installation form and press “Create”.

Install Plugin

Installing via “oc” CLI

To instruct the Kiali Operator to install the plugin, simply create a small OSSMConsole CR. A minimal CR can be created like this:

cat <<EOM | oc apply -f -
apiVersion: kiali.io/v1alpha1
kind: OSSMConsole
metadata:
  namespace: openshift-operators
  name: ossmconsole
spec:
  version: default
EOM

Note that the operator will deploy the plugin resources in the same namespace where you create this OSSMConsole CR - in this case openshift-operators but you can create the CR in any namespace.

For a complete list of configuration options available within the OSSMConsole CR, see the OSSMConsole CR Reference.

To confirm your OSSMConsole CR is valid, you can utilize the OSSMConsole CR validation tool.

Installation Status

After the plugin is installed, you can see the “OSSMConsole” resource that was created in the OpenShift Console UI. Within the operator details page in the OpenShift Console UI, select the OpenShift Service Mesh Console tab to view the resource that was created and its status. The CR status field will provide you with any error messages should the deployment of OSSMC fail.

Installed Plugin

Once the operator has finished processing the OSSMConsole CR, you must then wait for the OpenShift Console to load and initialize the plugin. This may take a minute or two. You will know when the plugin is ready when the OpenShift Console pops up this message - when you see this message, refresh the browser window to reload the OpenShift Console:

Plugin Ready

Uninstalling OSSMC

This section will describe how to uninstall the OpenShift Service Mesh Console plugin. You can uninstall the plugin in one of two ways - either via the OpenShift Console or via the “oc” CLI. Both methods are described in the sections below. You choose the method you want to use.

If you intend to also uninstall the Kiali Operator, it is very important to first uninstall the OSSMConsole CR and then uninstall the operator. If you uninstall the operator before ensuring the OSSMConsole CR is deleted then you may have difficulty removing that CR and its namespace. If this occurs then you must manually remove the finalizer on the CR in order to delete it and its namespace. You can do this via: oc patch ossmconsoles <CR name> -n <CR namespace> -p '{"metadata":{"finalizers": []}}' --type=merge

Uninstalling via OpenShift Console

Remove the OSSMConsole CR by navigating to the operator details page in the OpenShift Console UI. From the operator details page, select the OpenShift Service Mesh Console tab and then select the Delete option in the kebab menu.

Uninstall Plugin

Uninstalling via “oc” CLI

Remove the OSSMConsole CR via oc delete ossmconsoles <CR name> -n <CR namespace>. To make sure any and all CRs are deleted from any and all namespaces, you can run this command:

for r in $(oc get ossmconsoles --ignore-not-found=true --all-namespaces -o custom-columns=NS:.metadata.namespace,N:.metadata.name --no-headers | sed 's/  */:/g'); do oc delete ossmconsoles -n $(echo $r|cut -d: -f1) $(echo $r|cut -d: -f2); done

1.2.6 - Accessing Kiali

Accessing and exposing the Kiali UI.

Introduction

After Kiali is succesfully installed you will need to make Kiali accessible to users. This page describes some popular methods of exposing Kiali for use.

If exposing Kiali in a custom way, you may need to set some configurations to make Kiali aware of how users will access Kiali.

The examples on this page assume that you followed the Installation guide to install Kiali, and that you installed Kiali in the istio-system namespace.

Accessing Kiali using port forwarding

This method should work in any kind of Kubernetes cluster.

You can use port-forwarding to access Kiali by running any of these commands:

# If you have oc command line tool
oc port-forward svc/kiali 20001:20001 -n istio-system
# If you have kubectl command line tool
kubectl port-forward svc/kiali 20001:20001 -n istio-system

These commands will block. Access Kiali by visiting https://localhost:20001/ in your preferred web browser.

Please note that this method exposes Kiali only to the local machine, no external users. You must have the necessary privileges to perform port forwarding.

Accessing Kiali through an Ingress

You can configure Kiali to be installed with an Ingress resource defined, allowing you to access the Kiali UI through the Ingress. By default, an Ingress will not be created. You can enable a simple Ingress by setting spec.deployment.ingress.enabled to true in the Kiali CR (a similar setting for the server Helm chart is available if you elect to install Kiali via Helm as opposed to the Kiali Operator).

Exposing Kiali externally through this spec.deployment.ingress mechanism is a convenient way of exposing Kiali externally but it will not necessarily work or be the best way to do it because the way in which you should expose Kiali externally will be highly dependent on your specific cluster environment and how services are exposed generally for that environment.

When installing on an OpenShift cluster, an OpenShift Route will be installed (not an Ingress). This Route will be installed by default unless you explicitly disable it via spec.deployment.ingress.enabled: false. Note that the Route is required if you configure Kiali to use the auth strategy of openshift (which is the default auth strategy Kiali will use when installed on OpenShift).

The default Ingress that is created will be configured for a typical NGinx implementation. If you have your own Ingress implementation you want to use, you can override the default configuration through the settings spec.deployment.ingress.override_yaml and spec.deployment.ingress.class_name. More details on customizing the Ingress can be found below.

The Ingress IP or domain name should then be used to access the Kiali UI. To find your Ingress IP or domain name, as per the minikube documentation, try the following command (though this may not work if using Minikube without the ingress addon):

kubectl get ingress kiali -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

If it doesn’t work, unfortunately, it depends on how and where you had setup your cluster. There are several Ingress controllers available and some cloud providers have their own controller or preferred exposure method. Check the documentation of your cloud provider. You may need to customize the pre-installed Ingress rule or expose Kiali using a different method.

Customizing the Ingress resource

The created Ingress resource will route traffic to Kiali regardless of the domain in the URL. You may need a more specific Ingress resource that routes traffic to Kiali only on a specific domain or path. To do this, you can specify route settings.

Alternatively, and for more advanced Ingress configurations, you can provide your own Ingress declaration in the Kiali CR. For example:

When installing on an OpenShift cluster, the deployment.ingress.override_yaml will be applied to the created Route. The deployment.ingress.class_name is ignored on OpenShift.

spec:
  deployment:
    ingress:
      class_name: "nginx"
      enabled: true
      override_yaml:
        metadata:
          annotations:
            nginx.ingress.kubernetes.io/secure-backends: "true"
            nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
        spec:
          rules:
            - http:
                paths:
                  - path: /kiali
                    backend:
                      serviceName: kiali
                      servicePort: 20001

Accessing Kiali in Minikube

If you enabled the Ingress addon, the default Ingress resource created by the installation (mentioned in the previous section) should be enough to access Kiali. The following command should open Kiali in your default web browser:

xdg-open https://$(minikube ip)/kiali

Accessing Kiali through a LoadBalancer or a NodePort

By default, the Kiali service is created with the ClusterIP type. To use a LoadBalancer or a NodePort, you can change the service type in the Kiali CR as follows:

spec:
  deployment:
    service_type: LoadBalancer

Once the Kiali operator updates the installation, you should be able to use the kubectl get svc -n istio-system kiali command to retrieve the external address (or port) to access Kiali. For example, in the following output Kiali is assigned the IP 192.168.49.201, which means that you can access Kiali by visiting http://192.168.49.201:20001 in a browser:

NAME    TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                          AGE
kiali   LoadBalancer   10.105.236.127   192.168.49.201   20001:31966/TCP,9090:30128/TCP   34d

If you are using the LoadBalancer service type to directly expose the Kiali service, you may want to check the available options for the HTTP Server and Metrics server.

Accessing Kiali through an Istio Ingress Gateway

If you want to take advantage of Istio’s infrastructure, you can expose Kiali using an Istio Ingress Gateway. The Istio documentation provides a good guide explaining how to expose the sample add-ons. Even if the Istio guide is focused on the sample add-ons, the steps are the same to expose a Kiali installed using this Installation guide.

Accessing Kiali in OpenShift

By default, Kiali is exposed through a Route if installed on OpenShift. The following command should open Kiali in your default web browser:

xdg-open https://$(oc get routes -n istio-system kiali -o jsonpath='{.spec.host}')/console

Specifying route settings

If you are using your own exposure method or if you are using one of the methods mentioned in this page, you may need to configure the route that is being used to access Kiali.

In the Kiali CR, route settings are broken in several attributes. For example, to specify that Kiali is being accessed under the https://apps.example.com:8080/dashboards/kiali URI, you would need to set the following:

spec:
  server:
    web_fqdn: apps.example.com
    web_port: 8080
    web_root: /dashboards/kiali
    web_schema: https

If you are letting the installation create an Ingress resource for you, the Ingress will be adjusted to match these route settings. If you are using your own exposure method, these spec.server settings are only making Kiali aware of what its public endpoint is.

It is possible to omit these settings and Kiali may be able to discover some of these configurations, depending on your exposure method. For example, if you are exposing Kiali via LoadBalancer or NodePort service types, Kiali can discover most of these settings. If you are using some kind of Ingress, Kiali will honor X-Forwarded-Proto, X-Forwarded-Host and X-Forwarded-Port HTTP headers if they are properly injected in the request.

The web_root receives special treatment, because this is the path where Kiali will serve itself (both the user interface and its api). This is useful if you are serving multiple applications under the same domain. It must begin with a slash and trailing slashes must be omitted. The default value is /kiali for Kubernetes and / for OpenShift.

Usually, these settings can be omitted. However, a few features require that the Kiali’s public route be properly discoverable or that it is properly configured; most notably, the OpenID authentication.

1.2.7 - Advanced Install

Advanced installation options.

Canary upgrades

During a canary upgrade where multiple controlplanes are present, Kiali will automatically detect both controlplanes. You can visit the mesh page to visualize your controlplanes during a canary upgrade.

Installing a Kiali Server of a different version than the Operator

When you install the Kiali Operator, it will be configured to install a Kiali Server that is the same version as the operator itself. For example, if you have Kiali Operator v1.34.0 installed, that operator will install Kiali Server v1.34.0. If you upgrade (or downgrade) the Kiali Operator, the operator will in turn upgrade (or downgrade) the Kiali Server.

There are certain use-cases in which you want the Kiali Operator to install a Kiali Server whose version is different than the operator version. Read the following section «Using a custom image registry» section to learn how to configure this setup.

Using a custom image registry

Kiali is released and published to the Quay.io container image registry. There is a repository hosting the Kiali operator images and another one for the Kiali server images.

If you need to mirror the Kiali container images to some other registry, you still can use Helm to install the Kiali operator as follows:

$ helm install \
    --namespace kiali-operator \
    --create-namespace \
    --set image.repo=your.custom.registry/owner/kiali-operator-repo
    --set image.tag=your_custom_tag
    --set allowAdHocKialiImage=true
    kiali-operator \
    kiali/kiali-operator

Notice the --set allowAdHocKialiImage=true which allows specifying a custom image in the Kiali CR. For security reasons, this is disabled by default.

Then, when creating the Kiali CR, use the following attributes:

spec:
  deployment:
    image_name: your.custom.registry/owner/kiali-server-repo
    image_version: your_custom_tag

Change the default image

As explained earlier, when you install the Kiali Operator, it will be configured to install a Kiali Server whose image will be pulled from quay.io and whose version will be the same as the operator. You can ask the operator to use a different image by setting spec.deployment.image_name and spec.deployment.image_version within the Kiali CR (as explained above).

However, you may wish to alter this default behavior exhibited by the operator. In other words, you may want the operator to install a different Kiali Server image by default. For example, if you have an air-gapped environment with its own image registry that contains its own copy of the Kiali Server image, you will want the operator to install a Kiali Server that uses that image by default, as opposed to quay.io/kiali/kiali. By configuring the operator to do this, you will not force the authors of Kiali CRs to have to explicitly define the spec.deployment.image_name setting and you will not need to enable the allowAdHocKialiImage setting in the operator.

To change the default Kiali Server image installed by the operator, set the environment variable RELATED_IMAGE_kiali_default in the Kiali Operator deployment. The value of that environment variable must be the full image tag in the form repoName/orgName/imageName:versionString (e.g. my.internal.registry.io/mykiali/mykialiserver:v1.50.0). You can do this when you install the operator via helm:

$ helm install \
    --namespace kiali-operator \
    --create-namespace \
    --set "env[0].name=RELATED_IMAGE_kiali_default" \
    --set "env[0].value=my.internal.registry.io/mykiali/mykialiserver:v1.50.0" \
    kiali-operator \
    kiali/kiali-operator

Development Install

This option installs the latest Kiali Operator and Kiali Server images which are built from the master branches of Kiali GitHub repositories. This option is good for demo and development installations.

helm install \
  --set cr.create=true \
  --set cr.namespace=istio-system \
  --set cr.spec.deployment.image_version=latest \
  --set image.tag=latest \
  --namespace kiali-operator \
  --create-namespace \
  kiali-operator \
  kiali/kiali-operator

1.2.8 - Example Install

Installing two Kiali servers via the Kiali Operator.

This is a quick example of installing Kiali. This example will install the operator and two Kiali Servers - one server will require the user to enter credentials at a login screen in order to obtain read-write access and the second server will allow anonymous read-only access.

For this example, assume there is a Minikube Kubernetes cluster running with an Istio control plane installed in the namespace istio-system and the Istio Bookinfo Demo installed in the namespace bookinfo:

$ kubectl get deployments.apps -n istio-system
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
grafana                1/1     1            1           8h
istio-egressgateway    1/1     1            1           8h
istio-ingressgateway   1/1     1            1           8h
istiod                 1/1     1            1           8h
jaeger                 1/1     1            1           8h
prometheus             1/1     1            1           8h

$ kubectl get deployments.apps -n bookinfo
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
details-v1       1/1     1            1           21m
productpage-v1   1/1     1            1           21m
ratings-v1       1/1     1            1           21m
reviews-v1       1/1     1            1           21m
reviews-v2       1/1     1            1           21m
reviews-v3       1/1     1            1           21m

Install Kiali Operator via Helm Chart

First, the Kiali Operator will be installed in the kiali-operator namespace using the operator helm chart:

$ helm repo add kiali https://kiali.org/helm-charts
$ helm repo update kiali
$ helm install \
    --namespace kiali-operator \
    --create-namespace \
    kiali-operator \
    kiali/kiali-operator

Install Kiali Server via Operator

Next, the first Kiali Server will be installed. This server will require the user to enter a Kubernetes token in order to log into the Kiali dashboard and will provide the user with read-write access. To do this, a Kiali CR will be created that looks like this (file: kiali-cr-token.yaml):

apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali
  namespace: istio-system
spec:
  auth:
    strategy: "token"
  deployment:
    cluster_wide_access: false
    discovery_selectors:
      default:
      - matchLabels:
          kubernetes.io/metadata.name: bookinfo
    view_only_mode: false
  server:
    web_root: "/kiali"

This Kiali CR will command the operator to deploy the Kiali Server in the same namespace where the Kiali CR is (istio-system). The operator will configure the server to: respond to requests to the web root path of /kiali, enable read-write access, use the authentication strategy of token, and be given access to the bookinfo namespace:

$ kubectl apply -f kiali-cr-token.yaml

Get the Status of the Installation

The status of a particular Kiali Server installation can be found by examining the status field of its corresponding Kiali CR. For example:

$ kubectl get kiali kiali -n istio-system -o jsonpath='{.status}'

When the installation has successfully completed, the status field will look something like this (when formatted):

$ kubectl get kiali kiali -n istio-system -o jsonpath='{.status}' | jq
{
  "conditions": [
    {
      "ansibleResult": {
        "changed": 21,
        "completion": "2021-10-20T19:17:35.519131",
        "failures": 0,
        "ok": 102,
        "skipped": 90
      },
      "lastTransitionTime": "2021-10-20T19:17:12Z",
      "message": "Awaiting next reconciliation",
      "reason": "Successful",
      "status": "True",
      "type": "Running"
    }
  ],
  "deployment": {
    "discoverySelectorNamespaces": "bookinfo,istio-system",
    "instanceName": "kiali",
    "namespace": "istio-system"
  },
  "environment": {
    "isKubernetes": true,
    "kubernetesVersion": "1.28.0",
    "operatorVersion": "v1.88.0"
  },
  "progress": {
    "duration": "0:00:14",
    "message": "7. Finished all resource creation"
  }
}

Access the Kiali Server UI

The Kiali Server UI is accessed by pointing a browser to the Kiali Server endpoint and requesting the web root /kiali:

xdg-open http://$(minikube ip)/kiali

Because the auth.strategy was set to token, that URL will display the Kiali login screen that will require a Kubernetes token in order to authenticate with the server. For this example, you can use the token that belongs to the Kiali service account itself:

$ kubectl get secret -n istio-system $(kubectl get sa kiali-service-account -n istio-system -o jsonpath='{.secrets[0].name}') -o jsonpath='{.data.token}' | base64 -d

The output of that command above can be used to log into the Kiali login screen.

Install a Second Kiali Server

The second Kiali Server will next be installed. This server will not require the user to enter any login credentials but will only provide a view-only look at the service mesh. To do this, a Kiali CR will be created that looks like this (file: kiali-cr-anon.yaml):

apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali
  namespace: kialianon
spec:
  installation_tag: "Kiali - View Only"
  auth:
    strategy: "anonymous"
  deployment:
    cluster_wide_access: false
    discovery_selectors:
      default:
      - matchLabels:
          kubernetes.io/metadata.name: bookinfo
    view_only_mode: true
    instance_name: "kialianon"
  server:
    web_root: "/kialianon"

This Kiali CR will command the operator to deploy the Kiali Server in the same namespace where the Kiali CR is (kialianon). The operator will configure the server to: respond to requests to the web root path of /kialianon, disable read-write access, not require the user to authenticate, have a unique instance name of kialianon and be given access to the bookinfo namespace. The Kiali UI will also show a custom title in the browser tab so the user is aware they are looking at a “view only” Kiali dashboard. The unique deployment.instance_name is needed in order for this Kiali Server to be able to share access to the Bookinfo application with the first Kiali Server.

$ kubectl create namespace kialianon
$ kubectl apply -f kiali-cr-anon.yaml

The UI for this second Kiali Server is accessed by pointing a browser to the Kiali Server endpoint and requesting the web root /kialianon. Note that no credentials are required to gain access to this Kiali Server UI because auth.strategy was set to anonymous; however, the user will not be able to modify anything via the Kiali UI - it is strictly “view only”:

xdg-open http://$(minikube ip)/kialianon

Reconfigure Kiali Server

A Kiali Server can be reconfigured by simply editing its Kiali CR. The Kiali Operator will perform all the necessary tasks to complete the reconfiguration and reboot the Kiali Server pod when necessary. For example, to change the web root for the Kiali Server:

$ kubectl patch kiali kiali -n istio-system --type merge --patch '{"spec":{"server":{"web_root":"/specialkiali"}}}'

The Kiali Operator will update the necessary resources (such as the Kiali ConfigMap) and will reboot the Kiali Server pod to pick up the new configuration.

Uninstall Kiali Server

To uninstall a Kiali Server installation, simply delete the Kiali CR. The Kiali Operator will then perform all the necessary tasks to remove all remnants of the associated Kiali Server.

kubectl delete kiali kiali -n istio-system

Uninstall Kiali Operator

To uninstall the Kiali Operator, use helm uninstall and then manually remove the Kiali CRD.

You must delete all Kiali CRs in the cluster prior to uninstalling the Kiali Operator. If you fail to do this, uninstalling the operator will hang and remnants of Kiali Server installations will remain in your cluster and you will be required to perform some manual steps to clean it up.

$ kubectl delete kiali --all --all-namespaces
$ helm uninstall --namespace kiali-operator kiali-operator
$ kubectl delete crd kialis.kiali.io

1.3 - Deployment Options

Simple configuration options related to the Kiali deployment, like the installation namespace, logger settings, resource limits and scheduling options for the Kiali pod.

There are other, more complex deployment settings, described in dedicated pages:

All examples on this page are focused on the Kiali CR (when installing via the Kiali operator). Remember that Helm charts mirror all these configurations.

Kiali and Istio installation namespaces

By default, the Kiali operator installs Kiali in the same namespace where the Kiali CR is created. However, it is possible to specify a different namespace for installation:

spec:
  deployment:
    namespace: "custom-kiali-namespace"

Log level and format

By default, Kiali will print up to INFO-level messages in simple text format. You can change the log level, output format, and time format as in the following example:

spec:
  deployment:
    logger:
      # Supported values are "trace", "debug", "info", "warn", "error" and "fatal"
      log_level: error  
      # Supported values are "text" and "json".
      log_format: json  
      time_field_format: "2006-01-02T15:04:05Z07:00"

The syntax for the time_field_format is the same as the Time.Format function of the Go language.

The json format is useful if you are parsing logs of your applications for further processing.

In Kiali, there are some special logs called audit logs that are emitted each time a user creates, updates or deletes a resource through Kiali. Audit logs are INFO-level messages and are enabled by default. If audit logs are too verbose, you can disable them without reducing the log level as follows:

spec:
  server:
    audit_log: false

Kiali instance name

If you plan to install more than one Kiali instance on the same cluster, you may need to configure an instance name to avoid conflicts on created resources:

spec:
  deployment:
    instance_name: "secondary"

The instance_name will be used as a prefix for all created Kiali resources. The exception is the kiali-signing-key secret which will always have the same name and will be shared on all deployments of the same namespace, unless you specify a custom secret name.

Since the instance_name will be used as a name prefix in resources, it must follow Kubernetes naming constraints.

Since Kubernetes resources cannot be renamed, you cannot change the instance_name of an existing Kiali installation. The workaround is to uninstall Kiali and re-install with the desired instance_name.

Resource requests and limits

You can set the amount of resources available to Kiali using the spec.deployment.resources attribute, like in the following example:

spec:
  deployment:
    resources:
      requests:
        memory: "128Mi"
        cpu: "100m"
      limits:
        memory: "1Gi"
        cpu: "500m"

Please, read the Managing Resources for Containers section in the Kubernetes documentation for more details about possible configurations.

Custom labels and annotations on the Kiali pod and service

Although some labels and annotations are set on the Kiali pod and on its service (depending on configurations), you can add additional ones. For the pod, use the spec.deployment.pod_labels and spec.deployment.pod_annotations attributes. For the service, you can only add annotations using the spec.deployment.service_annotations attribute. For example:

spec:
  deployment:
    pod_annotations:
      a8r.io/repository: "https://github.com/kiali/kiali"
    pod_labels:
      sidecar.istio.io/inject: "true"
    service_annotations:
      a8r.io/documentation: "https://kiali.io/docs/installation/deployment-configuration"

Kiali page title (browser title bar)

If you have several Kiali installations and you are using them at the same time, there are good chances that you will want to identify each Kiali by simply looking at the browser’s title bar. You can set a custom text in the title bar with the following configuration:

spec:
  installation_tag: "Kiali West"

The installation_tag is any human readable text of your desire.

Kubernetes scheduler settings

Replicas and automatic scaling

By default, only one replica of Kiali is deployed. If needed, you can change the replica count like in the following example:

spec:
  deployment:
    replicas: 2

If you prefer automatic scaling, creation of an HorizontalPodAutoscaler resource is supported. For example, the following configuration automatically scales Kiali based on CPU utilization:

spec:
  deployment:
    hpa:
      api_version: "autoscaling/v1"
      spec:
        minReplicas: 1
        maxReplicas: 2
        targetCPUUtilizationPercentage: 80

You must omit the scaleTargetRef field of the HPA spec, because this field will be populated by the Kiali operator (or by Helm) depending on other configuration.

Read the Kubernetes Horizontal Pod Autoscaler documentation to learn more about the HPA.

Allocating the Kiali pod to specific nodes of the cluster

You can constrain the Kiali pod to run on a specific set of nodes by using some of the standard Kubernetes scheduler configurations.

The simplest option is to use the nodeSelector configuration which you can configure like in the following example:

spec:
  deployment:
    node_selector:
      worker-type: infra

You can also use the affinity/anti-affinity native Kubernetes feature if you prefer its more expressive syntax, or if you need more complex matching rules. The following is an example for configuring node affinity:

spec:
  deployment:
    affinity:
      node:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: worker-type
              operator: In
              values:
              - infra

Similarly, you can also configure pod affinity and pod anti-affinity using the spec.deployment.affinity.pod and spec.deployment.affinity.pod_anti attributes.

Finally, if you want to run Kiali in a node with taints, the following is an example to configure tolerations:

spec:
  deployment:
    tolerations: # Allow to run Kiali in a tainted master node
    - key: "node-role.kubernetes.io/master"
      operator: "Exists"
      effect: "NoSchedule"

Read the following Kubernetes documentation to learn more about these configurations:

Priority class of the Kiali pod

If you are using priority classes in your cluster, you can specify the priority class that will be set on the Kiali pod. For example:

spec:
  deployment:
    priority_class_name: high-priority

For more information about priority classes, read Pod Priority and Preemption in the Kubernetes documentation.

Adding host aliases to the Kiali pod

If you need to provide some static hostname resolution in the Kiali pod, you can use HostAliases to add entries to the /etc/hosts file of the Kiali pod, like in the following example:

spec:
  deployment:
   host_aliases:
   - ip: 192.168.1.100
     hostnames:
     - "foo.local"
     - "bar.local"

HTTP server

Kiali is served over HTTP. You can configure a few options of the HTTP server. The following are the defaults, but you can change them to suit your needs.

spec:
  server:
    # Listen/bind address of the HTTP server. By default it is empty, which means to
    # listen on all interfaces.
    address: ""

    # Listening port of the HTTP server. If you change it, also Kiali's Kubernetes
    # Service is affected to use this port.
    port: 20001

    # Use GZip compression for responses larger than 1400 bytes. You may want to disable
    # compression if you are exposing Kiali via a reverse proxy that is already
    # doing compression.
    gzip_enabled: true

    # For development purposes only. Controls if "Cross-Origin Resourse Sharing" is
    # enabled. 
    cors_allow_all: false

There is one additional spec.server.web_root option that affects the HTTP server, but that one is described in the Specifying route settings section of the Instalation guide.

Metrics server

Kiali emits metrics that can be collected by Prometheus. Most of these metrics are performance measurements.

The metrics server is enabled by default and listens on port 9090:

spec:
  server:
    metrics_enabled: true
    metrics_port: 9090

The bind address is the same as the HTTP server. Thus, make sure that the HTTP Server and the metrics server are not configured to the same port.

2 - Configuration

How to configure Kiali to fit your needs.

The pages in this Configuration section describe most available options for managing and customizing your Kiali installation.

Unless noted, it is assumed that you are using the Kiali operator and that you are managing the Kiali installation through a Kiali CR. The provided YAML snippets for configuring Kiali should be placed in your Kiali CR. For example, the provided configuration snippet for setting up the Anonymous authentication strategy is the following:

spec:
  auth:
    strategy: anonymous

You will need to take this YAML snippet and apply it to your Kiali CR. As an example, an almost minimal Kiali CR using the previous configuration snippet would be the following:

apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  namespace: kiali-namespace
  name: kiali
spec:
  deployment:
    namespace: kiali-namespace
  auth:
    strategy: anonymous

Then, you can save the finished YAML file and apply it with kubectl apply -f.

It is recommended that you read The Kiali CR and the Example Install pages of the Installation Guide for more information about using the Kiali CR.

Also, for reference, see Kiali CR Reference which documents all available options.

2.1 - Authentication Strategies

Choosing and configuring the appropriate authentication strategy.

Kiali supports five authentication mechanisms.

The default authentication strategy for OpenShift clusters is openshift.
The default authentication strategy for all other Kubernetes clusters is token.

All mechanisms other than anonymous support limiting per-user namespace access control.

For multi-cluster, only anonymous and openid are currently supported.

Read the dedicated page of each authentication strategy to learn more.

2.1.1 - Anonymous strategy

Access Kiali with no authentication.

Introduction

The anonymous strategy removes any authentication requirement. Users will have access to Kiali without providing any credentials.

Although the anonymous strategy doesn’t provide any access protection, it’s valid for some use-cases. Some examples known from the community:

Exposing Kiali through a reverse proxy, where the reverse proxy is providing a custom authentication mechanism.
Exposing Kiali on an already limited network of trusted users.
When Kiali is accessed through kubectl port-forward or alike commands that allow usage of the cluster’s RBAC capabilities to limit access.
When developing Kiali, where a developer has a private instance on his own machine.

It’s worth to emphasize that the anonymous strategy will leave Kiali unsecured. If you are using this option, make sure that Kiali is available only to trusted users, or access is protected by other means.

Set-up

To use the anonymous strategy, use the following configuration in the Kiali CR:

spec:
  auth:
    strategy: anonymous

The anonymous strategy doesn’t have any additional configuration.

Access control

When using the anonymous strategy, the content displayed in Kiali is based on the permissions of the Kiali service account. By default, the Kiali service account has cluster wide access and will be able to display everything in the cluster.

OpenShift

If you are running Kiali in OpenShift, access can be customized by changing privileges to the Kiali ServiceAccount. For example, to reduce permissions to individual namespaces, first, remove the cluster-wide permissions granted by default:

  oc delete clusterrolebindings kiali

Then grant the kiali role only in needed namespaces. For example:

  oc adm policy add-role-to-user kiali system:serviceaccount:istio-system:kiali-service-account -n ${NAMESPACE}

View only

You can tell the Kiali Operator to install Kiali in “view only” mode (this does work for either OpenShift or Kubernetes). You do this by setting the view_only_mode to true in the Kiali CR, which allows Kiali to read service mesh resources found in the cluster, but it does not allow any change:

spec:
  deployment:
    view_only_mode: true

2.1.2 - Header strategy

Run Kiali behind a reverse proxy responsible for injecting the user’s token, or a token with impersonation.

Introduction

The header strategy assumes a reverse proxy is in front of Kiali, such as OpenUnison or OAuth2 Proxy, injecting the user’s identity into each request to Kiali as an Authorization header. This token can be an OpenID Connect token or any other token the cluster recognizes.

In addition to a user token, the header strategy supports impersonation headers. If the impersonation headers are present in the request, then Kiali will act on behalf of the user specified by the impersonation (assuming the token supplied in the Authorization header is authorized to do so).

The header strategy allows for namespace access control.

The header strategy is only supported for single cluster.

Set-up

The header strategy will work with any Kubernetes cluster. The token provided must be supported by that cluster. For instance, most “on-prem” clusters support OpenID Connect, but cloud hosted clusters do not. For clusters that don’t support a token, the impersonation headers can be injected by the reverse proxy.

spec:
  auth:
    strategy: header

The header strategy doesn’t have any additional configuration.

HTTP Header

The header strategy looks for a token in the Authorization HTTP header with the Bearer prefix. The HTTP header should look like:

Authorization: Bearer TOKEN

Where TOKEN is the appropriate token for your cluster. This TOKEN will be submitted to the API server via a TokenReview to validate the token ONLY on the first access to Kiali. On subsequent calls the TOKEN is passed through directly to the API server.

Security Considerations

Network Policies

A policy should be put in place to make sure that the only “client” for Kiali is the authenticating reverse proxy. This helps limit potential abuse and ensures that the authenticating reverse proxy is the source of truth for who accessed Kiali.

Short Lived Tokens

The authenticating reverse proxy should inject a short lived token in the Authorization header. A shorter lived token is less likely to be abused if leaked. Kiali will take whatever token is passed into the reqeuest, so as tokens are regenerated Kiali will use the new token.

Impersonation

TokenRequest API

The authenticating reverse proxy should use the TokenRequest API instead of static ServiceAccount tokens when possible while using impersonation. The ServiceAccount that can impersonate users and groups is privileged and having it be short lived cuts down on the possibility of a token being leaked while it’s being passed between different parts of the infrastructure.

Drop Incoming Impersonation Headers

The authenticating proxy MUST drop any headers it receives from a remote client that match the impersonation headers. Not only do you want to make sure that the authenticating proxy can’t be overriden on which user to authenticate, but also what groups they’re a member of.

2.1.3 - OpenID Connect strategy

Access Kiali requiring authentication through a third-party OpenID Connect provider.

Introduction

The openid authentication strategy lets you integrate Kiali to an external identity provider that implements OpenID Connect, and allows users to login to Kiali using their existing accounts of a third-party system.

If your Kubernetes cluster is also integrated with your OpenId provider, then Kiali’s openid strategy can offer namespace access control.

Kiali only supports the authorization code flow of the OpenId Connect spec.

Requirements

The Kiali’s signing key needs to be 16, 24 or 32 byte long. If you install Kiali via the operator and don’t set a custom signing key, the operator should create a 16 byte long signing key.

If you don’t need namespace access control support, you can use any working OpenId Server where Kiali can be configured as a client application.

If you do need namespace access control support, you need either:

A Kubernetes cluster configured with OpenID connect integration, which results in the API server accepting tokens issued by your identity provider.
A replacement or reverse proxy for the Kubernetes cluster API capable of handling the OIDC authentication.

The first option is preferred if you can manipulate your cluster API server startup flags, which will result in your cluster to also be integrated with the external OpenID provider.

The second option is provided for cases where you are using a managed Kubernetes and your cloud provider does not support configuring OpenID integration. Kiali assumes an implementation of a Kubernetes API server. For example, a community user has reported to successfully configure Kiali’s OpenID strategy by using kube-oidc-proxy which is a reverse proxy that handles the OpenID authentication and forwards the authenticated requests to the Kubernetes API.

Set-up with namespace access control support

Assuming you already have a working Kubernetes cluster with OpenId integration (or a working alternative like kube-oidc-proxy), you should already had configured an application or a client in your OpenId server (some cloud providers configure this app/client automatically for you). You must re-use this existing application/client by adding the root path of your Kiali instance as an allowed/authorized callback URL. If the OpenID server provided you a client secret for the application/client, or if you had manually set a client secret, issue the following command to create a Kubernetes secret holding the OpenId client secret:

kubectl create secret generic kiali --from-literal="oidc-secret=$CLIENT_SECRET" -n $NAMESPACE

where $NAMESPACE is the namespace where you installed Kiali and $CLIENT_SECRET is the secret you configured or provided by your OpenId Server. If Kiali is already running, you may need to restart the Kiali pod so that the secret is mounted in Kiali.

It’s worth emphasizing that to configure OpenID integration you must re-use the OpenID application/client that you created for your Kubernetes cluster. If you create a new application/client for Kiali in your OpenId server, Kiali will fail to properly authenticate users.

Then, to enable the OpenID Connect strategy, the minimal configuration you need to set in the Kiali CR is like the following:

spec:
  auth:
    strategy: openid
    openid:
      client_id: "kiali-client"
      issuer_uri: "https://openid.issuer.com"

This assumes that your Kubernetes cluster is configured with OpenID Connect integration. In this case, the client-id and issuer_uri attributes must match the --oidc-client-id and --oidc-issuer-url flags used to start the cluster API server. If these values don’t match, users will fail to login to Kiali.

If you are using a replacement or a reverse proxy for the Kubernetes API server, the minimal configuration is like the following:

spec:
  auth:
    strategy: openid
    openid:
      api_proxy: "https://proxy.domain.com:port"
      api_proxy_ca_data: "..."
      client_id: "kiali-client"
      issuer_uri: "https://openid.issuer.com"

The value of client-id and issuer_uri must match the values of the configuration of your reverse proxy or cluster API replacement. The api_proxy attribute is the URI of the reverse proxy or cluster API replacement (only HTTPS is allowed). The api_proxy_ca_data is the public certificate authority file encoded in a base64 string, to trust the secure connection.

Set-up with no namespace access control support

Register Kiali as a client application in your OpenId Server. Use the root path of your Kiali instance as the callback URL. If the OpenId Server provides you a client secret, or if you manually set a client secret, issue the following command to create a Kubernetes secret holding the OpenId client secret:

kubectl create secret generic kiali --from-literal="oidc-secret=$CLIENT_SECRET" -n $NAMESPACE

Then, to enable the OpenID Connect strategy, the minimal configuration you need to set in the Kiali CR is like the following:

spec:
  auth:
    strategy: openid
    openid:
      client_id: "kiali-client"
      disable_rbac: true
      issuer_uri: "https://openid.issuer.com"

As namespace access control is disabled, all users logging into Kiali will share the same cluster-wide privileges.

Additional configurations

Configuring the displayed user name

The Kiali front-end will, by default, retrieve the string of the sub claim of the OpenID token and display it as the user name. You can customize which field to display as the user name by setting the username_claim attribute of the Kiali CR. For example:

spec:
  auth:
    openid:
      username_claim: "email"

If you enabled namespace access control, you will want the username_claim attribute to match the --oidc-username-claim flag used to start the Kubernetes API server, or the equivalent option if you are using a replacement or reverse proxy of the API server. Else, any user-friendly claim will be OK as it is purely informational.

Configuring requested scopes

By default, Kiali will request access to the openid, profile and email standard scopes. If you need a different set of scopes, you can set the scopes attribute in the Kiali CR. For example:

spec:
  auth:
    openid:
      scopes:
      - "openid"
      - "email"
      - "groups"

The openid scope is forced. If you don’t add it to the list of scopes to request, Kiali will still request it from the identity provider.

Configuring authentication timeout

When the user is redirected to the external authentication system, by default Kiali will wait at most 5 minutes for the user to authenticate. After that time has elapsed, Kiali will reject authentication. You can adjust this timeout by setting the authentication_timeout with the number of seconds that Kiali should wait at most. For example:

spec:
  auth:
    openid:
      authentication_timeout: 60 # Wait only one minute.

Configuring allowed domains

Some identity providers use a shared login and regardless of configuring your own application under your domain (or organization account), login can succeed even if the user that is logging in does not belong to your account or organization. Google is an example of this kind of provider.

To prevent foreign users from logging into your Kiali instance, you can configure a list of allowed domains:

spec:
  auth:
    openid:
      allowed_domains:
      - example.com
      - foo.com

The e-mail reported by the identity provider is used for the validation. Login will be allowed if the domain part of the e-mail is listed as an allowed domain; else, the user will be rejected. Naturally, you will need to configure the email scope to be requested.

There is a special case: some identity providers include a hd claim in the id_token. If this claim is present, this is used instead of extracting the domain from the user e-mail. For example, Google Workspace (aka G Suite) includes this hd claim for hosted domains.

Using an OpenID provider with a self-signed certificate

If your OpenID provider is using a self-signed certificate, you can disable certificate validation by setting the insecure_skip_verify_tls to true in the Kiali CR:

spec:
  auth:
    openid:
      insecure_skip_verify_tls: true

You should use self-signed certificates only for testing purposes.

However, if your organization or internal network has an internal trusted certificate authority (CA), and your OpenID server is using a certificate issued by this CA, you can configure Kiali to trust certificates from this CA rather than disabling verification.

See the TLS Configuration page for detailed instructions on configuring custom CA certificates. You can use either the global additional-ca-bundle.pem key (which makes the CA trusted for all HTTPS connections) or the OpenID-specific openid-server-ca.crt key in the kiali-cabundle ConfigMap.

Using an HTTP/HTTPS Proxy

In some network configurations, there is the need to use proxies to connect to the outside world. OpenID requires outside world connections to get metadata and do key validation, so you can configure it by setting the http_proxy and https_proxy keys in the Kiali CR. They use the same format as the HTTP_PROXY and HTTPS_PROXY environment variables.

spec:
  auth:
    openid:
      http_proxy: http://USERNAME:PASSWORD@10.0.1.1:8080/
      https_proxy: https://USERNAME:PASSWORD@10.0.0.1:8080/

Passing additional options to the identity provider

When users click on the Login button on Kiali, a redirection occurs to the authentication page of the external identity provider. Kiali sends a fixed set of parameters to the identity provider to enable authentication. If you need to add an additional set of parameters to your identity provider, you can use the additional_request_params setting of the Kiali CR, which accepts key-value pairs. For example:

spec:
  auth:
    openid:
      additional_request_params:
        prompt: login

The prompt parameter is a standard OpenID parameter. When the login value is passed in this parameter, the identity provider is instructed to ask for user credentials regardless if the user already has an active session because of a previous login in some other system.

If your OpenId provider supports other non-standard parameters, you can specify the ones you need in this additional_request_params setting.

Take into account that you should not add the client_id, response_type, redirect_uri, scope, nonce nor state parameters to this list. These are already in use by Kiali and some already have a dedicated setting.

Provider-specific instructions

Using with Keycloak

When using OpenId with Keycloak, you will need to enable the Standard Flow Enabled option on the Client (in the Administration Console):

Client configuration screen on Keycloak

The Standard Flow described on the options is the same as the authorization code flow from the rest of the documentation.

Using with Google Cloud Platform / GKE OAuth2

If you are using Google Cloud Platform (GCP) and its products such as Google Kubernetes Engine (GKE), it should be straightforward to configure Kiali’s OpenID strategy to authenticate using your Google credentials.

First, you’ll need to go to your GCP Project and to the Credentials screen which is available at (Menu Icon) > APIs & Services > Credentials.

Credentials Screen on in GCP Project

On the Credentials screen you can select to create a new OAuth client ID.

Select OAuth on Credentials Screen

If you’ve never setup the OAuth consent screen you will need to do that before you can create an OAuth client ID. On screen you’ll have multiple warnings and prompts to walk you through this.

On the Create OAuth client ID screen, set the Application type to Web Application and enter a name for your key.

Select Web Application

Then enter in the Authorized Javascript origins and Authorized redirect URIs for your project. You can enter in localhost as appropriate during testing. You can also enter multiple URIs as appropriate.

Enter URLs

After clicking Create you’ll be shown your newly minted client id and secret. These are important and needed for your Kiali CR yaml and Kiali secrets files.

Get Credentials

You’ll need to update your Kiali CR file to include the following auth block.

spec:
  auth:
    strategy: "openid"
    openid:
      client_id: "<your client id from GCP>"
      disable_rbac: true
      issuer_uri: "https://accounts.google.com"
      scopes: ["openid", "email"]
      username_claim: "email"

Don’t get creative here. The issuer_uri should be https://accounts.google.com.

Finally you will need to create a secret, if you don’t have one already, that sets the oidc-secret for the openid flow.

apiVersion: v1
kind: Secret
metadata:
  name: kiali
  namespace: istio-system
  labels:
    app: kiali
type: Opaque
data:
  oidc-secret: "<base64 encode your client secret from GCP and enter here>"

Once all these settings are complete just set your Kiali CR and the Kiali secret to your cluster. You may need to refresh your Kiali Pod to set the Secret if you add the Secret after the Kiali pod is created.

Using with OpenShift and an external OIDC provider

Starting with OpenShift 4.20, you can configure your OpenShift cluster to authenticate users against an external OpenID Connect (OIDC) provider instead of using the built-in OAuth server. This is sometimes called “Bring Your Own OIDC” (BYO OIDC). When OpenShift is configured this way, Kiali can use the openid authentication strategy with full namespace access control support.

This section applies when you want to use an external OIDC provider (such as Keycloak, Okta, Auth0, or others) with OpenShift. If you want to use OpenShift’s built-in OAuth server, use the openshift authentication strategy instead.

Prerequisites

OpenShift 4.20 or later
An external OIDC provider configured and accessible from your OpenShift cluster
The OIDC provider must be configured as an authentication source for both OpenShift and Kiali (they share the same provider)
A certificate-based kubeconfig or long-lived service account token for emergency cluster access (the built-in OAuth will be disabled)

Step 1: Configure OpenShift for external OIDC authentication

First, configure your OpenShift cluster to use your external OIDC provider. This involves modifying the cluster’s Authentication resource to specify your OIDC provider details.

Refer to the official OpenShift documentation for detailed instructions: Enabling direct authentication with an external OIDC identity provider.

The key configuration elements include:

Issuer URL: The URL of your OIDC provider
Client ID: The OAuth2 client ID registered with your OIDC provider
Audiences: The list of acceptable audiences for tokens (must include your Kiali client ID)
Username claim mapping: How the OIDC token claims map to Kubernetes usernames
Username prefix: Optional prefix to distinguish OIDC users (e.g., oidc:)
CA certificate: If your OIDC provider uses a private CA
webhookTokenAuthenticator: null: Must be set when type is OIDC

When OpenShift is configured with external OIDC authentication, the built-in OAuth server is disabled. This means:

Users cannot log in using OpenShift’s standard login page
The OAuthClient API becomes unavailable
Keep a certificate-based kubeconfig (or other long-lived admin credentials) available for emergency access because the normal OAuth login paths and OAuth APIs are unavailable in this mode

Ensure your OIDC provider is properly configured and you have set up RBAC policies before enabling external OIDC authentication.

Step 2: Configure user RBAC

When using external OIDC with OpenShift, user identities in Kubernetes are derived from the OIDC token claims. OpenShift typically adds a configurable prefix to the username (e.g., oidc:) to distinguish OIDC users from other identity sources.

For example, if your OIDC provider returns user@example.com in the email claim and you configured a prefix of oidc:, the Kubernetes username becomes oidc:user@example.com.

Create RBAC resources to grant users access to the namespaces they need. See Namespace access control for details on the required privileges.

Example Role and RoleBinding to grant a user access to the istio-system namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kiali-user-access
  namespace: istio-system
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods/log
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kiali-user-access
  namespace: istio-system
subjects:
- kind: User
  name: "oidc:user@example.com"  # Use the prefixed username
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: kiali-user-access
  apiGroup: rbac.authorization.k8s.io

The username prefix (e.g., oidc:) is configured in OpenShift’s Authentication resource under spec.oidcProviders[].claimMappings.username.prefix.prefixString. Make sure your RBAC resources use the same prefixed username format.

Step 3: Create the OIDC client secret

If your OIDC provider requires a client secret, create a Kubernetes secret to store it:

oc create secret generic kiali --from-literal="oidc-secret=$CLIENT_SECRET" -n istio-system

Replace $CLIENT_SECRET with the client secret from your OIDC provider.

Step 4: Configure the CA certificate (if needed)

If your OIDC provider uses a certificate issued by a private CA (not a public CA), you need to configure Kiali to trust it. Create a ConfigMap with the CA certificate:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kiali-cabundle
  namespace: istio-system
data:
  openid-server-ca.crt: |
    -----BEGIN CERTIFICATE-----
    MIIDxTCCAq2gAwIBAgIQAqxcJmoLQ...
    ... (your OIDC provider's CA certificate) ...
    -----END CERTIFICATE-----

See TLS Configuration for more details on configuring custom CA certificates.

Step 5: Configure the Kiali CR

Configure Kiali to use the openid authentication strategy. The client_id and issuer_uri must match the values configured in OpenShift’s Authentication resource:

spec:
  auth:
    strategy: openid
    openid:
      client_id: "kiali-client"
      issuer_uri: "https://your-oidc-provider.example.com"
      scopes:
      - "openid"
      - "email"
      username_claim: "email"

The client_id must be listed in the audiences array of your OpenShift OIDC configuration, and the issuer_uri must exactly match the issuer URL configured in OpenShift. If these don’t match, authentication will fail.

Important configuration notes:

username_claim: Should match the claim mapping configured in OpenShift (commonly email or preferred_username)
scopes: Request the scopes that provide the claims you need (typically openid and email)
disable_rbac: Do not set this to true if you want per-user namespace access control. When disable_rbac is false (the default), Kiali uses the user’s OIDC token for Kubernetes API calls, enabling per-user RBAC.

Complete example

Here’s a complete example of the Kiali CR configuration for OpenShift with an external OIDC provider:

apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali
  namespace: istio-system
spec:
  auth:
    strategy: openid
    openid:
      client_id: "kiali-client"
      issuer_uri: "https://your-oidc-provider.example.com"
      scopes:
      - "openid"
      - "email"
      username_claim: "email"

With the supporting resources:

# OIDC client secret (if required by your provider)
apiVersion: v1
kind: Secret
metadata:
  name: kiali
  namespace: istio-system
type: Opaque
stringData:
  oidc-secret: "your-client-secret-here"
---
# CA certificate (if using a private CA)
apiVersion: v1
kind: ConfigMap
metadata:
  name: kiali-cabundle
  namespace: istio-system
data:
  openid-server-ca.crt: |
    -----BEGIN CERTIFICATE-----
    ... (your CA certificate) ...
    -----END CERTIFICATE-----    
---
# RBAC for a user (repeat for each user/namespace combination)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kiali-user-access
  namespace: istio-system
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods/log
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kiali-user-access
  namespace: istio-system
subjects:
- kind: User
  name: "oidc:user@example.com"
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: kiali-user-access
  apiGroup: rbac.authorization.k8s.io

Using with Azure: AKS and AAD

The OpenID authentication strategy can be used with Azure Kubernetes Service (AKS) and Azure Active Directory (AAD) with Kiali versions 1.33 and later. Prior Kiali versions do not support namespace access control on Azure.

AKS has support for a feature named AKS-managed Azure Active Directory, which enables integration between AKS and AAD. This has the advantage that users can use their AAD credentials to access AKS clusters and can also use Kubernetes RBAC features to assign privileges to AAD users.

However, Azure is implementing this integration via the Kubernetes Webhook Token Authentication rather than via the Kubernetes OpenID Connect Tokens authentication (see the Azure AD integration section in AKS Concepts documentation). Because of this difference, authentication in AKS behaves slightly different from a standard OpenID setup, but Kiali’s OpenID authentication strategy can still be used with namespace access control support by following the next steps.

First, enable the AAD integration on your AKS cluster. See the official AKS documentation to learn how. Once it is enabled, your AKS panel should show the following:

AKS-managed AAD is enabled,700

Create a web application for Kiali in your Azure AD panel:

Go to AAD > App Registration, create an application with a redirect url like https://<your-kiali-url>/kiali
Go to Certificates & secrets and create a client secret.
1. After creating the client secret, take note of the provided secret. Create a Kubernetes secret in your cluster as mentioned in the Set-up with namespace access control support section. Please, note that the suggested name for the Kubernetes Secret is kiali. If you want to customize the secret name, you will have to specify your custom name in the Kiali CR. See: secret_name in Kial CR Reference.
Go to API Permissions and press the Add a permission button. In the new page that appears, switch to the APIs my organization uses tab.
Type the following ID in the search field: 6dae42f8-4368-4678-94ff-3960e28e3630 (this is a shared ID for all Azure clusters). And select the resulting entry.
Select the Delegated permissions square.
Select the user.read permission.
Go to Authentication and make sure that the Access tokens checkbox is ticked.

Access tokens enabled

Then, create or modify your Kiali CR and include the following settings:

spec:
  auth:
    strategy: "openid"
    openid:
      client_id: "<your Kiali application client id from Azure>"
      issuer_uri: "https://sts.windows.net/<your AAD tenant id>/"
      username_claim: preferred_username
      api_token: access_token
      additional_request_params:
        resource: "6dae42f8-4368-4678-94ff-3960e28e3630"

You can find your client_id and tenant_id in the Overview page of the Kiali App registration that you just created. See this documentation for more information.

2.1.4 - OpenShift strategy

Access Kiali requiring OpenShift authentication.

Introduction

The openshift authentication strategy is the preferred and default strategy when Kiali is deployed on an OpenShift cluster.

When using the openshift strategy, a user logging into Kiali will be redirected to the login page of the OpenShift console. Once the user provides his OpenShift credentials, he will be redireted back to Kiali and will be logged in if the user has enough privileges.

The openshift strategy supports namespace access control.

The openshift strategy is supported for single and multi-cluster deployments.

Set-up

Since openshift is the default strategy when deploying Kiali in OpenShift, you shouldn’t need to configure anything. If you want to be verbose, use the following configuration in the Kiali CR:

spec:
  auth:
    strategy: openshift

The Kiali operator will make sure to setup the needed OpenShift OAuth resources to register Kiali as a client for the most common use-cases. The openshift strategy does have a few configuration settings that most people will never need but are available in case you have a situation where the customization is needed. See the Kiali CR Reference page for the documentation on those settings.

Multi-Cluster

There are some things to know when using the openshift strategy with Kiali in a multi-cluster environment.

Consistent Kiali Namespace and Instance-Name

The default namespace for Kiali is istio-system. But many users prefer to use a dedicated namespace for Kiali, such as kiali, kiali-server, etc. In a multi-cluster environment Kiali must be deployed in the same namespace on each cluster. Clusters that don’t have a Kiali deployment must still provide the namespace, to hold the remote cluster resources.

The default instance-name for kiali is kiali. Any change to the default must also be made consistently across all clusters.

Assuming Kiali is installed via the Kiali Operator. Any customization would be done via the following CR settings:

spec.deployment.namespace
spec.deployment.instance_name

It is recommended that the Kiali Operator be deployed on all clusters, even if Kiali itself is not deployed. This will ensure that the proper namespace and remote cluster resources are created. For clusters without Kiali, requiring only the remote cluster resources (for auth), configure the CR with:

spec.deployment.remote_cluster_resources_only: true

OpenShift OAuthClient Naming

OpenShift OAuth requires an OAuthClient resource on each cluster to be named <instance-name>-<namespace>. For example, if Kiali is installed with the default instance name kiali in namespace istio-system, the OAuthClient on every cluster must be named kiali-istio-system.

Both the Kiali Operator and the Kiali Server helm chart automatically create the OAuthClient with the correct name when they create the remote cluster resources. The kiali-prepare-remote-cluster.sh script also delegates to the Kiali Server helm chart for resource creation and will produce a correctly-named OAuthClient, provided you pass --kiali-resource-name and --remote-cluster-namespace values that match the Kiali instance name and namespace on the cluster where Kiali is deployed. If you are managing resources entirely manually, ensure the OAuthClient on the remote cluster is named consistently with the Kiali instance name and namespace.

If the OAuthClient names do not match across clusters, OAuth authentication will fail.

OAuthClient Redirect URIs for Remote Cluster Resources

When using remote_cluster_resources_only: true on a remote cluster with the openshift auth strategy, the Kiali Operator must create an OAuthClient resource but cannot automatically determine the redirect URI (since there is no Kiali server or route on the remote cluster). You must explicitly specify the redirect URI in the Kiali CR on the remote cluster via spec.auth.openshift.redirect_uris. Without this, the Kiali Operator will fail to reconcile with the error:

Redirect URIs for the Kiali Server OAuthClient are not specified via auth.openshift.redirect_uris;
this is required when creating remote cluster resources with auth.strategy of openshift.

The redirect URI must point back to the Kiali server on the cluster where Kiali is deployed. Critically, the URI must include the remote cluster’s name as a path suffix in the form https://<kiali-route-host>/api/auth/callback/<remote-cluster-name>. This is required so that the OAuth callback can correctly identify which cluster the login is for. Using the base /api/auth/callback path (without the cluster name) will result in the login failing with a http: named cookie not present error.

To determine the correct URI:

If Kiali is already deployed, run this on the cluster where Kiali is deployed: oc get route -l app.kubernetes.io/name=kiali -n <kiali-namespace> -o jsonpath='{..spec.host}'
If Kiali is not yet deployed, you can predict the route hostname from the cluster’s app domain by running this on the cluster where Kiali will be deployed: oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}'

The Kiali route hostname will be something like kiali-<namespace>.<app-domain>, so the full redirect URI will be something like https://kiali-<namespace>.<app-domain>/api/auth/callback/<remote-cluster-name>, where <remote-cluster-name> is the Istio cluster name of the remote cluster.

For example, if Kiali is deployed in namespace istio-system with instance_name kiali, the app domain is apps.east.example.com, and the remote cluster’s Istio cluster name is west, the Kiali CR on the remote cluster should include:

spec:
  auth:
    openshift:
      redirect_uris:
        - https://kiali-istio-system.apps.east.example.com/api/auth/callback/west
  deployment:
    remote_cluster_resources_only: true

When using the openshift strategy with multiple clusters, users must be logged into each cluster in order to access resources on that cluster. The Kiali UI provides a mechanism to log into remote clusters:

In your browser, navigate to the Kiali UI and log in using your credentials for the cluster where Kiali is deployed.
Once logged in, use the user profile dropdown in the Kiali UI to initiate login to each remote cluster. Kiali will redirect you to the remote cluster’s OpenShift login page.
Log in with your credentials for that remote cluster. You will be redirected back to the Kiali UI. Repeat step 2 for each additional remote cluster until you are logged into all clusters.

Currently, OpenShift OAuth does not provide SSO across clusters. Each cluster requires its own login. If you are having trouble logging into a remote cluster from within Kiali, try starting a fresh private/incognito browser tab to ensure there are no stale OAuth cookies from prior logins to the remote cluster’s OpenShift console.

Using an internal or self-signed certificate

If you have a multi-cluster Kiali deployment and the OAuth server is configured with an external IdP that uses an internal or self-signed certificate, you can configure Kiali to trust the server’s certificate by creating a ConfigMap named kiali-oauth-cabundle containing the CA certificate bundle for the server under the oauth-server-ca.crt key:

Note that if you are deploying Kiali with spec.deployment.instance_name set to a value that is different than the default of kiali, your ConfigMap name needs to be that instance name appended with “-oauth-bundle”. For example, if your instance name is “myserver” then the name of the ConfigMap must be myserver-oauth-cabundle.

apiVersion: v1
kind: ConfigMap
metadata:
  name: kiali-oauth-cabundle
  namespace: istio-system # This is Kiali's install namespace
data:
  oauth-server-ca.crt: <PEM encoded CA root certificate>

Kiali will automatically trust this root certificate for all HTTPS requests (not just OAuth). The certificate is loaded into Kiali’s global certificate pool. Kiali watches for changes to the CA bundle and automatically refreshes without requiring a pod restart. If you have multiple different CAs, for different clusters, include each as a separate block in the bundle.

For most use cases, you can simply add your CA to the kiali-cabundle ConfigMap under the additional-ca-bundle.pem key instead of creating a separate kiali-oauth-cabundle ConfigMap. Both approaches result in the CA being trusted globally.

Insecure setting

You should only use this setting for testing and not in a production environment.

You can disable certificate validation between Kiali and the remote OAuth server(s) by setting insecure_skip_verify_tls to true in the Kiali CR:

spec:
  auth:
    openshift:
      insecure_skip_verify_tls: true

2.1.5 - Token strategy

Access Kiali requiring a Kubernetes ServiceAccount token.

Introduction

The token authentication strategy allows a user to login to Kiali using the token of a Kubernetes ServiceAccount. This is similar to the login view of Kubernetes Dashboard.

The token strategy supports namespace access control.

The token strategy is only supported for single cluster.

Set-up

Since token is the default strategy when deploying Kiali in Kubernetes, you shouldn’t need to configure anything, unless your cluster is OpenShift. If you want to be verbose or if you need to enable the token strategy in OpenShift, use the following configuration in the Kiali CR:

spec:
  auth:
    strategy: token

The token strategy doesn’t have any additional configuration other than the session expiration time.

2.1.6 - Session options

Session timeout and signing key configuration

There are two settings that are available for the user’s session. The first one is the session expiration time, which is only applicable to token and header authentication strategies:

spec:
  login_token:
    # By default, users session expires in 24 hours.
    expiration_seconds: 86400

The session expiration time is the amount of time before the user is asked to extend his session by another cycle. It does not matter if the user is actively using Kiali, the user will be asked if the session should be extended.

The second available option is the signing key configuration, which is unset by default, meaning that a random 16-character signing key will be generated and stored to a secret named kiali-signing-key, in Kiali’s installation namespace:

spec:
  login_token:
    # By default, create a random signing key and store it in
    # a secret named "kiali-signing-key".
    signing_key: ""

If the secret already exists (which may mean a previous Kiali installation was present), then the secret is reused.

The signing key is used on security sensitive data. For example, one of the usages is to sign HTTP cookies related to the user session to prevent session forgery.

If you need to set a custom fixed key, you can pre-create or modify the kiali-signing-key secret:

apiVersion: v1
kind: Secret
metadata:
  namespace: "kiali-installation-namespace"
  name: kiali-signing-key
type: Opaque
data:
  key: "<your signing key encoded in base64>"

The signing key must be 16, 24 or 32 bytes length. Otherwise, Kiali will fail to start.

If you prefer a different secret name for the signing key and/or a different key-value pair of the secret, you can specify your preferred names in the Kiali CR:

spec:
  login_token:
    signing_key: "secret:<secretName>:<secretDataKey>"

It is possible to specify the signing key directly in the Kiali CR, in the spec.login_token.signing_key attribute. However, this should be only for testing purposes. The signing key is sensitive and should be treated like a password that must be protected.

2.2 - Console Customization

Default selections, find and hide presets and custom metric aggregations.

Custom metric aggregations

The inbound and outbound metric pages, in the Metrics settings drop-down, provides an opinionated set of groupings that work both for filtering out metric data that does not match the selection and for aggregating data into series. Each option is backed by a label on the collected Istio telemetry.

It is possible to add custom aggregations, like in the following example:

spec:
  kiali_feature_flags:
    ui_defaults:
      metrics_inbound:
        aggregations:
        - display_name: Istio Network
          label: topology_istio_io_network
        - display_name: Istio Revision
          label: istio_io_rev
      metrics_outbound:
        aggregations:
        - display_name: Istio Revision
          label: istio_io_rev

Notice that custom aggregations for inbound and outbound metrics are defined separately.

You can find some screenshots in Kiali v1.40 feature update blog post.

Default metrics duration and refresh interval

Most Kiali pages show metrics per refresh and refresh interval drop-downs. These are located at the top-right of the page.

Metrics per refresh specifies the time range back from the current instant to fetch metrics and/or distributed tracing data. Also known as the query duration. By default, a 1-minute time range is selected, or the lowest valid setting.

Refresh interval specifies how often Kiali will automatically refresh the data shown. By default, Kiali refreshes data every 60 seconds.

spec:
  kiali_feature_flags:
    ui_defaults:
      # Valid values: 1m, 2m, 5m, 10m, 30m, 1h, 3h, 6h, 12h, 1d, 7d, 30d
      metrics_per_refresh: "1m"

      # Valid values: pause, manual, 10s, 15s, 30s, 1m, 5m, 15m
      refresh_interval: "15s"

User selections won’t persist a reload.

Default namespace selection

By default, when Kiali is accessed by the first time, on most Kiali pages users will need to use the namespace drop-down to choose namespaces they want to view data from. The selection will be persisted on reloads.

However, it is possible to configure a predefined selection of namespaces, like in the following example:

spec:
  kiali_feature_flags:
    ui_defaults:
      namespaces:
      - istio-system
      - bookinfo

Namespace selection will reset to the predefined set on reloads. Also, if for some reason a namespace becomes deleted, Kiali will simply ignore it from the list.

Graph find and hide presets

In the toolbar of the topology graph, the Find and Hide textboxes can be configured with presets for your most used criteria. You can find screenshots and a brief description of this feature in the feature update blog post for versions 1.31 to 1.33.

The following are the default presets:

spec:
  kiali_feature_flags:
    ui_defaults:
      graph:
        find_options:
        - auto_select: false  
          description: "Find: slow edges (> 1s)"
          expression: "rt > 1000"
        - auto_select: false
          description: "Find: unhealthy nodes"
          expression:  "! healthy"
        - auto_select: false
          description: "Find: unknown nodes"
          expression:  "name = unknown"
        hide_options:
        - auto_select: false
          description: "Hide: healthy nodes"
          expression: "healthy"
        - auto_select: false
          description: "Hide: unknown nodes"
          expression:  "name = unknown"

Hopefully, the attributes to configure this feature are self-explanatory.

To enable one of the configurations by default, it is possible to set auto_select to true, available for find and hide settings.

Note that by providing your own presets, you will be overriding the default configuration. Make sure to include any default presets that you need in case you provide your own configuration.

Graph default traffic rates

Traffic rates in the graph are fetched from Istio telemetry and there are several metric sources that can be used.

In the graph page, you can select the traffic rate metrics using the Traffic drop-down (next to the Namespaces drop-down). By default, Requests is selected for GRPC and HTTP protocols, and Sent bytes is selected for the TCP protocol, but you can change the default selection:

spec:
  kiali_feature_flags:
    ui_defaults:
      graph:
        traffic:
          grpc: "requests" # Valid values: none, requests, sent, received and total
          http: "requests" # Valid values: none and requests
          tcp:  "sent"     # Valid values: none, sent, received and total

Note that only requests provide response codes and will allow health to be calculated. Also, the resulting topology graph may be different for each source.

2.3 - Custom Dashboards

Configuring additional, non-default dashboards.

Custom Dashboards require some configuration to work properly.

Declaring a custom dashboard

When installing Kiali, you define your own custom dashboards in the Kiali CR spec.custom_dashboards field. Here’s an example of what it looks like:

custom_dashboards:
- name: vertx-custom
  title: Vert.x Metrics
  runtime: Vert.x
  discoverOn: "vertx_http_server_connections"
  items:
  - chart:
      name: "Server response time"
      unit: "seconds"
      spans: 6
      metrics:
      - metricName: "vertx_http_server_responseTime_seconds"
        displayName: "Server response time"
      dataType: "histogram"
      aggregations:
      - label: "path"
        displayName: "Path"
      - label: "method"
        displayName: "Method"
  - chart:
      name: "Server active connections"
      unit: ""
      spans: 6
      metricName: "vertx_http_server_connections"
      dataType: "raw"
  - include: "micrometer-1.1-jvm"
  externalLinks:
  - name: "My custom Grafana dashboard"
    type: "grafana"
    variables:
      app: var-app
      namespace: var-namespace
      version: var-version

The name field corresponds to what you can set in the pod annotation kiali.io/dashboards.

The rest of the field definitions are:

runtime: optional, name of the related runtime. It will be displayed on the corresponding Workload Details page. If omitted no name is displayed.
title: dashboard title, displayed as a tab in Application or Workloads Details
discoverOn: metric name to match for auto-discovery. If omitted, the dashboard won’t be discovered automatically, but can still be used via pods annotation.
items: a list of items, that can be either chart, to define a new chart, or include to reference another dashboard
- chart: new chart object
  - name: name of the chart
  - chartType: type of the chart, can be one of line (default), area, bar or scatter
  - unit: unit for Y-axis. Free-text field to provide any unit suffix. It can eventually be scaled on display. See specific section below.
  - unitScale: in case the unit needs to be scaled by some factor, set that factor here. For instance, if your data is in milliseconds, set 0.001 as scale and seconds as unit.
  - spans: number of “spans” taken by the chart, from 1 to 12, using bootstrap convention
  - metrics: a list of metrics to display on this single chart:
    - metricName: the metric name in Prometheus
    - displayName: name to display on chart
  - dataType: type of data to be displayed in the chart. Can be one of raw, rate or histogram. Raw data will be queried without transformation. Rate data will be queried using promQL rate() function. And histogram with histogram_quantile() function.
  - min and max: domain for Y-values. When unset, charts implementations should usually automatically adapt the domain with the displayed data.
  - xAxis: type of the X-axis, can be one of time (default) or series. When set to series, only one datapoint per series will be displayed, and the chart type then defaults to bar.
  - aggregator: defines how the time-series are aggregated when several are returned for a given metric and label set. For example, if a Deployment creates a ReplicaSet of several Pods, you will have at least one time-series per Pod. Since Kiali shows the dashboards at the workload (ReplicaSet) level or at the application level, they will have to be aggregated. This field can be used to fix the aggregator, with values such as sum or avg (full list available in Prometheus documentation). However, if omitted the aggregator will default to sum and can be changed from the dashboard UI.
  - aggregations: list of labels eligible for aggregations / groupings (they will be displayed in Kiali through a dropdown list)
    - label: Prometheus label name
    - displayName: name to display in Kiali
    - singleSelection: boolean flag to switch between single-selection and multi-selection modes on the values of this label. Defaults to false.
  - groupLabels: a list of Prometheus labels to be used for grouping. Similar to aggregations, except this grouping will be always turned on.
  - sortLabel: Prometheus label to be used for the metrics display order.
  - sortLabelParseAs: set to int if sortLabel needs to be parsed and compared as an integer instead of string.
- include: to include another dashboard, or a specific chart from another dashboard. Typically used to compose with generic dashboards such as the ones about MicroProfile Metrics or Micrometer-based JVM metrics. To reference a full dashboard, set the name of that dashboard. To reference a specific chart of another dashboard, set the name of the dashboard followed by $ and the name of the chart (ex: include: "microprofile-1.1$Thread count").
externalLinks: a list of related external links (e.g. to Grafana dashboards)
- name: name of the related dashboard in the external system (e.g. name of a Grafana dashboard)
- type: link type, currently only grafana is allowed
- variables: a set of variables that can be injected in the URL. For instance, with something like namespace: var-namespace and app: var-app, an URL to a Grafana dashboard that manages namespace and app variables would look like: http://grafana-server:3000/d/xyz/my-grafana-dashboard?var-namespace=some-namespace&var-app=some-app. The available variables in this context are namespace, app and version.

Label clash: you should try to avoid labels clashes within a dashboard.

In Kiali, labels for grouping are aggregated in the top toolbar, so if the same label refers to different things depending on the metric, you wouldn’t be able to distinguish them in the UI. For that reason, ideally, labels should not have too generic names in Prometheus. For instance labels named “id” for both memory spaces and buffer pools would better be named “space_id” and “pool_id”. If you have control on label names, it’s an important aspect to take into consideration. Else, it is up to you to organize dashboards with that in mind, eventually splitting them into smaller ones to resolve clashes.

Modifying Built-in Dashboards: If you want to modify or remove a built-in dashboard, you can set its new definition in the Kiali CR’s spec.custom_dashboards. Simply define a custom dashboard with the same name as the built-in dashboard. To remove a built-in dashboard so Kiali doesn’t use it, simply define a custom dashboard by defining only its name with no other data associated with it (e.g. in spec.custom_dashboards you add a list item that has - name: <name of built-in dashboard to remove>.

Dashboard scope

The custom dashboards defined in the Kiali CR are available for all workloads in all namespaces.

Additionally, new custom dashboards can be created for a given namespace or workload, using the dashboards.kiali.io/templates annotation.

This is an example where a “Custom Envoy” dashboard will be available for all applications and workloads for the default namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: default
  annotations:
    dashboards.kiali.io/templates: |
      - name: custom_envoy
        title: Custom Envoy
        discoverOn: "envoy_server_uptime"
        items:
          - chart:
              name: "Pods uptime"
              spans: 12
              metricName: "envoy_server_uptime"
              dataType: "raw"

This other example will create an additional “Active Listeners” dashboard only on details-v1 workload:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: details-v1
  labels:
    app: details
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: details
      version: v1
  template:
    metadata:
      labels:
        app: details
        version: v1
      annotations:
        dashboards.kiali.io/templates: |
          - name: envoy_listeners
            title: Active Listeners
            discoverOn: "envoy_listener_manager_total_listeners_active"
            items:
              - chart:
                  name: "Total Listeners"
                  spans: 12
                  metricName: "envoy_listener_manager_total_listeners_active"
                  dataType: "raw"          
    spec:
      serviceAccountName: bookinfo-details
      containers:
      - name: details
        image: docker.io/istio/examples-bookinfo-details-v1:1.16.2
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9080
        securityContext:
          runAsUser: 1000

Units

Some units are recognized in Kiali and scaled appropriately when displayed on charts:

unit: "seconds" can be scaled down to ms, µs, etc.
unit: "bytes-si" and unit: "bitrate-si" can be scaled up to kB, MB (etc.) using SI / metric system. The aliases unit: "bytes" and unit: "bitrate" can be used instead.
unit: "bytes-iec" and unit: "bitrate-iec" can be scaled up to KiB, MiB (etc.) using IEC standard / IEEE 1541-2002 (scale by powers of 2).

Other units will fall into the default case and be scaled using SI standard. For instance, unit: "m" for meter can be scaled up to km.

Prometheus Configuration

Kiali custom dashboards work exclusively with Prometheus, so it must be configured correctly to pull your application metrics.

If you are using the demo Istio installation with addons, your Prometheus instance should already be correctly configured and you can skip to the next section; with the exception of Istio 1.6.x where you need customize the ConfigMap, or install Istio with the flag --set meshConfig.enablePrometheusMerge=true.

Using another Prometheus instance

You can use a different instance of Prometheus for these metrics, as opposed to Istio metrics. This second Prometheus instance can be configured from the Kiali CR when using the Kiali operator, or ConfigMap otherwise:

# ...
external_services:
  custom_dashboards:
    prometheus:
      url: URL_TO_PROMETHEUS_SERVER_FOR_CUSTOM_DASHBOARDS
    namespace_label: kubernetes_namespace
  prometheus:
    url: URL_TO_PROMETHEUS_SERVER_FOR_ISTIO_METRICS
# ...

For more details on this configuration, such as Prometheus authentication options, check the Kiali CR Reference page.

You must make sure that this Prometheus instance is correctly configured to scrape your application pods and generates labels that Kiali will understand. Please refer to this documentation to setup the kubernetes_sd_config section. As a reference, here is how it is configured in Istio.

It is important to preserve label mapping, so that Kiali can filter by app and version, and to have the same namespace label as defined per Kiali config. Here’s a relabel_configs that allows this:

      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace

Pod Annotations and Auto-discovery

Application pods must be annotated for the Prometheus scraper, for example, within a Deployment definition:

spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"

prometheus.io/scrape tells Prometheus to fetch these metrics or not
prometheus.io/port is the port under which metrics are exposed
prometheus.io/path is the endpoint path where metrics are exposed, default is /metrics

Kiali will try to discover automatically dashboards that are relevant for a given Application or Workload. To do so, it reads their metrics and try to match them with the discoverOn field defined on dashboards.

But if you can’t rely on automatic discovery, you can explicitly annotate the pods to associate them with Kiali dashboards.

spec:
  template:
    metadata:
      annotations:
        # (prometheus annotations...)
        kiali.io/dashboards: vertx-server

kiali.io/dashboards is a comma-separated list of dashboard names that Kiali will look for. Each name in the list must match the name of a built-in dashboard or the name of a custom dashboard as defined in the Kial CR’s spec.custom_dashboards.

2.4 - Debugging Kiali

How to debug the Kiali Server and the Kiali Operator using logs, metrics, traces, and profiler.

Logs

The most basic way of debugging the internals of Kiali is to examine its log messages. A typical way of examining the log messages is via:

kubectl logs -n istio-system deployment/kiali

Each log message is logged at a specific level. The different log levels are trace, debug, info, warn, error, and fatal. By default, log messages at info level and higher will be logged. If you want to see more verbose logs, set the log level to debug or trace (trace is the most verbose setting and will make the log output very “noisy”). You set the log level in the Kiali CR:

spec:
  deployment:
    logger:
      log_level: debug

By default, Kiali will log messages in a basic text format. You can have Kiali log messages in JSON format, which can sometimes make reading, querying, and filtering the logs easier:

spec:
  deployment:
    logger:
      log_format: json

Filtering logs

You may want to pinpoint specific log messages in the Kiali logs. The following are different commands and expressions you can use in order to filter the logs to help expose messages you are most interested in. There are two sets of commands/expressions documented below: one using grep and sed if Kiali is logging its messages in simple text format, and the other using jq if Kiali is logging its messages in JSON format. (Note that jq will format each JSON message into multiple lines to read the JSON easier. Pass the -c option to jq to condense the JSON into one line per log message - it may be harder to read, but will reduce the amount of lines considerably.)

Note that all commands/expressions below should have the Kiali logs piped into its stdin. Usually this means using kubectl to get the logs from Kiali and pipe them, like this:

kubectl logs -n istio-system deployments/kiali | <...commands/expressions here...>

Remove log levels

If you have enabled the log level of “trace”, the Kiali logs will contain a large amount of messages. If you have a hard time sifting through all of those messages, rather than reconfigure Kiali with a different log level you can simply filter out the trace messages.


text:	`grep -v ' TRC '`
json:	`jq -R 'fromjson? \| select(.level != "trace")'`

If you want to remove both “trace” and “debug” level messages (leaving “info” and higher priority messages):


text:	`grep -vE ' (TRC\|DBG) '`
json:	`jq -R 'fromjson? \| select(.level != "trace" and .level != "debug")'`

Show logs for only a single request

Some log messages are associated with a single, specific request. You can obtain all the logs associated with any specific request given a request ID. To determine which request ID you want to use as a filter, you first find all the request IDs in the logs:


text:	`grep -o 'request-id=[^ ]*' \| sed 's/^request-id=//' \| sort -u`
json:	`jq -rR 'fromjson? \| select(has("request-id")) \| ."request-id"'`

Pick a request ID, and use it to retrieve all the logs associated with that request:


text:	`grep 'request-id=abc123'`
json:	`jq -rR 'fromjson? \| select(."request-id" == "abc123")'`

But just having a list of every request ID is likely not enough. You most likely want to look at the logs for requests for a specific Kiali API (like the graph generation API). To see all the different routes into the Kiali API server that were requested, you can get their route names like this:


text:	`grep -o 'route=[^ ]*' \| sed 's/^route=//' \| sort -u`
json:	`jq -rR 'fromjson? \| select(.route) \| .route' \| sort -u`

The GraphNamespaces route is an important one - it is the API that is used to generate the main Kiali graphs. If you want to see all the IDs for all requests to this API, you can do this:


text:	`grep 'route=GraphNamespaces' \| grep -o 'request-id=[^ ]*' \| sed 's/^request-id=//' \| sort -u`
json:	`jq -rR 'fromjson? \| select(.route == "GraphNamespaces") \| .["request-id"]' \| sort -u`

Now you can take one of those request IDs and obtain logs for it (as explained earlier) to see all the logs for that request to generate a graph.

Some routes that may be of interest are:

AggregateMetrics: aggregate metrics for a given resource
AppMetrics, ServiceMetrics, WorkloadMetrics: gets metrics for a given resource
AppSpans, ServiceSpans, WorkloadSpans: gets tracing spans for a given resource
AppTraces, ServiceTraces, WorkloadTraces: gets traces for a given resource
Authenticate: authenticates users
ClustersHealth: gets the health data for all resources in a namespace within a single cluster
ConfigValidationSummary: gets the validation summary for all resources in given namespaces
ControlPlaneMetrics: gets metrics for a single control plane
GraphAggregate: generates a node detail graph
GraphNamespaces: generates a namespaces graph
IstioConfigDetails: gets the content of an Istio configuration resource
IstioConfigList: gets the list of Istio configuration resources in a namespace
MeshGraph: generates a mesh graph
NamespaceList: gets the list of available namespaces
NamespaceMetrics: gets metrics for a single namespace
NamespaceValidationSummary: gets the validation summary for all resources in a given namespace
TracesDetails: gets detailed information on a specific trace

Show logs of processing times

Kiali collects metrics of its internal systems to track its performance (see the next section, “Metrics”). Many of these metrics use a timer to measure the duration of time that Kiali takes to process some unit of work (for example, the time it takes to generate a graph). Kiali will log these duration times as well as export them to Prometheus. To see what metric timers Kiali is tracking internally, you can do this:


text:	`grep -o 'timer=[^ ]*' \| sed 's/^timer=//' \| sort -u`
json:	`jq -rR 'fromjson? \| select(.timer) \| .timer' \| sort -u`

Note that Kiali will not log times that are under 3 seconds since those are deemed uninteresting and logging them will make the logs “noisy”. Prometheus will still collect those metrics, so they are still being recorded.

One timer is especially useful - the timer named “GraphGenerationTime”. You can query the log for all the graph generation times like this:


text:	`grep 'timer=GraphGenerationTime'`
json:	`jq -R 'fromjson? \| select(.timer == "GraphGenerationTime")'`

Each log message contains a duration attribute - this is the amount of time it took to generate the graph (or parts of the graph). Look at the additional attributes for details on what the timer measured.

Some timers that may be of interest are:

APIProcessingTime: The time it takes to process an API request in its entirety
CheckerProcessingTime: The time it takes to run a specific validation checker
GraphAppenderTime: The time it takes for an appender to decorate a graph
GraphGenerationTime: The time it takes to generate a full graph
PrometheusProcessingTime: The time it takes to run Prometheus queries
SingleValidationProcessingTime: The time it takes to validate an Istio configuration resource
TracingProcessingTime: The time it takes to run Tracing queries
ValidationProcessingTime: The time it takes to validate a set of Istio configuration resources

Metrics

Kiali has a metrics endpoint that can be enabled, allowing Prometheus to collect Kiali metrics. You can then use Prometheus (or Kiali itself) to examine and analyze these metrics.

The metrics server uses the same TLS configuration as the main Kiali server. When TLS is enabled (via identity.cert_file and identity.private_key_file), the metrics endpoint requires HTTPS and enforces the same TLS policy (versions and cipher suites). When TLS is not configured, the metrics endpoint uses plain HTTP.

Configuring Prometheus to Scrape Kiali Metrics

When Kiali’s metrics endpoint is enabled, the Kiali pod includes standard prometheus.io/* annotations that many Prometheus deployments use for auto-discovery:

prometheus.io/scrape: "true"
prometheus.io/port: "<metrics-port>" (default: 9090)
prometheus.io/scheme: "http" or "https" (depending on TLS configuration)

For HTTP (no TLS configured): If your Prometheus setup is configured to honor prometheus.io/* pod annotations (for example, the standard kubernetes-pods scrape job), it can scrape Kiali metrics without additional configuration. If you’re using Prometheus Operator and do not have a pod-annotation scrape job, create a PodMonitor or ServiceMonitor instead.

For HTTPS (TLS configured): When TLS is enabled, Prometheus needs additional configuration to properly scrape the metrics endpoint. This is particularly relevant on OpenShift where Kiali automatically uses service-serving certificates.

The challenge is that service-serving certificates are valid for the Service DNS name (e.g., kiali.istio-system.svc), not for pod IP addresses. When Prometheus scrapes pods directly by IP address (as the standard kubernetes-pods job does), TLS certificate validation fails. The solutions below address this by ensuring Prometheus uses the Service DNS name for TLS validation, even when the actual scrape target is a pod IP.

Option 1: ServiceMonitor (Prometheus Operator)

If you’re using the Prometheus Operator, create a ServiceMonitor that scrapes through the Kiali Service (where the certificate is valid):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kiali
  namespace: istio-system  # Or your Kiali namespace
spec:
  endpoints:
  - port: tcp-metrics
    scheme: https
    tlsConfig:
      # For OpenShift cluster monitoring, the service CA is available at this path
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      # serverName must match the certificate's SAN
      serverName: kiali.istio-system.svc
  namespaceSelector:
    matchNames:
    - istio-system  # Or your Kiali namespace
  selector:
    matchLabels:
      app.kubernetes.io/name: kiali

CA File Path Note

The caFile path shown above (/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt) is specific to OpenShift’s built-in cluster monitoring Prometheus. If you’re using a different Prometheus deployment, you’ll need to:

Mount the OpenShift service CA into your Prometheus pod
Adjust the caFile path accordingly

To get the service CA, create a ConfigMap with the annotation service.beta.openshift.io/inject-cabundle: "true" and OpenShift will automatically populate it with the service CA certificate.

Option 2: Static Scrape Configuration

For non-Operator Prometheus deployments, add a scrape job to your Prometheus configuration file (prometheus.yml) that targets the Kiali Service:

scrape_configs:
- job_name: 'kiali'
  scheme: https
  tls_config:
    ca_file: /path/to/service-ca.crt
    server_name: kiali.istio-system.svc
  static_configs:
  - targets:
    - kiali.istio-system.svc:9090

Option 3: Skip Certificate Verification (Not Recommended)

For testing purposes only, you can configure Prometheus to skip certificate verification. In a ServiceMonitor resource, add insecureSkipVerify to the tlsConfig:

tlsConfig:
  insecureSkipVerify: true

Or in your Prometheus configuration file (prometheus.yml), add insecure_skip_verify to the tls_config:

tls_config:
  insecure_skip_verify: true

Security Warning: Skipping certificate verification defeats the purpose of TLS and makes your metrics collection vulnerable to man-in-the-middle attacks. Only use this for testing, never in production.

Viewing and Analyzing Kiali Metrics

To see the metrics that are currently being emitted by Kiali, you can run the following command which simply parses the metrics endpoint data and outputs all the metrics it finds:

# For HTTP (when TLS not configured):
curl -s http://<KIALI_HOSTNAME>:9090/metrics | grep -o '^# HELP kiali_.*' | awk '{print $3}'

# For HTTPS (when TLS configured):
curl -s -k https://<KIALI_HOSTNAME>:9090/metrics | grep -o '^# HELP kiali_.*' | awk '{print $3}'

The Kiali UI itself graphs some of these metrics. In the Kiali UI, navigate to the Kiali workload and select the “Kiali Internal Metrics” tab:

Kiali metrics

Use the Kiali UI to analyze these metrics in the same way that you would analyze your application metrics. (Note that “Tracing processing duration” will be empty if you have not integrated your Tracing backend with Kiali).

Because these are metrics collected by Promtheus, you can analyze Kiali’s metrics through Prometheus queries and the Prometheus UI. Some of the more interesting Prometheus queries are listed below.

API routes
- Average latency per API route: rate(kiali_api_processing_duration_seconds_sum[5m]) / rate(kiali_api_processing_duration_seconds_count[5m])
- Request rate per API route: rate(kiali_api_processing_duration_seconds_count[5m])
- 95th percentile latency per API route: histogram_quantile(0.95, rate(kiali_api_processing_duration_seconds_bucket[5m]))
- Alert: 95th Percentile Latency > 5s: histogram_quantile(0.95, rate(kiali_api_processing_duration_seconds_bucket[5m])) > 5s
- Top 5 slowest API routes (avg latency over 5m): topk(5, rate(kiali_api_processing_duration_seconds_sum[5m]) / rate(kiali_api_processing_duration_seconds_count[5m]))
Graph
- Use the same queries as “API routes” but with the metric kiali_graph_generation_duration_seconds_[count,sum,bucket] to get information about the graph generator.
- Use the same queries as “API routes” but with the metric kiali_graph_appender_duration_seconds_[count,sum,bucket] to get information about the graph generator appenders. This helps analyze the performance of the individual appenders that are used to build and decorate the graphs.
Tracing
- Use the same queries as “API routes” but with the metric kiali_tracing_processing_duration_seconds_[count,sum,bucket] to get information about the groups of different Tracing queries. This helps analyze the performance of the Kiali/Tracing integration.
Metrics
- Use the same queries as “API routes” but with the metric kiali_prometheus_processing_duration_seconds_[count,sum,bucket] to get information about the different groups of Prometheus queries. This help analyze the performance of the Kiali/Prometheus integration.
Validations
- Use the same queries as “API routes” but with the metric kiali_validation_processing_duration_seconds_[count,sum,bucket] to get information about Istio configuration validation. This helps analyze the performance of Istio configuration validation as a whole.
- Use the same queries as “API routes” but with the metric kiali_checker_processing_duration_seconds_[count,sum,bucket] to get information about the different validation checkers. This helps analyze the performance of the individual checkers performed during the Istio configuration validation.
Failures
- Failures per API route (in the past hour): sum by (route) (rate(kiali_api_failures_total[1h]))
- Error rate percentage per API route: 100 * sum by (route) (rate(kiali_api_failures_total[1h])) / sum by (route) (rate(kiali_api_processing_duration_seconds_count[1h]))
- The number of failures per API route in the past 30 minutes: increase(kiali_api_failures_total[30m])
- The top 5 API routes with failures in the past 30 minutes topk(5, increase(kiali_api_failures_total[30m]))

Tracing

Kiali provides the ability to emit debugging traces to the distributed tracing platform, Jaeger or Grafana Tempo.

From Kiali 1.79, the feature of Kiali emitting tracing data into Jaeger format has been removed.

The traces can be sent in HTTP, HTTPS or gRPC protocol. It is also possible to use TLS. When tls_enabled is set to true, one of the options skip_verify or ca_name should be specified.

The traces are sent in OTel format, indicated in the collector_type setting.

server:
  observability:
    tracing:
      collector_url: "jaeger-collector.istio-system:4317"
      enabled: true
      otel:
        protocol: "grpc"
        tls_enabled: true
        skip_verify: false
        ca_name: "/tls.crt"

Usually, the tracing platforms expose different ports to collect traces in distinct formats and protocols:

The Jaeger collector accepts OpenTelemetry Protocol over HTTP (4318) and gRPC (4317).
The Grafana Tempo distributor accepts OpenTelemetry Protocol over HTTP (4318) and gRPC (4317). It can be configured to accept TLS.

The traces emitted by Kiali can be searched in the Kiali workload:

Kiali traces

Tracing Integration

Sometimes integration with tracing can be complex, but since version 2.11, a tool is available to help with the configuration. It’s available on the mesh page, by clicking on the tracing node. From there, under “Configuration Tester,” it will show 2 different features:

Tracing tool

Discovery tool
Configuration tester

The discovery feature will show possible valid configurations that might work based on the tracing open ports. It’s important that at least the URL is properly defined - external_services.tracing.internal_url if it’s inside the cluster, or external_services.tracing.external_url if it’s outside.

The logs section will provide more insights about the tests done, the open ports, the errors found, that can help to troubleshoot in case of more complex scenarios, like urls with tenants or https.

Tracing discovery

The configuration tester allows to test a specific configuration without having to edit the config map and wait for the Kiali pod to be restarted. Please note that the configuration will not be saved permanently.

Tracing configuration tester

Profiler

The Kial Server is integrated with the Go pprof profiler. By default, the integration is disabled. If you want the Kiali Server to generate profile reports, enable it in the Kiali CR:

spec:
  server:
    profiler:
      enabled: true

Once the profiler is enabled, you can access the profile reports by pointing your browser to the <kiali-root-url>/debug/pprof endpoint and click the link to the profile report you want. You can obtain a specific profile report by appending the name of the profile to the URL. For example, if your Kiali Server is found at the root URL of “http://localhost:20001/kiali”, and you want the heap profile report, the URL http://localhost:20001/kiali/debug/pprof/heap will provide the data for that report.

Go provides a pprof tool that you can then use to visualize the profile report. This allows you to analyze the data to help find potential problems in the Kiali Server itself. For example, you can start the pprof UI on port 8080 which allows you to see the profile data in your browser:

go tool pprof -http :8080 http://localhost:20001/kiali/debug/pprof/heap

You can download a profile report and store it as a file for later analysis. For example:

curl -o pprof.txt http://localhost:20001/kiali/debug/pprof/heap

You can then examine the data found in the profile report:

go tool pprof -http :8080 ./pprof.txt

Your browser will be opened to http://localhost:8080/ui which allows you to see the profile report.

Kiali CR Status

When you install the Kiali Server via the Kiali Operator, you do so by creating a Kiali CR. One quick way to debug the status of a Kiali Server installation is to look at the Kiali CR’s status field (e.g. kubectl get kiali --all-namespaces -o jsonpath='{..status}'). The operator will report any installation errors within this Kiali CR status. If the Kiali Server fails to install, always check the Kiali CR status field first because in many instances you will find an error message there that can provide clear guidance on what to do next.

Debugging the Kiali Operator

The Kiali Operator is built on the Ansible Operator SDK. It has multiple independent logging controls that each affect a different subsystem. They are listed here in order of how commonly they are needed for debugging.

Ansible Playbook Verbosity

This controls how verbose the Ansible playbook output is during reconciliation (equivalent to the -v, -vv, -vvv flags passed to ansible-runner). This is useful for debugging issues within the Ansible playbook logic itself, such as seeing the values of variables or the details of each task.

Set the ansible.sdk.operatorframework.io/verbosity annotation on the Kiali or OSSMConsole CR. The value is an integer from 0 (default, no extra verbosity) to 5 (most verbose):

metadata:
  annotations:
    ansible.sdk.operatorframework.io/verbosity: "1"

See the Ansible Operator SDK advanced options documentation for more details on this.

Ansible Debug Logs

When set to true, this causes the operator to print the full ansible-runner stdout after each reconciliation completes. This is useful for seeing the complete Ansible output including all task results.

When Installed via Helm

Set the debug.enabled value:

helm upgrade kiali-operator kiali/kiali-operator --set debug.enabled=true

When Installed via OLM

Add the environment variable to the Subscription’s spec.config.env:

spec:
  config:
    env:
    - name: ANSIBLE_DEBUG_LOGS
      value: "true"

Go Structured Log Level

--zap-log-level controls the log level of the Go-based controller-runtime framework that manages the operator’s reconciliation loop. This is the setting needed for diagnosing why reconciliation is being triggered, which is typically only necessary when investigating unexpected or periodic reconciliations.

The supported levels are:

info: Logs startup information, controller events, and proxy cache reads.
debug: Additionally logs the event handler messages that tell you exactly what event triggered each reconciliation.

When set to debug, the operator will emit a log message like the following immediately before each reconciliation:

{"level":"debug","ts":"2026-02-10T20:06:23Z","logger":"ansible.handler","msg":"Metrics handler event","Event type":"Update","GroupVersionKind":"kiali.io/v1alpha1, Kind=Kiali","Name":"kiali","Namespace":"kiali-operator"}

The key fields in this message are:

Event type: One of Create, Update, Delete, or Generic - tells you what kind of change triggered the reconciliation.
GroupVersionKind: Which resource type changed (e.g. kiali.io/v1alpha1, Kind=Kiali or kiali.io/v1alpha1, Kind=OSSMConsole).
Name / Namespace: Which specific CR instance was affected.

To find these messages in the logs:

kubectl logs deployment/kiali-operator -n <operator-namespace> | grep 'ansible.handler'

The Go log level is controlled by the --zap-log-level container argument on the operator deployment. The method for changing this depends on how the operator was installed.

When Installed via Helm

When the operator is installed via Helm, you can patch the Deployment directly since there is no OLM to revert the change:

kubectl patch deployment kiali-operator -n <operator-namespace> --type='json' \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/args/0","value":"--zap-log-level=debug"}]'

To revert back to normal logging, just run that command again with the --zap-log-level set back to info.

When Installed via OLM

When the operator is installed via OLM (Operator Lifecycle Manager), you cannot patch the Deployment directly because OLM will revert the change. The OLM Subscription config also does not support overriding container args. Instead, you must patch the ClusterServiceVersion (CSV), which OLM treats as the authoritative source for the deployment spec.

To enable debug logging:

kubectl patch csv $(kubectl get csv -n <operator-namespace> --no-headers -o custom-columns=NAME:.metadata.name | grep '^kiali-operator') \
  -n <operator-namespace> --type='json' \
  -p='[{"op":"replace","path":"/spec/install/spec/deployments/0/spec/template/spec/containers/0/args/0","value":"--zap-log-level=debug"}]'

OLM will automatically roll out a new operator pod with the updated args.

To revert back to normal logging, just run that command again with the --zap-log-level set back to info.

On OpenShift, the operator namespace is typically openshift-operators. On vanilla Kubernetes with OLM, it is typically operators.

Ansible Task Profiler

The operator includes an Ansible task profiler that uses the profile_tasks Ansible callback plugin. When enabled, it logs the execution time of each Ansible task to the operator pod’s log output at the end of each reconciliation run. This is useful for identifying slow tasks in the operator’s Ansible playbooks.

When Installed via Helm

Set the debug.enableProfiler value:

helm upgrade kiali-operator kiali/kiali-operator --set debug.enableProfiler=true

When Installed via OLM

Set the ANSIBLE_CONFIG environment variable to the profiler configuration in the Subscription’s spec.config.env:

spec:
  config:
    env:
    - name: ANSIBLE_CONFIG
      value: "/opt/ansible/ansible-profiler.cfg"

To disable the profiler, set the value back to /etc/ansible/ansible.cfg.

Examples

The following are just some examples of how you can use the Kiali signals to help diagnose problems within Kiali itself.

Use log messages to find out what is slow

The examples below assume Kiali is outputting logs in JSON format (spec.deployment.logger.log_format = json). Use grep, sed, and related tools to query logs if Kiali is logging the output as text.

Make sure you turn on trace logging (spec.deployment.logger.log_level = trace) in order to get the log messages needed for this kind of analysis.

Find all the logs that show APIs with long execution times. Because Kiali is not logging times faster than 3 seconds, this query will return all the routes (i.e. the API endpoints) that were 3 seconds or slower:

kubectl logs -n istio-system deployments/kiali | \
  jq -rR 'fromjson? | select(.timer) | .route' | \
  sort -u

Suppose that returned only one route name - GraphNamespaces. This means the main graph page was slow. Let’s examine the logs for a request for that API. We first find the ID of the last request that was made for the GraphNamespaces API:

kubectl logs -n istio-system deployments/kiali | \
  jq -rR 'fromjson? | select(.route == "GraphNamespaces") | .["request-id"]' | tail -n 1

Take the ID string that was output (in this example, it is d0staq6nq35s73b6mdug) and use it to examine the logs for that request only:

kubectl logs -n istio-system deployments/kiali | \
  jq -rR 'fromjson? | select(."request-id" == "d0staq6nq35s73b6mdug")'

To make the output less verbose, we can eliminate some of the message’s attributes that we do not need to see:

kubectl logs -n istio-system deployments/kiali | \
  jq -rR 'fromjson? | select(."request-id" == "d0staq6nq35s73b6mdug") | \
  del(.["level", "route", "route-pattern", "group", "request-id"])'

The output of that command is the log messages, in chronological order, as the request to generate the graph was processed in the Kiali server. Examining timestamps, timer durations, warnings, and other data in these messages can help determine what made the request slow:

{
  "ts": "2025-05-30T15:57:28Z",
  "msg": "Build [versionedApp] graph for [1] namespaces [map[bookinfo:{bookinfo 1m0s false false}]]"
}
{
  "ts": "2025-05-30T15:57:28Z",
  "msg": "Build traffic map for namespace [{bookinfo 1m0s false false}]"
}
{
  "appender": "workloadEntry",
  "ts": "2025-05-30T15:57:28Z",
  "msg": "Running workload entry appender"
}
{
  "appender": "workloadEntry",
  "ts": "2025-05-30T15:57:28Z",
  "msg": "WorkloadEntries found: 0"
}
{
  "appender": "workloadEntry",
  "ts": "2025-05-30T15:57:28Z",
  "msg": "WorkloadEntries found: 0"
}
{
  "appender": "workloadEntry",
  "ts": "2025-05-30T15:57:28Z",
  "msg": "WorkloadEntries found: 0"
}
{
  "appender": "workloadEntry",
  "ts": "2025-05-30T15:57:28Z",
  "msg": "WorkloadEntries found: 0"
}
{
  "appender": "workloadEntry",
  "ts": "2025-05-30T15:57:28Z",
  "msg": "WorkloadEntries found: 0"
}
{
  "appender": "workloadEntry",
  "ts": "2025-05-30T15:57:28Z",
  "msg": "WorkloadEntries found: 0"
}
{
  "appender": "workloadEntry",
  "ts": "2025-05-30T15:57:28Z",
  "msg": "WorkloadEntries found: 0"
}
{
  "appender": "idleNode",
  "namespace": "bookinfo",
  "timer": "GraphAppenderTime",
  "duration": "3.153312011s",
  "ts": "2025-05-30T15:57:31Z",
  "msg": "Namespace graph appender time"
}
{
  "ts": "2025-05-30T15:57:31Z",
  "msg": "Generating config for [common] graph..."
}
{
  "ts": "2025-05-30T15:57:31Z",
  "msg": "Done generating config for [common] graph"
}
{
  "inject-service-nodes": "true",
  "graph-kind": "namespace",
  "graph-type": "versionedApp",
  "timer": "GraphGenerationTime",
  "duration": "3.280609145s",
  "ts": "2025-05-30T15:57:31Z",
  "msg": "Namespace graph generation time"
}
{
  "status-code": "200",
  "timer": "APIProcessingTime",
  "duration": "3.280986943s",
  "ts": "2025-05-30T15:57:31Z",
  "msg": "API processing time"
}

Examining those log messages of a single request to generate the graph easily shows that the idleNode graph appender code is very slow (taking over 3 seconds to complete). Thus, the first thing that should be suspected as the cause of the slow graph generation is the code that generates idle nodes in the graph.

Use Prometheus to find out what is slow

You can use Prometheus to look at Kiali’s metrics to help analyze problems. Even though Kiali does not log metric timers that are faster than 3 seconds, those metrics are still stored in Prometheus.

We can look at the metrics that are emitted by the graph appenders to see how they are performing. This shows the top-5 slowest graph appenders for this specific Kiali environment - and here we see the idleNode appender is by far the worst offender. Again, this helps pin-point a cause of slow graph generation - in this case, the idleNode graph appender code:

Prometheus query: topk(5, rate(kiali_graph_appender_duration_seconds_sum[5m]) / rate(kiali_graph_appender_duration_seconds_count[5m]))

Prometheus showing slow appender metrics

If you are not sure what exactly is slowing down the Kiali Server, one of the first things to examine is the duration of time each API takes to complete. Here are the top-2 slowest Kiali APIs for this specific Kiali environment:

Prometheus query: topk(2, rate(kiali_api_processing_duration_seconds_sum[5m]) / rate(kiali_api_processing_duration_seconds_count[5m]))

Prometheus showing the top-2 slowest Kiali APIs

The above shows that the graph generation is slow. So let’s next look at the graph appenders to see if any one of them could be the culprit of the poor performance:

Prometheus query: topk(5, rate(kiali_graph_appender_duration_seconds_sum[5m]) / rate(kiali_graph_appender_duration_seconds_count[5m]))

Prometheus showing the top-5 slowest Kiali graph appenders

In this specific case, it does not look like any one of the appenders is the source of the problem. They all appear to be having issues with poor performance. Since the graph generation relies heavily on querying the Prometheus server, another thing to check is the time it takes for Kiali to query Prometheus:

Prometheus query: topk(5, rate(kiali_prometheus_processing_duration_seconds_sum[5m]) / rate(kiali_prometheus_processing_duration_seconds_count[5m]))

Prometheus processing metrics

Here it looks like Prometheus itself might be the source of the poor performance. All of the Prometheus queries Kiali is requesting are taking over a full second to complete (some are taking as much as 3.5 seconds). At this point, you should check the Prometheus server and the network connection between Kiali and Prometheus as possible causes of the slow Kiali performance. Perhaps Kiali is asking for so much data from Prometheus, Prometheus cannot keep up. Perhaps there is a network outage causing the Kiali requests to Prometheus being slow. But at least in this case we’ve pin-pointed a bottleneck and can narrow our focus when searching for the root cause of the problem.

Use Kiali to find out what is slow

Kiali itself can be used to help find its own internal problems.

Navigate to the Kiali workload, and select the Kiali Internal Metrics tab. In this case, we can see some APIs are very slow due to the high p99 and average values. We can eliminate the tracing integration as the source of the problem because all processing of tracing requests are taking an average of about 20ms to complete. However, the graph generation appears to be very slow, taking an average of between 15 and 30 seconds to complete each request:

Kiali workload metrics

The Kiali UI allows you to expand each mini-chart into a full size chart for easier viewing. You can also display the different metric labels as separate chart lines. In this case, the graph is showing the duration times for the GraphNamespaces and GraphWorkload APIs:

Kiali workload graph metrics

The above metric charts clearly show a performance problem in the graph generation. Because the graph generation code requests many Prometheus queries, one of the next things to check is the performance of the Kiali-Prometheus integration. One fast and easy way to see how the Prometheus queries are performing is to look at the Kiali workload’s Overview tab, specifically, the graph shown on the right side. Look at the edge between the Kiali node and the Prometheus node for indications of problems (the edge label will show you throughput numbers; the color of the edge will indicate request errors):

This traffic data between Kiali and Prometheus is only available if Kiali is located inside the mesh (e.g. Kiali has an Istio sidecar).

Kiali workload overview

2.5 - Istio Environment

Kiali’s default configuration matches settings present in Istio’s installation configuration profiles. If you are customizing your Istio installation some Kiali settings may need to be adjusted. Also, some Istio management features can be enabled or disabled selectively.

Labels and resource names

Istio recommends adding app and version labels to pods to attach this information to telemetry. Kiali relies on correctness of these labels for several features.

In Istio, it is possible to use a different set of labels, like app.kubernetes.io/name and app.kubernetes.io/version, however you must configure Kiali to the labels you are using. By default, Kiali uses Istio’s recommended labels:

spec:
  istio_labels:
    app_label_name: "app"
    version_label_name: "version"

Although Istio lets you use different labels on different pods, Kiali can only use a single set.

For example, Istio lets you use the app label in one pod and the app.kubernetes.io/name in another pod and it will generate telemetry correctly. However, you will have no way to configure Kiali for this case.

Root namespace

Istio’s root namespace is the namespace where you can create some resources to define default Istio configurations and adapt Istio behavior to your environment. For more information on this Istio configuration, check the Istio docs Global Mesh options page and search for “rootNamespace”.

Kiali uses the root namespace for some of the validations of Istio resources. Kiali automatically detects the root namespace for each Istio control plane, so no manual configuration is required. This enables Kiali to properly support environments with multiple Istio control planes, where each control plane may have a different root namespace.

Prior to Kiali v2.16, the root namespace was configured manually via the external_services.istio.root_namespace setting. This configuration option has been removed as Kiali now autodetects the appropriate root namespace for each control plane.

Sidecar injection, canary upgrade management and Istio revisions

Kiali can assist with configuring automatic sidecar injection and migrating workloads from an old Istio version to a newer one using the canary upgrade method. Kiali uses the standard Istio labels to control sidecar injection policy and canary upgrades.

Management of sidecar injection is enabled by default. If you don’t want this feature, you can disable it with the following configuration:

spec:
  kiali_feature_flags:
    istio_injection_action: false

Using Kiali to apply revision labels through the UI during a canary upgrade is turned off by default. You can enable this in Kiali with the following configuration:

spec:
  kiali_feature_flags:
    # Turns on canary upgrade support
    istio_upgrade_action: true

Upgrade actions appear in the Namespaces page actions menu (Kiali >= 2.23).

Canary upgrade action

The progress of the canary upgrade process can be tracked on the mesh page, which displays the namespaces pending migration to the canary Istio control plane.

Canary upgrade process

There following are links to sections of Kiali blogs posts that briefly explains these features:

2.6 - Kiali CR Reference

Reference page for the Kiali CR. The Kiali Operator will watch for resources of this type and install Kiali according to those resources’ configurations.

Example CR

(all values shown here are the defaults unless otherwise noted)

apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali
  annotations:
    ansible.sdk.operatorframework.io/verbosity: "1"
spec:
  additional_display_details:
  - title: "API Documentation"
    annotation: "kiali.io/api-spec"
    icon_annotation: "kiali.io/api-type"

  installation_tag: ""

  version: "default"

  auth:
    strategy: ""
    openid:
      # default: additional_request_params is empty
      additional_request_params:
        openIdReqParam: "openIdReqParamValue"
      # default: allowed_domains is an empty list
      allowed_domains: ["allowed.domain"]
      api_proxy: ""
      api_proxy_ca_data: ""
      api_token: "id_token"
      authentication_timeout: 300
      authorization_endpoint: ""
      client_id: ""
      disable_rbac: false
      http_proxy: ""
      https_proxy: ""
      insecure_skip_verify_tls: false
      issuer_uri: ""
      scopes: ["openid", "profile", "email"]
      username_claim: "sub"
      discovery_override:
        authorization_endpoint: ""
        jwks_uri: ""
        token_endpoint: ""
        userinfo_endpoint: ""
    openshift:
      #redirect_uris:
      #token_inactivity_timeout:
      #token_max_age:

  chat_ai:    
    default_provider: ""
    enabled: false
    providers: []
    store_config:
      enabled: true
      max_cache_memory_mb: 1024      
      reduce_threshold: 15
      reduce_with_ai: false
      
  clustering:
    autodetect_secrets:
      enabled: true
      label: "kiali.io/multiCluster=true"
    clusters: []
    ignore_home_cluster: false
    kiali_urls: []

  # default: custom_dashboards is an empty list
  custom_dashboards:
  - name: "envoy"

  deployment:
    additional_pod_containers_yaml: []
    additional_pod_init_containers_yaml: []
    # default: additional_service_yaml is empty
    additional_service_yaml:
      externalName: "kiali.example.com"
    affinity:
      # default: node is empty
      node:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/e2e-az-name
              operator: In
              values:
              - e2e-az1
              - e2e-az2
      # default: pod is empty
      pod:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S1
          topologyKey: topology.kubernetes.io/zone
      # default: pod_anti is empty
      pod_anti:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: security
                operator: In
                values:
                - S2
            topologyKey: topology.kubernetes.io/zone
    cluster_wide_access: true
    # default: configmap_annotations is empty
    configmap_annotations:
      strategy.spinnaker.io/versioned: "false"
    # default: custom_envs is an empty list
    custom_envs:
    - name: "HTTP_PROXY"
      value: "http://my.proxy.com:1234"
    - name: "NO_PROXY"
      value: "hostname.example.com"
    # default: custom_secrets is an empty list
    custom_secrets:
    - name: "a-custom-secret"
      mount: "/a-custom-secret-path"
      optional: true
    - name: "a-csi-secret"
      mount: "/a-csi-secret-path"
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: kiali-secretprovider
    # default: discovery_selectors is empty
    discovery_selectors:
      default:
      - matchLabels:
          region: north
      - matchExpressions:
        - key: organization
          operator: "In"
          values: ["engineering", "accounting"]
      - matchLabels:
          region: south
        matchExpressions:
        - key: app
          operator: "DoesNotExist"
        - key: domain
          operator: "NotIn"
          values: ["production"]
      overrides:
        myRemoteCluster:
        - matchLabels:
            region: world
        - matchExpressions:
          - key: organization
            operator: "NotIn"
            values: ["marketing"]
        - matchLabels:
            region: antarctica
          matchExpressions:
          - key: app
            operator: "DoesNotExist"
          - key: domain
            operator: "In"
            values: ["staging"]
    dns:
      # default: config is empty
      config:
        options:
        - name: ndots
          value: "1"
      # default: policy is empty
      policy: "ClusterFirst"
    extra_labels: {}
    # default: host_aliases is an empty list
    host_aliases:
    - ip: "192.168.1.100"
      hostnames:
      - "foo.local"
      - "bar.local"
    hpa:
      api_version: ""
      # default: spec is empty
      spec:
        maxReplicas: 2
        minReplicas: 1
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 50
    image_digest: ""
    image_name: ""
    image_pull_policy: "IfNotPresent"
    # default: image_pull_secrets is an empty list
    image_pull_secrets: ["image.pull.secret"]
    image_version: ""
    ingress:
      # default: additional_labels is empty
      additional_labels:
        ingressAdditionalLabel: "ingressAdditionalLabelValue"
      class_name: "nginx"
      # default: enabled is undefined
      enabled: false
      # default: override_yaml is undefined
      override_yaml:
        metadata:
          annotations:
            nginx.ingress.kubernetes.io/secure-backends: "true"
            nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
        spec:
          rules:
          - http:
              paths:
              - path: "/kiali"
                pathType: Prefix
                backend:
                  service:
                    name: "kiali"
                    port:
                      number: 20001
    instance_name: "kiali"
    logger:
      log_level: "info"
      log_format: "text"
      sampler_rate: "1"
      time_field_format: "2006-01-02T15:04:05Z07:00"
    namespace: "istio-system"
    network_policy:
      enabled: true
    # default: node_selector is empty
    node_selector:
      nodeSelector: "nodeSelectorValue"
    # default: pod_annotations is empty
    pod_annotations:
      proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
    # default: pod_labels is empty
    pod_labels:
      sidecar.istio.io/inject: "true"
    priority_class_name: ""
    probes:
      liveness:
        initial_delay_seconds: 5
        period_seconds: 30
      readiness:
        initial_delay_seconds: 5
        period_seconds: 30
      startup:
        failure_threshold: 6
        initial_delay_seconds: 30
        period_seconds: 10
    remote_cluster_resources_only: false
    replicas: 1
    # default: resources is undefined
    resources:
      requests:
        cpu: "10m"
        memory: "64Mi"
      limits:
        memory: "1Gi"
    secret_name: "kiali"
    security_context: {}
    # default: service_annotations is empty
    service_annotations:
      svcAnnotation: "svcAnnotationValue"
    # default: service_type is undefined
    service_type: "NodePort"
    strategy: {}
    # default: tolerations is an empty list
    tolerations:
    - key: "example-key"
      operator: "Exists"
      effect: "NoSchedule"
    topology_spread_constraints: []
    version_label: ""
    view_only_mode: false

  # default: extensions is an empty list
  extensions:
  - enabled: true
    name: "skupper"

  external_services:
    custom_dashboards:
      discovery_auto_threshold: 10
      discovery_enabled: "auto"
      enabled: true
      is_core: false
      namespace_label: "namespace"
      prometheus:
        auth:
          insecure_skip_verify: false
          password: ""
          token: ""
          type: "none"
          use_kiali_token: false
          username: ""
        cache_duration: 7
        cache_enabled: true
        cache_expiration: 300
        # default: custom_headers is empty
        custom_headers:
          customHeader1: "customHeader1Value"
        health_check_url: ""
        is_core: true
        # default: query_scope is empty
        query_scope:
          mesh_id: "mesh-1"
          cluster: "cluster-east"
        thanos_proxy:
          enabled: false
          retention_period: "7d"
          scrape_interval: "30s"
        url: ""
    grafana:
      auth:
        insecure_skip_verify: false
        password: ""
        token: ""
        type: "none"
        use_kiali_token: false
        username: ""
      dashboards:
      - name: "Istio Service Dashboard"
        variables:
          datasource: "var-datasource"
          namespace: "var-namespace"
          service: "var-service"
          version: "var-version"
      - name: "Istio Workload Dashboard"
        variables:
          datasource: "var-datasource"
          namespace: "var-namespace"
          workload: "var-workload"
          version: "var-version"
      - name: "Istio Mesh Dashboard"
      - name: "Istio Control Plane Dashboard"
      - name: "Istio Performance Dashboard"
      - name: "Istio Wasm Extension Dashboard"
      datasource_uid: ""
      enabled: true
      external_url: ""
      health_check_url: ""
      internal_url: "http://grafana.istio-system:3000"
      is_core: false
    istio:
      component_status:
        components: []
        enabled: true
      gateway_api_classes: []
      gateway_api_classes_label_selector: ""
      istio_api_enabled: true
      istio_identity_domain: "svc.cluster.local"
      istiod_polling_interval_seconds: 20
      validation_change_detection_enabled: true
      validation_reconcile_interval: "1m"
    perses:
      auth:
        insecure_skip_verify: false
        password: ""
        type: "none"
        use_kiali_token: false
        username: ""
      dashboards:
      - name: "Istio Service Dashboard"
        variables:
          datasource: "var-datasource"
          namespace: "var-namespace"
          service: "var-service"
          version: "var-version"
      - name: "Istio Workload Dashboard"
        variables:
          datasource: "var-datasource"
          namespace: "var-namespace"
          workload: "var-workload"
          version: "var-version"
      - name: "Istio Mesh Dashboard"
      - name: "Istio Control Plane Dashboard"
      - name: "Istio Performance Dashboard"
      - name: "Istio Wasm Extension Dashboard"
      enabled: false
      external_url: ""
      health_check_url: ""
      internal_url: ""
      is_core: false
      project: "istio"
      url_format: ""
    prometheus:
      auth:
        insecure_skip_verify: false
        password: ""
        token: ""
        type: "none"
        use_kiali_token: false
        username: ""
      cache_duration: 7
      cache_enabled: true
      cache_expiration: 300
      # default: custom_headers is empty
      custom_headers:
        customHeader1: "customHeader1Value"
      health_check_url: ""
      is_core: true
      # default: query_scope is empty
      query_scope:
        mesh_id: "mesh-1"
        cluster: "cluster-east"
      thanos_proxy:
        enabled: false
        retention_period: "7d"
        scrape_interval: "30s"
      url: ""
    tracing:
      auth:
        insecure_skip_verify: false
        password: ""
        token: ""
        type: "none"
        use_kiali_token: false
        username: ""
      # default: custom_headers is empty
      custom_headers:
        customHeader1: "customHeader1Value"
      disable_version_check: false
      enabled: false
      external_url: ""
      grpc_port: 9095
      health_check_url: ""
      internal_url: ""
      is_core: false
      namespace_selector: true
      provider: "jaeger"
      # default: query_scope is empty
      query_scope:
        mesh_id: "mesh-1"
        cluster: "cluster-east"
      query_timeout: 5
      tempo_config:
        cache_capacity: 200
        cache_enabled: true
        datasource_uid: ""
        name: ""
        namespace: ""
        org_id: ""
        tenant: ""
        url_format: "grafana"
      use_grpc: true
      use_waypoint_name: false
      whitelist_istio_system: ["jaeger-query", "istio-ingressgateway"]

  health_config:
    compute:
      duration: "5m"
      refresh_interval: "3m"
      timeout: "10m"
    # default: rate is an empty list
    rate:
    - namespace: ".*"
      kind: ".*"
      name: ".*"
      tolerance:
      - protocol: "http"
        direction: ".*"
        code: "[1234]00"
        degraded: 5
        failure: 10

  identity:
    # default: cert_file is undefined
    cert_file: ""
    # default: private_key_file is undefined
    private_key_file: ""

  istio_labels:
    app_label_name: ""
    egress_gateway_label: "istio=egressgateway"
    ingress_gateway_label: "istio=ingressgateway"
    injection_label_name: "istio-injection"
    injection_label_rev: "istio.io/rev"
    version_label_name: ""

  kiali_feature_flags:
    clustering:
      enable_exec_provider: false
    # default: custom_workload_types is an empty list
    custom_workload_types:
    - group: "argoproj.io"
      version: "v1alpha1"
      kind: "Rollout"
    disabled_features: []
    istio_annotation_action: true
    istio_injection_action: true
    istio_upgrade_action: false
    ui_defaults:
      graph:
        find_options:
        - description: "Find: slow edges (> 1s)"
          expression: "rt > 1000"
        - description: "Find: unhealthy nodes"
          expression: "! healthy"
        - description: "Find: unknown nodes"
          expression: "name = unknown"
        hide_options:
        - description: "Hide: healthy nodes"
          expression: "healthy"
        - description: "Hide: unknown nodes"
          expression: "name = unknown"
        settings:
          animation: "point"
        traffic:
          ambient: "total"
          grpc: "requests"
          http: "requests"
          tcp: "sent"
      i18n:
        language: "en"
        show_selector: false
      list:
        include_health: true
        include_istio_resources: true
        include_validations: true
        show_include_toggles: false
      mesh:
        find_options:
        - description: "Find: unhealthy nodes"
          expression: "! healthy"
        hide_options:
        - description: "Hide: healthy nodes"
          expression: "healthy"
      # default: metrics_inbound is undefined
      metrics_inbound:
        aggregations:
        - display_name: "Istio Network"
          label: "topology_istio_io_network"
          single_selection: false
        - display_name: "Istio Revision"
          label: "istio_io_rev"
          single_selection: false
      # default: metrics_outbound is undefined
      metrics_outbound:
        aggregations:
        - display_name: "Istio Network"
          label: "topology_istio_io_network"
          single_selection: false
        - display_name: "Istio Revision"
          label: "istio_io_rev"
          single_selection: false
      metrics_per_refresh: "1m"
      # default: namespaces is an empty list
      namespaces: ["istio-system"]
      refresh_interval: "1m"
      tracing:
        limit: 100
    validations:
      ignore: ["KIA1301"]
      skip_wildcard_gateway_hosts: false

  kubernetes_config:
    burst: 200
    cache_duration: 300
    cache_token_namespace_duration: 10
    excluded_workloads:
    - "CronJob"
    - "DeploymentConfig"
    - "Job"
    - "ReplicationController"
    qps: 175

  login_token:
    expiration_seconds: 86400
    signing_key: ""

  server:
    address: ""
    audit_log: true
    cors_allow_all: false
    gzip_enabled: true
    # default: node_port is undefined
    node_port: 32475
    observability:
      metrics:
        enabled: true
        health_status:
          enabled: false
          max_consecutive_na: 3
        port: 9090
      tracing:
        collector_type: "otel"
        collector_url: "jaeger-collector.istio-system:4318"
        enabled: false
        otel:
          ca_name: ""
          protocol: "http"
          skip_verify: false
          tls_enabled: false
        sampling_rate: 0.5
    port: 20001
    profiler:
      enabled: false
    require_auth: false
    web_fqdn: ""
    web_history_mode: "browser"
    web_port: ""
    web_root: ""
    web_schema: ""
    write_timeout: "60s"

Validating your Kiali CR

The Kiali CR has a CRD Schema so it will be validated when you create or update it in your cluster.

Properties

.spec

(object)

This is the CRD for the resources called Kiali CRs. The Kiali Operator will watch for resources of this type and when it detects a Kiali CR has been added, deleted, or modified, it will install, uninstall, and update the associated Kiali Server installation. The settings here will configure the Kiali Server as well as the Kiali Operator. All of these settings will be stored in the Kiali ConfigMap. Do not modify the ConfigMap; it will be managed by the Kiali Operator. Only modify the Kiali CR when you want to change a configuration setting.

.spec.additional_display_details

(array)

A list of additional details that Kiali will look for in annotations. When found on any workload or service, Kiali will display the additional details in the respective workload or service details page. This is typically used to inject some CI metadata or documentation links into Kiali views. For example, by default, Kiali will recognize these annotations on a service or workload (e.g. a Deployment, StatefulSet, etc.):

spec:
  annotations:
    kiali.io/api-spec: http://list/to/my/api/doc
    kiali.io/api-type: rest

Note that if you change this setting for your own custom annotations, keep in mind that it would override the current default. So you would have to add the default setting as shown in the example CR if you want to preserve the default links.

(string)

DEPRECATED AFTER v1.73: A Kubernetes label selector expression that will be used to include namespaces.

.spec.auth

(object)

.spec.auth.openid

(object)

(string)

DEPRECATED since v2.21: Use auth.openid.discovery_override.authorization_endpoint instead. The URL of the provider’s authorization endpoint.

.spec.auth.openid.client_id

(string)

(object)

To learn more about these settings and how to configure the OpenShift authentication strategy, read the documentation at https://kiali.io/docs/configuration/authentication/openshift/

.spec.auth.openshift.auth_timeout

(integer)

DEPRECATED AFTER v1.73: The amount of time in seconds Kiali will wait for a response from the OpenShift API when requesting authentication information.

.spec.auth.openshift.client_id_prefix

(string)

DEPRECATED AFTER v1.73: A prefix that will be applied to the OpenShift OAuth client identifier.

.spec.auth.openshift.insecure_skip_verify_tls

(boolean)

Set true to skip verifying certificate validity when Kiali contacts OpenShift over https.

.spec.auth.openshift.redirect_uris

(array)

Custom redirect URIs for the OpenShift OAuth client. These URIs specify where users will be redirected after successful authentication. If not specified, Kiali will automatically generate appropriate redirect URIs based on the Kiali server’s route. You normally do not have to set this unless you are creating remote cluster resources (see deployment.remote_cluster_resources_only) with auth.strategy set to openshift.

.spec.auth.openshift.redirect_uris[*]

(string)

.spec.auth.openshift.token_inactivity_timeout

(integer)

Sets the maximum time in seconds that can elapse between consecutive uses of an OAuth access token before it expires due to inactivity. This helps improve security by automatically expiring unused tokens. If set to 0, tokens will not expire due to inactivity. Note that OpenShift may enforce minimum values for this setting, and existing tokens are not affected by changes to this configuration.

.spec.auth.openshift.token_max_age

(integer)

Sets the absolute maximum lifetime in seconds for OAuth access tokens, regardless of activity. After this time period, tokens will expire and users must re-authenticate. If set to 0, tokens will not have an absolute expiration time and will only expire due to inactivity (if token_inactivity_timeout is configured).

.spec.auth.strategy

(string)

Determines what authentication strategy to use when users log into Kiali. Options are anonymous, token, openshift, openid, or header.

Choose anonymous to allow full access to Kiali without requiring any credentials.
Choose token to allow access to Kiali using service account tokens, which controls access based on RBAC roles assigned to the service account.
Choose openshift to use the OpenShift OAuth login which controls access based on the individual’s RBAC roles in OpenShift. Not valid for non-OpenShift environments.
Choose openid to enable OpenID Connect-based authentication. Your cluster is required to be configured to accept the tokens issued by your IdP. There are additional required configurations for this strategy. See below for the additional OpenID configuration section.
Choose header when Kiali is running behind a reverse proxy that will inject an Authorization header and potentially impersonation headers.

When empty, this value will default to openshift on OpenShift and token on other Kubernetes environments.

.spec.chat_ai

(object)

.spec.chat_ai.default_provider

(string)

The default provider to use for the ChatAI feature. This is the provider that will be used if no provider is specified in the request.

.spec.chat_ai.enabled

(boolean)

Enable or disable the ChatAI feature.

.spec.chat_ai.providers

(array)

A list of providers that can be used for the ChatAI feature. This is the list of providers that will be available to the user to choose from.

(array)

A list of models that can be used for the ChatAI feature. This is the list of models that will be available to the user to choose from.

(array)

A list of clusters that the Kiali Server can access. You need to specify the remote clusters here if ‘autodetect_secrets.enabled’ is false.

.spec.clustering.clusters[*]

(object)

.spec.clustering.clusters[*].name

(string)

The name of the cluster.

.spec.clustering.clusters[*].secret_name

(string)

The name of the secret that contains the credentials necessary to connect to the remote cluster. This secret must exist in the Kiali deployment namespace. If a secret name is not provided then it’s assumed that the cluster is inaccessible.

.spec.clustering.enable_exec_provider

(boolean)

Flag to enable exec provider for clustering authentication.

.spec.clustering.ignore_home_cluster

(boolean)

Set to true for an external Kiali deployment, or if Kiali should not try to discover Istio on the home cluster. When set to true, it is required to set kubernetes_config.cluster_name.

.spec.clustering.kiali_urls

(array)

A map between cluster name, instance name and namespace to a Kiali URL. Will be used showing the Mesh page’s Kiali URLs. The Kiali service’s ‘kiali.io/external-url’ annotation will be overridden when this property is set.

.spec.clustering.kiali_urls[*]

(object)

.spec.clustering.kiali_urls[*].cluster_name

(string)

The name of the cluster.

.spec.clustering.kiali_urls[*].instance_name

(string)

The instance name of this Kiali installation. This should be the value used in deployment.instance_name for Kiali resource name.

.spec.clustering.kiali_urls[*].namespace

(string)

The namespace into which Kiali is installed.

.spec.clustering.kiali_urls[*].url

(string)

The URL of Kiali in the cluster.

.spec.custom_dashboards

(array)

A list of user-defined custom monitoring dashboards that you can use to generate metrics charts for your applications. The server has some built-in dashboards; if you define a custom dashboard here with the same name as a built-in dashboard, your custom dashboard takes precedence and will overwrite the built-in dashboard. You can disable one or more of the built-in dashboards by simply defining an empty dashboard.

An example of an additional user-defined dashboard,

spec:
  custom_dashboards:
  - name: myapp
    title: My App Metrics
    items:
    - chart:
        name: "Thread Count"
        spans: 4
        metricName: "thread-count"
        dataType: "raw"

An example of disabling a built-in dashboard (in this case, disabling the Envoy dashboard),

spec:
  custom_dashboards:
  - name: envoy

To learn more about custom monitoring dashboards, see the documentation at https://kiali.io/docs/configuration/custom-dashboard/

(object)

Additional custom yaml to add to the service definition. This is used mainly to customize the service type. For example, if the deployment.service_type is set to ‘LoadBalancer’ and you want to set the loadBalancerIP, you can do so here with: additional_service_yaml: { 'loadBalancerIP': '78.11.24.19' }. Another example would be if the deployment.service_type is set to ‘ExternalName’ you will need to configure the name via: additional_service_yaml: { 'externalName': 'my.kiali.example.com' }. A final example would be if external IPs need to be set: additional_service_yaml: { 'externalIPs': ['80.11.12.10'] }

.spec.deployment.affinity

(object)

Affinity definitions that are to be used to define the nodes where the Kiali pod should be constrained. See the Kubernetes documentation on Assigning Pods to Nodes for the proper syntax for these three different affinity types.

.spec.deployment.affinity.node

(object)

.spec.deployment.affinity.pod

(object)

.spec.deployment.affinity.pod_anti

(object)

.spec.deployment.cluster_wide_access

(boolean)

Determines if the Kiali server will be granted cluster-wide permissions to see all namespaces. When true, this provides more efficient caching within the Kiali server. It must be true if deployment.discovery_selectors.default is left unset. To limit the namespaces for which Kiali has permissions, set to false and define the desired selectors in deployment.discovery_selectors.default.

.spec.deployment.configmap_annotations

(object)

Custom annotations to be created on the Kiali ConfigMap.

.spec.deployment.custom_envs

(array)

Defines additional environment variables to be set in the Kiali server pod. This is typically used for (but not limited to) setting proxy environment variables such as HTTP_PROXY, HTTPS_PROXY, and/or NO_PROXY.

.spec.deployment.custom_envs[*]

(object)

.spec.deployment.custom_envs[*].name

(string) *Required*

The name of the custom environment variable.

.spec.deployment.custom_envs[*].value

(string) *Required*

The value of the custom environment variable.

.spec.deployment.custom_secrets

(array)

Defines additional secrets that are to be mounted in the Kiali pod.

These are useful to contain client certificates that are used by Kiali to authenticate to third party systems using mTLS (for example, see external_services.tracing.auth.cert_file and external_services.tracing.auth.key_file).

These secrets must be created by an external mechanism. Kiali will not generate these secrets; it is assumed these secrets are externally managed. You can define 0, 1, or more secrets. An example configuration is,

spec:
  deployment:
    custom_secrets:
    - name: mysecret
      mount: /mysecret-path
    - name: my-other-secret
      mount: /my-other-secret-location
      optional: true

spec:
  deployment:
    host_aliases:
    - ip: 192.168.1.100
      hostnames:
      - "foo.local"
      - "bar.local"

For details on the content of this setting, see https://kubernetes.io/docs/tasks/network/customize-hosts-file-for-pods/#adding-additional-entries-with-hostaliases

.spec.deployment.host_aliases[*]

(object)

.spec.deployment.host_aliases[*].hostnames

(array)

.spec.deployment.host_aliases[].hostnames[]

(string)

.spec.deployment.host_aliases[*].ip

(string)

.spec.deployment.hpa

(object)

Determines what (if any) HorizontalPodAutoscaler should be created to autoscale the Kiali pod. A typical way to configure HPA for Kiali is,

spec:
  deployment:
    hpa:
      api_version: "autoscaling/v2"
      spec:
        maxReplicas: 2
        minReplicas: 1
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 50

.spec.deployment.image_pull_secrets[*]

(string)

.spec.deployment.image_version

(string)

Determines which version of Kiali to install. Choose ‘lastrelease’ to use the last Kiali release. Choose ‘latest’ to use the latest image (which may or may not be a released version of Kiali). Choose ‘operator_version’ to use the image whose version is the same as the operator version. Otherwise, you can set this to any valid Kiali version (such as ‘v1.0’) or any valid Kiali digest hash (if you set this to a digest hash, you must indicate the digest in deployment.image_digest).

Note that if this is set to ‘latest’ then the deployment.image_pull_policy will be set to ‘Always’.

Note that override_yaml.metadata.labels is not allowed - you cannot override the labels; to add labels to the default set of labels, use the deployment.ingress.additional_labels setting. Example,

spec:
  deployment:
    ingress:
      override_yaml:
        metadata:
          annotations:
            nginx.ingress.kubernetes.io/secure-backends: "true"
            nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
        spec:
          rules:
          - http:
              paths:
              - path: /kiali
                pathType: Prefix
                backend:
                  service
                    name: "kiali"
                    port:
                      number: 20001

.spec.deployment.ingress.override_yaml.metadata

(object)

.spec.deployment.ingress.override_yaml.metadata.annotations

(object)

(object)

A set of node labels that dictate onto which node the Kiali pod will be deployed.

.spec.deployment.pod_annotations

(object)

Custom annotations to be created on the Kiali pod. By default, the following annotation is applied:

proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'

If you define your own pod_annotations, they will overwrite this default. To retain the default behavior while adding your own annotations, make sure to include this value alongside your custom annotations.

.spec.deployment.pod_labels

(object)

Custom labels to be created on the Kiali pod. An example use for this setting is to inject an Istio sidecar such as,

sidecar.istio.io/inject: "true"

.spec.deployment.priority_class_name

(string)

The priorityClassName used to assign the priority of the Kiali pod.

.spec.deployment.probes

.spec.deployment.probes.startup.initial_delay_seconds

(integer)

.spec.deployment.probes.startup.period_seconds

(integer)

.spec.deployment.remote_cluster_resources_only

(boolean)

When true, only those resources necessary for a remote Kiali Server to access this cluster are created (such as the service account and roles/bindings). There will be no Kiali Server deployment/pod created when this is true.

.spec.deployment.replicas

(integer)

The replica count for the Kiail deployment. If deployment.hpa is specified, this setting is ignored.

.spec.deployment.resources

(object)

Defines compute resources that are to be given to the Kiali pod’s container. The value is a dict as defined by Kubernetes. See the Kubernetes documentation (https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container). If you set this to an empty dict ({}) then no resources will be defined in the Deployment. If you do not set this at all, the default is,

spec:
  deployment:
    resources:
      requests:
        cpu: "10m"
        memory: "64Mi"
      limits:
        memory: "1Gi"

(string)

Maximum TLS version (e.g., TLSv1.3, TLSv1.2).

.spec.deployment.tls_config.min_version

(string)

Minimum TLS version (e.g., TLSv1.3, TLSv1.2).

.spec.deployment.tls_config.source

(string)

TLS policy source: ‘auto’ to use OpenShift TLSSecurityProfile; ‘config’ to use explicit settings.

.spec.deployment.tolerations

(array)

A list of tolerations which declare which node taints Kiali can tolerate. See the Kubernetes documentation on Taints and Tolerations for more details.

.spec.deployment.tolerations[*]

(object)

.spec.deployment.topology_spread_constraints

(array)

A list of constraints which control how the Kiali pods are spread across your cluster to help achieve high availability as well as efficient resource utilization. See the Kubernetes documentation on Topology Spread Constraints for more details.

.spec.deployment.topology_spread_constraints[*]

(object)

.spec.deployment.verbose_mode

(boolean)

DEPRECATED AFTER v1.73: When true, Kiali will log additional debug information about its operations.

.spec.deployment.version_label

(string)

Kiali resources will be assigned a ‘version’ label when they are deployed. This setting determines what value those ‘version’ labels will have. When empty, its default will be determined as follows,

If deployment.image_version is ‘latest’, version_label will be fixed to ‘master’.
If deployment.image_version is ‘lastrelease’, version_label will be fixed to the last Kiali release version string.
If deployment.image_version is anything else, version_label will be that value, too.

.spec.deployment.view_only_mode

(boolean)

When true, Kiali will be in ‘view only’ mode, allowing the user to view and retrieve management and monitoring data for the service mesh, but not allow the user to modify the service mesh.

.spec.extensions

(array)

Defines third-party extensions whose metrics can be integrated into the Kiali traffic graph.

.spec.extensions[*]

(object)

.spec.extensions[*].enabled

(boolean)

Determines if the Kiali traffic graph should incorporate the extension’s metrics.

.spec.extensions[*].name

(string)

The name that is used to identify the metric time series for the extension.

.spec.external_services

.spec.external_services.grafana

(object)

Configuration used to access the Grafana dashboards.

.spec.external_services.grafana.auth

(object)

Settings used to authenticate with the Grafana instance.

.spec.external_services.grafana.auth.ca_file

.spec.external_services.grafana.dashboards[*].name

(string)

The name of the Grafana dashboard.

(array)

A list of Perses dashboards that Kiali can link to.

.spec.external_services.perses.dashboards[*]

(object)

.spec.external_services.perses.dashboards[*].name

(string)

The name of the Perses dashboard.

(string)

.spec.external_services.prometheus.auth.cert_file

(string)

The client certificate file to use when accessing Prometheus using https with mTLS. An empty string means no client certificate is used. May refer to a secret.

.spec.external_services.prometheus.auth.insecure_skip_verify

(boolean)

(string)

.spec.external_services.tracing

(object)

Configuration used to access the Tracing (Jaeger or Tempo) dashboards.

.spec.external_services.tracing.auth

(object)

Settings used to authenticate with the Tracing server instance.

(string)

The maximum time allowed for a single health refresh cycle. If exceeded, the refresh is cancelled and the next cycle starts on schedule.

.spec.health_config.rate

(array)

.spec.health_config.rate[*]

(object)

.spec.health_config.rate[*].kind

(string)

The type of resource that this configuration applies to. This is a regular expression.

.spec.health_config.rate[*].name

(string)

The name of a resource that this configuration applies to. This is a regular expression.

.spec.health_config.rate[*].namespace

(string)

The name of the namespace that this configuration applies to. This is a regular expression.

.spec.health_config.rate[*].tolerance

(array)

A list of tolerances for this configuration.

(boolean)

DEPRECATED AFTER v1.73: When true, certificate information indicators will be displayed.

.spec.kiali_feature_flags.certificates_information_indicators.secrets

(array)

DEPRECATED AFTER v1.73: List of secrets that contain certificate information.

.spec.kiali_feature_flags.certificates_information_indicators.secrets[*]

(string)

.spec.kiali_feature_flags.clustering

(object)

DEPRECATED AFTER v1.73: Multi-cluster related features.

.spec.kiali_feature_flags.clustering.autodetect_secrets

(object)

DEPRECATED AFTER v1.73: Settings to allow cluster secrets to be auto-detected.

.spec.kiali_feature_flags.clustering.autodetect_secrets.enabled

(boolean)

DEPRECATED AFTER v1.73: If true then remote cluster secrets will be autodetected.

.spec.kiali_feature_flags.clustering.autodetect_secrets.label

(string)

DEPRECATED AFTER v1.73: The name and value of a label that exists on all remote cluster secrets.

.spec.kiali_feature_flags.clustering.clusters

(array)

DEPRECATED AFTER v1.73: A list of clusters that the Kiali Server can access.

.spec.kiali_feature_flags.clustering.clusters[*]

(object)

.spec.kiali_feature_flags.clustering.clusters[*].name

(string)

DEPRECATED AFTER v1.73: The name of the cluster.

.spec.kiali_feature_flags.clustering.clusters[*].secret_name

(string)

DEPRECATED AFTER v1.73: The name of the secret that contains the credentials necessary to connect to the remote cluster.

.spec.kiali_feature_flags.clustering.enable_exec_provider

(boolean)

DEPRECATED AFTER v1.73: Flag to enable exec provider for clustering authentication.

.spec.kiali_feature_flags.clustering.kiali_urls

(array)

DEPRECATED AFTER v1.73: A map between cluster name, instance name and namespace to a Kiali URL.

(object)

.spec.kiali_feature_flags.custom_workload_types[*].group

(string) *Required*

The API group of the custom workload type (e.g., ‘argoproj.io’).

.spec.kiali_feature_flags.custom_workload_types[*].kind

(string) *Required*

The kind of the custom workload type (e.g., ‘Rollout’).

.spec.kiali_feature_flags.custom_workload_types[*].version

(string) *Required*

The API version of the custom workload type (e.g., ‘v1alpha1’).

.spec.kiali_feature_flags.disabled_features

(array)

There may be some features that admins do not want to be accessible to users (even in ‘view only’ mode). In this case, this setting allows you to disable one or more of those features entirely.

(array)

A list of commonly used and useful find expressions that will be provided to the user out-of-box.

.spec.kiali_feature_flags.ui_defaults.graph.find_options[*]

(object)

.spec.kiali_feature_flags.ui_defaults.graph.find_options[*].auto_select

(boolean)

If true this option will be selected and take effect automatically. Note that only one option in the list can have this value be set to true.

.spec.kiali_feature_flags.ui_defaults.graph.find_options[*].description

(string)

Human-readable text to let the user know what the expression does.

.spec.kiali_feature_flags.ui_defaults.graph.find_options[*].expression

(string)

The find expression.

.spec.kiali_feature_flags.ui_defaults.graph.hide_options

(array)

A list of commonly used and useful hide expressions that will be provided to the user out-of-box.

(object)

.spec.kiali_feature_flags.ui_defaults.mesh.find_options[*].auto_select

(boolean)

If true this option will be selected and take effect automatically. Note that only one option in the list can have this value be set to true.

.spec.kiali_feature_flags.ui_defaults.mesh.find_options[*].description

(string)

Human-readable text to let the user know what the expression does.

.spec.kiali_feature_flags.ui_defaults.mesh.find_options[*].expression

(string)

The find expression.

.spec.kiali_feature_flags.ui_defaults.mesh.hide_options

(array)

A list of commonly used and useful hide expressions that will be provided to the user out-of-box.

.spec.kiali_feature_flags.ui_defaults.mesh.hide_options[*]

(object)

.spec.kiali_feature_flags.ui_defaults.mesh.hide_options[*].auto_select

(boolean)

If true this option will be selected and take effect automatically. Note that only one option in the list can have this value be set to true.

.spec.kiali_feature_flags.ui_defaults.mesh.hide_options[*].description

(string)

Human-readable text to let the user know what the expression does.

.spec.kiali_feature_flags.ui_defaults.mesh.hide_options[*].expression

(string)

The hide expression.

.spec.kiali_feature_flags.ui_defaults.metrics_inbound

(object)

Additional label aggregation for inbound metric pages in detail pages. You will see these configurations in the ‘Metric Settings’ drop-down. An example,

spec:
  kiali_feature_flags:
    ui_defaults:
      metrics_inbound:
        aggregations:
        - display_name: Istio Network
          label: topology_istio_io_network
        - display_name: Istio Revision
          label: istio_io_rev

.spec.kiali_feature_flags.ui_defaults.metrics_inbound.aggregations

(array)

.spec.kiali_feature_flags.ui_defaults.metrics_inbound.aggregations[*]

(object)

.spec.kiali_feature_flags.ui_defaults.metrics_inbound.aggregations[*].display_name

(string)

.spec.kiali_feature_flags.ui_defaults.metrics_inbound.aggregations[*].label

(string)

.spec.kiali_feature_flags.ui_defaults.metrics_inbound.aggregations[*].single_selection

(boolean)

Flag to indicate if only one option can be selected for this aggregation.

.spec.kiali_feature_flags.ui_defaults.metrics_outbound

(object)

Additional label aggregation for outbound metric pages in detail pages. You will see these configurations in the ‘Metric Settings’ drop-down. An example,

spec:
  kiali_feature_flags:
    ui_defaults:
      metrics_outbound:
        aggregations:
        - display_name: Istio Network
          label: topology_istio_io_network
        - display_name: Istio Revision
          label: istio_io_rev

.spec.kiali_feature_flags.ui_defaults.metrics_outbound.aggregations

(array)

.spec.kiali_feature_flags.ui_defaults.metrics_outbound.aggregations[*]

(object)

.spec.kiali_feature_flags.ui_defaults.metrics_outbound.aggregations[*].display_name

(string)

.spec.kiali_feature_flags.ui_defaults.metrics_outbound.aggregations[*].label

(string)

.spec.kiali_feature_flags.ui_defaults.metrics_outbound.aggregations[*].single_selection

(boolean)

Flag to indicate if only one option can be selected for this aggregation.

.spec.kiali_feature_flags.ui_defaults.metrics_per_refresh

(string)

Duration of metrics to fetch on each refresh. Value must be one of: 1m, 2m, 5m, 10m, 30m, 1h, 3h, 6h, 12h, 1d, 7d, or 30d

.spec.kiali_feature_flags.ui_defaults.namespaces

(array)

Default selections for the namespace selection dropdown. Non-existent or inaccessible namespaces will be ignored. Omit or set to an empty array for no default namespaces.

(integer)

This Kiali cache is a list of namespaces per user. This is typically a short-lived cache compared with the duration of the namespace cache defined by the cache_duration setting. This is specified in seconds.

.spec.kubernetes_config.cluster_name

(string)

The name of the cluster Kiali is deployed in. This is also known as the home cluster. This is only used in multi cluster environments. This must be set when clustering.ignore_home_cluster=true. If not set, Kiali will try to auto detect the cluster name from the Istiod deployment or use the default ‘Kubernetes’.

.spec.kubernetes_config.excluded_workloads

(array)

List of controllers that won’t be used for Workload calculation. Kiali queries Deployment, ReplicaSet, ReplicationController, DeploymentConfig, StatefulSet, Job and CronJob controllers. Deployment and ReplicaSet will be always queried, but ReplicationController, DeploymentConfig, StatefulSet, Job and CronJobs can be skipped from Kiali workloads queries if they are present in this list.

.spec.kubernetes_config.excluded_workloads[*]

(string)

.spec.kubernetes_config.qps

(integer)

The QPS value of the Kubernetes client.

Kiali has support for Istio multi-cluster installations.

Kiali multi-cluster

Before proceeding with the setup, ensure you meet the requirements.

Requirements

Aggregated metrics and traces. Kiali needs a single endpoint for metrics and a single endpoint for traces where it can consume aggregated metrics/traces across all clusters. There are many ways to aggregate metrics/traces such as Prometheus federation or using OTEL collector pipelines but setting these up are outside of the scope of Kiali.
Anonymous, OpenID or OpenShift authentication strategy. The unified multi-cluster configuration currently only supports anonymous, OpenID and OpenShift authentication strategies. In addition, current support varies by provider for OpenID across clusters.

Setup

The unified Kiali multi-cluster setup requires the Kiali Service Account (SA) to have read access to each Kubernetes cluster in the mesh. This is separate from the user credentials that are required when a user logs into Kiali. The user credentials are used to check user access to a namespace and to perform write operations. In anonymous mode, the Kiali SA is used for all operations. Write access need not be required if you only want to give Kiali “view-only” capabilities. To give the Kiali SA access to each remote cluster, a kubeconfig with credentials needs to be created and mounted into the Kiali pod. While the location of Kiali in relation to the controlplane and dataplane may change depending on your Istio deployment model, the requirements will remain the same.

Although not required for some deployment models, it is recommended that the Kiali namespace and instance name be consistent across all clusters, including remote clusters without a Kiali server deployed. If not using default values, the following Kiali CR settings should typically have consistent values:

spec.deployment.namespace
spec.deployment.instance_name

If you would like to keep a separate Kiali per cluster and do not want to give Kiali access to remote clusters, you can still manually specify the remote cluster and remote Kiali URLs in the Kiali configuration and the UI will try to provide links to the remote Kiali where appropriate. See below for more details.

Create a SA and its associated resources on the remote cluster. In order for Kiali to access a remote cluster, you first must create a SA and its role/role binding with the proper permissions. There are three ways to do this:
- Kiali Operator (recommended): Deploy the Kiali Operator on the remote cluster and create a Kiali CR with spec.deployment.remote_cluster_resources_only: true. The Operator will create and manage the SA, role, and role binding. Deleting the Kiali CR will remove these resources.
- Kiali Server helm chart: Use the kiali-server helm chart with --set deployment.remote_cluster_resources_only=true.
- kiali-prepare-remote-cluster.sh script: Use the script with --process-remote-resources true. See the script usage notes in step 2 below.
If using the openshift auth strategy, there are additional OpenShift-specific requirements for this step. See the OpenShift multi-cluster documentation before proceeding.
Create a remote cluster secret. Kiali needs a kubeconfig stored in a Kubernetes secret in order to access the remote cluster. This secret contains the SA token from step 1 and the remote cluster’s connection info. A remote cluster secret will look something like this:

apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-name
  labels:
    kiali.io/multiCluster: "true"
stringData:
  my-cluster-name: |
    apiVersion: v1
    kind: Config
    preferences: {}
    current-context: my-cluster-name
    contexts:
    - name: my-cluster-name
      context:
        cluster: my-cluster-name
        user: my-cluster-name
    users:
    - name: my-cluster-name
      user:
        token: <...the long remote cluster SA token string goes here...>
    clusters:
    - name: my-cluster-name
      cluster:
        server: <...the URL to your remote cluster goes here...>
        certificate-authority-data: <...the long CA data goes here...>

You can place multiple kubeconfigs in a single secret, each under its own key in stringData where the key name must be the name of the remote cluster. Name the secret kiali-multi-cluster-secret for the added benefit of having the operator automatically detect this secret without having to configure anything within the Kiali CR. If you do name the secret kiali-multi-cluster-secret you also can add to it the label kiali.io/kiali-multi-cluster-secret="true" which will tell the operator to restart the Kiali Server pod automatically when the secret changes thus allowing the server to pick up the changes immediately. A multi-cluster secret with two clusters named my-cluster-name and my-other-cluster would look like this:

apiVersion: v1
kind: Secret
metadata:
  name: kiali-multi-cluster-secret
  labels:
    kiali.io/kiali-multi-cluster-secret: "true"
stringData:
  my-cluster-name: |
    apiVersion: v1
    kind: Config
    preferences: {}
    current-context: my-cluster-name
    contexts:
    - name: my-cluster-name
      context:
        cluster: my-cluster-name
        user: my-cluster-name
    users:
    - name: my-cluster-name
      user:
        token: <...the long remote cluster SA token string goes here...>
    clusters:
    - name: my-cluster-name
      cluster:
        server: <...the URL to your remote cluster goes here...>
        certificate-authority-data: <...the long CA data goes here...>    
  my-other-cluster: |
    apiVersion: v1
    kind: Config
    preferences: {}
    current-context: my-other-cluster
    contexts:
    - name: my-other-cluster
      context:
        cluster: my-other-cluster
        user: my-other-cluster
    users:
    - name: my-other-cluster
      user:
        token: <...the long remote cluster SA token string goes here...>
    clusters:
    - name: my-other-cluster
      cluster:
        server: <...the URL to your remote cluster goes here...>
        certificate-authority-data: <...the long CA data goes here...>

The verify-kiali-permissions.sh script can be used to check that your remote cluster secret provides the necessary permissions that Kiali needs to access the remote cluster. See the comments at the top of the script and its --help output for details on how to run it, but here’s an example:

curl -L -o verify-kiali-permissions.sh https://raw.githubusercontent.com/kiali/kiali/master/hack/istio/multicluster/verify-kiali-permissions.sh
chmod +x verify-kiali-permissions.sh
./verify-kiali-permissions.sh --kubeconfig-secret istio-system:kiali-multi-cluster-secret:my-cluster-name --kiali-version v2.10.0

It is up to you how you want to create and manage the token and secret, however, you can use the kiali-prepare-remote-cluster.sh script (with the --process-kiali-secret true option) to simplify this process for you.

The kiali-prepare-remote-cluster.sh script can be used to:

Create a Kiali SA and its role/role-binding in the remote cluster

and/or,

Create a kubeconfig file and store it in a Kubernetes secret that is created in the namespace where Kiali is deployed.

In order to run this script you will need adequate permissions configured in your local kubeconfig for both the cluster on which Kiali is deployed and the remote cluster.

For example:

curl -L -o kiali-prepare-remote-cluster.sh https://raw.githubusercontent.com/kiali/kiali/master/hack/istio/multicluster/kiali-prepare-remote-cluster.sh
chmod +x kiali-prepare-remote-cluster.sh
./kiali-prepare-remote-cluster.sh --kiali-cluster-context east --remote-cluster-context west --view-only false --process-kiali-secret true --process-remote-resources true

If you used the Kiali Operator (or helm chart) to create the remote cluster resources (step 1 above) and are using this script only to create the remote cluster secret (--process-remote-resources false --process-kiali-secret true), you must pass --kiali-resource-name set to the name of the Service Account created by the Operator. The Operator names the SA <instance_name>-service-account (e.g. kiali-service-account for the default instance name kiali). For example:

./kiali-prepare-remote-cluster.sh \
  --kiali-cluster-context east \
  --remote-cluster-context west \
  --kiali-resource-name kiali-service-account \
  --process-remote-resources false \
  --process-kiali-secret true \
  --view-only false

Specifying the remote cluster name: The script derives the remote cluster name from the kubeconfig context, which may contain characters not valid in a Kubernetes secret key (e.g. colons in OpenShift-generated context names). Always pass --remote-cluster-name explicitly, set to the name Istio uses for the remote cluster, to avoid this issue:

./kiali-prepare-remote-cluster.sh \
  ... \
  --remote-cluster-name west

Use the option --help for additional details on using the script to create and delete the remote cluster resources and secrets.

Configure Kiali. The Kiali Operator needs to know about the remote cluster secrets so it can mount them into the Kiali Server pod. There are three ways to do this:
- Auto-discovery by label (default): The Kiali Operator will automatically detect any secret in the Kiali deployment namespace that has the label kiali.io/multiCluster="true". The secrets created by the kiali-prepare-remote-cluster.sh script are labeled this way and will be auto-detected with no additional Kiali CR configuration needed.
- Explicit configuration: In the Kiali CR you can explicitly specify each remote cluster secret rather than rely on auto-discovery.
- Single combined secret: Create a single secret named kiali-multi-cluster-secret in the Kiali deployment namespace containing kubeconfigs for all remote clusters, each under its own top-level key in stringData where the key is the cluster name. If you also label this secret with kiali.io/kiali-multi-cluster-secret="true", the Kiali Operator will auto-detect changes and roll out a new Kiali Server pod automatically.
Do not use the label kiali.io/kiali-multi-cluster-secret="true" on any other secret not specifically named kiali-multi-cluster-secret. The operator will not have permission to see that secret and errors will occur if you attempt this.

If you have multiple Kiali Servers deployed in the same namespace, and you want to use that single secret named kiali-multi-cluster-secret, all Kiali Servers in that namespace are required to use that secret. If you want each Kiali Server to talk to a different set of clusters, you must not use the kiali-multi-cluster-secret secret.

Once the Kiali Operator knows about the remote cluster secrets (either through auto-discovery or through explicit configuration) it will mount them into the Kiali Server pod, putting Kiali in “multi-cluster” mode. Kiali will begin using those credentials to communicate with the other clusters in the mesh.
Configure user access. When using anonymous mode, the Kiali SA credentials will be used to display mesh info to the user. When not using anonymous mode, Kiali will check the user’s access to each configured cluster’s namespace before showing the user any resources from that namespace.
- For OpenID, refer to your OIDC provider’s instructions for configuring user access to a Kubernetes cluster.
- For OpenShift, see the OpenShift multi-cluster documentation for important information about logging into remote clusters from the Kiali UI. This step is required — users must log into each cluster via the Kiali UI to access resources on that cluster.
Optional - Narrow metrics to mesh. If your unified metrics store also contains data outside of your mesh, you can limit which metrics Kiali will query for by setting the query_scope configuration.

That’s it! From here you can login to Kiali and manage your mesh across all clusters from a single Kiali instance.

Removing a Cluster

To remove a cluster from Kiali, you must delete the associated remote cluster secret. If you originally created the remote cluster secret via the kiali-prepare-remote-cluster.sh script, run that script again with the same command line options as before but also pass in the command line option --delete true.

Don’t forget to remove the resources (such as the SA and its role/role binding) from the remote cluster. If you created these resources with the Kiali Operator, simply delete the Kiali CR from the remote cluster and these resources will be removed. If you used the kiali-prepare-remote-cluster.sh script to create these resources, use it to remove these resources.

After the remote cluster secret has been removed, you must then tell the Kiali Operator to re-deploy the Kiali Server so the Kiali Server no longer attempts to access the now-deleted remote cluster secret. If you are using auto-discovery, you can tell the Kiali Operator to do this by touching the Kiali CR. The easiest way to do this is to simply add or modify any annotation on the Kiali CR. It is recommended that you use the kiali.io/reconcile annotation as described here. If you did not rely on auto-discovery but instead explicitly specified each remote cluster secret in the Kiali CR, then you simply have to remove the now-deleted remote cluster secret’s information from the Kiali CR’s clustering.clusters section. Finally, if you are using the single kiali-multi-cluster-secret to define all of your remote clusters (and you labeled that secret with kiali.io/kiali-multi-cluster-secret="true"), then you do not have to do anything other than delete that one secret. The Kiali Operator will detect that the secret has been removed and will re-deploy the Kiali Server automatically.

Adding an Inaccessible Cluster

If you would like to keep a separate Kiali per cluster or you do not want to give Kiali access to remote clusters, you can still manually specify the remote clusters and remote Kiali URLs in the Kiali configuration and the Kiali UI will try to provide links to the remote Kiali UIs where appropriate.

For example, if there is a Kiali on the east cluster that does not have access to the west cluster and a Kiali on the west cluster that does not have access to the east cluster, you can add the following to your Kiali configurations to have each Kiali generate links to the Kiali for that cluster.

East Kiali configuration

clustering:
  clusters:
    name: west
  kiali_urls:
    cluster_name: west
    instance_name: kiali
    namespace: istio-system
    url: https://kiali-external.west.example.com

West Kiali configuration

clustering:
  clusters:
    name: east
  kiali_urls:
    cluster_name: east
    instance_name: kiali
    namespace: istio-system
    url: https://kiali-external.east.example.com

2.7.1 - ACM Observability

Configure Kiali to use Red Hat Advanced Cluster Management Observability for centralized metrics in multi-cluster OpenShift environments.

OpenShift Only: This guide is specifically for Red Hat OpenShift environments using Red Hat Advanced Cluster Management (ACM) for Kubernetes. ACM is an OpenShift-specific product.

Overview

Red Hat Advanced Cluster Management (ACM) provides centralized observability for multi-cluster OpenShift environments through its Observability Service. When ACM Observability is enabled, metrics from all managed clusters (including the hub cluster itself) are collected and aggregated into a central Thanos-based storage system.

Kiali can query these aggregated metrics either through ACM’s external Observatorium API (using mTLS authentication) or directly through internal Thanos services. This guide explains both options, with detailed steps for the Observatorium API approach.

Architecture

Components

On the Hub Cluster:

ACM Observability Service: Centralized observability platform
- Observatorium API: External HTTPS endpoint with mTLS authentication
- Thanos: Metrics storage and query engine (Query, Query Frontend, Receive, Store)

On Managed Clusters (Hub + Spokes):

User Workload Monitoring (UWM): OpenShift’s Prometheus for user workloads
PodMonitor/ServiceMonitor: Scrape Istio metrics from:
- Sidecar proxies (in application namespaces)
- Control plane (istiod in istio-system)
- Ztunnel (in ztunnel namespace, for L4 metrics in Ambient mode)
- Waypoint proxies (in application namespaces, for L7 metrics in Ambient mode)
Metrics Allowlist ConfigMaps: Define which metrics ACM should collect
Metrics Collector: Runs on each managed cluster and pushes its Prometheus metrics to the hub cluster’s Thanos every 5 minutes (default)

Kiali Deployment Location:

Kiali can be deployed on any cluster with network access to:

The hub cluster’s metrics backend (Observatorium API or internal Thanos services)
Each managed cluster’s Kubernetes API (for workload and configuration data)

Common deployment locations:

Hub cluster (recommended): Co-located with ACM for lower latency metric queries and simplified networking. Can use internal Thanos services (HTTP) or external Observatorium API (HTTPS). Typically requires external deployment mode (ignore_home_cluster: true) since the hub usually doesn’t run mesh workloads or an Istio control plane.
Spoke/managed cluster: Kiali deployed alongside the mesh workloads or the Istio control plane. Must use external Observatorium API route.
Separate management cluster: Kiali deployed externally in dedicated “external deployment” mode (see External Kiali). Must use external Observatorium API route.

This guide assumes Kiali is deployed on the hub cluster in external deployment mode, but the configuration applies to any deployment location.

Metrics Flow

There are two independent flows:

Ingestion (managed cluster → hub):

Istio data plane components (sidecars, ztunnel, or waypoint proxies) expose metrics at :15020/stats/prometheus.
User Workload Monitoring Prometheus scrapes those metrics (typically every 30s).
The ACM observability collector/agent on the managed cluster reads from Prometheus and ships metrics to the hub (typically every 5 minutes).
The hub stores them in Thanos Receive/Store and serves them through Thanos Query Frontend.

Query (Kiali → hub):

Kiali can query metrics through either of these paths:

Via Observatorium API Route (HTTPS with mTLS):

Kiali queries the external Observatorium API route.
Observatorium forwards the request to Thanos Query Frontend.
Thanos Query Frontend reads from Thanos Store/Receive and returns the result back through Observatorium to Kiali.

Via Internal Thanos Service (HTTP):

Kiali queries the internal Thanos Query Frontend service directly within the cluster, bypassing Observatorium.

Expected Latency: 5-6 minutes from traffic generation to visibility in Kiali due to the 5-minute (default) push interval.

Prerequisites

1. ACM Observability Service

ACM MultiClusterObservability must be installed on the hub cluster:

# Verify ACM Observability is running
oc get mco observability

# Check Observatorium API route
oc get route observatorium-api -n open-cluster-management-observability

2. User Workload Monitoring

User Workload Monitoring must be enabled on all clusters (hub and spokes):

# Enable UWM by editing cluster-monitoring-config
oc -n openshift-monitoring edit configmap cluster-monitoring-config

# Add:
# data:
#   config.yaml: |
#     enableUserWorkload: true

# Verify UWM pods are running
oc get pods -n openshift-user-workload-monitoring

See: Enabling monitoring for user-defined projects

3. Istio Metrics Collection

Create ServiceMonitor and PodMonitor resources to collect Istio metrics. The PodMonitor for sidecars must be created in each namespace with Istio sidecars because OpenShift monitoring ignores namespaceSelector in these resources. The ServiceMonitor for istiod is created once in istio-system.

ServiceMonitor for istiod (in istio-system):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istiod-monitor
  namespace: istio-system
spec:
  targetLabels:
  - app
  selector:
    matchLabels:
      istio: pilot
  endpoints:
  - port: http-monitoring
    interval: 30s

PodMonitor for Istio proxies (must be applied in every mesh namespace):

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: istio-proxies-monitor
  namespace: <your-mesh-namespace>
spec:
  selector:
    matchExpressions:
    - key: istio-prometheus-ignore
      operator: DoesNotExist
  podMetricsEndpoints:
  - path: /stats/prometheus
    interval: 30s
    relabelings:
    - action: keep
      sourceLabels: ["__meta_kubernetes_pod_container_name"]
      regex: "istio-proxy"
    - action: keep
      sourceLabels: ["__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape"]
    - action: replace
      regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
      replacement: '[$2]:$1'
      sourceLabels: ["__meta_kubernetes_pod_annotation_prometheus_io_port","__meta_kubernetes_pod_ip"]
      targetLabel: "__address__"
    - action: replace
      regex: (\d+);((([0-9]+?)(\.|$)){4})
      replacement: '$2:$1'
      sourceLabels: ["__meta_kubernetes_pod_annotation_prometheus_io_port","__meta_kubernetes_pod_ip"]
      targetLabel: "__address__"
    - sourceLabels: ["__meta_kubernetes_pod_label_app_kubernetes_io_name","__meta_kubernetes_pod_label_app"]
      separator: ";"
      targetLabel: "app"
      action: replace
      regex: "(.+);.*|.*;(.+)"
      replacement: "${1}${2}"
    - sourceLabels: ["__meta_kubernetes_pod_label_app_kubernetes_io_version","__meta_kubernetes_pod_label_version"]
      separator: ";"
      targetLabel: "version"
      action: replace
      regex: "(.+);.*|.*;(.+)"
      replacement: "${1}${2}"
    - sourceLabels: ["__meta_kubernetes_namespace"]
      action: replace
      targetLabel: namespace
    - action: replace
      replacement: "<your-mesh-identification-string>"
      targetLabel: mesh_id

See: Configuring OpenShift Monitoring with Service Mesh

Ambient Mode Metrics

If you are using Istio’s Ambient mode instead of (or in addition to) sidecar mode, you need additional PodMonitors to collect metrics from the Ambient data plane components.

Understanding Ambient Mode Metrics

Ambient mode uses a layered architecture with two metric sources:

Ztunnel (L4 metrics only)

Runs as a DaemonSet (namespace varies by installation)
Handles all L4 traffic for pods enrolled in ambient mode
Produces TCP-level metrics:
- istio_tcp_sent_bytes_total
- istio_tcp_received_bytes_total
- istio_tcp_connections_opened_total
- istio_tcp_connections_closed_total
Does not produce HTTP metrics

Waypoint proxies (L7 metrics)

Run as Deployments in application namespaces
Optional L7 proxies deployed per-namespace or per-service
Produce full HTTP metrics (same as sidecars):
- istio_requests_total
- istio_request_duration_milliseconds_*
- istio_request_bytes_*
- istio_response_bytes_*
- Plus all TCP metrics listed above

If you only use ztunnel (no waypoints), Kiali will show TCP traffic but not HTTP-level details like response codes or latency histograms.

PodMonitor for Ztunnel

Create a PodMonitor in the namespace where ztunnel runs. Ztunnel pods expose metrics using the same interface as sidecars:

Container name: istio-proxy
Annotation: prometheus.io/scrape: "true"
Metrics path: /stats/prometheus on port 15020

Because ztunnel uses the same metrics interface, you can use the same PodMonitor configuration shown in the Istio Metrics Collection section above, changing only the namespace field to match your ztunnel namespace.

Note: The ztunnel namespace location depends on your Istio installation method. Verify your ztunnel namespace with: oc get pods -l app=ztunnel -A

PodMonitor for Waypoint Proxies

Create a PodMonitor in each namespace with a waypoint. Waypoint pods also expose metrics using the same interface as sidecars:

Container name: istio-proxy
Annotation: prometheus.io/scrape: "true"
Metrics path: /stats/prometheus on port 15020

Because waypoints use the same metrics interface, you can use the same PodMonitor configuration shown in the Istio Metrics Collection section above.

4. Metrics Allowlist Configuration

ACM only collects metrics that are explicitly allowlisted. For Istio metrics to be collected, create a ConfigMap named observability-metrics-custom-allowlist in the source namespace (see note below) with key uwl_metrics_list.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: observability-metrics-custom-allowlist
  namespace: <your-mesh-namespace>
data:
  uwl_metrics_list.yaml: |
    names:
    # Core Istio metrics below. For additional metrics that Kiali uses,
    # see: https://kiali.io/docs/faq/general/#requiredmetrics
    #
    # L7 (HTTP) metrics - from sidecars and waypoint proxies
    - istio_requests_total
    - istio_request_duration_milliseconds_bucket
    - istio_request_duration_milliseconds_sum
    - istio_request_duration_milliseconds_count
    - istio_request_bytes_bucket
    - istio_request_bytes_sum
    - istio_request_bytes_count
    - istio_response_bytes_bucket
    - istio_response_bytes_sum
    - istio_response_bytes_count
    # L4 (TCP) metrics - from sidecars, waypoint proxies, AND ztunnel
    - istio_tcp_sent_bytes_total
    - istio_tcp_received_bytes_total
    - istio_tcp_connections_opened_total
    - istio_tcp_connections_closed_total

Critical: The ConfigMap must be in the source namespace where metrics originate (e.g., istio-system, application namespaces), NOT in open-cluster-management-observability.

Ambient Mode: The same allowlist works for all Istio data plane components. However, ztunnel only produces TCP metrics (istio_tcp_*), so HTTP metrics in the allowlist will have no data from ztunnel. Waypoints produce both TCP and HTTP metrics, same as sidecars. Create the allowlist ConfigMap in each namespace where you have a PodMonitor, including the namespace where ztunnel runs and any namespaces with waypoint proxies.

See: Adding user workload metrics

Configuring Kiali for ACM Observability

Choosing Between Observatorium API and Internal Thanos Services

You have two options for connecting Kiali to ACM metrics:

Option 1: Observatorium API Route (HTTPS with mTLS)

external_services:
  prometheus:
    url: "https://observatorium-api-<namespace>.<apps-domain>/api/metrics/v1/default"
    auth:
      type: none
      cert_file: "secret:acm-observability-certs:tls.crt"
      key_file: "secret:acm-observability-certs:tls.key"

Provides:

HTTPS with mTLS authentication and encryption
External access (can be accessed from outside the cluster if needed)
RBAC enforcement via Observatorium
Multi-tenant isolation
Requires certificate setup

Option 2: Internal Thanos Service (HTTP)

external_services:
  prometheus:
    url: "http://observability-thanos-query-frontend.open-cluster-management-observability.svc:9090"
    auth:
      type: none

Provides:

Simpler setup (no certificates required)
Direct access to Thanos (potentially lower latency)
Internal cluster networking only
HTTP only (no encryption between Kiali and Thanos)

Recommendation: Use the Observatorium API for production environments where you want encrypted connections and proper authentication. Use internal services for development/testing environments where simplicity is preferred or where network security is already provided by the cluster infrastructure.

The rest of this guide focuses on the Observatorium API approach with mTLS authentication.

Step 1: Obtain mTLS Certificates from ACM

ACM automatically creates long-lived client certificates (1 year validity) for accessing the Observatorium API. Extract these from the hub cluster:

# Extract client certificate (for authentication)
oc get secret observability-grafana-certs \
  -n open-cluster-management-observability \
  -o jsonpath='{.data.tls\.crt}' | base64 -d > tls.crt

# Extract client key (for authentication)
oc get secret observability-grafana-certs \
  -n open-cluster-management-observability \
  -o jsonpath='{.data.tls\.key}' | base64 -d > tls.key

Note: These certificates are created automatically when ACM MultiClusterObservability is deployed and are already trusted by the Observatorium API.

ACM Version Note: Secret names may vary depending on your ACM version. Before proceeding, verify the secret exists:

oc get secrets -n open-cluster-management-observability | grep -i cert

If observability-grafana-certs doesn’t exist, look for similar secrets containing client certificates.

Step 2: Extract Server CA Certificate

Extract the CA certificate that signed the Observatorium API server certificate. This is used by Kiali to validate the server’s TLS certificate.

First, identify which CA issued the server certificate:

# Get the Observatorium API route hostname
HOST=$(oc get route observatorium-api -n open-cluster-management-observability -o jsonpath='{.spec.host}')

# Check who issued the server certificate
echo | openssl s_client -connect "${HOST}:443" -servername "${HOST}" -showcerts 2>/dev/null | openssl x509 -noout -issuer

Example output:

issuer=C=US, O=Red Hat, Inc., CN=observability-server-ca-certificate

Then, extract the matching CA certificate based on the issuer CN:

If the issuer CN is observability-server-ca-certificate:

oc get secret observability-server-ca-certs \
  -n open-cluster-management-observability \
  -o jsonpath='{.data.ca\.crt}' | base64 -d > server-ca.crt

If the issuer CN is observability-client-ca-certificate:

oc get secret observability-client-ca-certs \
  -n open-cluster-management-observability \
  -o jsonpath='{.data.ca\.crt}' | base64 -d > server-ca.crt

Note: Both secrets are in the open-cluster-management-observability namespace. The exact CA used may vary depending on your ACM version and configuration.

Step 3: Create Kubernetes Resources

Note: <kiali-namespace> and ${KIALI_NAMESPACE} are used as a placeholder for the namespace where Kiali is deployed. This is commonly istio-system but is not required to be - replace with your actual Kiali namespace.

Create the mTLS certificate secret in Kiali’s namespace:

KIALI_NAMESPACE="istio-system"  # Replace with your Kiali namespace

oc create secret generic acm-observability-certs \
  -n ${KIALI_NAMESPACE} \
  --from-file=tls.crt=tls.crt \
  --from-file=tls.key=tls.key

Create the CA bundle ConfigMap in Kiali’s namespace:

oc create configmap kiali-cabundle \
  -n ${KIALI_NAMESPACE} \
  --from-file=additional-ca-bundle.pem=server-ca.crt

On OpenShift: The Kiali Operator (or Helm chart) automatically creates a separate ConfigMap named kiali-cabundle-openshift for the OpenShift service CA, then uses a projected volume to combine it with your custom kiali-cabundle ConfigMap. You only need to create/manage kiali-cabundle with your ACM CA - the system handles merging.

For more details about CA bundle configuration, see TLS Configuration.

Step 4: Get Observatorium API URL

Find the external Observatorium API route URL:

oc get route observatorium-api \
  -n open-cluster-management-observability \
  -o jsonpath='https://{.spec.host}/api/metrics/v1/default'

The URL format is: https://observatorium-api-<namespace>.<apps-domain>/api/metrics/v1/default

Step 5: Configure Kiali

Using Kiali Operator (Kiali CR):

spec:
  external_services:
    prometheus:
      # Use Observatorium API route
      url: "<observatorium-api-url>"

      auth:
        type: none  # mTLS authentication at TLS layer, no Authorization header
        cert_file: "secret:acm-observability-certs:tls.crt"
        key_file: "secret:acm-observability-certs:tls.key"

      # Enable Thanos proxy mode
      thanos_proxy:
        enabled: true
        retention_period: "14d"
        scrape_interval: "5m"

Using Server Helm Chart:

OBSERVATORIUM_API_URL="$(oc get route observatorium-api -n open-cluster-management-observability -o jsonpath='https://{.spec.host}/api/metrics/v1/default')"

helm install kiali kiali-server \
  --namespace ${KIALI_NAMESPACE} \
  --set external_services.prometheus.url="${OBSERVATORIUM_API_URL}" \
  --set external_services.prometheus.auth.type="none" \
  --set external_services.prometheus.auth.cert_file="secret:acm-observability-certs:tls.crt" \
  --set external_services.prometheus.auth.key_file="secret:acm-observability-certs:tls.key" \
  --set external_services.prometheus.thanos_proxy.enabled="true" \
  --set external_services.prometheus.thanos_proxy.retention_period="14d" \
  --set external_services.prometheus.thanos_proxy.scrape_interval="5m"

Important Configuration Notes

Metrics Latency

ACM collects metrics from each cluster’s Prometheus and pushes to Thanos every 5 minutes (default). This means, by default, there is a 5-6 minute delay before new metrics appear in Kiali. This latency is inherent to ACM’s architecture and applies to all managed clusters.

Note: This interval is configurable via the spec.observabilityAddonSpec.interval field (in seconds) in the MultiClusterObservability CR on the hub cluster.

Initial warm-up period: After deploying a new application, it takes approximately twice the collection interval before data appears in Kiali’s graph and metrics tab. This is because Kiali uses PromQL rate() functions which require at least two data points to compute a result, and with ACM’s collection interval, two data points take at least two collection cycles to accumulate. For example, with the default 5-minute interval, expect a ~10-minute warm-up period. After this initial warm-up, all time ranges in Kiali should display data normally. However, keep in mind that the most recent data visible in Kiali will always be at least one collection interval old, since metrics must complete a full collection cycle before they appear in Thanos.

Thanos Proxy Mode

Enable thanos_proxy when using ACM/Thanos:

external_services:
  prometheus:
    thanos_proxy:
      enabled: true
      retention_period: "14d"  # Should match your ACM Thanos retention
      scrape_interval: "5m"   # Must match ACM's metrics collection interval

When enabled: true, Kiali uses the configured scrape_interval and retention_period values directly, rather than querying Prometheus’s /api/v1/status/config and /api/v1/status/runtimeinfo endpoints to discover them. This is necessary because Thanos does not expose these Prometheus configuration endpoints.

Why these values matter:

scrape_interval: Kiali’s UI uses this value to compute PromQL rate() intervals and query step sizes. The rate interval must be large enough to contain at least two data points for rate() to produce results. With ACM, data points arrive in Thanos at the ACM collection interval (default 5 minutes), not at the local Prometheus scrape interval (typically 15-30 seconds). If scrape_interval is set too low (e.g., “30s”), the computed rate windows will be too narrow to capture two ACM data points, causing Kiali’s metrics tab to show empty charts even though data exists in Thanos.

Critical: Set scrape_interval to match the ACM metrics collection interval (default "5m"), not the local Prometheus scrape interval. The ACM collection interval is configured via spec.observabilityAddonSpec.interval in the MultiClusterObservability CR on the hub cluster. If you have customized this value, set scrape_interval to match.

retention_period: Used to limit time range queries to available data. ACM defaults to 365d retention when spec.advanced.retentionConfig is not explicitly configured in the MultiClusterObservability CR. If using the default, set retention_period to “365d”. If configuring custom retention, use at least 10d minimum (a Thanos requirement for downsampling to function). Always match retention_period to your actual ACM retention configuration. The “14d” value shown in examples here is used for demonstration.

Multi-Cluster Setup

For multi-cluster service mesh deployments with ACM:

1. Metrics Aggregation (Handled by ACM)

ACM automatically aggregates metrics from all managed clusters. Each cluster’s metrics include a cluster label with the cluster name (the metadata.name of the ManagedCluster resource). To get a list of all the clusters managed by ACM, run oc get managedcluster on the hub cluster.

Kiali can filter metrics by cluster using query_scope. The query_scope configuration adds label filters to every Prometheus query:

external_services:
  prometheus:
    # Example 1: Filter to a single cluster
    query_scope:
      cluster: "east-cluster"

    # Example 2: Filter by mesh_id and cluster
    query_scope:
      mesh_id: "mesh-1"
      cluster: "east-cluster"

Each key-value pair in query_scope is added as key="value" to every query. For example, cluster: "east-cluster" adds cluster="east-cluster" to all PromQL queries.

2. Remote Cluster Access (For Workload/Config Data)

While metrics come from ACM’s central Thanos, Kiali still needs direct API access to each cluster for:

Workload and service discovery
Istio configuration validation
Kubernetes resource details

Create remote cluster secrets as described in the multi-cluster setup guide.

3. External Deployment Model

For multi-cluster with ACM, if you deploy Kiali on the hub cluster (or on a separate management cluster), you will typically want to run Kiali in external deployment mode:

clustering:
  ignore_home_cluster: true  # Kiali is external to mesh

kubernetes_config:
  cluster_name: "<management-cluster-name>"  # Unique name for the cluster where Kiali runs

See the External Kiali guide for complete external deployment instructions.

Certificate Management

Automatic Rotation

ACM-issued certificates (stored in the observability-grafana-certs secret in the ACM observability namespace) have 1-year validity and are automatically rotated by ACM before expiration. When certificates are rotated:

ACM updates the observability-grafana-certs secret in open-cluster-management-observability namespace
You must update the acm-observability-certs secret in Kiali’s namespace with the new certificate data. Options include:
- Re-run the extraction commands from Step 1: Obtain mTLS Certificates from ACM manually
- Use an ACM ConfigurationPolicy with hub cluster templating to automatically distribute and update the secret to the cluster where Kiali runs (see ACM Governance documentation for details)
Kubernetes updates the mounted files in Kiali pod (within 60 seconds after the secret update)
Kiali automatically uses new certificates on next connection (no pod restart needed)

Using Custom Certificates

If you prefer to use your own certificate infrastructure instead of ACM’s certificates:

Generate/obtain certificates signed by a CA trusted by ACM Observatorium API
Configure ACM to trust your CA (consult ACM documentation)
Create the acm-observability-certs secret with your certificates

Verification

Check Certificate Configuration

# Verify secret exists
oc get secret acm-observability-certs -n ${KIALI_NAMESPACE}

# Check certificate expiration
oc get secret acm-observability-certs -n ${KIALI_NAMESPACE} \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | \
  openssl x509 -noout -enddate

# Verify CA bundle
oc get configmap kiali-cabundle -n ${KIALI_NAMESPACE} \
  -o jsonpath='{.data.additional-ca-bundle\.pem}' | \
  openssl x509 -noout -subject

Check Kiali Logs

Verify certificates are loaded successfully:

oc logs -n ${KIALI_NAMESPACE} deployment/kiali | grep -i "credential\|certificate"

# Expected output (at "info" log level):
# INF Loaded [1] valid CA certificate(s) from [/kiali-cabundle/additional-ca-bundle.pem]
#
# Additional output (at "debug" log level):
# DBG Credential file path configured: [/kiali-override-secrets/prometheus-cert/tls.crt]
# DBG Credential file path configured: [/kiali-override-secrets/prometheus-key/tls.key]

Test Metrics

Generate mesh traffic in one of your managed clusters
Wait for the initial warm-up period (approximately twice the ACM collection interval; default ~10 minutes) for metrics to propagate to Thanos and for enough data points to accumulate for rate calculations. The graph may appear sooner (after ~5 minutes).
Access Kiali UI and navigate to a workload
Verify metrics appear in the Metrics tab and traffic graph

Ambient Mode: If you are using Ambient mode:

Ztunnel-only traffic (no waypoint): You’ll see TCP metrics and traffic edges in the graph, but HTTP details (response codes, latency) will not be available.
Traffic through waypoints: You’ll see full L7 metrics, same as sidecar mode.

Verify Metrics in Thanos Directly

Test that metrics exist in Thanos (from within the hub cluster). The following are different queries you can run to obtain metrics data from the backend metric datastore used by ACM.

Note: These commands use jq to format JSON output. If you don’t have jq installed, simply omit | jq . to see the full, unfiltered and raw JSON.

# List available metric names (Kiali uses istio_*, pilot_*, and envoy_* metrics)
oc get --raw "/api/v1/namespaces/open-cluster-management-observability/services/http:observability-thanos-query-frontend:9090/proxy/api/v1/label/__name__/values" | jq -r '.data[] | select(startswith("istio_") or startswith("pilot_") or startswith("envoy_"))'

# Count timeseries for key Istio metrics (shows which metrics have data and how many unique timeseries)
oc get --raw "/api/v1/namespaces/open-cluster-management-observability/services/http:observability-thanos-query-frontend:9090/proxy/api/v1/query?query=count%20by%20(__name__)%20({__name__=~%22istio_requests_total|istio_tcp.*total%22})" | jq -r '.data.result[] | "\(.metric.__name__): \(.value[1])"'

# Query Istio request metrics with full details (limited to first result to show structure)
oc get --raw "/api/v1/namespaces/open-cluster-management-observability/services/http:observability-thanos-query-frontend:9090/proxy/api/v1/query?query=istio_requests_total" | jq '.data.result |= .[0:1]'

Troubleshooting

Empty Graph or No Metrics

Symptom: Kiali shows an empty graph, “No metrics” in the metrics tab, or both.

Causes and Solutions:

scrape_interval too low: If thanos_proxy.scrape_interval is set lower than the ACM collection interval (e.g., “30s” instead of “5m”), Kiali’s rate calculations will use windows too narrow to capture enough data points from Thanos
- Solution: Set thanos_proxy.scrape_interval to match the ACM collection interval (default “5m”). See Thanos Proxy Mode for details
Still in warm-up period: After deploying a new application, it takes approximately twice the ACM collection interval (~10 minutes by default) before enough data points exist for rate calculations
- Solution: Wait for the warm-up period to elapse
Metrics not allowlisted: ACM doesn’t collect metrics by default
- Solution: Create observability-metrics-custom-allowlist ConfigMap with uwl_metrics_list.yaml key in source namespace
PodMonitor missing: Prometheus not scraping Istio data plane components
- Solution: Create istio-proxies-monitor PodMonitor in each mesh namespace (including the ztunnel namespace and namespaces with waypoint proxies if using Ambient mode)
UWM not enabled: User Workload Monitoring not configured
- Solution: Enable enableUserWorkload: true in cluster-monitoring-config ConfigMap in openshift-monitoring namespace
Missing source/destination labels: The graph builds its topology from workload and namespace labels in the metrics. Verify Istio metrics have proper labels
Namespace not selected: Ensure the namespace is selected in the graph’s namespace dropdown
Query scope mismatch: Check query_scope cluster names match actual cluster label values

See also the Why is my graph empty? FAQ for additional troubleshooting information.

TLS/Certificate Errors

Symptom: Kiali logs show “x509: certificate signed by unknown authority” or “tls: bad certificate”

Solutions:

Verify CA bundle: Ensure kiali-cabundle ConfigMap has the correct CA
```
oc get configmap kiali-cabundle -n ${KIALI_NAMESPACE} -o yaml
```

Check certificate chain: Verify client cert is signed by expected CA

oc get secret acm-observability-certs -n ${KIALI_NAMESPACE} \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | \
  openssl x509 -noout -issuer

Verify projected volume: Check both ConfigMaps are mounted

oc exec -n ${KIALI_NAMESPACE} deploy/kiali -- ls -la /kiali-cabundle/
# Should show: additional-ca-bundle.pem, service-ca.crt

Connection Refused / Timeout

Symptom: Kiali cannot reach Observatorium API

Solutions:

Verify route exists:

oc get route observatorium-api -n open-cluster-management-observability

Check ACM is ready (should return “True”):

oc get mco observability -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}{"\n"}'

Test connectivity (should return “OK”):

oc get --raw "/api/v1/namespaces/open-cluster-management-observability/services/http:observability-thanos-query-frontend:9090/proxy/-/ready"

Check NetworkPolicies: Ensure no policies block egress from Kiali’s namespace

Ambient Mode: No HTTP Metrics

Symptom: Ambient mode workloads show TCP traffic in Kiali but no HTTP metrics (response codes, latency)

Possible causes:

No waypoint deployed: Ztunnel only provides L4 (TCP) metrics. Deploy a waypoint proxy for L7 (HTTP) visibility.
Missing waypoint PodMonitor: Even with a waypoint, metrics won’t be collected without a PodMonitor:
- Verify waypoint pod exists: oc get pods -n <namespace> -l gateway.networking.k8s.io/gateway-class-name=istio-waypoint
- Create PodMonitor in the waypoint’s namespace (same config as sidecar PodMonitor)
Missing allowlist in waypoint namespace: Create a ConfigMap with the name observability-metrics-custom-allowlist in the namespace where the waypoint runs (see Metrics Allowlist Configuration)

Ambient Mode: No Ztunnel Metrics

Symptom: Ambient mode workloads show no traffic at all in Kiali

Possible causes:

Missing ztunnel PodMonitor: Create istio-proxies-monitor PodMonitor in the ztunnel namespace
Wrong ztunnel namespace: Verify ztunnel location: oc get pods -l app=ztunnel -A
Missing allowlist: Create a ConfigMap with the name observability-metrics-custom-allowlist in the ztunnel namespace (see Metrics Allowlist Configuration)

Reference

This example represents a fully configured Kiali installation using ACM Observability via the Observatorium API with mTLS:

apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali
  namespace: <kiali-namespace>
spec:
  clustering:
    ignore_home_cluster: true  # External deployment

  kubernetes_config:
    cluster_name: "<management-cluster-name>"

  external_services:
    prometheus:
      url: "<observatorium-api-url>"

      auth:
        type: none
        cert_file: "secret:acm-observability-certs:tls.crt"
        key_file: "secret:acm-observability-certs:tls.key"

      thanos_proxy:
        enabled: true
        retention_period: "14d"
        scrape_interval: "5m"

Required Kubernetes resources:

---
# mTLS client certificates (from ACM)
# Data extracted from Secret observability-grafana-certs in namespace open-cluster-management-observability
apiVersion: v1
kind: Secret
metadata:
  name: acm-observability-certs
  namespace: <kiali-namespace>
type: Opaque
data:
  tls.crt: <base64-encoded-certificate>  # From observability-grafana-certs secret, tls.crt key
  tls.key: <base64-encoded-key>          # From observability-grafana-certs secret, tls.key key

---
# Server CA trust (from ACM)
# Data extracted from Secret observability-client-ca-certs (or observability-server-ca-certs) in namespace open-cluster-management-observability
apiVersion: v1
kind: ConfigMap
metadata:
  name: kiali-cabundle
  namespace: <kiali-namespace>
data:
  additional-ca-bundle.pem: |
    -----BEGIN CERTIFICATE-----
    <ACM Observability CA certificate>  # From ca.crt or tls.crt key (see Step 2 for extraction commands)
    -----END CERTIFICATE-----

Additional Resources

2.7.2 - External Kiali

Deploy Kiali on a Management Cluster.

Larger mesh deployments may desire to separate mesh operation from mesh observability. This means deploying Kiali, and potentially other observability tooling, away from the mesh.

This separation allows for:

Dedicated management of mesh observability
Reduced resource consumption on mesh clusters
Centralized visibility across multiple mesh clusters
Improved security isolation

Deployment Model

This deployment model requires a minimum of two clusters. The Kiali “home” cluster (where Kiali is deployed) will serve as the “management” cluster. The “mesh” cluster(s) will be where your service mesh is deployed. The mesh deployment will still conform to any of the Istio deployment models that Kiali already supports. The fundamental difference is that Kiali will not be co-located with an Istio control plane, but instead will reside away from the mesh. For multi-cluster mesh deployments, all of the same requirements apply, such as unified metrics and traces, etc.

It can be beneficial to co-locate other observability tooling on the management cluster. For example, co-locating Prometheus will likely improve Kiali’s metric query performance, while also reducing Prometheus resource consumption on the mesh cluster(s). Although, it may require additional configuration, like federating Prometheus databases, etc.

The high-level deployment model looks like this: Kiali multi-cluster

Configuration

Configuring Kiali for the external deployment model has the same requirements needed for a co-located Kiali in a multi-cluster installation. Kiali still needs the necessary secrets for accessing the remote clusters.

Additionally, the configuration needs to indicate that Kiali will not be managing its home cluster. This is done in the Kiali CR by setting:

clustering:
  ignore_home_cluster: true

Kiali typically sets its home cluster name to the same cluster name set by the co-located Istio control plane. In an external deployment there is no co-located Istio control plane, and therefore the cluster name must also be set in the configuration. The name must be unique within the set of multi-cluster cluster names.

kubernetes_config:
  cluster_name: <KialiHomeClusterName>

Authorization

The external deployment model currently supports openid, openshift, and anonymous authorization strategies. token auth is untested and considered experimental.

Metrics Aggregation

For external Kiali deployments, you need a unified metrics endpoint that aggregates metrics from all mesh clusters.

2.7.2.1 - OpenShift

Deploying External Kiali on OpenShift

These are specific notes for the External Kiali deployment model on OpenShift.

Installation

It is highly recommended that the Kiali Operator be deployed on all clusters, even if the Kiali Server itself is not deployed on some clusters. This will ensure that the proper namespace and remote cluster resources can be created. Clusters without a Kiali Server will require only the remote cluster resources necessary for remote Kiali Server authentication. To install these resources, configure the Kiali CR with:

spec.deployment.remote_cluster_resources_only: true

This Kiali CR will result in an installation requiring very limited resources.

Authorization Strategy

When using the openshift authentication strategy on OpenShift, make sure to read and apply any guidance found in the notes for multi-cluster.

2.8 - Namespace access control

Configuring per-user authorized namespaces.

Introduction

In authentication strategies other than anonymous Kiali supports limiting the namespaces that are accessible on a per-user basis. The anonymous authentication strategy does not support this, although you can still limit privileges when using an OpenShift cluster. See the access control section in Anonymous strategy.

To authorize namespaces, the standard Roles resources (or ClusterRoles) and RoleBindings resources (or ClusterRoleBindings) are used.

The Kubernetes RBAC documentation describe how to use Roles, ClusterRoles, RoleBindings and ClusterRoleBindings resources. If you are using OpenShift, read the OpenShift RBAC documentation.

Kiali can only restrict or grant read access to namespaces as a whole. So, keep in mind that while the RBAC capabilities of the cluster are used to give access, Kiali won’t offer the same privilege granularity that the cluster supports. For example, a user that does not have privileges to get Kubernetes Deployments via typical tools (e.g. kubectl) would still be able to get some details of Deployments through Kiali when listing Workloads or when viewing detail pages, or in the Graph.

Some features allow creating or changing resources in the cluster (for example, the Wizards). For these write operations which may be sensitive, the users will need to have the required privileges in the cluster to perform updates - i.e. the cluster RBAC takes effect.

Kiali is going to reject login to users that aren’t authorized to see any namespace.

Granting access to namespaces

In general, Kiali will give read access to namespaces where the logged in user is allowed to “GET” its definition – i.e. the user is allowed to do a GET call to the api/v1/namespaces/{namespace-name} endpoint of the cluster API. Users granted the LIST verb would get access to all namespaces of the cluster (that’s a GET call to the api/v1/namespaces endpoint of the cluster API).

You, probably, will want to have this small ClusterRole to help you in authorizing individual namespaces in Kiali:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kiali-namespace-authorization
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods/log
  verbs:
  - get

The pods/log privilege is needed for the pods Logs view. Since logs are potentially sensitive, you could remove that privilege if you don’t want users to be able to fetch pod logs.

Once you have created this ClusterRole, you would authorize a namespace foobar to user john with the following RoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: authorize-ns-foobar-to-john
  namespace: foobar
subjects:
- kind: User
  name: john
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: kiali-namespace-authorization # The name of the ClusterRole created previously
  apiGroup: rbac.authorization.k8s.io

Note that in this example, the subject kind is User, which is the case when using openid or openshift authentication strategies. For other authentication strategies you would need to adjust the RoleBinding to use the right subject kind.

If you want to authorize a user to access all namespaces in the cluster, the most efficient way to do it is by creating a ClusterRole with the list verb for namespaces and bind it to the user using a ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kiali-all-namespaces-authorization
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods/log
  verbs:
  - get
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: authorize-all-namespaces-to-john
subjects:
- kind: User
  name: john
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: kiali-all-namespaces-authorization
  apiGroup: rbac.authorization.k8s.io

Note that the only addition to the ClusterRole is the list verb in the first rule.

Alternatively, you could also use the previously mentioned kiali-namespace-authorization rather than creating a new one with the list privilege, and it will work. However, Kiali will perform better if you grant the list privilege.

Please read your cluster RBAC documentation to learn more about the authorization system.

Granting write privileges to namespaces

Changing resources in the cluster can be a sensitive operation. Because of this, the logged in user will need to be given the needed privileges to perform any updates through Kiali. The following ClusterRole contains all read and write privileges that may be used in Kiali:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kiali-write-privileges
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  - replicationcontrollers
  - services
  verbs:
  - patch
- apiGroups: ["extensions", "apps"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  - statefulsets
  verbs:
  - patch
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs:
  - patch
- apiGroups:
  - networking.istio.io
  - security.istio.io
  - extensions.istio.io
  - telemetry.istio.io
  - gateway.networking.k8s.io
  resources: ["*"]
  verbs:
  - get
  - list
  - watch
  - create
  - delete
  - patch

If needed, you can reduce the set of write privileges to prevent users from changing unwanted resources. However read privileges are require to read the resources.

Similarly to giving access to namespaces, you can either use a RoleBinding to give read and write privileges only to specific namespaces, or use a ClusterRoleBinding to give privileges to all namespaces.

2.9 - Namespace Management

Configuring the namespaces accessible and visible to Kiali.

Introduction

The default Kiali installation gives Kiali access to all namespaces available in the cluster and will allow all namespaces to be visible.

It is possible to restrict Kiali so that it can only access a specific set of namespaces by providing discovery selectors that match those namespaces. Note that Kiali will not use Istio’s discovery selectors; if Istio has been configured with its own discovery selectors, you will likely want to configure Kiali with the same list of discovery selectors.

This documentation makes a distinction between accessible and visible namespaces. The Kiali Server will be given permission to access either (a) all, or (b) a configured subset, of cluster namespaces. The Kiali Server will only be aware of, query for, and access resources within these accessible namespaces. The set of namespaces visible to an end user, via the Kiali UI, will be a subset of the accessible namespaces. In other words, the namespaces visible to a user may be all, or just some of the namespaces accessible to the Kiali Server.

As of Kiali 2.0, the following settings are no longer supported:

deployment.accessible_namespaces
api.namespaces.exclude
api.namespaces.include
api.namespaces.label_selector_exclude
api.namespaces.label_selector_include

Cluster Wide Access Mode

By default, the Kiali Server is given cluster-wide access to all namespaces on the local cluster. This is controlled by the Kiali CR setting deployment.cluster_wide_access, which has a default value of true when not specified.

You cannot have multiple Kiali Servers with both cluster-wide access and identical instance names. If you wish to install multiple Kiali Servers with cluster-wide access enabled, each must have a unique deployment.instance_name value.

In order to restrict the Kiali Server so that it only has access to certain namespaces on the local cluster, it must first have its cluster-wide access disabled. You do this by setting deployment.cluster_wide_access to false in the Kiali CR.

You can still use discovery selectors (explained below) to limit what Kiali will make visible in the UI while cluster_wide_access remains true. You would want to do this for the performance benefits it provides the Kiali Server. But with this, the Kiali Server will be granted ClusterRole permissions rather than individual Role permissions per namespace. In other words, it will have access to all namespaces, but will not make all of them visible.

Accessible Namespaces

With cluster-wide access disabled, the Kiali Server must be told what namespaces are accessible to it. These accessible namespaces are defined by a list of discovery selectors that match namespaces.

The list of accessible namespaces is specified in the Kiali CR via the deployment.discovery_selectors.default setting. As an example, if Kiali is to be installed in the istio-system namespace, and is expected to monitor all namespaces with the label “my-mesh”, the setting would be:

spec:
  deployment:
    cluster_wide_access: false
    discovery_selectors:
      default:
      - matchExpressions:
        - key: my-mesh
          operator: Exists

When cluster_wide_access is set to false, the Kiali Operator will examine the default selectors under spec.deployment.discovery_selectors, as the example above illustrates. The Kiali Operator will then attempt to find all of the namespaces that match the discovery selectors. For each namespace that matches the discovery selectors, the Kiali Operator will create a Role and assign that Role to the Kiali Service Account thus giving Kiali access to those namespaces. These namespaces are therefore called the “accessible namespaces”.

The Kiali Operator will always give the Kiali Server access to the namespace where the Kiali Server is installed, whether its namespace matches a discovery selector or not. When cluster_wide_access is false and no discovery selectors are defined, the Kiali Server will only be given access to its namespace.

Because the Kiali Server utilizes Kubernetes watches to watch all accessible namespaces, this may cause performance issues. To increase performance you can set deployment.cluster_wide_access to true even when specifying a list of discovery selectors. When you do this, the Kiali Server will be given access to the entire cluster and thus it can use a single cluster watch which increases performance and efficiency. However, you must be aware that when you do this, the Kiali Server will be granted access to the cluster via a ClusterRole - individual Roles will not be created per namespace. The spec.deployment.discovery_selectors will still be used to determine which namespaces can be visible to users.

If you install Kiali using the Server Helm Chart, these Roles will be created when cluster_wide_access=false. However, the Server Helm Chart does not provide the same lifecycle management features as the operator:

The operator automatically cleans up Roles/RoleBindings from namespaces that are no longer accessible when discovery selectors (deployment.discovery_selectors.default) change
The operator handles transitions when view_only_mode or auth.strategy settings change (RoleBindings are immutable and must be deleted/recreated)
The operator explicitly cleans up ClusterRole/ClusterRoleBinding resources when switching from cluster_wide_access=true to false
The operator adds labels to accessible namespaces to mark which Kiali instance manages them

With the Server Helm Chart, you may need to manually clean up resources when changing these configurations. For full lifecycle management, use the operator. The Server Helm Chart is provided only as a convenience.

If you install the Kiali Operator using the Operator Helm Chart, to be able to use cluster_wide_access=true, you must specify the --set clusterRoleCreator=true flag when invoking helm install.

When installing multiple Kiali instances into a single cluster, deployment.discovery_selectors.default must be mutually exclusive. In other words, a namespace must be matched by the discovery selectors defined by one and only one Kiali CR on the cluster.

Istio Discovery Selectors

In Istio’s MeshConfig, a list of discovery selectors can be configured. These Istio discovery selectors define the namespaces that Istio will consider “in the mesh” (see this blog post for details). These Istio discovery selectors are utilized only by Istio; they will be ignored by Kiali.

Operator Namespace Watching

Note that the discovery selectors are evaluated by the Kiali Operator at install time when deciding which namespaces should be accessible (and thus which Roles to create). Namespaces that do not exist at the time of install will not be accessible to Kiali until the operator has a chance to reconcile the Kiali CR. There are several ways in which the operator can be told to reconcile a Kiali CR in order to determine the new set of accessible namespaces.

You can ask the Kiali Operator to periodically reconcile the Kiali CR on a fixed schedule. See the Ansible Operator SDK documentation describing the reconcile-period annotation. In short, you can have the Kiali Operator periodically reconcile a Kiali CR by setting the ansible.sdk.operatorframework.io/reconcile-period annotation on the Kiali CR. For example, to reconcile this Kiail CR every 60 seconds:

metadata:
  kind: Kiali
  annotations:
    ansible.sdk.operatorframework.io/reconcile-period: 60s

Modifying the deployment.discovery_selectors.default list of discovery selectors will automatically trigger the Kiali Operator to reconcile a Kiali CR and discover new namespaces. In fact, touching any spec field in the Kiali CR will trigger a reconciliation of the Kiali CR.
Similar to the above, touching any annotation on the Kiali CR will also trigger a reconciliation. One suggestion is to dedicate an annotation whose purpose is solely to trigger operator reconcilations. For example, add or modify the “trigger-reconcile” annotation on the Kiali CR to trigger the operator to run a reconcilation on that Kiali CR:

kubectl annotate kiali my-kiali-cr --namespace istio-system --overwrite trigger-reconcile="$(date)"

The Kiali Operator can be enabled to watch for namespaces getting created in the cluster. When new namespaces are created, the Kiali Operator will detect this and will then attempt to reconcile all Kiali CRs in the cluster. To enable operator namespace watching, see the FAQ describing the operator WATCHES_FILE environment variable. Note that on clusters with large numbers of namespaces that get created, enabling this namespace watching feature can cause the operator to consume a lot of CPU, so you may not wish to use this method.

Once the Kiali Operator is triggered to reconcile a Kiali CR, the operator will create the necessary Roles for all accessible namespaces, giving the Kiali Server access to any new namespaces that have been created since the last reconciliation.

Multi-Cluster Environments

The Kiali CR deployment.discover_selectors section supports multi-cluster configurations.

The default discovery selectors define the namespaces on the local cluster that Kiali will have access to (as explained above). These namespaces are made visible to Kiali users.

It is assumed Kiali will have access to the same set of namespaces on the remote clusters as well. So Kiali will make those remote namespaces visible to users. However, if a remote cluster has a different set of namespaces that should be visible to Kiali users, you can set discovery selector overrides in deployment.discovery_selectors to match those remote namespaces.

Each remote cluster overrides section completely overrides the default discovery selectors. That is to say, if a remote cluster has discovery selector overrides defined, only those selectors are used to determine which remote namespaces are to be visible to users. The default discovery selectors will not be used for a particular remote cluster when overrides are defined for that remote cluster.

Here is an example of defining discovery selectors for a remote cluster:

spec:
  deployment:
    cluster_wide_access: false
    discovery_selectors:
      # define accessible namespaces on the local cluster
      default:
      - matchExpressions:
        - key: my-mesh
          operator: Exists
      overrides:
        # My remote cluster has a different set of namespaces
        my-remote-cluster:
        - matchLabels:
            org: production
        - matchExpressions:
          - key: region
            operator: In
            values: ["east"]

You can define overrides for multiple remote clusters:

spec:
  deployment:
    cluster_wide_access: false
    discovery_selectors:
      default:
      - matchLabels:
          region: south
      overrides:
        cluster1:
        - matchLabels:
            region: east
        cluster2:
        - matchLabels:
            region: west
        cluster3:
        - matchLabels:
            region: north

Discovery Selectors

The default and overrides discovery selectors are processed in the same manner. They follow the same semantics as Istio as described in the Istio discoverySelectors documentation

An empty list of discovery selectors has different semantics depending on the value of deployment.cluster_wide_access.

If deployment.cluster_wide_access is true, an empty list of discovery selectors means all namespaces will be visible except those that are considered system namespaces. These include namespaces whose names are prefixed with “kube-”, “openshift” or “ibm” such as kube-system, openshift-operators, and ibm-system. (Kubernetes has reserved all namespaces prefixed with kube- as system namespaces and users are cautioned against creating them). System namespaces such as these should not be considered to have service mesh components and so are excluded by Kiali. If, for some reason, you want to consider these namespaces in your service mesh, you can do so by defining discovery selectors, or alternatively you can rename your namespaces so they do not resemble system namespaces.
If deployment.cluster_wide_access is false, an empty list of discovery selectors means only the Kiali deployment namespace will be accessible. This is not particularly useful as it will not include any application namespaces.

The Kiali deployment namespace will always be made accessible by Kiali. It is required that Istio control plane namespaces are also accessible. Istio control plane namespace(s) not co-located with Kiali must have their namespaces included in the defined discovery selectors.

In short, the default discovery selectors and each remote cluster overrides are lists of equality-based and set-based label selectors, with each item in a list being disjunctive (that is, match results from each selector item in a selector list are OR’ed together).

Each discovery selector list item itself can consist of one matchLabels, one matchExpressions, or both. A matchLabels can match one or more labels; a matchExpressions can match one or more expressions. All results within a single discovery selector list item are AND’ed together (that is to say, a namespace must match all label selector conditions in order for that namespace to be selected by that label selector).

For details on equality-based and set-based selector syntax and semantics, see the Kubernetes documentation.

Below are a couple of examples to help you understand these semantics.

This defines a discovery selector list that contains a single label selector that consists of one equality-based selector and one set-based selector. The namespaces that match this discovery selector are those that have a env=production label AND a org=frontdesk label AND a app=ticketing label AND a color=blue label:

discovery_selectors:
  default:
  - matchLabels:
      env: production
      org: frontdesk
    matchExpressions:
    - key: app
      operator: In
      values: ["ticketing"]
    - key: color
      operator: In
      values: ["blue"]

Suppose we want to also make accessible all namespaces that have the label region=east. We add another discover selector to the list:

discovery_selectors:
  default:
  - matchLabels:
      region: east
  - matchLabels:
      env: production
      org: frontdesk
    matchExpressions:
    - key: app
      operator: In
      values: ["ticketing"]
    - key: color
      operator: In
      values: ["blue"]

Now all the same namespaces that matched before are also matched. But in addition, all namespaces that simply have a label region=east will also match. This is because both label selectors in the list are OR’ed together.

2.10 - No Istiod Access

Kiali behavior with no access to Istiod (the /debug endpoints are not available)

Introduction

Kiali makes use of the Istiod /debug endpoints for introspection into the control plane. If this API is unavailable Kiali continues to perform, but the feature set will be degraded. The Istio API can be unavailable for various reasons:

The Istio API has been explicitly disabled in the Istio configuration.
The deployment model prevents access to the Istio API (firewalls, other networking concerns or limitations).
The API is configured but for some, potentially unexpected, reason can not be reached by Kiali.

Configuration

When the Istio API is known to be inaccessible Kiali should be configured via the istio_api_enabled configuration item.
By default, istio_api_enabled is true.

# ...
spec:
external_services:
  istio:
    istio_api_enabled: false
# ...

How does it affect Kiali

When the Istio API is not available there is expected feature degradation in Kiali:

The control plane metrics won’t be available.
The proxy status won’t be available in the workloads details view.
The control plane status will be calculated based on the namespace status, instead of the istio component status.
The Istio validations may not be available.
From Kiali >= 2.23, the Kiali validations are available.

Note that Istio Configurations will be available. This is because the list of Istio configurations is obtained using the Kubernetes API.

Istio Validations

The Istio validations won’t be available as this logic is provided by the Istio API. But, if the Istio Config was created when the validatingwebhookconfiguration web hook was enabled, the validation messages will be available and the Istio validations can be found:

Starting with Kiali 2.23, the Kiali validations are available even when the Istio API is disabled (in earlier versions they were disabled too).

Istio Configurations

The Istio Configurations are available in view and edit mode. It is important to know that the validations are disabled, so the configurations created or modified won’t be validated.

There is one scenario where the creation/deletion/edition could fail: If the Istio validation webhook is enabled but Istiod is not reachable. In this case, the webhook should be removed in order for this to work.

It can be checked with the following command:

kubectl get ValidatingWebhookConfiguration

2.11 - OSSMConsole CR Reference

Reference page for the OSSMConsole CR. The Kiali Operator will watch for a resource of this type and install the OSSM Console plugin according to that resource’s configuration. Only one resource of this type should exist at any one time.

Example CR

(all values shown here are the defaults unless otherwise noted)

apiVersion: kiali.io/v1alpha1
kind: OSSMConsole
metadata:
  name: ossmconsole
  annotations:
    ansible.sdk.operatorframework.io/verbosity: "1"
spec:
  version: "default"

  deployment:
    imageDigest: ""
    imageName: ""
    imagePullPolicy: "IfNotPresent"
    # default: image_pull_secrets is an empty list
    imagePullSecrets: ["image.pull.secret"]
    imageVersion: ""
    namespace: ""

  kiali:
    serviceName: ""
    serviceNamespace: ""
    servicePort: 0

Validating your OSSMConsole CR

The OSSMConsole CR has a CRD Schema so it will be validated when you create or update it in your cluster.

Properties

.spec

(object)

This is the CRD for the resources called OSSMConsole CRs. The OpenShift Service Mesh Console Operator will watch for resources of this type and when it detects an OSSMConsole CR has been added, deleted, or modified, it will install, uninstall, and update the associated OSSM Console installation.

.spec.deployment

(object)

.spec.deployment.imageDigest

(string)

If deployment.imageVersion is a digest hash, this value indicates what type of digest it is. A typical value would be ‘sha256’. Note: do NOT prefix this value with a ‘@’.

.spec.deployment.imageName

.spec.kiali.servicePort

(integer)

The internal port used by the Kiali service for the API. If empty, an attempt will be made to auto-discover it from the Kiali OpenShift Route.

.spec.version

(string)

The version of the Ansible role that will be executed in order to install OSSM Console. This also indirectly determines the version of OSSM Console that will be installed. You normally will want to use default since this is the only officially supported value today.

If not specified, the value of default is assumed which means the most recent Ansible role is used; thus the most recent release of OSSM Console will be installed.

This version setting affects the defaults of the deployment.imageName and deployment.imageVersion settings. See the documentation for those settings below for additional details. In short, this version setting will dictate which version of the OSSM Console image will be deployed by default. However, if you explicitly set deployment.imageName and/or deployment.imageVersion to reference your own custom image, that will override the default OSSM Console image to be installed; therefore, you are responsible for ensuring those settings are compatible with the Ansible role that will be executed in order to install OSSM Console (i.e. your custom OSSM Console image must be compatible with the rest of the configuration and resources the operator will install).

.status

(object)

The processing status of this CR as reported by the OpenShift Service Mesh Console Operator.

2.12 - Prometheus, Tracing, Grafana

Kiali data sources and add-ons.

Prometheus is a required telemetry data source for Kiali. Jaeger/Tempo is a highly recommended tracing data source. Kiali also offers simple add-on integrations for Grafana and Perses. This page describes how to configure Kiali to communicate with these dependencies.

Read the dedicated configuration page to learn more.

If any of these services use HTTPS with certificates issued by a private CA, see the TLS Configuration page.

2.12.1 - TLS Configuration

This page describes how to configure TLS certificates for Kiali’s connections to external services.

Overview

When Kiali connects to external services (Prometheus, Grafana, Jaeger/Tempo, Perses) over HTTPS, it needs to verify the TLS certificates presented by those services. By default, Kiali trusts the system certificate authorities (CAs) that are built into the container image.

If your external services use certificates issued by a private CA (such as an internal corporate CA, a service mesh CA, or self-signed certificates), you need to configure Kiali to trust those additional CAs.

Adding Custom Certificate Authorities

Kiali uses a global CA bundle mechanism to trust additional certificate authorities. All custom CAs are added to a single certificate pool that applies to all HTTPS connections Kiali makes to external services.

On Kubernetes

To add custom CAs, create a ConfigMap named <kiali-instance-name>-cabundle in the Kiali namespace. The default instance name is kiali, so the ConfigMap would be named kiali-cabundle:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kiali-cabundle
  namespace: istio-system  # Or your Kiali namespace
data:
  additional-ca-bundle.pem: |
    -----BEGIN CERTIFICATE-----
    MIIDxTCCAq2gAwIBAgIQAqxcJmoLQ...
    ... (your CA certificate) ...
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    MIIDyTCCArGgAwIBAgIRAJ4K...
    ... (additional CA certificates if needed) ...
    -----END CERTIFICATE-----

Key name: The key must be additional-ca-bundle.pem. You can include multiple CA certificates in PEM format in the same file.

Alternative keys: You can also use openid-server-ca.crt or (on OpenShift) oauth-server-ca.crt as key names. While these names suggest specific purposes, all CAs are loaded into Kiali’s global certificate pool and trusted for all TLS connections. Using additional-ca-bundle.pem is recommended for clarity.

For OpenShift OAuth authentication: On OpenShift, you can alternatively create a separate ConfigMap named <instance-name>-oauth-cabundle with the key oauth-server-ca.crt. See the OpenShift authentication documentation for details. However, adding your CA to kiali-cabundle under additional-ca-bundle.pem achieves the same result.

On OpenShift

On OpenShift, the Kiali Operator automatically creates a ConfigMap named <kiali-instance-name>-cabundle-openshift (e.g., kiali-cabundle-openshift) with the annotation service.beta.openshift.io/inject-cabundle: "true". This tells OpenShift to automatically inject the cluster’s service CA into the ConfigMap.

This means that by default, Kiali on OpenShift already trusts:

The system CAs
The OpenShift service CA (used by services with serving certificates)

If you need to add additional CAs beyond the OpenShift service CA, create a separate ConfigMap named <kiali-instance-name>-cabundle (e.g., kiali-cabundle):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kiali-cabundle
  namespace: istio-system  # Or your Kiali namespace
data:
  additional-ca-bundle.pem: |
    -----BEGIN CERTIFICATE-----
    MIIDxTCCAq2gAwIBAgIQAqxcJmoLQ...
    ... (your CA certificate) ...
    -----END CERTIFICATE-----

The operator uses a projected volume that automatically combines both ConfigMaps, so your custom CAs work alongside the OpenShift service CA.

How It Works

When Kiali starts, it loads certificates from:

System certificate pool: The default trusted CAs from the container’s operating system
Additional CA bundle: Certificates from /kiali-cabundle/additional-ca-bundle.pem (if present)
OpenShift service CA (OpenShift only): Certificates from /kiali-cabundle/service-ca.crt (automatically injected from the <instance-name>-cabundle-openshift ConfigMap)
OpenID server CA (OpenID auth only): Certificates from /kiali-cabundle/openid-server-ca.crt (if present)
OAuth CA bundle (OpenShift with OAuth auth): Certificates from /kiali-cabundle/oauth-server-ca.crt (if the <instance-name>-oauth-cabundle ConfigMap exists)

All these certificates are combined into a single certificate pool used for all HTTPS connections to external services.

On OpenShift: The operator uses a projected volume that automatically combines multiple ConfigMap sources (<instance-name>-cabundle-openshift, <instance-name>-cabundle, and <instance-name>-oauth-cabundle) into the /kiali-cabundle mount path. This means you don’t need to manually merge ConfigMaps - each ConfigMap can be managed independently.

Automatic refresh: Kiali watches CA bundle files for changes using filesystem notifications (fsnotify) and automatically refreshes the certificate pool without requiring a pod restart. When you update the ConfigMap, Kubernetes propagates the changes to the mounted volume based on the kubelet’s sync interval (default: 60 seconds). Once the files are updated on disk, Kiali detects and applies them immediately. Total propagation time is typically 0-90 seconds after the ConfigMap update.

Skipping Certificate Verification

If you need to temporarily skip certificate verification (for testing purposes only), you can set insecure_skip_verify: true in the authentication configuration for each external service:

spec:
  external_services:
    prometheus:
      auth:
        insecure_skip_verify: true
    grafana:
      auth:
        insecure_skip_verify: true
    tracing:
      auth:
        insecure_skip_verify: true

Security warning: Disabling certificate verification makes Kiali vulnerable to man-in-the-middle attacks. Only use this option for testing purposes, never in production.

Common Scenarios

Internal Corporate CA

If your organization has an internal CA that issues certificates for internal services:

Obtain the root CA certificate (public part only) from your security team
Create the ConfigMap with the CA certificate as shown above

Self-Signed Certificates

For development or testing environments using self-signed certificates:

Export the certificate from your service (usually the same certificate that was generated)
Create the ConfigMap with that certificate

Istio Service Mesh mTLS

If your external services are part of the Istio service mesh and use Istio’s mTLS:

Kiali typically accesses these services through their Kubernetes service names, which may bypass the sidecar
If you need to go through the mesh, you may need to add Istio’s root CA to the bundle

cert-manager Issued Certificates

If you use cert-manager with a private CA:

The CA certificate is typically stored in a Secret (e.g., my-ca-secret with key ca.crt)
Extract the CA and add it to the ConfigMap:

kubectl get secret my-ca-secret -n cert-manager -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
kubectl create configmap kiali-cabundle -n istio-system --from-file=additional-ca-bundle.pem=ca.crt

Troubleshooting

Certificate Errors in Logs

If you see errors like x509: certificate signed by unknown authority in Kiali logs:

Verify the ConfigMap exists and has the correct name
Check that the key is exactly additional-ca-bundle.pem
Ensure the certificate is in valid PEM format
Verify the CA certificate is the correct one (the root or intermediate CA that signed the service’s certificate)

Verifying the ConfigMap is Mounted

Check that the ConfigMap is properly mounted in the Kiali pod:

kubectl exec -n istio-system deploy/kiali -- ls -la /kiali-cabundle/

You should see your CA bundle file listed.

Testing Certificate Chain

To verify your CA certificate is correct, you can test it outside of Kiali:

# Get the server's certificate chain
openssl s_client -connect prometheus.istio-system:9090 -showcerts

# Verify against your CA
openssl verify -CAfile your-ca.pem server-cert.pem

2.12.2 - Grafana

This page describes how to configure Grafana for Kiali.

Grafana configuration

Istio provides preconfigured Grafana dashboards for the most relevant metrics of the mesh. Although Kiali offers similar views in its metrics dashboards, it is not in Kiali’s goals to provide the advanced querying options, nor the highly customizable settings, that are available in Grafana. Thus, it is recommended that you use Grafana if you need those advanced options.

Kiali can provide a direct link from its metric dashboards to the equivalent or most similar Grafana dashboard, which is convenient if you need the powerful Grafana options.

The Grafana links will appear in the Kiali metrics pages. For example:

Kiali Grafana Links

For these links to appear in Kiali you need to manually configure the Grafana URL and the dashboards that come preconfigured with Istio, like in the following example:

Kiali will query Grafana and try to fetch the configured dashboards. For this reason Kiali must be able to reach Grafana, authenticate, and find the Istio dashboards. The Istio dashboards must be installed in Grafana for the links to appear in Kiali.

spec:
  external_services:
    grafana:
      enabled: true
      # Grafana service name is "grafana" and is in the "telemetry" namespace.
      internal_url: 'http://grafana.telemetry:3000/'
      # Public facing URL of Grafana
      external_url: 'http://my-ingress-host/grafana'
      # Grafana datasource UID when there are multiple
      datasource_uid: ""
      dashboards:
      - name: "Istio Service Dashboard"
        variables:
          datasource: "var-datasource"
          namespace: "var-namespace"
          service: "var-service"
      - name: "Istio Workload Dashboard"
        variables:
          datasource: "var-datasource"          
          namespace: "var-namespace"
          workload: "var-workload"
          datasource: "var-datasource"
      - name: "Istio Mesh Dashboard"
      - name: "Istio Control Plane Dashboard"
      - name: "Istio Performance Dashboard"
      - name: "Istio Wasm Extension Dashboard"

The described configuration is done in the Kiali CR when Kiali is installed using the Kiali Operator. If Kiali is installed with the Helm chart then the correct way to configure this is via regular –set flags.

Grafana authentication configuration

The Kiali CR provides authentication configuration that will be used to connect to your Grafana instance and for detecting your Grafana version in the Mesh graph.

spec:
  external_services:
    grafana:
      enabled: true
      auth:
        insecure_skip_verify: false
        password: "pwd"
        token: ""
        type: "basic"
        use_kiali_token: false
        username: "user"
      health_check_url: ""

To configure a secret to be used as a password, see this FAQ entry.

TLS Certificate Configuration

If your Grafana server uses HTTPS with a certificate issued by a private CA, see the TLS Configuration page to learn how to configure Kiali to trust your CA.

2.12.3 - Perses

This page describes how to configure Perses for Kiali.

Perses configuration

The Perses community dashboards provide preconfigured Perses dashboards for the most relevant mesh metrics. Although Kiali offers similar views in its metrics dashboards, it is not in Kiali’s goals to provide the advanced querying options, nor the highly customizable settings, that are available in Perses. They are the same as those provided by Istio’s Grafana add-on. Thus, it is recommended that you use Perses if you need those advanced options.

Kiali, from version v2.15, can provide a direct link from its metric dashboards to the equivalent or most similar Perses dashboard, which is convenient if you need the powerful Perses options.

The Perses links will appear in the Kiali metrics pages. For example:

Kiali Perses Links

For these links to appear in Kiali you need to manually configure the Perses URL and the dashboards that come preconfigured with Istio, like in the following example:

Kiali will query Perses and try to fetch the configured dashboards. For this reason Kiali must be able to reach Perses, authenticate, and find the Istio dashboards. The Istio dashboards must be installed in Perses for the links to appear in Kiali.

spec:
  external_services:
    perses:
      enabled: true
      # Perses service name is "perses" and is in the "telemetry" namespace.
      internal_url: 'http://perses.telemetry:4000/'
      # Public facing URL of Perses
      external_url: 'http://my-ingress-host/perses'
      dashboards:
        - name: "Istio Service Dashboard"
          variables:
            namespace: "var-namespace"
            service: "var-service"
            datasource: "var-datasource"
        - name: "Istio Workload Dashboard"
          variables:
            namespace: "var-namespace"
            workload: "var-workload"
        - name: "Istio Mesh Dashboard"

        - name: "Istio Ztunnel Dashboard"
          variables:
            namespace: "var-namespace"
            workload: "var-workload"
      # Perses project
      project: "istio"

When running Perses with the cluster observability operator in OpenShift, it requires an additional configuration item (Available from Kiali >2.17), so the url format can be compatible with the plugin UI URL:

spec:
  external_services:
    perses:
      ...
      url_format: "openshift"

The internal URL shouldn’t be set to avoid an internal validation of the Dashboards. The external URL should be set to the OpenShift cluster, without the additional path.

Perses authentication configuration

The Kiali CR provides authentication configuration that will be used to connect to your Perses instance and for detecting your Perses version in the Mesh graph.

Kiali Perses Mesh_page

Just basic authentication is supported. This will be configured in Perses as native authentication.

spec:
  external_services:
    perses:
      enabled: true
      auth:
        insecure_skip_verify: false
        password: "pwd"
        type: "basic"
        username: "user"
      health_check_url: ""

To configure a secret to be used as a user or password, see this FAQ entry.

TLS Certificate Configuration

If your Perses server uses HTTPS with a certificate issued by a private CA, see the TLS Configuration page to learn how to configure Kiali to trust your CA.

2.12.4 - Prometheus

This page describes how to configure Prometheus for Kiali.

Prometheus configuration

Kiali requires Prometheus to generate the topology graph, show metrics, calculate health and for several other features. If Prometheus is missing or Kiali can’t reach it, Kiali won’t work properly.

By default, Kiali assumes that Prometheus is available at the URL of the form http://prometheus.<istio_namespace_name>:9090, which is the usual case if you are using the Prometheus Istio add-on. If your Prometheus instance has a different service name or is installed in a different namespace, you must manually provide the endpoint where it is available, like in the following example:

spec:
  external_services:
    prometheus:
      # Prometheus service name is "metrics" and is in the "telemetry" namespace
      url: "http://metrics.telemetry:9090/"

Notice that you don’t need to expose Prometheus outside the cluster. It is enough to provide the Kubernetes internal service URL.

Kiali maintains an internal cache of some Prometheus queries to improve performance (mainly, the queries to calculate Health indicators). It would be very rare to see data delays, but should you notice any delays you may tune caching parameters to values that work better for your environment.

See the Kiali CR reference page for the current default values.

Compatibility with Prometheus-like servers

Although Kiali assumes a Prometheus server and is tested against it, there are TSDBs that can be used as a Prometheus replacement despite not implementing the full Prometheus API.

Community users have faced two issues when using Prometheus-like TSDBs:

Kiali may report that the TSDB is unreachable, and/or
Kiali may show empty metrics if the TSBD does not implement the /api/v1/status/config.

To fix these issues, you may need to provide a custom health check endpoint for the TSDB and/or manually provide the configurations that Kiali reads from the /api/v1/status/config API endpoint:

spec:
  external_services:
    prometheus:
      # Fix the "Unreachable" metrics server warning.
      health_check_url: "http://custom-tsdb-health-check-url"
      # Fix for the empty metrics dashboards
      thanos_proxy:
        enabled: true
        retention_period: "7d"
        scrape_interval: "30s"

Prometheus Tuning

Production environments should not be using the Istio Prometheus add-on, or carrying over its configuration settings. That is useful only for small, or demo installations. Instead, Prometheus should have been installed in a production-oriented way, following the Prometheus documentation.

This section is primarily for users where Prometheus is being used specifically for Kiali, and possible optimizations that can be made knowing that Kiali does not utilize all of the default Istio and Envoy telemetry.

Metric Thinning

Istio and Envoy generate a large amount of telemetry for analysis and troubleshooting. This can result in significant resources being required to ingest and store the telemetry, and to support queries into the data. If you use the telemetry specifically to support Kiali, it is possible to drop unnecessary metrics and unnecessary labels on required metrics. This FAQ Entry displays the metrics and attributes required for Kiali to operate.

To reduce the default telemetry to only what is needed by Kiali¹ users can add the following snippet to their Prometheus configuration. Because things can change with different versions, it is recommended to ensure you use the correct version of this documentation based on your Kiali/Istio version.

The metric_relabel_configs: attribute should be added under each job name defined to scrape Istio or Envoy metrics. Below we show it under the kubernetes-pods job, but you should adapt as needed. Be careful of indentation.

    - job_name: kubernetes-pods
      metric_relabel_configs:
      - action: drop
        source_labels: [__name__]
        regex: istio_agent_.*|istiod_.*|istio_build|citadel_.*|galley_.*|pilot_[^psx].*|envoy_cluster_[^u].*|envoy_cluster_update.*|envoy_listener_[^dh].*|envoy_server_[^mu].*|envoy_wasm_.*
      - action: labeldrop
        regex: chart|destination_app|destination_version|heritage|.*operator.*|istio.*|release|security_istio_io_.*|service_istio_io_.*|sidecar_istio_io_inject|source_app|source_version

Applying this configuration should reduce the number of stored metrics by about 20%, as well as reducing the number of attributes stored on many remaining metrics.

Metric Thinning with Crippling

The section above drops metrics unused by Kiali. As such, making those configuration changes should not negatively impact Kiali behavior in any way. But some very heavy metrics remain. These metrics can also be dropped, but their removal will impact the behavior of Kiali. This may be OK if you don’t use the affected features of Kiali, or if you are willing to sacrifice the feature for the associated metric savings. In particular, these are “Histogram” metrics. Istio is planning to make some improvements to help users better configure these metrics, but as of this writing they are still defined with fairly inefficient default “buckets”, making the number of associated time-series quite large, and the overhead of maintaining and querying the metrics, intensive. Each histogram actually is comprised of 3 stored metrics. For example, a histogram named xxx would result in the following metrics stored into Prometheus:

xxx_bucket
- The most intensive metric, and is required to calculate percentile values.
xxx_count
- Required to calculate ‘avg’ values.
xxx_sum
- Required to calculate rates over time, and for ‘avg’ values.

When considering whether to thin the Histogram metrics, one of the following three approaches is recommended:

If the relevant Kiali reporting is needed, keep the histogram as-is.
If the relevant Kiali reporting is not needed, or not worth the additional metric overhead, drop the entire histogram.
If the metric chart percentiles are not required, drop only the xxx_bucket metric. This removes the majority of the histogram overhead while keeping rate and average (non-percentile) values in Kiali.

These are the relevant Histogram metrics:

istio_request_bytes

This metric is used to produce the Request Size chart on the metric tabs. It also supports Request Throughput edge labels on the graph.

Appending |istio_request_bytes_.* to the drop regex above would drop all associated metrics and would prevent any request size/throughput reporting in Kiali.
Appending |istio_request_bytes_bucket to the drop regex above, would prevent any request size percentile reporting in the Kiali metric charts.

istio_response_bytes

This metric is used to produce the Response Size chart on the metric tabs. And also supports Response Throughput edge labels on the graph

Appending |istio_response_bytes_.* to the drop regex above would drop all associated metrics and would prevent any response size/throughput reporting in Kiali.
Appending |istio_response_bytes_bucket to the drop regex above would prevent any response size percentile reporting in the Kiali metric charts.

istio_request_duration_milliseconds

This metric is used to produce the Request Duration chart on the metric tabs. It also supports Response Time edge labels on the graph.

Appending |istio_request_duration_milliseconds_.* to the drop regex above would drop all associated metrics and would prevent any request duration/response time reporting in Kiali.
Appending |istio_request_duration_milliseconds_bucket to the drop regex above would prevent any request duration/response time percentile reporting in the Kiali metric charts or graph edge labels.

Scrape Interval

The Prometheus globalScrapeInterval is an important configuration option². The scrape interval can have a significant effect on metrics collection overhead as it takes effort to pull all of those configured metrics and update the relevant time-series. And although it doesn’t affect time-series cardinality, it does affect storage for the data-points, as well as having impact when computing query results (the more data-points, the more processing and aggregation).

Users should think carefully about their configured scrape interval. Note that the Istio addon for prometheus configures it to 15s. This is great for demos but may be too frequent for production scenarios. The prometheus helm charts set a default of 1m, which is more reasonable for most installations, but may not be the desired frequency for any particular setup.

The recommendation for Kiali is to set the longest interval possible, while still providing a useful granularity. The longer the interval the less data points scraped, thus reducing processing, storage, and computational overhead. But the impact on Kiali should be understood. It is important to realize that request rates (or byte rates, message rates, etc) require a minumum of two data points:

rate = (dp2 - dp1) / timePeriod

That means for Kiali to show anything useful in the graph, or anywhere rates are used (many places), the minimum duration must be >= 2 x globalScrapeInterval. Kiali will eliminate invalid Duration options given the globalScrapeInterval.

Kiali does a lot of aggregation and querying over time periods. As such, the number of data points will affect query performance, especially for larger time periods.

For more information, see the Prometheus documentation.

TSDB retention time

The Prometheus tsdbRetentionTime is an important configuration option. It has a significant effect on metrics storage, as Prometheus will keep each reported data-point for that period of time, performing compaction as needed. The larger the retention time, the larger the required storage. Note also that Kiali queries against large time periods, and very large data-sets, may result in poor performance or timeouts.

The recommendation for Kiali is to set the shortest retention time that meets your needs and/or operational limits. In some cases users may want to offload older data to a secondary store. Kiali will eliminate invalid Duration options given the tsdbRetentionTime.

For more information, see the Prometheus documentation.

Prometheus authentication configuration

The Kiali CR provides authentication configuration that will be used also for querying the version check to provide information in the Mesh graph.

spec:
  external_services:
    prometheus:
      auth:
        insecure_skip_verify: false
        password: "pwd"
        token: ""
        type: "basic"
        use_kiali_token: false
        username: "user"
      health_check_url: ""

To configure a secret to be used as a password, see this FAQ entry.

TLS Certificate Configuration

If your Prometheus server uses HTTPS with a certificate issued by a private CA, see the TLS Configuration page to learn how to configure Kiali to trust your CA.

Some non-essential telemetry remains in order to not over-complicate the configuration change. The remaining telemetry is typically negligible. ↩︎
Note that Prometheus can be configured such that individual scrape points can override the global setting, but Kiali is not currently concerned with this corner case. ↩︎

2.12.5 - Tracing

Configuration to setup Kiali with Jaeger or Grafana Tempo.

Jaeger is the default tracing provider for Kiali. From Kiali version 1.74, Tempo support is also included. This page describes how to configure Jaeger and Grafana Tempo in Kiali.

2.12.5.1 - Jaeger

This page describes how to configure Jaeger for Kiali.

Jaeger configuration

Jaeger is a highly recommended service because Kiali uses distributed tracing data for several features, providing an enhanced experience.

By default, Kiali will try to reach Jaeger at the GRPC-enabled URL of the form http://tracing.<istio_namespace_name>:16685/jaeger, which is the usual case if you are using the Jaeger Istio add-on. If this endpoint is unreachable, Kiali will disable features that use distributed tracing data.

If your Jaeger instance has a different service name or is installed to a different namespace, you must manually provide the endpoint where it is available, like in the following example:

spec:
  external_services:
    tracing:
      # Enabled by default. Kiali will anyway fallback to disabled if
      # Jaeger is unreachable.
      enabled: true
      # Jaeger service name is "tracing" and is in the "telemetry" namespace.
      # Make sure the URL you provide corresponds to the non-GRPC enabled endpoint
      # if you set "use_grpc" to false.
      internal_url: "http://tracing.telemetry:16685/jaeger"
      use_grpc: true
      # Public facing URL of Jaeger
      external_url: "http://my-jaeger-host/jaeger"

Minimally, you must provide spec.external_services.tracing.internal_url to enable Kiali features that use distributed tracing data. However, Kiali can provide contextual links that users can use to jump to the Jaeger console to inspect tracing data more in depth. For these links to be available you need to set the spec.external_services.tracing.external_url to the URL where you expose Jaeger outside the cluster.

Default values for connecting to Jaeger are based on the Istio’s provided sample add-on manifests. If your Jaeger setup differs significantly from the sample add-ons, make sure that Istio is also properly configured to push traces to the right URL.

Jaeger authentication configuration

The Kiali CR provides authentication configuration that will be used also for querying the version check to provide information in the Mesh graph.

spec:
  external_services:
    tracing:
      enabled: true
      auth:
        insecure_skip_verify: false
        password: "pwd"
        token: ""
        type: "basic"
        use_kiali_token: false
        username: "user"
      health_check_url: ""

To configure a secret to be used as a password, see this FAQ entry.

TLS Certificate Configuration

If your Jaeger server uses HTTPS with a certificate issued by a private CA, see the TLS Configuration page to learn how to configure Kiali to trust your CA.

2.12.5.2 - Grafana Tempo

This page describes how to configure Grafana Tempo for Kiali.

Grafana Tempo Configuration
- Using the Grafana Tempo API
- Using the Jaeger frontend with Grafana Tempo tracing backend
  - Tanka
  - Tempo Operator
Configuration table
Tempo tuning
Tempo cache
Tempo authentication configuration

Grafana Tempo Configuration

There are two possibilities to integrate Kiali with Grafana Tempo:

Using the Grafana Tempo API: This option returns the traces from the Tempo API in OpenTelemetry format.
Using the Jaeger frontend with the Grafana Tempo backend.
Appendix: Configuration table

Using the Grafana Tempo API

There are two steps to set up Kiali and Grafana Tempo:

Set up the Kiali CR updating the Tracing and Grafana sections.
Set up a Tempo data source in Grafana.

Set up the Kiali CR

This is a configuration example to set up Kiali tracing with Grafana Tempo:

spec:
  external_services:
    tracing:
      # Enabled by default. Kiali will anyway fallback to disabled if
      # Tempo is unreachable.
      enabled: true
      health_check_url: "https://tempo-instance.grafana.net"
      # Tempo service name is "query-frontend" and is in the "tempo" namespace.
      # Make sure the URL you provide corresponds to the non-GRPC enabled endpoint
      # It does not support grpc yet, so make sure "use_grpc" is set to false.
      internal_url: "http://tempo-tempo-query-frontend.tempo.svc.cluster.local:3200/"
      provider: "tempo"
      tempo_config:
        org_id: "1"
        datasource_uid: "a8d2ef1c-d31c-4de5-a90b-e7bc5252cd00"
        url_format: "grafana"
      use_grpc: false
      # Public facing URL of Tempo 
      external_url: "https://grafana-istio-system.apps-crc.testing/"

Kiali uses the external_url to construct “View in tracing” links in the UI. For the Tempo provider the default url_format is grafana. So, by default the URL will have the Grafana UI format when linking to specific services and traces.

It is also possible to set url_format to openshift. In this case the URL will redirect to the UI Plugin in the OpenShift console. When it is set to openshift, there are other settings as well:

spec:
  external_services:
    tracing:
      tempo_config:
        name: "sample"
        namespace: "tempo"
        tenant: "default"
        url_format: "openshift"

When the tenant is specified, if internal_url doesn’t have a path, it will be autocompleted with the Tempo path. For this example:

internal_url: https://tempo-sample-gateway.tempo.svc.cluster.local:8080/

Will be autocompleted to: https://tempo-sample-gateway.tempo.svc.cluster.local:8080/api/traces/v1/{tenant}/tempo

The other valid option for url_format is jaeger, used when the Jaeger UI is available in Tempo.

Set up a Tempo Datasource in Grafana

We can optionally set up a default Tempo datasource in Grafana so that you can view the Tempo tracing data within the Grafana UI, as you see here:

Kiali grafana_tempo

To set up the Tempo datasource, go to the Home menu in the Grafana UI, click Data sources, then click the Add new data source button and select the Tempo data source. You will then be asked to enter some data to configure the new Tempo data source:

Kiali grafana_tempo

The most important values to set up are the following:

Mark the data source as default, so the URL that Kiali uses will redirect properly to the Tempo data source.
Update the HTTP URL. This is the internal URL of the HTTP tempo frontend service. e.g. http://tempo-tempo-query-frontend.tempo.svc.cluster.local:3200/

Additional configuration

The Traces tab in the Kiali UI will show your traces in a bubble chart:

Kiali grafana_tempo

Increasing performance is achievable by enabling gRPC access, specifically for query searches. However, accessing the HTTP API will still be necessary to gather information about individual traces. This is an example to configure the gRPC access:

spec:
  external_services:
    tracing:
      enabled: true
      # grpc port defaults to 9095
      grpc_port: 9095 
      internal_url: "http://query-frontend.tempo:3200"
      provider: "tempo"
      use_grpc: true
      external_url: "http://my-tempo-host:3200"

Service check URL

By default, Kiali will check the service health in the endpoint /status/services, but sometimes, this is exposed in a different url, which can lead to a component unreachable message:

component_unreachable

This can be changed with the health_check_url configuration option.

spec:
  external_services:
    tracing:
      health_check_url: "http://query-frontend.tempo:3200"

Configuration for the Grafana Tempo Datasource

In order to correctly redirect Kiali to the right Grafana Tempo Datasource, there are a couple of configuration options to update:

spec:
  external_services:
    tracing:
      tempo_config:
        org_id: "1"
        datasource_uid: "a8d2ef1c-d31c-4de5-a90b-e7bc5252cd00"

org_id is usually not needed since “1” is the default value which is also Tempo’s default org id. The datasource_uid needs to be updated in order to redirect to the right datasource in Grafana versions 10 or higher.

Using the Jaeger frontend with Grafana Tempo tracing backend

It is possible to use the Grafana Tempo tracing backend exposing the Jaeger API. tempo-query is a Jaeger storage plugin. It accepts the full Jaeger query API and translates these requests into Tempo queries.

Since Tempo is not yet part of the built-in addons that are part of Istio, you need to manage your Tempo instance.

Tanka

The official Grafana Tempo documentation explains how to deploy a Tempo instance using Tanka. You will need to tweak the settings from the default Tanka configuration to:

Expose the Zipkin collector
Expose the GRPC Jaeger Query port

When the Tempo instance is deployed with the needed configurations, you have to set meshConfig.defaultConfig.tracing.zipkin.address from Istio to the Tempo Distributor service and the Zipkin port. Tanka will deploy the service in distributor.tempo.svc.cluster.local:9411.

The external_services.tracing.internal_url Kiali option needs to be set to: http://query-frontend.tempo.svc.cluster.local:16685.

Tempo Operator

The Tempo Operator for Kubernetes provides a native Kubernetes solution to deploy Tempo easily in your system.

After installing the Tempo Operator in your cluster, you can create a new Tempo instance with the following CR:

kubectl create namespace tempo
kubectl apply -n tempo -f - <<EOF
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: smm
spec:
  storageSize: 1Gi
  storage:
    secret:
      type: s3
      name: object-storage
  template:
    queryFrontend:
      component:
        resources:
          limits:
            cpu: "2"
            memory: 2Gi
      jaegerQuery:
        enabled: true
        ingress:
          type: ingress
EOF

Note the name of the bucket where the traces will be stored in our example is called object-storage. Check the Tempo Operator documentation to know more about what storages are supported and how to create the secret properly to provide it to your Tempo instance.

Now, you are ready to configure the meshConfig.defaultConfig.tracing.zipkin.address field in your Istio installation. It needs to be set to the 9411 port of the Tempo Distributor service. For the previous example, this value will be tempo-smm-distributor.tempo.svc.cluster.local:9411.

Now, you need to configure the internal_url setting from Kiali to access the Jaeger API. You can point to the 16685 port to use GRPC or 16686 if not. For the given example, the value would be http://tempo-ssm-query-frontend.tempo.svc.cluster.local:16685.

There is a related tutorial with detailed instructions to setup Kiali and Grafana Tempo with the Operator.

Configuration table

Supported versions

Kiali Version	Jaeger	Tempo	Tempo with JaegerQuery
<= 1.79 (OSSM 2.5)	✅	❌	✅
> 1.79	✅	✅	✅

Minimal configuration for Kiali <= 1.79

In external_services.tracing

	http	grpc
Jaeger	`.internal_url = 'http://jaeger_service_url:16686/jaeger'` `.use_grpc = false`	`.internal_url = 'http://jaeger_service_url:16685/jaeger'` `.use_grpc = true (Not required: by default)`
Tempo	`.internal_url = 'http://query_frontend_url:16686'` `.use_grpc = false`	`.internal_url = 'http://query_frontend_url:16685'` `.use_grpc = true (Not required: by default)`

Minimal configuration for Kiali > 1.79

	http	grpc
Jaeger	`.internal_url = 'http://jaeger_service_url:16686/jaeger'` `.use_grpc = false`	`.internal_url = 'http://jaeger_service_url:16685/jaeger'` `.use_grpc = true (Not required: by default)`
Tempo	`internal_url = 'http://query_frontend_url:3200'` `.use_grpc = false` `.provider = 'tempo'`	`.internal_url = 'http://query_frontend_url:3200'` `.grpc_port: 9095` `.provider: 'tempo'` `.use_grpc = true (Not required: by default)`

Tempo tuning

Resources consumption

Grafana Tempo is a powerful tool, but it can lead to performance issues when not configured correctly. For example, the following configuration is not recommended and may lead to OOM issues for simple queries in the query-frontend component:

spec:
  resources:
    total:
      limits:
        memory: 2Gi
        cpu: 2000m

These resources are shared between all the Tempo components. When needed, apply resources to each specific component, instead of applying the resources globally:

spec:
  template:
    queryFrontend:
      component:
        resources:
          limits:
            cpu: "2"
            memory: 2Gi

This Grafana Dashboard is available to measure the resources used in the tempo namespace.

Caching

Tempo offers multi-level caching that is used by default with Tanka and Helm deployment examples. It uses external cache, supporting Memcached and Redis. The lower level cache has a higher hit rate, and caches bloom filters and parquet data. The higher level caches frontend-search data.

Optimizing the cache depends on the application usage, and can be done modifying different parameters:

Connection limit for MemCached: Should be increased in large deployments, as MemCached is set to 1024 by default.
Cache size control: Should be increased when the working set is larger than the size of cache.

Tune search pipeline

There are many parameters to tune the search pipeline, some of these:

max_concurrent_queries: If it is too high it can cause OOM.
concurrent_jobs: How many jobs are done concurrently.
max_retries: When it is too high it can result in a lot of load.

Dedicated attribute columns

When using the vParquet3 storage format , defining dedicated attribute columns can improve the query performance. In order to best choose those columns (Up to 10), a good criteria is to choose attributes that contribute growing the block size (And not those commonly used).

Tempo authentication configuration

The Kiali CR provides authentication configuration that will be used also for querying the version check to provide information in the Mesh graph.

spec:
  external_services:
    tracing:
      enabled: true
      auth:
        insecure_skip_verify: false
        password: "pwd"
        token: ""
        type: "basic"
        use_kiali_token: false
        username: "user"
      health_check_url: ""

To configure a secret to be used as a password, see this FAQ entry.

TLS Certificate Configuration

If your Tempo server uses HTTPS with a certificate issued by a private CA, see the TLS Configuration page to learn how to configure Kiali to trust your CA.

Tempo cache

Kiali 2.2 includes a simple tracing cache for Tempo that stores the last N traces. By default, it is enabled and it keeps the last 200 traces. It can be modified in the Kiali CR with:

spec:
  external_services:
    tracing:
      enabled: true
      tempo_config:
        cache_enabled: true
        cache_capacity: 200

Kiali emits some cache metrics. The following query obtains the cache hit rate:

(sum(kiali_cache_hits_total{name="tempo"})/sum(kiali_cache_requests_total{name="tempo"})) * 100

tempo_metrics_cache

2.13 - TLS Policy

How Kiali enforces TLS versions and cipher suites for its own server and all outbound clients.

Kiali uses one TLS policy for both its inbound server endpoint and every outbound client it creates—HTTP, gRPC, tracing exporters, and OpenID/OAuth HTTP flows. The policy is configured in deployment.tls_config in the Kiali CR. You decide whether the policy comes from the cluster (OpenShift TLSSecurityProfile) or from explicit settings.

Configuration Options

Setting	Description
`source`	`auto` (OpenShift only: reads cluster TLSSecurityProfile) or `config` (use explicit settings)
`min_version`	Minimum TLS version: `TLSv1.2` or `TLSv1.3`
`max_version`	Maximum TLS version: `TLSv1.2` or `TLSv1.3`
`cipher_suites`	List of OpenSSL cipher names for TLS 1.2 (ignored for TLS 1.3)

Platform Defaults

OpenShift: source defaults to auto (uses cluster’s TLSSecurityProfile)
Non-OpenShift: source defaults to config (requires explicit configuration)

Examples

OpenShift: Auto-Discover TLS Policy

On OpenShift, set source: auto to have Kiali automatically read and enforce the cluster’s TLSSecurityProfile from APIServer/cluster:

spec:
  deployment:
    tls_config:
      source: auto

With this configuration, Kiali reads the TLS settings from OpenShift’s API Server and enforces them for all connections. If the cluster profile changes, restart the Kiali pod to pick up the new settings.

Non-OpenShift: Explicit TLS 1.2 and 1.3

For non-OpenShift clusters, or when you want full control over TLS settings, use source: config with explicit values:

spec:
  deployment:
    tls_config:
      source: config
      min_version: TLSv1.2
      max_version: TLSv1.3
      cipher_suites:
      - ECDHE-RSA-AES128-GCM-SHA256
      - ECDHE-ECDSA-AES128-GCM-SHA256
      - ECDHE-RSA-AES256-GCM-SHA384
      - ECDHE-ECDSA-AES256-GCM-SHA384

This allows both TLS 1.2 and TLS 1.3 connections. The cipher suites apply only to TLS 1.2 connections; TLS 1.3 uses Go’s fixed cipher set.

TLS 1.3 Only

To enforce TLS 1.3 exclusively (highest security):

spec:
  deployment:
    tls_config:
      source: config
      min_version: TLSv1.3

When min_version is TLS 1.3, Kiali enforces TLS 1.3-only mode. The cipher_suites setting is ignored because TLS 1.3 cipher selection is managed by Go.

Secure Defaults (Minimal Configuration)

If you set source: config without specifying other values, Kiali applies secure defaults:

spec:
  deployment:
    tls_config:
      source: config

This enforces TLS 1.2 or higher with Kiali’s secure default cipher list for TLS 1.2 connections:

ECDHE-ECDSA-AES128-GCM-SHA256
ECDHE-RSA-AES128-GCM-SHA256
ECDHE-ECDSA-AES256-GCM-SHA384
ECDHE-RSA-AES256-GCM-SHA384
ECDHE-ECDSA-CHACHA20-POLY1305
ECDHE-RSA-CHACHA20-POLY1305

These ciphers use ECDHE for forward secrecy and support both ECDSA and RSA certificates with modern AEAD encryption (AES-GCM and ChaCha20-Poly1305).

Supported Values

TLS Versions

TLS 1.0 and 1.1 are not supported due to known security vulnerabilities. Attempting to use them will cause Kiali to fail at startup.

Supported version strings (case variations accepted):

TLSv1.2 / TLS1.2 / VersionTLS12
TLSv1.3 / TLS1.3 / VersionTLS13

TLS 1.2 Cipher Suites

Specify cipher suites using OpenSSL names:

Cipher Suite
`ECDHE-RSA-AES128-GCM-SHA256`
`ECDHE-ECDSA-AES128-GCM-SHA256`
`ECDHE-RSA-AES256-GCM-SHA384`
`ECDHE-ECDSA-AES256-GCM-SHA384`
`ECDHE-RSA-CHACHA20-POLY1305`
`ECDHE-ECDSA-CHACHA20-POLY1305`
`AES128-GCM-SHA256`
`AES256-GCM-SHA384`

Unsupported cipher names will cause validation failure at startup.

Behavior

Fail-Fast Safety

Kiali refuses to start if:

The source value is invalid
source=auto is used on a non-OpenShift cluster
The OpenShift TLSSecurityProfile cannot be read
An unsupported TLS version or cipher suite is specified

Enforcement Scope

The resolved TLS policy applies to:

Kiali server’s inbound TLS configuration
All outbound HTTP clients (Prometheus, Grafana, tracing exporters, auth flows)
All outbound gRPC clients

Skip-Verify Behavior

Setting skip_verify: true on external services only bypasses certificate validation. TLS versions and cipher suites are still enforced according to the policy.

Policy Refresh

The TLS policy is resolved once at startup and cached for the lifetime of the Kiali process. When using source=auto, if the OpenShift TLSSecurityProfile changes, you must restart the Kiali pod for changes to take effect.

Logging

On startup, Kiali logs which TLS policy source is active and the resolved min/max versions and cipher count. Check these logs to verify the policy in effect or troubleshoot startup failures.

2.14 - Traffic Health

Customizing Health for Request Traffic.

There are times when Kiali’s default thresholds for traffic health do not work well for a particular situation. For example, at times 404 response codes are expected. Kiali has the ability to set powerful, fine-grained overrides for health configuration.

Default Configuration

By default Kiali uses the traffic rate configuration shown below. Application errors have minimal tolerance while client errors have a higher tolerance reflecting that some level of client errors is often normal (e.g. 404 Not Found):

For http protocol 4xx are client errors and 5xx codes are application errors.
For grpc protocol all 1-16 are errors (0 is success).

So, for example, if the rate of application errors is >= 0.1% Kiali will show Degraded health and if > 10% will show Failure health.

# ...
  health_config:
    rate:
      - namespace: ".*"
        kind: ".*"
        name: ".*"
        tolerance:
          - code: "^5\\d\\d$"
            direction: ".*"
            protocol: "http"
            degraded: 0
            failure: 10
          - code: "^4\\d\\d$"
            direction: ".*"
            protocol: "http"
            degraded: 10
            failure: 20
          - code: "^[1-9]$|^1[0-6]$"
            direction: ".*"
            protocol: "grpc"
            degraded: 0
            failure: 10
# ...

Custom Configuration

Custom health configuration is specified in the Kiali CR. To see the supported configuration syntax for health_config see the Kiali CR Reference.

Kiali applies the first matching rate configuration (namespace, kind, etc) and calculates the status for each tolerance. The reported health will be the status with highest priority (see below).

Rate Option	Definition	Default
namespace	Matching Namespaces (regex)	.* (match all)
kind	Matching Resource Types (workload\|app\|service) (regex)	.* (match all)
name	Matching Resource Names (regex)	.* (match all)
tolerance	Array of tolerances to apply.

Tolerance Option	Definition	Default
code	Matching Response Status Codes (regex) [1]	required
direction	Matching Request Directions (inbound\|outbound) (regex)	.* (match all)
protocol	Matching Request Protocols (http\|grpc) (regex)	.* (match all)
degraded	Degraded Threshold(% matching requests >= value)	0
failure	Failure Threshold (% matching requests >= value)	0

[1] The status code typically depends on the request protocol. The special code -, a single dash, is used for requests that don’t receive a response, and therefore no response code.

Kiali reports traffic health with the following top-down status priority :

Priority	Rule (value=% matching requests)	Status
1	value >= FAILURE threshold	FAILURE
2	value >= DEGRADED threshold AND value < FAILURE threshold	DEGRADED
3	value > 0 AND value < DEGRADED threshold	HEALTHY
4	value = 0	HEALTHY
5	No traffic	No Health Information

Examples

These examples use the repo https://github.com/kiali/demos/tree/master/error-rates.

In this repo we can see 2 namespaces: alpha and beta (Demo design).

Alpha

Where nodes return the responses (You can configure responses here):

App (alpha/beta)	Code	Rate
x-server	200	9
x-server	404	1
y-server	200	9
y-server	500	1
z-server	200	8
z-server	201	1
z-server	201	1

The applied traffic rate configuration is:

# ...
health_config:
  rate:
   - namespace: "alpha"
     tolerance:
       - code: "404"
         failure: 10
         protocol: "http"
       - code: "[45]\\d[^\\D4]"
         protocol: "http"
   - namespace: "beta"
     tolerance:
       - code: "[4]\\d\\d"
         degraded: 30
         failure: 40
         protocol: "http"
       - code: "[5]\\d\\d"
         protocol: "http"
# ...

After Kiali adds default configuration we have the following (Debug Info Kiali):

{
  "healthConfig": {
    "rate": [
      {
        "namespace": "/alpha/",
        "kind": "/.*/",
        "name": "/.*/",
        "tolerance": [
          {
            "code": "/404/",
            "degraded": 0,
            "failure": 10,
            "protocol": "/http/",
            "direction": "/.*/"
          },
          {
            "code": "/[45]\\d[^\\D4]/",
            "degraded": 0,
            "failure": 0,
            "protocol": "/http/",
            "direction": "/.*/"
          }
        ]
      },
      {
        "namespace": "/beta/",
        "kind": "/.*/",
        "name": "/.*/",
        "tolerance": [
          {
            "code": "/[4]\\d\\d/",
            "degraded": 30,
            "failure": 40,
            "protocol": "/http/",
            "direction": "/.*/"
          },
          {
            "code": "/[5]\\d\\d/",
            "degraded": 0,
            "failure": 0,
            "protocol": "/http/",
            "direction": "/.*/"
          }
        ]
      },
      {
        "namespace": "/.*/",
        "kind": "/.*/",
        "name": "/.*/",
        "tolerance": [
          {
            "code": "/^5\\d\\d$/",
            "degraded": 0,
            "failure": 10,
            "protocol": "/http/",
            "direction": "/.*/"
          },
          {
            "code": "/^4\\d\\d$/",
            "degraded": 10,
            "failure": 20,
            "protocol": "/http/",
            "direction": "/.*/"
          },
          {
            "code": "/^[1-9]$|^1[0-6]$/",
            "degraded": 0,
            "failure": 10,
            "protocol": "/grpc/",
            "direction": "/.*/"
          }
        ]
      }
    ]
  }
}

What are we applying?

For namespace alpha, all resources
Protocol http if % requests with error code 404 are >= 10 then FAILURE, if they are > 0 then DEGRADED
Protocol http if % requests with others error codes are> 0 then FAILURE.
For namespace beta, all resources
Protocol http if % requests with error code 4xx are >= 40 then FAILURE, if they are >= 30 then DEGRADED
Protocol http if % requests with error code 5xx are > 0 then FAILURE
For other namespaces Kiali will apply the defaults.
Protocol http if % requests with error code 5xx are >= 20 then FAILURE, if they are >= 0.1 then DEGRADED
Protocol grpc if % requests with error code match /^[1-9]$|^1[0-6]$/ are >= 20 then FAILURE, if they are >= 0.1 then DEGRADED

Alpha	Beta

2.15 - Virtual Machine workloads

Ensuring Kiali can visualize a VM WorkloadEntry.

Introduction

Kiali graph visualizes both Virtual Machine workloads (WorkloadEntry) and pod-based workloads, running inside a Kubernetes cluster. You must ensure that the Istio Proxy is running, and correctly configured, on the Virtual Machine. Also, Prometheus must be able to scrape the metrics endpoint of the Istio Proxy running on the VM. Kiali will then be able to read the traffic telemetry for the Virtual Machine workloads, and incorporate the VM workloads into the graph.

Kiali does not currently distinguish between pod-based and VM-based workloads nor does Kiali support viewing additional details for the VM-based workloads beyond what is displayed on the graph. One way to distinguish between the two is to give the VM-based workloads a different version label than the pod-based workloads.

Configuring Prometheus to scrape VM-based Istio Proxy

Once the Istio Proxy is running on a Virtual Machine, configuring Prometheus to scrape the VM’s Istio Proxy metrics endpoint is the only configuration Kiali needs to display traffic for the VM-based workload. Configuring Prometheus will vary between environments. Here is a very simple example of a Prometheus configuration that includes a job to scrape VM based workloads:

- job_name: bookinfo-vms
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /stats/prometheus
  scheme: http
  follow_redirects: true
  static_configs:
  - targets:
    - details-v1:15020
    - productpage-v1:15020
    - ratings-v1:15020
    - reviews-v1:15020
    - reviews-v2:15020
    - reviews-v3:15020

3 - Features

An overview of important Kiali features.

3.1 - Application Wizards

Using Kiali wizards to generate application and request routing configuration.

Istio Application Wizards

Kiali provides Actions to create, update and delete Istio configuration, driven by wizards.

Actions can be applied to a Service

Service Detail Actions

Actions can also be applied to a Workload

Workload Detail Actions

And, actions are available for an entire Namespace

Namespace Actions

Service Actions

Kiali offers a robust set of service actions, with accompanying wizards.

Traffic Management: Request Routing

The Request Routing Wizard allows creating multiple routing rules.

Every rule is composed of a Request Matching and a Routes To section.
The Request Matching section can add multiple filters using HEADERS, URI, SCHEME, METHOD or AUTHORITY HTTP parameters.
The Request Matching section can be empty, in this case any HTTP request received is matched against this rule.
The Routes To section can specify the percentage of traffic that is routed to a specific workload.

Request Routing

Istio applies routing rules in order, meaning that the first rule matching an HTTP request (top-down) performs the routing. The Matching Routing Wizard allows changing the rule order.

Traffic Management: Fault Injection

The Fault Injection Wizard allows injecting faults to test the resiliency of a Service.

HTTP Delay specification is used to inject latency into the request forwarding path.
HTTP Abort specification is used to immediately abort a request and return a pre-specified status code.

Fault Injection

Traffic Management: Traffic Shifting

The Traffic Shifting Wizard allows selecting the percentage of traffic that is routed to a specific workload.

Traffic Shifting

Kiali also provides an analogous action for TCP traffic shifting.

Traffic Management: Request Timeouts

The Request Timeouts Wizard sets up request timeouts in Envoy, using Istio.

HTTP Timeout defines the timeout for a request.
HTTP Retry describes the retry policy to use when an HTTP request fails.

Request Timeouts

Traffic Management: Gateways

Traffic Management Wizards have an Advanced Options section that can be used to extend the scenario.

One available Advanced Option is to expose a Service to external traffic through an existing Gateway or to create a new Gateway for this Service.

Gateway

Traffic Management: Circuit Breaker

Traffic Management Wizards allows defining Circuit Breakers on Services as part of the available Advanced Options.

Connection Pool defines the connection limits for an upstream host.
Outlier Detection implements the Circuit Breaker based on the consecutive errors reported.

Circuit Breaker

Routing Rules Preview

Kiali provides a safe preview environment where users can review the complete YAML definition of the routing configuration and edit the configuration inline before creating.

Preview Configuration

Security: Traffic Policy

Traffic Management Advanced Options allows defining Security and Load Balancing settings.

TLS related settings for connections to the upstream service.
Automatically generate a PeerAuthentication resource for this Service.
Load balancing policies to apply for a specific destination.

Traffic Policy

Workload Actions

Automatic Sidecar Injection

A Workload can be individually managed to control the Sidecar Injection.

A default scenario is to indicate this at Namespace level but there can be cases where a Workload shouldn’t be part of the Mesh or vice versa.

Kiali allows users to alter the Deployment template and propagate this configuration into the Pods.

Workload-specific Disable Sidecar Injection

Namespace Actions

The Kiali Namespaces page (Kiali >= 2.23) offers several Namespace actions.

Namespace Actions

Show

Show actions navigate from a Namespace to its specific Graph, Applications, Workloads, Services or Istio Config pages.

Automatic Sidecar Injection

When Automatic Sidecar Injection is enabled in the cluster, a Namespace can be labeled to enable/disable the injection webhook, controlling whether new deployments will automatically have a sidecar.

Canary Istio upgrade

When Istio Canary revision is installed, a Namespace can be labeled to that canary revision, so the sidecar of canary revision will be injected into workloads of the namespace.

Security: Traffic Policies

Kiali can generate Traffic Policies based on the traffic for a namespace.

For example, at some point a namespace presents a traffic graph like this:

Traffic Policies: Graph

And a user may want to add Traffic Policies to secure that communication. In other words, to prevent traffic other than that currently reflected in the Graph’s Services and Workloads.

Using the Create Traffic Policies action on a namespace, Kiali will generate AuthorizationPolicy resources per every Workload in the Namespace.

Traffic Policies: Sidecars and Authorization Policies

3.2 - Detail Views

Kiali provides list and detail views for your mesh components.

Kiali provides filtered list views of all your service mesh definitions. Each view provides health, details, YAML definitions and links to help you visualize your mesh. There are list and detail views for:

Applications
Istio Configuration
Services
Workloads

Detail list apps Detail list service Detail list workload Detail list Istio config

Selecting an object from the list will bring you to its detail page. For Istio Config, Kiali will present its YAML, along with contextual validation information. Other mesh components present a variety of Tabs.

Overview Tab

Overview is the default Tab for any detail page. The overview tab provides detailed information, including health status, and a detailed mini-graph of the current traffic involving the component. The full set of tabs, as well as the detailed information, varies based on the component type.

Each Overview provides:

links to related components and linked Istio configuration.
health status.
validation information.
an Action menu for actions that can be taken on the component.
- several Wizards are available.

And also type-specfic information. For example:

Service detail includes Network information.
Workload detail provides backing Pod information.

Detail overview app Detail overview service Detail overview workload

Both Workload and Service detail can be customized to some extent, by adding additional details supplied as annotations. This is done through the additional_display_details field in the Kiali CR.

Detail overview additional details

Traffic

The Traffic Tab presents a service, app, or workload’s Inbound and Outbound traffic in a table-oriented way:

Detail traffic

Logs

Workload detail offers a special Logs tab. Kiali offers a special unified logs view that lets users correlate app and proxy logs. It can also add-in trace span information to help identify important traces based on associated logging. More powerful features include substring or regex Show/Hide, full-screen, and the ability to set proxy log level without a pod restart.

Detail logs

Metrics

Each detail view provides Inbound Metrics and/or Outbound Metrics tabs, offering predefined metric dashboards. The dashboards provided are tailored to the relevant application, workload or service level. Application and workload detail views show request and response metrics such as volume, duration, size, or tcp traffic. The service detail view shows request and response metrics for inbound traffic only.

Kiali allows the user to customize the charts by choosing the charted dimensions. It can also present metrics reported by either source or destination proxy metrics. And for troublshooting it can overlay trace spans.

Detail metric inbound Detail metrics outbound

Traces

Each detail view provides a Traces tab with a native integration with Jaeger. For more, see Tracing.

Dashboards

Kiali will display additional tabs for each applicable Built-In Dashboard or Custom Dashboard.

Built-in dashboards

Kiali comes with built-in dashboards for several runtimes, including Envoy, Go, Node.js, and others.

Envoy

The most important built-in dashboard is for Envoy. Kiali offers the Envoy tab for many workloads. The Envoy tab is actually a Built-In Dashboard, but it is very common as it applies to any workload injected with, or that is itself, an Envoy proxy. Being able to inspect the Envoy proxy is invaluable when troublshooting your mesh. The Envoy tab itself offers five subtabs, exposing a wealth of information.

Detail Envoy

Istio’s Envoy sidecars supply some internal metrics, that can be viewed in Kiali. They are different than the metrics reported by Istio Telemetry, which Kiali uses extensively. Some of Envoy’s metrics may be redundant.

Note that the enabled Envoy metrics can be tuned, as explained in the Istio documentation: it’s possible to get more metrics using the statsInclusionPrefixes annotation. Make sure you include cluster_manager and listener_manager as they are required.

For example, sidecar.istio.io/statsInclusionPrefixes: cluster_manager,listener_manager,listener will add listener metrics for more inbound traffic information. You can then customize the Envoy dashboard of Kiali according to the collected metrics.

Go

Contains metrics such as the number of threads, goroutines, and heap usage. The expected metrics are provided by the Prometheus Go client.

Example to expose built-in Go metrics:

        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":2112", nil)

As an example and for self-monitoring purpose Kiali itself exposes Go metrics.

The pod annotation for Kiali is: kiali.io/dashboards: go

Kiali

Kiali has its own built-in dashboard that helps you observe performance of the Kiali server itself. To view this dashboard, navigate to either the application or workload view of the Kiali server and select the Kiali Internal Metrics tab to see Kiali’s own internal metrics:

Kiali Metrics (app view) Kiali Metrics (workload view)

Node.js

Contains metrics such as active handles, event loop lag, and heap usage. The expected metrics are provided by prom-client.

Example of Node.js metrics for Prometheus:

const client = require('prom-client');
client.collectDefaultMetrics();
// ...
app.get('/metrics', (request, response) => {
  response.set('Content-Type', client.register.contentType);
  response.send(client.register.metrics());
});

Full working example: https://github.com/jotak/bookinfo-runtimes/tree/master/ratings

The pod annotation for Kiali is: kiali.io/dashboards: nodejs

Quarkus

Contains JVM-related, GC usage metrics. The expected metrics can be provided by SmallRye Metrics, a MicroProfile Metrics implementation. Example with maven:

    <dependency>
      <groupId>io.quarkus</groupId>
      <artifactId>quarkus-smallrye-metrics</artifactId>
    </dependency>

The pod annotation for Kiali is: kiali.io/dashboards: quarkus

Spring Boot

Three dashboards are provided: one for JVM memory / threads, another for JVM buffer pools and the last one for Tomcat metrics. The expected metrics come from Spring Boot Actuator for Prometheus. Example with maven:

    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
      <groupId>io.micrometer</groupId>
      <artifactId>micrometer-core</artifactId>
    </dependency>
    <dependency>
      <groupId>io.micrometer</groupId>
      <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>

Full working example: https://github.com/jotak/bookinfo-runtimes/tree/master/details

The pod annotation for Kiali with the full list of dashboards is: kiali.io/dashboards: springboot-jvm,springboot-jvm-pool,springboot-tomcat

By default, the metrics are exposed on path /actuator/prometheus, so it must be specified in the corresponding annotation: prometheus.io/path: "/actuator/prometheus"

Thorntail

Contains mostly JVM-related metrics such as loaded classes count, memory usage, etc. The expected metrics are provided by the MicroProfile Metrics module. Example with maven:

    <dependency>
      <groupId>io.thorntail</groupId>
      <artifactId>microprofile-metrics</artifactId>
    </dependency>

Full working example: https://github.com/jotak/bookinfo-runtimes/tree/master/productpage

The pod annotation for Kiali is: kiali.io/dashboards: thorntail

Vert.x

Several dashboards are provided, related to different components in Vert.x: HTTP client/server metrics, Net client/server metrics, Pools usage, Eventbus metrics and JVM. The expected metrics are provided by the vertx-micrometer-metrics module. Example with maven:

    <dependency>
      <groupId>io.vertx</groupId>
      <artifactId>vertx-micrometer-metrics</artifactId>
    </dependency>
    <dependency>
      <groupId>io.micrometer</groupId>
      <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>

Init example of Vert.x metrics, starting a dedicated server (other options are possible):

      VertxOptions opts = new VertxOptions().setMetricsOptions(new MicrometerMetricsOptions()
          .setPrometheusOptions(new VertxPrometheusOptions()
              .setStartEmbeddedServer(true)
              .setEmbeddedServerOptions(new HttpServerOptions().setPort(9090))
              .setPublishQuantiles(true)
              .setEnabled(true))
          .setEnabled(true));

Full working example: https://github.com/jotak/bookinfo-runtimes/tree/master/reviews

The pod annotation for Kiali with the full list of dashboards is: kiali.io/dashboards: vertx-client,vertx-server,vertx-eventbus,vertx-pool,vertx-jvm

Custom Dashboards

When the built-in dashboards don’t offer what you need, it’s possible to create your own. See Custom Dashboard Configuration for more information.

3.3 - Health

How Kiali reflects your Service Mesh Health.

Kiali help users know whether their service mesh is healthy. This includes the health of the mesh infrastructure itself, and the deployed application services.

Service Mesh Infrastructure Health

Users can quickly confirm the health of their infrastructure by looking at the Kiali Masthead. If Kiali detects any health issues with the infrastructure of the mesh, including multi-cluster setups, it will show an indication in the masthead, severity will be reflected via color, and hovering will show the detail:

Masthead Health

For more detail on how Kiali tracks the Istio infrastructure status, see the Istio Status Feature.

Overview Health

The default Kiali page is an Overview Dashboard. This view will quickly allow you to identify components with issues, including clusters, Istio configuration, control planes and data planes. It provides a chart showing all applications grouped by health, and Service Insights showing the services with the top error rates and p95 latencies.

Overview Health

Graph Health

The Kiali Graph offers a rich visualization of your service mesh traffic. The health of Nodes and Edges is represented via a standard color system using shades of orange and red to reflect degraded and failure-level traffic health. Red or orange nodes or edges may need attention. The color of an edge represents the request health between the relevant nodes. Note that node shape indicates the type of component, such as service, workload, or app.

The health of nodes and edges is refreshed automatically based on the user’s desired refresh interval. The graph can also be paused to examine a particular state, or replayed to re-examine a particular time period.

Graph Health

Health Configuration

Kiali calculates health by combining the individual health of several indicators, such as pods and request traffic. The global health of a resource reflects the most severe health of its indicators.

Health Indicators

The table below lists the current health indicators and whether the indicator supports custom configuration for its health calculation.

Indicator	Supports Configuration
Pod Status	No
Traffic Health	Yes

Icons and colors

Kiali uses icons and colors to indicate the health of resources and associated request traffic.

No Health Information (NA)
Healthy
Degraded
Failure

Custom Request Health

3.4 - Internationalization

How Kiali is displayed in mutliple languages.

Kiali is used worldwide and some users prefer to display Kiali in a language that they are more comfortable with than English. For this reason Kiali supports internationalization, and it can be localized into multiple languages.

Current supported languages are English, Chinese and Spanish.

Language Selector

Kiali provides a language selector in the Masthead to be able to switch Kiali between supported languages:

Masthead language selector

By default the language selector is hidden, and English is the default language.

In order to show the language selector a feature flag must be enabled in the Kiali CR:

spec:
  kiali_feature_flags:
    ui_defaults:
      i18n:
        language: en
        show_selector: true

You can also modify the default language shown in Kiali (values according to ISO 639-1 codes):

Language	Code
Chinese	zh
English	en
Spanish	es

As an example, this is how Kiali displays the Overview page in Spanish:

Overview page in Spanish

If you want to collaborate with us on adding a new language or improving the translation of an existing one, please refer to the internationalization section of the kiali project’s README for UI

3.5 - Istio Ambient Mesh

Visualizing Ambient Mesh with Kiali

Kiali provides visualization for Ambient Mesh components:

Control Plane Ambient Mesh
Ambient namespace
Workloads in Ambient Mesh
Waypoint proxy details
Ztunnel details
Ambient Telemetry
Ambient tracing

The Kiali Ambient features, as well as Ambient Mesh, are evolving. Some of these features are in alpha status. For enhancements or detected issues, don’t hesitate to open a GitHub issue.

Control Plane Ambient Mesh

When the control plane is in Ambient mode, Kiali will show an Ambient badge on the Namespaces page, for the control plane namespace, in the Type column. It will also be reflected in the control plane side-panel on the Mesh page. This badge indicates that Kiali has detected a ztunnel (the L4 component for Ambient) in the control plane namespace.

Ambient Control Plane

For Kiali to detect Ambient, it needs to have access to the namespace where ztunnel is deployed. This is usually the Istio control plane namespace (often istio-system), but on platforms such as OpenShift, it may differ.

Ambient Namespace

When a namespace is labeled with istio.io/dataplane-mode=ambient, it is enrolled in Ambient Mesh. On the Namespaces page, Kiali will show the number of ambient data planes, as well as the number of sidecar data planes.

Ambient Data Plane

Workloads in Ambient Mesh

When a workload, application, or service is part of the Ambient Mesh, a badge will appear in the namespace details. When hovering over this badge, further information about the workload will be displayed:

In Mesh: Indicating that it was included in Ambient, and the traffic is redirected to ztunnel to provide L4 features (L4 authorization and telemetry, and encrypted data transport)

Workload Captured by Ambient

In Mesh with waypoint enabled: Additionally, it can include the L7 badge which means that a waypoint proxy is deployed (providing additional L7 capabilities):

Workloads Captured by Ambient

It is possible to check each pod protocol in the information tooltip. In Ambient, instead of TCP, it uses HBONE.

Pod details protocol

When the workload traffic is handled by a waypoint, the workload details will show a link to the proxy:

Waypoint link

Kiali will correlate the ztunnel and waypoint logs related to the application and provide a checkbox to include them with the application logs. This ensures that all relevant information for the application is available in one place:

Workload logs

Waypoint proxy details

The workload details for a waypoint has specific waypoint data. It is identified with the L7 label:

Waypoint label

The proxy status shows a new info message when some of the Discovery Services are IGNORED, and there are no other errors:

Waypoint proxy status

This condition is usually expected, but it is shown as an info in case it is not.

The waypoint proxy generates traces for the services for which it handles traffic, and this is where it can be checked, because the proxy generates the traces with the waypoint service name:

Waypoint traces

Waypoint proxies have a specific tab to show information about the Services and Workloads enrolled. The Labeled by label identifies where the waypoint label was added. It can be in the namespace, in the service or the workload.

waypoint_tab

For waypoint proxies, it is also possible to see the Envoy tab:

Waypoint Envoy

Ztunnel details

The workload details for a ztunnel workload has specific data. It has a new ztunnel tab containing the configuration for the services and workloads for which it handles traffic. It shows the same information that can be seen using the istioctl ztunnel-config, which can be useful for troubleshooting.

Ztunnel details

Ambient Telemetry

The Traffic graph generated with Ambient telemetry differs slightly from the usual graph, as the HTTP traffic and TCP traffic have different reporters.

The telemetry reported with sidecars represents the kind of traffic for the request (green edges for HTTP, blue edges for TCP). In Ambient, this information depends on the element reporting the Telemetry. The ztunnel will report all the traffic as TCP:

ztunnel graph

The following bookinfo namespace is in Ambient Mesh with a waypoint proxy enabled. Therefore, the telemetry is reported from ztunnel and from the waypoint, resulting in double edges connecting different nodes (Note that the Graph page toolbar offers a Traffic menu, letting you be selective about the protocols shown):

Ambient Telemetry

It is possible to filter the traffic by the Ambient reporter (ztunnel or waypoint) from the Traffic menu option:

Ambient Traffic selector

There is an additional display option, waypoint proxies for the Ambient Mesh, that will display the waypoint proxies in the graph:

Waypoint proxies

The waypoint proxies often serve as both the source and destination of traffic within the same workload, represented in the graph by bidirectional edges. When you click on an edge, the summary panel will display the waypoint proxy as the destination workload. However, you can also view the waypoint as the source by clicking on the double arrow icon located to the left of the “From/To” labels in the summary panel.

bidirectional edges

When the ingress waypoint routing is enabled on a service (istio.io/ingress-use-waypoint=true), the traffic goes from the gateway to the waypoint, instead of going to the service. In that case, the gateway node will show the waypoint icon:

ingress use waypoint

Ambient Tracing

Ambient traces are emitted from the waypoint proxies. The traces involving a workload can be found looking for the waypoint service name.

In order to correlate the waypoint traces from a specific workload, app or service, Kiali looks for traces of the waypoint proxy in which the workload is enrolled. It then filters the traces related to the application using the operation name from the span.

ambient traces

The same approach is used to show the spans in the workload logs:

ambient span logs

Also for the inbound and outbound metrics:

ambient spans

As the workload name is not part of the trace information, there are some gaps in the trace overlay (These gaps will hopefully be fixed in upstream Istio in a future Istio release - see this GitHub issue for details on that enhancement request). Also, for the workload view, there might be traces that are not part of a particular workload, but they are shown because they match the service name of the workload.

Trace overlay

Starting with Istio 1.28, traces are reported using the service name instead of the waypoint name. Starting with Kiali 2.22, there is a configuration option, external_services.tracing.use_waypoint_name (disabled by default), that allows using the waypoint name as the service used for trace lookup.

3.6 - Istio Configuration

Using Kiali to generate Istio mesh-wide configuration.

Kiali is more than observability, it also helps you to configure, update and validate your Istio service mesh.

The Istio configuration view provides advanced filtering and navigation for Istio configuration objects, such as Virtual Services and Gateways. Kiali provides inline config editing and powerful semantic validation for Istio resources.

Istio Config List

Validations

Kiali performs a set of validations on your Istio Objects, such as Destination Rules, Service Entries, and Virtual Services. Kiali’s validations go above and beyond what Istio offers. Where Istio offers mainly static checks for well-formed definitions, Kiali performs semantic validations to ensure that the definitions make sense, across objects, and in some cases even across namespaces. Kiali validations are based on the runtime status of your service mesh.

Check the complete list of validations for further information.

Istio Config Validation

Wizards

Kiali Wizards provide a way to apply an Istio service mesh pattern, letting Kiali generate the Istio Configuration. Wizards are launched from Kiali’s Action menus, located across the UI on relevant pages. Wizards can create and update Istio Config for Routing, Gateways, Security scenarios and more.

Istio Config Page Wizards

These Create Actions are available on the Istio Config page:

Istio Config Create Actions

Authorization Wizards

Kiali allows creation of Istio AuthorizationPolicy resources:

AuthorizationPolicy

Istio PeerAuthentication resources:

PeerAuthentication

Istio RequestAuthentication resources:

RequestAuthentication

Traffic Wizards

Kiali also allows creation of Istio Gateway resources.

Gateway

Istio ServiceEntry resources:

ServiceEntry

Istio Sidecar resources:

Sidecar

K8s Gateway resources:

K8sGateway

K8s Reference Grants resources:

K8sReferenceGrant

Other Kiali Wizards

Kiali also has Wizards available from the Namespaces page (Kiali >= 2.23) and many details pages, such as Service Detail to create routing rules. The Kiali Travel Tutorial goes into several of these wizards.

Namespaces Page Wizards

The Namespaces page (Kiali >= 2.23) has namespace-specific actions for creating traffic policies:

Namespace Actions

Service Wizards

The Service Detail page offers several wizards to create traffic control config:

Service Actions

3.7 - Istio Infrastructure Status

How Kiali monitors your Istio infrastructure.

A service mesh simplifies application services by deferring the non-business logic to the mesh. But for healthy applications the service mesh infrastructure must also be running normally. Kiali monitors the multiple components that make up the service mesh, letting you know if there is an underlying problem.

Istio component status

A component status will be one of: Not found, Not ready, Unreachable, Not healthy and Healthy. Not found means that Kiali is not able to find the deployment. Not ready means no pods are running. Unreachable means that Kiali hasn’t been successfully able to communicate with the component (Prometheus, Grafana and Jaeger). Not healthy means that the deployment doesn’t have the desired amount of replicas running. Otherwise, the component is Healthy and it won’t be shown in the list.

Regarding the severity of each component, there are only two options: core or add-on. The core components are those shown as errors (in red) whereas the add-ons are displayed as warnings (in orange).

By default, Kiali checks that the core components “istiod”, “ingress”, and “egress” are installed and running in the control plane namespace, and that the add-ons “prometheus”, “grafana” and “jaeger” are available.

Mesh page

Detailed information about the Istio infrastructure status is displayed on the mesh page. It shows an infrastructure topology view with core and add-on components, their health, and how they are connected to each other.

Similar to the traffic graph, the left side of the page shows the topology view, while the right side displays information about the selected node. If no node is selected, global infrastructure information is shown, including the status, version, and cluster of every component.

Mesh overview information

Connection issues between Kiali and any component are indicated with a red dotted line and a red health indicator in the target side panel.

Connection issue in the mesh

The specific information shown in the target side panel depends on the type of node selected:

Kiali

When you click on the Kiali node, you can check information such as the version, health status, and configuration values.

Kiali information

Istio control plane

When you click on the Istio control plane, you can check information such as the Istio version, mTLS status, outbound policy, CPU and memory metrics, configuration table, and more.

Istio control plane information

Control plane Namespace

When you click on the control plane namespace box, you can check the numbers of Namespaces managed by Control Planes, this will show the data plane migration status between different versions of Istio installations.

Control plane namespace information

Data plane

When you click on the cluster data plane, you can check the basic information of each namespace belonging to that data plane (Istio configuration, traffic inbound/outbound), similar to what you can see on the overview page.

Data plane information

Add-on components

When you click on the “prometheus”, “grafana” or “jaeger” node, , its health status, version, and configuration values are displayed:

Add-on information

3.8 - Multi-cluster

Advanced Mesh Deployment and Multi-cluster support.

A basic Istio mesh deployment has a single control plane with a single data plane, deployed on a single Kubernetes cluster. But Istio supports a variety of advanced deployment models. It allows a mesh to span multiple primary (control plane) and/or remote (data plane only) clusters, and can use a single or multi-network approach. The only strict rule is that within a mesh service names are unique. A non-basic mesh deployment generally involves multiple clusters. See installation instructions for more detail on installing advanced mesh deployments.

A single Kiali install can currently work with at most one mesh, one metric store and one trace store but it can be configured for “single cluster” or “multi-cluster”. All clusters must be part of the same mesh and report to the same metric and trace store, whether directly or via some sort of aggregator. See the multi-cluster configuration page for more information on requirements.

For multi-cluster configurations, Kiali provides a unified view and management of your mesh across clusters.

Graph View: Cluster and Namespace Boxing

Istio provides cluster names in the traffic telemetry for multi-cluster installations. The Kiali graph can use this information to better visualize clusters and namespaces. The Display menu offers two multi-cluster related options: Cluster Boxes and Namespace Boxes. When enabled, either separately or together, the graph will generate boxes to help identify the relevant nodes and edges, and to see traffic traveling between them.

Each box type supports selection and provides a side-panel summary of traffic. Below we see a Bookinfo traffic graph for when Bookinfo services are deployed across the East and West clusters. The West cluster box is selected. You see traffic for all services and workloads across both clusters. Single cluster configurations will show some traffic across clusters but not all.

Multi-cluster traffic graph

Unified Multi-cluster configuration

Unlike single-cluster configurations, multi-cluster configurations show list/details pages across all clusters.

List Views: Aggregated mesh view

With a multi-cluster Kiali configuration, you can view all apps, workloads, services, and Istio config in your mesh from a single place.

Multi-cluster list pages

Detail Views: Dig into details across clusters

The detail pages provide the same functionality across all clusters that you have access to for a single cluster, including viewing logs, metrics, traces, envoy details, and more.

Multi-cluster detail pages

Overview: Cross cluster namespace info

The overview page shows namespace information across all configured clusters.

Multi-cluster overview

Mesh view: Cluster, Istio and data plane boxing

The mesh graph displays infrastructure information for multiple clusters, Istio control planes, and data planes according to the Istio deployment (primary-remote or multi-primary).

Multi-cluster mesh

3.9 - Security

How Kiali visualizes mTLS.

Kiali gives support to better understand how mTLS is used in Istio meshes. Find those helpers in the Mesh page, Traffic Graph, Overview Page, and specific validations.

Mesh indicator

At the right panel of the Istio Control Plane from the Mesh page, Kiali shows a lock when the mesh has enabled mTLS for the whole service mesh. It means that all the communications in the mesh uses mTLS.

mTLS mesh-wide strict

Kiali shows a hollow lock when either the mesh is configured in PERMISSIVE mode or there is a misconfiguration in the mesh-wide mTLS configuration.

mTLS mesh-wide permissive

Namespace locks

The Namespaces page shows all the available namespaces with aggregated data. Besides the health and validations, Kiali also shows the namespace-wide mTLS status. Similar to the Mesh page, it shows a lock when strict mTLS is enabled or an open lock when permissive. A red open lock is shown when mTLS is disabled. When the namespace doesn’t include an mTLS policy and it is inherited from the mesh, a down arrow is shown and the inherited mTLS is described in the badge.

Overview: Namespace mTLS

Graph

The mTLS method is used to establish communication between microservices. In the graph, Kiali has the option to show which edges are using mTLS and with what percentage during the selected period. When an edge shows a lock icon it means at least one request with mTLS enabled is present. In case there are both mTLS and non-mTLS requests, the side-panel will show the percentage of requests using mTLS.

Enable the option in the Display dropdown, select the security badge.

Graph: Edge mTLS

Validations

Kiali has different validations to help troubleshoot configurations related to mTLS such as DestinationRules and PeerAuthentications.

Validation supporting mTLS configuration

3.10 - Topology

How Kiali visualizes the mesh topology.

Kiali offers multiple ways for users to examine their mesh Topology. Each combines several information types to help users quickly evaluate the health of their service architecture.

Overview

Kiali’s default landing page is the mesh Overview. It presents a high-level view of the mesh clusters, control planes, data planes, and configuration, for the user. It combines service and application information, along with telemetry, validations and health, to provide a holistic summary of the system behavior. It highlights potential issues that the user may need to investigate.

Topology overview

The Namespaces page (Kiali >= 2.23) provides numerous filtering, sorting and presentation options for the available namespaces. From here users can perform namespace-level Actions, or quickly navigate to more detailed views.

Topology namespace overview

Graph

The Kiali Graph offers a powerful visualization of your mesh traffic. The topology combines real-time request traffic with your Istio configuration information to present immediate insight into the behavior of service mesh, allowing you to quickly pinpoint issues. Multiple Graph Types allow you to visualize traffic as a high-level service topology, a low-level workload topology, or as an application-level topology.

Graph nodes are decorated with a variety of information, pointing out various route routing options like virtual services and service entries, as well as special configuration like fault-injection and circuit breakers. It can identify mTLS issues, latency issues, error traffic and more. The Graph is highly configurable, can show traffic animation, and has powerful Find and Hide abilities.

You can configure the graph to show the namespaces and data that are important to you, and display it in the way that best meets your needs.

Topology graph

Health

Colors in the graph represent the health of your service mesh. A node colored red or orange might need attention. The color of an edge between components represents the health of the requests between those components. The node shape indicates the type of component such as services, workloads, or apps.

The health of nodes and edges is refreshed automatically based on the user’s preference. The graph can also be paused to examine a particular state, or replayed to re-examine a particular time period.

Topology graph health

Side-Panel

The collapsible side-panel summarizes the current graph selection, or the graph as a whole. A single-click will select the node, edge, or box of interest. The side panel provides:

Charts showing traffic and response times.
Health details.
Links to fully-detailed pages.
Response Code and Host breakdowns.
Traces involving the selected component.

Topology graph side-panel app Topology graph side-panel service Topology graph side-panel workload

Node Detail

A single-click selects a graph node. A double-click drills in to show the node’s Detail Graph. The node detail graph visualizes traffic from the point-of-view of that node, meaning it shows only the traffic reported by that node’s Istio proxy.

You can return back to the main graph, or double-click to change to a different node’s detail graph.

Topology graph node detail

Traffic Animation

Kiali offers several display options for the graph, including traffic animation.

For HTTP traffic, circles represent successful requests while red diamonds represent errors. The more dense the circles and diamonds the higher the request rate. The faster the animation the faster the response times.

TCP traffic is represented by offset circles where the speed of the circles indicates the traffic speed.

Topology graph animation

Ranking

Nodes can be ranked in the graph based on pre-defined criteria such as ’number of inbound edges’. Combined with the find/hide feature, this allows you to quickly highlight or filter for important workloads, services, and applications.

Rankings are normalized to fit between 1..100 and nodes may tie with each other in rank. Ranking starts at 1 for the top ranked nodes so when ranking nodes based on ’number of inbound edges’, the node(s) with the most inbound edges would have rank 1. Node(s) with the second most inbound edges would have rank 2. Each selected criteria contributes equally to a node’s ranking. Although 100 rankings are possible, only the required number of rankings are assigned, starting at 1.

Topology graph ranking

Graph Types

Kiali offers four different traffic-graph renderings:

The workload graph provides the a detailed view of communication between workloads.
The app graph aggregates the workloads with the same app labeling, which provides a more logical view.
The versioned app graph aggregates by app, but breaks out the different versions providing traffic breakdowns that are version-specific.
The service graph provides a high-level view, which aggregates all traffic for defined services.

Topology graph type workload Topology graph type app Topology graph type versioned app Topology graph service

Replay

Graph replay allows you to replay traffic from a selected past time-period. This gives you a chance to thoroughly examine a time period of interest, or share it with a co-worker. The graph is fully bookmarkable, including replay.

Operation Nodes

Istio v1.6 introduced Request Classification. This powerful feature allows users to classify requests into aggregates, called “Operations” by convention, to better understand how a service is being used. If configured in Istio the Kiali graph can show these as Operation nodes. The user needs only to enable the “Operation Nodes” display option. Operations can span services, for example, “VIP” may be configured for both CarRental and HotelRental services. To see total “VIP” traffic then display operation nodes without service nodes. To see “VIP” traffic specific to each service then also enable the “Service Nodes” display option.

When selected, an Operation node also provides a side-panel view. And when double-clicked a node detail graph is also provided.

Because operation nodes represent aggregate traffic they are not compatible with Service graphs, which themselves are already logical aggregates. For similar reasons response time information is not available on edges leading into or out of operation nodes. But by selecting the edge the response time information is available in the side panel (if configured).

Operation nodes are represented as pentagons in the Kiali graph:

Topology graph operation

3.11 - Tracing

How Kiali integrates Distributed Tracing with Jaeger.

Kiali offers a native integration with two different Distributed Tracing platforms, Jaeger and Grafana Tempo. As such, users can access trace visualizations. But more than that, Kiali incorporates tracing into several correlated views, making your investment in trace data even more valuable.

For a quick glimpse at Kiali tracing features, see below. For a detailed explanation of tracing in Kiali, see this 3-part Trace my mesh blog series. There is a detailed guide to configure access to the different distributed tracing platform, Jaeger and Grafana Tempo.

Workload detail

When investigating a workload, click the Traces tab to visualize your traces in a chart. When selecting a trace Kiali presents a tab for trace detail, and a tab for span details. Kiali always tries to surface problem areas, Kiali uses a heatmap approach to help the user identify problem traces or spans.

Trace detail Span detail

Heatmaps

A heatmap that you see in the Workload’s Tracing tab is a matrix that compares a specific trace’s request duration against duration metrics aggregated over time.

Heatmap

Each trace has a corresponding heatmap matrix. Each cell in the matrix corresponds to a specific metric aggregate; the value and color of a cell represents the difference between that metric and the duration of the matrix’s associated trace.

For example, the top-right cell of the heatmap above shows that the duration of the request represented by the trace (5.96ms) was 17.5ms faster than the 99th percentile of all inbound requests to the workload within the last 10 minutes.

The color of a cell will range between red and green; the more green a cell is, the faster the duration of the trace was compared to the aggregate metric data. A red cell indicates the associated trace was much slower compared to the aggregate metric data and so examining that trace could help detect a potential bottleneck or problem.

Metric Correlation

Kiali offers span overlays on Metric charts. The user can simply enable the spans option to generate the overlay. Clicking any span will navigate back to the Traces tab, focused on the trace of interest.

Metrics with Tracing

Graph Correlation

Kiali users often use the Graph Feature to visualize their mesh traffic. In the side panel, When selecting a graph node, the user will be presented with a Traces tab, which lists available traces for the time period. When selecting a trace the graph will display an overlay for the trace’s spans. And the side panel will display span details and offer links back to the trace detail views.

Graph with Tracing

Logs Correlation

Kiali works to correlate the standard pillars of observability: traces, metrics and logs. Kiali can present a unified view of log and trace information, in this way letting users use logs to identify traces of interest. When enabling the spans option Kiali adds trace entries to the workload logs view. Below, in time-sorted order, the user is presented with a unified view of application logs (in white), Envoy proxy logs (in gold), and trace spaces (in blue). Clicking a span of interest brings you to the detail view for the trace of interest.

Logs with Tracing

3.12 - Validation

A description and complete list of Kiali validations.

Istio Config Validation

Disabling validations

In certain environments, particularly those with a high volume of configurations or limited resources, the Istio validation process can be time and resource intensive, potentially causing delays. To prioritize speed and resource efficiency in such scenarios, Kiali offers the option to disable these validations by configuring the validation_reconcile_interval setting to “0s” within the Kiali CR.

Below is an example of a Kiali CR with validations disabled:

spec:
  external_services:
    istio:
      validation_reconcile_interval: "0s"

The complete list of validations:

AuthorizationPolicy

KIA0101 - Namespace not found for this rule

AuthorizationPolicy enables access control on workloads. Each policy effects only to a group of request. For instance, all requests started from a workload on a list of namespaces. The present validation points out those rules referencing a namespace that don’t exist in the cluster.

Resolution

Either remove the namespace from the list, correct if there is any typo or create a new namespace.

Severity

Warning

Example

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: default
spec:
 selector:
   matchLabels:
     app: httpbin
     version: v1
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/default/sa/sleep"]
   - source:
       namespaces:
         - default
         - non-existing # warning
         - unexisting # warning
   to:
   - operation:
       methods: ["GET"]
       paths: ["/info*"]
   - operation:
       methods: ["POST"]
       paths: ["/data"]
   when:
   - key: request.auth.claims[iss]
     values: ["https://accounts.google.com"]

KIA0102 - Only HTTP methods and fully-qualified gRPC names are allowed

An AuthorizationPolicy has an Operation field where is defined the oprations allowed for a request. In the method field are listed all the allowed methods that request can have. This validation appears when a problem is found in there. The only methods accepted are: either HTTP valid methods or fully-qualified names of gRPC service in the form of “/package.service/method”

Resolution

Either change or remove the violating method. It has to be either a HTTP valid method or a fully-qualified names of a gRPC service in the form of “/package.service/method”

Severity

Warning

Example

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: default
spec:
 selector:
   matchLabels:
     app: httpbin
     version: v1
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/default/sa/sleep"]
   - source:
       namespaces:
         - default
   to:
   - operation:
       methods:
         - "GET"
         - "/package.service/method"
         - "WRONG" # Warning
         - "non-fully-qualified-grpc" # Warning
       paths: ["/info*"]
   - operation:
       methods: ["POST"]
       paths: ["/data"]
   when:
   - key: request.auth.claims[iss]
     values: ["https://accounts.google.com"]

KIA0104 - This host has no matching entry in the service registry

AuthorizationPolicy enables access control on workloads. Each policy effects only to a group of request going to a specific destination. For instance, allow all the request going to details host.

The present validation points out those rules referencing a host that don’t exist in the authorization policy namespace. Kiali considers services, virtual services and service entries. Those hosts that refers to hosts outside of the object namespace will be presented with an unknown error.

Resolution

Either remove the host from the list, correct if there is any typo or deploy a new service, service entry or a virtual service pointing to that host.

Severity

Error

Example

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: default
spec:
 selector:
   matchLabels:
     app: httpbin
     version: v1
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/default/sa/sleep"]
   - source:
       namespaces:
         - default
   to:
   - operation:
       hosts:
         - wrong # Error
         - ratings
         - details.default
         - reviews.default.svc.cluster.local
         - productpage.outside # Unknown
         - google.com # Service Entry present. No error
         - google.org # Service Entry not present, wrong domain. Error.
       methods:
         - "GET"
         - "/package.service/method"
       paths: ["/info*"]
   - operation:
       methods: ["POST"]
       paths: ["/data"]
   when:
   - key: request.auth.claims[iss]
     values: ["https://accounts.google.com"]

KIA0105 - This field requires mTLS to be enabled

AuthorizationPolicy has a Source field which specifies the source identities of a request. The Source field accepts principals and namespaces which require mTLS be enabled.

A validation Error message on a principals or namespaces fields means that mTLS is not enabled.

This validation appears only when autoMtls is disabled.

Resolution

Either remove this field or enable autoMtls.

Severity

Error

Example

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: bookinfo
spec:
 selector:
   matchLabels:
     app: httpbin
     version: v1
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/default/sa/sleep"]
   - source:
       namespaces:
         - default
   to:
   - operation:
       methods: ["GET"]
       paths: ["/info*"]
   - operation:
       methods: ["POST"]
       paths: ["/data"]
   when:
   - key: request.auth.claims[iss]
     values: ["https://accounts.google.com"]

KIA0106 - Service Account not found for this principal

AuthorizationPolicy has a Source field which specifies the source identities of a request. The Source field allows principals to be specified - a list of peer identities derived from the peer certificate. A peer identity is in the format of <TRUST_DOMAIN>/ns/<NAMESPACE>/sa/<SERVICE_ACCOUNT>, for example, cluster.local/ns/default/sa/productpage.

A validation Error message on a principal value means that, while the specified Service Account may exist, it is not referenced by any Pod in the system.

Resolution

Correct the principal to refer to an existing Service Account, make sure that the value is in correct format without a typo, and make sure at least one Pod references the Service Account.

Severity

Error

Example

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: default
spec:
 selector:
   matchLabels:
     app: httpbin
     version: v1
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/default/sa/sleep"]
   - source:
       namespaces:
         - default
   to:
   - operation:
       hosts:
         - ratings
         - details.default
         - reviews.default.svc.cluster.local
       methods:
         - "GET"
       paths: ["/info*"]
   - operation:
       methods: ["POST"]
   when:
   - key: request.auth.claims[iss]
     values: ["https://accounts.google.com"]

KIA0107 - Service Account for this principal found on a remote cluster

A validation Warning message on a principal value means, that the specified Service Account was found in a cluster different from that of the AuthorizationPolicy.

Resolution

Kiali currently does not verify if the SPIRE is configured on the workload of the remote cluster.

Severity

Warning

Destination rules

KIA0201 - More than one DestinationRules for the same host subset combination

This validation warning could be a result of duplicate definition of the same subsets as well as from rules that apply to all subsets. Also, a combination of one Destination Rule (DR) applying to all subsets and another defining behavior for only some subsets triggers this validation warning.

Istio silently ignores the duplicate subsets and merge these destination rules without letting the user know. Only the first seen rule (by Istio) per subset is used and information from multiple definitions is not merged. While the routing might work correctly, this is most likely a configuration error. It may lead to a undesired behavior if one of the offending rules is removed or modified and that is probably not the intention of the deployer of this service. Also, if the two offending destination rules have different policies for traffic routing the wrong one might be used.

Resolution

Either merge the settings to a single DR or split the subsets in such a way that they do not interleave. This ensures that the routing behavior stays consistent.

Severity

Warning

Example

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews-dr1
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews-dr2
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1

KIA0202 - This host has no matching entry in the service registry (service, workload or service entries)

Istio applies traffic rules for services after the routing has happened. These can include different settings such as connection pooling, circuit breakers, load balancing, and detection. Istio can define the same rules for all services under a host or different rules for different versions of the service. The host must have a service that is defined in the platform’s service registry or as a ServiceEntry. Short names are extended to include ‘.namespace.cluster’ using the namespace of the destination rule, not the service itself. FQDN is evaluated as is. It is recommended to use the FQDN to prevent any confusion.

If the host is not found, Istio ignores the defined rules.

Resolution

Correct the host to point to a correct service, in this namespace or with FQDN to other namespaces, or deploy the missing service to the mesh.

Severity

Error

There is an exception to the severity level: It will be shown as a Warning when OutboundTrafficPolicy Mode for MeshConfig is set to ALLOW_ANY.

Example

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: notpresent
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3

KIA0203 - This subset’s labels are not found in any matching host

Istio applies traffic rules for services after the routing has happened. These can include different settings such as connection pooling, circuit breakers, load balancing, and detection. Istio can define the same rules for all services under a host or different rules for different versions of the service. The host must a service that is defined in the platform’s service registry or as a ServiceEntry. Short names are extended to include ‘.namespace.cluster’ using the namespace of the destination rule, not the service itself. FQDN is evaluated as is. It is recommended to use the FQDN to prevent any confusion.

Subsets can override the global settings defined in the DR for a host.

If the host is not found, Istio ignores the defined rules.

If the not found subset is not referenced in any Virtual Service, the severity of this error is changed to Info.

Resolution

Correct the host to point to a correct service, in this namespace or with FQDN to other namespaces, or deploy the missing service to the mesh. If the hostname is equal to the one used otherwise in the DR, consider removing the duplicate host resolution.

Also, verify that the labels are correctly matching a workload with the intended service.

Severity

Error

Example

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v10
  - name: v2
    labels:
      notfoundlabel: v2
  - name: v3
    labels:
      version: v3

KIA0204 - mTLS settings of a non-local Destination Rule are overridden

Istio allows you to define DestinationRule at three different levels: mesh, namespace and service level. A mesh may have multiple DRs. In case of having two DestinationRules on the first one is at a lower level than the second one, the first one overrides the TLS values of the second one.

Resolution

This validation aims to warn Kiali users that they may be disabling/enabling mTLS from the higher DestinationRule. Merging the TLS settings to one of the DestinationRules is the only way to fix this validation. However, this is a valid scenario so it might be impossible to remove this warning.

Severity

Warning

Example

apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "default"
  namespace: "istio-system"
spec:
  host: "*.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3

KIA0205 - PeerAuthentication enabling mTLS at mesh level is missing

Istio has the ability to define mTLS communications at mesh level. In order to do that, Istio needs one DestinationRule and one PeerAuthentication. The DestinationRule configures all the clients of the mesh to use mTLS protocol on their connections. The PeerAuthentication defines what authentication methods that can be accepted on the workload of the whole mesh. If the PeerAuthentication is not found or doesn’t exist and the mesh-wide DestinationRule is on ISTIO_MUTUAL mode, all the communication returns 500 errors.

Resolution

Add a PeerAuthentication within the istio-system namespace without specifying targets but setting peers mtls mode to STRICT or PERMISSIVE. The PeerAuthentication should be like this.

Severity

Error

Example

# AutoMtls disabled, no PeerAuthentication at mesh-level defined
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "default"
  namespace: "istio-system"
spec:
  host: "*.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

KIA0206 - PeerAuthentication enabling namespace-wide mTLS is missing

Istio has the ability to define mTLS communications at namespace level. In order to do that, Istio needs both a DestinationRule and a PeerAuthentication targeting all the clients/workloads of the specific namespace. The PeerAuthentication allows mTLS authentication method for all the workloads within a namespace. The DestinationRule defines all the clients within the namespace to start communications in mTLS mode. If the PeerAuthentication is not found and the DestinationRule is on STRICT mode in that namespace but there is the DestinationRule enabling mTLS, all the communications within that namespace returns 500 errors.

Resolution

A PeerAuthentication enabling mTLS method is needed for the workloads in the namespace. Otherwise all the clients start mTLS connections that those workloads won’t be ready to manage. Add a PeerAuthentication without specifying targets but setting mTLS mode to STRICT or PERMISSIVE in the same namespace as the DestinationRule.

Severity

Error

Example

apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "enable-mtls"
  namespace: "bookinfo"
spec:
  host: "*.bookinfo.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

KIA0207 - PeerAuthentication with TLS strict mode found, it should be permissive

Istio needs both a DestinationRule and PeerAuthentication to enable mTLS communications. The PeerAuthentication configures the authentication method accepted for all the targeted workloads. The DestinationRule defines which is the authentication method that the clients of specific workloads has to start communications with.

Resolution

Kiali has found that there is a DestinationRule sending traffic without mTLS authentication method. There are two different ways to fix this situation. You can either change the PeerAuthentication applying to PERMISSIVE mode or change the DestinationRule to start communications using mTLS.

Severity

Error

Example

apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "disable-mtls"
  namespace: "bookinfo"
spec:
  host: "*.bookinfo.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: DISABLE
---
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "enable-mtls"
  namespace: "bookinfo"
spec:
  host: "*.bookinfo.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
---
apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "default"
  namespace: "bookinfo"
spec:
  mtls:
    mode: STRICT

KIA0208 - PeerAuthentication enabling mTLS found, permissive mode needed

Kiali found a DestinationRule starting communications without TLS but there was a PeerAuthentication allowing all services in the mesh to accept only requests in mTLS.

Resolution

There are two ways to fix this situation. You can either change the PeerAuthentication to enable PERMISSIVE mode to all the workloads in the mesh or change the DestinatonRule to enable mTLS instead of disabling it (change the mode to ISTIO_MUTUAL).

Severity

Error

Example

apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "default"
  namespace: "bookinfo"
spec:
  host: "*.bookinfo.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: DISABLE
---
apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "default"
  namespace: "istio-system"
spec:
  mtls:
    mode: STRICT

KIA0209 - DestinationRule Subset has not labels

A DestinationRule subset without labels may miss the destination endpoint linked with a specific workload.

If there is any other subset with valid labels, the severity of this warning is changed to Info.

Resolution

Validate that a subset is properly configured.

Severity

Warning

K8s Gateway API

Note that with the support of K8s Gateway API, a new mechanism of subsetting is introduced. Which means, that each version of a service should have a separate Service pointing to that particular version. And instead of the usage of DestinationRules, there should be a K8s HTTPRoute object created, referencing to Services per version in it’s rules.

Gateways

KIA0301 - More than one Gateway for the same host port combination

Gateway creates a proxy that forwards the inbound traffic for the exposed ports. If two different gateways expose the same ports for the same host, this creates ambiguity inside Istio as either of these gateways could handle the traffic. This is most likely a configuration error. This check is done across all namespaces the user has access to.

There is one exception: when both gateways points to a different ingress. Then the ambiguity doesn’t exist and, in consequence, no validation is shown. Kiali considers that two gateways points to the same ingress if they share the exact same selector.

Resolution

Remove the duplicate gateway entries or merge the two gateway definitions into a single one.

When one of the duplicate Gateways has a wildcard in hosts, there is an option ‘skip_wildcard_gateway_hosts’ in Kiali CR, by setting it to ’true’, it will ignore Gateways with wildcards in hosts during validation. As Istio considers such a Gateway with a wildcard in hosts as the last in order, after the Gateways with FQDN in hosts.

Severity

Warning

Example

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: bookinfo-gateway # Validation shown
  namespace: bookinfo
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: bookinfo-gateway-copy # Validation shown
  namespace: bookinfo2
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: bookinfo-gateway-diff-ingress # No validations shown
  namespace: bookinfo
spec:
  selector:
    istio: ingressgateway-pub # Using different ingress
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---

KIA0302 - No matching workload found for gateway selector in this namespace

This validation checks the current namespace for matching workloads as this is recommended, and potentially in the future required, by the Istio. Excluded from this check are the default “istio-ingressgateway” and “istio-egressgateway” workloads which are included in Istio by default.

Although your traffic might be correctly routed to a workload in other namespace, this is not a guaranteed behavior and thus a warning is flagged in such cases also.

Resolution

Deploy the missing workload or fix the selector to target a correct location.

Severity

Warning

Example

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: bookinfo-gateway
  namespace: bookinfo
spec:
  selector:
    app: nonexisting # workload doesn't exist in the namespace
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"

Mesh Policies

KIA0401 - Mesh-wide Destination Rule enabling mTLS is missing

Istio has the ability to define mTLS communications at mesh level. In order to do that, Istio needs one DestinationRule and one PeerAuthentication. The DestinationRule configures all the clients of the mesh to use mTLS protocol on their connections. The PeerAuthentication defines what authentication methods can be accepted on the workload of the whole mesh. If the DestinationRule is not found or doesn’t exist and the PeerAuthentication is on STRICT mode, all the communication returns 500 errors.

Resolution

Add a DestinationRule with “*.cluster” host and ISTIO_MUTUAL as tls trafficPolicy mode. The DestinationRule should be like this.

Severity

Error

Example

# Make sure there isn't any DestinationRule enabling meshwide mTLS
apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "default"
  namespace: "istio-system"
spec:
  mtls:
    mode: STRICT

PeerAuthentication

KIA0501 - Destination Rule enabling namespace-wide mTLS is missing

Istio has the ability to define mTLS communications at namespace level. In order to do that, Istio needs one DestinationRule and one PeerAuthentication. The DestinationRule configures all the clients of the namespace to use mTLS protocol on their connections. The PeerAuthentication defines what authentication methods can be accepted on a specific group of workloads. PeerAuthentications without target field specified will target all the workloads within its namespace. If the DestinationRule is not found or doesn’t exist in the namespace and the namespace-wide PeerAuthentication is on STRICT mode, all the communication will return 500 errors.

Resolution

Add a DestinationRule with “*.namespace.svc.cluster.local” host and ISTIO_MUTUAL as tls trafficPolicy mode. The DestinationRule should be like this.

Severity

Error

Example

# Make sure there isn't any DestinationRule enabling meshwide mTLS
apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "default"
  namespace: "bookinfo"
spec:
  mtls:
    mode: STRICT

KIA0505 - Destination Rule disabling namespace-wide mTLS is missing

PeerAuthentication objects are used to define the authentication methods that a set of workloads can accept: Mutual, Istio Mutual, Simple or Disabled.

This validation warns the scenario where there is one PeerAuthentication at namespace level with DISABLE mode but there is DestinationRule at namespace or mesh level enabling mTLS. With this scenario, all the traffic flowing between the services in that namespace will fail.

Resolution

You can either change the namespace/mesh-wide Destination Rule to DISABLE mode or change the current PeerAuthentication to allow mTLS (mode STRICT or PERMISSIVE).

Severity

Error

Example

apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "default"
  namespace: "bookinfo"
spec:
  mtls:
    mode: DISABLE
---
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "enable-mtls"
  namespace: bookinfo
spec:
  host: "*.bookinfo.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

KIA0506 - Destination Rule disabling mesh-wide mTLS is missing

PeerAuthentication objects are used to define the authentication methods that a set of workloads can accept: Mutual, Istio Mutual, Simple or Disabled.

This validation warns the scenario where there is one PeerAuthentication at mesh level with DISABLE mode but there is DestinationRule at mesh level enabling mTLS. With this scenario, all the traffic flowing between the services in that namespace will fail.

Resolution

You can either change the mesh-wide Destination Rule to DISABLE mode or change the current PeerAuthentication to allow mTLS (mode STRICT or PERMISSIVE).

Severity

Error

Example

apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "default"
  namespace: "istio-system"
spec:
  mtls:
    mode: DISABLE
---
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "enable-mtls"
  namespace: bookinfo
spec:
  host: "*.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

Ports

KIA0601 - Port name must follow [-suffix] form

Istio requires the service ports to follow the naming form of ‘protocol-suffix’ where the ‘-suffix’ part is optional. If the naming does not match this form (or is undefined), Istio treats all the traffic TCP instead of the defined protocol in the definition. Dash is a required character between protocol and suffix. For example, ‘http2foo’ is not valid, while ‘http2-foo’ is (for http2 protocol).

Resolution

Rename the service port name field to follow the form and the traffic flows correctly.

Severity

Error

Example

apiVersion: v1
kind: Service
metadata:
  name: ratings-java-svc
  namespace: bookinfo
  labels:
    app: ratings
    service: ratings-svc
spec:
  ports:
  - port: 9080
    name: wrong-http
  selector:
    app: ratings-java
    version: v1

KIA0602 - Port appProtocol must follow form

Istio also optionally supports the appProtocol in service ports, following the form of ‘protocol’. When port name field does not contain the protocol the appProtocol field is considered as a protocol. If the naming does not match this form, Istio treats all the traffic TCP instead of the defined protocol in the definition.

Resolution

Rename the service port appProtocol field to follow the form and the traffic flows correctly.

Severity

Error

Example

apiVersion: v1
kind: Service
metadata:
  name: ratings-java-svc
  namespace: bookinfo
  labels:
    app: ratings
    service: ratings-svc
spec:
  ports:
  - port: 3306
    name: database
    appProtocol: wrong-mysql
  selector:
    app: ratings-java
    version: v1

Services

KIA0701 - Deployment exposing same port as Service not found

Service definition has a combination of labels and port definitions that are not matching to any workloads. This means the deployment will be unsuccessful and no traffic can flow between these two resources. The port is read from the Service ‘TargetPort’ definition first and if undefined, the ‘Port’ field is used as Kubernetes defaults the ‘TargetPort’ to ‘Port’. If the ‘TargetPort’ is using a integer, the port numbers are compared and if the ‘TargetPort’ is a string, the deployment’s portName is used for comparison.

Resolution

Fix the port definitions in the workload or in the service definition to ensure they match.

Severity

Warning

Example

Invalid example with port definitions unmatched:

apiVersion: v1
kind: Service
metadata:
  name: ratings-java-svc
  namespace: ratings-java
  labels:
    app: ratings
    service: ratings-svc
spec:
  ports:
  - port: 9080
    name: http
  selector:
    app: ratings-java
    version: v1
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ratings-java
  namespace: ratings-java
  labels:
    app: ratings-java
    version: v1
spec:
  replicas: 1
  template:
    metadata:
      annotations:
         sidecar.istio.io/inject: "true"
      labels:
        app: ratings-java
        version: v1
    spec:
      containers:
      - name: ratings-java
        image: pilhuhn/ratings-java:f
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080

Valid example using targetPort definition matching:

apiVersion: v1
kind: Service
metadata:
  name: ratings-java-svc
  namespace: ratings-java
  labels:
    app: ratings
    service: ratings-svc
spec:
  ports:
  - port: 9080
    targetPort: 8080
    name: http
  selector:
    app: ratings-java
    version: v1
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ratings-java
  namespace: ratings-java
  labels:
    app: ratings-java
    version: v1
spec:
  replicas: 1
  template:
    metadata:
      annotations:
         sidecar.istio.io/inject: "true"
      labels:
        app: ratings-java
        version: v1
    spec:
      containers:
      - name: ratings-java
        image: pilhuhn/ratings-java:f
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080

Sidecars

KIA1004 - This host has no matching entry in the service registry

The Sidecar resources are used for configuring the sidecar proxies in the service mesh. IstioEgressListener specifies the properties of an outbound traffic listener on the sidecar proxy attached to a workload instance.

In the hosts field, there is the list of hosts exposed to the workload. Each host in the list have the namespace/dnsName format where both namespace and dnsName may have non-obvious values. namespace may be either ., ~, * or an actual namespace name. dnsName has to be a FQDN representing a service, virtual service or a service entry. This FQDN may use the wildcard character.

See more information about the syntax of both namespace and dnsName into istio documentation.

Resolution

Make sure there is a service, virtual service or service entry matching with the host.

Severity

Warning

Example

apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: servicenotfound
  namespace: bookinfo
spec:
  workloadSelector:
    labels:
      app: reviews
  egress:
  - port:
      number: 3306
      protocol: MYSQL
      name: egressmysql
    captureMode: NONE
    bind: 127.0.0.1
    hosts:
    - "bookinfo/*.bookinfo.svc.cluster.local" # Bookinfo running into bookinfo ns
    - "default/kiali.io" # Service entry present in the namespace
    - "bookinfo/bogus.bookinfo.svc.cluster.local" # Bogus service into bookinfo doesn't exist
    - "bogus-ns/reviews.bookinfo.svc.cluster.local" # Cross-namespace validation: unable to verify validity

KIA1006 - Global default sidecar should not have workloadSelector

The Sidecar resources are used for configuring the sidecar proxies in the service mesh. By default, all the sidecars are configured with the default sidecar instance specified in the control plane namespace (usually istio-system). In case there are sidecar resources in the namespaces where your applications are, this default sidecar resource won’t be considered. The sidecar in your namespace will be applied.

Having workloadSelector in your global default sidecar won’t make any effect in the other sidecars living outside of the control plane namespace.

Resolution

Make sure you don’t have the workloadSelector in this global sidecar resource. In case you need specific settings for specific workloads, move those settings to the sidecar resources in your application namespaces.

Severity

Warning

Example

apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: istio-system
spec:
  workloadSelector: # Default sidecar can't have labels
    labels:
      version: v1
  egress:
  - port:
      number: 3306
      protocol: MYSQL
      name: egressmysql
    captureMode: NONE
    bind: 127.0.0.1
    hosts:
    - "bookinfo/reviews.bookinfo.svc.cluster.local"
    - "bookinfo/details.bookinfo.svc.cluster.local"

KIA1007 - OutboundTrafficPolicy shown with empty mode value is ambiguous

Due to issues with the Istio client and the protobuf library it uses, the way some defaults are handled becomes ambiguous. When a Sidecar resource spec.outboundTrafficPolicy.mode is left unset or is explicitly set to REGISTRY_ONLY, the Kiali UI will show the value as unset (e.g. if nothing is set inside outboundTrafficPolicy, its value will be shown as {}). In this case, you are not guaranteed to know what the value of mode truly is. So in the case where Kiali UI shows mode as empty, you cannot know if Istio will be using a value of REGISTRY_ONLY or ALLOW_ANY.

Resolution

You cannot determine the value of mode using the Kiali UI when this condition arises. You must inspect the Sidecar object using other means to determine the value (e.g. use kubectl get sidecar your-side-car-name -o jsonpath='{.spec.outboundTrafficPolicy.mode}').

Severity

Informational

Example

Both of these Sidecar resources will show mode as unset/empty in the Kiali UI YAML editor:

apiVersion: networking.istio.io/v1beta1
kind: Sidecar
...
spec:
  outboundTrafficPolicy:
    mode: REGISTRY_ONLY

apiVersion: networking.istio.io/v1beta1
kind: Sidecar
...
spec:
  # according to Istio documentation, the default will be ALLOW_ANY
  outboundTrafficPolicy: {}

VirtualServices

KIA1101 - DestinationWeight on route doesn’t have a valid service (host not found)

VirtualService routes matching requests to a service inside your mesh. Routing can also match a subset of traffic to a certain version of it for example. Any service inside the mesh must be targeted by its name, the IP address are only allowed for hosts defined through a Gateway. Host must be in a short name or FQDN format. Short name will evaluate to VS’ namespace, regardless of where the actual service might be placed.

Warning

Example

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews-cp
spec:
  hosts:
    - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10

KIA1107 - Subset not found

VirtualService routes matching requests to a service inside your mesh. Routing can also match a subset of traffic to a certain version of it for example. The subsets referred in a VirtualService have to be defined in one DestinationRule.

If one route in the VirtualService points to a subset that doesn’t exist Istio won’t be able to send traffic to a service.

Resolution

Fix the routes that points to a non existing subsets. It might be fixing a typo in the subset’s name or defining the missing subset in a DestinationRule.

Severity

Error

Example

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: nosubset
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10

KIA1108 - Preferred nomenclature: /

A virtual service may include a list of gateways which the defined routes should be applied to. Gateways in other namespaces may be referred to by /; specifying a gateway with no namespace qualifier is the same as specifying the VirtualService’s namespace.

Resolution

Move the nomenclature of the gateways into the supported Istio form: /

Example

kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: bookinfo
  namespace: bookinfo
spec:
  hosts:
    - '*'
  gateways:
    - bookinfo-gateway.bookinfo.svc.cluster.local # unsupported format
    - bookinfo/bookinfo-gateway # works
  http:
    - match:
        - uri:
            exact: /productpage
        - uri:
            prefix: /static
        - uri:
            exact: /login
        - uri:
            exact: /logout
        - uri:
            prefix: /api/v1/products
      route:
        - destination:
            host: productpage
            port:
              number: 9080
---
kind: Gateway
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: bookinfo-gateway
  namespace: bookinfo
spec:
  servers:
    - hosts:
        - '*'
      port:
        name: http
        number: 80
        protocol: HTTP
  selector:
    istio: ingressgateway

WorkloadEntries

KIA1201 - Missing one or more addresses from matching WorkloadEntries

This validation shows, that the address field’s value of Workload Entry is not matching to any address of Service Entry.

Resolution

Add missing Service Entry which address will match the Workload Entry’s address.

Severity

Warning

Example

apiVersion: networking.istio.io/v1beta1
kind: WorkloadEntry
metadata:
  name: ratings-v1
  namespace: bookinfo
spec:
  serviceAccount: ratings-vm
  address: 3.3.3.3
  labels:
    app: ratings
    version: v1
  ports:
    http: 9080
---
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: ratings-unmatching-address
  namespace: bookinfo
spec:
  addresses:
    - 4.4.4.4 # This IP is not in any WorkloadEntry. It needs 3.3.3.3 to work.
  hosts:
    - ratings
  location: MESH_INTERNAL
  resolution: STATIC
  ports:
    - number: 9080
      name: http
      protocol: HTTP
      targetPort: 9080
  workloadSelector:
    labels:
      app: ratings-unmatching

K8s Routes

KIA1401 - Route is pointing to a non-existent or inaccessible K8s gateway

Gateway API Protocol Route could be pointing to a [k8s] Gateway that the Route wants to be attached to. When the namespace field is not specified it takes Gateways from the current Route’s namespace. Here the error indicates that the referenced Gateway is not found in the provided namespace. If the [k8s] Gateway is in another namespace, then that gateway should be shared to the Route’s namespace. The Gateway API supports cross-namespace routing, allowing Gateways and Routes to be deployed into different namespaces with routes attaching to Gateways across namespace boundaries.

Resolution

Fix the parentRefs field to target to an existing gateway. Or share the Gateway to the namespace where the Route is located.

Severity

Error

Example

kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1alpha2
metadata:
  name: httproute
  namespace: bookinfo
spec:
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      namespace: istio-system
      name: gatewayapi
  hostnames:
    - details

KIA1402 - Reference doesn’t have a valid service (Service name not found)

Gateway API Route could be pointing to a Service inside your mesh the Route sends the traffic to. A Service name should be specified, not a hostname. This Service can be a certain version of a parent Service, but in that case a separate Service is required to be created. When the namespace field is not specified it takes Service from the current Route’s namespace. In a case of referencing to a Service from remote namespace, a ReferenceGrant object needs to be created to enable cross namespace references. Here the error indicates that the referenced Service is not found in the provided namespace or the ReferenceGrant is missing (in a case of remote namespace).

Resolution

Correct the backendRefs name to point to a correct Service (in this namespace or to other namespaces):

Deploy the missing Service to the mesh, create a ReferenceGrant object in a case of remote namespace.
Or remove the configuration linking to that non-existing Service.

Severity

Error

Example

kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1alpha2
metadata:
  name: httproute
  namespace: bookinfo
spec:
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      namespace: istio-system
      name: gatewayapi
  hostnames:
    - reviews
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /get
      backendRefs:
        - group: ''
          kind: Service
          name: reviews-v1
          namespace: default
          port: 9080
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: refgrant
  namespace: default
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: bookinfo
  to:
  - group: ""
    kind: Service

Workloads

KIA1301 - This workload is not covered by any authorization policy

Istio Authorization Policy enables access control on workloads in the mesh. Auth Policy selector will match with workloads in the same namespace as the authorization policy. If the authorization policy is in the root namespace, the selector will additionally match with workloads in all namespaces. This validation shows, that the selector match did not happen.

Resolution

Add Autorization Policy which selector matches with Workload’s label selector.

Severity

Warning

Warning

Generic

KIA0002 - More than one selector-less object in the same namespace

This validation refers to the usage of the selector. Selector-less Istio objects are those objects that don’t have the selector field specified. Therefore, objects that apply to all the workloads of a namespace (or whole mesh if the namespace is the same as the control plane namespace).

This validation warns you that you have two different objects living in the same namespace. This may leave an non-deterministic or unexpected behavior on the workloads of the namespace.

Resolution

The natural solution is to merge both objects. In case there are different behaviors you want to apply, consider to define the selector field targeting a specific set of workloads.

Severity

Error

Example

apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "default"
  namespace: "bookinfo"
spec:
  mtls:
    mode: STRICT
---
apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "duplicate"
  namespace: "bookinfo"
spec:
  mtls:
    mode: STRICT

KIA0003 - More than one object applied to the same workload

This validation refers to the usage of the selector. In this field are defined the labels of the workloads that this object will be applied to. It might be one or more workloads in the same namespace.

Choose existing and accessible namespace to export to.

Severity

Error

K8s Gateway

KIA1501 - More than one K8s Gateway for the same host and port combination

A k8s Gateway defines a point where traffic can be translated to Services, within the cluster. This is defined through listeners or addresses. This validation warns when finding multiple Listener definitions for the same port and host combination, in different k8s Gateways. In this case the traffic handling can be in conflict.

Warning

GWAPI

Workload Groups

KIA1701 - Service Account not found in this namespace

WorkloadGroup describes a collection of workload instances. It enables specifying the properties of a single workload for bootstrap and provides a template for WorkloadEntry, using the specified service account from the same namespace.

A validation Warning message on a template means that, while the specified serviceAccount may exist, it is not referenced by any Pod in the same namespace.

Resolution

Correct the template to refer to an existing Service Account from the same namespace, make sure that the value is in correct format without a typo, and make sure at least one Pod references the Service Account.

Severity

Warning

Example

apiVersion: networking.istio.io/v1
kind: WorkloadGroup
metadata:
  name: ratings-vm
  namespace: bookinfo
  labels:
    app: ratings-vm
spec:
  template:
    serviceAccount: default

KIA1702 - More than one Workload Group with duplicate labels found in the same namespace

The set of labels from Workload Group spec metadata will be associated with each workload instance during the bootstrap process.

A validation Warning message means that the labels set in this Workload Group spec metadata are also used in other Workload Group within this namespace.

Resolution

Set a unique set of labels.

Severity

Warning

4 - OSSMC

OpenShift Service Mesh Console

4.1 - OSSMC User Guide

User Guide providing a quick tour of OSSMC functionality

The OpenShift Service Mesh Console (OSSMC) is an extension to the OpenShift Console which provides visibility into your Service Mesh. With the OSSMC plugin installed, a new Service Mesh menu category is available in the navigation menu on the left side of the web console, as well as new Service Mesh tabs that enhance the existing Workloads and Services OpenShift console detail pages.

The features of the OSSMC plugin are the same as those of the standalone Kiali Console, but the pages are organized differently to better integrate with the OpenShift console. The OSSMC plugin does not replace the Kiali Console, and after installing the OSSMC plugin, you can still access the standalone Kiali Console. This User Guide, however, will discuss the extensions you see from within the OpenShift Console itself.

The OSSMC only supports a single tenant today. Whether that tenant is configured to access only a subset of OpenShift projects or has access cluster-wide to all projects does not matter, however, only a single tenant can be accessed.

Overview

The Overview page provides a summary of your mesh by showing cards representing the namespaces participating in the mesh. Each namespace card has summary metric graphs and additional health details. There are links in the cards that take you to other pages within OSSMC.

Overview

Traffic Graph

The Traffic Graph page provides the full topology view of your mesh. The mesh is represented by nodes and edges - each node representing a component of the mesh and each edge representing traffic flowing through the mesh between components.

Graph

Istio Config

The Istio Config page provides a list of all Istio configuration files in your mesh with a column that provides a quick way to know if the configuration for each resource is valid.

Istio Config

Mesh

The Mesh page provides detailed information about the Istio infrastructure status. It shows an infrastructure topology view with core and add-on components, their health, and how they are connected to each other.

Istio Config

Workload

The Workloads view has a tab Service Mesh that provides a lot of mesh-related detail for the selected workload. The details are grouped into several sub-tabs: Overview, Traffic, Logs, Inbound Metrics, Outbound Metrics, Traces, and Envoy.

Workload: Overview

The Workload: Overview sub-tab provides a summary of the selected workload including a localized topology graph showing the workload with all inbound and outbound edges and nodes.

Workload: Overview

Workload: Traffic

The Workload: Traffic sub-tab provides information about all inbound and outbound traffic to the workload.

Workload: Traffic

Workload: Logs

The Workload: Logs sub-tab provides the logs for the workload’s containers. You can view container logs individually or in a unified fashion, ordered by log time. This is especially helpful to see how the Envoy sidecar proxy logs relate to your workload’s application logs. You can enable the tracing span integration which then allows you to see which logs correspond to trace spans.

Workload: Logs

Workload: Metrics

You can see both inbound and outbound metric graphs in the corresponding sub-tabs. All the workload metrics can be displayed here, providing you with a detail view of the performance of your workload. You can enable the tracing span integration which allows you to see which spans occurred at the same time as the metrics. You can then click on a span marker in the graph to view the specific spans associated with that timeframe.

Workload: Metrics

Workload: Traces

The Traces sub-tab provides a chart showing the trace spans collected over the given timeframe. Click on a bubble to drill down into those trace spans; the trace spans can provide you the most low-level detail within your workload application, down to the individual request level.

Workload: Traces

The trace details view will give further details, including heatmaps that provide you with a comparison of one span in relation to other requests and spans in the same timeframe.

Workload: Traces Details

If you hover over a cell in a heatmap, a tooltip will give some details on the cell data:

Workload: Traces Heatmap

When the OpenShift tracing UI plugin is enabled, Kiali will try to auto discover the plugin settings and the View in Tracing Kiali link will redirect to the plugin (for Kiali 2.8.0+). If the plugin config needs to be adjusted, the following settings should be included in the plugin-conf ConfigMap:

{
  ...
  "observability": {
    "instance": "sample",
    "namespace": "tempo",
    "tenant": "default"
  }
}

Workload: Envoy

The Envoy sub-tab provides information about the Envoy sidecar configuration. This is useful when you need to dig down deep into the sidecar configuration when debugging things such as connectivity issues.

Workload: Envoy

Services

The Services view has a tab Service Mesh that provides mesh-related detail for the selected service. The details are grouped into several sub-tabs: Overview, Traffic, Inbound Metrics, Traces. These sub-tabs are similar in nature as the Workload sub-tabs with the same names and serve the same functions.

Services: Overview

Projects

The Projects view has a tab Service Mesh that provides traffic graph information about that project. It is the same information shown in the Traffic Graph page but specific to that project.

Projects: Overview

5 - Tutorials

Kiali Tutorials.

The following tutorials are designed to help users understand how to use Istio with Kiali, features, configuration, etc. They are highly recommended!

5.1 - Kiali and Grafana Tempo Query integration

Learn how to setup Kiali with Grafana Tempo Query.

This tutorial goes throw the process to setup up Grafana Tempo Query as the Kiali default distribution tracing system.

5.1.1 - Introduction

Kiali and Tempo integration introduction and prerequisites

Introduction

Kiali uses Jaeger as a default distributed tracing backend. In this tutorial, we will replace it for Grafana Tempo.

We will setup a local environment in minikube, and install Kiali with Tempo as a distributed backend. This is a simplified architecture diagram:

Kiali Tempo Architecture

We will install Tempo with the Tempo Operator and enable Jaeger query frontend to be compatible with Kiali in order to query traces.
We will setup Istio to send traces to the Tempo collector using the zipkin protocol. It is enabled by default from version 3.0 or higher of the Tempo Operator.
We will install MinIO and setup it up as object store, S3 compatible.

Environment

We use the following environment:

Istio 1.18.1
Kiali 1.72
Minikube 1.30
Tempo operator TempoStack v3.0

There are different installation methods for Grafana Tempo, but in this tutorial we will use the Tempo operator.

5.1.2 - Kiali and Tempo setup

Steps to install Kiali and Grafana Tempo

We will start minikube:

minikube start

It is a requirement to have cert-manager installed:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml

Install the operator. It is important to download a version 3.0 or higher. In previous versions, the zipkin collector was not exposed, there was no way to change it as it was not defined in the CRD.

kubectl apply -f https://github.com/grafana/tempo-operator/releases/download/v0.3.0/tempo-operator.yaml

We will create the tempo namespace:

kubectl create namespace tempo

We will deploy minio, this is a sample minio.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # This name uniquely identifies the PVC. Will be used in deployment below.
  name: minio-pv-claim
  labels:
    app: minio-storage-claim
spec:
  # Read more about access modes here: http://kubernetes.io/docs/user-guide/persistent-volumes/#access-modes
  accessModes:
    - ReadWriteOnce
  resources:
    # This is the request for storage. Should be available in the cluster.
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  selector:
    matchLabels:
      app: minio
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        # Label is used as selector in the service.
        app: minio
    spec:
      # Refer to the PVC created earlier
      volumes:
        - name: storage
          persistentVolumeClaim:
            # Name of the PVC created earlier
            claimName: minio-pv-claim
      initContainers:
        - name: create-buckets
          image: busybox:1.28
          command:
            - "sh"
            - "-c"
            - "mkdir -p /storage/tempo-data"
          volumeMounts:
            - name: storage # must match the volume name, above
              mountPath: "/storage"
      containers:
        - name: minio
          # Pulls the default Minio image from Docker Hub
          image: minio/minio:latest
          args:
            - server
            - /storage
            - --console-address
            - ":9001"
          env:
            # Minio access key and secret key
            - name: MINIO_ACCESS_KEY
              value: "minio"
            - name: MINIO_SECRET_KEY
              value: "minio123"
          ports:
            - containerPort: 9000
            - containerPort: 9001
          volumeMounts:
            - name: storage # must match the volume name, above
              mountPath: "/storage"
---
apiVersion: v1
kind: Service
metadata:
  name: minio
spec:
  type: ClusterIP
  ports:
    - port: 9000
      targetPort: 9000
      protocol: TCP
      name: api
    - port: 9001
      targetPort: 9001
      protocol: TCP
      name: console
  selector:
    app: minio

And apply the yaml:

kubectl apply -n tempo -f minio.yaml

We will create a secret to access minio:

kubectl create secret generic -n tempo tempostack-dev-minio \
--from-literal=bucket="tempo-data" \
--from-literal=endpoint="http://minio:9000" \
--from-literal=access_key_id="minio" \
--from-literal=access_key_secret="minio123"

Install Grafana tempo with the operator. We will use the secret created in the previous step:

kubectl apply -n tempo -f - <<EOF
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: smm
spec:
  storageSize: 1Gi
  storage:
    secret:
      type: s3
      name: tempostack-dev-minio
  resources:
    total:
      limits:
        memory: 2Gi
        cpu: 2000m
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true
        ingress:
          type: ingress
EOF

As an optional step, we can check if all the deployments have started correctly, and the services distributor has the port 9411 and the query frontend 16686:

kubectl get all -n tempo

Tempo Services

(Optional) We can test if minio is working with a batch job to send some traces, in this case, to the open telemetry collector:

kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: tracegen
spec:
  template:
    spec:
      containers:
        - name: tracegen
          image: ghcr.io/open-telemetry/opentelemetry-collector-contrib/tracegen:latest
          command:
            - "./tracegen"
          args:
            - -otlp-endpoint=tempo-smm-distributor.tempo.svc.cluster.local:4317
            - -otlp-insecure
            - -duration=30s
            - -workers=1
      restartPolicy: Never
  backoffLimit: 4
EOF

And access the minio console:

kubectl port-forward --namespace istio-system service/minio 9001:9001

MinIO console

Install Istio with helm (Option I)

Istio can be installed with Helm following the instructions. The zipkin address needs to be set:

--set values.meshConfig.defaultConfig.tracing.zipkin.address="tempo-smm-distributor.tempo:9411"

And then, install Jaeger as Istio addon.

Install Istio using Kiali source code (Option II)

For development purposes, if we have Kiali source code, we can use the kiali hack scripts:

hack/istio/install-istio-via-istioctl.sh -c kubectl -a "prometheus grafana" -s values.meshConfig.defaultConfig.tracing.zipkin.address="tempo-smm-distributor.tempo:9411"

Install Kiali and bookinfo demo with some traffic generation

Install kiali:

helm install \
    --namespace istio-system \
    --set external_services.tracing.internal_url=http://tempo-smm-query-frontend.tempo:16685 \
    --set external_services.tracing.external_url=http://localhost:16686 \
    --set auth.strategy=anonymous \
    kiali-server \
    kiali/kiali-server

Install bookinfo with traffic generator

curl -L -o install-bookinfo.sh https://raw.githubusercontent.com/kiali/kiali/master/hack/istio/install-bookinfo-demo.sh
chmod +x install-bookinfo.sh
./install-bookinfo.sh -c kubectl -tg -id ${ISTIO_DIR}

And access Kiali:

kubectl port-forward svc/kiali 20001:20001 -n istio-system

Kiali Tempo Traces

5.2 - Travels Demo - Multicluster

Learn how to configure and use Kiali in an Istio multicluster scenario.

This tutorial will demonstrate Kiali capabilities for Istio multicluster, particulary for the primary-remote cluster model.

For more information, check our documentation for multicluster.

5.2.1 - Introduction

Observe the Travels application deployed in multiple clusters with the new capabilities of Kiali.

So far, we know how good Kiali can be to understand applications, their relationships with each other and with external applications.

In the previous tutorial, Kiali was setup to observe just a single cluster. Now, we will expand its capabilities to observe more than one cluster. The extra clusters are remotes, meaning that there is not a control plane on them, they only have user applications.

This topology is called primary-remote and it is very useful to spread applications into different clusters having just one primary cluster, which is where Istio and Kiali are installed.

This scenario is a good choice when as an application administrator or architect, you want to give a different set of clusters to different sets of developers and you also want all these applications to belong to the same mesh. This scenario is also very helpful to give applications high availability capabilities while keeping the observability together (we are referring to just applications in terms of high availability, for Istio, we might want to install a multi-primary deployment model, which is on the roadmap for the multicluster journey for Kiali).

In this tutorial we will be deploying Istio in a primary-remote deployment. At first, we will install the “east” cluster with Istio, then we will add the “west” remote cluster and join it to the mesh. Then we will see how Kiali allows us to observe and manage both clusters and their applications. Metrics will be aggregated into the “east” cluster using Prometheus federation and a single Kiali will be deployed on the “east” cluster.

If you already have a primary-remote deployment, you can skip to instaliing Kiali.

5.2.2 - Prerequisites

How to prepare for running the tutorial.

This tutorial is a walkthrough guide to install everything. For this reason, we will need:

minikube
istioctl
helm

This tutorial was tested on:

Minikube v1.30.1
Istio v1.18.1
Kiali v1.70

Clusters are provided by minikube instances, but this tutorial should work on on any Kubernetes environment.

We will set up some environment variables for the following commands:

CLUSTER_EAST="east"
CLUSTER_WEST="west"
ISTIO_DIR="absolute-path-to-istio-folder"

As Istio will be installed on more than one cluster and needs to communicate between clusters, we need to create certificates for the Istio installation. We will follow the Istio documentation related to certificates to achieve this:

mkdir -p certs
pushd certs

make -f $ISTIO_DIR/tools/certs/Makefile.selfsigned.mk root-ca

make -f $ISTIO_DIR/tools/certs/Makefile.selfsigned.mk $CLUSTER_EAST-cacerts
make -f $ISTIO_DIR/tools/certs/Makefile.selfsigned.mk $CLUSTER_WEST-cacerts

popd

The result is two certificates for then use when installing Istio in the future.

5.2.3 - Deploy East cluster

Deploy the East cluster which will be the primary cluster

Run the following commands to deploy the first cluster:

minikube start -p $CLUSTER_EAST --network istio --memory 8g --cpus 4

For both clusters, we need to configure MetalLB, which is a load balancer. This is because we need to assign an external IP to the required ingress gateways to enable cross cluster communication between Istio and the applications installed.

minikube addons enable metallb -p $CLUSTER_EAST

We set up some environment variables with IP ranges that MetalLB will then assign to the services:

MINIKUBE_IP=$(minikube ip -p $CLUSTER_EAST)
MINIKUBE_IP_NETWORK=$(echo $MINIKUBE_IP | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)\.[0-9]+/\1/')
MINIKUBE_LB_RANGE="${MINIKUBE_IP_NETWORK}.20-${MINIKUBE_IP_NETWORK}.29"

cat <<EOF | kubectl --context $CLUSTER_EAST apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses: [${MINIKUBE_LB_RANGE}]
EOF

We should have the first cluster deployed and ready to use.

5.2.4 - Install Istio on East cluster

Install Istio on the primary cluster

The east cluster is the primary one, consequently is where the istiod process will be installed alongside other applications like Kiali.

Run the following commands to install Istio:

kubectl create namespace istio-system --context $CLUSTER_EAST

kubectl create secret generic cacerts -n istio-system --context $CLUSTER_EAST \
      --from-file=certs/$CLUSTER_EAST/ca-cert.pem \
      --from-file=certs/$CLUSTER_EAST/ca-key.pem \
      --from-file=certs/$CLUSTER_EAST/root-cert.pem \
      --from-file=certs/$CLUSTER_EAST/cert-chain.pem

kubectl --context=$CLUSTER_EAST label namespace istio-system topology.istio.io/network=network1

cat <<EOF > $CLUSTER_EAST.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: $CLUSTER_EAST
      network: network1
EOF

istioctl install -y --set values.pilot.env.EXTERNAL_ISTIOD=true --context=$CLUSTER_EAST -f $CLUSTER_EAST.yaml

After the installation, we need to create what we called an “east-west” gateway. It’s an ingress gateway just for the cross cluster configuration as we are opting to use the installation for different networks (this will be the case in the majority of the production scenarios).

$ISTIO_DIR/samples/multicluster/gen-eastwest-gateway.sh \
    --mesh mesh1 --cluster $CLUSTER_EAST --network network1 | \
    istioctl --context=$CLUSTER_EAST install -y -f -

Then, we need to expose the istiod service as well as the applications for the cross cluster communication:

kubectl apply --context=$CLUSTER_EAST -n istio-system -f \
    $ISTIO_DIR/samples/multicluster/expose-istiod.yaml

kubectl --context=$CLUSTER_EAST apply -n istio-system -f \
    $ISTIO_DIR/samples/multicluster/expose-services.yaml

export DISCOVERY_ADDRESS=$(kubectl \
    --context=$CLUSTER_EAST \
    -n istio-system get svc istio-eastwestgateway \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

Finally, we need to install Prometheus, which is important and required for Kiali to operate:

kubectl --context $CLUSTER_EAST -n istio-system apply -f $ISTIO_DIR/samples/addons/prometheus.yaml

5.2.5 - Install Kiali

Install Kiali on the primary cluster

Run the following command to install Kiali using the Kiali operator:

kubectl config use-context $CLUSTER_EAST

helm upgrade --install --namespace istio-system --set auth.strategy=anonymous --set deployment.logger.log_level=debug --set deployment.ingress.enabled=true --repo https://kiali.org/helm-charts kiali-server kiali-server

Verify that Kiali is running with the following command:

istioctl dashboard kiali

There are other alternatives to expose Kiali or other Addons in Istio. Check Remotely Accessing Telemetry Addons for more information.

5.2.6 - Install Travels on East cluster

Install the Travels application just on East cluster

Run the following commands to install Travels application on east cluster:

kubectl create namespace travel-agency --context $CLUSTER_EAST
kubectl create namespace travel-portal --context $CLUSTER_EAST
kubectl create namespace travel-control --context $CLUSTER_EAST

kubectl label namespace travel-agency istio-injection=enabled --context $CLUSTER_EAST
kubectl label namespace travel-portal istio-injection=enabled --context $CLUSTER_EAST
kubectl label namespace travel-control istio-injection=enabled --context $CLUSTER_EAST

kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/travel_agency.yaml) -n travel-agency --context $CLUSTER_EAST
kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/travel_portal.yaml) -n travel-portal --context $CLUSTER_EAST
kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/travel_control.yaml) -n travel-control --context $CLUSTER_EAST

After the installation, we can see that the Travels application is running on the east cluster:

Overview

It is important to note that Kiali only observes one istio-system namespace as we did not configure it for multicluster yet.

Go to the Graph page and select the three namespaces related to the Travels demo in the namespace dropdown menu. This shows you the in-cluster traffic:

Graph

So far, we installed everything on one cluster, similarly to the Travels tutorial for a single cluster.

Now we will expand this topology to include a remote cluster. As we commented this situation can be very common in a production scenario, either because we might want to split some applications into different clusters, generally because they are maintained by different developers or for high availability or just making applications available in other zones to reduce latencies.

5.2.7 - Deploy West cluster

Deploy the West cluster which will be the remote cluster

Run the following commands to deploy the second cluster:

minikube start -p $CLUSTER_WEST --network istio --memory 8g --cpus 4

Similar to the east cluster, we configure MetalLB:

minikube addons enable metallb -p $CLUSTER_WEST

MINIKUBE_IP=$(minikube ip -p $CLUSTER_WEST)
MINIKUBE_IP_NETWORK=$(echo $MINIKUBE_IP | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)\.[0-9]+/\1/')
MINIKUBE_LB_RANGE="${MINIKUBE_IP_NETWORK}.30-${MINIKUBE_IP_NETWORK}.39"

cat <<EOF | kubectl --context $CLUSTER_WEST apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses: [${MINIKUBE_LB_RANGE}]
EOF

5.2.8 - Install Istio on West cluster

Install Istio on the remote cluster

This installation will be different as this cluster will be a remote. In a remote cluster, it won’t be an Istio control plane. Istio will install some resources that allows the primary control plane to configure the workloads in the remote cluster like injecting the sidecars and configuring the low level routing.

kubectl create namespace istio-system --context $CLUSTER_WEST

kubectl create secret generic cacerts -n istio-system --context $CLUSTER_WEST \
      --from-file=certs/$CLUSTER_WEST/ca-cert.pem \
      --from-file=certs/$CLUSTER_WEST/ca-key.pem \
      --from-file=certs/$CLUSTER_WEST/root-cert.pem \
      --from-file=certs/$CLUSTER_WEST/cert-chain.pem

kubectl --context=$CLUSTER_WEST annotate namespace istio-system topology.istio.io/controlPlaneClusters=$CLUSTER_EAST
kubectl --context=$CLUSTER_WEST label namespace istio-system topology.istio.io/network=network2

cat <<EOF > $CLUSTER_WEST.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: remote
  values:
    istiodRemote:
      injectionPath: /inject/cluster/$CLUSTER_WEST/net/network2
    global:
      remotePilotAddress: ${DISCOVERY_ADDRESS}
EOF

istioctl install -y --context=$CLUSTER_WEST -f $CLUSTER_WEST.yaml

We will also install a Prometheus instance on the remote. We will federate both Prometheus, with the east’s one being the place where all metrics will be gathered together:

kubectl apply -f $ISTIO_DIR/samples/addons/prometheus.yaml --context $CLUSTER_WEST

An important step is to create a secret on east cluster allowing it to fetch information of the remote cluster:

istioctl x create-remote-secret \
    --context=$CLUSTER_WEST \
    --name=$CLUSTER_WEST | \
    kubectl apply -f - --context=$CLUSTER_EAST

Finally, we create the east-west gateway

$ISTIO_DIR/samples/multicluster/gen-eastwest-gateway.sh \
    --mesh mesh1 --cluster $CLUSTER_WEST --network network2 | \
    istioctl --context=$CLUSTER_WEST install -y -f -

Prometheus federation

Kiali requires unified metrics from a single Prometheus endpoint for all clusters, even in a multi-cluster environment. In this tutorial, we will federate the two Prometheus instances, meaning that all the remote’s metrics should be fetched by the main Prometheus.

We will configure east’s Prometheus to fetch west’s metrics:

kubectl patch svc prometheus -n istio-system --context $CLUSTER_WEST -p "{\"spec\": {\"type\": \"LoadBalancer\"}}"

WEST_PROMETHEUS_ADDRESS=$(kubectl --context=$CLUSTER_WEST -n istio-system get svc prometheus -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

curl -L -o prometheus.yaml https://raw.githubusercontent.com/kiali/kiali/master/hack/istio/multicluster/prometheus.yaml

sed -i "s/WEST_PROMETHEUS_ADDRESS/$WEST_PROMETHEUS_ADDRESS/g" prometheus.yaml

kubectl --context=$CLUSTER_EAST apply -f prometheus.yaml -n istio-system

5.2.9 - Configure Kiali for multicluster

In this section we will add some configuration for Kiali to start observing the remote cluster.

We will configure Kiali to access the remote cluster. This will require a secret (similar to the Istio secret) containing the credentials for Kiali to fetch information from the remote cluster:

curl -L -o kiali-prepare-remote-cluster.sh https://raw.githubusercontent.com/kiali/kiali/master/hack/istio/multicluster/kiali-prepare-remote-cluster.sh

chmod +x kiali-prepare-remote-cluster.sh

./kiali-prepare-remote-cluster.sh --kiali-cluster-context $CLUSTER_EAST --remote-cluster-context $CLUSTER_WEST

Finally, upgrade the installation for Kiali to pick up the secret:

kubectl config use-context $CLUSTER_EAST

helm upgrade --install --namespace istio-system --set auth.strategy=anonymous --set deployment.logger.log_level=debug --set deployment.ingress.enabled=true --repo https://kiali.org/helm-charts kiali-server kiali-server

As result, we can quickly see that a new namespace appear in the Overview, the istio-system namespace from west cluster:

Kiali MC

5.2.10 - Install Travels on West cluster

Install new services of the Travels application in the remote cluster.

We are going to deploy two new services just to distribute traffic on the new cluster. These services are travels v2 and v3:

kubectl create ns travel-agency --context $CLUSTER_WEST

kubectl label namespace travel-agency istio-injection=enabled --context $CLUSTER_WEST

kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/travels-v2.yaml) -n travel-agency --context $CLUSTER_WEST
kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/travels-v3.yaml) -n travel-agency --context $CLUSTER_WEST

cat <<EOF | kubectl -n travel-agency --context $CLUSTER_WEST apply -f -
apiVersion: v1
kind: Service
metadata:
  name: travels
  labels:
    app: travels
spec:
  ports:
    - name: http
      port: 8000
  selector:
    app: travels
---
apiVersion: v1
kind: Service
metadata:
  name: insurances
  labels:
    app: insurances
spec:
  ports:
    - name: http
      port: 8000
  selector:
    app: insurances
---
apiVersion: v1
kind: Service
metadata:
  name: hotels
  labels:
    app: hotels
spec:
  ports:
    - name: http
      port: 8000
  selector:
    app: hotels
---
apiVersion: v1
kind: Service
metadata:
  name: flights
  labels:
    app: flights
spec:
  ports:
    - name: http
      port: 8000
  selector:
    app: flights
---
apiVersion: v1
kind: Service
metadata:
  name: discounts
  labels:
    app: discounts
spec:
  ports:
    - name: http
      port: 8000
  selector:
    app: discounts
---
apiVersion: v1
kind: Service
metadata:
  name: cars
  labels:
    app: cars
spec:
  ports:
    - name: http
      port: 8000
  selector:
    app: cars
EOF

After the installation, we can see that traffic is flowing to the remote cluster too:

Travels MC

This is happening automatically, Istio balances the traffic to both services. The key thing to notice here is that there is a concept called namespace sameness in Istio that is very important when planning our multicluster setup.

In both clusters, we can see that we have the same namespaces. They are called the same in both. Also, we can see that the services in both clusters need to exist and be called the same.

When we created the west’s namespaces, they are called the same, and also notice that even if we do not have instances of insurances or cars, we created the services. This is because travel services from the cluster will try to communicate with these services, not caring at all if the applications are in the west or east cluster. Istio will handle the routing in the back.

From this moment, we can start playing with Kiali to introduce some scenarios previously seen in the Travels tutorial.

5.3 - Travels Demo Tutorial

Learn how to use Kiali to configure, observe and manage Istio.

This tutorial uses the Kiali Travels Demo to teach Kiali and Istio features.

5.3.1 - Prerequisites

How to prepare for running the tutorial.

Platform Setup

This tutorial assumes you will have access to a Kubernetes cluster with Istio installed.

This tutorial has been tested using:

a Minikube installation.
an OpenShift installation.

Tip

Platform dependent tasks will be indicated with a special note like this.

This tutorial has been tested using:

minikube v1.16.0, istio 1.8.1 and kiali v1.28.0
openshift v4.8.3, istio 1.11.0 and kiali v1.39.0

Install Istio

Once you have your Kubernetes cluster ready, follow the Istio Getting Started to install and setup a demo profile that will be used in this tutorial.

Determining ingress IP and ports and creating DNS entries will be necessary in the following steps.

DNS entries can be added in a basic way to the /etc/hosts file but you can use any other DNS service that allows to resolve a domain with the external Ingress IP.

Minikube

This tutorial uses Minikube tunnel feature for external Ingress IP.

OpenShift

This tutorial uses a route for external Ingress IP.

Update Kiali

Istio defines a specific Kiali version as an addon.

In this tutorial we are going to update Kiali to the latest release version.

Assuming you have installed the addons following the Istio Getting Started guide, you can uninstall Kiali with the command:

kubectl delete -f ${ISTIO_HOME}/samples/addons/kiali.yaml --ignore-not-found

There are multiple ways to install a recent version of Kiali, this tutorial follows the Quick Start using Helm Chart.

helm install \
  --namespace istio-system \
  --set auth.strategy="anonymous" \
  --repo https://kiali.org/helm-charts \
  kiali-server \
  kiali-server

Access the Kiali UI

The Istio istioctl client has an easy method to expose and access Kiali:

${ISTIO_HOME}/bin/istioctl dashboard kiali

There are other alternatives to expose Kiali or other Addons in Istio. Check Remotely Accessing Telemetry Addons for more information.

After the Prerequisites you should be able to access Kiali. Verify its version by clicking the “?” icon and selecting “About”:

Verify Kiali Access

5.3.2 - Install Travel Demo

Installing and understanding the tutorial demo.

Deploy the Travel Demo

This demo application will deploy several services grouped into three namespaces.

Note that at this step we are going to deploy the application without any reference to Istio.

We will join services to the ServiceMesh in a following step.

To create and deploy the namespaces perform the following commands:

OpenShift

OpenShift users can substitute oc for kubectl. OpenShift users will need to add the necessary NetworkAttachmentDefinition to each namespace. Also, the necessary SecurityContextConstraints for the service accounts defined in the namespace (minimally, default).

kubectl create namespace travel-agency
kubectl create namespace travel-portal
kubectl create namespace travel-control

kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/travel_agency.yaml) -n travel-agency
kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/travel_portal.yaml) -n travel-portal
kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/travel_control.yaml) -n travel-control

Check that all deployments rolled out as expected:

$ kubectl get deployments -n travel-control
NAME      READY   UP-TO-DATE   AVAILABLE   AGE
control   1/1     1            1           85s

$ kubectl get deployments -n travel-portal
NAME      READY   UP-TO-DATE   AVAILABLE   AGE
travels   1/1     1            1           91s
viaggi    1/1     1            1           91s
voyages   1/1     1            1           91s

$ kubectl get deployments -n travel-agency
NAME            READY   UP-TO-DATE   AVAILABLE   AGE
cars-v1         1/1     1            1           96s
discounts-v1    1/1     1            1           96s
flights-v1      1/1     1            1           96s
hotels-v1       1/1     1            1           96s
insurances-v1   1/1     1            1           96s
mysqldb-v1      1/1     1            1           96s
travels-v1      1/1     1            1           96s

Understanding the demo application

Travel Portal namespace

The Travel Demo application simulates two business domains organized in different namespaces.

In a first namespace called travel-portal there will be deployed several travel shops, where users can search for and book flights, hotels, cars or insurance.

The shop applications can behave differently based on request characteristics like channel (web or mobile) or user (new or existing).

These workloads may generate different types of traffic to imitate different real scenarios.

All the portals consume a service called travels deployed in the travel-agency namespace.

Travel Agency namespace

A second namespace called travel-agency will host a set of services created to provide quotes for travel.

A main travels service will be the business entry point for the travel agency. It receives a destination city and a user as parameters and it calculates all elements that compose a travel budget: airfare, lodging, car reservation and travel insurance.

Each service can provide an independent quote and the travels service must then aggregate them into a single response.

Additionally, some users, like registered users, can have access to special discounts, managed as well by an external service.

Service relations between namespaces can be described in the following diagram:

Travel Demo Design

Travel Portal and Travel Agency flow

A typical flow consists of the following steps:

. A portal queries the travels service for available destinations. . Travels service queries the available hotels and returns to the portal shop. . A user selects a destination and a type of travel, which may include a flight and/or a car, hotel and insurance. . Cars, Hotels and Flights may have available discounts depending on user type.

Travel Control namespace

The travel-control namespace runs a business dashboard with two key features:

Allow setting changes for every travel shop simulator (traffic ratio, device, user and type of travel).
Provide a business view of the total requests generated from the travel-portal namespace to the travel-agency services, organized by business criteria as grouped per shop, per type of traffic and per city.

Travel Dashboard

5.3.3 - First Steps

Understanding proxy injection and Gateways.

Missing Sidecars

The Travel Demo has been deployed in the previous step but without installing any Istio sidecar proxy.

In that case, the application won’t connect to the control plane and won’t take advantage of Istio’s features.

In Kiali, we will see the new namespaces in the overview page:

Overview

But we won’t see any traffic in the graph page for any of these new namespaces:

Empty Graph

If we examine the Applications, Workloads or Services page, it will confirm that there are missing sidecars:

Missing Sidecar

Enable Sidecars

In this tutorial, we will add namespaces and workloads into the ServiceMesh individually step by step.

This will help you to understand how Istio sidecar proxies work and how applications can use Istio’s features.

We are going to start with the control workload deployed into the travel-control namespace:

Step 1

Enable Auto Injection on the travel-control namespace

Enable Auto Injection per Namespace

Step 2

Enable Auto Injection for control workload

Enable Auto Injection per Workkload

Understanding what happened:

(i) Sidecar Injection

(ii) Automatic Sidecar Injection

Open Travel Demo to Outside Traffic

The control workload now has an Istio sidecar proxy injected but this application is not accessible from the outside.

In this step we are going to expose the control service using an Istio Ingress Gateway which will map a path to a route at the edge of the mesh.

Step 1

Create a DNS entry for the control service associated with the External IP of the Istio Ingress

There are multiple ways to create a DNS entry depending of the platform, servers or services that you are using. This step depends on the platform you have chosen, please review Determining the Ingress IP and Ports for more details.

Minikube

Kubernetes Service EXTERNAL-IP for “LoadBalancer” TYPE is provided in minikube plaform using the minikube tunnel tool.

For minikube we will check the External IP of the Ingress gateway:

$ kubectl get services/istio-ingressgateway -n istio-system
NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                                                                      AGE
istio-ingressgateway   LoadBalancer   10.101.6.144   10.101.6.144   15021:30757/TCP,80:32647/TCP,443:30900/TCP,31400:30427/TCP,15443:31072/TCP   19h

And we will add a simple entry to the /etc/hosts of the tutorial machine with the desired DNS entry:

...
10.101.6.144 control.travel-control.istio-cluster.org
...

Then, from this machine, the url control.travel-control.istio-cluster.org will resolve to the External IP of the Ingress Gateway of Istio.

OpenShift

OpenShift does not provide the Kubernetes Service EXTERNAL-IP for “LoadBalancer” TYPE. Instead, you can expose the istio-ingressgateway service.

For OpenShift we will expose the Ingress gateway as a service:

$ oc expose service istio-ingressgateway -n istio-system
$ oc get routes -n istio-system
NAME                   HOST/PORT                                  PATH   SERVICES               PORT    TERMINATION          WILDCARD
istio-ingressgateway   <YOUR_ROUTE_HOST>                                 istio-ingressgateway   http2                        None

Then, from this machine, the host <YOUR_ROUTE_HOST> will resolve to the External IP of the Ingress Gateway of Istio. For OpenShift we will not define a DNS entry, instead, where you see control.travel-control.istio-cluster.org in the steps below, subsitute the value of <YOUR_ROUTE_HOST>.

Step 2

Use the Request Routing Wizard on the control service to generate a traffic rule

Request Routing Wizard

Use “Add Route Rule” button to add a default rule where any request will be routed to the control workload.

Routing Rule

Use the Advanced Options and add a gateway with host control.travel-control.istio-cluster.org and create the Istio config.

Create Gateway

Verify the Istio configuration generated.

Istio Config

Step 3

Test the control service by pointing your browser to \http://control.travel-control.istio-cluster.org

Test Gateway

Step 4

Review travel-control namespace in Kiali

Travel Control Graph

Understanding what happened:

External traffic enters into the cluster through a Gateway
Traffic is routed to the control service through a VirtualService
Kiali Graph visualizes the traffic telemetry reported from the control sidecar proxy
- Only the travel-control namespace is part of the mesh

(i) Istio Gateway

(ii) Istio Virtual Service

5.3.4 - Observe

Observability with Kiali: graphs, metrics, logs, tracing…

Enable Sidecars in all workloads

An Istio sidecar proxy adds a workload into the mesh.

Proxies connect with the control plane and provide Service Mesh functionality.

Automatically providing metrics, logs and traces is a major feature of the sidecar.

In the previous steps we have added a sidecar only in the travel-control namespace’s control workload.

We have added new powerful features but the application is still missing visibility from other workloads.

Step 1

Switch to the Workload graph and select multiple namespaces to identify missing sidecars in the Travel Demo application

Missing Sidecars

That control workload provides good visibility of its traffic, but telemetry is partially enabled, as travel-portal and travel-agency workloads don’t have sidecar proxies.

Step 2

Enable proxy injection in travel-portal and travel-agency namespaces

In the First Steps of this tutorial we didn’t inject the sidecar proxies on purpose to show a scenario where only some workloads may have sidecars.

Typically, Istio users annotate namespaces before the deployment to allow Istio to automatically add the sidecar when the application is rolled out into the cluster. Perform the following commands:

kubectl label namespace travel-agency istio-injection=enabled
kubectl label namespace travel-portal istio-injection=enabled

kubectl rollout restart deploy -n travel-portal
kubectl rollout restart deploy -n travel-agency

Verify that travel-control, travel-portal and travel-agency workloads have sidecars deployed:

Updated Workloads

Step 3

Verify updated telemetry for travel-portal and travel-agency namespaces

Updated Telemetry

Graph walkthrough

The graph provides a powerful set of Graph Features to visualize the traffic topology of the service mesh.

In this step, we will show how to use the Graph to show relevant information in the context of the Travel Demo application.

Our goal will be to identify the most critical service of the demo application.

Step 1

Select all travel- namespaces in the graph and enable Traffic Distribution edge labels in the Display Options:

Graph Request Distribution

Review the status of the mesh, everything seems healthy, but also note that hotels service has more load compared to other services inlcuded in the travel-agency namespace.

Step 2

Select the hotels service, use the graph side-panel to select a trace from the Traces tab:

Hotels Normal Trace

Combining telemetry and tracing information will show that there are traces started from a portal that involve multiple services but also other traces that only consume the hotels service.

Hotels Single Trace

Step 3

Select the main travels application and double-click to zoom in

Travels Zoom

The graph can focus on an element to study a particular context in detail. Note that a contextual menu is available using right-click, to easily shortcut the navigation to other sections.

Application details

Kiali provides Detail Views to navigate into applications, workloads and services.

These views provide information about the structure, health, metrics, logs, traces and Istio configuration for any application component.

In this tutorial we are going to learn how to use them to examine the main travels application of our example.

Step 1

Navigate to the travels application

Travels Application

An application is an abstract group of workloads and services labeled with the same “application” name.

From Service Mesh perspective this concept is significant as telemetry and tracing signals are mainly grouped by “application” even if multiple workloads are involved.

At this point of the tutorial, the travels application is quite simple, just a travels-v1 workload exposed through the travels service. Navigate to the travels-v1 workload detail by clicking the link in the travels application overview.

Travels-v1 Workload

Step 2

Examine Outbound Metrics of travels-v1 workload

Travels-v1 Metrics

The Metrics tab provides a powerful visualization of telemetry collected by the Istio proxy sidecar. It presents a dashboard of charts, each of which can be expanded for closer inspection. Expand the Request volume chart:

Travels-v1 Request Volume Chart

Metrics Settings provides multiple predefined criteria out-of-the-box. Additionally, enable the spans checkbox to correlate metrics and tracing spans in a single chart.

We can see in the context of the Travels application, the hotels service request volume differs from that of the other travel-agency services.

By examining the Request Duration chart also shows that there is no suspicious delay, so probably this asymmetric volume is part of the application business’ logic.

Step 3

Review Logs of travels-v1 workload

The Logs tab provides a unified view of application container logs with the Istio sidecar proxy logs. It also offers a spans checkbox, providing a correlated view of both logs and tracing, helping identify spans of interest.

From the application container log we can spot that there are two main business methods: GetDestinations and GetTravelQuote.

In the Istio sidecar proxy log we see that GetDestinations invokes a GET /hotels request without parameters.

Travels-v1 Logs GetDestinations

However, GetTravelQuote invokes multiple requests to other services using a specific city as a parameter.

Travels-v1 Logs GetTravelQuote

Then, as discussed in the Travel Demo design, an initial query returns all available hotels before letting the user choose one and then get specific quotes for other destination services.

That scenario is shown in the increase of the hotels service utilization.

Step 4

Review Traces of workload-v1

Now we have identified that the hotels service has more use than other travel-agency services.

The next step is to get more context to answer if some particular service is acting slower than expected.

The Traces tab allows comparison between traces and metrics histograms, letting the user determine if a particular spike is expected in the context of average values.

Travels-v1 Traces

In the same context, individual spans can be compared in more detail, helping to identify a problematic step in the broader scenario.

Travels-v1 Spans

5.3.5 - Connect

Using Kiali to configure Istio’s traffic management.

Request Routing

The Travel Demo application has several portals deployed on the travel-portal namespace consuming the travels service deployed on the travel-agency namespace.

The travels service is backed by a single workload called travels-v1 that receives requests from all portal workloads.

At a moment of the lifecycle the business needs of the portals may differ and new versions of the travels service may be necessary.

This step will show how to route requests dynamically to multiple versions of the travels service.

Step 1

Deploy travels-v2 and travels-v3 workloads

To deploy the new versions of the travels service execute the following commands:

kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/travels-v2.yaml) -n travel-agency
kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/travels-v3.yaml) -n travel-agency

Travels-v2 and travels-v3

As there is no specific routing defined, when there are multiple workloads for travels service the requests are uniformly distributed.

Travels graph before routing

Step 2

Investigate the http headers used by the Travel Demo application

The Traffic Management features of Istio allow you to define Matching Conditions for dynamic request routing.

In our scenario we would like to perform the following routing logic:

All traffic from travels.uk routed to travels-v1
All traffic from viaggi.it routed to travels-v2
All traffic from voyages.fr routed to travels-v3

Portal workloads use HTTP/1.1 protocols to call the travels service, so one strategy could be to use the HTTP headers to define the matching condition.

But, where to find the HTTP headers ? That information typically belongs to the application domain and we should examine the code, documentation or dynamically trace a request to understand which headers are being used in this context.

There are multiple possibilities. The Travel Demo application uses an Istio Annotation feature to add an annotation into the Deployment descriptor, which adds additional Istio configuration into the proxy.

Istio Config annotations

In our example the HTTP Headers are added as part of the trace context.

Then tracing will populate custom tags with the portal, device, user and travel used.

Step 3

Use the Request Routing Wizard on travels service to generate a traffic rule

Travels Service Request Routing

We will define three “Request Matching” rules as part of this request routing. Define all three rules before clicking the Create button.

In the first rule, we will add a request match for when the portal header has the value of travels.uk.

Define the exact match, like below, and click the “Add Match” button to update the “Matching selected” for this rule.

Add Request Matching

Move to “Route To” tab and update the destination for this “Request Matching” rule. Then use the “Add Route Rule” to create the first rule.

Route To

Add similar rules to route traffic from viaggi.it to travels-v2 workload and from voyages.fr to travels-v3 workload.

When the three rules are defined you can use “Create” button to generate all Istio configurations needed for this scenario. Note that the rule ordering does not matter in this scenario.

Rules Defined

The Istio config for a given service is found on the “Istio Config” card, on the Service Details page.

Service Istio Config

Step 4

Verify that the Request Routing is working from the travels-portal Graph

Once the Request Routing is working we can verify that outbound traffic from every portal goes to the single travels workload. To see this clearly use a “Workload Graph” for the “travel-portal” namespace, enable “Traffic Distribution” edge labels and disable the “Service Nodes” Display option:

Travel Portal Namespace Graph

Note that no distribution label on an edge implies 100% of traffic.

Examining the “Inbound Traffic” for any of the travels workloads will show a similar pattern in the telemetry.

Travels v1 Inbound Traffic

Using a custom time range to select a large interval, we can see how the workload initially received traffic from all portals but then only a single portal after the Request Routing scenarios were defined.

Step 5

Update or delete Istio Configuration

Kiali Wizards allow you to define high level Service Mesh scenarios and will generate the Istio Configuration needed for its implementation (VirtualServices, DestinationRules, Gateways and PeerRequests). These scenarios can be updated or deleted from the “Actions” menu of a given service.

To experiment further you can navigate to the travels service and update your configuration by selecting “Request Routing”, as shown below. When you have finished experimenting with Routing Request scenarios then use the “Actions” menu to delete the generated Istio config.

Update or Delete

Fault Injection

The Observe step has spotted that the hotels service has additional traffic compared with other services deployed in the travel-agency namespace.

Also, this service becomes critical in the main business logic. It is responsible for querying all available destinations, presenting them to the user, and getting a quote for the selected destination.

This also means that the hotels service may be one of the weakest points of the Travel Demo application.

This step will show how to test the resilience of the Travel Demo application by injecting faults into the hotels service and then observing how the application reacts to this scenario.

Step 1

Use the Fault Injection Wizard on hotels service to inject a delay

Fault Injection Action

Select an HTTP Delay and specify the “Delay percentage” and “Fixed Delay” values. The default values will introduce a 5 seconds delay into 100% of received requests.

HTTP Delay

Step 2

Understanding source and destination metrics

Telemetry is collected from proxies and it is labeled with information about the source and destination workloads.

In our example, let’s say that travels service (“Service A” in the Istio diagram below) invokes the hotels service (“Service B” in the diagram). Travels is the “source” workload and hotels is the “destination” workload. The travels proxy will report telemetry from the source perspective and hotels proxy will report telemetry from the destination perspective. Let’s look at the latency reporting from both perspectives.

Istio Architecture

The travels workload proxy has the Fault Injection configuration so it will perform the call to the hotels service and will apply the delay on the travels workload side (this is reported as source telemetry).

We can see in the hotels telemetry reported by the source (the travels proxy) that there is a visible gap showing 5 second delay in the request duration.

Source Metrics

But as the Fault Injection delay is applied on the source proxy (travels), the destination proxy (hotels) is unaffected and its destination telemetry show no delay.

Destination Metrics

Step 3

Study the impact of the travels service delay

The injected delay is propagated from the travels service to the downstream services deployed on travel-portal namespace, degrading the overall response time. But the downstream services are unaware, operate normally, and show a green status.

Degraded Response Time

Step 4

Update or delete Istio Configuration

As part of this step you can update the Fault Injection scenario to test different delays. When finished, you can delete the generated Istio config for the hotels service.

Traffic Shifting

In the previous Request Routing step we have deployed two new versions of the travels service using the travels-v2 and travels-v3 workloads.

That scenario showed how Istio can route specific requests to specific workloads. It was configured such that each portal deployed in the travel-portal namespace (travels.uk, viaggi.it and voyages.fr) were routed to a specific travels workload (travels-v1, travels-v2 and travels-v3).

This Traffic Shifting step will simulate a new scenario: the new travels-v2 and travels-v3 workloads will represent new improvements for the travels service that will be used by all requests.

These new improvements implemented in travels-v2 and travels-v3 represent two alternative ways to address a specific problem. Our goal is to test them before deciding which one to use as a next version.

At the beginning we will send 80% of the traffic into the original travels-v1 workload, and will split 10% of the traffic each on travels-v2 and travels-v3.

Step 1

Use the Traffic Shifting Wizard on travels service

Traffic Shifting Action

Create a scenario with 80% of the traffic distributed to travels-v1 workload and 10% of the traffic distributed each to travels-v2 and travels-v3.

Split Traffic

Step 2

Examine Traffic Shifting distribution from the travels-agency Graph

Travels Graph

Step 3

Compare travels workload and assess new changes proposed in travels-v2 and travels-v3

Istio Telemetry is grouped per logical application. That has the advantage of easily comparing different but related workloads, for one or more services.

In our example, we can use the “Inbound Metrics” and “Outbound Metrics” tabs in the travels application details, group by “Local version” and compare how travels-v2 and travels-v3 are working.

Compare Travels Workloads

The charts show that the Traffic distribution is working accordingly and 80% is being distributed to travels-v1 workload and they also show no big differences between travels-v2 and travels-v3 in terms of request duration.

Step 4

Update or delete Istio Configuration

As part of this step you can update the Traffic Shifting scenario to test different distributions. When finished, you can delete the generated Istio config for the travels service.

TCP Traffic Shifting

The Travel Demo application has a database service used by several services deployed in the travel-agency namespace.

At some point in the lifecycle of the application the telemetry shows that the database service degrades and starts to increase the average response time.

This is a common situation. In this case, a database specialist suggests an update of the original indexes due to the data growth.

Our database specialist is suggesting two approaches and proposes to prepare two versions of the database service to test which may work better.

This step will show how the “Traffic Shifting” strategy can be applied to TCP services to test which new database indexing strategy works better.

Step 1

Deploy mysqldb-v2 and mysqldb-v3 workloads

To deploy the new versions of the mysqldb service execute the commands:

kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/mysql-v2.yaml) -n travel-agency
kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/mysql-v3.yaml) -n travel-agency

Step 2

Use the TCP Traffic Shifting Wizard on mysqldb service

TCP Traffic Shifting Action

Create a scenario with 80% of the traffic distributed to mysqldb-v1 workload and 10% of the traffic distributed each to mysqldb-v2 and mysqldb-v3.

TCP Split Traffic

Step 3

Examine Traffic Shifting distribution from the travels-agency Graph

MysqlDB Graph

Note that TCP telemetry has different types of metrics, as “Traffic Distribution” is only available for HTTP/gRPC services, for this service we need to use “Traffic Rate” to evaluate the distribution of data (bytes-per-second) between mysqldb workloads.

Step 4

Compare mysqldb workload and study new indexes proposed in mysqldb-v2 and mysqldb-v3

TCP services have different telemetry but it’s still grouped by versions, allowing the user to compare and study pattern differences for mysqldb-v2 and mysqldb-v3.

Compare MysqlDB Workloads

The charts show more peaks in mysqldb-v2 compared to mysqldb-v3 but overall a similar behavior, so it’s probably safe to choose either strategy to shift all traffic.

Step 5

Update or delete Istio Configuration

As part of this step you can update the TCP Traffic Shifting scenario to test a different distribution. When finished, you can delete the generated Istio config for the mysqldb service.

Request Timeouts

In the Fault Injection step we showed how we could introduce a delay in the critical hotels service and test the resilience of the application.

The delay was propagated across services and Kiali showed how services accepted the delay without creating errors on the system.

But in real scenarios delays may have important consequences. Services may prefer to fail sooner, and recover, rather than propagating a delay across services.

This step will show how to add a request timeout for one of the portals deployed in travel-portal namespace. The travel.uk and viaggi.it portals will accept delays but voyages.fr will timeout and fail.

Step 1

Use the Fault Injection Wizard on hotels service to inject a delay

Repeat the Fault Injection step to add delay on hotels service.

Step 2

Use the Request Routing Wizard on travels service to add a route rule with delay for voyages.fr

Add a rule to add a request timeout only on requests coming from voyages.fr portal:

Use the Request Matching tab to add a matching condition for the portal header with voyages.fr value.
Use the Request Timeouts tab to add an HTTP Timeout for this rule.
Add the rule to the scenario.

Request Timeout Rule

A first rule should be added to the list like:

Voyages Portal Rule

Add a second rule to match any request and create the scenario. With this configuration, requests coming from voyages.fr will match the first rule and all others will match the second rule.

Any Request Rule

Step 3

Review the impact of the request timeout in the travels service

Create the rule. The Graph will show how requests coming from voyages.fr start to fail, due to the request timeout introduced.

Requests coming from other portals work without failures but are degraded by the hotels delay.

Travels Graph

This scenario can be visualized in detail if we examine the “Inbound Metrics” and we group by “Remote app” and “Response code”.

Travels Inbound Metrics

As expected, the requests coming from voyages.fr don’t propagate the delay and they fail in the 2 seconds range, meanwhile requests from other portals don’t fail but they propagate the delay introduced in the hotels service.

Step 4

Update or delete Istio Configuration

As part of this step you can update the scenarios defined around hotels and travels services to experiment with more conditions, or you can delete the generated Istio config in both services.

Circuit Breaking

Distributed systems will benefit from failing quickly and applying back pressure, as opposed to propagating delays and errors through the system.

Circuit breaking is an important technique used to limit the impact of failures, latency spikes, and other types of network problems.

This step will show how to apply a Circuit Breaker into the travels service in order to limit the number of concurrent requests and connections.

Step 1

Deploy a new loadtester portal in the travel-portal namespace

In this example we are going to deploy a new workload that will simulate an important increase in the load of the system.

OpenShift

OpenShift users may need to also add the associated loadtester serviceaccount to the necessary securitycontextcontraints.

kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/travel_loadtester.yaml) -n travel-portal

The loadtester workload will try to create 50 concurrent connections to the travels service, adding considerable pressure to the travels-agency namespace.

Loadtester Graph

The Travel Demo application is capable of handling this load and in a first look it doesn’t show unhealthy status.

Loadtester Details

But in a real scenario an unexpected increase in the load of a service like this may have a significant impact in the overall system status.

Step 2

Use the Traffic Shifting Wizard on travels service to generate a traffic rule

Use the “Traffic Shifting” Wizard to distribute traffic (evenly) to the travels workloads and use the “Advanced Options” to add a “Circuit Breaker” to the scenario.

Traffic Shifting with Circuit Breaker

The “Connection Pool” settings will indicate that the proxy sidecar will reject requests when the number of concurrent connections and requests exceeds more than one.

The “Outlier Detection” will eject a host from the connection pool if there is more than one consecutive error.

Step 3

Study the behavior of the Circuit Breaker in the travels service

In the loadtester versioned-app Graph we can see that the travels service’s Circuit Breaker accepts some, but fails most, connections.

Remember, that these connections are stopped by the proxy on the loadtester side. That “fail sooner” pattern prevents overloading the network.

Using the Graph we can select the failed edge, check the Flags tab, and see that those requests are closed by the Circuit breaker.

Loadtester Flags Graph

If we examine the “Request volume” metric from the “Outbound Metrics” tab we can see the evolution of the requests, and how the introduction of the Circuit Breaker made the proxy reduce the request volume.

Loadtester Outbound Metrics

Step 4

Update or delete Istio Configuration

As part of this step you can update the scenarios defined around the travels service to experiment with more Circuit Breaker settings, or you can delete the generated Istio config in the service.

Understanding what happened:

(i) Circuit Breaking

(ii) Outlier Detection

(iii) Connection Pool Settings

(iv) Envoy’s Circuit breaking Architecture

Mirroring

This tutorial has shown several scenarios where Istio can route traffic to different versions in order to compare versions and evaluate which one works best.

The Traffic Shifting step was focused on travels service adding a new travels-v2 and travels-v3 workloads and the TCP Traffic Shifting showed how this scenario can be used on TCP services like mysqldb service.

Mirroring (or shadowing) is a particular case of the Traffic Shifting scenario where the proxy sends a copy of live traffic to a mirrored service.

The mirrored traffic happens out of band of the primary request path. It allows for testing of alternate services, in production environments, with minimal risk.

Istio mirrored traffic is only supported for HTTP/gRPC protocols.

This step will show how to apply mirrored traffic into the travels service.

Step 1

Use the Traffic Shifting Wizard on travels service

We will simulate the following:

travels-v1 is the original traffic and it will keep 80% of the traffic
travels-v2 is the new version to deploy, it’s being evaluated and it will get 20% of the traffic to compare against travels-v1
But travels-v3 will be considered as a new, experimental version for testing outside of the regular request path. It will be defined as a mirrored workload on 50% of the original requests.

Mirrored Traffic

Step 2

Examine Traffic Shifting distribution from the travels-agency Graph

Note that Istio does not report mirrored traffic telemetry from the source proxy. It is reported from the destination proxy, although it is not flagged as mirrored, and therefore an edge from travels to the travels-v3 workload will appear in the graph. Note the traffic rates reflect the expected ratio of 80/20 between travels-v1 and travels-v2, with travels-v3 at about half of that total.

Mirrored Graph

This can be examined better using the “Source” and “Destination” metrics from the “Inbound Metrics” tab.

The “Source” proxy, in this case the proxies injected into the workloads of travel-portal namespace, won’t report telemetry for travels-v3 mirrored workload.

Mirrored Source Metrics

But the “Destination” proxy, in this case the proxy injected in the travels-v3 workload, will collect the telemetry from the mirrored traffic.

Mirrored Destination Metrics

Step 3

Update or delete Istio Configuration

As part of this step you can update the Mirroring scenario to test different mirrored distributions.

When finished you can delete the generated Istio config for the travels service.

5.3.6 - Secure

Using Kiali to configure and observe mesh security.

Authorization Policies and Sidecars

Security is one of the main pillars of Istio features.

The Istio Security High Level Architecture provides a comprehensive solution to design and implement multiple security scenarios.

In this tutorial we will show how Kiali can use telemetry information to create security policies for the workloads deployed in a given namespace.

Istio telemetry aggregates the ServiceAccount information used in the workloads communication. This information can be used to define authorization policies that deny and allow actions on future live traffic communication status.

Additionally, Istio sidecars can be created to limit the hosts with which a given workload can communicate. This improves traffic control, and also reduces the memory footprint of the proxies.

This step will show how we can define authorization policies for the travel-agency namespace, in the Travel Demo application, for all existing traffic in a given time period.

Once authorization policies are defined, a new workload will be rejected if it doesn’t match the security rules defined.

Step 1

Undeploy the loadtester workload from travel-portal namespace

In this example we will use the loadtester workload as the “intruder” in our security rules.

If we have followed the previous tutorial steps, we need to undeploy it from the system.

kubectl delete -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/travel_loadtester.yaml) -n travel-portal

We should validate that telemetry has updated the travel-portal namespace and “Security” can be enabled in the Graph Display options.

Travel Portal Graph

Step 2

Create Authorization Policies, and Istio Sidecars, for current traffic for travel-agency namespace

Every workload in the cluster uses a Service Account.

travels.uk, viaggi.it and voyages.fr workloads use the default cluster.local/ns/travel-portal/sa/default ServiceAccount defined automatically per namespace.

This information is propagated into the Istio Telemetry and Kiali can use it to define a set of AuthorizationPolicy rules, and Istio Sidecars.

The Sidecars restrict the list of hosts with which each workload can communicate, based on the current traffic.

The “Create Traffic Policies” action, located in the Overview page, will create these definitions.

Create Traffic Policies

This will generate a main DENY ALL rule to protect the whole namespace, and an individual ALLOW rule per workload identified in the telemetry.

Travel Agency Authorization Policies

It will create also an individual Sidecar per workload, each of them containing the set of hosts.

Travel Agency Sidecars

As an example, we can see that for the travels-v1 workload, the following list of hosts are added to the sidecar.

Travels V1 Sidecar

Step 3

Deploy the loadtester portal in the travel-portal namespace

If the loadtester workload uses a different ServiceAccount then, when it’s deployed, it won’t comply with the AuthorizationPolicy rules defined in the previous step.

OpenShift

OpenShift users may need to also add the associated loadtester serviceaccount to the necessary securitycontextcontraints.

kubectl apply -f <(curl -L https://raw.githubusercontent.com/kiali/demos/master/travels/travel_loadtester.yaml) -n travel-portal

Now, travels workload will reject requests made by loadtester workload and that situation will be reflected in Graph:

Loadtester Denied

This can also be verified in the details page using the Outbound Metrics tab grouped by response code (only the 403 line is present).

Loadtester Denied Metrics

Inspecting the Logs tab confirms that loadtester workload is getting a HTTP 403 Forbidden response from travels workloads, as expected.

Loadtester Logs

Step 4

Update travels-v1 AuthorizationPolicy to allow loadtester ServiceAccount

AuthorizationPolicy resources are defined per workload using matching selectors.

As part of the example, we can show how a ServiceAccount can be added into an existing rule to allow traffic from loadtester workload into the travels-v1 workload only.

AuthorizationPolicy Edit

As expected, now we can see that travels-v1 workload accepts requests from all travel-portal namespace workloads, but travels-v2 and travels-v3 continue rejecting requests from loadtester source.

Travels v1 AuthorizationPolicy

Using “Outbound Metrics” tab from the loadtester workload we can group per “Remote version” and “Response code” to get a detailed view of this AuthorizationPolicy change.

Travels v1 AuthorizationPolicy

Step 5

Verify the proxies clusters list is limited by the Sidecars

According to Istio Sidecar documentation, Istio configures all mesh sidecar proxies to reach every mesh workload. After the sidecars are created, the list of hosts is reduced according to the current traffic. To verify this, we can look for the clusters configured in each proxy.

As an example, looking into the cars-v1 workload, we can see that there is a reduced number of clusters with which the proxy can communicate.

Cars v1 clusters

Step 6

Update or delete Istio Configuration

As part of this step, you can update the AuthorizationPolicies and Istio Sidecars generated for the travel-agency namespace, and experiment with more security rules. Or, you can delete the generated Istio config for the namespace.

5.3.7 - Uninstall Travel Demo

Wrap up the tutorial.

To uninstall the Travel Demo application perform the following commands:

kubectl delete namespace travel-agency
kubectl delete namespace travel-portal
kubectl delete namespace travel-control

6 - Architecture and Terms

High-level description of the Kiali architecture and a glossary of common terms.

6.1 - Architecture

Overview of the Kiali architecture.

Kiali architecture

Kiali is composed of two components: a back-end application running in the container application platform, and a user-facing front-end application. Kiali depends on external services and components provided by the container application platform and Istio.

The following diagram illustrates the components involved in Kiali and its interactions:

Kiali architecture

Kiali back-end

The back-end is the application that runs in the container application platform. It’s written in Go. The code can be found at kiali/kiali GitHub repository.

This is the component that communicates with Istio parts, retrieves and processes data, and exposes this data to the front-end.

The back-end doesn’t need storage. The back-end configuration is managed via the Kiali CR when Kiali is installed via the Kiali operator, or via a configmap when installed via Helm.

Kiali front-end

The front-end is a single page web application, built using Patternfly, React, Typescript and Redux. The code can be found at kiali/kiali frontend folder.

In a standard deployment, the back-end serves the front-end. Then, the front-end queries the Kiali back-end in order to get data and present it to the user.

There are limited options for personalization, the front-end is mainly stateless. Some data may be persisted, such as session credentials, but this data is stored in the browser and won’t be available in other browsers nor other devices.

Istio Service Mesh

Kiali is a console for Istio, and as such, Istio is a requirement. It provides and controls the service mesh. Kiali and Istio are installed separately.

Kiali needs to retrieve Istio data and configurations, which are exposed through Prometheus, the Kubernetes API, and istiod. For environments where istiod is inaccessible, Kiali’s communication with istiod can be disabled.

Prometheus

Prometheus is an Istio dependency. When Istio telemetry is enabled, metrics data is stored in Prometheus. Kiali uses the data stored in Prometheus to figure out the mesh topology, show metrics, calculate health, show possible problems, etc.

Kiali communicates directly with Prometheus and assumes the metrics used by Istio Telemetery. It’s a hard dependency for Kiali, and many Kiali features will not work without it.

Currently, Kiali relies on Istio’s default metrics set. Make sure that these default metrics are always in place. Some metric customization is possible as long as the Kiali requirements are still met. For the current list of required metrics see this FAQ entry.

Kubernetes API

Kiali uses the API of the container application platform in order to fetch and resolve service mesh configurations.

Container application platforms where Kiali is known to work are OKD and Kubernetes. Kiali should also work on the derivatives of these platforms.

Kiali queries the Kubernetes API to retrieve, for example, definitions for namespaces, services, deployments, pods, and other entities. Kiali also makes queries to resolve relationships between the different cluster entities.

The Kubernetes API is also queried to retrieve Istio configurations like virtual services, destination rules, route rules, gateways, and quotas.

Jaeger

Jaeger is optional. When available, Kiali will be able to direct the user to Jaeger’s tracing data. If you need this feature, make sure Kiali is properly configured for Jaeger integration.

Tracing data will be available only if Istio’s distributed tracing is enabled.

As an alternative, Grafana Tempo can be used.

Grafana

Grafana is optional. When available, the metrics pages of Kiali will show a link to direct the user to the same metric in Grafana. If you need this feature, make sure Kiali is properly configured for Grafana integration.

Kiali has basic metric capabilities. It can show the default Istio metrics for workloads, apps and services. It allows to apply some groupings to the provided metrics and fetch metrics for different time ranges. However, Kiali doesn’t allow to customize the views nor customize the Prometheus queries. If you need these capabilities, you’ll want to install Grafana. Follow the Istio documentation to install Grafana if you need it.

6.2 - Terminology

Glossary of terms and concepts used by Kiali.

6.2.1 - Concepts

Shared vocabulary for Kubernetes, Istio and Kiali.

Application

Is a logical grouping of Workloads defined by the application labels that users apply to an object. In Istio it is defined by the Label App. See Istio Label Requirements.

Application Name

It’s the name of the Application deployed in your environment. This name is provided by the Label App on the Workload.

Envoy

A proxy that Istio starts for each pod in the service mesh. For more information see the Istio Envoy Documentation.

Envoy Health

A health check performed by Envoy proxies, for inbound and outbound traffic: see membership_healthy and membership_total from Envoy documentation.

Istio object/configuration Type

This is the type specified in the Istio Config. This could be any of the following types: Gateway, Virtual Service, DestinationRule, ServiceEntry, Rule, Quota or QuotaSpecBinding.

Istio Sidecar

For more information see the Istio Sidecar definition in Istio Sidecar Documentation.

Label

It’s a user-created tag to identify a set of objects.

An empty label selector (that is, one with zero requirements) selects every object in the collection.

A null label selector (which is only possible for optional selector fields) selects no objects.

For example, Istio uses the Label App & Label Version on a Workload to specify the version and the application.

Label App

This is the ‘app’ label on an object. For more information, see Istio Label Requirements.

Label Version

This is the ‘version’ label on an object. For more information, see Istio Label Requirements.

Namespace

Namespaces are intended for use in environments with many users spread across multiple teams, or projects.

Namespaces are a way to divide cluster resources between multiple users.

Quota

A limited or fixed number or amount of resources.

ReplicaSet

Ensures that a specified number of pod replicas are running at any one time.

Service

A Service is an abstraction which defines a logical set of Pods and a policy by which to access them. A Service is determined by a Label.

Service Entry

For more information see the Service Entry definition in Istio Service Entry Documentation.

Virtual Service

For more information see the Virtual Service definition in Istio VirtualService Documentation.

Workload

For more information see the Istio Workload definition.

6.2.2 - Networking

Vocabulary around networking and request traffic.

Destination

For more information see the Destination definition in the Istio Glossary.

Destination Rule

For more information see the Istio Destination Rule Documentation.

Endpoint

A communication endpoint is a type of communication network node. It is an interface exposed by a communicating party or by a communication channel.

Error Rate

It’s the percentage of errors in the traffic to a specific object for a Rate Interval.

Gateway

For more information see the Gateway definition in the Istio Gateway Documentation.

Inbound Metrics

Metrics on requests received by a given Workload, Service or Application.

Outbound Metrics

Metrics on requests emitted by a given Workload or Application.

Port

For more information see the Istio Port Documentation.

Rate Interval

It’s an amount of time. By Default in Kiali last 10 minutes.

Rule

It’s an object that manages external access to the services in a cluster, typically HTTP.

Source

For more information see the Source definition in the Istio Glossary.

Subset

For more information see the Istio Subset Documentation.

7 - FAQ

Frequently Asked Questions about Kiali.

Need More Than Community Support? Enterprise support is provided by Red Hat.

7.1 - Ambient

Questions about Ambient Mesh features.

Why can’t I see the traffic graph when not using a Waypoint?

There can be multiple reasons, but here are some troubleshooting steps:

Is the application correctly enrolled in Ambient? Make sure you see the Ambient label in the control plane card in Kiali, or make sure the namespace is labeled with istio.io/dataplane-mode=ambient.

Ambient dataplane

This means that the traffic graph will have L4 metrics if there is traffic. You want to make sure the traffic selectors select ZTunnel as well as the type of traffic (e.g. Tcp) that is flowing:

Ambient traffic

At least, Tcp and Ztunnel traffic should be selected.

Is there any traffic?

The traffic is created based on the period of time selected. If there is no traffic, the graph won’t be shown. Try to select a longer period of time or enable the Display option to see the idle nodes.

Idle nodes

Are the right metrics generated in Prometheus?

Kiali requires some metrics and attributes to generate the graph. Refer to this FAQ to help you ensure you have the required metrics in your Prometheus server.

For this particular scenario, the most important ones would be the istio_tcp_received_bytes_total and istio_tcp_sent_bytes_total where app=ztunnel. Make sure those metrics exist in Prometheus.

Other graph issues are listed here.

Why can’t I see the traffic graph when the application has a Waypoint proxy?

There can be multiple reasons, but here are some troubleshooting steps:

Is the application correctly enrolled in Ambient? Make sure you see the Ambient label in the control plane card in Kiali, or make sure the namespace is labeled with istio.io/dataplane-mode=ambient.

Ambient dataplane

Also, it must be correctly enrolled in a Waypoint proxy. Check the application details and verify that it has the L7 label and a Waypoint proxy link:

App Enrolled

Is there any traffic?

Idle nodes

Are the right metrics generated in Prometheus?

Kiali requires some metrics and attributes to generate the graph. Refer to this FAQ to help you ensure you have the required metrics in your Prometheus server.

For this particular scenario, the most important ones would be the istio_requests_total where reporter=waypoint. Make sure those metrics exist in Prometheus.

Other graph issues are listed here.

Why can’t I see traces?

In Ambient, Ztunnel doesn’t report traces, as the component is limited to L4 metrics. This means that the application should be enrolled in Waypoint to have traces.

If the application is enrolled in a Waypoint proxy, the traces will be created from the Waypoint itself. First, check if the Waypoint proxy is generating traces in the Tracing provider:

Waypoint traces

If there are no traces, verify:

If Istio is configured correctly to send traces to the tracing backend
If the Waypoint proxy is handling traffic
If the Waypoint proxy is configured correctly to send traces

If there are traces, verify that the Waypoint proxy has traces in Kiali.

Waypoint traces Kiali

If there are no traces in Kiali, there might be a problem with its distributed tracing configuration. Please refer to the distributed tracing FAQ for additional help.

If there are traces, they will be filtered by the service name to be shown in the application details (for Kiali 2.5.0+).

Kiali app traces

In that case, there are some validations to perform:

Kiali version is >= 2.5.0
The service name is the operation name that appears in the traces. Check the Kiali logs for further information.

Why do I see double edges in the Graph?

When the application is part of the Ambient Mesh and also has a Waypoint proxy, it can happen that there are telemetry from L4 (Ztunnel) and L7 (Waypoint).

Duplicated Edges

You can filter by just the Waypoint Traffic to remove the same telemetry reported from different components.

Waypoint traffic

Other tracing issues can be checked here.
Ambient documentation is here.

7.2 - Authentication

Questions about authentication strategy or configuration.

How to obtain a token when logging in via token auth strategy

When configuring Kiali to use the token auth strategy, it requires users to log into Kiali as a specific user via the user’s service account token. Thus, in order to log into Kiali you must provide a valid Kubernetes token.

Note that the following examples assume you installed Kiali in the istio-system namespace.

For Kubernetes prior to v1.24

You can extract a service account’s token from the secret that was created for you when you created the service account.

For example, if you want to log into Kiali using Kiali’s own service account, you can get the token like this:

kubectl get secret -n istio-system $(kubectl get sa kiali-service-account -n istio-system -o "jsonpath={.secrets[0].name}") -o jsonpath={.data.token} | base64 -d

For Kubernetes v1.24+

You can request a short lived token for a service account by issuing the following command:

kubectl -n istio-system create token kiali-service-account

Using the token

Once you obtain the token, you can go to the Kiali login page and copy-and-paste that token into the token field. At this point, you have logged into Kiali with the same permissions as that of the Kiali server itself (note: this gives the user the permission to see everything).

Create different service accounts with different permissions for your users to use. Each user should only have access to their own service accounts and tokens.

How to configure the originating port when Kiali is served behind a proxy (OpenID support)

When using OpenID strategy for authentication and deploying Kiali behind a reverse proxy or a load balancer, Kiali needs to know the originating port of client requests. You may need to setup your proxy to inject a X-Forwarded-Port HTTP header when forwarding the request to Kiali.

For example, when using an Istio Gateway and VirtualService to expose Kiali, you could use the headers property of the route:

spec:
  gateways:
  - istio-ingressgateway.istio-system.svc.cluster.local
  hosts:
  - kiali.abccorp.net
  http:
  - headers:
      request:
        set:
          X-Forwarded-Port: "443"
    route:
    - destination:
        host: kiali
        port:
          number: 20001

7.3 - Distributed Tracing

Questions about the Jaeger integration.

How to know which is the URL for Jaeger or Tempo?

From Kiali 2.11, a new tracing tool in the Mesh page is provided to help troubleshooting and provide possible valid tracing configurations.

Why is Jaeger unreachable or Kiali showing the error “Could not fetch traces”?

Istio components status indicator shows “Jaeger unreachable”:

Jaeger unreachable

While on any Tracing page, error “Could not fetch traces” is displayed:

Could not fetch traces

Apparently, Kiali is unable to connect to Jaeger. Make sure tracing is correctly configured in the Kiali CR.

      tracing:
        auth:
          type: none
        enabled: true
        internal_url: 'http://tracing.istio-system/jaeger'
        external_url: 'http://jaeger.example.com/'
        use_grpc: true

You need especially to pay attention to the internal_url field, which is how Kiali backend contacts the Jaeger service. In general, this URL is written using Kubernetes domain names in the form of http://service.namespace, plus a path.

If you’re not sure about this URL, try to find your Jaeger service and its exposed ports:

$ kubectl get services -n istio-system
...
tracing      ClusterIP      10.108.216.102   <none>        80/TCP      47m
...

To validate this URL, you can try to curl its API via Kiali pod, by appending /api/traces to the configured URL (in the following, replace with the appropriate Kiali pod):

$ kubectl exec -n istio-system -it kiali-556fdb8ff5-p6l2n -- curl http://tracing.istio-system/jaeger/api/traces

{"data":null,"total":0,"limit":0,"offset":0,"errors":[{"code":400,"msg":"parameter 'service' is required"}]}

If you see some returning JSON as in the above example, that should be the URL that you must configure.

If instead of that you see some blocks of mixed HTML/Javascript mentioning JaegerUI, then probably the host+port are correct but the path isn’t.

A common mistake is to forget the /jaeger suffix, which is often used in Jaeger deployments.

It may also happen that you have a service named jaeger-query, exposing port 16686, instead of the more common tracing service on port 80. In that situation, set internal_url to http://jaeger-query.istio-system:16686/jaeger.

If Jaeger needs an authentication, make sure to correctly configure the auth section.

Note that in general, Kiali will connect to Jaeger via GRPC, which provides better performances. If for some reason it cannot be done (e.g. Jaeger being behind a reverse-proxy that doesn’t support GRPC, or that needs more configuration in that purpose), it is possible to switch back to using the http/json API of Jaeger by setting use_grpc to false.

If for some reason the GRPC connection fails and you think it shouldn’t (e.g. your reverse-proxy supports it, and the non-grpc config works fine), please get in touch with us.

Why can’t I see any external link to Jaeger?

In addition to the embedded integration that Kiali provides with Jaeger, it is possible to show external links to the Jaeger UI. To do so, the external URL must be configured in the Kiali CR.

    tracing:
      # ...
      external_url: "http://jaeger.example.com/"

When configured, this URL will be used to generate a couple of links to Jaeger within Kiali. It’s also visible in the Mesh page:

Mesh page

Mesh page Jaeger

Why do I see an external link instead of Kiali’s own Tracing page?

Jaeger integration disabled

On the Application detail page, the Traces tab might redirect to Jaeger via an external link instead of showing the Kiali Tracing view. It happens when you have the external_url field configured, but not internal_url, which means the Kiali backend will not be able to connect to Jaeger.

To fix it, configure internal_url in the Kiali CR.

Why do I see “Missing root span” for the root span of some span details on Traces tab?

Missing root span

In Traces tab, while clicking on a trace, it shows the details of that trace and information about spans. These details also include the root span information. But for the traces for traffic that is not comming from ingress-gateway, the root span information is not available in Jaeger, thus Kiali is displaying “Missing root span” for those traces’ details and tooltips in Traces tab and in Graph pages.

Why do I see “error reading server preface: http2: frame too large” error when Kiali is not able to fetch Traces?

Sometimes this error can occur when there is a problem in the configuration and there is an http URL configured but Kiali is configured to use grpc. For example:

use_grpc: true 
internal_url: "http://jaeger_url:16686/jaeger"

That should be solved when use_grpc: false or using the grpc port internal_url: "http://jaeger_url:16685/jaeger"

Why do I see “[gRPC Tempo] GetAppTraces, Tracing gRPC client error: rpc error” error when Kiali is not able to fetch Traces in Tempo?

This error can occur when use_grpc is true, but the port is not open/accessible.

Why do I see “invalid character ‘p’ after top-level value” error when Kiali is not able to fetch Traces in Tempo?

The Tempo URL is set in internal_url, but the configuration in Kiali CR for external_services.tracing.provider is not tempo.

Error 503

Why do I see “Error fetching traces. AxiosError: Request failed with status code 503” error when Kiali is not able to fetch Traces from Tempo?

This error can occur for several reasons, but it usually means that the internal URL is not the right Tracing API.

Note that Grafana Tempo can also expose a Jaeger API, but the right url needs to be set in the Kiali CR pointing to the Jaeger endpoint.

If that is not the issue, here there are some troubleshooting steps:

Expand the messages icon to find more information about the error.
In the Mesh page, check that the tracing provider is reachable.
In the Mesh page, check the configuration for the tracing provider. Verify the URLs are correct.
Verify the provider (jaeger/tempo) matches the internal/external URL that is configured.
Review the Kiali logs and check for specific tracing errors. Might be helpful to set the log level to debug.
When the log level is set to debug, Kiali will log the complete trace query. It might be useful to test it from a cURL to verify if that is reachable from the Kiali pod and it has results.

Sometimes Tempo is configured outside the Kiali namespace, so there might be additional issues like reachability, certificates setup, etc.

Why can’t I see the link “View in Tracing” when using Tempo?

Some settings need to be configured in order to enable the external_url.

When tempo is set in the Kiali CR external_services.tracing.provider, the default url_format is grafana, and external_services.tracing.external_url needs to be set accordingly.

tracing:
  provider: "tempo"
  external_url: "http://external-grafana-url"
  tempo_config:
    url_format: "grafana"

When url_format is set to jaeger, the external_services.tracing.external_url needs to be set as well:

tracing:
  provider: "tempo"
  external_url: "https://tempo-tempo-query-frontend-tempo.apps-crc.testing/"
  tempo_config:
    url_format: "jaeger"

When url_format is set to openshift, there are additional parameters to set:

tracing:
  provider: "tempo"
  external_url: "https://console-openshift-console.apps-crc.testing/"
  tempo_config:
    name: "sample"
    namespace: "tempo"
    tenant: "default"
    url_format: "openshift"

Where:

name: is the name of the Tempo instance
namespace where the Tempo instance is installed
tenant: The tenant name where the traces are sent

View in Tracing

For OSSMC, when the tracing plugin is enabled, it will redirect automatically to the Tracing UI plugin.

Why can’t I see traces and there are no errors?

First thing to verify will be if Istio is correctly configured to send traces and verify in the Tracing backend if traces do exist.

If the tracing is configured correctly, verify in the tracing backend if there are traces for the services in the Mesh that you are expecting to have traces.

By default, Kiali will search for the service name using service.namespace, but if the traces are create within the namespace selector, the following CR setting should be changed:

tracing:
  namespace_selector: false

For further Tempo configuration options, take a look at the Tempo configuration page

How do I modify the trace limit?

The trace limit can be changed from the UI, and it is available as a Display menu option:

Trace limit

The default value (Set to 100) can be modified in the Kiali CR setting kiali_feature_flags.ui_defaults.tracing.limit.

Why do I see a Jaeger gRPC client error: i/o timeout?

If you are using Tempo Operator 0.20, there is a bug where the gRPC port is closed and Kiali shows an error like:

Could not fetch traces.
GetAppTraces, Jaeger GRPC client error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 172.30.14.34:16685: i/o timeout"

Other common causes are that the gRPC port is not exposed by the tracing Service, is blocked by a NetworkPolicy/firewall, or that internal_url is pointing to the wrong host/port.

7.4 - General

Questions about Kiali architecture, access, perf, etc…

How do I determine what version I am running?

There are several components within the Istio/Kiali infrastructure that have version information.

To see the Kiali version for the instance running your UI: Go to the Help dropdown menu found at the top-right of the Kiali Console window and select “About”. This will pop up the About dialog box which displays detailed information for the current Kiali instance. From here you can also link to the Mesh page.

Help dropdown menu:

The Kiali About box:

About Box

To see version information for the infrastruction components in your mesh: Go to the main menu and select the “Mesh” page option. This will bring you to a graphical representation of your mesh. The default side-panel will present a summary of the infrastructure components and, if possible to determine, their versions. This will include things like Istio, Prometheus, etc. You can also select graph nodes to see any further details that may be available for that component.

Mesh Page

You can also get much of this same version information in JSON format. From the command line, run something like curl to obtain the version information from the /api endpoint. For example, expose Kiali via port-forwarding so curl can access it:

kubectl port-forward -n istio-system deploy/kiali 20001:20001

And then request the version information via curl:

curl http://localhost:20001/kiali/api

The version information will be provided in a JSON format such as this:

{
  "status": {
    "Kiali commit hash": "c17d0550cfb033900c392ff5813368c1185954f1",
    "Kiali container version": "v1.74.0",
    "Kiali state": "running",
    "Kiali version": "v1.74.0",
    "Mesh name": "Istio",
    "Mesh version": "1.19.0"
  },
  "externalServices": [
    {
      "name": "Istio",
      "version": "1.19.0"
    },
    {
      "name": "Prometheus",
      "version": "2.41.0"
    },
    {
      "name": "Kubernetes",
      "version": "v1.27.3"
    },
    {
      "name": "Grafana"
    },
    {
      "name": "Jaeger"
    }
  ],
  "warningMessages": [],
  "istioEnvironment": {
    "isMaistra": false,
    "istioAPIEnabled": true
  }
}

Obtain the container image being used by the Kiali Server pod:

kubectl get pods --all-namespaces -l app.kubernetes.io/name=kiali -o jsonpath='{.items..spec.containers[*].image}{"\n"}'

This will result in something like: quay.io/kiali/kiali:v1.74.0

Obtain the container image being used by the Kiali Operator pod:

kubectl get pods --all-namespaces -l app.kubernetes.io/name=kiali-operator -o jsonpath='{.items..spec.containers[*].image}{"\n"}'

This will result in something like: quay.io/kiali/kiali-operator:v1.74.0

Obtain the container image being used by the istiod pod:

kubectl get pods --all-namespaces -l app=istiod -o jsonpath='{.items..spec.containers[*].image}{"\n"}'

This will result in something like: gcr.io/istio-release/pilot:1.19.0

If Kiali and/or Istio are installed via helm charts, obtain the helm chart version information:

helm list --all-namespaces

As an example, if you installed Kiali Operator via helm, this will result in something like:

NAME             NAMESPACE        REVISION   UPDATED                                   STATUS     CHART                   APP VERSION
kiali-operator   kiali-operator   1          2023-09-26 09:52:21.266593138 -0400 EDT   deployed   kiali-operator-1.74.0   v1.74.0

Why is the Workload or Application Detail page so slow or not responding?

We have identified a performance issue that happens while visiting the Workload or Application detail page, related to discovering metrics in order to show custom dashboards. Both Kiali and Prometheus may run out of memory.

The immediate workaround is to disable dashboards discovery:

spec:
  external_services:
    custom_dashboards:
      discovery_enabled: "false"

It’s also recommended to consider a more robust setup for Prometheus, like the one described in this Istio guide (see also this Kiali blog post), in order to decrease the metrics cardinality.

What do I need to run Kiali in a private cluster?

Private clusters have higher network restrictions. Kiali needs your cluster to allow TCP traffic between the Kubernetes API service and the Istio Control Plane namespace, for both the 8080 and 15000 ports. This is required for features such as Health and Envoy Dump to work as expected.

Make sure that the firewalls in your cluster allow the connections mentioned above.

Check section Google Kubernetes Engine (GKE) Private Cluster requirements in the Installation Guide.

Open an issue if you have a private cluster with a different provider than GKE. We’ll try to accommodate your scenario and document it for future users.

Does Kiali support Internet Explorer?

No version of Internet Explorer is supported with Kiali. Users may experience some issues when using Kiali through this browser.

To have the best Kiali experience you need a supported browser.

Kiali does not work - What do i do?

If you are hitting a problem, whether it is listed here or not, do not hesitate to open a GitHub Discussion to ask about your situation. If you are hitting a bug, or need a feature, you can vote (using emojis) for any existing bug or feature request found in the GitHub Issues. This will help us prioritize the most needed fixes or enhancements. You can also create a new issue.

See also the Community page which lists more contact channels.

How do I obtain the logs for Kiali?

Kiali operator logs can be obtained from within the Kiali operator pod. For example, if the operator is installed in the kiali-operator namespace:

KIALI_OPERATOR_NAMESPACE="kiali-operator"
kubectl logs -n ${KIALI_OPERATOR_NAMESPACE} $(kubectl get pod -l app=kiali-operator -n ${KIALI_OPERATOR_NAMESPACE} -o name)

Kiali server logs can be obtained from within the Kiali server pod. For example, if the Kiali server is installed in the istio-system namespace:

KIALI_SERVER_NAMESPACE="istio-system"
kubectl logs -n ${KIALI_SERVER_NAMESPACE} $(kubectl get pod -l app=kiali -n ${KIALI_SERVER_NAMESPACE} -o name)

Note that you can configure the logger in the Kiali Server.

Which Istio metrics and attributes are required by Kiali?

To reduce Prometheus storage some users want to customize the metrics generated by Istio. This can break Kiali if the pruned metrics and/or attributes are used by Kiali in its graph or metric features.

Kiali currently requires the following metrics and attributes:

Metric	Notes
container_cpu_usage_seconds_total	used to graph cpu usage in the control plane overview card
container_memory_working_set_bytes	used to graph memory usage in the control plane overview card
process_cpu_seconds_total	used to graph cpu usage in the control plane overview card (if the container metric is not available); used in the Istiod application metrics dashboard
process_resident_memory_bytes	used to graph memory usage in the control plane overview card (if the container metric is not available)

Istio metrics and attributes:

Metric	Notes
istio_build	used to display ztunnel version information
istio_requests_total	used throughout Kiali and the primary metric for http/grpc request traffic
istio_request_bytes_bucket	used in metric displays to calculate throughput percentiles
istio_request_bytes_count	used in metric displays to calculate throughput avg
istio_request_bytes_sum	used throughout Kiali for throughput calculation
istio_request_duration_milliseconds_bucket	used throughout Kiali for response-time calculation
istio_request_duration_milliseconds_count	used throughout Kiali for response-time calculation
istio_request_duration_milliseconds_sum	used throughout Kiali for response-time calculation
istio_request_messages_total	used throughout Kiali for grpc sent message traffic
istio_response_bytes_bucket	used in metric displays to calculate throughput percentiles
istio_response_bytes_count	used in metric displays to calculate throughput avg
istio_response_bytes_sum	used throughout Kiali for throughput calculation
istio_response_messages_total	used throughout Kiali for grpc received message traffic
istio_tcp_connections_closed_total	used in metric displays
istio_tcp_connections_opened_total	used in metric displays
istio_tcp_received_bytes_total	used throughout Kiali for tcp received traffic
istio_tcp_sent_bytes_total	used throughout Kiali for tcp sent traffic
pilot_info	used as discovery metric for the Istiod dashboard
pilot_proxy_convergence_time_sum	used in control plane overview card to show the average proxy push time
pilot_proxy_convergence_time_count	used in control plane overview card to show the average proxy push time; used in the Istiod application metrics dashboard
pilot_services	used in the Istiod application metrics dashboard
pilot_xds	used in the Istiod application metrics dashboard
pilot_xds_pushes	used in the Istiod application metrics dashboard
workload_manager_active_proxy_count	used for ztunnel workload manager active proxy count

Attribute	Metric	Notes
app	istio_tcp_received_bytes_total	used for filtering ztunnel traffic in TCP queries; also included in TCP traffic groupBy clauses
	istio_tcp_sent_bytes_total	used for filtering ztunnel traffic in TCP queries; also included in TCP traffic groupBy clauses
connection_security_policy	istio_requests_total	used only when graph Security display option is enabled
	istio_tcp_received_bytes_total	used only when graph Security display option is enabled
	istio_tcp_sent_bytes_total	used only when graph Security display option is enabled
destination_canonical_revision	all
destination_canonical_service	all
destination_cluster	all
destination_principal	istio_requests_total	used only when graph Security display option is enabled
	istio_request_messages_total
	istio_response_messages_total
	istio_tcp_received_bytes_total
	istio_tcp_sent_bytes_total
destination_service	all
destination_service_name	all
destination_service_namespace	all
destination_workload	all
destination_workload_namespace	all
grpc_response_status	istio_requests_total	used only when request_protocol=“grpc”
	istio_request_bytes_sum
	istio_request_duration_milliseconds_bucket
	istio_request_duration_milliseconds_sum
	istio_response_bytes_sum
reporter	all	both “source” and “destination” metrics are used by Kiali
request_operation	istio_requests_total	used only when request classification is configured. “request_operation” is the default attribute, it is configurable.
	istio_request_bytes_sum
	istio_response_bytes_sum
request_protocol	istio_requests_total
	istio_request_bytes_sum
	istio_response_bytes_sum
response_code	istio_requests_total
	istio_request_bytes_sum
	istio_request_duration_milliseconds_bucket
	istio_request_duration_milliseconds_sum
	istio_response_bytes_sum
response_flags	istio_requests_total
	istio_request_bytes_sum
	istio_request_duration_milliseconds_bucket
	istio_request_duration_milliseconds_sum
	istio_response_bytes_sum
source_canonical_revision	all
source_canonical_service	all
source_cluster	all
source_principal	istio_requests_total
	istio_request_messages_total
	istio_response_messages_total
	istio_tcp_received_bytes_total
	istio_tcp_sent_bytes_total
source_workload	all
source_workload_namespace	all

Envoy metrics:

Metric	Notes
envoy_cluster_upstream_cx_active	used in workload details
envoy_cluster_upstream_rq_total	used in workload details
envoy_listener_downstream_cx_active	used in workload details
envoy_listener_http_downstream_rq	used in workload details
envoy_server_memory_allocated	used in workload details
envoy_server_memory_heap_size	used in workload details
envoy_server_uptime	used in workload details

What is the License?

See here for the Kiali license.

Kiali can be told to restrict the namespaces users can see via the Kiali CR spec.deployment.discovery_selectors field. If there are no discovery selectors defined, Kiali will allow all namespaces unless deployment.cluster_wide_access is false, in which case only Kiali’s own namespace and the Istio control plane namespace will be accessible. If a namespace does not match one of the discovery selectors defined in the Kiali CR spec.deployment.discovery_selectors field at the time Kiali is installed by the operator it will not be visible in the Namespace Selection dropdown; if a new namespace is created after Kiali is installed and that namespace matches one of the discovery selectors, it will only be visible in the Namespace Selection dropdown after the operator creates the necessary Roles for the Kiali Server and restarts the Kiali Server pod (see Operator Namespace Watching). See the Namespace Management documentation for more information.

Note that Istio has its own set of optional discovery selectors that can be configured in the Istio MeshConfig discoverySelectors field, but these Istio discovery selectors are ignored by Kiali.

Kiali also caches namespaces by default for 10 seconds. Therefore, it might take up to the number of seconds specified by spec.kubernetes_config.cache_token_namespace_duration in order for a newly added namespace to be seen by Kiali.

Workload “is not found as” messages

Kiali queries Deployment ,ReplicaSet, ReplicationController, DeploymentConfig, StatefulSet, Job and CronJob controllers. Deployment, ReplicaSet and StatefulSet are always queried, but ReplicationController, DeploymentConfig, Job and CronJobs are excluded by default for performance reasons.

To include them, update the list of excluded_workloads from the Kiali config.

#    ---
#    excluded_workloads:
#    - "CronJob"
#    - "DeploymentConfig"
#    - "Job"
#    - "ReplicationController"
#

An empty list will tell Kiali to query all type of known controllers.

Why Health is not available for services using TCP protocol?

This refers to Service resources. Not Workloads, nor Applications.

Health for Services is calculated based on success rate of traffic. The traffic of HTTP and GRPC protocols is request based and it is possible to inspect each request to check and extract response codes to determine how many requests succeeded and how many erred.

However, HTTP is a widely known protocol. Applications may use other less known protocols to communicate. For these cases, Istio logs the traffic as raw TCP (an opaque sequence of bytes) and is not analyzed. Thus, for Kiali it is not possible to know if any traffic have failed or succeeded and reports Health as unavailable.

Why are the control plane metrics missing from the control plane card?

The control plane metrics are fetched from the Prometheus configured in Kiali.

Kiali will fetch the memory and the CPU metrics related to the Istiod container (discovery) first and will fallback to the metrics related to the istiod process if it couldn’t find the container metrics. If the required metrics are not found then Kiali can not display the related charts or data.

The metrics used are:

Metric	Notes
container_cpu_usage_seconds_total	used for Istiod’s discovery container CPU metric
container_memory_working_set_bytes	used for Istiod’s discovery container memory metric
process_cpu_seconds_total	used for Istiod process CPU metric
process_resident_memory_bytes	used for Istiod process memory metric

7.5 - Graph

Questions about Kiali’s graph, mini-graph, node-detail-graph, or general topology views.

Why is my graph empty?

There are several reasons why a graph may be empty. First, make sure you have selected at least one namespace. Kiali will look for traffic into, out of, and within the selected namespaces. Another reason is that Istio is not actually generating the expected telemetry. This is typically an indication that workload pods have not been injected with the Istio sidecar proxy (Envoy proxy). But it can also mean that there is an issue with Prometheus configuration, and it is not scraping metrics as expected. To verify that telemetry is being reported to Prometheus, see this FAQ entry.

The primary reason a graph is empty is just that there is no measurable request traffic for the selected namespaces, for the selected time period. Note that to generate a request rate, at least two requests must be recorded, and that Kiali only records request rates >= .01 request-per-minute. Check your selection in the Duration dropdown, if it is small, like 1m, you may need to increase the time period to 5m or higher.

You can enable the “Idle Edges” Display option to include request edges that previously had traffic, but not during the requested time period. This is disabled by default to present a cleaner graph, but can be enabled to get a full picture of current and previous traffic.

Older versions of Kiali may show an empty graph for shorter duration options, depending on the Prometheus globalScrapeInterval configuration setting. For more, see this FAQ entry.

Why is my Duration dropdown menu missing entries?

The Duration menu for the graph, and also other pages, does not display invalid options based on the Prometheus configuration. Options greater than the tsdbRetentionTime do not make sense. For example, if Prometheus stores 7 days of metrics then Kiali will not show Duration options greater than 7 days.

More recently, Kiali also considers globalScrapeInterval. Because request-rate calculation requires a minimum of two data-points, Duration options less than 2 times the globalScrapeInterval will not be shown. For example, if Prometheus scrapes every 1m, the 1m Duration option will not be shown. Note that the default globalScrapeInterval for Helm installs of Prometheus is 1m (at the time of this writing).

Why are my TCP requests disconnected in the graph?

Some users are surprised when requests are not connected in the graph. This is normal Istio telemetry for TCP requests if mTLS is not enabled. For HTTP requests, the requests will be connected even without MTLS, because Istio uses headers to exchange workload metadata between source and destination. With the disconnected telemetry you will see an edge from a workload to a terminal service node. That’s the first hop. And then another edge from “Unknown” to the expected destination service/workload. In the graph below, this can be seen for the requests from myapp to redis and mongodb:

Disconnected graph for non-mTLS TCP requests

Why is my external HTTPS traffic showing as TCP?

Istio can’t recognize HTTPS request that go directly to the service, the reason is that these requests are encrypted and are recognized as TCP traffic.

You can however configure your mesh to use TLS origination for your egress traffic. This will allow to see your traffic as HTTP instead of TCP.

Why is the graph badly laid out?

The layout for Kiali Graph may render differently, depending on the data to display (number of graph nodes and their interactions) and it’s sometimes difficult, not to say impossible, to have a single layout that renders nicely in every situation. That’s why Kiali offers a choice of several layout algorithms. However, we may still find some scenarios where none of the proposed algorithms offer a satisfying display. If Kiali doesn’t render your graph layout in a satisfactory manner please switch to another layout option. This can be done from the Graph Toolbar located on the bottom left of the graph. Note that you can select different layouts for the whole graph, and for inside the namespace boxes.

If Kiali doesn’t produce a good graph for you, don’t hesitate to open an issue in GitHub or reach out via the usual channels.

Why are there many unknown nodes in the graph?

In some situations you can see a lot of connections from an “Unknown” node to your services in the graph, because some software external to your mesh might be periodically pinging or fetching data. This is typically the case when you setup Kubernetes liveness probes, or have some application metrics pushed or exposed to a monitoring system such as Prometheus. Perhaps you wouldn’t like to see these connections because they make the graph harder to read.

From the Graph page, you can filter them out by typing node = unknown in the Graph Hide input box.

Graph Hide

For a more definitive solution, you could have these endpoints (like /health or /metrics) exposed on a different port and server than your main application, and to not declare this port in your Pod’s container definition as containerPort. This way, the requests will be completely ignored by the Istio proxy, as mentioned in Istio documentation (at the bottom of that page).

Why do I have missing edges?

Kiali builds the graph from Istio’s telemetry. If you don’t see what you expect it probably means that it has not been reported in Prometheus. This usually means that:

1- The requests are not actually sent.

2- Sidecars are missing.

3- Requests are leaving the mesh and are not configured for telemetry.

For example, If you don’t see traffic going from node A to node B, but you are sure there is traffic, the first thing you should be doing is checking the telemetry by querying the metrics, for example, if you know that MyWorkload-v1 is sending requests to ServiceA try looking for metrics of the type:

istio_requests_total{destination_service="ServiceA"}

If telemetry is missing then it may be better to take it up with Istio.

Which lock icons should I see when I enable the Kiali Graph Security Display option?

Sometimes the Kiali Graph Security Display option causes confusion. The option is disabled by default for optimal performance, but enabling the option typically adds nominal time to the graph rendering. When enabled, Kiali will determine the percentage of mutual TLS (mTLS) traffic on each edge. Kiali will only show lock icons on edges with traffic for edges that have > 0% mTLS traffic.

Kiali determines the mTLS percentage for the edges via the connection_security_policy attribute in the Prometheus telemetry. Note that this is destination telemetry (i.e. reporter="destination").

Why can’t I see traffic leaving the mesh?

See Why do I have missing edges?, and additionally consider whether you need to create a ServiceEntry (or several) to allow the requests to be mapped correctly.

You can check this article on how to visualize your external traffic in Kiali for more information.

Why do I see traffic to PassthroughCluster?

Requests going to PassthroughCluster (or BlackHoleCluster) are requests that did not get routed to a defined service or service entry, and instead end up at one of these built-in Istio request handlers. See Monitoring Blocked and Passthrough External Service Traffic for more information.

Unexpected routing to these nodes does not indicate a Kiali problem, you’re seeing the actual routing being performed by Istio. In general it is due to a misconfiguration and/or missing Istio sidecar. Less often but possible is an actual issue with the mesh, like a sync issue or evicted pod.

Use Kiali’s Workloads list view to ensure sidecars are not missing. Use Kiali’s Istio Config list view to look for any config validation errors.

How do I inspect the underlying metrics used to generate the Kiali Graph?

It is not uncommon for the Kiali graph to show traffic that surprises the user. Often the thought is that Kiali may have a bug. But in general Kiali is just visualizing the metrics generated by Istio. The next thought is that the Istio telemetry generation may have a bug. But in general Istio is generating the expected metrics given the defined configuration for the application.

To determine whether there is an actual bug it can be useful to look directly at the metrics collected by and stored in the Prometheus database. Prometheus provides a basic console that can be opened using the istioctl dashboard command:

> istioctl dashboard prometheus

The above command, assuming Istio and Prometheus are in the default istio-system namespace, should open the Prometheus console in your browser.

Kiali uses a variety of metrics but the primary request traffic metrics for graph generation are these:

istio_requests_total
istio_tcp_sent_bytes_total

The Prometheus query language is very rich but a few basic queries is often enough to gather time-series of interest.

Here is a query that returns time-series for HTTP or GRPC requests to the reviews service in Istio’s BookInfo sample demo:

istio_requests_total{reporter="source", destination_service_name="reviews"}

And here is an example of the results:

Prometheus Console - all attributes

The query above is good for dumping all of the attributes but it can be useful to aggregate results by desired attributes. The next query will get the request counts for the reviews service broken down by source and destination workloads:

sum(istio_requests_total{reporter="source", destination_service_name="reviews"}) by (source_workload, destination_workload)

Prometheus Console - aggregation

The first step to explaining your Kiali graph is to inspect the metrics used to generate the graph. Kiali devs may ask for this info when working with you to solve a problem, so it is useful to know how to get to the Prometheus console.

Why don’t I see response times on my service graph?

Users can select Response Time to label their edges with 95th percentile response times. The response time indicates the amount of time it took for the destination workload to handle the request. In the Kiali graph the edges leading to service nodes represent the request itself, in other words, the routing. Kiali can show the request rate for a service but response time is not applicable to be shown on the incoming edge. Only edges to app, workload, or service entry nodes show response time because only those nodes represent the actual work done to handle the request. This is why a Service graph will typically not show any response time information, even when the Response Time option is selected.

Because Service graphs can show Service Entry nodes the Response Time option is still a valid choice. Edges to Service Entry nodes represent externally handled requests, which do report the response time for the external handling.

Why does my workload graph show service nodes?

Even when Display Service Nodes is disabled a workload graph can show service nodes. Display Service Nodes ensures that you will see the service nodes between two other nodes, providing an edge to the destination service node, and a subsequent edge to the node handling the request. This option injects service nodes where they previously would not be shown. But Kiali will always show a terminal service node when the request itself fails to be routed to a destination workload. This ensures the graph visualizes problem areas. This can happen in a workload or app graph. Of course in a service graph the Display Service Nodes option is simply ignored.

In Kiali v2.x, is the old graph still available?

The “old graph” is the Cytoscape implementation. The “new graph” is the PatternFly implementation. In Kiali v2.0 the new PatternFly graph implementation became the default, and the old Cytoscape implementation was deprecated. In Kiali v2.8, the old Cytoscape graph implementation has been completely removed and is no longer available.

7.6 - Installation

Questions about Kiali installation options or issues.

What is the difference between the operator and the server helm chart?

There are two installation mechanisms from which you can choose when installing the Kiali Server. The first is the recommended installation mechanism - the Kiali Operator. The second installation mechanism is the server helm chart. There are some features that you get with the Kiali Operator that you do not get with the server helm chart. The main differences between the two are mentioned below, though this list may be incomplete.

The operator watches for changes to the multi-cluster remote cluster secret - if a change is detected, the operator automatically rolls out a new Kiali server pod so the server picks up the changes immediately. See the multi-cluster docs for more details.
Both the operator and server helm chart support disabling cluster-wide-access mode (deployment.cluster_wide_access=false) - see the Namespace Management docs for details. However, when using the server helm chart with deployment.cluster_wide_access=false, there are some lifecycle management features that only the operator provides (see the list below), so you may need to manually clean up resources when changing these configurations via the server helm chart. To avoid this, uninstall and reinstall the Kiali server rather than using helm upgrade when modifying these settings.
- The operator automatically cleans up Roles/RoleBindings from namespaces that are no longer accessible when discovery selectors (deployment.discovery_selectors.default) change
- The operator handles transitions when view_only_mode or auth.strategy settings change (RoleBindings are immutable and must be deleted/recreated)
- The operator explicitly cleans up ClusterRole/ClusterRoleBinding resources when switching from cluster_wide_access=true to false
- The operator adds labels to accessible namespaces to mark which Kiali instance manages them

Operator fails due to `cannot list resource "clusterroles"` error

When the Kiali Operator installs a Kiali Server, the Operator will assign the Kiali Server the proper roles/rolebindings so the Kiali Server can access the appropriate namespaces.

The Kiali Operator will check to see if the Kiali CR setting deployment.cluster_wide_access is set to true (which is the default value if it is unset). If it is, this means the Kiali Server is to be given access to all namespaces in the cluster, including namespaces that will be created in the future. In this case, the Kiali Operator will create and assign ClusterRole/ClusterRoleBinding resources to the Kiali Server. But in order to be able to do this, the Kiali Operator must itself be given permission to create those ClusterRole and ClusterRoleBinding resources. When you install the Kiali Operator via OLM, these permissions are automatically granted. However, if you installed the Kiali Operator with the Operator Helm Chart, and if you did so with the value clusterRoleCreator set to false then the Kiali Operator will not be given permission to create cluster roles. In this case, you will be unable to install a Kiali Server if your Kiali CR does not have deployment.cluster_wide_access set to true (and, again, this is the default if unspecified). You will get an error similar to this:

Failed to list rbac.authorization.k8s.io/v1, Kind=ClusterRole:
clusterroles.rbac.authorization.k8s.io is forbidden:
User "system:serviceaccount:kiali-operator:kiali-operator"
cannot list resource "clusterroles" in API group
"rbac.authorization.k8s.io" at the cluster scope

Thus, if you do not give the Kiali Operator the permission to create cluster roles, you must tell the Operator which specific namespaces the Kiali Server can access. When specific namespaces are specified in deployment.discovery_selectors.default, the Kiali Operator will create Role and RoleBindings (not the “Cluster” kinds) and assign them to the Kiali Server.

What values can be set in the Kiali CR?

A Kiali CR is used to tell the Kiali Operator how and where to install a Kiali Server in your cluster. You can install one or more Kiali Servers by creating one Kiali CR for each Kiali Server you want the Operator to install and manage. Deleting a Kiali CR will uninstall its associted Kiali Server.

Most options are described in the pages of the Installation and Configuration sections of the documentation.

If you cannot find some configuration, check the Kiali CR Reference, which briefly describes all available options along with an example CR and all default values. If you are using a specific version of the Operator prior to 1.46, the Kiali CR that is valid for that version can be found in the version tag within the github repository (e.g. Operator v1.25.0 supported these Kiali CR settings).

How to configure some operator features at runtime

First, read Managing configuration of Helm installations in the Installation guide to check if that method works for your case.

Once the Kiali Operator is installed, you can change some of its configuration at runtime in order to utilize certain features that the Kiali Operator provides. These features are configured via environment variables defined in the operator’s deployment.

Only a user with admin permissions can configure these environment variables. You must make sure you know what you are doing before attempting to modify these environment variables. Doing things incorrectly may break the Kiali Operator.

Perform the following steps to configure these features in the Kiali Operator:

Determine the namespace where your operator is located and store that namespace name in $OPERATOR_NAMESPACE. If you installed the operator via helm, it may be kiali-operator. If you installed the operator via OLM, it may be openshift-operators. If you are not sure, you can perform a query to find it:

OPERATOR_NAMESPACE="$(kubectl get deployments --all-namespaces  | grep kiali-operator | cut -d ' ' -f 1)"

Determine the name of the environment variable you need to change in order to configure the feature you are interested in. Here is a list of currently supported environment variables you can set:

ALLOW_AD_HOC_KIALI_NAMESPACE: must be true or false. If true, the operator will be allowed to install the Kiali Server in any namespace, regardless of which namespace the Kiali CR is created. If false, the operator will only install the Kiali Server in the same namespace where the Kiali CR is created - any attempt to do otherwise will cause the operator to abort the Kiali Server installation.
ALLOW_AD_HOC_KIALI_IMAGE: must be true or false. If true, the operator will be allowed to install the Kiali Server with a custom container image as defined in the Kiali CR’s spec.deployment.image_name and/or spec.deployment.image_version. If false, the operator will only install the Kiali Server with the default image. If a Kiali CR is created with spec.deployment.image_name or spec.deployment.image_version defined, the operator will abort the Kiali Server installation.
ALLOW_AD_HOC_CONTAINERS: must be true or false. If true, the operator will be allowed to install additional containers and init containers into the Kiali pod as defined in the Kiali CR’s spec.deployment.additional_pod_containers_yaml and spec.deployment.additional_pod_init_containers_yaml. If false, the operator will not allow any additional containers or init containers to be configured so if a Kiali CR is created with spec.deployment.additional_pod_containers_yaml or spec.deployment.additional_pod_init_containers_yaml defined, the operator will abort the Kiali Server installation.
ALLOW_SECURITY_CONTEXT_OVERRIDE: must be true or false. If true, the operator will be allowed to install the Kiali Server container with a fully customizable securityContext as defined by the user in the Kial CR. If false, the operator will only allow the user to add settings to the securityContext; any attempt to override the default settings in the securityContext will be ignored.
ALLOW_ALL_ACCESSIBLE_NAMESPACES: must be true or false. If true, the operator will allow the user to configure Kiali to access all namespaces in the cluster (i.e. will allow a Kiali CR to have spec.deployment.cluster_wide_access set to true). If false, all Kiali CRs must set spec.deployment.cluster_wide_access to false.
ANSIBLE_DEBUG_LOGS: must be true or false. When true, turns on debug logging within the Operator SDK. For details, see the docs here.
ANSIBLE_VERBOSITY_KIALI_KIALI_IO: Controls how verbose the operator logs are - the higher the value the more output is logged. For details, see the docs here.
ANSIBLE_CONFIG: must be /etc/ansible/ansible.cfg or /opt/ansible/ansible-profiler.cfg. If set to /opt/ansible/ansible-profiler.cfg a profiler report will be dumped in the operator logs after each reconciliation run.
WATCHES_YAML: must be either (a) watches-os.yaml, (b) watches-os-ns.yaml, (c) watches-k8s.yaml or (d) watches-k8s-ns.yaml. If the operator is running on OpenShift, this value must be either (a) or (b); likewise, if the operator is running on a non-OpenShift Kubernetes cluster, this value must be either (c) or (d). If you require the operator to automatically update the Kiali Server with access to new namespaces created in the cluster, set this value to one of the -ns files (e.g. watches-os-ns.yaml or watches-k8s-ns.yaml). This changes the default behavior of the operator such that it will watch for new namespaces getting created and will automatically set up the Kiali Server with the proper access to the new namespace (if such access is to be granted). This namespace watching is not necessary if spec.deployment.cluster_wide_access is set to true in the Kiali CR.

Store the name of the environment variable you want to change in $ENV_NAME:

ENV_NAME="ANSIBLE_CONFIG"

Store the new value of the environment variable in $ENV_VALUE:

ENV_VALUE="/opt/ansible/ansible-profiler.cfg"

The final step depends on how you installed the Kiali Operator:

The commands below assume you are using OpenShift, and as such use oc. If you are using a non-OpenShift Kubernetes environment, simply substitute all the oc references to kubectl.

If you installed the operator via helm, simply set the environment variable on the operator deployment directly:

oc -n ${OPERATOR_NAMESPACE} set env deploy/kiali-operator "${ENV_NAME}=${ENV_VALUE}"

If you installed the operator via OLM, you must set this environment variable within the operator’s CSV and let OLM propagate the new environment variable value down to the operator deployment:

oc -n ${OPERATOR_NAMESPACE} patch $(oc -n ${OPERATOR_NAMESPACE} get csv -o name | grep kiali) --type=json -p "[{'op':'replace','path':"/spec/install/spec/deployments/0/spec/template/spec/containers/0/env/$(oc -n ${OPERATOR_NAMESPACE} get $(oc -n ${OPERATOR_NAMESPACE} get csv -o name | grep kiali) -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].env[*].name}' | tr ' ' '\n' | cat --number | grep ${ENV_NAME} | cut -f 1 | xargs echo -n | cat - <(echo "-1") | bc)/value",'value':"\"${ENV_VALUE}\""}]"

How can I inject an Istio sidecar in the Kiali pod?

By default, Kiali will not have an Istio sidecar. If you wish to deploy the Kiali pod with a sidecar, you have to define the sidecar.istio.io/inject=true label in the spec.deployment.pod_labels setting in the Kiali CR. In addition, to ensure the sidecar and Kiali server containers start in the correct order, the Istio annotation proxy.istio.io/config should be defined in the spec.deployment.pod_annotations setting in the Kiali CR. For example:

spec:
  deployment:
    pod_labels:
      sidecar.istio.io/inject: "true"
    pod_annotations:
      proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'

If you are utilizing CNI in your Istio environment (for example, on OpenShift), Istio will not allow sidecars to work when injected in pods deployed in the control plane namespace, e.g. istio-system. (1) (2) (3). In this case, you must deploy Kiali in its own separate namespace. On OpenShift, you can do this using the following instructions.

Determine what namespace you want to install Kiali and create it. Give the proper permissions to Kiali. Create the necessary NetworkAttachmentDefinition. Finally, create the Kiali CR that will tell the operator to install Kiali in this new namespace, making sure to add the proper sidecar injection label as explained earlier.

NAMESPACE="kialins"

oc create namespace ${NAMESPACE}

oc adm policy add-scc-to-group privileged system:serviceaccounts:${NAMESPACE}

cat <<EOM | oc apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: istio-cni
  namespace: ${NAMESPACE}
EOM

cat <<EOM | oc apply -f -
apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali
  namespace: ${NAMESPACE}
spec:
  auth:
    strategy: anonymous
  deployment:
    pod_labels:
      sidecar.istio.io/inject: "true"
EOM

After the operator installs Kiali, confirm you have two containers in your pod. This indicates your Kiali pod has its proxy sidecar successfully injected.

$ oc get pods -n ${NAMESPACE}
NAME                    READY   STATUS    RESTARTS   AGE
kiali-56bbfd644-nkhlw   2/2     Running   0          43s

How can I specify a container image digest hash when installing Kiali Server and Kiali Operator?

To tell the operator to install a specific container image using a digest hash, you must use the deployment.image_digest setting in conjunction with the deployment.image_version setting. deployment.image_version is simply the digest hash code and deployment.image_digest is the type of digest (most likely you want to set this value to sha256). So for example, in your Kiali CR you will want something like this:

spec:
  deployment:
    image_version: 63fdb9a9a1aa8fea00015c32cd6dbb63166046ddd087762d0fb53a04611e896d
    image_digest: sha256

Leaving deployment.image_digest unset or setting it to an empty string will tell the operator to assume the deployment.image_version is a tag.

For those that opt not to use the operator to install the server but instead use the server helm chart, the same deployment.image_version and deployment.image_digest values are supported by the Kiali server helm chart.

As for the operator itself, when installing the operator using its helm chart, the values image.tag and image.digest are used in the same manner as the deployment.image_version and deployment.image_digest as explained above. So if you wish to install the operator using a container image digest hash, you will want to use the image.tag and image.digest in a similar way:

helm install --set image.tag=7336eb77199a4d737435a8bf395e1666b7085cc7f0ad8b4cf9456b7649b7d6ad --set image.digest=sha256 ...and the rest of the helm install options...

How can I use a CSI Driver to expose a custom secret to the Kiali Server?

You first must already have a CSI driver and provider installed in your cluster and a valid SecretProviderClass deployed in the namespace where Kiali is installed.

To mount a secret exposed by the CSI Driver, you can use the custom_secret configuration to supply the CSI volume source on the pod. The Kiali CR reference docs have an example. The Kiali Operator or server helm chart will automatically expose the secret as a volume mount into the container at the specified mount location.

Although Kiali retrieves the secret over the Kubernetes API, mounting the secret is required for the CSI Driver to create the backing Kubernetes secret. Note that the custom_secrets optional flag is ignored when mounting secrets from the CSI provider. The secrets are required to exist - then cannot be optional.

How can I use a secret to pass external service credentials to the Kiali Server?

You can use secrets to store the credentials that Kiali must use to authenticate to external services such as Prometheus. How you configure Kiali is dependent upon whether you install the Kiali Server using the Kiali Operator or the Kiali Server Helm Chart.

When Using Kiali Operator

If you are installing using the Kiali Operator, simply set the credential setting to secret:<secretName>:<secretKey>. For details, see the Kiali CR reference docs.

For example, here is how you can set the bearer token that Kiali will use to authenticate with the Prometheus server.

Create a secret with the token.

kubectl -n istio-system create secret generic my-secret --from-literal=my-cred=abc123

Edit the Kiali CR and specify the token field with the value secret:my-secret:my-cred and specify the type as bearer to indicate that authentication will be done with a bearer token.

spec:
  external_services:
    prometheus:
      auth:
        type: bearer
        token: secret:my-secret:my-cred

At this point, the Kiali Server will soon restart and be reconfigured to authenticate to Prometheus with the given token.

If the secret contains a password, as opposed to a token, set type to basic to indicate that Kiali should authenticate using basic authentication using the given username and password you specify in the configuration:

spec:
  external_services:
    prometheus:
      auth:
        type: basic
        username: my-user-name
        password: secret:my-secret:my-cred

For certificate-based authentication (e.g., mTLS to ACM Observability Service), reference certificate files from a secret containing TLS certificates:

spec:
  external_services:
    prometheus:
      auth:
        type: none  # No bearer token, just mTLS
        cert_file: secret:acm-certs:tls.crt
        key_file: secret:acm-certs:tls.key

Note that you can share a secret across multiple external services if they use the same credentials, or you can create multiple secrets if you need to use different credentials for the different external services.

The secret: pattern works for both simple credential values (tokens, passwords, usernames) and file-based credentials (certificates and keys). For certificate files, the secret key name (e.g., tls.crt, tls.key) will be preserved when mounted.

Note about CA certificates: To configure custom CA certificates that Kiali should trust when connecting to external services over HTTPS, see the TLS Configuration page. CA certificates are configured globally via a ConfigMap, not per-service.

You can use secrets as explained above for the following fields in the Kiali CR:

spec.external_services.grafana.auth.cert_file
spec.external_services.grafana.auth.key_file
spec.external_services.grafana.auth.password
spec.external_services.grafana.auth.token
spec.external_services.grafana.auth.username
spec.external_services.perses.auth.cert_file
spec.external_services.perses.auth.key_file
spec.external_services.perses.auth.password
spec.external_services.perses.auth.username
spec.external_services.prometheus.auth.cert_file
spec.external_services.prometheus.auth.key_file
spec.external_services.prometheus.auth.password
spec.external_services.prometheus.auth.token
spec.external_services.prometheus.auth.username
spec.external_services.tracing.auth.cert_file
spec.external_services.tracing.auth.key_file
spec.external_services.tracing.auth.password
spec.external_services.tracing.auth.token
spec.external_services.tracing.auth.username
spec.external_services.custom_dashboards.prometheus.auth.cert_file
spec.external_services.custom_dashboards.prometheus.auth.key_file
spec.external_services.custom_dashboards.prometheus.auth.password
spec.external_services.custom_dashboards.prometheus.auth.token
spec.external_services.custom_dashboards.prometheus.auth.username
spec.login_token.signing_key

When Using Kiali Server Helm Chart

The Kiali Server Helm Chart supports the same secret:<secretName>:<secretKey> syntax as the Kiali Operator. The Helm chart automatically detects when you use this pattern and mounts the referenced secret into the Kiali pod.

For example, to configure Prometheus authentication using a secret:

Create a secret with your credentials:

kubectl -n istio-system create secret generic my-prometheus-creds --from-literal=password=abc123xyz789

Create a Helm values file that references the secret using the secret: pattern:

external_services:
  prometheus:
    auth:
      type: basic
      username: my-user
      password: "secret:my-prometheus-creds:password"

Install with the Kiali Server Helm Chart using that values file. For example, to install in the istio-system namespace:

helm install -f my-values.yaml -n istio-system kiali-server kiali/kiali-server

The Helm chart will automatically mount the secret and configure Kiali to read the credentials from the mounted file. When you start the Kiali Server, you should see a debug message in its logs that says:

Credential file path configured: [/kiali-override-secrets/prometheus-password/value.txt]

NOTE: You must have enabled logging at the debug level to see the above message in the logs.

For certificate-based authentication, use the same secret: pattern:

external_services:
  prometheus:
    auth:
      type: none
      cert_file: "secret:my-tls-certs:tls.crt"
      key_file: "secret:my-tls-certs:tls.key"

For certificate files, the secret key name (e.g., tls.crt, tls.key) is preserved in the mounted file path.

Service Enabled Conditions: For Grafana, Tracing, Perses, and Custom Dashboards services, credentials are only auto-mounted when the respective service is enabled (e.g., external_services.grafana.enabled=true, external_services.custom_dashboards.enabled=true). Prometheus credentials are always processed regardless of any enabled flag.

Note about CA certificates: To configure custom CA certificates for server verification, see the TLS Configuration page. CA certificates are configured globally via a ConfigMap named <instance-name>-cabundle, not per-service via secrets.

How does Kiali handle automatic credential rotation?

Kiali supports automatic credential rotation without requiring a pod restart. This applies to all secret-backed credentials including tokens, passwords, usernames, and certificate files.

How it works:

Kubernetes Secret Update: When an external system (cert-manager, ACM, OpenShift service CA, etc.) updates a Kubernetes secret, Kubernetes automatically updates the mounted files in the Kiali pod within approximately 60 seconds.
Read-on-Use Pattern: Kiali reads credentials from the mounted files each time they are needed, not just at startup. This means updated credentials are automatically picked up.
No Pod Restart: Because credentials are read dynamically, there’s no need to restart the Kiali pod when secrets are rotated.

Timing expectations:

When a secret or ConfigMap is updated in Kubernetes, there are two phases before Kiali uses the new values:

Kubernetes volume sync (0-60 seconds): The kubelet periodically syncs mounted secrets and ConfigMaps to the pod’s filesystem. By default, this happens every 60 seconds (controlled by the kubelet’s syncFrequency setting). In the worst case, you may wait up to 60 seconds for the files to be updated on disk.
Kiali file detection (near-instant): Once Kubernetes updates the files, Kiali’s filesystem watcher (fsnotify) detects the change immediately and reloads the credentials.

In practice, expect credential updates to take effect within 0-90 seconds after updating the secret, depending on where you are in the kubelet’s sync cycle. If your cluster administrator has configured a different syncFrequency, adjust expectations accordingly.

Which credentials support auto-rotation:

All credentials mounted from secrets support automatic rotation:

Tokens (auth.token)
Passwords (auth.password)
Usernames (auth.username)
Client certificates (auth.cert_file)
Client private keys (auth.key_file)
Login token signing key (login_token.signing_key)

Custom CA bundles are configured via the kiali-cabundle ConfigMap (either the global additional-ca-bundle.pem key or a component-specific key such as openid-server-ca.crt). See the TLS Configuration page for details. The deprecated per-service auth.ca_file setting is ignored. To rotate CA certificates, update the ConfigMap content and Kubernetes will refresh the projected volume automatically.

Note: Credentials specified as literal values in the Kiali CR (not using the secret: pattern) are loaded at startup and do not support automatic rotation.

7.7 - Istio Component Status

Questions about Kiali’s Istio infrastructure health checks.

How can I add one component to the list?

If you are interested in adding one more component to the Istio Component Status tooltip, you have the option to add one new component into the Kiali CR, under the spec.external_services.istio.component_status field.

For each component there, you will need to specify the app label of the deployment’s pods, the namespace and whether is a core component or add-on.

One component is ‘Not found’ but I can see it running. What can I do?

The first thing you should do is check the Kiali CR for the spec.external_services.istio.component_status field (see the reference documentation here)

Kiali looks for a Deployment for which its pods have the app label with the specified value in the CR, and lives in that namespace. The app label name may be changed from the default (app) and it is specified in the spec.istio_labels.app_label_name in the Kiali CR.

Ensure that you have specified correctly the namespace and that the deployment’s pod template has the specified label.

One component is ‘Unreachable’ but I can see it running. What can I do?

Kiali considers one component as Unreachable when the component responds to a GET request with a 4xx or 5xx response code.

The URL where Kiali sends a GET request to is the same as it is used for the component consumption. However, Kiali allows you to set a specific URL for health check purposes: the health_check_url setting.

In this example, Kiali uses the Prometheus url for both metrics consumption and health checks.

external_services:
  prometheus:
    url: "http://prometheus.istio-system:9090"

In case that the prometheus.url endpoint doesn’t return 2XX/3XX to GET requests, you can use the following settings to specify which health check URL Kiali should use:

external_services:
  prometheus:
    health_check_url: "http://prometheus.istio-system:9090/healthz"
    url: "http://prometheus.istio-system:9090"

Please read the Kiali CR Reference for more information. Each external service component has its own health_check_url and is_core setting to tailor the experience in the Istio Component Status feature.

7.8 - Performance and Scalability

Questions about Kiali Performance measurements and improvements.

What are some Tips for working with a large mesh?

It can be an observability challenge to work with a large mesh. Here are a few things that can be done to improve the situation.

Resources and Connectivity

Before talking about Kiali features, it is important to understand that Kiali’s performance is dependent on the performance and responsiveness of your metrics database (typically Prometheus), and your tracing store, for installations using tracing. For Prometheus scalability tips, see Prometheus Tuning. See Tempo Tuning if using Tempo for your trace store.

Only when query performance for metrics and traces is good, can Kiali respond in a reasonable way. So, it is also important to provide sufficient connectivity for the API calls to return information in a timely way.

Manual Refresh

By default, Kiali will immediately attempt to populate the page, the default being the Overview page. After the initial page is rendered, most pages will automatically refresh, based on the setting in the “Refresh Interval” dropdown. The default is every 60s. For a large mesh, even the initial page load can be slow, and it can be frustrating if you have to wait for the page to render before being able to enter desired options and/or filters, and then ask for another refresh.

In the dropdown, Kiali offers the “Manual” refresh setting. If selected, Kiali will not refresh the page on a timer. Kiali also offers a “Pause” setting. “Pause” also prevents a timed refresh, but it will refresh on an option or filter change. “Manual” will only refresh on a manual click of the refresh button. To ensure that even the initial page load is avoided, the default can be set in the Kiali CR:spec.kiali_feature_flags.ui_defaults.refresh_interval: manual. With this setting it is possible to “batch” settings changes. For example, when working with the graph you could choose namespaces, update Display settings, and also change Traffic settings, all before rendering the graph.

URL Bookmarks

Kiali pages store most, if not all, of their settings as URL query parameters. So, it can be useful to bookmark pages you’ve configured with desired options and filters. By visiting the bookmarked page those options and filters will be applied immediately.

Large Graphs

Working with large graphs is difficult. A graph does not have to be very large before it becomes complicated and/or dense. Here are a few suggestions.

Tips to reduce graph size and speed up generation

Limit the namespaces selected.
- Each requested namespace is like its own graph request, and then each resulting namespace graph is “stitched” together.
- This may not be possible, in some mesh designs even a single namespace is very populated.
Reduce the protocols selected
- Using the Traffic dropdown, only fetch TCP or HTTP, not both. Different queries are performed for the different protocols.
- In Ambient, you can also choose between ztunnel and waypoint telemetry. This can reduce the number of queries and/or the size of your graph.
Prefer smaller Duration dropdown values.
- The larger the duration, the more metric data that must be processed.
Enable response time edge labels only after minimizing the size of your graph.
- This requires extra queries against Prometheus histograms, and can be expensive.
Enable the Security Display option only after minimizing the size of your graph.
- This requires extra Prometheus queries.
Disable the Service Nodes Display option, if not needed.
- This is enabled by default, and provides valuable routing information, but it does also add extra nodes and edges.
Disable the Virtual Services Display option, if not needed.
- This will take away some of the graph decoration but stops the need to interact with k8s API/objects, which can be heavy.
Prefer workload graph type
- This graph type often renders more quickly than other graph types.
Only enable Operation Nodes as needed.
- This option is very valuable when using request classification, but does require extra queries, and does add extra nodes and edges.

Tips for manipulating your graph

After your graph is generated and rendered in the UI, there are client-side ways to improve your visualization:

Graph Find and Hide
- Find and Hide are very valuable tools. Both use the same simple query language, described in detail in the on-screen help (click the info icon next to the inputs, on the toolbar). It is highly recommended to become familiar with this feature, very simple expressions can be useful.
- Find will highlight the nodes and edges that match the expression. This can help locate nodes and edges in a large graph (or even a small graph).
- Hide will temporarily remove the matching nodes and edges. This can effectively clean up a large graph into a very focused view.
- It is possible to pre-define Find and Hide expressions in your Kiali CR. These pre-defined expressions can even be configured to be applied automatically.
  - For more, see find_options and hide_options in the Kiali CR Reference.
Layouts
- Kiali provides multiple layouts. Many graphs looks best using the default layout, but others may improve using a different layout.
- Layouts are available by clicking the on-screen icons at the bottom of the graph.

Mini-Graphs

One way to avoid a large graph is to avoid it completely. Instead, navigate to a specific object of interest. The detail page offers a mini-graph, centered on the specific service, app or workload. Clicking a node on the mini-graph navigates to that node’s detail page. Mini-graphs tend to generate quickly because they are much more specific than a namespace graph. You can also navigate from the mini-graph back to the main graph, or a node graph. The node graph is similar to the mini-graph but offers all of the main graph options.

Graph Caching

Graph caching was added starting with Kiali v2.21. It caches, with background re-compute, the most recent namespace graph per session. The initial graph is generated synchonously, and is placed in the cache. Due to the vast number of options, there is no easy way to pre-compute the initial graph before it is requested by the user. The graph will then be re-computed in the background with a frequency related to the refresh interval set by the user on the graph page. The larger the interval the less often the graph is re-computed. Subsequent requests for the same graph will return the most recently computed cache entry, and so it should return quickly. The cache entry is evicted if the graph options change, or if the cache is not hit for the configured “inactivity_timeout” duration. This allows a user to navigate away in the UI, and when returning to the graph find that it is ready and updated.

A few notes:

Different tabs in the same browser, for the same user, share a session and therefore a cache entry.
Anonymous login strategy is session-less. and so all anonymous logins share a cache entry.
This feature is considered beta-level, and it’s configuration is not yet part of the CRD schema. Here is the relevant configuration, with the default settings:

spec:
  kiali_internal:
    graph_cache:
      enabled: true
      inactivity_timeout: "10m"
      max_cache_memory_mb: 1000
      refresh_interval: "60s"

What performance and scalability measurements are done?

Performance tests are conducted on setups with 10, 50, 200, 300, 500, and 800 namespaces. Each namespace contains:

1 Service
2 Workloads
2 Istio configurations

What improvements have been made to Kiali’s performance in recent versions?

Performance data is collected using automated performance tests on various setups, ensuring a comprehensive evaluation of improvements. Since the release of Kiali v1.80, significant performance enhancements have been implemented, resulting in up to a 5x improvement in page load times. The performance improvements were achieved by reducing the number of requests made from the Kiali UI to the services. Instead of multiple requests, the process was streamlined to unify these into a single request per cluster. The enhanced performance significantly reduces the time users spend waiting for pages to load, leading to a more efficient and smooth user experience.

Performance Improvements Matrix Per Kiali Version And Section

Kiali	Section	Improvements
1.80	Graph Page	Validations
1.81	Overview Page	mTLS, Metrics, Health
1.82	Applications List	Overall loading
1.83	Workloads List, Services List	Overall loading

These improvements make Kiali more responsive and efficient, particularly in environments with a large number of namespaces, services, and workloads, enhancing usability and productivity.

For a graphical representation of the performance improvements between Kiali v1.79 (before improvements) and v1.85 (after improvements) of the Overview page load times, refer to the chart below:

Kiali Overview Page

7.9 - Validations

Questions about Kiali Validations.

Which formats does Kiali support for specifying hosts?

Istio highly recommends that you always use fully qualified domain names (FQDN) for hosts in DestinationRules. However, Istio does allow you to use other formats like short names (details or details.bookinfo). Kiali only supports FQDN and simple service names as host formats: for example details.bookinfo.svc.cluster.local or details. Validations using the details.bookinfo format might not be accurate.

In the following example it should show the validation of “More than one Destination Rule for the same host subset combination”. Because of the usage of the short name reviews.bookinfo Kiali won’t show the warning message on both destination rules.

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews-dr1
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews-dr2
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1

See the recomendation Istio gives regarding host format: “To avoid potential misconfigurations, it is recommended to always use fully qualified domain names over short names.”

For best results with Kiali, you should use fully qualified domain names when specifying hosts.

8 - Contribution Guidelines

How to contribute to Kiali.

8.1 - How to Contribute

Contribution guidelines to Kiali.

Contributing to the Docs

To contribute to the kiali.io docs see kiali.io on github.

In short you will:

Fork the kiali.io repo on GitHub.
Make your changes and send a pull request (PR).
If you’re not yet ready for a review, add “WIP” to the PR name to indicate it’s a work in progress.
1. Don’t add the Hugo property “draft = true” to the page front matter, it prevents auto-deployment of the content preview.
Wait for the automated PR workflow to do some checks. When it’s ready, you should see a comment like this: deploy/netlify — Deploy preview ready!
Click Details to the right of “Deploy preview ready” to see a preview of your updates.
Continue updating your doc and pushing your changes until you’re happy with the content.

Updating a single page

If you’ve just spotted something you’d like to change while using the docs, there is a shortcut for you:

Click Edit this page in the top right hand corner of the page.
If you don’t already have an up to date fork of the project repo, you are prompted to get one:
1. Click Fork this repository and propose changes or Update your Fork to get an up to date version of the project to edit.
2. The appropriate page in your fork is displayed in edit mode.
Follow the steps above to make, preview, and propose your changes.

Creating an issue

If you’ve found a problem in the docs, but you’re not sure how to fix it yourself, please create an issue in the kiali.io repo. You can also create an issue about a specific page by clicking the Create Documentation Issue button in the top right hand corner of the page.

Contributing to the Code

For code contribution see the kiali project’s CONTRIBUTING page.

8.2 - Development Environment

How to set up a development environment

Introduction

In this section it is explained how to set up a development environment:

As described in Architecture, we would need to have the Kiali dependencies running in an OpenShift or Kubernetes
We will use a port forward to access those services outside the cluster.
We will have the project source running locally. In this case we will set up an IDE.
Bookinfo application example will also be running on our cluster.

development_environment

Prerequisites

Development tools are installed:
- oc or kubectl
- go
- make
- npm
- yarn
- gcc
Kiali source code: We will fork the 3 kiali repositories, and then, clone them in a local folder:
Istio and the required services are running in Minikube or OpenShift. To install it following the above schema, it is possible to use the following scripts (From the Kiali repository):
- hack/istio/install-istio-via-istioctl.sh: Installs the latest Istio release into istio-system namespace along with the Prometheus, Grafana, and Jaeger addons.
- hack/istio/install-bookinfo-demo.sh: Installs the Bookinfo demo that is found in the Istio release that was installed via the hack/istio/install-istio-via-istioctl.sh hack script.
  - Pass in -tg to also install a traffic generator that will send messages periodically into the Bookinfo demo.
  - If using Minikube and the -tg option, make sure you pass in the Minikube profile name via -mp if the profile name is not minikube.

Port forward

Before the setup, we will need to do a port-forward of the services that kiali is using.

We can use the hack/run-kiali.sh script for this purpose. It can work without any options. Pass –help to see the options it takes.

An example to run it following the above schema:

./run-kiali.sh -pg 13000:3000 -pp 19090:9090 -app 8080 -es false -iu http://127.0.0.1:15014

Local Configuration File

The go process will require a configuration to point to these services and other specific configurations. This file will be places in ~/kiali/config.yaml, and referenced later by the GO local process.

api:
  namespaces:
    exclude:
      - istio-operator
      - kube.*
      - openshift.*
      - ibm.*
      - kiali-operator
    label_selector: ""
server:
  address: localhost
  port: 8000
  static_content_root_directory: /home/userTests/kiali-static-files
in_cluster: false
deployment:
  accessible_namespaces: [ "**" ]
extensions:
  iter_8:
    enabled: true
external_services:
  istio:
    istio_canary_revision:
      current: prod
      upgrade: canary
    url_service_version: http://localhost:15014/version
    config_map_name: istio
    istio_identity_domain: svc.cluster.local
  prometheus:
    url: http://localhost:19090
    cache_enabled: true
  tracing:
    enabled: true
    internal_url: http://localhost:16685/jaeger
    external_url: http://localhost:16686/jaeger
    use_grpc: false
    whitelist_istio_system:
    - jaeger-query
    - istio-ingressgateway
    namespace: istio-system
    port: 443
    service: tracing
    auth:
      insecure_skip_verify: false
      password: cTSM/77tNZ0yGw/ZJXkO7IObbemLJjFkCp4GuqLzXIgE8RWrJvWjFViv9Dpu0SguxD3N/oCUPJnyreoHuSCNZ9kFTrHgRl033waUpTAYZPCEzMPw9Rui5C3/o5x4bclHq0IQ8OGr5LuN2L1WCXrEo9iUntPMovbsP1Alqwh0LZ79ztIkObNBNniX1tuo0fM9O53QKSAjGBnK13LFjHC7wXo+mWw1fzHf9x4jib6UDbeuzHfugDS0Mtj4E9QDRHjpPUrh66dVib4kCJ4nMO19BuiIk+OgbNdhBhg3wn1fn7F6+d/i6Mbq/C/OJylSL6ewUVwIvIAmcRM/jdTqdz0w
      type: basic
      use_kiali_token: false
      username: internal
  grafana:
    internal_url: http://localhost:13000
    external_url: http://localhost:13000
    dashboards:
      - name: "Istio Service Dashboard"
        variables:
          namespace: "var-namespace"
          service: "var-service"
      - name: "Istio Workload Dashboard"
        variables:
          namespace: "var-namespace"
          workload: "var-workload"
  custom_dashboards:
    enabled: false
#health_config:
#  rate:
#    - namespace: "alpha"
#      tolerance:
#        - code: "4XX"
#          degraded: 30
#          failure: 50
#          protocol: "http"
#        - code: "5XX"
#          degraded: 30
#          failure: 50
#          protocol: "http"
#    - namespace: "beta"
#      tolerance:
#        - code: "[4]\\d\\d"
#          degraded: 30
#          failure: 40
#          protocol: "http"
#        - code: "[5]\\d\\d"
#          protocol: "http"
auth:
  strategy: anonymous
login_token:
  signing_key: test
kubernetes_config:
  cache_enabled: true
  cache_duration: 300
  cache_namespaces:
    - bookinfo
    - istio-system
  cache_token_namespace_duration: 120
  excluded_workloads: []
kiali_feature_flags:
  istio_injection_action: true
  istio_upgrade_action: false
istio_labels:
  app_label_name: app
  injection_label_name: istio-injection
  injection_label_rev: istio.io/rev
  version_label_name: version

Local Processes

In this section we will start the 3 local processes for kiali:

kiali-core: The backend Go process
kiali-ui: The frontend React process
browser: The Javascript debugger process.

In this example, we will create the configurations in the Jetbrains Golang IDE.

kiali-core

To run the Kiali backend. kiali-core

kiali-ui

In order to forward the requests to the backend propertly, we will need to add the following line in kiali/frontend/package.json:

"proxy": "http://localhost:8000",

kiali-ui

browser

This process is required to debug the frontend. browser

After running the 3 processes, we should be able to access Kiali GUI in localhost:3000

Using VisualStudio Code

To run kiali in a debugger, a file “launch.json” should be created in your local kiali local repo’s .vscode directory (e.g. home/source/kiali/kiali/.vscode/launch.json). The file should look like:

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Launch Kiali to use hack script services",
      "type": "go",
      "request": "launch",
      "mode": "debug",
      "program": "${workspaceRoot}/kiali.go",
      "cwd": "${env:HOME}/tmp/run-kiali",
      "args": ["-config", "${env:HOME}/tmp/run-kiali/run-kiali-config.yaml"],
      "env": {
        "KUBERNETES_SERVICE_HOST": "127.0.0.1",
        "KUBERNETES_SERVICE_PORT": "8001",
        "LOG_LEVEL": "trace"
      }     
    }
  ]
}

run-kiali.sh should be started like this:

hack/run-kiali.sh --tmp-root-dir $HOME/tmp --enable-server false

9 - Kiali Integrations

Integration with other tools and platforms.

9.1 - Kiali Backstage Plugin

The Kiali Plugin for Backstage

The Kiali Backstage Plugin provides information about each Service Mesh object related with an entity in Backstage, a framework for building developer portals.

Kiali-Backstage-tab

The plugin has different views to be included in Backstage:

Added as cards to see resource lists
Added as a tab which includes all the predefined Kiali cards
Added as a page, which views are not going to be filtered by entities and offer a full Kiali view

Kiali-Backstage-page

The Kiali Backstage plugin is released as a technology preview in Red Hat Developer Hub.

Documentation

Get Involved

Development guide

9.2 - OSSM Console

OpenShift Service Mesh Console - Dynamic plugin for OpenShift

OpenShift Service Mesh Console (OSSMC) is a Kiali integration for OpenShift Console based on OpenShift dynamic plugins technology. It integrates part of the Kiali UI functionality into the OpenShift Console, providing visibility of the Service Mesh.

OSSMC

OSSMC was first released in September 2022 as a developer preview. It has since be released GA in October 2023.

Documentation

Get Involved

Releases

Release list

10 - AI

Use AI assistants with Kiali (Chatbot) and MCP integrations.

10.1 - Kiali Chatbot

Query Kiali and your service mesh using an AI assistant.

Kiali Chatbot is Kiali’s built-in AI assistant in the Kiali UI. It lets you ask questions about your service mesh and get answers backed by live data from Kiali and its configured backends (Prometheus, tracing, Kubernetes, etc.).

It does not require an external MCP server. Kiali includes its own set of MCP-style tools internally, so the AI can call them without depending on a separate MCP deployment.

Kiali Chatbot

Status

The Kiali chatbot was first released in Kiali version 2.22 and it is in Dev preview.

How does it work

At a high level:

The Kiali UI sends your chat request (prompt + context + selected model) to the Kiali backend.
Kiali selects the configured provider/model from chat_ai.
The provider calls the LLM with a set of internal MCP tools (defined in Kiali under kiali/ai/mcp).
The LLM may request tool calls (e.g. mesh graph, traces, resource details, workload logs, Istio config operations).
Kiali executes those tool calls against Kiali/Kubernetes/Prometheus/tracing backends and returns the final answer, including optional UI navigation actions and documentation citations.

For configuration keys (enable/disable, providers/models, store), see the chat_ai section in the Kiali CR spec.

Kiali Chatbot architecture

Tool schemas (inputs/outputs)

Kiali Chatbot uses internal tools with defined input schemas and structured outputs.

Configuring the Kiali Chatbot

The Kiali Chatbot is disabled by default. To enable it, set chat_ai.enabled: true. When enabled, you will see the chatbot icon in the Kiali UI:

You must also configure at least one provider and model (including an API key), and pick a default provider/model.

Switching model providers

Kiali Chatbot providers and models are configured in chat_ai:

Providers: OpenAI-compatible (type: openai) and Google (type: google).
Models are selected by name (per-provider) and can be enabled/disabled.
API keys can be set inline (not recommended) or via secret:<secret-name>:<key-in-secret>.

Example configuration (showing an OpenAI-compatible provider using Gemini via OpenAI endpoint):

chat_ai:
  enabled: true
  default_provider: "openai"
  providers:
    - name: "openai"
      enabled: true
      description: "OpenAI API Provider"
      type: "openai"
      config: "default"
      default_model: "gemini"
      models:
        - name: "gemini"
          enabled: true
          model: "gemini-2.5-pro"
          description: "Model provided by Google with OpenAI API Support"
          endpoint: "https://generativelanguage.googleapis.com/v1beta/openai"
          key: "secret:my-key-secret:openai-gemini"

You can also select the configured models and providers in the chatbot window:

Kiali Chatbot models

What you can ask

Examples of tasks that work well:

Mesh/namespace topology and summaries (graph, status)
Basic observability questions (metrics, traces)
Troubleshooting workflows (get logs for a workload, identify failing namespaces)

Example prompts

“Show me the mesh graph for namespace bookinfo.”
“Which workloads in istio-system look unhealthy and why?”
“Get traces for service productpage in bookinfo for the last 30m.”

Next step

If you want to use an AI assistant outside the Kiali UI (for example, in an IDE), see Kiali MCP.

10.2 - Kiali Chatbot tools (schemas)

Input/output schemas for the built-in Kiali AI tools.

Kiali Chatbot uses internal MCP-style tools (implemented inside Kiali) to fetch live data and perform safe actions. These are not external MCP server tools.

The tool input schemas are defined in Kiali under kiali/ai/mcp/tools/*.yaml. The tool outputs are JSON structures returned by the Kiali backend and consumed by the model and/or UI.

Tool list

get_action_ui: returns UI navigation actions (buttons/links).
get_citations: returns documentation links relevant to the user query.
get_mesh_graph: returns mesh health/topology summaries (and supporting raw payloads).
get_resource_detail: returns service/workload details or lists (same payload shapes as existing Kiali APIs).
get_pod_performance: returns usage vs requests/limits summary (CPU/memory).
get_traces: returns a compact trace summary (bottlenecks/errors).
get_logs: returns workload/pod logs with optional filtering.
manage_istio_config: list/get/create/patch/delete Istio objects (with a confirmation gate for sensitive actions).

10.3 - Kiali MCP

Expose Kiali capabilities to AI assistants using the Model Context Protocol (MCP).

Kiali MCP is an integration that allows MCP-capable AI assistants to query (and optionally manage) Kiali-related data by calling tools exposed by an MCP server.

The implementation is provided as part of the Kubernetes MCP Server upstream and also for Openshift MCP server. It exposes a kiali toolset (see upstream guide: docs/KIALI.md).

Prerequisites

A reachable Kiali endpoint (Route/Ingress/Service URL).
Kubernetes credentials available to the MCP server (kubeconfig or in-cluster config).

Enable the `kiali` toolset

Create a TOML config file and enable kiali in toolsets.

toolsets = ["core", "kiali"]

[toolset_configs.kiali]
url = "https://kiali.example" # Endpoint/route to reach the Kiali console
# insecure = true  # optional: allow insecure TLS (not recommended in production)
# certificate_authority = "/path/to/ca.crt"  # CA bundle for Kiali's TLS cert

Notes:

If url is https:// and insecure = false, you must provide certificate_authority.
Authentication to Kiali is performed using the server’s Kubernetes credentials (it obtains/uses a bearer token for Kiali calls).

Connect from an MCP client

How you wire this into a specific client depends on the client, but the core idea is the same: start the MCP server with your kubeconfig and your TOML config.

Example (conceptual) command:

kubernetes-mcp-server --config /path/to/config.toml --read-only

Once connected, your assistant can use the Kiali tools (for example: mesh graph, metrics, traces, workload logs) to power a chatbot-like experience outside the Kiali UI (for example, in an IDE).