What version of gRPC and what language are you using?
1.51.0, master
What operating system (Linux, Windows,...) and version?
Linux
What runtime / compiler are you using (e.g. python version or version of gcc)
C++, gcc 11
What did you do?
Using proxyless gRPC in Istio, but updating the RouteConfiguration fails.
What did you expect to see?
The RouteConfiguration update succeeds.
What did you see instead?
- After the program starts, the following error is logged, and grpcdebug shows the RouteConfiguration resources as NACKED:
[/data/grpc/src/core/ext/xds/xds_route_config.cc:1126][xds_client 0x448a9860] invalid RouteConfiguration outbound|50051||dev-nick.hlddz.svc.cluster.local: INVALID_ARGUMENT: errors validating RouteConfiguration resource: [field:virtual_hosts[2].routes error:no valid routes in VirtualHost; field:virtual_hosts[3].routes error:no valid routes in VirtualHost]
Name Status Version Type LastUpdated
dev-meshgate:50051 ACKED 2023-01-05T08:07:45Z/1425 type.googleapis.com/envoy.config.listener.v3.Listener 12 seconds ago
dev-nick:50051 ACKED 2023-01-05T07:40:47Z/1423 type.googleapis.com/envoy.config.listener.v3.Listener 22 minutes ago
monitor:50051 DOES_NOT_EXIST type.googleapis.com/envoy.config.listener.v3.Listener
outbound|50051||dev-meshgate.hlddz.svc.cluster.local NACKED type.googleapis.com/envoy.config.route.v3.RouteConfiguration
outbound|50051||dev-nick.hlddz.svc.cluster.local NACKED type.googleapis.com/envoy.config.route.v3.RouteConfiguration
- Dumping the configuration shows that virtual hosts like the following are used for regular HTTP CGI routing:
virtual_hosts {
  name: "inner.mcgitest.hlddz.huanle.qq.com:50051"
  domains: "inner.mcgitest.hlddz.huanle.qq.com"
  domains: "inner.mcgitest.hlddz.huanle.qq.com:50051"
  routes {
    match {
      prefix: "/cgi-bin/CommonMobileCGI/pandoraCgiProxy"
      case_sensitive {
        value: true
      }
    }
    route {
      cluster: "outbound|50051|online|dev-pandoracgiproxy.hlddz.svc.cluster.local"
      timeout {
      }
      retry_policy {
        retry_on: "connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes"
        num_retries {
          value: 2
        }
        retry_host_predicate {
          name: "envoy.retry_host_predicates.previous_hosts"
        }
        host_selection_retry_max_attempts: 5
        retriable_status_codes: 503
      }
      hash_policy {
        header {
          header_name: "routekey"
        }
      }
      max_stream_duration {
        max_stream_duration {
        }
      }
    }
    metadata {
      filter_metadata {
        key: "istio"
        value {
          fields {
            key: "config"
            value {
              string_value: "/apis/networking.istio.io/v1alpha3/namespaces/hlddz/virtual-service/dev-pandoracgiproxy"
            }
          }
        }
      }
    }
    decorator {
      operation: "dev-pandoracgiproxy.hlddz.svc.cluster.local:50051/cgi-bin/CommonMobileCGI/pandoraCgiProxy*"
    }
  }
  include_request_attempt_count: true
}
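- Reading the source code, I found the following logic. When a RouteConfiguration is parsed, each route's match field is validated: a route whose prefix cannot match a gRPC path of the form "/service/method" is ignored. For example, "/cgi-bin/CommonMobileCGI/pandoraCgiProxy" contains more than two slashes, so the route above is ignored. If every route in a VirtualHost is ignored, a "no valid routes in VirtualHost" error is added, and that error causes the entire RouteConfiguration update to be rejected (NACKed), even though only some of its virtual hosts are affected. The relevant source code is: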
const envoy_config_route_v3_Route* const* routes =
    envoy_config_route_v3_VirtualHost_routes(virtual_hosts[i], &num_routes);
for (size_t j = 0; j < num_routes; ++j) {
  ValidationErrors::ScopedField field(errors, absl::StrCat("[", j, "]"));
  auto route = ParseRoute(context, routes[j], virtual_host_retry_policy,
                          rds_update.cluster_specifier_plugin_map,
                          &cluster_specifier_plugins_not_seen, errors);
  if (route.has_value()) vhost.routes.emplace_back(std::move(*route));
}
if (errors->size() == original_error_size && vhost.routes.empty()) {
  errors->AddError("no valid routes in VirtualHost");
}

// Separately, in the route match parsing (prefix case):
if (envoy_config_route_v3_RouteMatch_has_prefix(match)) {
  absl::string_view prefix =
      UpbStringToAbsl(envoy_config_route_v3_RouteMatch_prefix(match));
  // For any prefix that cannot match a path of the form "/service/method",
  // ignore the route.
  if (!prefix.empty()) {
    // Does not start with a slash.
    if (prefix[0] != '/') return absl::nullopt;
    std::vector<absl::string_view> prefix_elements =
        absl::StrSplit(prefix.substr(1), absl::MaxSplits('/', 2));
    // More than 2 slashes.
    if (prefix_elements.size() > 2) return absl::nullopt;
    // Two consecutive slashes.
    if (prefix_elements.size() == 2 && prefix_elements[0].empty()) {
      return absl::nullopt;
    }
  }
  type = StringMatcher::Type::kPrefix;
  match_string = std::string(prefix);
}
- Our service runs in Istio, a large heterogeneous system that hosts both gRPC services and regular HTTP CGI services, so parts of the route configuration may not conform to what gRPC supports. Would it be possible to skip a non-conforming VirtualHost while parsing, instead of discarding the entire configuration update? A modification along these lines:
for (size_t i = 0; i < num_virtual_hosts; ++i) {
  ...
  const envoy_config_route_v3_Route* const* routes =
      envoy_config_route_v3_VirtualHost_routes(virtual_hosts[i], &num_routes);
  for (size_t j = 0; j < num_routes; ++j) {
    ValidationErrors::ScopedField field(errors, absl::StrCat("[", j, "]"));
    auto route = ParseRoute(context, routes[j], virtual_host_retry_policy,
                            rds_update.cluster_specifier_plugin_map,
                            &cluster_specifier_plugins_not_seen, errors);
    if (route.has_value()) vhost.routes.emplace_back(std::move(*route));
  }
  if (errors->size() == original_error_size && vhost.routes.empty()) {
    // Ignore this invalid VirtualHost instead of reporting an error.
    continue;
  }
}
- With this modification applied, the RouteConfiguration updates are ACKed in our system (see the grpcdebug output below). Skipping invalid virtual hosts seems friendlier for heterogeneous environments like ours.
grpcdebug info after the change:
Name Status Version Type LastUpdated
outbound|50051|online|dev-meshgate.hlddz.svc.cluster.local ACKED 2023-01-05T08:07:45Z/1425 type.googleapis.com/envoy.config.cluster.v3.Cluster 13 seconds ago
outbound|50051|online|dev-nick.hlddz.svc.cluster.local ACKED 2023-01-05T08:07:45Z/1425 type.googleapis.com/envoy.config.cluster.v3.Cluster 15 seconds ago
outbound|50051|online|dev-meshgate.hlddz.svc.cluster.local ACKED 2023-01-05T08:07:45Z/1425 type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment 13 seconds ago
outbound|50051|online|dev-nick.hlddz.svc.cluster.local ACKED 2023-01-05T08:07:45Z/1425 type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment 15 seconds ago
dev-meshgate:50051 ACKED 2023-01-05T08:07:45Z/1425 type.googleapis.com/envoy.config.listener.v3.Listener 13 seconds ago
dev-nick:50051 ACKED 2023-01-05T08:07:45Z/1425 type.googleapis.com/envoy.config.listener.v3.Listener 15 seconds ago
outbound|50051||dev-meshgate.hlddz.svc.cluster.local ACKED 2023-01-05T08:07:45Z/1425 type.googleapis.com/envoy.config.route.v3.RouteConfiguration 13 seconds ago
outbound|50051||dev-nick.hlddz.svc.cluster.local ACKED 2023-01-05T08:07:45Z/1425 type.googleapis.com/envoy.config.route.v3.RouteConfiguration 13 seconds ago
Anything else we should know about your project / environment?
Both 1.51.x and the latest master have this problem; the source code quoted above is from the latest master.