{"id":756,"date":"2025-05-08T03:44:39","date_gmt":"2025-05-08T03:44:39","guid":{"rendered":"https:\/\/www.sdwan2.com\/?p=756"},"modified":"2025-05-08T06:25:56","modified_gmt":"2025-05-08T06:25:56","slug":"velocloud-virtual-edge-in-alibaba-cloud-with-havip","status":"publish","type":"post","link":"https:\/\/www.sdwan2.com\/index.php\/2025\/05\/08\/velocloud-virtual-edge-in-alibaba-cloud-with-havip\/","title":{"rendered":"Velocloud Virtual Edge HA in Alibaba Cloud with HaVip"},"content":{"rendered":"\n<p><mark style=\"background-color:#fcb900\" class=\"has-inline-color has-black-color\"><strong>Disclaimer: The method, workaround, idea and script in this article are NOT supported by Velocloud. Use them at your own risk.<\/strong><\/mark><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Background<\/h2>\n\n\n\n<p>When Velocloud SD-WAN Edge (VCE) is deployed as hardware, or as a virtual edge on KVM\/ESXi, High Availability (HA) is supported. However, in public clouds, including Alibaba Cloud, VCE HA is not possible, for the following reasons:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The VCEs in an HA pair discover each other by multicast, and multicast is not supported in public clouds.<\/li>\n\n\n\n<li>The HA interfaces are always automatically assigned the IP addresses 169.254.2.1 and 169.254.2.2. 
This is not possible in a public cloud, because each VPC comes with its own address block and we cannot assign an IP address outside of that block.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Alibaba Cloud High-Availability Virtual IP Address (HAVIP)<\/h2>\n\n\n\n<p>Alibaba Cloud has a feature called HaVip. Details can be found here: <a href=\"https:\/\/www.alibabacloud.com\/help\/en\/vpc\/user-guide\/highly-available-virtual-ip-address-havip\">https:\/\/www.alibabacloud.com\/help\/en\/vpc\/user-guide\/highly-available-virtual-ip-address-havip<\/a><\/p>\n\n\n\n<p>Some other vendors support HA with the HaVip feature, because their appliances allow the IP address of the HA interface to be configured and can run VRRP over unicast (instead of multicast).<\/p>\n\n\n\n<p>Since VCE cannot run HA communication over unicast, this article shows how a script running on the VCE itself can work around this limitation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Idea<\/h3>\n\n\n\n<p>The idea of the workaround is simple. There are two VCEs, each working independently. Logically, however, one VCE works as the primary and the other as the secondary (called the primary VCE and the secondary VCE from now on). A secondary IP address is configured on the LAN interface of both VCEs, and that secondary IP is the HaVip. Since this would result in an IP address conflict, the LAN interface of the secondary VCE is intentionally shut down. A Python script running on the secondary VCE continuously pings the WAN interface of the primary VCE. If the ping succeeds (meaning the primary VCE is up), nothing is done. If the ping fails (meaning the primary VCE is down), the script brings up the LAN interface of the secondary VCE so that the secondary VCE can take over. 
When the primary VCE has been reachable again for a while, the script shuts the secondary VCE LAN interface down again. In this way, the script makes sure that traffic from the VPC to the HaVip reaches the primary VCE unless the primary VCE is down.<\/p>\n\n\n\n<p>For the remote sites, we also need to ensure that traffic prefers the primary VCE. Since the secondary VCE LAN interface is intentionally shut down, the secondary VCE does not advertise the VPC routes (they are marked as not reachable). As an extra precaution, the primary VCE advertises the VPC routes with cost 0, while the secondary VCE advertises them with cost 10. Since a lower cost is preferred, traffic from the remote sites always prefers the primary VCE as long as the primary VCE is up and running.<\/p>\n\n\n\n<p>Because the VCEs do not really run VRRP, and the idea above only lets them work with Alibaba Cloud HaVip, there are caveats: the setup is not officially supported, there is some network interruption when the primary VCE comes back, and so on. On the other hand, the script does not need to touch the VPC route table, so it should require minimal maintenance.<\/p>\n\n\n\n<!--nextpage-->\n\n\n\n<h2 class=\"wp-block-heading\">Topology for demonstration<\/h2>\n\n\n\n<p>VCE version: R5241-20241112-GA-5008849603<\/p>\n\n\n\n<p>The following diagram shows the topology used for the demonstration.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1547\" height=\"732\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/topology-test.jpg\" alt=\"\" class=\"wp-image-764\"\/><figcaption class=\"wp-element-caption\">Figure 1 &#8211; Topology for demonstration<\/figcaption><\/figure>\n\n\n\n<p>In the Alibaba Cloud VPC, there are two virtual edges: Ali-HKVCE-Pri is the primary VCE and Ali-HKVCE-Sec is the secondary VCE. 
Let&#8217;s check the static route settings:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"2534\" height=\"956\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/pri-static-routes.jpg\" alt=\"\" class=\"wp-image-767\"\/><figcaption class=\"wp-element-caption\">Figure 2 &#8211; Primary VCE static routes 10.200.190.0\/24 where the cost is 0<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"2521\" height=\"961\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/secondary-static-routes.jpg\" alt=\"\" class=\"wp-image-768\"\/><figcaption class=\"wp-element-caption\">Figure 3 &#8211; Secondary VCE static routes 10.200.190.0\/24 where the cost is 10<\/figcaption><\/figure>\n\n\n\n<p>Both VCEs advertise the VPC subnet 10.200.190.0\/24: the primary VCE with cost 0 and the secondary VCE with cost 10. Since a lower cost is preferred, the remote sites prefer the primary VCE. This is only a precaution, because the secondary VCE Ali-HKVCE-Sec does not advertise 10.200.190.0\/24 while its LAN interface is down.<\/p>\n\n\n\n<p>There is a spoke site called RT-Spoke1, with both Ali-HKVCE-Pri and Ali-HKVCE-Sec assigned as hubs. RT-Spoke1 has the LAN subnet 10.11.1.0\/24. A PC with IP address 10.11.1.99 is attached to the LAN side of RT-Spoke1; this PC initiates the pings for the test. 
Let&#8217;s check the tunnels and routes on RT-Spoke1 to confirm that it learns the route 10.200.190.0\/24 from the primary VCE Ali-HKVCE-Pri:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1784\" height=\"297\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/spoke-path.jpg\" alt=\"\" class=\"wp-image-771\"\/><figcaption class=\"wp-element-caption\">Figure 4 &#8211; tunnels status at RT-Spoke1<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1795\" height=\"153\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/spoke-route.jpg\" alt=\"\" class=\"wp-image-772\"\/><figcaption class=\"wp-element-caption\">Figure 5 &#8211; route 10.200.190.0\/24 status at RT-Spoke1<\/figcaption><\/figure>\n\n\n\n<p>As shown above, RT-Spoke1 forms tunnels to both Ali-HKVCE-Pri and Ali-HKVCE-Sec, but it learns 10.200.190.0\/24 only from Ali-HKVCE-Pri. This is because the Ali-HKVCE-Sec LAN interface is shut down, so Ali-HKVCE-Sec does not advertise 10.200.190.0\/24.<\/p>\n\n\n\n<!--nextpage-->\n\n\n\n<h2 class=\"wp-block-heading\">Python script flow<\/h2>\n\n\n\n<p>The following flow chart shows the flow of the Python script for reference. You can skip it if you just want to implement the script.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1149\" height=\"1826\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/failover-exp-2.bmp\" alt=\"\" class=\"wp-image-802\"\/><figcaption class=\"wp-element-caption\">Figure 6 &#8211; python script flow<\/figcaption><\/figure>\n\n\n\n<!--nextpage-->\n\n\n\n<h2 class=\"wp-block-heading\">The Python script running on the secondary VCE<\/h2>\n\n\n\n<p>In this test, the script is called failover_exp.py and is placed in the \/opt\/vc\/bin folder. 
The content of failover_exp.py is shown in the code box below.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/python3\nimport subprocess\nimport time\nfrom datetime import datetime\n\n# Look for the following in main() and adjust the IP addresses and interface to suit your environment\n#    target_ip = \"10.200.203.10\"\n#    This target_ip is the primary VCE WAN IP (use the IP address on the interface, not the public IP address)\n#    source_ip = \"10.200.203.11\"\n#    This source_ip is the secondary VCE WAN IP (again, use the IP address on the interface, not the public IP address)\n#    lan_default_gateway = \"10.200.202.253\"\n#    lan_default_gateway is the default gateway of the LAN side Alibaba vSwitch\n#    interface = \"eth1\"\n#    interface is the LAN side interface, i.e. the interface where the HaVip is located\n#\n\ndef ping_ip(destination_ip, source_ip):\n    \"\"\"\n    Pings the destination IP once using source_ip as the origin.\n    The '-c 1' flag sends only one packet.\n    Returns True if ping is successful (exit code 0), otherwise False.\n    \"\"\"\n    try:\n        result = subprocess.run(\n            &#91;\"ping\", \"-I\", source_ip, \"-c\", \"1\", destination_ip],\n            stdout=subprocess.DEVNULL,\n            stderr=subprocess.DEVNULL\n        )\n        return result.returncode == 0\n    except Exception as e:\n        print(f\"Error pinging {destination_ip} from {source_ip}: {e}\")\n        return False\n\ndef is_interface_up(interface):\n    \"\"\"\n    Checks whether the given network interface is up by invoking 'ifconfig &lt;interface&gt;'\n    and searching for the 'RUNNING' flag in its output.\n    Returns True if 'RUNNING' is present, otherwise False.\n    \"\"\"\n    try:\n        result = subprocess.run(\n            &#91;\"ifconfig\", interface],\n            capture_output=True,\n            text=True\n        )\n        if result.returncode != 0:\n            # If getting interface info fails, treat the 
interface as down.\n            return False\n        # The 'RUNNING' flag generally indicates the interface is active.\n        return \"RUNNING\" in result.stdout\n    except Exception as e:\n        print(f\"Error checking interface {interface}: {e}\")\n        return False\n\ndef bring_interface_up(interface):\n    \"\"\"\n    Brings the given network interface up using 'ifconfig &lt;interface&gt; up'.\n    \"\"\"\n    try:\n        subprocess.run(&#91;\"ifconfig\", interface, \"up\"])\n        print(f\"Interface {interface} has been brought up.\")\n    except Exception as e:\n        print(f\"Error bringing up interface {interface}: {e}\")\n\ndef bring_interface_down(interface):\n    \"\"\"\n    Brings the given network interface down using 'ifconfig &lt;interface&gt; down'.\n    \"\"\"\n    try:\n        subprocess.run(&#91;\"ifconfig\", interface, \"down\"])\n        print(f\"Interface {interface} has been brought down.\")\n    except Exception as e:\n        print(f\"Error bringing down interface {interface}: {e}\")\n\ndef main():\n    target_ip = \"10.200.203.10\"\n    source_ip = \"10.200.203.11\"\n    lan_default_gateway = \"10.200.202.253\"\n    interface = \"eth1\"\n\n    # Log the event\n    with open(\"\/var\/log\/failover_exp.log\", \"a\") as log_file:\n        current_time = datetime.now().strftime(\"%Y-%m-%d %H:%M:%S\")\n        log_file.write(f\"{current_time} - failover_exp started\\n\")\n\n    while True:\n        if ping_ip(target_ip, source_ip):\n            print(f\"Ping to {target_ip} is successful from {source_ip}.\")\n            if is_interface_up(interface):\n                print(f\"Interface {interface} is currently up. 
Sleeping for 60 seconds...\")\n                time.sleep(60)\n\n                # Perform a second ping after sleeping.\n                if ping_ip(target_ip, source_ip):\n                    print(f\"Second ping to {target_ip} is also successful.\")\n                    time.sleep(60)\n                    # Perform a third ping; fail back only after three successful pings about a minute apart.\n                    if ping_ip(target_ip, source_ip):\n                        print(f\"Third ping to {target_ip} is also successful. Bringing interface {interface} down...\")\n                        bring_interface_down(interface)\n                        # Log the event\n                        with open(\"\/var\/log\/failover_exp.log\", \"a\") as log_file:\n                            current_time = datetime.now().strftime(\"%Y-%m-%d %H:%M:%S\")\n                            log_file.write(f\"{current_time} - Interface {interface} has been brought down. Which means failover back to primary\\n\")\n                    else:\n                        print(f\"Third ping to {target_ip} failed. Keeping interface {interface} up.\")\n                else:\n                    print(f\"Second ping to {target_ip} failed. Keeping interface {interface} up.\")\n            else:\n                print(f\"Interface {interface} is already down.\")\n        else:\n            print(f\"Ping to {target_ip} failed from {source_ip}.\")\n            # Check the state of the interface if the ping fails.\n            if is_interface_up(interface):\n                print(f\"Interface {interface} is already up.\")\n            else:\n                print(f\"Interface {interface} is currently down. Bringing it up...\")\n                bring_interface_up(interface)\n                # Log the event\n                with open(\"\/var\/log\/failover_exp.log\", \"a\") as log_file:\n                    current_time = datetime.now().strftime(\"%Y-%m-%d %H:%M:%S\")\n                    log_file.write(f\"{current_time} - Interface {interface} has been brought up. 
Which means secondary take over as primary\\n\")\n                time.sleep(1)\n                subprocess.run(&#91;\"ping\", \"-c\", \"2\", lan_default_gateway])\n\n        # Pause for two seconds before the next check.\n        time.sleep(2)\n\nif __name__ == \"__main__\":\n    main()<\/code><\/pre>\n\n\n\n<!--nextpage-->\n\n\n\n<h2 class=\"wp-block-heading\">Deploy the failover_exp.py<\/h2>\n\n\n\n<p>Copy the content of the previous code block to \/opt\/vc\/bin\/failover_exp.py. A few parameters (IP addresses and the interface name) need to be changed in the script to make it work in your environment. These parameters are on lines 76 to 79, the first four lines of main(). Refer to the network diagram in this post (Figure 1) to understand what those IP addresses mean. The interface is the LAN interface; in this example it is GE2, which maps to eth1. If you also use GE2 as the LAN interface, the interface parameter does not need to be changed.<\/p>\n\n\n\n<p>Make sure failover_exp.py has executable permission (for example, by running chmod +x \/opt\/vc\/bin\/failover_exp.py):<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"949\" height=\"55\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/script_permission.jpg\" alt=\"\" class=\"wp-image-776\"\/><figcaption class=\"wp-element-caption\">Figure 7 &#8211; failover_exp.py permission<\/figcaption><\/figure>\n\n\n\n<p>To start failover_exp.py automatically when the VCE boots up, add the following line to \/etc\/rc.local:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>nohup \/opt\/vc\/bin\/failover_exp.py &gt;\/dev\/null 2&gt;&amp;1 &amp;<\/code><\/pre>\n\n\n\n<p>The \/etc\/rc.local then becomes:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"688\" height=\"112\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/rclocal.jpg\" alt=\"\" class=\"wp-image-777\"\/><figcaption class=\"wp-element-caption\">Figure 8 &#8211; the content of 
\/etc\/rc.local<\/figcaption><\/figure>\n\n\n\n<!--nextpage-->\n\n\n\n<h2 class=\"wp-block-heading\">HaVip at Alibaba Cloud<\/h2>\n\n\n\n<p>Before looking at the HaVip, here are the ENIs of the primary VCE Ali-HKVCE-Pri and the secondary VCE Ali-HKVCE-Sec:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"2270\" height=\"385\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/primary-eni-list.jpg\" alt=\"\" class=\"wp-image-780\"\/><figcaption class=\"wp-element-caption\">Figure 9 &#8211; Ali-HKVCE-Pri ENI list<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"2283\" height=\"403\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/secondary-eni-list.jpg\" alt=\"\" class=\"wp-image-781\"\/><figcaption class=\"wp-element-caption\">Figure 10 &#8211; Ali-HKVCE-Sec ENI list<\/figcaption><\/figure>\n\n\n\n<p>The HaVip used in this demonstration is called HAVIP-b-private, with address 10.200.202.9. The LAN-side ENI of the primary VCE is bound as the primary, while the LAN-side ENI of the secondary VCE is bound as the standby. The following shows the HaVip configuration and the associated interfaces:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1625\" height=\"962\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/HA-VIP.jpg\" alt=\"\" class=\"wp-image-782\"\/><figcaption class=\"wp-element-caption\">Figure 11 &#8211; HaVip and the association<\/figcaption><\/figure>\n\n\n\n<!--nextpage-->\n\n\n\n<h2 class=\"wp-block-heading\">Secondary IP address at the VCE<\/h2>\n\n\n\n<p>The remaining question is how the VCE handles the HaVip: since the VCE does not actually run VRRP, it has no knowledge or configuration of the HaVip (floating IP) 10.200.202.9. The trick is to add 10.200.202.9 as a secondary IP on both the primary and the secondary VCE. 
Normally this would cause an IP address conflict, but because the secondary VCE LAN interface is shut down, no conflict occurs. The following screen captures show the corresponding secondary IP address configuration.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"2432\" height=\"845\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/pri-secondaryIP.jpg\" alt=\"\" class=\"wp-image-785\"\/><figcaption class=\"wp-element-caption\">Figure 12 &#8211; Secondary IP 10.200.202.9 added as GE2:1:SIP on Ali-HKVCE-Pri<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"2434\" height=\"786\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/sec-secondaryIP.jpg\" alt=\"\" class=\"wp-image-786\"\/><figcaption class=\"wp-element-caption\">Figure 13 &#8211; Secondary IP 10.200.202.9 added as GE2:1:SIP on Ali-HKVCE-Sec<\/figcaption><\/figure>\n\n\n\n<p>The following screen capture shows the ifconfig output on Ali-HKVCE-Pri; the secondary IP 10.200.202.9 is on the interface eth1:5000.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1212\" height=\"948\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/ifconfig-secondary.jpg\" alt=\"\" class=\"wp-image-787\"\/><figcaption class=\"wp-element-caption\">Figure 14 &#8211; ifconfig output at Ali-HKVCE-Pri<\/figcaption><\/figure>\n\n\n\n<!--nextpage-->\n\n\n\n<h2 class=\"wp-block-heading\">Failover test by powering off Ali-HKVCE-Pri<\/h2>\n\n\n\n<p>The setup is now ready for the failover test. 
From the PC (10.11.1.99) on the RT-Spoke1 LAN side, run a traceroute to the server 10.200.190.108 in the Alibaba VPC.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"480\" height=\"299\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/tracert-ori.jpg\" alt=\"\" class=\"wp-image-789\"\/><figcaption class=\"wp-element-caption\">Figure 15 &#8211; tracert to 10.200.190.108<\/figcaption><\/figure>\n\n\n\n<p>The tracert shows that the second hop is 10.200.203.10, which is Ali-HKVCE-Pri. Next, start a continuous ping to 10.200.190.108.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"549\" height=\"554\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/continous-ping.jpg\" alt=\"\" class=\"wp-image-790\"\/><figcaption class=\"wp-element-caption\">Figure 16 &#8211; Continuous ping to 10.200.190.108<\/figcaption><\/figure>\n\n\n\n<p>While the ping is running, power off Ali-HKVCE-Pri and observe whether the failover succeeds. Three pings are lost (around 15 seconds) during the failover:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"571\" height=\"738\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/ping-failover.jpg\" alt=\"\" class=\"wp-image-792\"\/><figcaption class=\"wp-element-caption\">Figure 17 &#8211; 3 pings lost during failover<\/figcaption><\/figure>\n\n\n\n<p>Let&#8217;s run the traceroute again.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"468\" height=\"314\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/tracert-after.jpg\" alt=\"\" class=\"wp-image-793\"\/><figcaption class=\"wp-element-caption\">Figure 18 &#8211; trace route result after failover<\/figcaption><\/figure>\n\n\n\n<p>The second hop of the traceroute is now 10.200.203.11, which belongs to Ali-HKVCE-Sec. This confirms that the failover succeeded. 
The Python script failover_exp.py writes a simple message, &#8220;Interface eth1 has been brought up. Which means secondary take over as primary&#8221;, to \/var\/log\/failover_exp.log.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"970\" height=\"208\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/log-out-put1.jpg\" alt=\"\" class=\"wp-image-803\"\/><figcaption class=\"wp-element-caption\">Figure 19 &#8211; sample of \/var\/log\/failover_exp.log<\/figcaption><\/figure>\n\n\n\n<!--nextpage-->\n\n\n\n<h2 class=\"wp-block-heading\">When the primary VCE comes up<\/h2>\n\n\n\n<p>In the previous section, the primary VCE Ali-HKVCE-Pri was powered off to test the failover. What happens when Ali-HKVCE-Pri is powered back on? Again, since the VCEs are not running VRRP, the HaVip 10.200.202.9 is just a secondary IP address. When the primary VCE boots up, it needs time to initialize, including starting the edged service and establishing its tunnels. The LAN interface may already be working before the tunnels are up, so the HaVip can send traffic to the primary VCE even though its tunnels are not established yet. As a result, while the primary VCE is coming back, some loss of connectivity between the peer VCE and the VCE in Alibaba Cloud is unavoidable. 
The screen capture below shows this situation: there are three short bursts of ping loss while the primary VCE boots up.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"528\" height=\"708\" src=\"https:\/\/www.sdwan2.com\/wp-content\/uploads\/2025\/05\/return-some-loss.jpg\" alt=\"\" class=\"wp-image-797\"\/><figcaption class=\"wp-element-caption\">Figure 20 &#8211; ping loss while the primary VCE is booting up<\/figcaption><\/figure>\n\n\n\n<!--nextpage-->\n\n\n\n<h2 class=\"wp-block-heading\">How about bringing the secondary VCE up?<\/h2>\n\n\n\n<p>Assume the primary VCE is up and the secondary VCE is off. When the secondary VCE is powered on, its LAN interface with the secondary IP is initialized and brought up during boot, which can cause the HaVip to send traffic to the secondary VCE instead. Once the secondary VCE has fully booted, \/opt\/vc\/bin\/failover_exp.py starts and shuts the LAN interface down, at which point the environment becomes stable. The script cannot be too aggressive here: it should not shut the LAN interface down as soon as it can ping the primary VCE WAN IP, because the primary VCE might itself still be initializing. This is why it waits for three successful pings, about a minute apart, before shutting the interface down.<\/p>\n\n\n\n<p>Thus, when the primary VCE is up and the secondary VCE is brought up, there is a &#8220;chaos period&#8221; of a few minutes during which connectivity goes on and off. To avoid this, the administrator can unbind the secondary VCE ENI from the HaVip first, power up the secondary VCE, wait until the script has shut its LAN interface down, and then bind the secondary VCE ENI back to the HaVip.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Possible further enhancements<\/h2>\n\n\n\n<p>The following enhancements could be made:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Add a job to monitor the Python script. Currently, \/opt\/vc\/bin\/failover_exp.py runs in an infinite loop. 
If the script is terminated for some reason, there is no process to start it again.<\/li>\n\n\n\n<li>A better check of the primary VCE status. Currently, the script only checks the primary VCE by pinging its WAN interface IP address. A successful ping does not prove that the primary VCE is actually functioning, so a more precise check could be added.<\/li>\n<\/ol>\n\n\n\n<p>If you plan to use this Python script, feel free to enhance or tune it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Together with Alibaba Cloud HaVip, failover_exp.py makes Velocloud virtual edge failover possible without an external VM or function to run a script, and without touching the VPC route table. I hope this script is helpful when you need HA for VCEs in Alibaba Cloud.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Disclaimer: The method, workaround, idea and script in this article are NOT supported by Velocloud. Use them at your own risk. Background When Velocloud SD-WAN Edge (VCE) is deployed as hardware, or as a virtual edge on KVM\/ESXi, High Availability (HA) is supported. However, in public clouds, including Alibaba Cloud, VCE HA is not possible. 
This is because: Alibaba [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"zakra_sidebar_layout":"customizer","zakra_remove_content_margin":false,"zakra_sidebar":"customizer","zakra_transparent_header":"customizer","zakra_logo":0,"zakra_main_header_style":"default","zakra_menu_item_color":"","zakra_menu_item_hover_color":"","zakra_menu_item_active_color":"","zakra_menu_active_style":"","zakra_page_header":true,"footnotes":""},"categories":[11,6,5],"tags":[],"class_list":["post-756","post","type-post","status-publish","format-standard","hentry","category-public-cloud","category-unofficial","category-velocloud"],"_links":{"self":[{"href":"https:\/\/www.sdwan2.com\/index.php\/wp-json\/wp\/v2\/posts\/756","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sdwan2.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sdwan2.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sdwan2.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sdwan2.com\/index.php\/wp-json\/wp\/v2\/comments?post=756"}],"version-history":[{"count":47,"href":"https:\/\/www.sdwan2.com\/index.php\/wp-json\/wp\/v2\/posts\/756\/revisions"}],"predecessor-version":[{"id":826,"href":"https:\/\/www.sdwan2.com\/index.php\/wp-json\/wp\/v2\/posts\/756\/revisions\/826"}],"wp:attachment":[{"href":"https:\/\/www.sdwan2.com\/index.php\/wp-json\/wp\/v2\/media?parent=756"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sdwan2.com\/index.php\/wp-json\/wp\/v2\/categories?post=756"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sdwan2.com\/index.php\/wp-json\/wp\/v2\/tags?post=756"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}