Recently a customer decided to literally pull the power cable to test the resiliency of their OVS/OVM platform. We have UPSs installed, and the associated software can and will (I have tested it) shut down machines before power is lost to prevent corruption. Because of the size of this institute, we run OVM within a VM on OVS, and I only use XE for the database. We understand this isn't an environment Oracle fully supports, but at the moment it is what it is.
First off, a big thanks to Wim Coekaerts and Avi Miller. While they didn't help me directly, their blog and forum posts were really helpful in diagnosing and ultimately resolving the problem.
Usually you can check the log files and get a good idea of what is going on (e.g. the database isn't running, memory issues, etc.). Not this time. There were no errors during startup, and only one error was logged during login, almost identical to the one shown on screen:
Unexpected error during login (com.oracle.ovm.mgr.api.exception.FailedOperationException: OVMAPI_6000E Internal Error: Connection refused Connection refused
Connection refused to what? Which port? Which host? There were no details at all.
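When a log does contain errors, as the login one did here, the OVMAPI_ codes are a quick thing to search for. Below is a minimal sketch, assuming you pass in the path of whichever Oracle VM Manager log you are checking (the function name ovm_errors is hypothetical). It prints each distinct OVMAPI code with a count, most frequent first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise OVMAPI_ error codes found in a log file.
# Usage: ovm_errors /path/to/logfile
ovm_errors() {
  local log="$1"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  # -o prints only the matched code (e.g. OVMAPI_6000E); then count duplicates
  # and sort by count, highest first.
  grep -oE 'OVMAPI_[0-9]+[A-Z]?' "$log" | sort | uniq -c | sort -rn
}
```

It won't tell you which port was refused, but it does show quickly whether the same internal error keeps recurring.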
Down the rabbit hole I went. Is the database up and running? Can I connect to it? Is the TNS listener up? Can I use ovm_admin? All of these seemed to be working, but I still couldn't log in to the web GUI.
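One way to narrow down "connection refused to what" is to probe the likely ports yourself. Below is a minimal sketch using bash's /dev/tcp and GNU coreutils timeout. The ports it checks are assumptions about a stock install: 1521 for the TNS listener and 7002 for the Oracle VM Manager web UI. Substitute your own.

```shell
#!/usr/bin/env bash
# Report whether a TCP port accepts connections, using bash's /dev/tcp.
# Usage: check_port HOST PORT  -> prints "HOST:PORT open" or "HOST:PORT closed"
check_port() {
  local host="$1" port="$2"
  # The connection is opened in a child bash, so fd 3 closes when it exits.
  # timeout guards against hosts that silently drop packets instead of refusing.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} open"
  else
    # "closed" covers refused, unreachable and timed-out connections alike.
    echo "${host}:${port} closed"
  fi
}

# Assumed default ports (listener, manager web UI); adjust for your installation.
for port in 1521 7002; do
  check_port localhost "$port"
done
```

If both ports show as open and the GUI still refuses logins, the problem is probably behind the web tier rather than in the listener.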
I don't really like being destructive, as you may or may not be able to recover everything through discovery etc. In the end, though, I could not figure out what was going on, so I had to destroy the database and start almost from scratch.
Basically, these are the steps I took to get back to a state where I could log in: stop OVMM, export the database (just in case), take a backup of the .config file, and then drop the database using the ovm_upgrade.sh script.
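/etc/init.d/ovmm stop                # stop Oracle VM Manager
su - oracle
exp OVS/                             # export the OVS schema, just in case
exit
cp /u01/app/oracle/ovm-manager-3/.config /root/ovm_config_backup
cd /u01/app/oracle/ovm-manager-3/bin
./ovm_upgrade.sh --dbsid=orcl --dbuser=ovs --dbpass= --deletedb   # drops the database
/etc/init.d/ovmm start               # repopulates the database on startup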
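Because --deletedb is destructive, it is worth confirming that the .config copy really exists before running it. Below is a minimal sketch with a hypothetical helper name. Instead of the single /root/ovm_config_backup file used above, it writes to a timestamped destination of my own choosing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the manager's .config to a timestamped backup
# and verify the copy byte-for-byte before anything destructive runs.
# Usage: backup_ovm_config SOURCE DEST_DIR  -> prints the backup path
backup_ovm_config() {
  local src="$1" dest_dir="$2"
  local dest
  dest="${dest_dir}/ovm_config_backup.$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest_dir" || return 1
  cp -p "$src" "$dest" || return 1
  # cmp -s is silent and returns non-zero if the files differ.
  if ! cmp -s "$src" "$dest"; then
    echo "backup of $src differs from the original" >&2
    return 1
  fi
  echo "$dest"
}

# Example guard before the destructive step (paths are assumptions):
# backup_ovm_config /u01/app/oracle/ovm-manager-3/.config /root/ovm_backups || exit 1
```

Failing loudly here is cheap compared with discovering after the drop that the only copy of .config never got written.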
Startup took longer than usual this time, as it repopulates the database. I could now log in, but everything was gone. Well, step one was complete.
Now you need to go and discover the host again.
Once you have discovered the host, go back to the home screen, find all the storage pools, and refresh them. This should find all your ISOs and VM images, and your VMs should start to appear.
Once your VMs are found, they should be in the folder for VMs not owned by any cluster. You then need to migrate them back into the server pool.
That should be about it. You are back up and running.