Recently I upgraded the iDRAC firmware on a Dell R730 with an iDRAC 8 and hit a strange issue: the iDRAC went into a reboot loop. The upgrade was initially from 184.108.40.206 to 220.127.116.11. After the upgrade, the web UI was reachable only for short periods, usually just long enough for me to log in but not do anything else. I could also SSH to the iDRAC briefly, again just long enough to log in but not to run any commands. Physically, the lights on the iDRAC network port turned off and came back on after a while, and the switch showed the link going down and up.
The first time it happened I placed a support call with Dell, who sent an engineer out to replace the system board. After the system board was replaced the iDRAC started working again; the replacement board was running firmware 18.104.22.168. I then attempted the firmware upgrade again, this time directly from the Dell FTP server, but the same issue occurred.
This time I managed to stay connected over SSH for longer, although the iDRAC was still rebooting every minute or so. I rolled back the firmware by waiting for a reboot, then connecting over SSH as soon as the iDRAC responded to ping requests and issuing a rollback command like this:
firstname.lastname@example.org's password:
/admin1-> racadm fwupdate -r
Successfully Rollback.
/admin1->
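The timing here was tight, so the wait-for-ping-then-SSH step could in principle be scripted. The following is only a rough sketch of that idea, not something I ran at the time. It assumes a Linux-style `ping` (`-c`/`-W` flags), an `ssh` client on the PATH, and that the iDRAC accepts `racadm fwupdate -r` as a remote SSH command rather than only at the interactive prompt. The function names are made up for illustration.

```python
import subprocess
import time


def host_is_up(host):
    """Return True if a single ping succeeds (assumes Linux-style ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_rollback(host, user):
    """Run the rollback command over SSH; True if ssh exits with status 0.

    Assumption: the iDRAC runs racadm commands passed on the ssh command line.
    ssh will prompt for the password interactively.
    """
    result = subprocess.run(["ssh", f"{user}@{host}", "racadm", "fwupdate", "-r"])
    return result.returncode == 0


def rollback_when_reachable(host, user, probe=host_is_up, action=run_rollback,
                            attempts=600, delay=1.0, sleep=time.sleep):
    """Poll until the host answers a probe, then attempt the rollback.

    If the probe fails, or the rollback attempt fails (for example because the
    iDRAC rebooted mid-session), wait `delay` seconds and try again, up to
    `attempts` times. Returns True once the rollback command succeeds.
    """
    for _ in range(attempts):
        if probe(host) and action(host, user):
            return True
        sleep(delay)
    return False
```

Calling `rollback_when_reachable("idrac.example.net", "root")` (hostname and user are placeholders) would keep polling and retry the rollback until one attempt gets through. The probe and action are passed in as parameters so the retry logic can be exercised without touching real hardware.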
After about 5 minutes I was back in, and everything seemed to be working again (albeit on the older version). I was then able to access the lifecycle log for the server. This is what was logged every time the iDRAC restarted:
RAC0182 - The iDRAC firmware was rebooted with the following reason: FP SPI FS Recovery.
RAC0182 - The iDRAC firmware was rebooted with the following reason: user initiated.
UPDATE (April 2021): This was posted in 2018, and I have still not found a workaround or fix for the affected servers. It looks like they will be stuck on the old iDRAC release until they are removed from service.