Save configuration error

  • Problem
  • Updated 3 years ago
  • Solved
During a maintenance window, after making some changes to a stack configuration (the stack consists of two X460s and one X440), we wanted to save the configuration to primary.cfg, so we ran the save configuration command and answered its prompt:


The configuration file primary.cfg already exists.
Do you want to save configuration to primary.cfg and overwrite it? (y/N) Yes


After that we received this message:

Error: This command cannot be executed during configuration save.


After looking, we could not find a way to check the state of the save process. The only thing we found is that the last save attempt was made by the Ridgeline server three days ago:


 09/20/2015 19:06:23.72 <Info:cli.logRemoteCmd> Slot-1: x.x.x.x (telnet) userXXX: SAVE CONFIGURATION

But the last configuration save was apparently made on June 1st (output of the show switch command below):

primary.cfg       Created by ExtremeXOS version 15.5.3.4
                  1053995 bytes saved on Mon Jun  1 01:37:34 2015
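
The mismatch between the two timestamps above is the main hint that a save was started but never completed. A minimal sketch of that comparison, using only the two values quoted in this post (the function name `save_looks_stale` is hypothetical, not part of any ExtremeXOS tooling):

```python
from datetime import datetime

# Timestamps copied from the log line and the 'show switch' output above.
log_save_attempt = datetime.strptime("09/20/2015 19:06:23", "%m/%d/%Y %H:%M:%S")
file_saved = datetime.strptime("Mon Jun  1 01:37:34 2015", "%a %b %d %H:%M:%S %Y")


def save_looks_stale(attempt, saved):
    """Return True if a save was attempted after the file's last recorded save."""
    return attempt > saved


gap = log_save_attempt - file_saved
print(save_looks_stale(log_save_attempt, file_saved), gap.days)
```

A logged save attempt that is newer than the file's save time suggests the attempt did not finish, which would be consistent with the "cannot be executed during configuration save" error.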


How can I check on and release the save process (other than by rebooting)?

Thanks a lot.

Tristan
Fauriant Tristan

Posted 3 years ago


Taykin Izzet, Employee

Tristan, are you able to save to a configuration file other than primary.cfg, using 'save configuration <new-config>'?

Also, are there any other telnet/SSH sessions open, or any polling going on, that could be preventing access to resources?

Review the 'top' output for CLI, SSH, SNMP, and any other processes with high usage.

Fauriant Tristan

Hello,

Thanks for your response.

I tried saving the configuration to another file as you suggested, and the same error appears.

save conf test
Do you want to save configuration to test.cfg? (y/N) Yes
Error: This command cannot be executed during configuration save.

The show session command shows that my connection is the only active one:

 sh session
                                                            
    #       Login Time               User    
================================================================================
*68         Thu Sep 24 10:00:53 2015 userXXX


Next, I removed the switch from Ridgeline management, and the result is the same.

Here is the output of the top command. I don't see anything unusual. Does anybody have an idea?

  Load average: 7.53 7.55 7.47 3/204 11799

  PID  PPID USER     STAT   RSS %MEM CPU %CPU COMMAND
 1451     1 root     S <  26512  2.5   0  4.2 ./hal
 1243     2 root     SW<      0  0.0   0  2.4 [bcmLINK.0]
 1800     2 root     SW<      0  0.0   0  1.8 [bcmCNTR.0]
 1801     2 root     SW<      0  0.0   1  1.6 [bcmCNTR.1]
 1475     1 root     S     3612  0.3   1  0.6 ./fdb
11799 11798 root     R      852  0.0   0  0.6 top -d 3
 1787     1 root     S      832  0.0   1  0.6 ./exsshd
 1246     2 root     RW<      0  0.0   1  0.6 [bcmLINK.1]
 1547     1 root     S     3796  0.3   0  0.3 ./acl
 1530     1 root     S     3368  0.3   1  0.3 ./pim
 1088     1 root     S     2716  0.2   1  0.1 /exos/bin/epm -t 40 -f /exos/config/epmrc.Edge -d /exos/config/epmdprc
 1520     1 root     S     2484  0.2   0  0.1 ./rip
 1248     2 root     SW<      0  0.0   0  0.1 [bcmASYNC]
 1295     2 root     DW<      0  0.0   1  0.1 [tbcm_msm_tx0]
 1455     1 root     S    18692  1.8   1  0.0 ./cliMaster
 1564     1 root     S    10004  0.9   1  0.0 ./etmon
 1803     1 root     S     5996  0.5   1  0.0 ./snmpMaster
 1461     1 root     S     5508  0.5   0  0.0 ./snmpSubagent
 1457     1 root     S     4768  0.4   1  0.0 ./cfgmgr
 1577     1 root     S     4764  0.4   1  0.0 ./xmld
 1447     1 root     S     4732  0.4   1  0.0 ./emsServer
 1604     1 root     S     4612  0.4   0  0.0 ./idMgr
 1465     1 root     S     4456  0.4   1  0.0 ./vlan
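
When reviewing a 'top' dump like the one above, it can help to pull out only the rows above a CPU threshold. A minimal sketch, assuming the column layout shown in this post (PID, PPID, USER, STAT, RSS, %MEM, CPU, %CPU, COMMAND); the function name `busiest_processes` and the threshold are illustrative choices, not something the switch provides:

```python
import re

# STAT can contain a space (for example "S <"), so it is matched lazily.
ROW_RE = re.compile(
    r"\s*\d+\s+\d+\s+\S+\s+(.+?)\s+\d+\s+[\d.]+\s+\d+\s+([\d.]+)\s+(.+)$"
)


def busiest_processes(top_text, min_cpu=1.0):
    """Return (command, cpu_percent) pairs at or above min_cpu, busiest first."""
    rows = []
    for line in top_text.splitlines():
        m = ROW_RE.match(line)
        if m:
            cpu = float(m.group(2))
            if cpu >= min_cpu:
                rows.append((m.group(3).strip(), cpu))
    return sorted(rows, key=lambda r: r[1], reverse=True)


sample = """\
  PID  PPID USER     STAT   RSS %MEM CPU %CPU COMMAND
 1451     1 root     S <  26512  2.5   0  4.2 ./hal
 1243     2 root     SW<      0  0.0   0  2.4 [bcmLINK.0]
 1475     1 root     S     3612  0.3   1  0.6 ./fdb
"""
print(busiest_processes(sample))
```

The header and "Load average" lines are skipped because they do not start with a numeric PID.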

Prashanth KG, Employee

Hi Tristan,

Could you check the show switch output and see if the master and the backup nodes are in sync with each other?

Fauriant Tristan

Hi Prashanth,

You're right: in the output of "show switch" I see that slot 2 is not in sync (the "(In Sync)" marker is missing for that slot, as you can see in the output below).

Slot:             Slot-1 *                     Slot-2
                  ------------------------     ------------------------
Current State:    MASTER                       BACKUP

Image Selected:   secondary                    secondary
Image Booted:     secondary                    secondary
Primary ver:      15.2.3.2                     15.2.3.2
Secondary ver:    15.5.3.4                     15.5.3.4
                  patch1-2                     patch1-2

How can I resynchronize the two slots?
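
For anyone who wants to script the check described above, here is a minimal sketch. It assumes the "(In Sync)" marker appears on the "Current State:" line next to BACKUP when the nodes are synchronized, which is how this post describes it; the function name `backup_in_sync` is hypothetical.

```python
def backup_in_sync(show_switch_text):
    """Return True if the 'Current State:' line shows BACKUP with '(In Sync)'."""
    for line in show_switch_text.splitlines():
        if line.strip().startswith("Current State:"):
            return "BACKUP" in line and "(In Sync)" in line
    # No 'Current State:' line found: treat as not in sync.
    return False


not_synced = "Current State:    MASTER                       BACKUP"
synced = "Current State:    MASTER                       BACKUP (In Sync)"
print(backup_in_sync(not_synced), backup_in_sync(synced))
```

The function only looks at text captured from "show switch"; it does not talk to the switch itself.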

Fauriant Tristan

Hello,

I tried to telnet to slot 2 and, apparently, this switch has not synced the stack configuration (just as confirmation); its CLI prompt is still the default:
* Slot-2 Stack.1 >

Fortunately, the stack master is slot 1, so the stack is still working.


When I try commands such as "show log" or "show conf" there, I get the same error each time:

ERROR: ems has not finished loading its configuration, please retry command later.

Prashanth KG, Employee

Hi Tristan,

Thank you for sharing the details. The only way to synchronize the slots requires a reboot of the target slot.

synchronize slot <slot number>

This reboots the specified slot, which is then synchronized during bootup.

Hope this helps! 

Prashanth KG, Employee

To investigate why it got into this state, please share the following details:

What's the uptime of the stack?
Has it been in sync since slot 2 last booted?
Are you aware of any recent changes made to this stack, such as adding or modifying its details in Ridgeline?

Let's see if we can find a clue to the trigger.

And just to set expectations: finding the root cause might be difficult, as the stack is already in the failed state and the recovery option requires a reboot.

Fauriant Tristan

Prashanth, here are the answers to your questions:

System UpTime:    214 days 21 hours 50 minutes 19 seconds (for the stack)

Yes, the stack was in sync during the last maintenance window. However, I've seen that slot 2 has rebooted unexpectedly many times over the last few months. Here is the log of the last reboot:

08/24/2015 01:06:05.88 <Info:HAL.Card.Info> Slot-1: Module in Slot-2 is operational
08/24/2015 01:06:01.33 <Info:HAL.IPv4ACL.Info> Slot-1: Done synching ACLs to Slot-2
08/24/2015 01:06:00.87 <Info:HAL.IPv4ACL.Info> Slot-1: Synching ACLs to Slot-2
08/24/2015 01:05:40.07 <Noti:DM.Notice> Slot-1: Slot-2 being Powered ON
08/24/2015 01:05:37.05 <Info:HAL.Card.Info> Slot-1: Module in Slot-2 is inserted
08/24/2015 01:02:55.57 <Warn:DM.Warning> Slot-3: Slot-2 FAILED (1) Not In Sync
08/24/2015 01:02:55.56 <Warn:DM.Warning> Slot-3: BACKUP NODE (Slot-2) DOWN
08/24/2015 01:02:54.07 <Info:HAL.Card.Info> Slot-1: Slot-2 down, resetting all TCP connections to it
08/24/2015 01:02:54.07 <Info:HAL.Card.Info> Slot-1: Module in Slot-2 is removed
08/24/2015 01:02:53.48 <Warn:DM.Warning> Slot-1: BACKUP NODE (Slot-2) DOWN
08/24/2015 01:02:53.29 <Info:HAL.Card.Info> Slot-1: Slot-2 down, resetting all TCP connections to it
08/24/2015 01:02:53.29 <Info:HAL.Card.Info> Slot-1: Module in Slot-2 is removed
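
To count how often the backup node dropped, the log can be filtered for the "BACKUP NODE ... DOWN" messages. A minimal sketch, assuming the log line layout shown above (timestamp, `<Severity:Component>`, reporting slot, message); the names `LOG_RE` and `backup_down_events` are hypothetical:

```python
import re
from datetime import datetime

LOG_RE = re.compile(
    r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\.\d+ <(\w+):([\w.]+)> (Slot-\d+): (.*)"
)


def backup_down_events(log_text, reporter="Slot-1"):
    """Return sorted timestamps of 'BACKUP NODE ... DOWN' lines from one slot."""
    events = []
    for line in log_text.splitlines():
        m = LOG_RE.match(line.strip())
        if not m:
            continue
        message = m.group(5)
        if m.group(4) == reporter and "BACKUP NODE" in message and message.endswith("DOWN"):
            events.append(datetime.strptime(m.group(1), "%m/%d/%Y %H:%M:%S"))
    return sorted(events)


sample_log = """\
08/24/2015 01:02:55.57 <Warn:DM.Warning> Slot-3: Slot-2 FAILED (1) Not In Sync
08/24/2015 01:02:55.56 <Warn:DM.Warning> Slot-3: BACKUP NODE (Slot-2) DOWN
08/24/2015 01:02:53.48 <Warn:DM.Warning> Slot-1: BACKUP NODE (Slot-2) DOWN
"""
print(backup_down_events(sample_log))
```

Filtering on a single reporting slot avoids counting the same outage twice, since both Slot-1 and Slot-3 log the event.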


I'm not aware of any changes in the last few months. I checked the output of show debug system-dump and nothing is present. We'll try to resynchronize the slot next Sunday, and I will report back when it's done.

Fauriant Tristan

Last Sunday we ran the synchronization for slot 2, but unfortunately the slot was still not synchronized after it reloaded. We then rebooted the entire stack, and the synchronization is now OK.

Thanks for your support.