Explanation and solution of bug report #345931

Mats Erik Andersson ynglingatal at yahoo.se
Thu Sep 7 20:11:22 UTC 2006


   Dear Maintainers of Grub,

 I would like to offer a solution to bug #345931. The
 problem is that the grub-shell of 0.97 breaks stage1
 of version 0.96 and earlier. The damage is so severe
 that stage1 only displays unintelligible "shadows"
 on the screen.

 I am writing this for your information; I intend
 to locate the offending code later.

  Hands-on solution in a test case:
  ---------------------------------

 *  Functional grub 0.95-2004... with Debian Sarge;

 *  Reconfiguring with grub 0.97-4 and 0.97-13,
       using grml-small-0.2 and grml-0.8 respectively,
       both Debian-based:

      device (hd0) /dev/hda
      root (hd0,5)
      setup (hd0)

 *  Now the bootloader is completely broken, with no
       sign of correct boot activity.

 *  Remedy:  write 0x90 to byte 0x004d of the MBR.
    This restores full functionality of grub.
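
 The remedy can be sketched on an in-memory copy of the boot
 sector. This is only an illustration: the function name
 apply_remedy and the zero-filled test sector are assumptions
 made for the example, and nothing here touches a real disk.

```python
# Sketch only: apply the one-byte remedy to a copy of the MBR.
# apply_remedy and the test sector below are hypothetical helpers.

SECTOR_SIZE = 512
REMEDY_OFFSET = 0x4D   # byte 0x004d of the MBR, as stated above
NOP = 0x90


def apply_remedy(mbr: bytes) -> bytes:
    """Return a copy of the sector with a nop written at offset 0x4d."""
    if len(mbr) != SECTOR_SIZE:
        raise ValueError("expected a 512-byte boot sector")
    patched = bytearray(mbr)
    patched[REMEDY_OFFSET] = NOP
    return bytes(patched)


# A zero-filled sector whose bytes 0x4b..0x4d are in the broken
# state 90 90 00 (old stage1 after the two-byte nop patch).
broken = bytearray(SECTOR_SIZE)
broken[0x4B:0x4E] = bytes([0x90, 0x90, 0x00])

fixed = apply_remedy(bytes(broken))
print(fixed[0x4B:0x4E].hex())  # 909090
```

 Working on a saved image of the sector, rather than on the
 live disk, keeps the experiment reversible.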

 Explanations
 ------------

   Stage1 of grub 0.95 and 0.96 begins with the
   following machine code:

       0x7c00  eb48     jmp  0x7c4a

       0x7c4a  fa       cli
       0x7c4b  80ca00   or dl,0x00
       0x7c4e  .......   a corrective jump instruction


   (For buggy BIOSes the instruction at 0x7c4b is
    changed to 80ca80  or dl,0x80.)
   Observe that its last byte, 0x00, is different
   from   nop = 0x90 .

   In contrast, grub 0.97 commences stage1 as

        0x7c00  eb48     jmp short 0x7c4a

        0x7c4a  fa       cli
        0x7c4b  eb07     jmp short 0x7c54
        0x7c4d  f6...    four corrective instructions

    **** The problem  ****

    The setup command of the grub-shell uses some
    technique to judge the sanity of the BIOS and
    in most cases replaces the TWO-BYTE INSTRUCTION
                               --------------------
          eb07      jmp  short 0x7c54
     by
          90        nop
          90        nop

     However, if stage1 originates from grub 0.95/96
     the corresponding instruction is THREE BYTES long

          80ca00    or dl,0x00

     thus being "corrected" to
        
          90         nop
          90         nop
          00ea597c      (the processor now runs amok)

     The corrective means that grub 0.95/96 uses
     is a much simpler choice:

          0x7c4d     (either)  0x00  or  0x80.
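
     The mismatch can be reproduced on the bytes from the two
     listings. The function two_byte_nop_patch below is only a
     model of the behaviour described in this message, not
     grub's actual setup code, which I have not yet examined.

```python
# Bytes found at 0x7c4b, taken from the two listings above.
OLD_STAGE1 = bytes.fromhex("80ca00")   # grub 0.95/0.96: or dl,0x00
NEW_STAGE1 = bytes.fromhex("eb07")     # grub 0.97: jmp short 0x7c54


def two_byte_nop_patch(code: bytes) -> bytes:
    """Overwrite the first two bytes with nop, nop (hypothetical model)."""
    return b"\x90\x90" + code[2:]


# On 0.97 the whole instruction is replaced.
print(two_byte_nop_patch(NEW_STAGE1).hex())  # 9090

# On 0.95/0.96 the operand byte 0x00 survives and is then decoded
# together with the bytes that follow it.
print(two_byte_nop_patch(OLD_STAGE1).hex())  # 909000
```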


   I see two ways out of this incompatibility:

     1)  Replace the code snippet 'jmp short 0x7c54'
         in stage1 of grub 0.97 by one with an extra
         nop instruction, 'jmp short 0x7c54 ; nop'.
         Then the grub-shell can always correct with
         the code snippet 0x909090 at address 0x7c4b
         for all versions of grub.

      2) Better testing in the grub-shell to determine
         whether location 0x7c4b holds 0x80****, which
         calls for a three-byte correction (or an
         alteration at 0x7c4d), or holds 0xeb**,
         which needs a two-byte correction.
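
     The second option could look roughly as follows. This is
     a sketch under assumptions: the names correction_length,
     nop_out and make_sector are invented for the example, the
     sector is a zero-filled test buffer, and for old stage1 it
     takes the three-byte correction rather than the alteration
     at 0x7c4d.

```python
OPCODE_OFFSET = 0x4B   # 0x7c4b minus the load address 0x7c00
NOP = 0x90


def correction_length(sector: bytes) -> int:
    """Decide how many bytes to overwrite from the opcode at 0x7c4b."""
    opcode = sector[OPCODE_OFFSET]
    if opcode == 0x80:      # or dl,imm8 (grub 0.95/0.96): three bytes
        return 3
    if opcode == 0xEB:      # jmp short (grub 0.97): two bytes
        return 2
    raise ValueError(f"unrecognised opcode 0x{opcode:02x} at 0x7c4b")


def nop_out(sector: bytes) -> bytes:
    """Return a copy with the whole instruction at 0x7c4b turned into nops."""
    length = correction_length(sector)
    patched = bytearray(sector)
    patched[OPCODE_OFFSET:OPCODE_OFFSET + length] = bytes([NOP]) * length
    return bytes(patched)


def make_sector(code: str) -> bytes:
    """Build a zero-filled 512-byte test sector with `code` at 0x7c4b."""
    sector = bytearray(512)
    raw = bytes.fromhex(code)
    sector[OPCODE_OFFSET:OPCODE_OFFSET + len(raw)] = raw
    return bytes(sector)


old = nop_out(make_sector("80ca00"))   # stage1 of 0.95/0.96
new = nop_out(make_sector("eb07f6"))   # stage1 of 0.97
print(old[0x4B:0x4E].hex())  # 909090
print(new[0x4B:0x4E].hex())  # 9090f6
```

     An unrecognised opcode raises an error instead of being
     patched blindly, so an unknown stage1 is left untouched.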

    Since I have only glanced at the source code for
    stage1 and not the rest of grub's sources, I have
    no indication yet of which method is preferable.
    Do you have a suggestion? If not sooner, I will
    ask you for advice when I have dissected the
    source code.

         Best regards

           Mats E Andersson

           ynglingatal at yahoo.se


