ARCHIVED: On a PC, is it possible to emulate a math coprocessor?

This content has been archived, and is no longer maintained by Indiana University. Information here may no longer be accurate, and links may no longer be available or reliable.

The following information comes from a Usenet newsgroup post:

  Article 18096 of comp.sys.ibm.pc.hardware:
  Xref: usenet.ucs.indiana.edu comp.sys.intel:1907
  comp.sys.ibm.pc.hardware:18096
  Path: usenet.ucs.indiana.edu!sol.ctr.columbia.edu!ira.uka.de!uka!uka!news
  From: S_JUFFA@IRAV1.ira.uka.de (|S| Norbert Juffa)
  Newsgroups: comp.sys.intel,comp.sys.ibm.pc.hardware
  Date: 13 Jan 1993 19:31:19 GMT
  Organization: University of Karlsruhe, FRG

Yes. In the absence of a coprocessor, floating-point calculations are often performed by a software package that simulates its operations. Such a program is called a coprocessor emulator. Simulating the coprocessor has the advantage for application programs that identical code can be generated for use with either the coprocessor and the emulator, so that it's possible to write programs that run on any system without regard to whether a coprocessor is present or not. Whether the program will use an actual coprocessor or software emulating it can easily be determined at run-time by detecting the presence or absence of the coprocessor chip.

Two approaches to interface an 80x87 emulator to programs are common. The first method makes use of the fact that all coprocessor instruction start with the same five bit pattern 11011. Thus the first byte of a coprocessor instruction will be in the range D8-DF hexadecimal. In addition, coprocessor instructions usually are preceded by a WAIT instruction (opcode 9Bh) which is one byte long (the reason for doing this has been described in the previous chapter dealing with the operating details of the 80x87). One common approach is to replace the WAIT instruction and the first byte of the coprocessor instruction with one out of eight interrupt instructions; the remaining bytes of the coprocessor instruction are left unchanged. Interrupts 34 to 3B hexadecimal are used for this emulation technique. (Note that the sequences 9B D8 ... 9B DF can be easily converted to the interrupt instructions CD 34 ... CD 3B by simple addition and subtraction of constants.) The compiler or assembler initially produces code that contains these appropriate interrupt calls instead of the coprocessor instructions. If a hardware coprocessor is detected at run-time, the emulator interrupts point to a short routine that converts the interrupts calls back to coprocessor instructions (yes, this is known as "self-modifying code"). If no coprocessor is found the interrupts point to the emulation package, which examines the byte(s) following the interrupt instruction to determine which floating-point operation to perform. This method is used by many compilers, including those from Microsoft and Borland. It works with every 80x86 CPU from the 8086/8088 on.

The second method to interface an emulator is only available on 286/386/486 machines. If the emulation bit in the machine status word of these processors is set, the processors will generate an interrupt 7 whenever a coprocessor instruction is encountered. The vector for this interrupt will have been set up to point at an emulation package that decodes the instruction and performs the desired operation. This approach has the advantage that the emulator doesn't have to be included in the program code, but can be loaded once (as a TSR or device driver) and then used by every program that requires a coprocessor. Emulation via interrupt 7 is transparent, which means that programs containing coprocessor instructions execute just like a coprocessor was present, only slower. This approach is taken by the public domain EM87 emulator, the shareware program Q387, and the commercial Franke387 emulator, for example. Even programs that require a coprocessor to run like AutoCAD are 'fooled' to believe that a coprocessor is present with emulators using INT 7.

Operating systems such as OS/2 2.0 and Windows 3.1 provide coprocessor emulations using INT 7 automatically if they do not find a coprocessor to be installed. The emulator in Windows doesn't seem to be very fast, as people who have ported their Turbo Pascal programs from the TP 6.0 DOS compiler (using the emulation built into the TP 6.0 run-time library) to the TPW 1.5 Windows compiler (using MS Windows' emulator) have noticed. Slowdowns of as much as a factor of five have been reported [79].

The size of the emulator used by TP 6.0 is about 9.5KB, while EM87 occupies about 15.8KB as a TSR, and Franke387 uses about 13.4KB as a device driver. Note that Franke387 and especially EM87 model a real coprocessor much more closely than Turbo Pascal's emulator does. In particular, EM87 supports denormal numbers, precision control, and rounding control. The emulator in TP 6.0 does not implement these features. The version of Franke387 tested (V2.4) supports denormals in single and double-precision, but not double extended precision, and it supports precision control, but not rounding control. The recently introduced shareware program Q387 only runs on 386, 386SX, 486SX and compatible processors. The program loads completely into extended memory and uses about 330KB. To enable INT 7 trapping to a service routine in extended memory it needs to run with a memory manager (e.g. EMM386, QEMM, or 386MAX). The huge size of the program stems from the fact that it was solely optimized for speed, assuming that extended memory is a cheap resource. Presumably it uses large tables to speed computations. Intel's E80287 program is supposed to be an 100% exact emulation of the 80287 coprocessor [44]. Note that the more closely a real coprocessor is modeled by the emulator, the slower the emulator runs and the larger the code for the emulator gets.

  Relative execution times of coprocessor vs. software emulators
  for selected coprocessor instructions

                  Intel 387DX    TP 6.0 Emulator   EM87 Emulator

  FADD ST, ST(0)       1              26                104
  FDIV [DWord]         1              22                136
  FXAM                 1              10                 73
  FYL2X                1              33                102
  FPATAN               1              36                110
  F2XM1                1              38                110

  The following table is an excerpt from [44]:

                  Intel 80287  Intel E80287 Emulator

  FADD ST, ST(0)       1              42
  FDIV [DWord]         1             266
  FXAM                 1             139
  FYL2X                1              99
  FPATAN               1             153
  F2XM1                1              41

  The following has been adapted from [43] and merged with my own
  data:

                  Intel 8087  TP 6.0 Emul. (8086)  Intel Emul. (8086)

  FADD ST, ST(0)       1              20                 94
  FDIV [DWord]         1              22                 82
  FPTAN                1              18                144
  F2XM1                1               6                171
  FSQRT                1              44                544

One of the reasons emulators are so slow is that they are often designed to run with every CPU from the 8086/8088 on upwards. This is the case with the emulators built into the compiler libraries of the Turbo Pascal 6.0 (also used by Turbo C/C++) and Microsoft C 6.0 compiler (probably also used in other Microsoft products) and is also true for the EM87 emulator in the public domain. By using code that can run on a 8086/8088, these emulators forego the speed advantage offered by the additional instructions and architectural enhancements (such as 32-bit registers) of the more advanced Intel 80x86 processors. A notable exception to this is the Franke387 emulator, a commercial emulator that is also sold as shareware. It uses 386- specific 32-bit code and only runs on 386/386SX/486SX computers.

Besides being slow, coprocessor emulators have other drawbacks when compared with real coprocessors. Most of the emulators do not support the additional instructions that the 387-compatible coprocessors offer over the 80287. Often, some of the low-level stack-manipulating instructions like FDECSTP are not emulated. For example, [76] lists the coprocessor instructions not emulated by Microsoft's emulator (included in the MS-C and MS-FORTRAN libraries) as follows:

  FCOS         FRSTOR      FSINCOS      FXTRACT
  FDECSTP      FSAVE       FUCOM
  FINCSTP      FSETPM      FUCOMP
  FPREM1       FSIN        FUCOMPP

Additionally, some parts of the coprocessor architecture, like the status register, are often not or only partially emulated. Some emulators do not conform to the IEEE-754 standard in their implementation of the basic arithmetic functions, while the hardware coprocessors do. Also, they sometimes lack the support for denormals (a special class of floating-point numbers) although it is required by the standard. Not all the 80x87 emulators support rounding control and precision control, also features required by IEEE-754. Most of these omissions are aimed at making the emulator faster and smaller. Because of the performance gap and these other shortcomings of coprocessor emulators, a real coprocessor is a must for anybody planning to do some serious computations. (At today's prices, this shouldn't pose much of a problem to anybody!)

Nhuan Doduc (ndoduc@framentec.fr) has tested a number of stand-alone coprocessor emulators for PCs, among them the two emulators, EM87 and Franke387 V2.4, already mentioned. He found Franke387 to be the best in terms of reliability, speed, and accuracy.

If you have questions about using statistical and mathematical software at Indiana University, contact the UITS Research Applications and Deep Learning team.