ARCHIVED: How do various PC math coprocessors work?

This content has been archived, and is no longer maintained by Indiana University. Information here may no longer be accurate, and links may no longer be available or reliable.

The following information comes from the Usenet:

  Article 18096 of comp.sys.ibm.pc.hardware:
  Xref: usenet.ucs.indiana.edu comp.sys.intel:1907
  comp.sys.ibm.pc.hardware:18096
  Path: usenet.ucs.indiana.edu!sol.ctr.columbia.edu!ira.uka.de!uka!uka!news
  From: S_JUFFA@IRAV1.ira.uka.de (|S| Norbert Juffa)
  Newsgroups: comp.sys.intel,comp.sys.ibm.pc.hardware
  Date: 13 Jan 1993 19:31:19 GMT
  Organization: University of Karlsruhe, FRG

In any 80x86 system with an 80x87 math coprocessor, CPU instructions and coprocessor instructions are executed concurrently. This means that the CPU can execute its own instructions while the coprocessor is executing a coprocessor instruction. This concurrency is somewhat restricted because the CPU has to assist the coprocessor in certain operations. Since the CPU and the coprocessor are fed from the same instruction stream, and the instructions for both may operate on the same data, there has to be a synchronization mechanism between the CPU and the coprocessor.

The 8087

In 8086/8088 systems with 8087 coprocessors, both chips look at every opcode coming in from the bus. To make this possible, both chips have identical BIUs (bus interface units), and the 8086 BIU sends the status signals of its prefetch queue to the 8087 BIU. This ensures that both processors always decode the same instructions in parallel. Since all coprocessor instructions start with the bit pattern 11011 (opcodes D8h-DFh), it is easy for the 8087 to ignore all other instructions. Likewise, the CPU ignores all coprocessor instructions unless they access memory. In that case, the CPU computes the address of the LSB (least significant byte) of the memory operand and performs a dummy read, and the 8087 takes the data from the data bus. If more than one memory access is needed to load a memory operand, the 8087 requests the bus from the CPU, generates the consecutive addresses of the operand's bytes, and fetches them from the data bus. After completing the operation, the 8087 hands bus control back to the CPU. Since the 8087 and the CPU are connected to the same synchronous bus, they must run at the same clock speed. This means that the 8087 supports only synchronous operation of CPU and coprocessor.
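The opcode test described above can be sketched in a few lines of Python. This is an illustrative model, not Intel's decoder logic: the function names are hypothetical, and the sketch looks only at the opcode byte and, for ESC opcodes, at the following ModR/M byte, whose mod field (bits 7-6) distinguishes register operands (11b) from memory operands.

```python
# Illustrative model of how the shared instruction stream is split between
# the 8086 and the 8087. Function names are hypothetical.

ESC_MASK = 0xF8     # keeps the top five bits of the opcode byte
ESC_PATTERN = 0xD8  # 11011xxx: opcodes D8h-DFh are coprocessor (ESC) instructions


def is_coprocessor_opcode(opcode: int) -> bool:
    """True if the opcode byte starts with the bit pattern 11011."""
    return (opcode & ESC_MASK) == ESC_PATTERN


def accesses_memory(modrm: int) -> bool:
    """A ModR/M mod field of 11b means register operands only;
    any other value selects a memory operand."""
    return (modrm >> 6) != 0b11


def classify(opcode: int, modrm: int) -> str:
    if not is_coprocessor_opcode(opcode):
        return "CPU instruction, ignored by the 8087"
    if accesses_memory(modrm):
        return "ESC with memory operand, CPU performs dummy read"
    return "ESC with register operands, ignored by the CPU"


if __name__ == "__main__":
    examples = {
        "FLD QWORD PTR [BX]": (0xDD, 0x07),
        "FADD ST, ST(1)": (0xD8, 0xC1),
        "MOV AX, [BX]": (0x8B, 0x07),
    }
    for name, (op, modrm) in examples.items():
        print(f"{name:20} -> {classify(op, modrm)}")
```

The three example encodings are standard 8086/8087 encodings; the script classifies them as an ESC instruction with a memory operand, a register-only ESC instruction, and an ordinary CPU instruction.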

A new 8087 instruction can only be started once the previous one has been completed in the NEU (numeric execution unit) of the 8087. To prevent the 8086 from decoding a new coprocessor instruction while the 8087 is still executing the previous one, a software convention is used: all 8087-capable compilers and assemblers automatically generate a WAIT instruction before each coprocessor instruction. The WAIT instruction tests the CPU's /TEST pin and suspends execution until that pin goes "LOW". In all 8086/8087 systems, the 8086 /TEST pin is connected to the 8087 BUSY pin. As long as the NEU is executing a coprocessor instruction, it forces its BUSY pin "HIGH"; thus, the WAIT opcode preceding a coprocessor instruction stops the CPU until any still-executing coprocessor instruction has finished.
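The assembler-side convention can be illustrated with a small Python sketch that takes already-encoded instructions and prefixes each ESC instruction with the WAIT opcode (9Bh). The helper names are hypothetical, and the sketch deliberately ignores the "no-wait" forms such as FNINIT and FNSTSW, for which assemblers omit the WAIT.

```python
# Hypothetical helper: emulate the 8087-era convention of emitting WAIT
# (opcode 9Bh) in front of every coprocessor instruction.

WAIT = 0x9B


def is_esc(instr: bytes) -> bool:
    """Coprocessor instructions start with an opcode byte of the form 11011xxx."""
    return len(instr) > 0 and (instr[0] & 0xF8) == 0xD8


def insert_waits_8087(program: list[bytes]) -> list[bytes]:
    """Return a new instruction list with WAIT placed before each ESC instruction,
    so the 8086 stalls on /TEST until the 8087 drops BUSY."""
    out: list[bytes] = []
    for instr in program:
        if is_esc(instr):
            out.append(bytes([WAIT]))
        out.append(instr)
    return out


if __name__ == "__main__":
    program = [bytes([0xDD, 0x07]),  # FLD QWORD PTR [BX]
               bytes([0x8B, 0x07]),  # MOV AX, [BX]
               bytes([0xD8, 0xC1])]  # FADD ST, ST(1)
    for instr in insert_waits_8087(program):
        print(instr.hex(" "))
```

Only the two coprocessor instructions receive a WAIT prefix; the MOV passes through unchanged.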

The same synchronization is used before the CPU accesses data that was written by the coprocessor. A WAIT instruction after any coprocessor instruction that writes to memory causes the CPU to stop until the coprocessor has completed transfer of the data to memory, after which the CPU can safely access it.
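A toy timing model makes the effect of that WAIT concrete. It assumes both units start at cycle 0, that the coprocessor holds BUSY high for a fixed number of cycles while it stores a result, and that the CPU meanwhile executes unrelated instructions before reaching the WAIT; the cycle counts are arbitrary illustrative numbers, not measured 8086/8087 timings, and the function names are hypothetical.

```python
# Toy model of WAIT synchronization through /TEST and BUSY.
# All cycle counts are illustrative assumptions, not real 8087 timings.


def wait_release(cpu_cycle: int, busy_until: int) -> int:
    """WAIT suspends the CPU until /TEST (wired to BUSY) goes LOW."""
    return max(cpu_cycle, busy_until)


def stall_cycles(fpu_cycles: int, cpu_work_cycles: int) -> int:
    """Cycles the CPU spends inside WAIT before it may read the stored result."""
    return wait_release(cpu_work_cycles, fpu_cycles) - cpu_work_cycles


def elapsed_cycles(fpu_cycles: int, cpu_work_cycles: int) -> int:
    """Time until the CPU can read the result: the two units overlap,
    so this is the longer of the two activities, not their sum."""
    return wait_release(cpu_work_cycles, fpu_cycles)


if __name__ == "__main__":
    print(stall_cycles(100, 30), elapsed_cycles(100, 30))
    print(stall_cycles(100, 120), elapsed_cycles(100, 120))
```

With a 100-cycle store and 30 cycles of independent CPU work, the CPU stalls for 70 cycles; with 120 cycles of CPU work the WAIT costs nothing, showing how concurrency can hide the coprocessor's latency.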

The 80287

The 80287 coprocessor-CPU interface is totally different from the 8087 design. Since the 80286 implements memory protection via a segmentation-based MMU, an interface similar to the 8087's would have required duplicating the entire memory protection logic on the coprocessor, which would have been much too expensive. Instead, in an 80286/80287 system, the CPU fetches and stores all opcodes and operands for the coprocessor and passes them to it through the I/O ports F8h-FFh. (As these ports are accessible under program control, care must be taken in user programs not to accidentally perform write operations to them, as this could corrupt data in the math coprocessor.)

The 8086/8087 combination can be characterized as a cooperation of partners with equal rights, while the 80286/80287 is more of a master-slave relationship. This makes synchronization easier, since the complete instruction and data flow of the coprocessor goes through the CPU. Before executing most coprocessor instructions, the 80286 tests its /BUSY pin, which is tied to the 80287 and signals whether the 80287 is still executing a previous coprocessor instruction or has encountered an exception. The 80286 waits until the /BUSY signal becomes inactive before loading the next coprocessor instruction into the 80287. Therefore, a WAIT instruction before every coprocessor instruction is not required; such WAITs are permissible, but unnecessary, in 80287 programs. The second form of WAIT synchronization (after the coprocessor has written a memory operand) is still necessary on 286/287 systems.
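The difference in WAIT placement between the two generations can be summarized in a small Python sketch. The instruction representation (a mnemonic plus a kind tag of "cpu", "fpu", or "fpu_store") and the target labels are hypothetical; the sketch assumes, following the text, that an 8087 program needs a WAIT before every coprocessor instruction, while both generations need one after a coprocessor store before the CPU touches the stored data.

```python
# Hypothetical sketch: where WAIT instructions go for an 8087 target
# versus an 80287 target.

def place_waits(program: list[tuple[str, str]], target: str) -> list[str]:
    """program: (mnemonic, kind) pairs, kind in {"cpu", "fpu", "fpu_store"}.
    target: "8087" or "80287"."""
    if target not in ("8087", "80287"):
        raise ValueError(f"unknown target: {target}")
    out: list[str] = []

    def emit_wait() -> None:
        if not out or out[-1] != "WAIT":  # avoid back-to-back WAITs
            out.append("WAIT")

    for mnemonic, kind in program:
        if target == "8087" and kind in ("fpu", "fpu_store"):
            emit_wait()  # the 8086 does not check BUSY on its own
        out.append(mnemonic)
        if kind == "fpu_store":
            emit_wait()  # both targets: the CPU may read the stored data next
    return out


if __name__ == "__main__":
    prog = [("FLD x", "fpu"), ("FADD y", "fpu"),
            ("FSTP z", "fpu_store"), ("MOV AX, z", "cpu")]
    print(place_waits(prog, "8087"))
    print(place_waits(prog, "80287"))
```

For this example program, the 8087 version contains four WAITs, while the 80287 version keeps only the one after FSTP.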

The execution unit of the 80287 is practically identical to that of the 8087; that is, nearly all coprocessor instructions execute in the same number of clock cycles on both coprocessors. However, due to the additional overhead of the 80287's CPU/coprocessor interface (roughly 40 clock cycles or more per instruction), an 8 MHz 80286/80287 combination can have lower floating-point performance than an 8086/8087 system running at the same speed. Additionally, older 286 boards were often configured to run the coprocessor at only two-thirds the speed of the CPU, making use of the 80287's ability to run asynchronously. The 80287 has a CKM (ClocK Mode) pin that, when tied to ground, causes the incoming system clock to be divided by three for the coprocessor. The 80286 always divides the system clock by two internally, hence the final ratio of 2/3. When CKM is tied high, however, the 80287 does not divide its CLK input. This feature has been exploited by the makers of coprocessor speed sockets. These sockets tie CKM high and supply their own CLK signal from a built-in oscillator, allowing the 80287 or a compatible chip to run at a much higher speed than the CPU. With an IIT or Cyrix 287, one can have a 20 MHz coprocessor running with an 8 MHz 80286. However, the floating-point performance of such a configuration does not scale linearly with the coprocessor clock, since all data has to be passed through the much slower CPU. If the coprocessor executes mostly simple instructions (such as addition and multiplication), doubling the coprocessor clock to 20 MHz in a 10 MHz system does not show any performance increase at all [24].
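A back-of-the-envelope model shows why a speed socket cannot scale performance linearly. It assumes, purely for illustration, that each coprocessor instruction costs a fixed interface overhead counted in CPU clocks plus an execution time counted in coprocessor clocks, with no overlap between the two. The 40-clock overhead comes from the estimate above; the 30-clock execution time is a hypothetical value for a fast clone, and the function names are hypothetical too. The model only bounds the possible gain and is not meant to reproduce the measurement cited in [24].

```python
# Back-of-the-envelope model of an 80286/80287 with a coprocessor speed socket.
# Assumption: interface overhead and execution are strictly sequential.
# Clock rates are in MHz, i.e. cycles per microsecond.


def time_per_instruction_us(overhead_cpu_clocks: float, exec_fpu_clocks: float,
                            cpu_mhz: float, fpu_mhz: float) -> float:
    return overhead_cpu_clocks / cpu_mhz + exec_fpu_clocks / fpu_mhz


def speedup(overhead: float, exec_clocks: float, cpu_mhz: float,
            fpu_old_mhz: float, fpu_new_mhz: float) -> float:
    """Gain from raising only the coprocessor clock."""
    old = time_per_instruction_us(overhead, exec_clocks, cpu_mhz, fpu_old_mhz)
    new = time_per_instruction_us(overhead, exec_clocks, cpu_mhz, fpu_new_mhz)
    return old / new


def speedup_ceiling(overhead: float, exec_clocks: float, cpu_mhz: float,
                    fpu_old_mhz: float) -> float:
    """Limit as the coprocessor clock goes to infinity: only the CPU-side overhead remains."""
    old = time_per_instruction_us(overhead, exec_clocks, cpu_mhz, fpu_old_mhz)
    return old / (overhead / cpu_mhz)


if __name__ == "__main__":
    # 10 MHz 80286, coprocessor raised from 10 MHz to 20 MHz (illustrative numbers).
    print(round(speedup(40, 30, 10, 10, 20), 3))
    print(speedup_ceiling(40, 30, 10, 10))
```

With these assumed numbers, doubling the coprocessor clock yields only about a 1.27x gain, and no coprocessor clock, however high, could push the gain past 1.75x, because the 4 microseconds of CPU-side overhead per instruction never shrink.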

The Intel 80287XL, the Cyrix 82S87, and the IIT 2C87 contain the internals of a 387 coprocessor, but are pin-compatible with the original 80287. These chips divide the system clock by two internally, as opposed to three in the original 80287. Since the 80286 also divides the system clock by two, they usually run synchronously with the CPU, although they can also be run asynchronously.
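The clock ratios described for the 80287 and its 387-based replacements follow from simple division. The Python sketch below, with hypothetical function names, computes them using exact fractions; it models only the configurations described here (an 80287 with CKM tied low or high, and the 387-based 287 replacements running synchronously), not any particular board.

```python
# Clock arithmetic for 286-class systems. Frequencies in MHz.
from fractions import Fraction


def i286_clock(clk_mhz: Fraction) -> Fraction:
    """The 80286 always divides its CLK input by two."""
    return clk_mhz / 2


def i80287_clock(clk_mhz: Fraction, ckm_high: bool) -> Fraction:
    """Original 80287: CKM tied to ground divides CLK by three;
    CKM tied high uses CLK undivided (e.g. a speed socket's own oscillator)."""
    return clk_mhz if ckm_high else clk_mhz / 3


def i287xl_clock(clk_mhz: Fraction) -> Fraction:
    """80287XL, 82S87, 2C87 in synchronous use: CLK divided by two."""
    return clk_mhz / 2


if __name__ == "__main__":
    system_clk = Fraction(16)  # 16 MHz CLK -> 8 MHz 80286
    cpu = i286_clock(system_clk)
    print(cpu, i80287_clock(system_clk, ckm_high=False) / cpu)  # 80287 ratio
    print(i287xl_clock(system_clk) / cpu)                       # 287XL ratio
    print(i80287_clock(Fraction(20), ckm_high=True))            # speed socket, 20 MHz oscillator
```

For a 16 MHz system clock this prints an 8 MHz CPU with an 80287 ratio of 2/3, a ratio of 1 for the 387-based replacements, and 20 MHz for an 80287 fed by a 20 MHz speed-socket oscillator.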

The 80387

The coprocessor interface in 80386/80387 systems is very similar to the one in 286/287 systems. However, to prevent corruption of the coprocessor's contents by programming errors, the I/O addresses 800000F8h-800000FFh are used; these lie outside the 64 KB I/O space (ports 0000h-FFFFh) that programs can address, so they are not accessible to programs. The CPU/coprocessor interface has been optimized and uses full 32-bit transfers, reducing the interface overhead to about 14-20 clock cycles. For some operations on the 387 clones that take less than about 16 clock cycles to complete, this overhead effectively limits the execution rate of coprocessor instructions. The only sensible way to provide even higher floating-point performance was to integrate the CPU and coprocessor functionality on the same chip, which is exactly what Intel did with the 80486 CPU. The FPU in the 486 also benefits from the 486's instruction pipelining and its on-chip cache.
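Why an overhead of 14-20 clocks caps the rate of short operations can be seen from a simple cycle count. The sketch assumes the interface transfer and the execution happen one after the other, with no overlap; the 16-clock overhead is taken from the range above, while the 8-clock execution time, the 33 MHz clock, and the function names are hypothetical.

```python
# Simple cycle-count model for 80386/80387 interface overhead.
# Assumption: transfer overhead and execution do not overlap.


def cycles_per_instruction(overhead_clocks: int, exec_clocks: int) -> int:
    return overhead_clocks + exec_clocks


def instructions_per_us(clock_mhz: float, overhead_clocks: int, exec_clocks: int) -> float:
    return clock_mhz / cycles_per_instruction(overhead_clocks, exec_clocks)


def overhead_share(overhead_clocks: int, exec_clocks: int) -> float:
    """Fraction of each instruction's time spent on the CPU/coprocessor interface."""
    return overhead_clocks / cycles_per_instruction(overhead_clocks, exec_clocks)


def rate_ceiling(clock_mhz: float, overhead_clocks: int) -> float:
    """Upper bound even for a zero-cycle operation: the interface alone sets the limit."""
    return clock_mhz / overhead_clocks


if __name__ == "__main__":
    clock, overhead, fast_op = 33, 16, 8  # illustrative values
    print(instructions_per_us(clock, overhead, fast_op))
    print(round(overhead_share(overhead, fast_op), 3))
    print(round(rate_ceiling(clock, overhead), 3))
```

With these numbers, two thirds of every instruction's time goes to the interface, and no operation, however fast, could exceed 33/16, or about 2 million coprocessor instructions per second.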

This is document aann in the Knowledge Base.
Last modified on 2018-01-18 09:01:05.