Article from HP newsletter Basic Exchange, Vol. 1 No. 2, Fall 1980

The HP-85’s Revolutionary CPU

When Hewlett-Packard entered the personal computer market with the HP-85, we brought with us 10 years of experience in personal computation. You can see it in a number of ways: the user-definable keys, the exceptionally high-quality documentation, and the friendliness and power of the operating system. Have you noticed how quiet the machine is when you turn it on? The need for a cooling fan was eliminated by designing a highly efficient switching power supply and by writing firmware that operates only one input/output device at a time.

But there are many more subtle ways in which this product is indelibly stamped HP. The custom CPU is a case in point. Its architecture is fundamentally different from that of any other CPU. An accumulator, found in most other CPUs, is notably absent. And by one definition, the HP-85 CPU is a 64-bit machine! It has multibyte internal registers that allow direct manipulation of operands up to 64 bits in length.

We’ll go on to describe it in detail for those who are interested, but first let’s put some questions to rest. Why does HP use its own custom CPU? One of the most important reasons is ACCURACY. Where other computers do binary arithmetic, the HP-85 does its arithmetic in BCD (binary coded decimal). Our calculators do BCD arithmetic, and we’ve spent years testing and perfecting those algorithms, so we KNOW that the answers returned by the HP-85 are accurate to 12 digits. Real number calculations in the HP-85 are performed internally to 15 significant decimal digits and rounded to 12 digits for presentation.
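To make "BCD" concrete, here is a minimal Python sketch of packed BCD, in which each byte holds two decimal digits (one per four-bit nibble), together with a byte-by-byte addition that carries in decimal rather than binary. The function names are hypothetical, and the sketch only illustrates the encoding idea; it does not reproduce the HP-85's floating-point format. Less significant bytes come first in each list, mirroring the CPU convention that higher addresses hold more significant data.

```python
def to_packed_bcd(n: int, width: int) -> list[int]:
    """Encode a non-negative integer as `width` packed-BCD bytes,
    least significant byte first (high nibble = tens digit)."""
    out = []
    for _ in range(width):
        n, low = divmod(n, 10)
        n, high = divmod(n, 10)
        out.append((high << 4) | low)
    if n:
        raise OverflowError("value does not fit in the given width")
    return out


def from_packed_bcd(data: list[int]) -> int:
    """Decode packed-BCD bytes (least significant byte first)."""
    value = 0
    for byte in reversed(data):  # start from the most significant byte
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return value


def bcd_add(a: list[int], b: list[int]) -> tuple[list[int], int]:
    """Add two equal-length packed-BCD numbers one byte at a time,
    propagating a decimal carry. Returns (sum bytes, carry out)."""
    result, carry = [], 0
    for x, y in zip(a, b):
        low = (x & 0x0F) + (y & 0x0F) + carry
        carry, low = (1, low - 10) if low > 9 else (0, low)
        high = (x >> 4) + (y >> 4) + carry
        carry, high = (1, high - 10) if high > 9 else (0, high)
        result.append((high << 4) | low)
    return result, carry


total, carry = bcd_add(to_packed_bcd(1999, 2), to_packed_bcd(1, 2))
print(from_packed_bcd(total), carry)  # 2000 0
```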

For all the algebraic functions (+, -, x, /, and SQR), the error is no bigger than one-half count in the 12th significant digit, with correct rounding in all situations. For some rational operations, such as RMD and MOD, there are no errors whatsoever, regardless of the magnitude of the arguments. The transcendental functions are accurate to within one count in the 12th significant digit, except where this specification would be impractical for any machine. There are just two cases:

  1. When $y^x$ is bigger than $10^{200}$ or smaller than $10^{-200}$, its error may exceed one count in the last place but is always smaller than two counts.
  2. For trigonometric functions of large radian argument there may be an additional error of less than one count in the 12th significant digit. Despite this error, which is significant only when the argument is huge, both sides of the trigonometric identity SIN(2x) = 2SIN(x)COS(x) agree to all 12 significant digits when, for example, x = 52,174 radians. In general, every trigonometric identity that doesn’t explicitly involve pi will be satisfied to within the rounding error of each trigonometric function that appears in it. This includes all cases when angles are expressed in degrees or grads.
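The "one-half count in the 12th significant digit" specification can be checked mechanically. The sketch below uses Python's decimal module, whose basic operations (including division and square root) are correctly rounded, as a stand-in for a 12-digit decimal machine: it rounds a result to 12 significant digits, compares it with a 40-digit reference, and checks that the error is at most half a count in the 12th digit. It illustrates what the specification means; it is not HP's firmware algorithm, and `count_12` and `within_half_count` are hypothetical names.

```python
from decimal import Decimal, localcontext


def count_12(x: Decimal) -> Decimal:
    """One count in the 12th significant digit of a nonzero x."""
    return Decimal(1).scaleb(x.adjusted() - 11)


def within_half_count(op, *args: Decimal) -> bool:
    """Round op(*args) to 12 significant digits and check that its error,
    measured against a 40-digit reference, is at most half a count."""
    with localcontext() as ctx:
        ctx.prec = 12  # default rounding is ROUND_HALF_EVEN
        rounded = op(*args)
    with localcontext() as ctx:
        ctx.prec = 40
        reference = op(*args)
        return abs(rounded - reference) <= count_12(rounded) / 2


print(within_half_count(lambda a, b: a / b, Decimal(2), Decimal(3)))  # True
print(within_half_count(lambda a: a.sqrt(), Decimal(2)))              # True
```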

The point is that Hewlett-Packard customers have learned to trust the answers their machines provide, and they expect that accuracy. And remember, even if you need only three significant digits in the final answer, errors can propagate to the left quite rapidly, especially when a large number of repetitive calculations are performed. So 12-digit accuracy is important even for dollar-and-cents calculations.
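To see how rounding error accumulates over repeated calculations, the sketch below uses Python's decimal module to apply a growth factor of 1.0075 (an arbitrary, hypothetical 0.75% rate) 360 times, once with every intermediate result rounded to 6 significant digits and once to 12, and measures each against a 40-digit reference. It illustrates the general effect of carrying more digits; it does not model any particular machine.

```python
from decimal import Decimal, localcontext


def compound(rate: str, periods: int, digits: int) -> Decimal:
    """Apply the factor (1 + rate) `periods` times, rounding every
    intermediate product to `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        factor = 1 + Decimal(rate)
        balance = Decimal(1)
        for _ in range(periods):
            balance *= factor  # each multiply is rounded to ctx.prec digits
        return balance


reference = compound("0.0075", 360, 40)  # effectively exact for this comparison
err6 = abs(compound("0.0075", 360, 6) - reference)
err12 = abs(compound("0.0075", 360, 12) - reference)
print(f"6-digit error:  {err6:.3E}")
print(f"12-digit error: {err12:.3E}")
```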

Another effect of doing BCD arithmetic is the "perceived accuracy" of the results. The perceived accuracy comes from avoiding binary arithmetic anomalies, which most people are not used to. The anomalies of the base 10 system are commonplace; most people understand them and know how to deal with them. (For instance, 3 times 1/3 does not come out to exactly one, because 1/3 cannot be represented exactly as a decimal number.) In the binary system, different anomalies occur. For example, 0.01 (base 10) cannot be represented exactly in binary, so if 0.01 is used as the step of a loop counter on a binary machine, the loop may execute the wrong number of times, to the frustration of a user who is unfamiliar with binary numbers.
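The loop-counter anomaly is easy to reproduce with binary floating point. In this Python sketch (the function name and the safety limit are hypothetical), a loop adds a step of 0.01 until its running total equals 1 exactly, a simplified stand-in for a loop whose termination depends on the counter. With a decimal step it stops after 100 passes; with a binary float step the running total does not land exactly on 1, so the loop overruns to the safety limit.

```python
from decimal import Decimal


def count_steps(step, target, limit=1000):
    """Add `step` to a running total until the total equals `target` exactly,
    as a loop counter would; give up after `limit` passes (a runaway loop)."""
    total = step * 0  # zero of the same numeric type as `step`
    passes = 0
    while total != target and passes < limit:
        total += step
        passes += 1
    return passes


print(count_steps(Decimal("0.01"), Decimal(1)))  # decimal step: 100 passes
print(count_steps(0.01, 1.0))                    # binary float step
print(Decimal(0.01))  # exact value of the binary number nearest to 0.01
```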

Speed

There are a couple of ways to build a CPU: make it simple and run it fast, or make it complex and run it slow. Most manufacturers try to drive their CPUs as fast as possible; four megahertz is typical, with the speed limited by physical characteristics of the device such as capacitance. We have followed the second strategy, performing many operations during each clock cycle. The clock speed is 613 kilohertz!
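A quick worked comparison of the two clock rates quoted above shows how much work per cycle the slower-clocked design has to recover. It compares raw clock periods only; the number of clock cycles an instruction needs varies from processor to processor, so this is a rough scale rather than a benchmark.

```latex
T_{\mathrm{HP\text{-}85}} = \frac{1}{613\ \mathrm{kHz}} \approx 1.63\ \mu\mathrm{s},
\qquad
T_{4\,\mathrm{MHz}} = \frac{1}{4\ \mathrm{MHz}} = 0.25\ \mu\mathrm{s},
\qquad
\frac{T_{\mathrm{HP\text{-}85}}}{T_{4\,\mathrm{MHz}}} = \frac{4000}{613} \approx 6.5
```

In other words, a 4 MHz part completes about six and a half clock cycles in the time of one HP-85 cycle.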

But execution speed is more than just a function of hardware design, and the HP-85 executes programs in about the same time as, or faster than, other personal computers on the market. Working in concert with the hardware, the firmware design equally affects the speed of the system. HP-85 BASIC programs are executed by an interpreter, but the code that is interpreted is very different from the BASIC statements as they were originally entered. As statements are entered into the HP-85, they are compiled into a form of RPN (Reverse Polish Notation), which can be interpreted more efficiently than the BASIC source statements.

One of the things that has traditionally made interpreters slow is that they maintain a table of variables which must be searched at run time for each variable reference. In the HP-85 this problem is solved by preallocating all variable references. During the allocation process, the variable names that occur in the internal RPN form of the program are replaced by the relative addresses of the variables. Then, at run time, the interpreter has only to read an address and add it to the base address of the program to determine the absolute location of the variable being referenced. For traditional interpreters, the time required to access a variable depends on its position in the variable table; in the HP-85, all variables are accessed in exactly the same amount of time. All line number references are also replaced by the relative address of the referenced line during the allocation process, eliminating the need to search for referenced lines at run time. This allocation process also makes possible the convenient REN command, which renumbers not only the line numbers, but all GOTO and GOSUB statements as well.
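The compile-then-allocate scheme can be sketched in Python under simplifying assumptions: expressions use single-letter variables, numbers, the four arithmetic operators and parentheses; the RPN conversion uses the standard shunting-yard algorithm; and `to_rpn`, `allocate`, and `run` are hypothetical names. The article does not describe the HP-85's actual token format, so this shows only the idea: names are resolved to relative addresses once, before the program runs, and the interpreter then adds each offset to a base address, so every variable access costs the same. Line-number references, which the HP-85 resolves the same way, are omitted.

```python
import re

# Operator precedence for the shunting-yard conversion, and the operations.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b}


def to_rpn(expr: str) -> list[str]:
    """Convert an infix expression to RPN tokens (shunting-yard algorithm)."""
    output: list[str] = []
    stack: list[str] = []
    for tok in re.findall(r"\d+\.?\d*|[A-Z]|[-+*/()]", expr):
        if tok in PRECEDENCE:
            # Pop operators of equal or higher precedence (left-associative).
            while stack and stack[-1] in PRECEDENCE and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        else:
            output.append(tok)  # number or variable name
    while stack:
        output.append(stack.pop())
    return output


def allocate(program: list[tuple[str, list[str]]]) -> tuple[list[list[tuple]], int]:
    """Allocation pass: replace every variable name with a relative address
    (its offset in the variable area). Returns (compiled code, variable count)."""
    offsets: dict[str, int] = {}

    def addr(name: str) -> int:
        return offsets.setdefault(name, len(offsets))

    compiled = []
    for target, rpn in program:
        code = []
        for tok in rpn:
            if tok in PRECEDENCE:
                code.append(("OP", tok))
            elif tok.isalpha():
                code.append(("VAR", addr(tok)))
            else:
                code.append(("NUM", float(tok)))
        code.append(("STORE", addr(target)))
        compiled.append(code)
    return compiled, len(offsets)


def run(compiled: list[list[tuple]], memory: list[float], base: int) -> None:
    """Interpret the compiled statements. Each variable access is a single
    addition (base + offset); there is no name lookup at run time."""
    for code in compiled:
        stack: list[float] = []
        for kind, arg in code:
            if kind == "NUM":
                stack.append(arg)
            elif kind == "VAR":
                stack.append(memory[base + arg])
            elif kind == "OP":
                b, a = stack.pop(), stack.pop()
                stack.append(OPS[arg](a, b))
            else:  # "STORE"
                memory[base + arg] = stack.pop()


# Hypothetical three-statement program: A = 2, B = A*(3+4), C = B - A/2
program = [("A", to_rpn("2")), ("B", to_rpn("A * (3 + 4)")), ("C", to_rpn("B - A / 2"))]
compiled, n_vars = allocate(program)
BASE = 100  # arbitrary base address of the variable area
memory = [0.0] * (BASE + n_vars)
run(compiled, memory, BASE)
print(memory[BASE:BASE + n_vars])  # [2.0, 14.0, 13.0]
```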

But all of this is another story. The point is that the HP-85 is a machine that is inherently accurate, fast, and affordable. You might argue, though, that there is a penalty for using our own custom CPU: software written in machine language for the more common varieties of CPUs must be rewritten to run on the HP-85. And the task of translating these assembly-level programs is not straightforward, because the HP-85 relies on its unique architecture for its speed. Anything written for a Z-80, for example, translated straight across, would use only a small fraction of the CPU’s power.

Well, even there we’ve got you covered! First, our application engineers have already developed a great deal of high-quality software, the Application Pacs. Next, there are the contributed programs and the new Series 80 Solutions Books available from the User’s Library. And lastly, there is the third-party software available through our Software Supplier Program. The latter two sources are fully covered elsewhere in this issue.

But really, incompatible software is a short-term problem. Most forthcoming software will necessarily be written in a high-level language, making it independent of the actual processor used. The reasons are rising software costs and the desire to implement more complex programs. No one will be able to afford to develop important new applications in machine language, especially since it will be unnecessary: with the increased capabilities of future-generation microprocessors, coding efficiency won’t really matter. Stretch it to the limit and you’ll see what I mean: perhaps by the year 1999 all the software ever written will run in less than a minute. But you don’t have to wait until then; forward-thinking software houses have already shifted gears.

Architecture

The custom CPU used in the HP-85 incorporates many features not currently found in other microprocessors, such as instructions that operate on data from one to EIGHT BYTES in length (simplifying multibyte arithmetic and facilitating string manipulation). It also contains features, found in other microprocessors, that have proven desirable.

Figure 1 is a simplified block diagram of the microprocessor. Notice that the architecture is of the classical textbook variety: of the three buses, two carry source operands and one carries the result to its destination. Data for the ABUS can come from the external bus or the 64-byte on-chip memory. Data for the DBUS comes only from the on-chip memory. The ALU and shifter are in series and are capable of full eight-bit parallel binary and decimal (BCD) arithmetic. Control of the entire CPU is handled by a programmable logic array.

One of the original design goals was to keep the pin count low to minimize package cost, save space, and lower the power requirements (the CPU fits into a 28-pin package). So a single, time-multiplexed, eight-bit bus is used to transport data, commands, and addresses. It turns out to be very efficient for the instruction set implemented.

When more than one byte of data is sent along the bus, an address is needed only for the first byte. The remaining data is assumed to be in consecutive locations. The memory controller and I/O units handle this by incrementing their own address registers upon seeing either a read or a write instruction. Also, the instruction-fetch sequence was designed so as not to require an updated address when fetching from some consecutive locations. As a result, the bus primarily moves data and instructions and tries to minimize the movement of addresses.
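The following Python sketch models a hypothetical memory unit on a shared eight-bit bus: a 16-bit address is sent once, as two bytes, and the unit's own address register then steps forward after every read or write. The class and method names are illustrative and are not HP's signal names. Counting transfers shows why the scheme saves bus time: moving eight consecutive bytes costs two address transfers plus eight data transfers, rather than an address for every byte.

```python
class MemoryController:
    """Hypothetical memory unit: latches a 16-bit address from two bus
    transfers, then auto-increments it after every read or write."""

    def __init__(self, size: int = 1 << 16) -> None:
        self.mem = bytearray(size)
        self.addr = 0
        self.transfers = {"address": 0, "data": 0}

    def load_address(self, low: int, high: int) -> None:
        # Two address bytes travel over the shared eight-bit bus.
        self.addr = (high << 8) | low
        self.transfers["address"] += 2

    def read(self) -> int:
        value = self.mem[self.addr]
        self.addr = (self.addr + 1) & 0xFFFF  # next consecutive location
        self.transfers["data"] += 1
        return value

    def write(self, value: int) -> None:
        self.mem[self.addr] = value & 0xFF
        self.addr = (self.addr + 1) & 0xFFFF  # next consecutive location
        self.transfers["data"] += 1


def read_block(ctrl: MemoryController, start: int, count: int) -> list[int]:
    """Send the starting address once, then read `count` consecutive bytes."""
    ctrl.load_address(start & 0xFF, start >> 8)
    return [ctrl.read() for _ in range(count)]


ctrl = MemoryController()
ctrl.load_address(0x34, 0x12)  # address 0x1234, sent once
for i in range(8):
    ctrl.write(i * 3)
print(read_block(ctrl, 0x1234, 8), ctrl.transfers)
```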

The heart of the CPU is the register bank, comprising 64 bytes of RAM. The organization is shown in figure 2, with the locations numbered in octal from 0 to 77. The first eight bytes (locations 0 through 7) are special-purpose registers. Locations 6 and 7 house the stack pointer for the subroutine return address stack. Locations 4 and 5 hold the program counter. Locations 2 and 3 are scratch registers used for index address calculations. Register 0 can be used as a pointer into the registers. The remaining locations are completely general purpose.

The heavy lines in figure 2 are called boundaries. In the first 32 bytes, there is a boundary every two bytes; in the next 32 bytes, there is a boundary every eight bytes. The purpose behind this partitioning is simple: 16 bits are essential for address manipulation, and 64 bits are handy for representing a floating-point quantity. The register array is therefore capable of holding up to four floating-point numbers and twelve 16-bit addresses (the remaining four two-byte slots are the special-purpose registers). The advantage of 16-bit addresses is, of course, that page boundaries are completely eliminated: the programmer can get at any of the $2^{16}$ (65,536) locations with any addressing mode.
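The layout arithmetic can be checked with a short Python sketch that enumerates the boundary-delimited groups (octal numbering, as in figure 2) and sets aside the four special-purpose pairs. The names `SPECIAL_PAIRS` and `groups` are hypothetical.

```python
# Hypothetical model of the register-bank layout (addresses in octal).
SPECIAL_PAIRS = {
    0o0: "R0: pointer into the register bank",
    0o2: "R2-R3: scratch for index address calculations",
    0o4: "R4-R5: program counter",
    0o6: "R6-R7: subroutine return-stack pointer",
}


def groups():
    """Yield (start, size) for each boundary-delimited group of registers."""
    for start in range(0o0, 0o40, 2):    # first 32 bytes: boundary every 2 bytes
        yield start, 2
    for start in range(0o40, 0o100, 8):  # next 32 bytes: boundary every 8 bytes
        yield start, 8


address_slots = [s for s, size in groups() if size == 2 and s not in SPECIAL_PAIRS]
float_slots = [s for s, size in groups() if size == 8]

print(len(address_slots), "general 16-bit address registers")
print(len(float_slots), "64-bit floating-point registers at", [oct(s) for s in float_slots])
print(2 ** 16, "locations reachable with a 16-bit address")
```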

The 64-byte register bank is designed as a two-read, one-write memory. Two independent locations are read from the memory at the beginning of a cycle, and new information is written into one of those locations at the end of the cycle. THIS ELIMINATES THE NEED FOR AN ACCUMULATOR, WHICH IS A COMMON BOTTLENECK IN MANY MICROPROCESSORS.

Two different address pointers are therefore associated with this memory. The address register pointer (ARP) and the data register pointer (DRP) are independent six-bit registers; six bits are exactly enough to address all 64 locations, so both the ARP and the DRP can be used to address any location in the CPU register bank.
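A minimal Python sketch of a two-read, one-write register file makes the point about the accumulator: in a single cycle both operands come straight out of the bank through ARP and DRP, and the result goes back into the register DRP names. `RegisterBank` and `cycle` are hypothetical names; the sketch models only single-byte operations and ignores timing.

```python
from typing import Callable


class RegisterBank:
    """Sketch of a 64-byte register file with two read ports and one write port."""

    SIZE = 64

    def __init__(self) -> None:
        self.r = [0] * self.SIZE

    def cycle(self, arp: int, drp: int, op: Callable[[int, int], int]) -> int:
        arp &= 0o77  # six-bit pointers can name any of the 64 registers
        drp &= 0o77
        a = self.r[arp]                # first read port
        d = self.r[drp]                # second read port
        self.r[drp] = op(a, d) & 0xFF  # single write port, into DRP's register
        return self.r[drp]


bank = RegisterBank()
bank.r[0o20], bank.r[0o30] = 5, 7
bank.cycle(0o20, 0o30, lambda a, d: a + d)  # R30 <- R20 + R30 in one step
print(bank.r[0o20], bank.r[0o30])           # 5 12
```

With an accumulator, the same register-to-register addition would typically take separate load, add, and store steps.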

Instructions may operate either in a "multibyte" mode or in a "single" mode. As the name suggests, a "multibyte" operation involves a string of bytes rather than just a single byte. The important point is this: the string may consist of one to eight consecutive register locations. The actual locations involved in a multibyte operation are those from the location DRP points to, inclusive, up to the next boundary in the direction of increasing addresses.

The following examples should help explain this idea.

  1. A multibyte increment with DRP set to 70 results in an increment of the 64-bit quantity stored between locations 70 and 77. Higher addresses always refer to more significant data.
  2. A multibyte test with DRP set to 44 results in status being set according to the data found in registers R44, R45, R46, and R47. Location R47 is the most significant byte.

If an instruction is not in the multibyte mode, it is necessarily in the single-byte mode. In this case only the byte referenced by DRP is involved. A single-byte negation with DRP set to 62 negates R62.
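The multibyte rule (from DRP up to the next boundary, with higher addresses more significant) can be sketched in Python as follows. Binary arithmetic is assumed (the decimal mode is not modeled), single-byte negation is assumed to be two's complement, and the zero/negative flags returned by `mb_test` are an illustrative assumption, since the article does not say which status bits a test sets. All function names are hypothetical. For example, `mb_increment(regs, 0o70)` increments the 64-bit value held in R70 through R77.

```python
def span(drp: int) -> list[int]:
    """Registers from DRP up to the next boundary (boundaries every 2 bytes
    below octal 40, every 8 bytes from octal 40 to 77)."""
    size = 2 if drp < 0o40 else 8
    next_boundary = (drp // size + 1) * size
    return list(range(drp, next_boundary))


def mb_increment(regs: list[int], drp: int) -> None:
    """Multibyte binary increment; the lowest address is least significant."""
    for r in span(drp):
        regs[r] = (regs[r] + 1) & 0xFF
        if regs[r] != 0:  # no carry into the next, more significant byte
            break


def mb_test(regs: list[int], drp: int) -> tuple[bool, bool]:
    """Assumed status for a multibyte test: (zero, negative)."""
    data = [regs[r] for r in span(drp)]
    return all(b == 0 for b in data), bool(data[-1] & 0x80)


def sb_negate(regs: list[int], drp: int) -> None:
    """Single-byte mode: only the byte at DRP is involved (two's complement)."""
    regs[drp] = (-regs[drp]) & 0xFF


regs = [0] * 64
regs[0o70:0o100] = [0xFF] * 7 + [0x00]
mb_increment(regs, 0o70)  # carry ripples from R70 up into R77
print([hex(b) for b in regs[0o70:0o100]])
```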

So far, only monadic operations have been used as examples. What happens if a two-operand instruction is executed in the multibyte mode? Here is where the CPU really shines! In this case, ARP points to one operand and DRP points to the other. DRP still determines the number of bytes for its operand. The other operand consists of the same number of bytes, starting with the location pointed to by ARP. If data is written into a register at the end of a cycle, it goes into the register DRP points to. For example:

  1. A multibyte add with ARP set to 50 and DRP set to 60 results in the 64-bit quantity starting with R60 being added to the 64-bit quantity starting with R50. The sum is stored in R60 through R67.
  2. A multibyte load with ARP set to 11 and DRP set to 74 transfers four bytes, beginning with R11, to locations R74, R75, R76, and R77.
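Extending the same idea to two operands: DRP fixes the length, ARP supplies a second operand of the same length, and the result is written where DRP points. This Python sketch assumes binary addition with a carry rippling from the least significant (lowest-addressed) byte upward, uses hypothetical function names, and assumes both operands stay within the 64-byte bank.

```python
def span_length(drp: int) -> int:
    """Bytes from DRP up to the next boundary (2-byte groups below octal 40,
    8-byte groups from octal 40 to 77)."""
    size = 2 if drp < 0o40 else 8
    return (drp // size + 1) * size - drp


def mb_add(regs: list[int], arp: int, drp: int) -> int:
    """Multibyte binary add: regs[DRP..] += regs[ARP..]; returns the carry out."""
    carry = 0
    for i in range(span_length(drp)):
        total = regs[arp + i] + regs[drp + i] + carry
        regs[drp + i] = total & 0xFF  # result is written where DRP points
        carry = total >> 8
    return carry


def mb_load(regs: list[int], arp: int, drp: int) -> None:
    """Multibyte load: copy as many bytes as DRP's span, starting at ARP."""
    for i in range(span_length(drp)):
        regs[drp + i] = regs[arp + i]


regs = [0] * 64
regs[0o50:0o60] = (0x0123456789ABCDEF).to_bytes(8, "little")
regs[0o60:0o70] = (0x1111111111111111).to_bytes(8, "little")
carry = mb_add(regs, 0o50, 0o60)  # first example: R60..R67 += R50..R57
print(hex(int.from_bytes(bytes(regs[0o60:0o70]), "little")), carry)  # 0x123456789abcdf00 0
```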

Status Flags

There is one more thing you might like to know about. The microprocessor contains eight flip-flops and a four-bit register for program status, and some of these are quite unusual.

The flip-flops serve as flags to signal the present condition of the data, while the four-bit register serves as an extended register for counting and data manipulation. A short description is given below for some of these.
