MCS-48: The quest for 16-bit division on the 8-bit CPU which can’t divide anything

Recently while working on my MCS-48 temperature sensor project I had to confront one of the largest challenges, which is to implement an option where changing a jumper displayed Fahrenheit instead of Celsius. The output from the DS18B10 temperature sensors is Celsius only, as it should be, so a conversion would need to be performed.

$Fahrenheit = \frac{Celsius\times 9}{5} + 32$

A quick reminder of the conversion needed to be performed

In my case it’s +320 as I’m using fixed point arithmetic i.e. 15.3° = 153. This is a tough conversion to perform on an MCS-48 as a slew of different operations are required:

16-bit negate (more about that below)
16-bit unsigned multiply
16-bit unsigned divide
16-bit add with wrap-around

A tall order for a CPU which has just one math instruction: 8-bit add. To perform all of this, one must sling a long sequence of primitive instructions together. Since this is an assembly language only architecture, I couldn’t cheat by compiling it in C and pinching the resulting instructions as I have done for PIC16 in the past.

Edit: Since publishing this a cornucopia of different approaches to this problem have been proposed to me. More about a couple of those at the end of the article. I’m happy with implementation as it currently stands. It works, it makes very efficient use of program memory, the improved performance is not required, and the above math routines are re-used for other tasks in the project.

The best place to find such examples is in the MCS-48 assembly language manual:

(A scan of it can be viewed here). Everything I needed was in there, except how to divide. There is no mention whatsoever in that manual of how to perform any kind of division operation, but, tacked in the back of the 1980 edition MCS-48 handbook, I found this:

Woohoo! A working divide routine for MCS-48. It wouldn’t OCR, so I typed it up.

Unfortunately that routine only has an 8-bit quotient, which isn’t much use for my application because it would overflow in most cases.

I was easily able to work around that with the following implementation (pseudo code):

if (celsius < 0)
{   // If negative, note it, and make it positive
    // so we can work with simpler unsigned routines below
    celsius = -celsius;
    is_negative = 1;
}

fahrenheit = multiply_16x8r16(celsius, 9);

// Divide by 50 so the result of divide_16x8r8 doesn't overflow
fahrenheit = divide_16x8r8(fahrenheit, 50, &remainder);

// Multiply it back up
fahrenheit = multiply_16x8r16(fahrenheit, 10);

// Factor remainder
remainder = multiply_16x8r16(remainder, 10);

// Divide and add remainder
fahrenheit += divide_16x8r8(remainder, 50, NULL);

if (is_negative)
{   // Make it negative again, if it was previously
    celsius = -celsius;
}

// Add 32. Requires a 16-bit add with wrap around to
// correctly handle negative temperatures
fahrenheit += 320

While that does the job, it’s poo poo. What I really want is that complicated looking divide routine to have a 16-bit quotient, so I can do the division in a single operation. To help me understand it, I translated it to C code:

uint8_t mcs48_divide(uint16_t dividend, uint8_t divisor, uint8_t *remainder)
{
    if ((dividend >> 8) >= divisor)
        goto mcs48_div_exit; // Impossible. Result would
                             // overflow. Bail.

    for (int i = 0; i < 8; i++) // One pass for each bit of result
    {
        uint8_t msb;
        uint8_t bit15_was_set = 0;

        if (dividend & 0x8000)
            bit15_was_set = 1; // Note if this was set,

        dividend <<= 1; // Next bit

        msb = (dividend >> 8);
        if (msb >= divisor || bit15_was_set)
        {
            // Subtract remainder from MSB,
            // preserve and increment quotient
            dividend = (((msb - divisor) << 8) | (dividend & 0xFF)) + 1;
        }
    }

mcs48_div_exit:
    *remainder = (dividend >> 8);
    return (dividend & 0xFF);
}

It’s immediately clear that it’s a binary division implementation. What wasn’t immediately clear is how to make the change I wanted. I put the question to stack overflow, and on the face of it, it looked like a dumb question, i.e. just double the integer type sizes, stupid. predictably I got punished with a bunch of down-votes.

Yes we can do that, but it’s not what I want to do to the assembly routine that I’m trying to modify, so perhaps I didn’t quite ask the question as well as I could have done. The answer provided sent me in the right direction, in that the working accumulator needs to be larger, 24-bits in my case, and the 16-bit shift needs to be a 24-bit.

uint16_t mcs48_div16(uint16_t dividend, uint8_t divisor, uint8_t *remainder)
{
    uint32_t accumulator = dividend;

    for (int i = 0; i < 16; i++) // One pass for each bit of result
    {
        uint8_t msb;
        uint8_t bit24_was_set = 0;

        if (accumulator & 0x800000)
            bit24_was_set = 1; // Note if this was set,
                               // can't check if after shift.

        accumulator <<= 1; // Next bit
        accumulator &= 0xFFFFFF; // Simulate 24 bit type

        msb = (accumulator >> 16);
        if (msb >= divisor || bit24_was_set)
        {
            // Subtract remainder from MSB,
            // preserve and increment quotient
            accumulator = (((msb - divisor) << 16) | (accumulator & 0xFFFF)) + 1;
        }
    }

mcs48_div_ext_exit:
    *remainder = (accumulator >> 16);
    return (accumulator & 0xFFFF);
}

Above is the pseudo code of my routine after the changes. In the final implementation registers A, R1 and R2 hold the 24-bit accumulator, so this doesn’t translate too well to C because there isn’t a 24-bit integer type.

The final changes to the routine can be viewed here.

Ah, that’s better.

Other approaches

One reader suggested a very interesting looking approach:

int16_t celsius2fahrenheit(int16_t c)
{
    int16_t f = 0;
    f += c << 4;
    f -= c;
    f -= c >> 1;
    f -= c >> 3;
    f += c >> 5;
    f += 32 << 7;
    f >>= 3;
    return f;
}

I tried it, it doesn’t work but, nonetheless, it’s is interesting. I’m sure that an approach like the above could be made to work and would perform better than my approach.

If one were to translate this to MCS-48 assembler the result would be a big single purpose routine which would likely mean I couldn’t fit into a 1K part anymore without significant re-work.

Another reader suggested using a lookup table (this is mentioned in the DS18B20 datasheet). However with 1800 possible Celsius values, this would be quite some lookup table, agreed the values wouldn’t be entirely random, some cleverness could reduce the size of it, but, even then it’d still take up more program memory than I have spare.

Posted in Vintage microcontrollers

21 thoughts on “MCS-48: The quest for 16-bit division on the 8-bit CPU which can’t divide anything”

Steinar H. Gunderson says:

October 30, 2020 at 1:31 pm

Hi,

Why do you need the divide in the first place? 9 is a constant, so you can use the well-known trick of a multiply and shift instead, which should be much easier to implement.

See e.g. the code GCC generates for x*5/9 on GCC:

0: 8d 14 bf lea (%rdi,%rdi,4),%edx
3: b9 39 8e e3 38 mov $0x38e38e39,%ecx
8: 89 d0 mov %edx,%eax
a: f7 e1 mul %ecx
c: 89 d0 mov %edx,%eax
e: d1 e8 shr %eax

Reply
1. admin says:
  
  October 30, 2020 at 2:45 pm
  
  I do appreciate that there are a variety of more optimum solutions to the problem, and it has been interesting reading about them.
  
  The point of this for me however was to take apart these old canned math routines, understand them and in this case extend it to do more.
  
  I felt that’d be a more valuable legacy for the project than a shortcut solution!
  
  Reply
  1. Zoltan Pekic says:
    
    November 9, 2022 at 3:56 am
    
    The problem here is not a division, but multiplying with constant (9/5 = 1.8), and two general solutions:
    – lookup table (if there is enough memory, but often surprisingly small table is needed as the input range is restricted e.g. usual room temperatures etc.). The table can be made “sparse” by using simple linear interpolation between values (e.g. adding nearby values and shifting down)
    – finding a “shiftable” multiplier. In this case 115/64 is probably sufficiently accurate. Multiplying with 115 is bunch of shifts and adds (or maybe <> 6. An extra bit of accuracy can be squeezed out by using the carry out from shift right and add to result (rounding up to .5)
    
    Reply
kjfoewaijof says:

October 30, 2020 at 3:53 pm

Hopefully that might inspire you.
https://www.avrfreaks.net/forum/dis-asm-dirty-math-tricks-adventures-division-ten

Reply
Bob Paddock says:

October 30, 2020 at 5:42 pm

In the 8048 day’s Intel has a sensor humor.
The 8048 programming manual had this gem:

“The difference between a S_EXCHANGE and a SEX_CHANGE”, is a well placed underscore.”

Which was explaining way the underscore was an acceptable character in a label.

Reply
1. admin says:
  
  October 30, 2020 at 6:37 pm
  
  Where would I find that?
  
  Reply
obelisk of gaul says:

October 30, 2020 at 8:59 pm

I faced this exact problem circa 1987 with the MCS-48. How I loved that little chip and it’s absent minded brother, the 8031.

I had no formal CS education and needed to divide a 24 bit number using a 16 bit divisor with a 16 bit result. Why did we need fancy math?

We used a very inexpensive a/d converter to get raw data. I’ll skip the details, but we got 14 to 16 bits of usable resolution out of a supposedly 6 bit a/d converter via a few old school analog party tricks.
However, while it was very repeatable, the absolute accuracy was iffy at best – or so it seemed.

Turns out that if you regulate voltage, shield like mad and use really good components to make your signal clean and well dressed, you can fix the (actually non) random inaccuracies with a multi-dimensional curve.

And these curves can be calculated through simple measurements based on tables of simple variables. Blah blah blah A.I. and machine learning are just fancy names for simple math techniques used since the beginning of time. We cal this calibration, and it works in as many dimensions as you like.

We collected measurements of performance using a simulator (not in software, but a hacked up oven/refrigerator and series of pumps) to model our wildly changing needs. Not exactly NASA but we did have clipboards.

This information was used to compute a polynomial of the form ((ax^3 / m) + (bx^2 / n) + cx / z )+ offset so that an inexpensive (but repeatable) sensor could be calibrated accurately over a wide range of [stuff] so that we could make our $3 part cost device perform like aerospace rated sensors costing 100x more.

The catch was that although division is really just subtraction, it would take too long to repeatedly subtract… and I needed a few updates per second.

I was no babbage, but I had spent time around bright bunnies and I knew how to do research in the days before the internet. All the research I did on integer math routines referenced the PDP-8, and I figured that the MCS-48 family was probably not too different from the PDP-8. Maybe I could translate?

An expensive phone call to a DECUS group earned me a couple 8″ floppies that I couldn’t read and a printout of a 24 by 16 divide, along with some literature. And so, after a long day or two (I am not smart) I managed to get the subroutine translated and working.

The punchline of this algorithm is that a binary shift is basically a multiply or divide by two (in base 2) and so with a few shifts and the odd subtraction or addition, you can keep the time to compute the remainder much lower than doing n brute force subtractions.

Add a little BCD magic and some placeholding and voila! you have a calculator!

I might still have a floppy of the source, if it hasn’t decayed. These days 32 bit math comes for free.
I highly recommend old rocket science and satellite journals for tips and tricks, as well as a bit of archaeology in the gaming community / demo scenes.

Good luck!

Reply
1. admin says:
  
  November 13, 2020 at 6:53 am
  
  Thanks for sharing the memory. I assume that you meant 8035 instead of 8031 (8031 is a ROMless MCS-51).
  
  Reply
2. Florian says:
  
  November 20, 2020 at 6:57 pm
  
  Hi,
  in the celcius2fahrenheit function is only a small mistake.
  
  f = 0;
  f += c (shift left) 4; // add c * 16; f = c * 16
  f -= c; // sub c; f = c * 15
  f -= c (shift right) 1; // sub c/2; f = c * 14.5
  f -= c (shift right) 3; // sub c/8; f = c * 14.375
  f += c (shift right) 5; // add c/32; f = c * 14.4065
  f += 32 (shift left) 7; // ignore this for now
  f = f (shift right) 3; // f = c * 1.80078125
  
  The constant 9/5 = 1.8, so you calculation error = +0.043%
  The only mistake is the offset 32 (shift left) 7.
  It has to be 32 (shift left) 3.
  
  Reply
Florian says:

November 20, 2020 at 6:43 pm

Hi,
in the celcius2fahrenheit function is only a small mistake.

int16_t f = 0;
f += c <> 1; // f = c * 14.5
f -= c >> 3; // f = c * 14.375
f += c >> 5; // f = c * 14.4065
f += 32 <>= 3; // f = c *1.80078125

The constant 9/5 = 1.8, so you calculation error = +0.043%
The only mistake is the offset 32 << 7. It has to be 32 << 3.

Reply
1. Florian says:
  
  November 20, 2020 at 6:47 pm
  
  Sorry,
  the formulas are wrong diskplayed.
  I just try to fix it.
  
  f += 32 << 7; // ignore this for now
  
  Reply
2. Florian says:
  
  November 20, 2020 at 6:48 pm
  
  f >>= 3; // f = c * 1.80078125
  
  Reply
Florian says:

November 20, 2020 at 6:45 pm

Hi,
in the celcius2fahrenheit function is only a small mistake.

int16_t f = 0;
f += c <> 1; // f = c * 14.5
f -= c >> 3; // f = c * 14.375
f += c >> 5; // f = c * 14.4065
f += 32 <>= 3; // f = c * 1.80078125

The constant 9/5 = 1.8, so you calculation error = +0.043%
The only mistake is the offset 32 << 7. It has to be 32 << 3.

Reply
1. Florian says:
  
  November 20, 2020 at 6:46 pm
  
  f += c <> 1; // f = c * 14.5
  f -= c >> 3; // f = c * 14.375
  f += c >> 5; // f = c * 14.4065
  f += 32 <>= 3; // f = c * 1.80078125
  
  Reply
2. Florian says:
  
  November 20, 2020 at 6:46 pm
  
  f += c <> 1; // f = c * 14.5
  f -= c >> 3; // f = c * 14.375
  f += c >> 5; // f = c * 14.4065
  f += (32 <>= 3); // f = c * 1.80078125
  
  Reply
3. Florian says:
  
  November 20, 2020 at 6:50 pm
  
  Ok the problem are the greater than and less than signs.
  
  f = 0;
  f += c shift left 4; // f = c * 16
  f -= c; // f = c * 15
  f -= c shift right 1; // f = c * 14.5
  f -= c shift right 3; // f = c * 14.375
  f += c shift right 5; // f = c * 14.4065
  f += 32 shift left 7; // ignore this for now
  f shift right= 3; // f = c * 1.80078125
  
  Reply
  1. Florian says:
    
    November 20, 2020 at 6:51 pm
    
    I hope you can read it correctly.
    
    Reply
  2. Florian says:
    
    November 20, 2020 at 7:05 pm
    
    The less-than-sign (U+003C) and greater-than-sign (U+003E) makes trouble.
    
    Reply
Florian says:

November 20, 2020 at 7:01 pm

Please remove my other posts.
The makes trouble.
Sorry.

Here in correct form.

Hi,
in the celcius2fahrenheit function is only a small mistake.

f = 0;
f += c (shift left) 4; // add c * 16; f = c * 16
f -= c; // sub c; f = c * 15
f -= c (shift right) 1; // sub c/2; f = c * 14.5
f -= c (shift right) 3; // sub c/8; f = c * 14.375
f += c (shift right) 5; // add c/32; f = c * 14.4065
f += 32 (shift left) 7; // ignore this for now
f = f (shift right) 3; // f = c * 1.80078125

The constant 9/5 = 1.8, so you calculation error = +0.043%
The only mistake is the offset 32 (shift left) 7.
It has to be 32 (shift left) 3.

Reply
1. Florian says:
  
  November 20, 2020 at 7:03 pm
  
  The “” &th; makes trouble.
  
  Reply
2. Florian says:
  
  November 20, 2020 at 7:06 pm
  
  The less-than-sign (U+003C) and greater-than-sign (U+003E) makes trouble.
  
  Reply