Add, Sub, Mul, Div


Tisler
April 19th, 2005, 04:28 AM
Among these four operations, could you tell me which one runs fastest? I ask because I am taking a basic course on assembly and have a homework assignment on a bit-counting problem. I am wondering whether I can make my program run a bit faster by making the right choice, although admittedly this is only what I am intending to do so far. I am not sure about this little problem, though.

Thank you for any help,

Andreas Tisler

Hobson
April 19th, 2005, 07:47 AM
Unfortunately, instruction clocks were included only in references up to the 486 processor, and I cannot find them in the new Intel Instruction Set Reference. So these numbers may be a bit outdated.


ADD
                         Clocks               Size
Operands       808x     286   386   486     Bytes

reg,reg        3        2     2     1       2
mem,reg        16+EA    7     7     3       2-4   (W88=24+EA)
reg,mem        9+EA     7     6     2       2-4   (W88=13+EA)
reg,immed      4        3     2     1       3-4
mem,immed      17+EA    7     7     3       3-6   (W88=23+EA)
accum,immed    4        3     2     1       2-3


SUB
                         Clocks               Size
Operands       808x     286   386   486     Bytes

reg,reg        3        2     2     1       2
mem,reg        16+EA    7     6     3       2-4   (W88=24+EA)
reg,mem        9+EA     7     7     2       2-4   (W88=13+EA)
reg,immed      4        3     2     1       3-4
mem,immed      17+EA    7     7     3       3-6   (W88=25+EA)
accum,immed    4        3     2     1       2-3

MUL
                                Clocks                   Size
Operands       808x            286   386     486       Bytes

reg8           70-77           13    9-14    13-18     2
reg16          118-133         21    9-22    13-26     2
reg32          -               -     9-38    13-42     2-4
mem8           (76-83)+EA      16    12-17   13-18     2-4
mem16          (124-139)+EA    24    12-25   13-26     2-4
mem32          -               -     12-41   13-42     2-4

DIV
                                Clocks               Size
Operands       808x            286   386   486     Bytes

reg8           80-90           14    14    16      2
reg16          144-162         22    22    24      2
reg32          -               -     38    40      2
mem8           (86-96)+EA      17    17    16      2-4
mem16          (150-168)+EA    25    25    24      2-4   (W88=158-176+EA)
mem32          -               -     41    40      2-4

SHL/SAL & SHR
                            Clocks              Size
Operands       808x        286   386   486    Bytes

reg,1          2           2     3     3      2
mem,1          15+EA       7     7     4      2-4   (W88=23+EA)
reg,CL         8+4n        5+n   3     3      2
mem,CL         20+EA+4n    8+n   7     4      2-4   (W88=28+EA+4n)
reg,immed8     -           5+n   3     2      3
mem,immed8     -           8+n   7     4      3-5


So, as you can see, on these processors you can fit roughly 5-10 ADD/SUB/SHx instructions into the time of a single MUL or DIV.
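
To illustrate the trade-off, here is a minimal sketch in C (my choice of language, not from the course) of how a multiplication or division by a constant can be written with shifts and adds only. The function names are made up for the example, and a compiler may well do this rewriting for you.

```c
#include <stdint.h>

/* Multiply by 10 using two shifts and one add: x*10 = x*8 + x*2.
   In assembly this would be two SHL and one ADD instead of one MUL. */
uint32_t mul10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Divide an unsigned value by 8 with a single right shift (SHR by 3).
   This only works for unsigned values and power-of-two divisors. */
uint32_t div8_shift(uint32_t x)
{
    return x >> 3;
}
```

For example, mul10_shift_add(7) gives 70 and div8_shift(100) gives 12, the same results as the plain * and / operators on unsigned values.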

And also a little piece of advice:
If you are not going to count all the bits set in something like 50MB of RAM, just use the algorithm which is SIMPLER, not FASTER. It's easy to make problems hard with assembler and unnecessary optimizations.
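
As an example of what I mean by simple, here is a sketch of a plain bit counter in C. It assumes a 32-bit unsigned input, and the function name is only for illustration. Each pass through the loop maps to one AND, one ADD and one shift, so there is no MUL or DIV anywhere.

```c
#include <stdint.h>

/* Count the set bits in x by testing the lowest bit and shifting
   right until nothing is left. At most 32 iterations for a 32-bit value. */
unsigned count_bits_simple(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        count += x & 1u;   /* add 1 if the lowest bit is set */
        x >>= 1;           /* move the next bit into the lowest position */
    }
    return count;
}
```

It is not the fastest method there is, but it is easy to translate into assembly and easy to check by hand.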

Hob