Can someone explain this strcpy algorithm?


Jackyquah
April 9th, 2006, 12:52 PM
I got this from the Microsoft Visual Studio source code.


strlen proc

.FPO ( 0, 1, 0, 0, 0, 0 )

string equ [esp + 4]

mov ecx,string ; ecx -> string
test ecx,3 ; test if string is aligned on 32 bits
je short main_loop

str_misaligned:
; simple byte loop until string is aligned
mov al,byte ptr [ecx]
add ecx,1
test al,al
je short byte_3
test ecx,3
jne short str_misaligned

add eax,dword ptr 0 ; 5 byte nop to align label below

align 16 ; should be redundant

main_loop:
mov eax,dword ptr [ecx] ; read 4 bytes
mov edx,7efefeffh
add edx,eax
xor eax,-1
xor eax,edx
add ecx,4
test eax,81010100h
je short main_loop
; found zero byte in the loop
mov eax,[ecx - 4]
test al,al ; is it byte 0
je short byte_0
test ah,ah ; is it byte 1
je short byte_1
test eax,00ff0000h ; is it byte 2
je short byte_2
test eax,0ff000000h ; is it byte 3
je short byte_3
jmp short main_loop ; taken if bits 24-30 are clear and bit
; 31 is set

byte_3:
lea eax,[ecx - 1]
mov ecx,string
sub eax,ecx
ret
byte_2:
lea eax,[ecx - 2]
mov ecx,string
sub eax,ecx
ret
byte_1:
lea eax,[ecx - 3]
mov ecx,string
sub eax,ecx
ret
byte_0:
lea eax,[ecx - 4]
mov ecx,string
sub eax,ecx
ret

strlen endp


What is .FPO?
What does it mean to test whether the string is aligned on 32 bits?
What is a misaligned string?
What is 'align 16' for in the code? Is it redundant?

olivthill
April 9th, 2006, 01:24 PM
The .FPO (frame pointer omission) directive controls the emission of debug records to the .debug$F segment or section. Its form is .FPO (cdwLocals, cdwParams, cbProlog, cbRegs, fUseBP, cbFrame).

A string is aligned on 32 bits if its address is divisible by 4 (the remainder is zero), which is the same as having its two least significant bits at 0, which is the same as getting 0 when the address is ANDed with 3. That is exactly what "test ecx,3" checks.
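
As a rough illustration of that check, here is a minimal C sketch. The names is_aligned4 and bytes_to_align4 are made up for this example and do not come from the CRT source; the second helper only counts how many single-byte steps a loop like str_misaligned would take at most before reaching a 4-byte boundary.

```c
#include <stdint.h>

/* Hypothetical helper: 1 if addr is a multiple of 4,
   i.e. its two least significant bits are both zero. */
static int is_aligned4(uintptr_t addr)
{
    return (addr & 3u) == 0;
}

/* Hypothetical helper: number of single-byte steps needed
   to reach the next 4-byte boundary (0 if already aligned). */
static unsigned bytes_to_align4(uintptr_t addr)
{
    return (unsigned)((4u - (addr & 3u)) & 3u);
}
```

For example, an address ending in ...1 needs three byte steps, one ending in ...3 needs one, and one ending in ...4 needs none.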

The align 16 in your code is not redundant in the sense you mean: it aligns the code of the main_loop loop, not the address of the variable called 'string'.

(Bonus, a silly joke:
Q: Do you know the name of the lead singer of the group called pRolice?
A: StRing.)

Jackyquah
April 9th, 2006, 03:03 PM
Thanks for the answer,
but why does it have to be 32-bit aligned?
Also,
I know 'align 16' is for the code, not for the string, but what is it for? Does it speed up the process, or something else?
What would happen if I removed the 'align 16'?

Lastly, when I checked the main loop, I saw that it reads 4 bytes at a time to check for the null.
What if the null (0) is the first of the 4 bytes it reads, and the last 3 bytes are in memory that doesn't belong to the process? Isn't that a memory violation, since the code tries to read memory that doesn't belong to it?

olivthill
April 10th, 2006, 04:25 AM
Okay, here are new comments:

1. In the title of this post, the word strcpy should be replaced with strlen.

2. The header of the file should mention
Entry: const char * str - string whose length is to be computed
Exit: EAX = length of the string "str", exclusive of the final null byte

3. The code comes from vc\crt\src\[cpu vendor]\strlen.asm

4. The first comment, ecx -> string should be string -> ecx.

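5. This program is based on an algorithm from Magmai Kai Holmlor:
int strlen(const char* cszStr)
{
    DWORD* p = (DWORD*) cszStr;
    DWORD k,kk;

    // byte loop until p is on a 4-byte boundary
    while(DWORD(p)&3)
    {
        if(*(char*)(p)==0)
            return (char*)(p)-cszStr;
        p = (DWORD*) (((char*)(p)) + 1);
    }

    // read 4 bytes at a time until the test flags a possible zero byte
    do
    {
        k = *p;
        kk = k + 0x7efefeff;
        k ^= -1;
        k ^= kk;
        p++;
    } while(!(k&0x81010100));

    // step back to the flagged word and find which byte is zero
    k = *(--p);
    if(!(k&0x000000ff))
        return (char*)(p)-cszStr;

    //if(!(k&0x0000ffff)) Thanks Kippesoep!
    if(!(k&0x0000ff00))
        return (char*)(p)-cszStr+1;

    if(!(k&0x00ff0000))
        return (char*)(p)-cszStr+2;

    if(!(k&0xff000000))
        return (char*)(p)-cszStr+3;

    // No byte is zero: the test fired because the top byte is 0x80.
    // The assembly jumps back to main_loop in this case; this listing
    // returns instead, so the result is wrong for such strings.
    return (char*)(p)-cszStr;
}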
I found it at http://www.gamedev.net/community/forums/topic.asp?topic_id=315709

6. Yes, it is very likely that "align 16" was put to speed up the process.

7. You may remove "align 16". The program will return exactly the same result.

8. As a matter of fact, the 32-bit alignment does not concern the string, but the address in ecx. The string may be at any address. It's just that, before entering main_loop, ecx should contain an address on a 32-bit boundary.

9. The purpose of having an aligned address in ecx is to be able to scan the string, not one byte at a time, but four bytes at a time, in order to speed up the process.

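10. You're right, the process can read 1, 2, or 3 bytes beyond the end of the string, since it reads the string in chunks of four bytes. It is not a memory violation, though: each read is 4-byte aligned, so it never crosses a page boundary, and the extra bytes lie in the same memory page as the terminating null, which the process can necessarily read.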
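
To tie the points above together, here is a rough C sketch of the same idea; it is not the CRT code. The names maybe_has_zero_byte and sketch_strlen are made up for this example. It differs from the C listing in item 5 in two ways: it locates the zero by looking at the bytes directly, so it does not depend on byte order, and when the test fires although no byte is zero (top byte equal to 0x80), it keeps scanning, as the "jmp short main_loop" in the assembly does. One loud assumption: in standard C, reading past the end of an object is undefined, so this sketch is only safe on buffers the caller has padded out to a 4-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring the add/xor/test sequence of main_loop.
   Nonzero if some byte of w MAY be zero. It never misses a zero byte,
   but it also fires when the top byte of w is 0x80. */
static uint32_t maybe_has_zero_byte(uint32_t w)
{
    uint32_t sum = w + 0x7efefeffu;
    return ((w ^ 0xffffffffu) ^ sum) & 0x81010100u;
}

/* Illustrative sketch only. Assumes the bytes up to the next 4-byte
   boundary after the terminator are readable. */
static size_t sketch_strlen(const char *s)
{
    const char *p = s;

    /* Byte loop until p sits on a 4-byte boundary. */
    while (((uintptr_t)p & 3u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);      /* aligned 4-byte read */
        if (maybe_has_zero_byte(w)) {
            for (int i = 0; i < 4; i++)
                if (p[i] == '\0')
                    return (size_t)(p - s) + (size_t)i;
            /* False positive: no zero byte here, keep scanning. */
        }
        p += 4;
    }
}
```

With a zero-filled, 4-byte-aligned buffer holding "hello", sketch_strlen gives 5 from the start of the buffer and 4 from one byte in, which exercises both the byte loop and the word loop.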