How do realloc and memcpy work?

Let’s take a little closer look at memcpy and, while we’re at it, at “big O” or Landau notation.

First, big-O. As i’ve talked about elsewhere, it’s worth remembering the definition of big O, which is that some function g(n) is said to be O(f(n)) when there exists a constant k for which g(n)kf(n). What the constant does is lets you ignore the little details in favor of the important part. As everyone has noted, memcpy of n bytes will be O(n) in most any normal architecture, because no matter what you have to move those n bytes, one chunk at a time. So, a first, naive implementation of memcpy in C could be written

unsigned char *
memcpy(unsigned char * s1, unsigned char * s2, long size){
    long ix;
    for(ix=0; ix < size; ix++)
        s1[ix] = s2[ix];
    return s1;
}

This is in fact O(n), and might make you wonder why we even bother with a library routine. however, the thing about the libc functions is that they are the place where platform-specific utilities get written; if you want to optimize for the architecture, this is one of the places you can do it. So, depending on the architecture, there may be a more efficient implementation options; for example, in the IBM 360 archiecture, there is a MOVL instruction that moves data is big chunks using very highly optimized microcode. So in place of that loop, a 360 implementation of memcpy might instead look something like

LR 3,S1      LOAD S1 ADDR in Register 3
LR 4,S2      
MOVL 3,4,SIZE

(No guarantees that’s exactly right 360 code by the way, but it’ll serve for an illustration.) This implementation looks like instead of doing n steps around the loop as the C code did, it just executes 3 instructions.

What really happens, though, is that it’s executing O(n) micro instructions under the covers. What’s different between the two is the constant k; because the microcode is much faster, and because there’s only three decode steps on the instructions, it is dramatically faster than the naive version, but it’s still O(n) — it’s just the constant is smaller.

And that’s why you can make good use of memcpy — it’s not asymptotically faster, but the implementation is as fast as someone could make it on that particular architecture.

Leave a Comment