How does a string comparison actually work?

Implementation in PHP

This is the implementation of the strcmp function in the PHP standard library. As you can see it is calling MEMCMP(3):

c
        
int zend_binary_strcmp(const char *s1,
                        size_t len1,
                        const char *s2,
                        size_t len2)
{
    int retval;

    if (s1 == s2) {
        return 0;
    }
    retval = memcmp(s1, s2, MIN(len1, len2));
    if (!retval) {
        return (int)(len1 - len2);
    } else {
        return retval;
    }
}
        
      

Note that the s1 == s2 check is a reference comparison and only checks if the same string is passed twice.

See full source here: Zend/zend_operators.c#L2640

Implementation in libc

I will now dig into the underlying implementation (of PHP) which is the standard memcmp function from the libc.
Here is a simplified version of the actual implementation (the full implementation can be found in libc's sources in string/strcmp.c):

c
        
# include <sys/types.h>

# define op_t unsigned long int

typedef unsigned char byte;

int memcmp (const void *s1, const void *s2, size_t len)
{
  op_t a0;
  op_t b0;
  long int srcp1 = (long int) s1;
  long int srcp2 = (long int) s2;
  op_t res;

  // […]

  /* There are just a few bytes to compare.
    Use byte memory operations.  */
  while (len != 0)
    {
      a0 = ((byte *) srcp1)[0];
      b0 = ((byte *) srcp2)[0];
      srcp1 += 1;
      srcp2 += 1;
      res = a0 - b0;
      if (res != 0)
    return res;
      len -= 1;
    }

  return 0;
}
        
      

When the string length is greater than OP_T_THRES (the defined value is currently 16) another algorithm is used for the comparison. I decided to skip it because it's more complicated due to some optimizations.
Don't worry if this does not sound familiar to you, I will be going through this code.

Let's see how the algorithm works when you compare ABC against ABO.

It uses a loop to scan both strings and checks letter by letter their equality.

Comparing bytes

You might have noticed that the equality algorithm is a subtraction (res = a0 - b0;). It uses the char code representation to compare if they are equal.
For example:

c
        
res = 'a' - 'a';
res = 97 - 97;
res = 0; // Both char are equal

res = 'c' - 'o';
res = 99 - 111;
res = -12; // Not equal
        
      

In most high level language you would just compare the two strings ('a' == 'a'), and this is what is going on under the hood.

Timing attack

First of all, if your password checking relies on string comparison you have probably done something wrong.
Take the following example:

PHP
        
if (strcmp("toto", $password) !== 0) {
    return false
}
        
      

You could bruteforce the $password here because memcmp has a Θ(n) time complexity (where n is the number of common bytes in the string).
You can find other implementations in c which are not Θ(n), some are even time constant but they are not commonly used.

It's way safer to use real cryptography for this use case. There are more subtle attack of this kind.

Reach out

Say hello: [email protected].

Ping me on Twitter: @svensauleau.