Work Around the Bug of "Deprecated DOS Wildcards"

Environment: VC6 SP5, Win2K Pro SP3 or higher

Recently, I came across an excellent example of what could be called "backward comBUGability." To my great surprise, I found that the FindFirstFile Win32 API function matches wildcards by using deprecated DOS rules. For those who have forgotten DOS, a reminder: DOS is the predecessor of Windows, its roots and its nightmare.

In the days of DOS there were no long filenames, and every file name had to follow the 8.3 format: at most 8 characters for the name and at most 3 characters for the extension, separated from the name by a dot ('.'). Today, a file name can be up to 255 characters long and can include non-English letters, spaces, and dots. The "extension" survives only as a legacy convention, a quick hint to the user about the file's format; for example, "My_Favourite_Picture.jpeg" is an image in JPEG format.

Next come wildcards. When you search for a file, you often want to say "any" or "something like"; forgetting the exact name of a file and searching by part of it is as common as a cup of coffee. Instead of introducing mathematically rigorous (and therefore strict and expressive) regular expressions for file search operations, Microsoft developers introduced a quick hack: two wildcard characters, '?' and '*'. In almost any file operation, you can use '?' to mean "any single character" and '*' to mean "any sequence of characters, possibly empty."

For example, searching for "*.a" finds "asd.a" and "qqq.a", but not "qw.aa" (the extension does not match), and searching for "qq?.a" finds "qqq.a" and "qqw.a", but not "qqqq.a" (too many characters in the name).
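To make these rules concrete, here is a minimal sketch of a matcher that follows them. It is portable C++ rather than Win32, and the name WildcardMatch is hypothetical, not a Windows or MFC API. It compares case-insensitively, as Windows file systems do, and never consults short names. When a character fails to match, it backtracks to the most recent '*' and lets that star swallow one more character:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive character comparison, as on Windows file systems.
static bool SameChar(char a, char b)
{
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// True if `name` matches `pattern`: '?' matches exactly one character,
// '*' matches any (possibly empty) sequence of characters.
bool WildcardMatch(const std::string& pattern, const std::string& name)
{
    std::size_t p = 0, n = 0;
    std::size_t starP = std::string::npos; // position of the last '*' seen
    std::size_t starN = 0;                 // name position when it was seen

    while (n < name.size())
    {
        if (p < pattern.size() && pattern[p] == '*')
        {
            starP = p++;                   // let '*' match nothing for now
            starN = n;
        }
        else if (p < pattern.size() &&
                 (pattern[p] == '?' || SameChar(pattern[p], name[n])))
        {
            ++p;                           // single-character match
            ++n;
        }
        else if (starP != std::string::npos)
        {
            p = starP + 1;                 // backtrack: the last '*'
            n = ++starN;                   // swallows one more character
        }
        else
        {
            return false;                  // mismatch and no '*' to retry
        }
    }
    while (p < pattern.size() && pattern[p] == '*')
        ++p;                               // trailing stars match nothing
    return p == pattern.size();
}
```

With it, WildcardMatch("*.a", "qw.aa") is false and WildcardMatch("qq?.a", "qqq.a") is true, as the rules above require.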

It would have been a great idea if it had been done correctly. But somehow, Microsoft developed such an ugly hack that when a '*' appeared in the name, matching worked... but not quite right. In other words, it worked only for the simple searches shown in the user's manual, such as "*.bak" or "~tmp.*"; two '*' characters, or a tricky combination of '?' and '*', drove the wildcard matcher out of its mind.

In Windows 95, the situation got better... or maybe worse. Long filenames were added, but "short" filenames were kept for compatibility with old programs: for each file with a "long" filename, Windows auto-generates a "short" one. When a program searches for files by wildcard, Windows considers a file a match if the wildcard matches its long name OR its short name. For example, if you search for "*.aaa", you will find "a.aaa" but also "verylongname.aaaa", because its short name is "verylo~1.aaa", which matches "*.aaa".
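The short names come from an auto-generation scheme. The function below is only a rough approximation of it: ApproxShortName is a hypothetical name, it assumes there are no name collisions (so the tail is always "~1"), it handles only a few of the characters Windows replaces, and it ignores the hashed form Windows switches to after repeated collisions. It is nevertheless enough to reproduce the example above:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Rough approximation of 8.3 short-name generation (assumptions: no name
// collisions, so the tail is always "~1"; only a few invalid characters
// are handled; the hashed form used after collisions is ignored).
// Returns "" when the name already fits 8.3, since Windows then stores
// no alternate name.
std::string ApproxShortName(const std::string& longName)
{
    // A leading dot (".htaccess") does not start an extension.
    const std::string::size_type dot = longName.rfind('.');
    const bool hasExt = (dot != std::string::npos && dot != 0);
    const std::string base = hasExt ? longName.substr(0, dot) : longName;
    const std::string ext = hasExt ? longName.substr(dot + 1) : std::string();

    const std::string invalid = " .+,;=[]";
    const bool fits83 = !base.empty() && base.size() <= 8 && ext.size() <= 3 &&
                        base.find_first_of(invalid) == std::string::npos &&
                        ext.find_first_of(invalid) == std::string::npos;
    if (fits83)
        return "";

    // Drop spaces and dots, replace other invalid characters with '_',
    // and convert to upper case.
    auto clean = [](const std::string& s) {
        std::string out;
        for (char c : s)
        {
            if (c == ' ' || c == '.')
                continue;
            if (std::string("+,;=[]").find(c) != std::string::npos)
                c = '_';
            out += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
        return out;
    };

    // First six characters of the base, the "~1" tail, and at most
    // three characters of the extension.
    std::string shortName = clean(base).substr(0, 6) + "~1";
    const std::string shortExt = clean(ext).substr(0, 3);
    if (!shortExt.empty())
        shortName += "." + shortExt;
    return shortName;
}
```

Under these assumptions, ApproxShortName("verylongname.aaaa") yields "VERYLO~1.AAA", whose three-character extension is exactly what lets "*.aaa" match, and ApproxShortName("a.aaa") yields an empty string.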

Why am I telling this touching story? I had used Windows 2000 for a very long time, along with Explorer's Find Files dialog and FAR Manager's Find File plugin, and never noticed the behavior described above. But recently I found an excellent utility, dirclean by Michael Dunn, which led me to a surprise: the deprecated wildcard matching is built into Windows 2000 (and XP) as well! I found this by accident. This useful utility integrates into the shell and deletes the intermediate files of Visual C++ builds by searching with wildcards. While deleting, among others, all "*.res" files, which are scratch files and can be rebuilt, it also deletes "*.resx" files, which are NOT!!!

This "feature" (matching a wildcard against either the long name or the short name) is built into the FindFirstFile/FindNextFile API, which MFC's CFileFind uses, which in turn is what Mike used in his utility. Currently, there is no workaround apart from Knowledge Base article Q164351 (Command Prompt's Treatment of Long File Extensions). I tried it, and... it did not work on my Windows 2000 Pro SP3. Let me underline that fact again: article Q164351 is the only workaround for a problem raised nearly 8 years ago, and it is confusing (it probably does not work on Windows 2000/XP) and hard to find, because one may think the problem lies in the Command Prompt or in Windows NT! Again: the problem is in the FindFirstFile/FindNextFile API, and it affects all currently existing Windows versions (up to and including XP).
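The effect on the dirclean case can be emulated in a few lines. This sketch is a deliberate simplification: it handles only masks of the form "*.ext", and FindFileWouldReport is a hypothetical name, not part of Win32. It reports a file if either its long name or its short name ends with the extension:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive test for the "*.ext" mask only: the name must end
// with "." + ext (a narrow stand-in for a full wildcard matcher).
static bool MatchesStarDotExt(const std::string& name, const std::string& ext)
{
    const std::string suffix = "." + ext;
    if (name.size() < suffix.size())
        return false;
    const std::size_t start = name.size() - suffix.size();
    for (std::size_t i = 0; i < suffix.size(); ++i)
    {
        if (std::tolower(static_cast<unsigned char>(name[start + i])) !=
            std::tolower(static_cast<unsigned char>(suffix[i])))
            return false;
    }
    return true;
}

// Emulates the behavior described in the text: a file is reported if
// EITHER its long name or its short (alternate) name matches the mask.
bool FindFileWouldReport(const std::string& ext, const std::string& longName,
                         const std::string& shortName)
{
    return MatchesStarDotExt(longName, ext) ||
           (!shortName.empty() && MatchesStarDotExt(shortName, ext));
}
```

For "*.res", the long name "Resource.resx" does not match, but its short name (typically "RESOUR~1.RES") does, so the file is reported and, in dirclean's case, deleted.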

It looks as if a stone had fallen on the head of the programmer who wrote that facility: when you call FindNextFile to get the next file matching the wildcard, it sets the last error to ERROR_FILE_NOT_FOUND even though it has actually found the file. And at the end of the search, FindNextFile (as CFileFind exposes it) returns FALSE not when it fails to find the next file, which would be more logical, but when it finds the "last" file; in that case the last error is the even more confusing ERROR_NO_MORE_FILES!!! This behavior is so deeply built into Microsoft Windows that even the newest Whistler shell32.dll relies on it, treating FindNextFile as successful when the last error is ERROR_SUCCESS, ERROR_FILE_NOT_FOUND, or ERROR_NO_MORE_FILES. Real comBUGability over the years...

An excellent solution would be a fix at the filesystem level: either a patch to kernel32.dll (for example, FindFirstFileEx has reserved fields that could specify the search method, "long names only", "short names only", or "both", with a user-controlled default) or a filesystem filter driver...

A good solution would be a regular-expression search, or at least a "true" wildcard matching algorithm, built into CFileFind or one of its derived classes. Unfortunately, "true" wildcard matching takes n*m time, where n is the length of the wildcard and m is the length of the filename being matched... Anyway, if enough people ask for a "true wildcard matching" algorithm, I could dig one out of my ACM archives or write a new one from scratch. Contact me for more details.
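For reference, here is one standard way to do "true" wildcard matching in n*m time: a dynamic-programming table over prefixes of the pattern and the name. It is offered as a sketch, not as the algorithm from the ACM archives mentioned above, and WildcardMatchDP is a hypothetical name. It looks at one name only, so it can be applied to the long name alone:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// match[i][j] is true when the first i characters of `pattern` match the
// first j characters of `name`. Time and memory are O(n*m) for pattern
// length n and name length m. Comparison is case-insensitive.
bool WildcardMatchDP(const std::string& pattern, const std::string& name)
{
    const std::size_t n = pattern.size(), m = name.size();
    std::vector<std::vector<bool>> match(n + 1, std::vector<bool>(m + 1, false));
    match[0][0] = true;                    // empty pattern matches empty name

    for (std::size_t i = 1; i <= n; ++i)
    {
        const char pc = pattern[i - 1];
        // Only a run of '*' can match the empty name prefix.
        match[i][0] = (pc == '*') && match[i - 1][0];
        for (std::size_t j = 1; j <= m; ++j)
        {
            if (pc == '*')
                // '*' matches nothing, or one more character of the name.
                match[i][j] = match[i - 1][j] || match[i][j - 1];
            else if (pc == '?' ||
                     std::tolower(static_cast<unsigned char>(pc)) ==
                     std::tolower(static_cast<unsigned char>(name[j - 1])))
                match[i][j] = match[i - 1][j - 1];
        }
    }
    return match[n][m];
}
```

Because it checks only the name it is given, WildcardMatchDP("*.aaa", "verylongname.aaaa") is false. The table could be reduced to two rows to save memory; the full table is kept here for clarity.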

So, I propose a workaround (which is enough to make Mike's program work well; that is another reason not to change his program heavily): CAdvancedFileFind, a class derived from MFC's CFileFind. It simply compares (with some heuristics) the short and long extensions of each file found and reports a bug if they differ and the caller passed FIND_FIRST_EX_DO_EXACT_MATCH to FindFile.
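The extension comparison at the heart of the workaround can be sketched without MFC. This portable version (LooksLikeShortNameMatch and its helpers are hypothetical names) follows roughly the same rule as the class's HasFoundShortInsteadOfLong: take the text from the last dot of each name, compare case-insensitively, and flag the file when the long extension sorts after the short one, which is what happens when Windows truncates ".resx" to ".RES":

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Text from the last dot onward, or "" if there is no dot after the
// first character.
static std::string LastExtension(const std::string& fileName)
{
    const std::string::size_type dot = fileName.rfind('.');
    return (dot == std::string::npos || dot == 0) ? "" : fileName.substr(dot);
}

static std::string Lower(std::string s)
{
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// True when the file probably matched only through its short name:
// a short name exists and its extension compares lower than the long one
// (mirrors the `_tcsicmp(...) > 0` test in the class).
bool LooksLikeShortNameMatch(const std::string& longName,
                             const std::string& shortName)
{
    if (shortName.empty())       // no alternate name: nothing to confuse
        return false;
    return Lower(LastExtension(longName)) > Lower(LastExtension(shortName));
}
```

"Resource.resx" with the short name "RESOUR~1.RES" is flagged, while "verylongname.res" with "VERYLO~1.RES" is not, because both extensions are ".res".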

This is what CAdvancedFileFind returns when the bug occurs and its FindFile operation is called like an "ordinary" CFileFind::FindFile:

And this is the bug fixed: the advanced FindFile was called with the FIND_FIRST_EX_DO_EXACT_MATCH flag set.

Features

  • Overrides the FindFile operation to store its dwUnused parameter (unused in the original CFileFind::FindFile), which now serves as a flag whose possible value is FIND_FIRST_EX_DO_EXACT_MATCH.
  • Overrides the CloseContext method to reset the stored flag to its default value.
  • Overrides the FindNextFile operation to fix the bug when the stored flag requests it.
  • Provides the HasFoundShortInsteadOfLong property to detect whether the last FindNextFile call found a "wrong match" (that is, the long filename does not match while the short one does).
  • Provides the IsLastFile property, which fixes another MFC glitch: CFileFind declares m_bGotLast but never uses it, leaving callers to work as the API does, where FindNextFile returns FALSE either on an error or when it finds the last file (in which case the "error" is ERROR_NO_MORE_FILES). I had to change this a bit: because FindNextFile now searches repeatedly, the last file may also be a "wrong match". In that case ERROR_PATH_NOT_FOUND is returned, so that it is not confused with ERROR_FILE_NOT_FOUND (returned in the normal case) or with ERROR_NO_MORE_FILES (returned when the original API finds the last file).
  • For those interested in short names (called "alternate" names in Win32 terms), the GetShortFileName method is provided. Windows does not create alternate names for files whose names already fit the 8.3 format, so GetShortFileName also takes a flag asking it to generate a short name if none exists.

To use the new features of CAdvancedFileFind, you'll have to:

  • Add AdvancedFileFind.cpp and AdvancedFileFind.h to your project
  • #include "AdvancedFileFind.h" in the source file where you will use it (or in stdafx.h, for example)
  • Change CFileFind instances to CAdvancedFileFind
  • Change the main search loop a bit: if it was something like:
           BOOL bWorking = finder.FindFile(strPattern);
           while (bWorking)
           {
              bWorking = finder.FindNextFile();
    
              ... work with file found
           }
    
    change it to:
           BOOL bWorking = finder.FindFile(strPattern, 
                           FIND_FIRST_EX_DO_EXACT_MATCH);
           while (bWorking)
           {
              bWorking = finder.FindNextFile();
              if (!bWorking && !finder.IsLastFile())
                   break;
    
              ... work with file found
           }
    
  • Use the provided methods and properties as you like; for example, you may pass FIND_FIRST_EX_DO_EXACT_MATCH or default (0) in the last example, based on the user's decision.

Now, some code from CAdvancedFileFind that shows its main logic. This is how we initialize: FindFile stores the parameter it was passed and applies a heuristic that excludes most cases where the "bugfix" is undesired.

BOOL CAdvancedFileFind::FindFile(LPCTSTR pstrName /* = NULL */,
                DWORD dwUnused /* = 0 */)
{
  m_dwLastUnused = dwUnused; // remembered for FindNextFile
  m_bNeedFix = false;

  // CFileFind::FindFile treats NULL as "*.*"; do the same here,
  // because _tcsrchr must not be called with a NULL pointer
  LPCTSTR pstrMask = (pstrName != NULL) ? pstrName : _T("*.*");

  // Strictly speaking, the bug occurs when the short name matches
  // the mask but the long name does not; for the sake of speed,
  // we take the following approach instead...
  LPCTSTR pstrStar = _tcsrchr(pstrMask, _T('*'));
  LPCTSTR pstrDot = _tcsrchr(pstrMask, _T('.'));

  // ...the fix is needed only if exact matching was requested and
  // the mask has both a star and an extension...
  if ((pstrStar != NULL) && (pstrDot != NULL) &&
    (FIND_FIRST_EX_DO_EXACT_MATCH == (dwUnused &
     FIND_FIRST_EX_DO_EXACT_MATCH)))
  {
    // ...and the extension is 3 characters long, not counting star
    // symbols ('?' counts as an ordinary character).
    // This does not cover some rare search masks.
    int c = 0;
    for (LPCTSTR p = pstrDot; *p != _T('\0'); p++)
    {
      if (*p != _T('*'))
        c++;
    }

    // c includes the dot itself. An extra "star is in the extension"
    // test (pstrDot > pstrStar) behaved badly, so it was dropped.
    m_bNeedFix = (c == 4);
  }

  return CFileFind::FindFile(pstrName, dwUnused);
}
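To see which masks turn the fix on, the FindFile heuristic above can be restated as a portable function. This is a sketch: NeedsShortNameFix is a hypothetical name, and kDoExactMatch stands in for FIND_FIRST_EX_DO_EXACT_MATCH, whose real value is defined in AdvancedFileFind.h and is only assumed to be 1 here:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for FIND_FIRST_EX_DO_EXACT_MATCH (value assumed).
const unsigned long kDoExactMatch = 0x1;

// Mirrors the FindFile heuristic: the fix is enabled only when exact
// matching was requested, the mask contains both '*' and '.', and the
// part from the last '.' onward holds exactly four non-'*' characters
// (the dot plus a three-character extension; '?' counts as a character).
bool NeedsShortNameFix(const char* mask, unsigned long flags)
{
    if (mask == nullptr || (flags & kDoExactMatch) != kDoExactMatch)
        return false;
    const char* star = std::strrchr(mask, '*');
    const char* dot = std::strrchr(mask, '.');
    if (star == nullptr || dot == nullptr)
        return false;
    int count = 0;
    for (const char* p = dot; *p != '\0'; ++p)
        if (*p != '*')
            ++count;
    return count == 4;
}
```

Under this heuristic "*.res" and "*.r?s" enable the fix, while "*.resx", "*.re*" and "qq?.abc" do not.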

This is the main logic: FindNextFile changes from a single call to the API into repeated calls whenever the bug needs to be fixed.

BOOL CAdvancedFileFind::FindNextFile()
{
  // the previous call already delivered the last file: search is over
  if (m_bGotLast)
  {
    SetLastError(ERROR_PATH_NOT_FOUND);
    m_bGotLast = FALSE;
    return FALSE;
  }

  BOOL bResult = CFileFind::FindNextFile();

  // FIX to MFC's bug: CFileFind declares m_bGotLast but never sets it.
  // FALSE with ERROR_NO_MORE_FILES means the file now in m_pFoundInfo
  // is the last one, and it is still valid.
  m_bGotLast = !bResult && (GetLastError() == ERROR_NO_MORE_FILES);

  // GetLastError() is reliable only after a failed call; after a
  // successful one it may hold a stale value (such as
  // ERROR_FILE_NOT_FOUND), so test bResult instead of the error code
  if ((bResult || m_bGotLast) &&
    (FIND_FIRST_EX_DO_EXACT_MATCH == (m_dwLastUnused &
     FIND_FIRST_EX_DO_EXACT_MATCH)))
  {
    // skip files that matched only through their short name
    while (HasFoundShortInsteadOfLong())
    {
      if (!bResult)    // the last file was a wrong match
      {
        // Report ERROR_PATH_NOT_FOUND rather than ERROR_NO_MORE_FILES,
        // so that callers testing for the "last file" condition do
        // not process this wrong match
        SetLastError(ERROR_PATH_NOT_FOUND);
        m_bGotLast = FALSE;
        break;
      }
      bResult = CFileFind::FindNextFile();
      // keep m_bGotLast in sync; otherwise a valid last file reached
      // while skipping would be dropped by the caller
      m_bGotLast = !bResult && (GetLastError() == ERROR_NO_MORE_FILES);
    }
  }
  return bResult;
}
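The skip-and-last-file protocol of FindNextFile, together with the caller loop recommended earlier, can be exercised without Windows by using a mock. In this sketch a vector of entries stands in for the search handle (Entry, MockAdvancedFind and RunSearch are hypothetical names), error codes are omitted, and the last-file flag is updated after every call, including calls made while skipping wrong matches:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One search result: its name, and whether it matched only via its short name.
struct Entry { std::string name; bool shortOnlyMatch; };

// Mock of CAdvancedFileFind with the exact-match flag set. Like CFileFind,
// Next() returns false when it has moved onto the LAST entry, which is
// still valid; IsLastFile() tells that case apart from "nothing left".
class MockAdvancedFind
{
public:
    explicit MockAdvancedFind(std::vector<Entry> e) : entries_(std::move(e)) {}

    bool Next()
    {
        if (gotLast_) { gotLast_ = false; return false; } // search finished
        bool result = Advance();
        gotLast_ = !result;
        while (entries_[pos_].shortOnlyMatch)             // wrong match
        {
            if (!result) { gotLast_ = false; break; }     // last one is wrong
            result = Advance();
            gotLast_ = !result;                           // keep flag in sync
        }
        return result;
    }
    bool IsLastFile() const { return gotLast_; }
    const std::string& Name() const { return entries_[pos_].name; }

private:
    // Moves to the next entry; returns false if it is the final one.
    bool Advance() { ++pos_; return pos_ + 1 < entries_.size(); }

    std::vector<Entry> entries_;
    std::size_t pos_ = static_cast<std::size_t>(-1);      // before first entry
    bool gotLast_ = false;
};

// The caller loop from the text, collecting the names it processes.
std::vector<std::string> RunSearch(std::vector<Entry> entries)
{
    std::vector<std::string> found;
    if (entries.empty())
        return found;                      // FindFile itself would fail
    MockAdvancedFind finder(std::move(entries));
    bool working = true;
    while (working)
    {
        working = finder.Next();
        if (!working && !finder.IsLastFile())
            break;
        found.push_back(finder.Name());
    }
    return found;
}
```

For the entries a.res, b.resx (a short-name-only match) and c.res, the loop processes a.res and c.res; when the wrong match is the last entry, the loop stops without processing it.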

And this is the function that detects whether the bug took place. You may change it to use "true" wildcard matching if you like, and then drop the heuristic in the FindFile operation.

BOOL CAdvancedFileFind::HasFoundShortInsteadOfLong() const
{
  ASSERT(m_hContext != NULL);
  ASSERT_VALID(this);

  BOOL bRet = FALSE;

  // if need to fix the bug
  while ((m_pFoundInfo != NULL) && m_bNeedFix)
  {
    LPWIN32_FIND_DATA pFindInfo;
    TCHAR *pExtLong, *pExtShort;
    int nLen1 = 0, nLen2 = 0, c;

    pFindInfo = (LPWIN32_FIND_DATA) m_pFoundInfo;
    pExtLong = pFindInfo->cFileName;
    pExtShort = pFindInfo->cAlternateFileName;

    // if cAlternateFileName is empty =>
    // no long filename => bug does not take place
    if (_T('\0') == pExtShort[0])
      break;

    // calculate the length and position at the last character
    // of the string
    while (*pExtLong) { pExtLong++; nLen1++; }
    while (*pExtShort) { pExtShort++; nLen2++; }

    c = nLen1;
    if (c==0) break;    // exit
    while (c>0)
    {
      // last "extension" is what we will see in "shortened" name
      if (*pExtLong == '.')
        break;
      c--;
      pExtLong--;
    }

    c = nLen2;
    if (c==0) break;    // exit
    while (c>0)
    {
      // last "extension" is what we will see in "shortened" name
      if (*pExtShort == '.')
        break;
      c--;
      pExtShort--;
    }

    // now compare extensions
    if (*pExtLong == '.')
    {
      int res = _tcsicmp(pExtLong, pExtShort);
      if (res > 0)
      {
        // ok, fix the bug
        bRet = TRUE;
      }
    }

    break;
  }

  return bRet;
}

Downloads

Download demo project - 9 Kb
Download source - 16 Kb