Work Around the Bug of "Deprecated DOS Wildcards"

Environment: VC6 SP5, Win2K Pro SP3 or higher

Recently, I observed an excellent example of what can be called "backward comBUGability." I was surprised to find that the FindFirstFile Win32 API function matches wildcards by using deprecated DOS rules. For those who have forgotten DOS, a reminder: DOS is the predecessor of Windows, its roots and its nightmare.

In the days of DOS there were no long filenames, and every file name had to follow this format: no more than 8 characters for the name, no more than 3 characters for the extension, and the extension separated from the name by a dot ('.'). Today, a file name can have up to 255 characters, including non-English letters, spaces, and dots. The "extension" survives only as a legacy convention, a quick hint to the user about the file's format; for example, "My_Favourite_Picture.jpeg" is an image in JPEG format.

Next comes the wildcard. When you search for a file, you often want to say "any" or "like"; it is as ordinary as a cup of coffee: you may have forgotten the exact name of the file and want to search by part of it. Instead of introducing mathematically correct (and thus very strict and descriptive) regular expressions for file search operations, Microsoft developers introduced a quick hack: two wildcard characters, '?' and '*'. In almost any file operation, '?' stands for "any single character" and '*' for "any sequence of characters, possibly empty."

It means searching for "*.a" would find "asd.a" and "qqq.a", but not "qw.aa" (extension mismatches) and searching for "qq?.a" would find "qqq.a" and "qqw.a", but not "qqqq.a" (too many characters in name).

It would have been a great idea if it had been done correctly. But somehow the Microsoft guys developed such an ugly hack that when a '*' was in the name, the match would work ... but not quite right. In other words, it worked only for the simple searches depicted in the user's manual, something like "*.bak" or "~tmp.*"; two '*' characters, or a tricky combination of '?' and '*', made the wildcard matcher go out of its mind.

In Windows 95, the situation got better ... or maybe worse... Long filenames were added, but "short" filenames were kept for "compatibility" with old programs. For each file with a "long" filename, Windows auto-generates a "short" filename. When a program searches for a file by wildcard, Windows decides that a match happens if the mask matches the long OR the short name. For example, if you search for "*.aaa" you will find "a.aaa" but also "verylongname.aaaa", because its short name is "verylo~1.aaa" and that matches "*.aaa".
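The "long OR short" rule can be modelled in a few lines of portable C++. This is only a sketch: it assumes a mask of the form "*" plus an extension, and the helper names (endsWithNoCase, legacyMatch, exactMatch) are hypothetical, not part of any Windows API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if `name` ends with `suffix`, ignoring ASCII case.
bool endsWithNoCase(const std::string& name, const std::string& suffix) {
    if (suffix.size() > name.size()) return false;
    const std::size_t offset = name.size() - suffix.size();
    for (std::size_t i = 0; i < suffix.size(); ++i) {
        const unsigned char a = static_cast<unsigned char>(name[offset + i]);
        const unsigned char b = static_cast<unsigned char>(suffix[i]);
        if (std::tolower(a) != std::tolower(b)) return false;
    }
    return true;
}

// Models the legacy rule for a mask "*<ext>" (for example "*.aaa"):
// a file is reported if EITHER its long or its short name matches.
bool legacyMatch(const std::string& longName, const std::string& shortName,
                 const std::string& ext) {
    return endsWithNoCase(longName, ext) ||
           (!shortName.empty() && endsWithNoCase(shortName, ext));
}

// What most callers expect: only the long name counts.
bool exactMatch(const std::string& longName, const std::string& ext) {
    return endsWithNoCase(longName, ext);
}

// Self-check with the example from the text.
void checkLegacyRule() {
    assert(legacyMatch("a.aaa", "", ".aaa"));
    assert(legacyMatch("verylongname.aaaa", "VERYLO~1.AAA", ".aaa"));
    assert(!exactMatch("verylongname.aaaa", ".aaa"));
}
```

The second assertion is the surprise: the long name does not end in ".aaa", but the short name does, so the legacy rule reports the file.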

Why am I telling this touching story? Because I have used Windows 2000 for a very long time, with Explorer's Find Files dialog and FAR manager's Find File plugin, and never noticed the behavior described above. But recently I found an excellent utility, dirclean by Michael Dunn, which led me to a surprise: the deprecated wildcard matching is built into Windows 2000 (and XP) as well! I found this by accident. This useful utility integrates into the shell and deletes intermediate files of Visual C++ builds by searching with wildcards. While deleting, among others, all "*.res" files, which are scratch files and can be rebuilt, it also deletes "*.resx" files, which are NOT!!!

That "feature" (matching the wildcard against either the long name or the short name) is built into the FindFirstFile/FindNextFile API, which is what MFC's CFileFind uses, which is what Mike used in his utility. Currently, there is no workaround except the one in article Q164351 (Command Prompt's Treatment of Long File Extensions). I tried it, and ... it did not work on my Windows 2000 Pro SP3. Let me underline that fact again: article Q164351 is the only workaround for a problem raised nearly 8 years ago, and it is confusing (probably not working on Win2000/XP) and not very easy to find, because one may think the problem is in Command Prompt, or in Windows NT! Again: the problem is in the FindFirstFile/FindNextFile API, and it affects all currently existing Windows versions (up to and including XP).

It looks as if a stone had fallen on the head of the programmer who wrote that facility, because when you call FindNextFile to get the next file that matches the wildcard, it sets LastError to ERROR_FILE_NOT_FOUND when it has actually found the file. And when the search ends, it returns FALSE not when FindNextFile could not find the next file, which would be more logical, but when it finds the "last" file, and in that case LastError is the even more confusing ERROR_NO_MORE_FILES!!! That bug is so deeply built into Microsoft Windows that even the newest Whistler's shell32.dll relies on it and specifically assumes that FindNextFile worked okay when the error is ERROR_SUCCESS, ERROR_FILE_NOT_FOUND, or ERROR_NO_MORE_FILES. The real comBUGability over the years...

An excellent solution would be a fix at the filesystem level (a patch to kernel32.dll; for example, FindFirstFileEx has plenty of reserved fields that can be used to specify a search method: "only longnames," "only shortnames," "or both," with the default setting user controlled) or a filesystem filter driver...

A good solution would be a regexp search, or at least a "true" wildcard matching algorithm built into CFileFind or one of its derivations. Unfortunately, "true" wildcard matching takes n*m time, where n is the length of the wildcard and m is the length of the filename to match against... Anyway, if a number of people ask for a "true wildcard matching" algorithm, I could dig one out of my ACM archives, or write a new one from scratch. Contact me for more details.

So, I propose a workaround (which is enough to make Mike's program work well, and that is another reason not to change his program heavily): a CAdvancedFileFind class, derived from MFC's CFileFind. In it I simply (with some heuristics) compare the short and long extensions of every found file and report the bug if they differ and the user supplied FIND_FIRST_EX_DO_EXACT_MATCH in the FindFile call.

This is what CAdvancedFileFind returns when the bug takes place, that is, when its FindFile operation is called like the "ordinary" FindFile of CFileFind:

And this is the result with the bug fixed: the advanced FindFile was called with the FIND_FIRST_EX_DO_EXACT_MATCH flag set.
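For readers who want to see what an n*m matcher looks like, here is a minimal sketch using dynamic programming over the mask and the name. It is not the algorithm from any archive and not what Windows does: the function name wildcardMatch is hypothetical, and the comparison is case-sensitive for brevity, whereas Windows file searches ignore case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if `name` matches `mask`, where '?' is any single character
// and '*' is any sequence of characters, possibly empty.
// Time is O(n*m); memory is O(m) because only two table rows are kept.
bool wildcardMatch(const std::string& mask, const std::string& name) {
    const std::size_t n = mask.size();
    const std::size_t m = name.size();
    // prev[j] != 0 means: the first i-1 mask characters match the first j
    // name characters; cur[j] is the same for the first i mask characters.
    std::vector<char> prev(m + 1, 0), cur(m + 1, 0);
    prev[0] = 1;  // empty mask matches empty name
    for (std::size_t i = 1; i <= n; ++i) {
        const char c = mask[i - 1];
        cur[0] = (c == '*') && prev[0];  // only stars can match an empty name
        for (std::size_t j = 1; j <= m; ++j) {
            if (c == '*')
                cur[j] = prev[j] || cur[j - 1];  // star takes nothing, or one more char
            else
                cur[j] = prev[j - 1] && (c == '?' || c == name[j - 1]);
        }
        prev.swap(cur);
    }
    return prev[m] != 0;
}

// Self-check with the examples given earlier in the article.
void checkWildcardMatch() {
    assert(wildcardMatch("*.a", "asd.a"));
    assert(!wildcardMatch("*.a", "qw.aa"));
    assert(wildcardMatch("qq?.a", "qqw.a"));
    assert(!wildcardMatch("qq?.a", "qqqq.a"));
}
```

Applied to the long name only, such a matcher rejects "form.resx" for the mask "*.res", which is exactly the behavior the legacy API fails to deliver.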

Features

  • Overrides the FindFile operation to store the parameter called dwUnused in the original FindFile. It has now become a flag with a possible value of "FIND_FIRST_EX_DO_EXACT_MATCH".
  • Overrides the CloseContext method to reset the stored flag to its default value.
  • Overrides the FindNextFile operation to fix the bug if the stored flag asks for it.
  • Provides a HasFoundShortInsteadOfLong property to detect whether the last FindNextFile found a "wrong match" (that is, the long filename does not match the mask while the short one does).
  • Provides an IsLastFile property, thus fixing another MFC glitch: CFileFind defines m_bGotLast but does not use it, leaving people to work the way the API does: FindNextFile returns FALSE when it encounters an error or when it finds the last file (and in that case the "error" is ERROR_NO_MORE_FILES). I had to change this a bit: now that FindNextFile does repetitive searches, the last filename may be "matched wrongly" as well, so the error condition ERROR_PATH_NOT_FOUND is returned in that case, to avoid confusing it with ERROR_FILE_NOT_FOUND, returned in the normal case, and ERROR_NO_MORE_FILES, returned when the original API finds the last file.
  • For those who are interested in short names (they are called "alternate" names in Win32 terms), the GetShortFileName method is provided. Windows does not create "alternate" names for files whose names are already short, so GetShortFileName also takes a flag asking it to generate a short name if one does not exist.

To use the new features of CAdvancedFileFind, you'll have to:

  • Add AdvancedFileFind.cpp/AdvancedFileFind.h to your project
  • #include "AdvancedFileFind.h" in the source file where you will use it (or in stdafx.h, for example)
  • Change CFileFind instances into CAdvancedFileFind
  • Change the main search loop a bit: if it was something like:
           BOOL bWorking = finder.FindFile(strPattern);
           while (bWorking)
           {
              bWorking = finder.FindNextFile();
    
              ... work with file found
           }
    
    change it to:
           BOOL bWorking = finder.FindFile(strPattern, 
                           FIND_FIRST_EX_DO_EXACT_MATCH);
           while (bWorking)
           {
              bWorking = finder.FindNextFile();
              if (!bWorking && !finder.IsLastFile())
                   break;
    
              ... work with file found
           }
    
  • Use the provided methods and properties as you like; for example, you may pass FIND_FIRST_EX_DO_EXACT_MATCH or default (0) in the last example, based on the user's decision.
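Why the extra IsLastFile check is needed can be seen with a small stand-in for the finder. The sketch below is not the real class: MockFinder, Entry, and collectMatches are hypothetical names, and the mock only imitates the return-value contract described above (FALSE is returned both together with the last real match and when the last entry turned out to be a wrong match).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Entry {
    std::string name;
    bool wrongMatch;  // true if only the short (8.3) name matched the mask
};

// Hypothetical stand-in that imitates the return values of CAdvancedFileFind.
class MockFinder {
public:
    explicit MockFinder(std::vector<Entry> entries)
        : entries_(std::move(entries)), next_(0), gotLast_(false) {}

    bool FindFile() const { return !entries_.empty(); }

    // Skips wrong matches. Returns false with IsLastFile() == true when the
    // current file is the last real match, and false with
    // IsLastFile() == false when the enumeration ended on a wrong match.
    bool FindNextFile() {
        gotLast_ = false;
        while (next_ < entries_.size()) {
            const Entry& e = entries_[next_++];
            const bool more = next_ < entries_.size();
            if (e.wrongMatch) {
                if (!more) return false;  // last entry was a wrong match
                continue;
            }
            current_ = e.name;
            gotLast_ = !more;
            return more;
        }
        return false;
    }

    bool IsLastFile() const { return gotLast_; }
    const std::string& GetFileName() const { return current_; }

private:
    std::vector<Entry> entries_;
    std::size_t next_;
    bool gotLast_;
    std::string current_;
};

// The search loop from the list above, written against the mock.
std::vector<std::string> collectMatches(MockFinder finder) {
    std::vector<std::string> found;
    bool working = finder.FindFile();
    while (working) {
        working = finder.FindNextFile();
        if (!working && !finder.IsLastFile())
            break;  // nothing valid left; do not process a stale file
        found.push_back(finder.GetFileName());
    }
    return found;
}

// Self-check: a wrong match at the end must not be processed.
void checkLoop() {
    std::vector<Entry> files = {{"a.res", false}, {"b.resx", true}};
    const std::vector<std::string> found = collectMatches(MockFinder(files));
    assert(found.size() == 1 && found[0] == "a.res");
}
```

Without the IsLastFile test, the loop body would run once more after the trailing wrong match and process whatever file name was current at that moment.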

Now, some code from CAdvancedFileFind, showing its main logic. This is how we initialize: store the parameter passed to FindFile and apply a heuristic to exclude most cases where the "bugfix" is undesired.

BOOL CAdvancedFileFind::FindFile(LPCTSTR pstrName /* = NULL */,
                DWORD dwUnused
                /* = FIND_FIRST_EX_DO_EXACT_MATCH */)
{
  m_dwLastUnused = dwUnused; // for we can use it in FindNextFile
  m_bNeedFix = false;

  // in fact, this is just a hack: bug takes place when either
  // long name, or short name matches the mask, but in the name
  // of speed, we take following approach ...
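  // pstrName may be NULL (CFileFind then searches for "*.*"),
  // so guard before scanning it
  LPCTSTR pstrStar = pstrName ? _tcsrchr(pstrName, _T('*')) : NULL;
  LPCTSTR pstrDot = pstrName ? _tcsrchr(pstrName, _T('.')) : NULL;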

  // ...bug takes place if ...

  // ...mask has star and has extension...
  if ((pstrStar != NULL) && (pstrDot != NULL) &&
    (FIND_FIRST_EX_DO_EXACT_MATCH == (dwUnused &
     FIND_FIRST_EX_DO_EXACT_MATCH)))
  {
    // ... and if star is in extension or extension is
    // 3 chars in length without star symbols ('?' symbol works
    // as normal symbol)

    // ... this does not cover some rare search masks
    m_bNeedFix = true;

    int c = 0;
    while (*pstrDot)
    {
      if (*pstrDot != '*')
        c++;
      pstrDot ++;
    }

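    // c counts the dot plus every non-star character after it,
    // so c == 4 means "a dot and 3 extension characters";
    // the second condition is commented out because it really
    // behaves badly
    if ((c!=4))    // || (pstrDot > pstrStar))
      m_bNeedFix = false;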
  }

  return CFileFind::FindFile(pstrName, dwUnused);
}
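The heuristic is easier to follow when stripped of MFC types. The following is a portable sketch of the same decision, assuming std::string input; the function name needsFix is hypothetical, and the FIND_FIRST_EX_DO_EXACT_MATCH flag check is left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Mirrors the mask heuristic: the fix is considered only when the mask
// contains a '*' and a '.', and the text from the last dot onward has
// exactly 4 non-star characters (the dot plus a 3-character extension).
bool needsFix(const std::string& mask) {
    const std::size_t star = mask.rfind('*');
    const std::size_t dot = mask.rfind('.');
    if (star == std::string::npos || dot == std::string::npos)
        return false;

    int count = 0;
    for (std::size_t i = dot; i < mask.size(); ++i) {
        if (mask[i] != '*')
            ++count;  // '?' counts as an ordinary character
    }
    return count == 4;
}

// Self-check with masks of the kind discussed in the article.
void checkNeedsFix() {
    assert(needsFix("*.res"));      // 3-character extension: short names can interfere
    assert(!needsFix("*.resx"));    // longer extension: short names cannot match it
    assert(!needsFix("file.res"));  // no star in the mask
}
```

A 3-character extension is the interesting case because the extension of a short name is at most 3 characters long, so only such masks can match a truncated extension.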

This is the main logic: FindNextFile changed from a single invocation of the API to repetitive invocations in cases where we need to fix the bug.

BOOL CAdvancedFileFind::FindNextFile()
{
  if (m_bGotLast)
  {
    SetLastError(ERROR_PATH_NOT_FOUND);
    m_bGotLast = FALSE;
    return FALSE;
  }
  BOOL bResult; DWORD dwErr;
  bResult = CFileFind::FindNextFile();
  dwErr = GetLastError();

  // Fix for an MFC glitch: presumably this is exactly what the
  // m_bGotLast member variable was introduced for
  m_bGotLast = (bResult == FALSE) &&
               ( dwErr==ERROR_NO_MORE_FILES );

  // I really don't know why, but my system (Win2000 Pro, SP3)
  // returns ERROR_FILE_NOT_FOUND when the search is OK...
  // This seems to be "the rule", because shell32.dll uses
  // the same condition as the following "if"
  if ((m_bGotLast || (dwErr == ERROR_SUCCESS) ||
                     (dwErr == ERROR_FILE_NOT_FOUND)) &&
    (FIND_FIRST_EX_DO_EXACT_MATCH == (m_dwLastUnused &
     FIND_FIRST_EX_DO_EXACT_MATCH))
    )
    while(1)
    {
      if (HasFoundShortInsteadOfLong())
      {
        if (!bResult)    // if it was the last file...
        {
          // I hope no one will treat ERROR_PATH_NOT_FOUND
          // as "got the last file"... This really should return
          // ERROR_NO_MORE_FILES, but I guess a lot of people are
          // already using that code to detect the "last file"
          // condition. Microsoft really has to overhaul the
          // find file facility!
          SetLastError(ERROR_PATH_NOT_FOUND);
          m_bGotLast = FALSE;
          break;
        }
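        bResult = CFileFind::FindNextFile();
        // keep the "last file" flag in sync with the repeated call,
        // otherwise a valid last file found here is not reported
        // as the last one
        m_bGotLast = (bResult == FALSE) &&
                     (GetLastError() == ERROR_NO_MORE_FILES);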

      } else {
        // find real match
        break;
      }
    }
  return bResult;
}

And this is the function that detects that the bug has taken place... You may change it to "true" wildcard matching, if you like, and then throw out the heuristic in the FindFile operation.

BOOL CAdvancedFileFind::HasFoundShortInsteadOfLong() const
{
  ASSERT(m_hContext != NULL);
  ASSERT_VALID(this);

  BOOL bRet = FALSE;

  // if need to fix the bug
  while ((m_pFoundInfo != NULL) && m_bNeedFix)
  {
    LPWIN32_FIND_DATA pFindInfo;
    TCHAR *pExtLong, *pExtShort;
    int nLen1 = 0, nLen2 = 0, c;

    pFindInfo = (LPWIN32_FIND_DATA) m_pFoundInfo;
    pExtLong = pFindInfo->cFileName;
    pExtShort = pFindInfo->cAlternateFileName;

    // if cAlternateFileName is empty =>
    // no long filename => bug does not take place
    if (_T('\0') == pExtShort[0])
      break;

    // calculate the length and position at the last character
    // of the string
    while (*pExtLong) { pExtLong++; nLen1++; }
    while (*pExtShort) { pExtShort++; nLen2++; }

    c = nLen1;
    if (c==0) break;    // exit
    while (c>0)
    {
      // last "extension" is what we will see in "shortened" name
      if (*pExtLong == '.')
        break;
      c--;
      pExtLong--;
    }

    c = nLen2;
    if (c==0) break;    // exit
    while (c>0)
    {
      // last "extension" is what we will see in "shortened" name
      if (*pExtShort == '.')
        break;
      c--;
      pExtShort--;
    }

    // now compare extensions
    if (*pExtLong == '.')
    {
      int res = _tcsicmp(pExtLong, pExtShort);
      if (res > 0)
      {
        // ok, fix the bug
        bRet = TRUE;
      }
    }

    break;
  }

  return bRet;
}
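The core of that check is a comparison of two extensions, which can be tried out without Windows. The sketch below is a portable approximation under stated assumptions: compareNoCase stands in for _tcsicmp and handles ASCII only, and the names tailFromLastDot and foundShortInsteadOfLong are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive comparison (ASCII only), standing in for _tcsicmp.
int compareNoCase(const std::string& a, const std::string& b) {
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}

// Returns the text from the last '.' onward, or the whole name if
// there is no dot (the original walks back to the start in that case).
std::string tailFromLastDot(const std::string& name) {
    const std::size_t dot = name.rfind('.');
    return dot == std::string::npos ? name : name.substr(dot);
}

// True when the long extension compares greater than the short one,
// which is how the listing above recognizes a truncated extension.
bool foundShortInsteadOfLong(const std::string& longName,
                             const std::string& shortName) {
    if (shortName.empty())
        return false;  // no alternate name, so the bug cannot occur
    if (longName.rfind('.') == std::string::npos)
        return false;  // long name has no extension to compare
    return compareNoCase(tailFromLastDot(longName),
                         tailFromLastDot(shortName)) > 0;
}

// Self-check with a ".resx" file of the kind that dirclean deleted.
void checkExtensions() {
    assert(foundShortInsteadOfLong("Form1.resx", "FORM1~1.RES"));
    assert(!foundShortInsteadOfLong("verylongname.res", "VERYLO~1.RES"));
    assert(!foundShortInsteadOfLong("app.res", ""));
}
```

Like the original, this is a heuristic: it looks only at extensions, so a mask whose wildcard sits in the main part of the name is not covered.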

Downloads

Download demo project - 9 Kb
Download source - 16 Kb


Comments

  • FindFirstFile *.abc result depends on disable8dot3

    Posted by christianchristian on 02/05/2010 04:52am

    On Windows XP or 2003:
    * Run at the command line: fsutil.exe behavior query disable8dot3. The answer is disable8dot3=0.
    * Create 2 files: a.abc and b.abcd.
    * FindFirstFile returns a.abc and b.abcd.
    * Run at the command line: fsutil.exe behavior set disable8dot3 1, then reboot Windows.
    * Run at the command line: fsutil.exe behavior query disable8dot3. The answer is now disable8dot3=1.
    * Delete the 2 previous files and create them again: a.abc and b.abcd.
    * FindFirstFile returns only a.abc.

  • Bug in Workaround

    Posted by Bear of Little Brain on 04/25/2006 12:36pm

    I have been working on the same problem in Fortran, using WinAPIs through interfaces. Windows Explorer seems to work as it should, and GetOpenFileName also seems OK, although I have not done so much on it. Shoniev's fix may work on extensions, but not in the main part. If you create files abcdefgh01.fmd, abcdefgh02.fmd ... abcdefgh07.fmd, and use his CAdvancedFindFile.exe using *88* as a file spec, both the original and his fix pick up the last 3 files. Bear of Little Brain

  • Simpler solution

    Posted by Legacy on 07/25/2003 12:00am

    Originally posted by: Richard Jones

    I like that you built a class (subclassing the find).

    I may even use it but I wonder if I really need to.

    What seems easier is to extract the extension from the filename that findfirst and findnext give you, then compare it to the requested one (e.g. *.*, *., *.ab ...) by length (strlen).

    If not good comparison do continue.

    I also agree with you about the absurdity of returning the same error if findnext fails or is at the end.
    ?????

    You know Microsoft is really one big programming nightmare after another.

    They are like the chef who makes a pot of stew. He keeps changing his mind and adding more ingredients because he wants to please everybody.

    Pretty soon he has a big bowl of shitty stew.

    • It's not a bug

      Posted by srelu on 04/26/2009 03:23am

      The feature of finding .resx instead of .res is intentional, to allow the user to save on typing. It's your responsibility to check the returned filename for a closer match. Using too many wildcards is pointless, as '*' stands for all characters (no matter how many) up to the end of the string.

  • Does not work...

    Posted by Legacy on 01/16/2003 12:00am

    Originally posted by: Remon

    * I've copied the contents of "C:\WINNT\system32\Adobe\SVG Viewer" into "C:\1\SVG Viewer".
    * The "SVG Viewer" folder is the only entry of its parent folder.
    * The "SVG Viewer" contains the following files :
    AceLite.dll
    Agm.dll
    Bib.dll
    CoolType.dll
    NPSVGVw.dll
    ReadMe.html
    SVG Viewer License.txt
    SVGAbout.svg
    SVGControl.dll
    SVGHelp.html
    SVGRSRC.DLL
    SVGView.dll
    SVGViewer.dict
    SVGViewer.ini
    SVGViewer.zip

    * Check the recursive check-box.
    * Check the fix bug check-box.
    * Enter "*.htm" as mask

    First test :
    * Use either "C:\1\SVG Viewer" or "C:\WINNT\system32\Adobe\SVG Viewer" as Path.
    * Hit the "Find" button : No file found. THAT'S OK

    Second test :
    * Use the parent folder : either "C:\1" or "C:\WINNT\system32\Adobe" as Path.
    * Hit the "Find" button : URP !!! The "ReadMe.html" file is found. THAT'S NOT OK

  • The Microsoft method already works well...

    Posted by Legacy on 01/16/2003 12:00am

    Originally posted by: Eric Forget

    Ok... Your idea is not a bad idea, but the Microsoft method already works well...

    You say that "*.a" will not return the "qqq.abc" file, but if you understood the system you would know that to make it return "qqq.abc" you should use the wildcard "*.a*", which works really well...

    I am not sure that your point is a necessity!!
    Eric

  • I'm with you!

    Posted by Legacy on 01/15/2003 12:00am

    Originally posted by: Charles Godwin

    GetOpenFileName has the same nasty problem. Try setting the filter to *. That is, list all file names with no extension. It can't be done.

    Thanks for the good article.
