Search and Replace with Regular Expressions

Introduction

Ice-cream is exquisite—what a pity it isn't illegal.
—Voltaire

Neither male nor female parents are superfluous. Parents teach children things implicitly by their habits and sometimes explicitly when the kids are listening. Sometimes, a parent has a bad habit; the child sees the consequences and says "I don't want that bad thing to happen to me!" Sometimes, the reverse is true. Both kinds of lessons are valuable and different genders seem to have different habits.

One overt lesson my dad taught me was that a craftsman is known by his tools. Of all of the things he said to me, I don't know why this resonated so much with me, but hey, I'll take it.

Software development is a parable of the craftsman's tools. It was Arthur C. Clarke who said that "any sufficiently advanced technology is indistinguishable from magic." I will extend that and add that the more you know about your technology, trade, and tools, the more likely it is that you will do magical things.

In this article, I have taken a pedestrian subject, finding and replacing code, and will add a twist. If you finish the article, you will know how to combine regular expressions with Visual Studio's Quick Find and Quick Replace capability to augment plain searching and replacing and Refactoring, adding to your ability to do magic.

Before You Get Started...

Ctrl+F and Ctrl+H (at least that's the way my keyboard is mapped) invoke the Edit|Find and Replace|Quick Find and Quick Replace features of Visual Studio. Type in some text and Visual Studio will search the current file, project, or solution for the text. Choose the Quick Replace option and Visual Studio will replace the found text with the replacement text. This is probably the most common kind of search and replace operation performed routinely by developers.

There is another meta-programming capability built into Visual Studio called Refactoring that will transform code from one thing to something else, including renaming something. Refactoring stems from a 1992 thesis by William Opdyke and is essentially a formalized way to clean up or improve code.

Mix in search and replace, Refactoring, and simple copy and paste, and you have three ways to change code. To neophytes, Refactoring is the closest thing to magic of the three. The problem is that none of these handle special cases such as searching and replacing across line feeds or handling varying styles of white space (for example, space versus tab) in a uniform way. Refactoring is mostly limited to known ways to improve code like renaming, copy and paste is tedious, and Replace All may overwrite things you don't want overwritten, making search and replace sometimes a long, tedious process too.

So, look at a fourth way to find things, one that permits substantially more expressiveness and precision: regular expressions.

If the Code is Funky, Change It

Many programmers are precisely old enough to have learned their coding standards implicitly from Charles Simonyi, whether they know it or not. Simonyi is credited with inventing Hungarian notation, those ghastly prefixes that are everywhere in VB code. Alas, Dr. Simonyi is out to pasture and is spending his hard-earned money flying on Russian spaceships, sailing gigantic yachts, and dating Martha Stewart?! I don't want to get sued so I won't say anything about potential problems manifest in anyone's hardwiring, but I will say that I feel about dating Martha Stewart the same way I feel about Hungarian notation or any variation of it. (Although Martha could organize my house any time she likes.)

So, anti-establishment, anti-notation wonks like me might write the code in Listing 1. I use a simple F prefix for fields simply to distinguish field names from property names. An F prefix is very easy to remember: it means field (to me), and no other prefix is necessary, anywhere.

The point is that sometimes we get valuable code from third parties and those other parties' styles do not match our own or the desired style. Rather than fret over so small a thing, simply change the code. (That's my scenario, humble as it is, and I am sticking to it!)

Listing 1: Found some code that doesn't fit your style? Instead of having coding standards meetings or getting all worked up, simply change the code.

Imports System.Data.Linq
Imports System.Data.Linq.Mapping
Imports System.Reflection
Imports System.Text


Module Module1

   Sub Main()

      Const connectionString As String = _
         "Data Source=.\SQLEXPRESS;AttachDbFilename=c:\temp\northwnd.mdf;" + _
         "Integrated Security=True;Connect Timeout=30;User Instance=True"

      Dim northwind As Northwind = New Northwind(connectionString)
      Dim customers As Table(Of Customer) = _
         northwind.GetTable(Of Customer)()

      Dim firstFive = customers.Take(5)
      For Each item In firstFive
         Console.WriteLine(item)
      Next
      Console.ReadLine()
   End Sub

End Module

Public Class Northwind
   Inherits DataContext

   Public Sub New(ByVal connectionString As String)
      MyBase.New(connectionString)
      Me.Log = Console.Out
   End Sub

End Class

<Table(Name:="Customers")> _
Public Class Customer

   Private FCustomerID As String
   Private FCompanyName As String
   Private FContactName As String
   Private FContactTitle As String
   Private FAddress As String
   Private FCity As String
   Private FRegion As String
   Private FPostalCode As String
   Private FCountry As String
   Private FPhone As String
   Private FFax As String

   <Column(Name:="CustomerID", Storage:="FCustomerID")> _
   Public Property CustomerID() As String
      Get
         Return FCustomerID
      End Get
      Set(ByVal value As String)
         FCustomerID = value
      End Set
   End Property

   <Column(name:="CompanyName", Storage:="FCompanyName")> _
   Public Property CompanyName() As String
      Get
         Return FCompanyName
      End Get
      Set(ByVal value As String)
         FCompanyName = value
      End Set
   End Property

   <Column(name:="ContactName", Storage:="FContactName")> _
   Public Property ContactName() As String
      Get
         Return FContactName
      End Get
      Set(ByVal value As String)
         FContactName = value
      End Set
   End Property

   <Column(name:="ContactTitle", Storage:="FContactTitle")> _
   Public Property ContactTitle() As String
      Get
         Return FContactTitle
      End Get
      Set(ByVal value As String)
         FContactTitle = value
      End Set
   End Property

   <Column(name:="Address", Storage:="FAddress")> _
   Public Property Address() As String
      Get
         Return FAddress
      End Get
      Set(ByVal value As String)
         FAddress = value
      End Set
   End Property

   <Column(name:="City", Storage:="FCity")> _
   Public Property City() As String
      Get
         Return FCity
      End Get
      Set(ByVal value As String)
         FCity = value
      End Set
   End Property

   <Column(name:="Region", Storage:="FRegion")> _
   Public Property Region() As String
      Get
         Return FRegion
      End Get
      Set(ByVal value As String)
         FRegion = value
      End Set
   End Property

   <Column(name:="PostalCode", Storage:="FPostalCode")> _
   Public Property PostalCode() As String
      Get
         Return FPostalCode
      End Get
      Set(ByVal value As String)
         FPostalCode = value
      End Set
   End Property

   <Column(name:="Country", Storage:="FCountry")> _
   Public Property Country() As String
      Get
         Return FCountry
      End Get
      Set(ByVal value As String)
         FCountry = value
      End Set
   End Property

   <Column(name:="Phone", Storage:="FPhone")> _
   Public Property Phone() As String
      Get
         Return FPhone
      End Get
      Set(ByVal value As String)
         FPhone = value
      End Set
   End Property

   <Column(name:="Fax", Storage:="FFax")> _
   Public Property Fax() As String
      Get
         Return FFax
      End Get
      Set(ByVal value As String)
         FFax = value
      End Set
   End Property

   Public Overrides Function ToString() As String

      Dim builder As StringBuilder = New StringBuilder()
      builder.AppendFormat("{0}", Me.GetType().Name)
      builder.AppendLine()
      builder.AppendFormat("{0}", New String("-"c, 40))
      builder.AppendLine()

      Dim info() As PropertyInfo = Me.GetType().GetProperties()

      For Each prop In info
         Try
            Dim value As Object = prop.GetValue(Me, Nothing)
            builder.AppendFormat("{0}: {1}", prop.Name, value)
         Catch ex As Exception
            builder.AppendFormat("{0}: {1}", prop.Name, "None")
         End Try
         builder.AppendLine()
      Next

      Return builder.ToString()
   End Function


End Class

The code in Listing 1 uses an F prefix for fields, the smallest possible marker to differentiate fields from properties; no other prefix convention is needed in a strongly typed language.

Now, I have seen people respond in several ways to code that doesn't meet the "standard," including:

  1. The person who wrote this code is a nitwit and they don't know what they are doing (because they aren't following our standard).
  2. We had better have coding standards meetings (and figure out the perfect standard).
  3. Fire that idiot (because he doesn't know what he is doing).

All three responses are pointless. There is no perfect standard and uniformity is only possible in herd animals (and McDonald's employees). Even people in the military have personality quirks bubbling to the surface. In response to number 2, coding standards meetings are stupid too. It all comes down to who has the loudest mouth, the most forceful personality, or the most clout. In the end, the person with the power gets to choose, so why have the meeting in the first place?! Finally, firing someone who doesn't conform is stupid. Software requires smart creative people, not mutant, borg-like conformists. Anyone who doesn't know that is a numbskull.

The right solution is quite simple: Let people code however they code. Don't judge, don't fuss, don't fret, and for God's sake don't meet on the subject. If, and only if, the literal code-text is a deliverable, decide what you want it to look like after the application is deployed and pay some toolsmith, hobbyist, or intern to reformat it. Voilà!

The "intern will fix it" approach will be a lot cheaper than having highly compensated (I hope) developers arguing over semantics like how many spaces to tab. No one really cares. And, if the intern breaks the code, again so what, you have it under version control, right? You are only paying the intern $10 per hour. Let him take another stab at it. (Better still, paginating and formatting source code is something that can be done reliably and cheaply; outsource it.)

Finally, give the intern a copy of this article so he doesn't take forever to clean up that sloppy code. (But only give him the part of this article that follows this part, so he doesn't learn that you are giving him grunt work.)

Test Your Regular Expressions with Search-Only First

Regular expressions in search and replace mode can quickly become search and destroy mode. (But, hey, you are using version control, right?!) Regular expressions are powerful and can do a lot with a very little bit of code, so unless you want to be checking things in and out all day long, test your expressions in search-only mode first.

In your scenario, you want to replace all of the elements in Listing 1 that have an F prefix. The first thing to do is to ensure that the dialog will be using regular expressions. In the Find and Replace dialog, expand the Find options section and make sure the options are configured as shown in Figure 1.

[Search.png]

Figure 1: Select search in hidden text to check those pesky collapsible regions and check use regular expressions.

The next thing you will need is a location containing the definitive source (or at least most of the information you'll need) for regular expressions in Visual Studio. See the help topic "Regular Expressions (Visual Studio)" at http://msdn2.microsoft.com/en-us/library/2k3te2cs.aspx.

Tip: Quick Find supports a Bookmark All operation. You can use this button to add a bookmark to everything that matches your regular expression as a means of testing the expression's correctness prior to replacing text.
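The same look-before-you-replace habit applies if you ever script a change with the .NET Framework's Regex class instead of the dialog. The following is a minimal sketch under stated assumptions: it uses .NET regular expression syntax rather than Visual Studio's, and the module name and the PreviewMatches helper are hypothetical names invented for illustration. It lists what a pattern matches and changes nothing.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PreviewMatchesSketch

   ' Hypothetical helper: collects every match of pattern in source
   ' without modifying the text, a scripted analog of Bookmark All.
   Function PreviewMatches(ByVal source As String, _
         ByVal pattern As String) As List(Of String)
      Dim found As New List(Of String)()
      For Each hit As Match In Regex.Matches(source, pattern)
         found.Add(hit.Value)
      Next
      Return found
   End Function

   Sub Main()
      Dim sample As String = _
         "Private FCity As String" & Environment.NewLine & _
         "Public Property Fax() As String"

      ' Only the field declaration is listed; the property line
      ' contains no "Private" and so is not a match.
      For Each hit As String In PreviewMatches(sample, "Private\s+F\w+")
         Console.WriteLine(hit)
      Next
   End Sub

End Module
```

If the list contains anything you did not expect, fix the pattern before you let it replace anything.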

The next step is to identify everything you want to replace and to figure out the regular expression for the find and replace pieces of the puzzle. The fields are using the F prefix. The property setters and getters are using the F prefix, and the Storage named parameter of the ColumnAttribute is using the field names as strings.

Now, you could replace the fields by searching on "Private F" and replacing it with "Private m_", but what if you weren't sure how the whitespace had been added or whether the whitespace between Private and F was even uniform? Well, you can use a regular expression. Here is an example for the find part:

Private:WhF{:c+}

Private is a literal. :Wh matches white space, whether tabs or literal spaces. F is a literal, and :c+ matches one or more alphabetic characters after the F. Placing the :c+ in braces, {:c+}, tags the characters after the F as a separate group. This is important because you want to re-use those characters. You can refer to a tagged group with the \# sequence: \1 for the first group, \2 for the second, and so on (in a replacement, \0 stands for the entire match). The following sequence

Private m_\1

replaces the F prefix as in

Private FCustomerID As String

with an m_ yielding

Private m_CustomerID As String
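Outside the IDE, the same find and replace can be approximated with the .NET Framework's Regex class. This is a sketch, not the dialog's own engine: the module and function names are hypothetical, and the pattern is rewritten in .NET syntax, with \s+ standing in for :Wh, a parenthesized (\w+) for {:c+}, and $1 for \1. Note that \w also accepts digits and underscores, so it is slightly broader than :c.

```vb
Imports System
Imports System.Text.RegularExpressions

Module FieldRenameSketch

   ' Hypothetical helper: rewrites "Private F<Name>" declarations
   ' to "Private m_<Name>".
   ' \s+ tolerates any run of spaces or tabs after Private;
   ' (\w+) captures the name that follows the F prefix.
   Function RenameFieldDeclarations(ByVal source As String) As String
      Return Regex.Replace(source, "Private\s+F(\w+)", "Private m_$1")
   End Function

   Sub Main()
      ' Prints: Private m_CustomerID As String
      Console.WriteLine( _
         RenameFieldDeclarations("Private FCustomerID As String"))
   End Sub

End Module
```

Like the literal replacement text in the dialog, this version writes a single space after Private regardless of what white space was matched.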

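You can use the find-replace sequence of

Storage\:="F{:c+}

Storage:="m_\1

to replace the F prefix in the field name in the Storage argument with an m_ prefix. Including the Storage literal matters: a bare "F{:c+} would also match "Fax in name:="Fax" and turn the column name into "m_ax". (The backslash escapes the colon so that it cannot be read as the start of a shorthand such as :c.) You can use the following pairing to fix the Return and assignment statements in the property getters and setters.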

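{:Wh}F{:c+}

\1m_\2

Tagging the white space as its own group and writing it back with \1 keeps the space or tab in front of the name; because the white space is part of the match, leaving it out of the replacement would produce Returnm_CustomerID. Of course, with Match case unchecked, the previous pairing will also find "Function", "For", the property "Fax", and "firstFive". You could add the prevent match ~(X) sub-expression to eliminate these from the search results, for example

{:Wh}F~(unction)~(ax)~(or)~(irstFive){:c+}

\1m_\2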

and the regular expression would successfully skip "Function", "For", "Fax", and "firstFive". Here is the completely revised code after the find and replace operation using the three variations of expressions on the code (see Listing 2).

Listing 2: The same code as Listing 1 with the modified field elements using a more conventional naming style.

Imports System.Data.Linq
Imports System.Data.Linq.Mapping
Imports System.Reflection
Imports System.Text

Module Module1

   Sub Main()

      Const connectionString As String = _
         "Data Source=.\SQLEXPRESS;AttachDbFilename=c:\temp\northwnd.mdf;" + _
         "Integrated Security=True;Connect Timeout=30;User Instance=True"

      Dim northwind As Northwind = _
         New Northwind(connectionString)
      Dim customers As Table(Of Customer) = _
         northwind.GetTable(Of Customer)()

      Dim firstFive = customers.Take(5)
      For Each item In firstFive
         Console.WriteLine(item)
      Next
      Console.ReadLine()
   End Sub

End Module

Public Class Northwind
   Inherits DataContext

   Public Sub New(ByVal connectionString As String)
      MyBase.New(connectionString)
      Me.Log = Console.Out
   End Sub

End Class

<Table(Name:="Customers")> _
Public Class Customer

   Private m_CustomerID As String
   Private m_CompanyName As String
   Private m_ContactName As String
   Private m_ContactTitle As String
   Private m_Address As String
   Private m_City As String
   Private m_Region As String
   Private m_PostalCode As String
   Private m_Country As String
   Private m_Phone As String
   Private m_Fax As String

   <Column(Name:="CustomerID", Storage:="m_CustomerID")> _
   Public Property CustomerID() As String
      Get
         Return m_CustomerID
      End Get
      Set(ByVal value As String)
         m_CustomerID = value
      End Set
   End Property

   <Column(name:="CompanyName", Storage:="m_CompanyName")> _
   Public Property CompanyName() As String
      Get
         Return m_CompanyName
      End Get
      Set(ByVal value As String)
         m_CompanyName = value
      End Set
   End Property

   <Column(name:="ContactName", Storage:="m_ContactName")> _
   Public Property ContactName() As String
      Get
         Return m_ContactName
      End Get
      Set(ByVal value As String)
         m_ContactName = value
      End Set
   End Property

   <Column(name:="ContactTitle", Storage:="m_ContactTitle")> _
   Public Property ContactTitle() As String
      Get
         Return m_ContactTitle
      End Get
      Set(ByVal value As String)
         m_ContactTitle = value
      End Set
   End Property

   <Column(name:="Address", Storage:="m_Address")> _
   Public Property Address() As String
      Get
         Return m_Address
      End Get
      Set(ByVal value As String)
         m_Address = value
      End Set
   End Property

   <Column(name:="City", Storage:="m_City")> _
   Public Property City() As String
      Get
         Return m_City
      End Get
      Set(ByVal value As String)
         m_City = value
      End Set
   End Property

   <Column(name:="Region", Storage:="m_Region")> _
   Public Property Region() As String
      Get
         Return m_Region
      End Get
      Set(ByVal value As String)
         m_Region = value
      End Set
   End Property

   <Column(name:="PostalCode", Storage:="m_PostalCode")> _
   Public Property PostalCode() As String
      Get
         Return m_PostalCode
      End Get
      Set(ByVal value As String)
         m_PostalCode = value
      End Set
   End Property

   <Column(name:="Country", Storage:="m_Country")> _
   Public Property Country() As String
      Get
         Return m_Country
      End Get
      Set(ByVal value As String)
         m_Country = value
      End Set
   End Property

   <Column(name:="Phone", Storage:="m_Phone")> _
   Public Property Phone() As String
      Get
         Return m_Phone
      End Get
      Set(ByVal value As String)
         m_Phone = value
      End Set
   End Property

   <Column(name:="Fax", Storage:="m_Fax")> _
   Public Property Fax() As String
      Get
         Return m_Fax
      End Get
      Set(ByVal value As String)
         m_Fax = value
      End Set
   End Property

   Public Overrides Function ToString() As String

      Dim builder As StringBuilder = New StringBuilder()
      builder.AppendFormat("{0}", Me.GetType().Name)
      builder.AppendLine()
      builder.AppendFormat("{0}", New String("-"c, 40))
      builder.AppendLine()

      Dim info() As PropertyInfo = Me.GetType().GetProperties()

      For Each prop In info
         Try
            Dim value As Object = prop.GetValue(Me, Nothing)
            builder.AppendFormat("{0}: {1}", prop.Name, value)
         Catch ex As Exception
            builder.AppendFormat("{0}: {1}", prop.Name, "None")
         End Try
         builder.AppendLine()
      Next

      Return builder.ToString()
   End Function


End Class

By the way, if you are wondering, the code in Listings 1 and 2 is an example of "LINQ to SQL" code that shows just how easy it is to create an object-relational mapping and use LINQ to query SQL Server tables. (For more on LINQ to SQL, check out my upcoming book LINQ Unleashed for C# from Sams in July 2008.)

It is worth noting that Visual Studio's regular expressions aren't identical to regular expressions in the .NET Framework. One more thing worth learning: occasionally being able to "fix" a large amount of code quickly is a valuable and magical trick to have up your sleeve.
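To make the dialect difference concrete, here is a rough .NET counterpart of the prevent-match ~(X) search used earlier for the Return and assignment statements. It is a sketch under stated assumptions, not a line-for-line translation: the module and function names are hypothetical, a negative lookahead (?!...) plays the role of ~(X), and parentheses with ${1} and ${2} replace braces with \1 and \2.

```vb
Imports System
Imports System.Text.RegularExpressions

Module PreventMatchSketch

   ' Hypothetical helper: renames F-prefixed names that follow white space.
   ' (\s) captures the white space so it can be written back unchanged.
   ' (?!unction|or|ax) is a negative lookahead that skips Function, For,
   ' and Fax. Regex is case-sensitive by default, so firstFive needs
   ' no exclusion here.
   Function RenameFieldUses(ByVal source As String) As String
      Dim pattern As String = "(\s)F(?!unction|or|ax)(\w+)"
      Return Regex.Replace(source, pattern, "${1}m_${2}")
   End Function

   Sub Main()
      ' Prints: Return m_Fax
      Console.WriteLine(RenameFieldUses("Return FFax"))
      ' The next two lines are printed unchanged.
      Console.WriteLine(RenameFieldUses("Public Property Fax() As String"))
      Console.WriteLine(RenameFieldUses("   For Each prop In info"))
   End Sub

End Module
```

The lookahead is as blunt as the ~(X) list it imitates: it would also skip any field whose name happens to begin with "or", "ax", or "unction" after the F, so preview the matches before trusting it on other code.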

About the Author

Paul Kimmel is the VB Today columnist for www.codeguru.com and has written several books on object-oriented programming and .NET. Check out his upcoming book, LINQ Unleashed for C#, due in July 2008. Paul Kimmel is an Application Architect for EDS. You may contact him for technology questions at pkimmel@softconcepts.com.

If you are interested in joining or sponsoring a .NET Users Group, check out www.glugnet.org. Glugnet opened a users group branch in Flint, Michigan in August 2007. If you are interested in attending, check out the www.glugnet.org web site for updates.

Copyright © 2008 by Paul T. Kimmel. All Rights Reserved.
