Managed C++ and C++/CLI: Discuss Managed C++ and .NET-specific questions related to C++.
#1
Hi, I'm trying to split a database of English and Chinese sentences into arrays of individual words using Regex.Split. The problem is that English words are separated by spaces while Chinese words aren't. It gets even trickier when Chinese and English words appear in the same sentence.
Is there a way for Regex to automatically detect the language and perform the proper splits accordingly? Thanks so much!
#2
Re: Culture SENSITIVE regex split
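Took me a while, but I figured it out! For anyone else interested in multi-language splits, here's my way:
Use Match instead of Split, e.g.:
Regex* rg = new Regex(S"[A-Za-z]+|\\w");
Match* match = rg->Match(yourString);
while (match->Success) { /* use match->Value here */ match = match->NextMatch(); }
The [A-Za-z]+ part targets whole English words; the \w picks up any remaining word character one at a time (Chinese, Japanese, etc.), so each Chinese character becomes its own token. Don't write the range as [A-z]: that also matches the ASCII punctuation that sits between Z and a. You can also use [A-Za-z0-9]+ to include attached numbers. Working so far...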
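For anyone not on .NET, here is a rough sketch of the same match-instead-of-split idea in standard C++ with std::wregex. It is not the code from the post above: the function name tokenize is hypothetical, and since \w in std::wregex depends on the locale, the sketch assumes the explicit range U+4E00 to U+9FFF (CJK Unified Ideographs) as a stand-in for "Chinese characters". Japanese kana or other scripts would need their own ranges.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): collects every match of the
// pattern instead of splitting on separators.
std::vector<std::wstring> tokenize(const std::wstring& text) {
    // First alternative: a run of ASCII letters/digits (an English word).
    // Second alternative: exactly one character in the assumed CJK range,
    // so each ideograph becomes its own token.
    static const std::wregex pattern(L"[A-Za-z0-9]+|[\u4e00-\u9fff]");

    std::vector<std::wstring> tokens;
    for (std::wsregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

Calling tokenize(L"I love \u4e2d\u6587 C3PO") should give five tokens: I, love, 中, 文, C3PO. Spaces and punctuation simply fall between matches and are dropped. Like the regex above, this breaks Chinese text into single characters, not dictionary words.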