Full Text Search: The Key to Better Natural Language Queries for NoSQL in Node.js
Magine (for Visual Basic), including the full source code of the application.
The main Visual Basic source code files included in this download are:
- magine.vbp (project file)
- frmmagine.frm (form file)
- write.bas (module file)
Transposition is one of two methods used for encryption and cryptography. The other method is substitution. A world-famous encryption system was the Enigma Encryption System used by the Nazi regime during World War II. Enigma relied heavily on substitution. Modern-day software encryption applications use both methods together. The degree of complexity (for example, the number of times you transpose and substitute plain text and the order in which you do these transpositions and substitutions) partly determines the strength of the encryption.
Software encryption is used in many products and services, including online banking services, DVD film encoding, and so on.
Here is what this algorithm does:
No matter what text you type into it, it will give you the full table of all permutations. For example, if your source text is "malrsecb," the application generates a table with 40320 unique permutations; one of the permutations is the English word scramble.
Included in this download is a sample output file. The file name is:
- sample table.txt (45 Kb)—the source text used is the word 'virtual'
Note: You can also use Magine to scramble up (transpose) proper words (or plain text) into meaningless text. In the field of software encryption, 'meaningless text' takes the name 'cipher text.'
This algorithm is not something a developer will just slot into an existing application and expect immediate results. This is a learning algorithm. It is a mind opener for the developer who wants to advance beyond being a 'click click, easy does it' developer. I won't rule out other uses for a transposition algorithm, outside the field of software encryption.
IMPORTANT Technical Stuff
1. How permutations are generated
We begin with the text length (n). Increasing text lengths will generate more permutation by the factor n. Thus, the total number of permutations is called n-factorial. For example, a text length of 4 has: 1x2x3x4 = 24 permutations and a text length of 10 has 1x2x3x4x5x6x7x8x9x10 = 3,628,800 permutations. The notation used for n-factorial is 'n!'
Magine produces all permutations in a table (2-dimensional matrix) that has (n-1)! rows and n columns. So, for a text length of 8 characters, there will be (8-1)! = 7! = 5,040 rows and n = 8 columns, and the total permutations is 5,040 x 8 = 40,320 = 8!
We can now answer the next question in your (the reader's) mind. Does magine produce the permutations by rows, or by columns? The answer is 'by columns.' I shall use the example of a text length of n = 4 to illustrate. Here is what we do:
- Choose any text, say, '1234'. This is called the source text.
- Write out the source text in row 1, column 1.
- When generating the permutations in the first column, the first character (i.e. the '1' from our source text) must always stay in the first position, and we already know that the number of rows will be 6 (i.e. (4-1)! = 3! = 6). So, the first column will be of the form:
- The way we find the unknowns is to transpose (swap) the remaining characters between the second, third, and fourth positions, using the given source text of '1234'.
- In row 2, we swap the second and third characters to generate the permutation '1324'.
- In row 3, we swap the second and fourth characters to generate the permutation '1432'.
- Let's swap the third and fourth characters, but this time we will use the source text and the permutations in the second and third rows to do the swaps, each separately of course. With three additional swaps, rows 4, 5, and 6 will be completed. Let's do it.
- In row 4, we swap the third and fourth characters, using the source text, to generate the permutation '1243'.
- In row 5, we swap the third and fourth characters, using the permutation in row 2 (i.e. '1324'), to generate the permutation '1342'.
- In row 6, we swap the third and fourth characters, using the permutation in row 3 (i.e. '1432'), to generate the permutation '1423'.
- Now, would you believe that the remaining 18 (4! = 24 - 6 (from the first column)) permutations are easier to generate? We will be doing more swaps to generate the remaining permutations, but these swaps are not like the swaps we did for the first column. These transpositions (swaps) are called shifts. Lets re-look at the table we generated thus far and write down its properties:
- The table has one column.
- The table has six rows.
- Each permutation in all six rows is unique (verify).
- Each unique permutation begins with a '1' (i.e., the '1' is always in the first position).
- Two permutations were generated from swaps using the second character.
- Three permutations were generated from swaps using the last two (third and fourth) characters.
|1234||-||source text in row 1, column 1|
The *'s in rows 2 to 5 are, for the moment, unknown to us. So, let's do the next steps and find those unknowns.
There is nothing else to swap the second character with. Pause for a moment, and let's write out the table again with the *'s we do know.
|1324||-||swapped 2nd and 3rd|
|1432||-||swapped 2nd and 4th|
Column one is now completed and the table looks like this:
What if we shifted all the 1's from position 1 to position 2? Now, all the characters in the second position are shifted leftward to the first position, and the characters in third and fourth positions remain unchanged. The result is six additional unique permutations. The table now looks like this:
Do you see that the only difference between the permutations in column 1 and 2 is that the '1' character has been right-shifted one position? Two more right shifts, of only the '1' character, into third and fourth position respectively and the table is completed with all 24 unique permutations. Here is what the final table looks like:
When generating the first column, any text of length n always has n-2 different types of swaps. I will first illustrate this with a source text of 8 characters and then I will apply it to the example I did above with the source text '1234'.
|1.||6||=||8 - 2 (swaps with the second character)|
|2.||35||=||5 x 7 (swaps with the third character)|
|3.||168||=||4 x (6 x 7)) (swaps with the fourth character)|
|4.||630||=||3 x (5 x 6 x 7)) (swaps with the fifth character)|
|5.||1680||=||2 x (4 x 5 x 6 x 7)) (swaps with the sixth character)|
|6.||2520||=||1 x (3 x 4 x 5 x 6 x 7)) (swaps between the seventh and eighth character)|
|=||5040||(Total permutations in the first column; that is, equal to (n-1)! = (8-1)! = 7!)|
Once you have generated all the permutations in the first column, all that remains is right-shifting the first character by one position, each time, to generate the remaining columns.
Now, in the example I did previously, n = 4 and n-2 = 2. So we have:
|1.||2||=||4 - 2 (swaps with the second character)|
|2.||3||=||3 = 1 x 3 (swaps between the third and fourth character)|
|=||6||(Total permutations in the first column, that is equal to (n-1)! = (4-1)! = 3!)|
The execution of this algorithm is limited by the memory (both RAM and hard drive space) and speed of the computer used. An eight-character text (like the word 'universe') has 40320 permutations and generates a text file with a size of 393 Kb. A nine-character text (like the word 'published') has 362880 permutations and generates a text file with a size of 3,888 Kb (3.80 Mb). A 10-character text (like 'childgames') has 3,628,800 permutations and generates a text file with a size of 42,417 KB (41.42 Mb). The HTML file generated for a 10-character text is approximately 80 Mb in size and it probably won't open in your PC browser, but will more than likely cause your operating system to hang or crash. So don't try it. Whenever you use text larger than eight characters, choose plain text as the output format.
3. System Information (the Author's computer)
Programming Language Application: Visual Basic 6.0
Operating system: Windows® XP
RAM: 256 Mb
CPU: 1,33 Ghz [Intel® CeleronTM]
THE USER AGREES TO USE THE PROGRAM ENTIRELY AT HIS/HER OWN RISK. THIS PROGRAM (MAGINE), INCLUDING ITS SOURCE CODE AND DOCUMENTATION, IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND. THE AUTHOR (ZIAD CASSIM) DOES NOT ACCEPT ANY LIABILITY OR RESPONSIBILITY FOR DAMAGES IN ANY FORM, REGARDLESS OF CAUSE.
USE, COPY, TRANSFER, ALTER, RE-ENGINEER, MODIFY, AND/OR REPRODUCE THE PROGRAM (INCLUDING THE SOURCE CODE) IN ANY WAY YOU SEE FIT, PROVIDED THAT YOU AGREE IN FULL, WITHOUT RESERVATION, TO THE DISCLAIMER, AS WRITTEN ABOVE.
Comments and feedback can be written to the author, living in Durban, South Africa.
E-mail Ziad at: email@example.com.