MySQL: User Defined Function Tutorial
This tutorial explains what a User Defined Function (UDF) is, what it does, and why/when it is useful.
1. What Is a User Defined Function?
Basically, a User Defined Function (UDF) is a piece of code that extends the functionality of a MySQL server by adding a new function that behaves just like a native (built-in) MySQL function, such as abs() or concat(). UDFs are written in C (or C++ if you really need to). Maybe there is a way to write them in BASIC, .NET, or whatever but I don’t see why anybody would want to do that.
2. Why/When Are UDFs Useful?
As implied by the name, UDFs are useful when you need to extend the functionality of your MySQL server. This little table should make it clear which method is best for a given situation:
Method | Speed | Language | Development |
---|---|---|---|
Stored Procedures | slow | SQL | ~minutes (for small functions) |
UDF | fast | C | ~hour |
Native Function | fast | C | major pain |
And by “slow,” I mean “slower than the others!” Stored procedures are still much faster then normal SQL statements!
A little explanation on native functions: The code you have to write here is essentially the same as the one for a UDF, but you have to write it in the MySQL source code and recompile the whole thing. This will (believe me) be a lot of work because you have to do it again and again with every new version of MySQL.
3. How to Use UDFs
This part is really easy. When you have your UDF finished, you just use it like every other native function. For example:
SELECT MyFunction(data1, data2) FROM table
4. Writing the UDF
Now, you can get started writing your first UDF. Just follow these steps:
Step 1: Create a new shared-library project
In the example, I used VC++ 6.0 with a standard DLL).
Step 2: Create some headers
These headers are either standard library headers or from the MySQL Server’s include directory.
#ifdef STANDARD
/* STANDARD is defined. Don’t use any MySQL functions */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#ifdef __WIN__
typedef unsigned __int64 ulonglong; /* Microsoft’s 64 bit types */
typedef __int64 longlong;
#else
typedef unsigned long long ulonglong;
typedef long long longlong;
#endif /*__WIN__*/
#else
#include <my_global.h>
#include <my_sys.h>
#endif
#include <mysql.h>
#include <ctype.h>
static pthread_mutex_t LOCK_hostname;
Step 3: Decide what kind of function you want
There are essentially two choices to be made:
- Is the function an aggregate function or not? (You will learn more about aggregate functions later.)
- Which type of return value should the function return? Here, you have four options:
Type | Description |
---|---|
STRING | A string literal. Translates to char* in C |
INTEGER | A normal integer. Translates to a 64-bit integer in C |
REAL | A floating point number. Translates to double in C |
DECIMAL | This one isn’t really finished at this time. MySQL treats it as STRING |
Step 4: Assign non-aggregate functions
Now, you have to declare and implement some functions the MySQL server needs to use your UDF. But, first some structs you’ll need for that:
- UDF_INIT:
- UDF_ARGS:
Type | Name | Description |
---|---|---|
my_bool | maybe_null | 1 if function can return NULL |
unsigned int | decimals | for REAL functions |
unsigned long | max_length | For string functions |
char * | ptr | free pointer for function data |
my_bool | const_item | 0 if result is independent of argument |
Type | Name | Description |
---|---|---|
unsigned int | arg_count | Number of argument |
enum Item_result * | arg_type | Array containing the types of the arguments |
char ** | args | Array of pointer to the arguments |
unsigned long * | lengths | Array of the argument’s lengths (only needed for strings) |
char * | maybe_null | Array of “maybe_null” flags (1 if argument maybe null) |
char ** | attributes | Array of pointers to the arguments’ attributes |
unsigned long * | attribute_lengths | Array of attributes’ lengths |
Step 5: Examine the functions
De-/Initialization:
extern “C” my_bool MyTest_init(UDF_INIT *initid, UDF_ARGS *args,
char *message)
{
// The most important thing to do here is to set up the memory
// you need…
// Say you need a lonlong type variable to keep a checksum
// although you do not need one in this case
longlong* i = new longlong; // create the variable
*i = 0; // set it to a value// store it as a char pointer in the pointer variable
// Make sure that you don’t run into typecasting troubles later!!
initid->ptr = (char*)i;// check the arguments format
if (args->arg_count != 1)
{
strcpy(message,”MyTest() requires one arguments”);
return 1;
}if (args->arg_type[0] != INT_RESULT)
{
strcpy(message,”MyTest() requires an integer”);
return 1;
}
return 0;
}extern “C” void MyTest_deinit(UDF_INIT *initid)
{
// Here you have to free the memory you allocated in the
// initialization function
delete (longlong*)initid->ptr;
}
The actual function:
extern “C” longlong MyTest(UDF_INIT *initid, UDF_ARGS *args,
char *is_null, char *error)
{
/* So, finally, this is the part were you do the real work.
This function is called for every record and the current value(s)
or better pointers to the current values are stroed in the
UDF_ARGS variable. You have to get the values, do your
calculation, and return the result.
NOTE: You can access the memory allocated in MyTest_init
through the UDF_INIT variable.
In this example, you will just add 5 to every value…*/
return *((longlong*)args->args[0])+5;
}
Step 6: All done!
Now, you have to compile the library and copy it to a directory where your OS can find it. On Windows, that would be anywhere the PATH System variable says. Personally, I use the MySQL server’s bin directory. You have to make sure that the library is in one of those directories; otherwise, MySQL can’t use it! Also, make sure to export all the functions MySQL needs!
Step 7: Tell MySQL about it
This is really straightforward: Just execute the following SQL command:
CREATE [AGGREGATE] FUNCTION MyTest
RETURNS [INTEGER|STRING|REAL|DECIMAL]
SONAME the_libraries_exact_name
Now, you can use it like any other function.
5. Aggregate Functions
Now, some words about aggregate functions. When your UDF is an aggregate function, you have to add some more functions and some functions are used in a different way. The calling sequence is:
- Call MyTest_init to allocate memory (just like a normal UDF).
- MySQL sorts the table according to the GROUP BY statement.
- Call MyTest_clear for the first row in each group.
- Call MyTest_add for each row that belongs to the same group.
- Call MyTest to get the result when the group changes or the last row has been processed.
- Repeat Steps 3 to 5 until all rows have been processed.
- Call MyTest_deinit to free any used memory.
Now, look at the new functions needed for the aggregate function. In this example, you’ll simply add up all the values (like the native SUM function).
void MyTest_clear(UDF_INIT *initid, char *is_null, char *error)
{
/* The clear function resets the sum to 0 for each new group
Of course, you have to allocate a longlong variable in the init
function and assign it to the pointer as seen above */
*((longlong*)initid->ptr) = 0;
}void MyTest_add(UDF_INIT *initid, UDF_ARGS *args, char *is_null,
char *error)
{
// For each row, the current value is added to the sum
*((longlong*)initid->ptr) = *((longlong*)initid->ptr)
+ *((longlong*)args->args[0]);
}longlong MyTest(UDF_INIT *initid, UDF_ARGS *args, char *is_null,
char *error)
{
// And, in the end, the sum is returned
return *((longlong*)initid->ptr);
}