Click to See Complete Forum and Search --> : How to speed up retrieving data from large table(need help)


haw_fong
June 23rd, 2005, 12:17 AM
Dear all,
I try to retrieve duplicate data from database but it take a lot of time to display and very slow.Even using index also same .Here is my sql statement :

$sqlselect2 = "SELECT DISTINCT p1.resume_id, p1.user_id, p1.email, p1.name, p1.gender, p1.house_tel, p1.office_tel, p1.hand_tel, p1.new_ic, p1.old_ic, p1.address1, p1.address2, p1.city, p1.dob FROM resume_personal p1 ,resume_personal p2 where p1.$searchString=p2.$searchString And p1.resume_id!=p2.resume_id Order by p1.$sort $order ";

$result2 = mysql_query($sqlselect2,$conn) or die(mysql_error());

Anyones got the better solution to retrieve duplicate data or to improve the speed?Thanks.

erickwidya
June 23rd, 2005, 01:09 AM
use GROUP BY instead DISTINCT
"SELECT p1.resume_id, p1.user_id, p1.email, p1.name, p1.gender, p1.house_tel, p1.office_tel, p1.hand_tel, p1.new_ic, p1.old_ic, p1.address1, p1.address2, p1.city, p1.dob FROM resume_personal p1
GROUP BY p1.resume_ID, p1.user_id, p1.email, p1.name, p1.gender, p1.house_tel, p1.office_tel, p1.hand_tel, p1.new_ic, p1.old_ic, p1.address1, p1.address2, p1.city, p1.dob
Order by p1.$sort $order ";

btw why u use this
...FROM resume_personal p1 ,resume_personal p2 where p1.$searchString=p2.$searchString And p1.resume_id!=p2.resume_id
Order by p1.$sort $order "to me : the JOIN not necessary

haw_fong
June 23rd, 2005, 03:13 AM
Here is my table strusture:

CREATE TABLE `resume_personal` (
`resume_id` int(11) NOT NULL auto_increment,
`user_id` varchar(50) NOT NULL default '',
`email` varchar(50) default NULL,
`pwd` text NOT NULL,
`name` varchar(80) NOT NULL default '',
`gender` char(1) NOT NULL default '',
`new_ic` varchar(14) default NULL,
`old_ic` varchar(14) default NULL,
`passport` varchar(20) default NULL,
`dob` date NOT NULL default '0000-00-00',
`address1` varchar(80) default NULL,
`address2` varchar(80) default NULL,
`city` varchar(30) default NULL,
`postcode` varchar(7) default NULL,
`state` varchar(30) default NULL,
`house_tel` varchar(12) default NULL,
`office_tel` varchar(12) default NULL,
`hand_tel` varchar(12) default NULL,
`marital` varchar(15) default NULL,
`race` char(1) NOT NULL default '',
`no_in_family` char(2) NOT NULL default '',
`position_in_family` char(1) NOT NULL default '',
`expected_sal` varchar(7) default NULL,
`earliest_date_commence` date NOT NULL default '0000-00-00',
`license` varchar(30) default NULL,
`spouse_name` varchar(80) default NULL,
`spouse_job` varchar(80) default NULL,
`spouse_company` varchar(80) default NULL,
`spouse_phone` varchar(12) default NULL,
`photo_location` varchar(80) default NULL,
`spoken1` varchar(30) default NULL,
`spoken2` varchar(30) default NULL,
`spoken3` varchar(30) default NULL,
`spoken4` varchar(30) default NULL,
`written1` varchar(30) default NULL,
`written2` varchar(30) default NULL,
`written3` varchar(30) default NULL,
`written4` varchar(30) default NULL,
`status` char(1) NOT NULL default '0',
`datestamp` datetime NOT NULL default '0000-00-00 00:00:00',
`position` varchar(100) default NULL,
`other_written` text,
`other_spoken` text,
`apply_date` datetime NOT NULL default '0000-00-00 00:00:00',
`access_level` int(1) NOT NULL default '0',
`corr_address1` varchar(80) NOT NULL default '',
`corr_address2` varchar(80) NOT NULL default '',
`corr_city` varchar(30) NOT NULL default '',
`corr_postcode` varchar(7) NOT NULL default '',
`corr_state` varchar(30) NOT NULL default '',
`mail_subscription` varchar(10) NOT NULL default 'Y',
`register_date` datetime NOT NULL default '0000-00-00 00:00:00',
`last_apply_date` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`resume_id`),
KEY `name` (`name`),
KEY `apply_date` (`apply_date`),
KEY `new_ic` (`new_ic`),
) TYPE=MyISAM AUTO_INCREMENT=58537 ;

cjard
June 23rd, 2005, 04:31 AM
decide what counts as a duplicate to you..


suppose in tblEmployees, the firstname, lastname and phoneNumber should be distinct. if they arent then you have duplicates you need to remove so you do:


select *
from
tblEmployees t,
(select firstName, lastName, phoneNum
from tblEmployees
group by firstName, lastName, phoneNum
having count(*)>1) dupFinder
where
t.firstName = dupFinder.firstname and
t.lastname = dupfinder.lastname and
t.phonenum = dupfinder.phonenum


here we have a small sql that we can think of as a temporary table:
(select firstName, lastName, phoneNum
from tblEmployees
group by firstName, lastName, phoneNum
having count(*)>1)

if finds the first,last and phone that appear more than once in the table

then the outer query pulls all the columns from tblemployees, of these duplicates



remove duplicates once; dont forever use distinct or group to remove them at query time

erickwidya
June 23rd, 2005, 10:58 PM
haw_fongi think u have an issue in designing the table
if u can get the "Nomalize form" for it, i think u'll get performance speed up

decide what counts as a duplicate to you..cjard's got a point here

and tell us what this table is for..the purpose of it

regards