BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
juju_p
Obsidian | Level 7

Dear all,

 

For certain operations that I perform in SAS, it takes a good while.

Whenever I run my programs the mouse freezes for at least a couple of minutes before showing the results.

 

At this time I am manpulating a dataset of about 36000 observations, so not that much! 

 

Do you have an idea why this issue?

 

See an example of runtime result


NOTE: There were 36264 observations read from the data set WORK.LAB_CHE.
NOTE: 33242 observations with duplicate key values were deleted.
NOTE: The data set WORK.LAB_CHE_S has 3022 observations and 23 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.14 seconds
cpu time 0.00 seconds


306 proc sql;
307 title 'Duplicate Samples for Clinical Chemistry';
308 select SUBJID,VISITD,LBTYPE,LBDAT,count(*) as Count
309 from lab_che_s
310 group by SUBJID, LBTYPE, LBDAT
311 having count(*) > 1;
NOTE: The query requires remerging summary statistics back with the original data.
312 title4 'Extreme labo values for Clinical Chemistry';
NOTE: PROCEDURE SQL used (Total process time):
real time 16:25.48
cpu time 1:02.04

 

BRs

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

First, I would try to get SQL out of the loop. Consider this:

proc sort data=lab_che_s;
by subjid lbtype lbdat;
run;

data want;
count = 0;
do until (last.lbdat);
  set lab_che_s;
  by subjid lbtype lbdat;
  count + 1;
end;
do until (last.lbdat);
  set lab_che_s;
  by subjid lbtype lbdat;
  if count > 1 then output;
end;
keep subjid lbtype lbdat visitd count;
run;

View solution in original post

9 REPLIES 9
juju_p
Obsidian | Level 7

Yes because I wanted to find out how many times we had a duplicate combination of SUBJID+LBTYPE+ LBDAT. I show the VISITD in the result because I want to know the visit where each duplicate occurs.

ballardw
Super User

@juju_p wrote:

Yes because I wanted to find out how many times we had a duplicate combination of SUBJID+LBTYPE+ LBDAT. I show the VISITD in the result because I want to know the visit where each duplicate occurs.


Here is an example of a similar problem. Find those combinations of Sex and Age that are duplicated in the SASHELP.CLASS data set and then get the Names associated with the duplicates.

 

Proc freq data=sashelp.class noprint;
   tables sex*age /out=temp (where=(count>1)) ;
run;

proc sql;
  create table dupes as
  select a.name, a.sex,a.age
  from sashelp.class as a inner join
       temp as b
       on a.sex=b.sex
       and a.age=b.age
  order by a.sex,a.age
  ;
quit;

Combinations of 3 variables would be quite similar. Tables statement in Freq (or any other similar count). Note the WHERE clause means the result output data set only has duplicates. Proc freq is sort of optimized for counting and would likely run quite a bit quicker than the Proc SQL part doing the same thing.

Then join the count data to the original data on the variable combinations, adding the identification variable(s) of interest. Group by the combination variable so you can see the similar items together or possibly even do some FIRST/LAST processing in a data step.

Kurt_Bremser
Super User

First, I would try to get SQL out of the loop. Consider this:

proc sort data=lab_che_s;
by subjid lbtype lbdat;
run;

data want;
count = 0;
do until (last.lbdat);
  set lab_che_s;
  by subjid lbtype lbdat;
  count + 1;
end;
do until (last.lbdat);
  set lab_che_s;
  by subjid lbtype lbdat;
  if count > 1 then output;
end;
keep subjid lbtype lbdat visitd count;
run;
juju_p
Obsidian | Level 7

This works quite well an fast! The only drawback is that it requires initial sorting of the data, which SQL doesn't. Thanks

Kurt_Bremser
Super User

@juju_p wrote:

This works quite well an fast! The only drawback is that it requires initial sorting of the data, which SQL doesn't. Thanks


SQL also needs sorting, it does it "under the hood", and often in a less efficient way than the explicit sort. One reason why several discrete steps often outperfom one SQL by orders of magnitude.

Also see Maxim 10.

Reeza
Super User

 

If you write it to a table and view the table instead of have it printed out to the results window is that faster?

 

proc sql;
 title 'Duplicate Samples for Clinical Chemistry';
create table duplicates as
 select SUBJID,VISITD,LBTYPE,LBDAT,count(*) as Count
 from lab_che_s
group by SUBJID, LBTYPE, LBDAT
 having count(*) > 1;
quit;

@juju_p wrote:

Dear all,

 

For certain operations that I perform in SAS, it takes a good while.

Whenever I run my programs the mouse freezes for at least a couple of minutes before showing the results.

 

At this time I am manpulating a dataset of about 36000 observations, so not that much! 

 

Do you have an idea why this issue?

 

See an example of runtime result


NOTE: There were 36264 observations read from the data set WORK.LAB_CHE.
NOTE: 33242 observations with duplicate key values were deleted.
NOTE: The data set WORK.LAB_CHE_S has 3022 observations and 23 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.14 seconds
cpu time 0.00 seconds


306 proc sql;
307 title 'Duplicate Samples for Clinical Chemistry';
308 select SUBJID,VISITD,LBTYPE,LBDAT,count(*) as Count
309 from lab_che_s
310 group by SUBJID, LBTYPE, LBDAT
311 having count(*) > 1;
NOTE: The query requires remerging summary statistics back with the original data.
312 title4 'Extreme labo values for Clinical Chemistry';
NOTE: PROCEDURE SQL used (Total process time):
real time 16:25.48
cpu time 1:02.04

 

BRs


 

andreas_lds
Jade | Level 19

Maybe i don't understand the task, but why can't you use proc summary?

 

proc summary data= lab_che_s nway;
  class subjd lbtype lbdat;
  output out= duplicates(drop= _type_ rename=(_freq_ = count) where=(count > 1));
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1516 views
  • 4 likes
  • 5 in conversation