BookmarkSubscribeRSS Feed
Gpepsicola
Calcite | Level 5


You are a perinatal and reproductive epidemiologist and are given an Excel spreadsheet, Project3.xlsx, which contains 1000 records. There are a total of 5 hospitals in the dataset and in each of these hospitals, there were three possible types of infant outcomes: normal (>= 2500 grams), low birth weight (<2500 grams but >= 1500 grams), and very low birth weight (< 1500 grams).

 

You are interested in conducting a sub-study in which you would like to randomly select 3 very low birth weight, 10 low birth weight, and 25 normal infants from each hospital.

 

12 REPLIES 12
mkeintz
PROC Star

Will I get course credit for doing this homework?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Gpepsicola
Calcite | Level 5
No. I'm looking for suggestions to complete it myself. Thanks for inquiring.
mkeintz
PROC Star

OK, here's the randomizing part, untested.  It assumes dataset HAVE is sorted by hospital.  I know that RANUNI is deprecated but you can substitute SAS's improved uniform random number generator.

 

 

 

data want (drop=av_: w_: w ninfants);

  array avail {3} av_vlbw av_lbw av_nw;
  do w=1 to 3; avail{w}=0; end;

  array want  {3} w_vlbw w_lbw w_nw;
  w_vlbw=3;
  w_lbw=10; 
  w_nw=25;

  /* Get total available infants by weight group */
  do ninfants=1 by 1 until (last.hospital);
    set have;
    by hospital;
    w=1+(weight>=1500)+(weight>=2500);
    avail{w}=avail{w}+1;
  end;

  /* re-read and randomly draw */
  do until (last.hospital);
    set have;
    by hospital;
    w=1+(weight>=1500)+(weight>=2500);

    if want{w}/avail{w}>=ranuni(09184065) then do;
      output;
      want{w}=want{w}-1;
    end;
    avail{w}=avail{w}-1;
  end;
run;  
	  
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User

 

IMO you should really look into PROC SURVEYSELECT.

Ksharp
Super User

proc import datafile='/folders/myfolders/project3.xlsx' out=have dbms=xlsx replace;
run;
proc sort data=have;by hospnm bthwt;run;
proc surveyselect data=have sampsize=25 selectall out=temp;
strata hospnm bthwt;
run;
data want;
do until(last.bthwt);
 set temp;
 by hospnm bthwt;
end;
_bthwt=bthwt;

do i=1 by 1 until(last.bthwt);
 set temp;
 by hospnm bthwt;
 if _bthwt='VLBW' then do;
  if i le 3 then output;
 end;
 else  if _bthwt='LBW' then do;
        if i le 10 then output;
       end;
   else output;
end;
keep hospnm bthwt subjno;
run;
proc print noobs;run;
Ksharp
Super User

proc import datafile='/folders/myfolders/project3.xlsx' out=have dbms=xlsx replace;
run;
proc sort data=have;by hospnm bthwt;run;
proc surveyselect data=have sampsize=25 selectall out=temp;
strata hospnm bthwt;
run;
data want;
 set temp;
 by hospnm bthwt;
 if first.bthwt then i=0;
 i+1;
 if bthwt='VLBW' then do;
  if i le 3 then output;
 end;
 else  if bthwt='LBW' then do;
        if i le 10 then output;
       end;
   else output;

keep hospnm bthwt subjno;
run;
proc print noobs;run;


mkeintz
PROC Star

@Ksharp

 

I have a concern about your proposed solution:

 

You're asking SURVEYSELECT for a sample size of 25 for each of 3 birthweight group, so that later you can take the smaller sample sizes of 3 and 10 from 2 of those groups.  Let's say the hospital has 5 VLBW births (or any number between 4 and 25), in which case surveyselect would keep all the VLBW births, presumably in the original order.  Then you take the first 3.

 

Frequently the original data might be sorted within hospital, say by date of birth. Then your suggestion risks never drawing VLBW births at the end of the time period.  I think you either have to pre-sort the data in random order within hospital (in which case surveyselect is not needed at all), or run surveyselect 3 times, with 3 different WHERE statements and 3 sampsize values.

 

regards,

Mark

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
Mike,
You are right. I never thought proc surveyselect would not change the order of obs 
when sample size is less than 25. Bad feature for proc surveyselect .

This could be simple as yours :



proc import datafile='/folders/myfolders/project3.xlsx' out=have dbms=xlsx replace;
run;
data have;
 set have;
 _rand=ranuni(0);
run;
proc sort data=have;by hospnm bthwt _rand;run;
data want;
 set have;
 by hospnm bthwt;
 if first.bthwt then i=0;
 i+1;
 if bthwt='VLBW' then do;
  if i le 3 then output;
 end;
 else  if bthwt='LBW' then do;
        if i le 10 then output;
       end;
   else output;

keep hospnm bthwt subjno;
run;
proc print noobs;run;

Ksharp
Super User
Opps,
This ought to be right.



proc import datafile='/folders/myfolders/project3.xlsx' out=have dbms=xlsx replace;
run;
data have;
 set have;
 _rand=ranuni(0);
run;
proc sort data=have;by hospnm bthwt _rand;run;
data want;
 set have;
 by hospnm bthwt;
 if first.bthwt then i=0;
 i+1;
 if bthwt='VLBW' then do;
  if i le 3 then output;
 end;
 else  if bthwt='LBW' then do;
        if i le 10 then output;
       end;
   else do;
        if i le 25 then output;
   end;
keep hospnm bthwt subjno;
run;
proc print noobs;run;

mkeintz
PROC Star

@Ksharp

 

Nice program.  I like the idea of sorting by hospnm bthwt.  But I think further simplification is achieveable:

 

data have (drop=imax i rand);

  retain imax;

  set have;

  by hospnm bthwt;

  if first.bthwt then do;

    if bthwt='VLBW' then imax=3; else

    if bthwt='LBW' then imax=10; else

    if bthwt='Normal' then imax=25;

    i=0;

  end;

  i+1;

  if i<=imax;

run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

My comment to @Ksharp prompted a thought of a simpler approach.  It has more steps, but doesn't require counting the number of available births, and preserves equal sampling probability for each record within a hospital/birthweight group.

 

Now editted to use orginal var names and values:

And edditted again - in my first edit I used strikeout font for variable hospital, which I see is not preserved in the final version.  Those are gone now.

 

data need / view=need;
  set have;
  rnum=ranuni(01672356);
run;
proc sort data=need out=sorted;
  by hospm  rnum;
run;
data want (drop=rnum w);
  set sorted;
  by hospnm ;
  array need {3} _temporary_ ;
  if first.hospnm then do;
    need{1}=3; need{2}=10; need{3}=25;
  end;
w=findw('VLBW LBW Normal',trim(bthwt),' ','e'); if need{w}>0; need{w}=need{w}-1; run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1322 views
  • 1 like
  • 4 in conversation