Solved: Re: Race looping

BaileyY · Posted 11-21-2014 11:53 PM

Hello

I am new to SAS and someone recommended I try asking for help on SAS forum. I am trying to assign a primary race to patients. The form allows for up to 5 different races to be provided. I am having difficulty trying to write SAS code to look at each race1 to race5 to determine a primary race.

The different race categories are common and the complexity is correctly identifying mixed and missing/not reported. I have a table of five patients with different reported races in column race1-race5 and trying to write code to produce results under PrimaryRace column. Can anyone help me? Thank you

patient	race1	race2	race3	race4	race5	PrimaryRace
2005blwek	missing	a	a	a	a	asian
2012rhiol	a	missing	b	missing	a	mixed
1989lliok	missing	b	missing	missing	w	mixed
2000ijlli	w	w	w	w	w	white

PGStats · Posted 11-22-2014 12:20 AM

Something like this, using an array :

data have;

input (patient race1 race2 race3 race4 race5) ($);

datalines;

2005blwek missing a a a a asian

2012rhiol a missing b missing a mixed

1989lliok missing b missing missing w mixed

2000ijlli w w w w w white

;

data want;

set have;

length primaryRace $8;

array r{*} race:;

primaryRace = "missing";

do i = 1 to dim(r);

if r{i} ne "missing" then

if primaryRace = "missing" then primaryRace = r{i};

else if primaryRace ne r{i} then do;

primaryRace = "mixed";

leave;

end;

drop i;

run;

proc print data=want noobs; run;

PG

Message was edited by: PG Added provision for the case where race1 - race5 is all missing.

PG

View solution in original post

PGStats · Posted 11-22-2014 12:20 AM

Something like this, using an array :

data have;

input (patient race1 race2 race3 race4 race5) ($);

datalines;

2005blwek missing a a a a asian

2012rhiol a missing b missing a mixed

1989lliok missing b missing missing w mixed

2000ijlli w w w w w white

;

data want;

set have;

length primaryRace $8;

array r{*} race:;

primaryRace = "missing";

do i = 1 to dim(r);

if r{i} ne "missing" then

if primaryRace = "missing" then primaryRace = r{i};

else if primaryRace ne r{i} then do;

primaryRace = "mixed";

leave;

end;

drop i;

run;

proc print data=want noobs; run;

PG

Message was edited by: PG Added provision for the case where race1 - race5 is all missing.

PG

BaileyY · Posted 11-22-2014 10:39 PM

PG

Your suggestion worked perfectly!! Thank you!! I looked up arrays and found it fabulous that arrays repeats actions for consistent variable. I wrote a data step to look at every single race column and by the time I reached the fourth race, it become so confusing that I lost sight of what I was trying to do. Thank you very much. I really want to learn and try to understand how the code really works- can you help me understand how that do loop worked and what that "dim" does? Thank you again!

PGStats · Posted 11-22-2014 11:32 PM

Yes, of course, I hope these comments help:

data want;

set have;

length primaryRace $8;

/* Define array r containing all variables with names starting with 'race' , i.e. race1 ... race5*/

/* From this point on, saying r{1} is the same as saying race1, etc. */

array r{*} race:;

/* Initialize primaryRace to the value it should have if all races are missing */

primaryRace = "missing";

/* Loop for i = 1 to the size of array r, i.e. from 1 to 5 */

do i = 1 to dim(r);

/* Only if racei is not missing... */

if r{i} ne "missing" then

/* If primaryRace is not set yet then make it racei */

if primaryRace = "missing" then primaryRace = r{i};

/* If primaryRace already has a value and it is not the same

as racei then switch primaryRace to "mixed" and leave the loop */

else if primaryRace ne r{i} then do;

primaryRace = "mixed";

leave;

end;

drop i;

run;

PG

BaileyY · Posted 11-23-2014 12:04 PM

Thank you PG. Appreciate your help. Can I ask for your advice on possibly changing the temporary work folder on my laptop. I have my data on my external drive and running SAS on my company laptop. Appears the temp library is in the C drive of my computer, however, its only 500gb and I keep deleting the older temp work datasets that i no longer need. I dont have permissions on my laptop to make any changes to the configuration.... is there any other way to re-direct the 'work' library to my external?

PGStats · Posted 11-23-2014 05:05 PM

Well, I had to do a bit of testing... Creating a desktop shortcut like

"C:\Program Files\SASHome9_4\SASFoundation\9.4\sas.exe" -work "F:\temp"

works on my machine to relocate the WORK directory. SAS command line options have priority over any configuration file settings.

Try it!

PG

Patrick · Posted 11-22-2014 01:14 AM

Another way of getting there. Pierre's code is more appropriate for a beginner though.

data have;

infile datalines truncover dlm=',';

input (patient race1 race2 race3 race4 race5 PrimaryRace) (:$9.);

datalines;

2005blwek,missing,a,a,a,a,asian

2012rhiol,a,missing,b,missing,a,mixed

1989lliok,missing,b,missing,missing,w,mixed

2000ijlli,w,w,w,w,w,white

;

run;

proc format;

value $race

'a' = 'asian'

'b' = 'somthing else'

'w' = 'white'

'missing',' ' = 'missing'

;

run;

data want(drop=_:) ;

if _n_=1 then

do;

length _race $9.;

declare hash h1();

_rc=h1.defineKey('_race');

_rc=h1.defineDone();

end;

set have;

_race='';

array races {*} race1 - race5;

do _i=1 to dim(races);

if races[_i] ne 'missing' then

do;

_race=races[_i];

_rc=h1.add();

end;

_nraces=h1.num_items;

_rc=h1.clear();

if _nraces<2 then derived_PrimaryRace=put(_race,$race.);

else derived_PrimaryRace='mixed';

run;

Ksharp · Posted 11-22-2014 03:55 AM

Like Patrick's, an array version code.


data have;
input (patient race1 race2 race3 race4 race5) ($);
datalines;
2005blwek . a a a a 
2012rhiol a . b . a 
1989lliok . b . . w 
2000ijlli w w w w w 
;
data want;
set have;
length primaryRace $8;
array r{*} race:;
primaryRace =coalescec(of race:);
do i = 1 to dim(r)-1;
 do j=i+1 to dim(r);
    if not missing(r{i}) and not missing(r{j}) and r{i} ne r{j} then do;
     primaryRace = "mixed";
            leave;
    end;
 end;
end;
drop i j;
run;

Xia Keshan

art297 · Posted 11-22-2014 12:38 PM

If you don't need to keep race1 to race5 in the file you create, then you could do what you want/need with something like:

proc format;

invalue $race

'a' = 'asian'

'b' = 'black'

'w' = 'white'

'h' = 'hispanic'

'missing',' ' = ' '

;

run;

data have(keep=patient race);

infile datalines truncover dlm=',';

informat patient $9.;

informat race1-race5 $race7.;

array races(5) $ race1-race5;

input patient race1-race5;

call missing(PrimaryRace);

do i=1 to 5;

if not missing(races(i)) then do;

race=races(i);

output;

end;

datalines;

2005blwek,missing,a,a,a,a,asian

2012rhiol,a,missing,b,missing,a,mixed

1989lliok,missing,b,missing,missing,w,mixed

2000ijlli,w,w,w,w,w,white

;

run;

proc sql;

create table want as

select distinct patient,

case count(distinct race)

when 1 then race

else 'Mixed'

end as PrimaryRace

from have

group by patient

;

quit;

Haikuo · Posted 11-22-2014 10:31 PM

If 'missing' is blank, then here is a another options with long, nested functions:

data have;

input (patient race1 race2 race3 race4 race5) ($);

datalines;

2005blwek . a a a a

2012rhiol a . b . a

1989lliok . b . . w

2000ijlli w w w w w

ariagakle . . . . .

;

data want;

set have;

length primary_race $ 10;

primary_race=ifc(lengthn(compress(cats(of race:),first(cats(of race:))))=0, first(cats(of race:)), 'mixed');

run;

Haikuo

SannaSanna · Posted 05-06-2015 06:37 PM

Hello-

I have something similar that I am trying to do and would like to ask for help. I used the code below but instead of just looking to see if primaryRace is different than r{1} (which I think this code is doing) - I was wondering how I can add additional code to look at each race1-5 and any one of the race entered is 'Asian' and another race is 'Filipino' the primaryRace would be 'Filipino' instead of 'mixed'.

Can anyone help me? Please?

*****************************************************

data want;

set have;

length primaryRace $8;

array r{*} race:;

primaryRace = "missing";

do i = 1 to dim(r);

if r{i} ne "missing" then

if primaryRace = "missing" then primaryRace = r{i};

else if primaryRace ne r{i} then do;

primaryRace = "mixed";

leave;

end;

drop i;

run;

********************************************

ballardw · Posted 05-06-2015 07:02 PM

For this forum we prefer that you start a new thread and then cite a similar previous thread such as this one.

It would be helpful to provide a few example records with examples of different values, how "missing" may be indicated, whether the values a text or numeric codes corresponding to a text value.

The basic rules to me sound like they may be:

If there are no valid race values entered then PrimaryRace is assigned the value of "missing"

If there is only one "valid" (non-missing) race entery then that value is assigned to the PrimaryRace.

If there are EXACTLY two valid race values, one of Asian and one of Filipino then PrimaryRace is assigned as Filipino

Any other combination of 2 or more valid race values assigns the value of "mixed" to PrimaryRace.

Does this summarize your problem correctly?

Registration is open

SAS Training: Just a Click Away