Help using Base SAS procedures

Race looping

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

Race looping

Hello

I am new to SAS and someone recommended I try asking for help on SAS forum.  I am trying to assign a primary race to patients.  The form allows for up to 5 different races to be provided.  I am having difficulty trying to write SAS code to look at each race1 to race5 to determine a primary race.

The different race categories are common and the complexity is correctly identifying mixed and missing/not reported.  I have a table of five patients with different reported races in column race1-race5 and trying to write code to produce results under PrimaryRace column.  Can anyone help me?   Thank you

patientrace1race2race3race4race5PrimaryRace
2005blwekmissingaaaaasian
2012rhiolamissingbmissingamixed
1989lliokmissingbmissingmissingwmixed
2000ijlliwwwwwwhite

Accepted Solutions
Solution
‎11-22-2014 12:20 AM
Respected Advisor
Posts: 4,934

Re: Race looping

Something like this, using an array :

data have;

input (patient race1 race2 race3 race4 race5) ($);

datalines;

2005blwek missing a a a a asian

2012rhiol a missing b missing a mixed

1989lliok missing b missing missing w mixed

2000ijlli w w w w w white

;

data want;

set have;

length primaryRace $8;

array r{*} race:;

primaryRace = "missing";

do i = 1 to dim(r);

    if r{i} ne "missing" then

        if primaryRace = "missing" then primaryRace = r{i};

        else if primaryRace ne r{i} then do;

            primaryRace = "mixed";

            leave;

            end;

    end;

drop i;

run;

proc print data=want noobs; run;

PG

Message was edited by: PG Added provision for the case where race1 - race5 is all missing.

PG

View solution in original post


All Replies
Solution
‎11-22-2014 12:20 AM
Respected Advisor
Posts: 4,934

Re: Race looping

Something like this, using an array :

data have;

input (patient race1 race2 race3 race4 race5) ($);

datalines;

2005blwek missing a a a a asian

2012rhiol a missing b missing a mixed

1989lliok missing b missing missing w mixed

2000ijlli w w w w w white

;

data want;

set have;

length primaryRace $8;

array r{*} race:;

primaryRace = "missing";

do i = 1 to dim(r);

    if r{i} ne "missing" then

        if primaryRace = "missing" then primaryRace = r{i};

        else if primaryRace ne r{i} then do;

            primaryRace = "mixed";

            leave;

            end;

    end;

drop i;

run;

proc print data=want noobs; run;

PG

Message was edited by: PG Added provision for the case where race1 - race5 is all missing.

PG
Occasional Contributor
Posts: 8

Re: Race looping

PG

Your suggestion worked perfectly!!  Thank you!!  I looked up arrays and found it fabulous that arrays repeats actions for consistent variable.  I wrote a data step to look at every single race column and by the time I reached the fourth race, it become so confusing that I lost sight of what I was trying to do.  Thank you very much.  I really want to learn and try to understand how the code really works- can you help me understand how that do loop worked and what that "dim" does?  Thank you again!

Respected Advisor
Posts: 4,934

Re: Race looping

Yes, of course, I hope these comments help:

data want;

set have;

length primaryRace $8;

/* Define array r containing all variables with names starting with 'race' , i.e. race1 ... race5*/

/* From this point on, saying r{1} is the same as saying race1, etc. */

array r{*} race:;

/* Initialize primaryRace to the value it should have if all races are missing */

primaryRace = "missing";

/* Loop for i = 1 to the size of array r, i.e. from 1 to 5 */

do i = 1 to dim(r);

    /* Only if racei is not missing... */

    if r{i} ne "missing" then

        /* If primaryRace is not set yet then make it racei */

        if primaryRace = "missing" then primaryRace = r{i};

        /* If primaryRace already has a value and it is not the same

           as racei then switch primaryRace to "mixed" and leave the loop */

        else if primaryRace ne r{i} then do;

            primaryRace = "mixed";

            leave;

            end;

    end;

drop i;

run;


PG

PG
Occasional Contributor
Posts: 8

Re: Race looping

Thank you PG.   Appreciate your help.  Can I ask for your advice on possibly changing the temporary work folder on my laptop.  I have my data on my external drive and running SAS on my company laptop.  Appears the temp library is in the C drive of my computer, however, its only 500gb and I keep deleting the older temp work datasets that i no longer need.  I dont have permissions on my laptop to make any changes to the configuration.... is there any other way to re-direct the 'work' library to my external? 

Respected Advisor
Posts: 4,934

Re: Race looping

Well, I had to do a bit of testing... Creating a desktop shortcut like

"C:\Program Files\SASHome9_4\SASFoundation\9.4\sas.exe" -work "F:\temp"

works on my machine to relocate the WORK directory. SAS command line options have priority over any configuration file settings.

Try it!

PG

PG
Respected Advisor
Posts: 4,173

Re: Race looping

Another way of getting there. Pierre's code is more appropriate for a beginner though.

data have;

  infile datalines truncover dlm=',';

  input (patient race1 race2 race3 race4 race5 PrimaryRace) (:$9.);

  datalines;

2005blwek,missing,a,a,a,a,asian

2012rhiol,a,missing,b,missing,a,mixed

1989lliok,missing,b,missing,missing,w,mixed

2000ijlli,w,w,w,w,w,white

;

run;

proc format;

  value $race

    'a'           = 'asian'

    'b'           = 'somthing else'

    'w'           = 'white'

    'missing',' ' = 'missing'

    ;

run;

data want(drop=_Smiley Happy ;

  if _n_=1 then

    do;

      length _race $9.;

      declare hash h1();

      _rc=h1.defineKey('_race');

      _rc=h1.defineDone();

    end;

  set have;

  _race='';

  array races {*} race1 - race5;

  do _i=1 to dim(races);

    if races[_i] ne 'missing' then

      do;

        _race=races[_i];

        _rc=h1.add();

      end;

  end;

  _nraces=h1.num_items;

  _rc=h1.clear();

  if _nraces<2 then derived_PrimaryRace=put(_race,$race.);

  else derived_PrimaryRace='mixed';

run;

Super User
Posts: 10,046

Re: Race looping

Like Patrick's, an array version code.


data have;
input (patient race1 race2 race3 race4 race5) ($);
datalines;
2005blwek . a a a a 
2012rhiol a . b . a 
1989lliok . b . . w 
2000ijlli w w w w w 
;
data want;
set have;
length primaryRace $8;
array r{*} race:;
primaryRace =coalescec(of race:);
do i = 1 to dim(r)-1;
 do j=i+1 to dim(r);
    if not missing(r{i}) and not missing(r{j}) and r{i} ne r{j} then do;
     primaryRace = "mixed";
            leave;
    end;
 end;
end;
drop i j;
run;

Xia Keshan

PROC Star
Posts: 7,492

Re: Race looping

If you don't need to keep race1 to race5 in the file you create, then you could do what you want/need with something like:

proc format;

  invalue $race

    'a'           = 'asian'

    'b'           = 'black'

    'w'           = 'white'

    'h'           = 'hispanic'

    'missing',' ' = ' '

    ;

run;

data have(keep=patient race);

  infile datalines truncover dlm=',';

  informat patient $9.;

  informat race1-race5 $race7.;

  array races(5) $ race1-race5;

  input patient race1-race5;

  call missing(PrimaryRace);

  do i=1 to 5;

    if not missing(races(i)) then do;

      race=races(i);

      output;

    end;

  end;

  datalines;

2005blwek,missing,a,a,a,a,asian

2012rhiol,a,missing,b,missing,a,mixed

1989lliok,missing,b,missing,missing,w,mixed

2000ijlli,w,w,w,w,w,white

;

run;

proc sql;

  create table want as

    select distinct patient,

      case count(distinct race)

        when 1 then race

        else 'Mixed'

        end as PrimaryRace

          from have

            group by patient

  ;

quit;

Respected Advisor
Posts: 3,156

Re: Race looping

If 'missing' is blank, then here is a another options with long, nested functions:

data have;

     input (patient race1 race2 race3 race4 race5) ($);

     datalines;

2005blwek . a a a a

2012rhiol a . b . a

1989lliok . b . . w

2000ijlli w w w w w

ariagakle . . . . .

;

data want;

     set have;

     length primary_race $ 10;

     primary_race=ifc(lengthn(compress(cats(of raceSmiley Happy,first(cats(of raceSmiley Happy)))=0, first(cats(of raceSmiley Happy), 'mixed');

run;


Haikuo

Contributor
Posts: 61

Re: Race looping

Hello-

I have something similar that I am trying to do and would like to ask for help.  I used the code below but instead of  just looking to see if primaryRace is different than r{1} (which I think this code is doing) - I was wondering how I can add additional code to look at each race1-5 and any one of the race entered is 'Asian' and another race is 'Filipino' the primaryRace would be 'Filipino' instead of 'mixed'.

Can anyone help me?  Please?

*****************************************************

data want;

set have;

length primaryRace $8;

array r{*} race:;

primaryRace = "missing";

do i = 1 to dim(r);

    if r{i} ne "missing" then

        if primaryRace = "missing" then primaryRace = r{i};

        else if primaryRace ne r{i} then do;

            primaryRace = "mixed";

            leave;

            end;

    end;

drop i;

run;

********************************************

Super User
Posts: 11,343

Re: Race looping

Posted in reply to Suzanne_Ed

For this forum we prefer that you start a new thread and then cite a similar previous thread such as this one.

It would be helpful to provide a few example records with examples of different values, how "missing" may be indicated, whether the values a text or numeric codes corresponding to a text value.

The basic rules to me sound like they may be:

If there are no valid race values entered then PrimaryRace is assigned the value of "missing"

If there is only one "valid" (non-missing) race entery then that value is assigned to the PrimaryRace.

If there are EXACTLY two valid race values, one of Asian and one of Filipino then PrimaryRace is assigned as Filipino

Any other combination of 2 or more valid race values assigns the value of "mixed" to PrimaryRace.

Does this summarize your problem correctly?

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 521 views
  • 3 likes
  • 8 in conversation