BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BaileyY
Obsidian | Level 7

Hello

I am new to SAS and someone recommended I try asking for help on SAS forum.  I am trying to assign a primary race to patients.  The form allows for up to 5 different races to be provided.  I am having difficulty trying to write SAS code to look at each race1 to race5 to determine a primary race.

The different race categories are common and the complexity is correctly identifying mixed and missing/not reported.  I have a table of five patients with different reported races in column race1-race5 and trying to write code to produce results under PrimaryRace column.  Can anyone help me?   Thank you

patientrace1race2race3race4race5PrimaryRace
2005blwekmissingaaaaasian
2012rhiolamissingbmissingamixed
1989lliokmissingbmissingmissingwmixed
2000ijlliwwwwwwhite
1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Something like this, using an array :

data have;

input (patient race1 race2 race3 race4 race5) ($);

datalines;

2005blwek missing a a a a asian

2012rhiol a missing b missing a mixed

1989lliok missing b missing missing w mixed

2000ijlli w w w w w white

;

data want;

set have;

length primaryRace $8;

array r{*} race:;

primaryRace = "missing";

do i = 1 to dim(r);

    if r{i} ne "missing" then

        if primaryRace = "missing" then primaryRace = r{i};

        else if primaryRace ne r{i} then do;

            primaryRace = "mixed";

            leave;

            end;

    end;

drop i;

run;

proc print data=want noobs; run;

PG

Message was edited by: PG Added provision for the case where race1 - race5 is all missing.

PG

View solution in original post

11 REPLIES 11
PGStats
Opal | Level 21

Something like this, using an array :

data have;

input (patient race1 race2 race3 race4 race5) ($);

datalines;

2005blwek missing a a a a asian

2012rhiol a missing b missing a mixed

1989lliok missing b missing missing w mixed

2000ijlli w w w w w white

;

data want;

set have;

length primaryRace $8;

array r{*} race:;

primaryRace = "missing";

do i = 1 to dim(r);

    if r{i} ne "missing" then

        if primaryRace = "missing" then primaryRace = r{i};

        else if primaryRace ne r{i} then do;

            primaryRace = "mixed";

            leave;

            end;

    end;

drop i;

run;

proc print data=want noobs; run;

PG

Message was edited by: PG Added provision for the case where race1 - race5 is all missing.

PG
BaileyY
Obsidian | Level 7

PG

Your suggestion worked perfectly!!  Thank you!!  I looked up arrays and found it fabulous that arrays repeats actions for consistent variable.  I wrote a data step to look at every single race column and by the time I reached the fourth race, it become so confusing that I lost sight of what I was trying to do.  Thank you very much.  I really want to learn and try to understand how the code really works- can you help me understand how that do loop worked and what that "dim" does?  Thank you again!

PGStats
Opal | Level 21

Yes, of course, I hope these comments help:

data want;

set have;

length primaryRace $8;

/* Define array r containing all variables with names starting with 'race' , i.e. race1 ... race5*/

/* From this point on, saying r{1} is the same as saying race1, etc. */

array r{*} race:;

/* Initialize primaryRace to the value it should have if all races are missing */

primaryRace = "missing";

/* Loop for i = 1 to the size of array r, i.e. from 1 to 5 */

do i = 1 to dim(r);

    /* Only if racei is not missing... */

    if r{i} ne "missing" then

        /* If primaryRace is not set yet then make it racei */

        if primaryRace = "missing" then primaryRace = r{i};

        /* If primaryRace already has a value and it is not the same

           as racei then switch primaryRace to "mixed" and leave the loop */

        else if primaryRace ne r{i} then do;

            primaryRace = "mixed";

            leave;

            end;

    end;

drop i;

run;


PG

PG
BaileyY
Obsidian | Level 7

Thank you PG.   Appreciate your help.  Can I ask for your advice on possibly changing the temporary work folder on my laptop.  I have my data on my external drive and running SAS on my company laptop.  Appears the temp library is in the C drive of my computer, however, its only 500gb and I keep deleting the older temp work datasets that i no longer need.  I dont have permissions on my laptop to make any changes to the configuration.... is there any other way to re-direct the 'work' library to my external? 

PGStats
Opal | Level 21

Well, I had to do a bit of testing... Creating a desktop shortcut like

"C:\Program Files\SASHome9_4\SASFoundation\9.4\sas.exe" -work "F:\temp"

works on my machine to relocate the WORK directory. SAS command line options have priority over any configuration file settings.

Try it!

PG

PG
Patrick
Opal | Level 21

Another way of getting there. Pierre's code is more appropriate for a beginner though.

data have;

  infile datalines truncover dlm=',';

  input (patient race1 race2 race3 race4 race5 PrimaryRace) (:$9.);

  datalines;

2005blwek,missing,a,a,a,a,asian

2012rhiol,a,missing,b,missing,a,mixed

1989lliok,missing,b,missing,missing,w,mixed

2000ijlli,w,w,w,w,w,white

;

run;

proc format;

  value $race

    'a'           = 'asian'

    'b'           = 'somthing else'

    'w'           = 'white'

    'missing',' ' = 'missing'

    ;

run;

data want(drop=_:) ;

  if _n_=1 then

    do;

      length _race $9.;

      declare hash h1();

      _rc=h1.defineKey('_race');

      _rc=h1.defineDone();

    end;

  set have;

  _race='';

  array races {*} race1 - race5;

  do _i=1 to dim(races);

    if races[_i] ne 'missing' then

      do;

        _race=races[_i];

        _rc=h1.add();

      end;

  end;

  _nraces=h1.num_items;

  _rc=h1.clear();

  if _nraces<2 then derived_PrimaryRace=put(_race,$race.);

  else derived_PrimaryRace='mixed';

run;

Ksharp
Super User

Like Patrick's, an array version code.


data have;
input (patient race1 race2 race3 race4 race5) ($);
datalines;
2005blwek . a a a a 
2012rhiol a . b . a 
1989lliok . b . . w 
2000ijlli w w w w w 
;
data want;
set have;
length primaryRace $8;
array r{*} race:;
primaryRace =coalescec(of race:);
do i = 1 to dim(r)-1;
 do j=i+1 to dim(r);
    if not missing(r{i}) and not missing(r{j}) and r{i} ne r{j} then do;
     primaryRace = "mixed";
            leave;
    end;
 end;
end;
drop i j;
run;

Xia Keshan

art297
Opal | Level 21

If you don't need to keep race1 to race5 in the file you create, then you could do what you want/need with something like:

proc format;

  invalue $race

    'a'           = 'asian'

    'b'           = 'black'

    'w'           = 'white'

    'h'           = 'hispanic'

    'missing',' ' = ' '

    ;

run;

data have(keep=patient race);

  infile datalines truncover dlm=',';

  informat patient $9.;

  informat race1-race5 $race7.;

  array races(5) $ race1-race5;

  input patient race1-race5;

  call missing(PrimaryRace);

  do i=1 to 5;

    if not missing(races(i)) then do;

      race=races(i);

      output;

    end;

  end;

  datalines;

2005blwek,missing,a,a,a,a,asian

2012rhiol,a,missing,b,missing,a,mixed

1989lliok,missing,b,missing,missing,w,mixed

2000ijlli,w,w,w,w,w,white

;

run;

proc sql;

  create table want as

    select distinct patient,

      case count(distinct race)

        when 1 then race

        else 'Mixed'

        end as PrimaryRace

          from have

            group by patient

  ;

quit;

Haikuo
Onyx | Level 15

If 'missing' is blank, then here is a another options with long, nested functions:

data have;

     input (patient race1 race2 race3 race4 race5) ($);

     datalines;

2005blwek . a a a a

2012rhiol a . b . a

1989lliok . b . . w

2000ijlli w w w w w

ariagakle . . . . .

;

data want;

     set have;

     length primary_race $ 10;

     primary_race=ifc(lengthn(compress(cats(of race:),first(cats(of race:))))=0, first(cats(of race:)), 'mixed');

run;


Haikuo

SannaSanna
Quartz | Level 8

Hello-

I have something similar that I am trying to do and would like to ask for help.  I used the code below but instead of  just looking to see if primaryRace is different than r{1} (which I think this code is doing) - I was wondering how I can add additional code to look at each race1-5 and any one of the race entered is 'Asian' and another race is 'Filipino' the primaryRace would be 'Filipino' instead of 'mixed'.

Can anyone help me?  Please?

*****************************************************

data want;

set have;

length primaryRace $8;

array r{*} race:;

primaryRace = "missing";

do i = 1 to dim(r);

    if r{i} ne "missing" then

        if primaryRace = "missing" then primaryRace = r{i};

        else if primaryRace ne r{i} then do;

            primaryRace = "mixed";

            leave;

            end;

    end;

drop i;

run;

********************************************

ballardw
Super User

For this forum we prefer that you start a new thread and then cite a similar previous thread such as this one.

It would be helpful to provide a few example records with examples of different values, how "missing" may be indicated, whether the values a text or numeric codes corresponding to a text value.

The basic rules to me sound like they may be:

If there are no valid race values entered then PrimaryRace is assigned the value of "missing"

If there is only one "valid" (non-missing) race entery then that value is assigned to the PrimaryRace.

If there are EXACTLY two valid race values, one of Asian and one of Filipino then PrimaryRace is assigned as Filipino

Any other combination of 2 or more valid race values assigns the value of "mixed" to PrimaryRace.

Does this summarize your problem correctly?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1882 views
  • 3 likes
  • 8 in conversation