Solved: Re: create new columns based on array values

twenty7 · Posted 01-17-2024 01:04 PM

Hi all,

I have a dataset which resembles the table below

id	cd_1	cd_2	cd_3	cd_4	cd_5	cd_6
1234	AB01	AX01	AZ01
1236	AY01	AB01
1239	AF01	AG01	XZ01	AA01	AX01

I want my code to add additional columns based on the values above so it looks something like the below

id	AB01	AX01	AZ01	AY01	FB01	AF01	AG01	XZ01	AA01
1234	1	1	1	0	0	0	0	0	0
1236	1	0	0	1	0	0	0	0	0
1239	0	1	0	0	0	1	1	1	1

I have created an array from columns cd_1-cd_99 but have no idea on which functions I would need to use to determine the value in the array position and then create a new column where the name is the array value and the contents is either 1 or 0

I want to avoid listing out all of the possible values for cd_1-cd_99 as there are around 30 but these will occasionally change

Any help would be greatly appreciated

Thank you

Ksharp · Posted 01-17-2024 09:08 PM

data have;
  infile datalines truncover dsd dlm=' ';
  input (id cd_1 cd_2 cd_3 cd_4 cd_5 cd_6) ($);
  datalines;
1234 AB01 AX01 AZ01   
1236 AY01 AB01    
1239 AF01 AG01 XZ01 AA01 AX01 
;

proc sort data=have out=temp;
by id;
run;
proc transpose data=temp out=temp2;
by id;
var cd_:;
run;
data temp3;
 set temp2(where=(col1 is not missing));
 v=1;
run;
proc transpose data=temp3 out=temp4;
by id;
var v;
id col1;
run;
proc stdize data=temp4(drop=_NAME_) out=want reponly missing=0;
run;

View solution in original post

PaigeMiller · Posted 01-17-2024 02:26 PM

A very similar question was asked here. I am skeptical that this re-arrangement actually improves anything. Can you please tell us what you plan to do next with this data? Most of the time (like almost all of the time), this wide format that you want is harder to work in SAS with than the long format I propose. So really, do tell us what is next.

--
Paige Miller

twenty7 · Posted 01-17-2024 04:08 PM

whilst I do accept that the wider format is more difficult to work with it is needed in this instance as the data is to be exported and shared with non-technical users who need to be able to easily identify which records have a particular code

ballardw · Posted 01-17-2024 06:34 PM

Did you look at the link to the other EXTREMELY similar post that @PaigeMiller posted?

Which is so similar it appears as if this is related to a homework assignment.

twenty7 · Posted 01-18-2024 04:16 AM

I did, however my understanding is that this solution relies on knowing all of the potential values

I ca also confirm this is no homework assignment!

PaigeMiller · Posted 01-18-2024 07:11 AM

@twenty7 wrote:
I did, however my understanding is that this solution relies on knowing all of the potential values

Not knowing all the potential values REALLY REALLY REALLY implies you want a long data set, and then SAS can determine what values exist and act accordingly. But you also said:

@twenty7 wrote:
it is needed in this instance as the data is to be exported and shared with non-technical users who need to be able to easily identify which records have a particular code

If non-technical users need the wide data set, then get a solution from the long data set, and then "transpose" to wide for these non-technical users.

I see @Ksharp has already done that.

--
Paige Miller

Patrick · Posted 01-17-2024 06:45 PM

Below one way to go. Ideally your codes comply with SAS naming conventions for SAS variables.

/* source data */
data have;
  infile datalines truncover dsd dlm=' ';
  input (id cd_1 cd_2 cd_3 cd_4 cd_5 cd_6) ($);
  some_other_var='ABC';
  datalines;
1234 AB01 AX01 AZ01   
1236 AY01 AB01    
1239 AF01 AG01 XZ01 AA01 AX01 
1999 $a#4
;

/* create table with distinct codes from all source variables */
data _null_;
  if _n_=1 then
    do;
      length varname $32;
      dcl hash h1(ordered:'y');
      h1.defineKey('varname');
      h1.defineData('varname');
      h1.defineDone();
    end;
  set have end=last;
  array vars{*} cd_:;
  do i=1 to dim(vars);
    varname=upcase(vars[i]);
    if not missing(varname) then h1.ref();
  end;
  if last then h1.output(dataset:'varnames');
run;

/* prepare data for 
   - creation of macro variable with desired code based variable names
   - informat that maps variable names to position (number) in array defined using above macro variable
*/
data varnames;
  set varnames;
  retain fmtname 'varpos' type 'i';
  start=varname;
  label=_n_;
run;

/* populate macro variable with distinct list of existing codes */
proc sql noprint;
  select cats("'",varname,"'n") into :varlist separated by ' '
  from varnames
  order by label
  ;
quit;

/* create informat to retrieve position (number) of word in variable list */
proc format cntlin=varnames;
run;

/* create want table */
data want;
  set have;
  array src_vars{*} cd_:;
  array trg_vars{*} 3 &varlist;
  do i=1 to dim(trg_vars);
    trg_vars[i]=0;
  end;
  do i=1 to dim(src_vars);
    if missing(src_vars[i]) then continue;
    trg_vars[input(upcase(src_vars[i]),varpos.)] = not missing(src_vars[i]);
  end;
  drop cd_: i;
run;

proc print data=want;
run;

Ksharp · Posted 01-17-2024 09:08 PM

data have;
  infile datalines truncover dsd dlm=' ';
  input (id cd_1 cd_2 cd_3 cd_4 cd_5 cd_6) ($);
  datalines;
1234 AB01 AX01 AZ01   
1236 AY01 AB01    
1239 AF01 AG01 XZ01 AA01 AX01 
;

proc sort data=have out=temp;
by id;
run;
proc transpose data=temp out=temp2;
by id;
var cd_:;
run;
data temp3;
 set temp2(where=(col1 is not missing));
 v=1;
run;
proc transpose data=temp3 out=temp4;
by id;
var v;
id col1;
run;
proc stdize data=temp4(drop=_NAME_) out=want reponly missing=0;
run;

Registration is open

SAS Training: Just a Click Away