Hi all,
I have a dataset which resembles the table below
id | cd_1 | cd_2 | cd_3 | cd_4 | cd_5 | cd_6 |
1234 | AB01 | AX01 | AZ01 | |||
1236 | AY01 | AB01 | ||||
1239 | AF01 | AG01 | XZ01 | AA01 | AX01 |
I want my code to add additional columns based on the values above so it looks something like the below
id | AB01 | AX01 | AZ01 | AY01 | FB01 | AF01 | AG01 | XZ01 | AA01 |
1234 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
1236 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
1239 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
I have created an array from columns cd_1-cd_99 but have no idea on which functions I would need to use to determine the value in the array position and then create a new column where the name is the array value and the contents is either 1 or 0
I want to avoid listing out all of the possible values for cd_1-cd_99 as there are around 30 but these will occasionally change
Any help would be greatly appreciated
Thank you
data have;
infile datalines truncover dsd dlm=' ';
input (id cd_1 cd_2 cd_3 cd_4 cd_5 cd_6) ($);
datalines;
1234 AB01 AX01 AZ01
1236 AY01 AB01
1239 AF01 AG01 XZ01 AA01 AX01
;
proc sort data=have out=temp;
by id;
run;
proc transpose data=temp out=temp2;
by id;
var cd_:;
run;
data temp3;
set temp2(where=(col1 is not missing));
v=1;
run;
proc transpose data=temp3 out=temp4;
by id;
var v;
id col1;
run;
proc stdize data=temp4(drop=_NAME_) out=want reponly missing=0;
run;
A very similar question was asked here. I am skeptical that this re-arrangement actually improves anything. Can you please tell us what you plan to do next with this data? Most of the time (like almost all of the time), this wide format that you want is harder to work in SAS with than the long format I propose. So really, do tell us what is next.
whilst I do accept that the wider format is more difficult to work with it is needed in this instance as the data is to be exported and shared with non-technical users who need to be able to easily identify which records have a particular code
Did you look at the link to the other EXTREMELY similar post that @PaigeMiller posted?
Which is so similar it appears as if this is related to a homework assignment.
@twenty7 wrote:
I did, however my understanding is that this solution relies on knowing all of the potential values
Not knowing all the potential values REALLY REALLY REALLY implies you want a long data set, and then SAS can determine what values exist and act accordingly. But you also said:
@twenty7 wrote:
it is needed in this instance as the data is to be exported and shared with non-technical users who need to be able to easily identify which records have a particular code
If non-technical users need the wide data set, then get a solution from the long data set, and then "transpose" to wide for these non-technical users.
I see @Ksharp has already done that.
Below one way to go. Ideally your codes comply with SAS naming conventions for SAS variables.
/* source data */
data have;
infile datalines truncover dsd dlm=' ';
input (id cd_1 cd_2 cd_3 cd_4 cd_5 cd_6) ($);
some_other_var='ABC';
datalines;
1234 AB01 AX01 AZ01
1236 AY01 AB01
1239 AF01 AG01 XZ01 AA01 AX01
1999 $a#4
;
/* create table with distinct codes from all source variables */
data _null_;
if _n_=1 then
do;
length varname $32;
dcl hash h1(ordered:'y');
h1.defineKey('varname');
h1.defineData('varname');
h1.defineDone();
end;
set have end=last;
array vars{*} cd_:;
do i=1 to dim(vars);
varname=upcase(vars[i]);
if not missing(varname) then h1.ref();
end;
if last then h1.output(dataset:'varnames');
run;
/* prepare data for
- creation of macro variable with desired code based variable names
- informat that maps variable names to position (number) in array defined using above macro variable
*/
data varnames;
set varnames;
retain fmtname 'varpos' type 'i';
start=varname;
label=_n_;
run;
/* populate macro variable with distinct list of existing codes */
proc sql noprint;
select cats("'",varname,"'n") into :varlist separated by ' '
from varnames
order by label
;
quit;
/* create informat to retrieve position (number) of word in variable list */
proc format cntlin=varnames;
run;
/* create want table */
data want;
set have;
array src_vars{*} cd_:;
array trg_vars{*} 3 &varlist;
do i=1 to dim(trg_vars);
trg_vars[i]=0;
end;
do i=1 to dim(src_vars);
if missing(src_vars[i]) then continue;
trg_vars[input(upcase(src_vars[i]),varpos.)] = not missing(src_vars[i]);
end;
drop cd_: i;
run;
proc print data=want;
run;
data have;
infile datalines truncover dsd dlm=' ';
input (id cd_1 cd_2 cd_3 cd_4 cd_5 cd_6) ($);
datalines;
1234 AB01 AX01 AZ01
1236 AY01 AB01
1239 AF01 AG01 XZ01 AA01 AX01
;
proc sort data=have out=temp;
by id;
run;
proc transpose data=temp out=temp2;
by id;
var cd_:;
run;
data temp3;
set temp2(where=(col1 is not missing));
v=1;
run;
proc transpose data=temp3 out=temp4;
by id;
var v;
id col1;
run;
proc stdize data=temp4(drop=_NAME_) out=want reponly missing=0;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.