I have a table with a structure like:
select cohort, number_Of_students, college, college_desc, gender, gender_desc,
residency, residency_desc, schev_ethnicity_desc, schev_ethnicity_id,
academic_origin, academic_origin_id,
alive_1, ret_1, deg_cum_1,
alive_2, ret_2, deg_cum_2,
alive_3, ret_3, deg_cum_3,
…
retention_type
from Graduation_Retention_Transfer
That I would like to transpose to:
select cohort, number_Of_students, college, college_desc, gender, gender_desc,
residency, residency_desc, schev_ethnicity_desc, schev_ethnicity_id,
academic_origin, academic_origin_id, year, alive, ret, deg_cum
Any recommendations on how I can do this in enterprise guide using SAS 9.4?
A few more questions about the input data.
Your example data shows RET not Alive as not always with Deg_cum. Is that the actual case?
Do you ever have records with more than Alive_5 and Deg_cum_5?
Are any of the following scenarios NEVER going to occur in your data? Or if they do what would the output look like
DATA inputSet;
INFILE DATALINES DLM='#';
INPUT cohort number_Of_students college $ alive_1 ret_1 deg_cum_1
alive_2 ret_2 deg_cum_2
alive_3 ret_3 deg_cum_3
alive_4 deg_cum_4
alive_5 deg_cum_5;
DATALINES;
200510#12#Humanities#12#12#0#12#11#0#12#10#0#11#1#11#8
200511#12#Humanities#12#12#0#12#11#0#12#10#0#11#1##
200512#12#Humanities#12#12#0#12#11#0#######
200513#12#Humanities#12#12#0#12#11#0####11#1#11#8
200514#12#Humanities#12#12#0#12#11#0#12#10#0#11#1#11#8
200515#12#Humanities#12#12#0####12#10#0#11#1#11#8
200516#12#Humanities#12#12#0####12##0#11#1#11#8
;
If there are always 5 and RET is only defined for 3 then this might work:
Data trans;
set have;
Array a alive_1 - alive_5;
array r ret_1 - ret_3;
array d deg_cum_1 - deg_cum_5;
/* assumes same number of variables of each type*/
do i = 1 to dim(a);
alive= a[i];
if i le 3 then ret = r[i]; else ret =0;
deg_cum= d[i];
output;
end;
drop alive_: ret_; deg_cum_: i;
run;
Code would look something like this
Data trans;
set have;
Array a alive_1 - alive_n;
array r ret_1 - ret_n;
array d deg_cum_1 - deg_cum_n;
/* assumes same number of variables of each type*/
do i = 1 to dim(alive);
alive= a[i];
ret = r[i];
deg_cum= d[i];
output;
end;
drop alive_: ret_; deg_cum_: i;
run;
I got the error
ERROR: Alphabetic prefixes for enumerated variables (alive_1-alive_n) are different.
This might be because alive only goes to 3 and the others go to 8.
Note that I did point out that it assumed the same number of each of the 3 variables as that makes the most sense to transpose.
When they are not the same, what is the relationship between Alive , ret and deg_cum? Since at least 5 values of ret and deg_cum do not have a one-to-one with alive, which value of alive is associated with which ret and deg_cum? I hope there is a pattern of some sort. Possibly that the the 3rd Alive goes with the 4 through 8 ret and deg_cum.
Show a few rows of in the input and what you would like the output to look like.
Below is sample data
DATA inputSet;
INFILE DATALINES DLM='#';
INPUT cohort number_Of_students college $ alive_1 ret_1 deg_cum_1
alive_2 ret_2 deg_cum_2
alive_3 ret_3 deg_cum_3
alive_4 deg_cum_4
alive_5 deg_cum_5;
DATALINES;
200510#12#Humanities#12#12#0#12#11#0#12#10#0#11#1#11#8
;
proc print data=inputSet;
Title 'input';
run;
DATA resultSet;
INFILE DATALINES DLM='#';
INPUT cohort number_Of_students college $ year1 alive ret deg_cum;
DATALINES;
200510#12#Humanities#2005#12#12#0
200510#12#Humanities#2006#12#11#0
200510#12#Humanities#2007#12#10#0
200510#12#Humanities#2008#11#0#1
200510#12#Humanities#2009#11#0#8
;
PROC PRINT DATA=resultSet;
TITLE 'result';
RUN;
I mixed up that whole alive thing.. being alive is important.
A few more questions about the input data.
Your example data shows RET not Alive as not always with Deg_cum. Is that the actual case?
Do you ever have records with more than Alive_5 and Deg_cum_5?
Are any of the following scenarios NEVER going to occur in your data? Or if they do what would the output look like
DATA inputSet;
INFILE DATALINES DLM='#';
INPUT cohort number_Of_students college $ alive_1 ret_1 deg_cum_1
alive_2 ret_2 deg_cum_2
alive_3 ret_3 deg_cum_3
alive_4 deg_cum_4
alive_5 deg_cum_5;
DATALINES;
200510#12#Humanities#12#12#0#12#11#0#12#10#0#11#1#11#8
200511#12#Humanities#12#12#0#12#11#0#12#10#0#11#1##
200512#12#Humanities#12#12#0#12#11#0#######
200513#12#Humanities#12#12#0#12#11#0####11#1#11#8
200514#12#Humanities#12#12#0#12#11#0#12#10#0#11#1#11#8
200515#12#Humanities#12#12#0####12#10#0#11#1#11#8
200516#12#Humanities#12#12#0####12##0#11#1#11#8
;
If there are always 5 and RET is only defined for 3 then this might work:
Data trans;
set have;
Array a alive_1 - alive_5;
array r ret_1 - ret_3;
array d deg_cum_1 - deg_cum_5;
/* assumes same number of variables of each type*/
do i = 1 to dim(a);
alive= a[i];
if i le 3 then ret = r[i]; else ret =0;
deg_cum= d[i];
output;
end;
drop alive_: ret_; deg_cum_: i;
run;
Ret always ends at 3 and the other two variables always keep going to the same number as all other rows.
When I run the example I get:
ERROR: The DIM, LBOUND, and HBOUND functions require an array name for the first argument.
Change dim(alive) to dim(a).
Ballard, that worked thanks for your help on this. I was having a really hard time with it.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.