I have a table with a structure like:
select cohort, number_Of_students, college, college_desc, gender, gender_desc,
residency, residency_desc, schev_ethnicity_desc, schev_ethnicity_id,
academic_origin, academic_origin_id,
alive_1, ret_1, deg_cum_1,
alive_2, ret_2, deg_cum_2,
alive_3, ret_3, deg_cum_3,
…
retention_type
from Graduation_Retention_Transfer
That I would like to transpose to:
select cohort, number_Of_students, college, college_desc, gender, gender_desc,
residency, residency_desc, schev_ethnicity_desc, schev_ethnicity_id,
academic_origin, academic_origin_id, year, alive, ret, deg_cum
Any recommendations on how I can do this in enterprise guide using SAS 9.4?
A few more questions about the input data.
Your example data shows RET not Alive as not always with Deg_cum. Is that the actual case?
Do you ever have records with more than Alive_5 and Deg_cum_5?
Are any of the following scenarios NEVER going to occur in your data? Or if they do what would the output look like
DATA inputSet;
INFILE DATALINES DLM='#';
INPUT cohort number_Of_students college $ alive_1 ret_1 deg_cum_1
alive_2 ret_2 deg_cum_2
alive_3 ret_3 deg_cum_3
alive_4 deg_cum_4
alive_5 deg_cum_5;
DATALINES;
200510#12#Humanities#12#12#0#12#11#0#12#10#0#11#1#11#8
200511#12#Humanities#12#12#0#12#11#0#12#10#0#11#1##
200512#12#Humanities#12#12#0#12#11#0#######
200513#12#Humanities#12#12#0#12#11#0####11#1#11#8
200514#12#Humanities#12#12#0#12#11#0#12#10#0#11#1#11#8
200515#12#Humanities#12#12#0####12#10#0#11#1#11#8
200516#12#Humanities#12#12#0####12##0#11#1#11#8
;
If there are always 5 and RET is only defined for 3 then this might work:
Data trans;
set have;
Array a alive_1 - alive_5;
array r ret_1 - ret_3;
array d deg_cum_1 - deg_cum_5;
/* assumes same number of variables of each type*/
do i = 1 to dim(a);
alive= a[i];
if i le 3 then ret = r[i]; else ret =0;
deg_cum= d[i];
output;
end;
drop alive_: ret_; deg_cum_: i;
run;
Code would look something like this
Data trans;
set have;
Array a alive_1 - alive_n;
array r ret_1 - ret_n;
array d deg_cum_1 - deg_cum_n;
/* assumes same number of variables of each type*/
do i = 1 to dim(alive);
alive= a[i];
ret = r[i];
deg_cum= d[i];
output;
end;
drop alive_: ret_; deg_cum_: i;
run;
I got the error
ERROR: Alphabetic prefixes for enumerated variables (alive_1-alive_n) are different.
This might be because alive only goes to 3 and the others go to 8.
Note that I did point out that it assumed the same number of each of the 3 variables as that makes the most sense to transpose.
When they are not the same, what is the relationship between Alive , ret and deg_cum? Since at least 5 values of ret and deg_cum do not have a one-to-one with alive, which value of alive is associated with which ret and deg_cum? I hope there is a pattern of some sort. Possibly that the the 3rd Alive goes with the 4 through 8 ret and deg_cum.
Show a few rows of in the input and what you would like the output to look like.
Below is sample data
DATA inputSet;
INFILE DATALINES DLM='#';
INPUT cohort number_Of_students college $ alive_1 ret_1 deg_cum_1
alive_2 ret_2 deg_cum_2
alive_3 ret_3 deg_cum_3
alive_4 deg_cum_4
alive_5 deg_cum_5;
DATALINES;
200510#12#Humanities#12#12#0#12#11#0#12#10#0#11#1#11#8
;
proc print data=inputSet;
Title 'input';
run;
DATA resultSet;
INFILE DATALINES DLM='#';
INPUT cohort number_Of_students college $ year1 alive ret deg_cum;
DATALINES;
200510#12#Humanities#2005#12#12#0
200510#12#Humanities#2006#12#11#0
200510#12#Humanities#2007#12#10#0
200510#12#Humanities#2008#11#0#1
200510#12#Humanities#2009#11#0#8
;
PROC PRINT DATA=resultSet;
TITLE 'result';
RUN;
I mixed up that whole alive thing.. being alive is important.
A few more questions about the input data.
Your example data shows RET not Alive as not always with Deg_cum. Is that the actual case?
Do you ever have records with more than Alive_5 and Deg_cum_5?
Are any of the following scenarios NEVER going to occur in your data? Or if they do what would the output look like
DATA inputSet;
INFILE DATALINES DLM='#';
INPUT cohort number_Of_students college $ alive_1 ret_1 deg_cum_1
alive_2 ret_2 deg_cum_2
alive_3 ret_3 deg_cum_3
alive_4 deg_cum_4
alive_5 deg_cum_5;
DATALINES;
200510#12#Humanities#12#12#0#12#11#0#12#10#0#11#1#11#8
200511#12#Humanities#12#12#0#12#11#0#12#10#0#11#1##
200512#12#Humanities#12#12#0#12#11#0#######
200513#12#Humanities#12#12#0#12#11#0####11#1#11#8
200514#12#Humanities#12#12#0#12#11#0#12#10#0#11#1#11#8
200515#12#Humanities#12#12#0####12#10#0#11#1#11#8
200516#12#Humanities#12#12#0####12##0#11#1#11#8
;
If there are always 5 and RET is only defined for 3 then this might work:
Data trans;
set have;
Array a alive_1 - alive_5;
array r ret_1 - ret_3;
array d deg_cum_1 - deg_cum_5;
/* assumes same number of variables of each type*/
do i = 1 to dim(a);
alive= a[i];
if i le 3 then ret = r[i]; else ret =0;
deg_cum= d[i];
output;
end;
drop alive_: ret_; deg_cum_: i;
run;
Ret always ends at 3 and the other two variables always keep going to the same number as all other rows.
When I run the example I get:
ERROR: The DIM, LBOUND, and HBOUND functions require an array name for the first argument.
Change dim(alive) to dim(a).
Ballard, that worked thanks for your help on this. I was having a really hard time with it.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.