The data set is real life file with just the column names modified for simplicity.
Yesterday or earlier I discovered I can run my job using the array dimension 1:25 and do i=1 to 25 without any modification to my data structure, it will run even though the real variable is not consecutive and will produce all those columns, when the subgroups have no value their output columns will have all missing values. All I need to do is in the keep, retain line, I list only those columns that have a value on subgroups and where they should have data (some missing is considered to have data).
Previous my program won't run because I forget to define the output columns to be character ones.
DATA want_wide;
SET have_long;
BY sch_id;
KEEP sch_id
EM_ELA_status1 EM_ELA_status4-EM_ELA_status9 EM_ELA_status13-EM_ELA_status14 EM_ELA_status25
EM_Math_status1 EM_Math_status4-EM_Math_status9 EM_Math_status13-EM_Math_status14 EM_Math_status25
HS_ELA_status1 HS_ELA_status4-HS_ELA_status9 HS_ELA_status13-HS_ELA_status14 HS_ELA_status25
HS_Math_status1 HS_Math_status4-HS_Math_status9 HS_Math_status13-HS_Math_status14 HS_Math_status25;
RETAIN sch_id
EM_ELA_status1 EM_ELA_status4-EM_ELA_status9 EM_ELA_status13-EM_ELA_status14 EM_ELA_status25
EM_Math_status1 EM_Math_status4-EM_Math_status9 EM_Math_status13-EM_Math_status14 EM_Math_status25
HS_ELA_status1 HS_ELA_status4-HS_ELA_status9 HS_ELA_status13-HS_ELA_status14 HS_ELA_status25
HS_Math_status1 HS_Math_status4-HS_Math_status9 HS_Math_status13-HS_Math_status14 HS_Math_status25;
ARRAY emela (1:25) $ EM_ELA_status1 - EM_ELA_status25 ;
ARRAY emmath (1:25) $ EM_Math_status1 - EM_Math_status25 ;
ARRAY hsela (1:25) $ HS_ELA_status1 - HS_ELA_status25 ;
ARRAY hsmath (1:25) $ HS_Math_status1 - HS_Math_status25 ;
IF first.sch_id THEN DO;
DO i = 1 to 25 ;
emela( i ) = "" ;
emmath (i) = "";
hsela( i ) = "" ;
hsmath (i) = "";
END;
END;
emela (groupid) = EM_ELA_status;
emmath (groupid) = EM_Math_status;
hsela (groupid) = HS_ELA_status;
hsmath (groupid) = HS_Math_status;
IF last.sch_id THEN OUTPUT ;
RUN; *WORKS WELL;
*advantages: have less codes and programs than using transpose approach (code);
*shortcoming: output columns name have subgroup's numeric values instead of subgroup names-this is required by business convention, therefore I have to do further work to modify those 40 column names ending with number (1,4,5,6,7,8,9,13,14,25) into group names such as "All_students, SWD, Native_Amer, etc. after the main part of EM_ELA_status;
Make your data taller (longer) first
proc transpose data=have_long out=long_long ;
by sch_id groupid;
var EM_ELA_status EM_Math_status HS_ELA_status HS_Math_status;
run;
and then run PROC TRANSPOSE.
proc transpose data=long_long out=want_wide(drop=_name_) ;
by sch_id;
var col1;
id _name_ groupid;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.