Hi folks,
I would greatly appreciate help with the following task...
I have two longitudinal datasets containing data from one school district over the years 2002-2010. One of the datasets contains student-level data for approximately 130,000 students (each with a unique ID). The other dataset contains school-level data for approximately 600 schools (each with a unique school code). I would like to combine the two datasets.
What I HAVE:
Data = StdData (see Table 1)
The dataset with student-level data includes columns identifying which schools a given student attended throughout the years. For example, student 100145 attended school 8543 from 2002 – 2007, and school 2455 from 2008 – 2010.
Data = SchData (see Table 2)
The dataset with school-level data includes 7 variables per year (2002-2010), for a total of 63 variables:
PctFRlunch_02
AvgParEd_02
PctHisp_02
PctTotELs_02
PctELsSpan_02
PctELsPLS_02
PctELsL1_02
PctFRlunch_03
AvgParEd_03
PctHisp_03
PctTotELs_03
PctELsSpan_03
PctELsPLS_03
PctELsL1_03
(Etc. for each year through 2010, for a total of 63 variables)
What I WANT:
Data = want (see Table 3)
When the datasets are combined, each observation will contain data for one student, and year-specific data for the schools which that student attended each year.
---------------------------------------------------------------------
Table 1
Data = StdData
Student-level data - which school(s) each student attended each year (There are missing values when there are missing data, or when a given student is not enrolled in this school district.)
StudentID | SchCode02 | SchCode03 | SchCode04 | SchCode05 | SchCode06 | SchCode07 | SchCode08 | SchCode09 | SchCode10 |
100145 | 8543 | 8543 | 8543 | 8543 | 8543 | 8543 | 2455 | 2455 | 2455 |
100362 | 8543 | 8543 | 9652 | 3402 | 3402 | 3402 | 4656 | 4656 | . |
100595 | . | . | 3402 | 3402 | 3402 | 3402 | 6548 | 6548 | . |
100788 | 2104 | 2104 | 2104 | 2104 | 2104 | . | . | . | . |
100900 | . | . | . | . | . | . | . | 7502 | 7502 |
102044 | 3402 | 3402 | 3402 | 3402 | 3402 | 3402 | 3402 | 3402 | 3402 |
105305 | . | 4645 | 4645 | 4645 | 4645 | 4645 | 4645 | 4645 | 6548 |
108643 | . | . | . | 8543 | 8543 | 8543 | 8543 | 8543 | 8543 |
109224 | 3402 | 3402 | 2455 | 2455 | 2455 | . | . | . | . |
Etc. |
Table 2
Data = SchData
Time-varying school-level variables (There are missing values when there are missing data for a given school-year, or if a given school did not exist in a given year.)
SchCode | PctFRlunch_02 | AvgParEduc_02 | PctHisp_02 | PctTotELs_02 | PctELsSpan_02 | PctELsPLS_02 | PctELsL1_02 | PctFRlunch_03 | AvgParEduc_03 | PctHisp_03 | PctTotELs_03 | PctELsSpan_03 | PctELsPLS_03 | PctELsL1_03 | PctFRlunch_04 | AvgParEduc_04 | PctHisp_04 | PctTotELs_04 | PctELsSpan_04 | PctELsPLS_04 | PctELsL1_04 | Etc. |
2104 | 65 | 3.07 | 6.4909 | 2.2312 | 100.0000 | 9.09 | .00 | 100 | 2 | 93.1727 | 57.4297 | 96.5035 | 32.87 | 6.99 | 100 | 1.75 | 86.2191 | 57.9505 | 93.2927 | 28.66 | 28.66 | |
2455 | 91 | 2.25 | 89.3899 | 41.3793 | 97.4359 | 97.44 | .00 | 74 | 2 | 7.7821 | 2.9183 | 86.6667 | 13.33 | .00 | 77 | 2.68 | 6.5737 | 3.5857 | 77.7778 | .00 | .00 | |
3402 | 100 | 1.65 | 99.2293 | 67.2447 | 100.0000 | 61.03 | 19.77 | 92 | 2 | 88.9175 | 44.3299 | 95.3488 | 58.72 | .00 | 91 | 2.32 | 89.6825 | 45.7672 | 95.9538 | 57.23 | 57.23 | |
4645 | 51 | 3.09 | 37.9189 | 19.2240 | 76.1468 | 67.89 | .00 | 100 | 2 | 98.8428 | 63.4523 | 99.8480 | 51.98 | 19.45 | 100 | 1.63 | 98.4451 | 65.1118 | 100.0000 | 68.66 | 68.66 | |
4656 | 99 | 2.30 | 95.4897 | 68.4278 | 96.7985 | 79.47 | .00 | 51 | 3 | 39.3881 | 23.1358 | 80.1653 | 41.32 | .00 | 57 | 3.17 | 35.4970 | 22.1095 | 76.1468 | 26.61 | 26.61 | |
6548 | 100 | 1.99 | 78.9924 | 62.8327 | 99.2436 | 90.17 | .00 | 99 | 2 | 95.5966 | 67.3295 | 97.2574 | 63.71 | .00 | 98 | 2.28 | 96.6480 | 66.4804 | 97.2689 | 24.16 | 24.16 | |
7502 | 100 | 1.79 | 91.5604 | 70.6575 | 96.8056 | 90.69 | .00 | 100 | 2 | 82.3588 | 66.3033 | 99.2526 | 90.28 | .00 | 100 | 2.34 | 83.2632 | 70.5263 | 99.7015 | 8.66 | 8.66 | |
8543 | 100 | 1.73 | 92.8000 | 76.2909 | 100.0000 | 65.11 | 29.55 | 100 | 2 | 90.2554 | 64.3330 | 96.1765 | 85.29 | .00 | 96 | 1.84 | 90.6907 | 62.4625 | 95.6731 | 70.35 | 70.35 | |
9652 | 84 | 2.60 | 74.4511 | 44.3114 | 85.5856 | 77.93 | .00 | 100 | 2 | 91.5340 | 80.3184 | 100.0000 | 74.05 | 21.98 | 100 | 1.75 | 92.8318 | 81.1214 | 100.0000 | 61.85 | 61.85 | |
Etc. |
Table 3
Data = want
StudentID | SchCode02 | SchCode03 | SchCode04 | SchCode05 | SchCode06 | SchCode07 | SchCode08 | SchCode09 | SchCode10 | PctFRlunch_02 | AvgParEduc_02 | PctHisp_02 | PctTotELs_02 | PctELsSpan_02 | PctELsPLS_02 | PctELsL1_02 | PctFRlunch_03 | AvgParEduc_03 | PctHisp_03 | PctTotELs_03 | PctELsSpan_03 | PctELsPLS_03 | PctELsL1_03 | PctFRlunch_04 | AvgParEduc_04 | PctHisp_04 | PctTotELs_04 | PctELsSpan_04 | PctELsPLS_04 | PctELsL1_04 | Etc. |
100145 | 8543 | 8543 | 8543 | 8543 | 8543 | 8543 | 2455 | 2455 | 2455 | 100 | 1.73 | 92.8000 | 76.2909 | 100.0000 | 65.11 | 29.55 | 100 | 2 | 90.2554 | 64.3330 | 96.1765 | 85.29 | .00 | 96 | 1.84 | 90.6907 | 62.4625 | 95.6731 | 70.35 | 70.35 | |
100362 | 8543 | 8543 | 9652 | 3402 | 3402 | 3402 | 4656 | 4656 | . | 100 | 1.73 | 92.8000 | 76.2909 | 100.0000 | 65.11 | 29.55 | 100 | 2 | 90.2554 | 64.3330 | 96.1765 | 85.29 | .00 | 100 | 1.75 | 92.8318 | 81.1214 | 100.0000 | 61.85 | 61.85 | |
100595 | . | . | 3402 | 3402 | 3402 | 3402 | 6548 | 6548 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | 91 | 2.32 | 89.6825 | 45.7672 | 95.9538 | 57.23 | 57.23 | |
100788 | 2104 | 2104 | 2104 | 2104 | 2104 | . | . | . | . | 65 | 3.07 | 6.4909 | 2.2312 | 100.0000 | 9.09 | .00 | 100 | 2 | 93.1727 | 57.4297 | 96.5035 | 32.87 | 6.99 | 100 | 1.75 | 86.2191 | 57.9505 | 93.2927 | 28.66 | 28.66 | |
100900 | . | . | . | . | . | . | . | 7502 | 7502 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | |
102044 | 3402 | 3402 | 3402 | 3402 | 3402 | 3402 | 3402 | 3402 | 3402 | 100 | 1.65 | 99.2293 | 67.2447 | 100.0000 | 61.03 | 19.77 | 92 | 2 | 88.9175 | 44.3299 | 95.3488 | 58.72 | .00 | 91 | 2.32 | 89.6825 | 45.7672 | 95.9538 | 57.23 | 57.23 | |
105305 | . | 4645 | 4645 | 4645 | 4645 | 4645 | 4645 | 4645 | 6548 | . | . | . | . | . | . | . | 100 | 2 | 98.8428 | 63.4523 | 99.8480 | 51.98 | 19.45 | 100 | 1.63 | 98.4451 | 65.1118 | 100.0000 | 68.66 | 68.66 | |
108643 | . | . | . | 8543 | 8543 | 8543 | 8543 | 8543 | 8543 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | |
109224 | 3402 | 3402 | 2455 | 2455 | 2455 | . | . | . | . | 100 | 1.65 | 99.2293 | 67.2447 | 100.0000 | 61.03 | 19.77 | 92 | 2 | 88.9175 | 44.3299 | 95.3488 | 58.72 | .00 | 77 | 2.68 | 6.5737 | 3.5857 | 77.7778 | .00 | .00 | |
Etc. |
I think that Jagadish simply didn't carry the idea out far enough. The following uses a macro that I and some others just finished developing and will be presenting at some upcoming SAS conferences.
The bulk of the work was done in making the file tall. The macro is simply a way of doing the transpose in one step. The only difference between its result and your desired result is the order of the output variables. You could use proc transpose do make the file wide, but that would require a number of steps.
Let me know if this works for you, as I'm just now starting to try to get some feedback regarding the macro.
proc import datafile="c:\art\location.xls" out=StdData
dbms=excel replace;
getnames=yes;
sheet=sheet1;
run;
proc import datafile="c:\art\location.xls" out=SchData
dbms=excel replace;
getnames=yes;
sheet=sheet2;
run;
data std_vertical;
set StdData;
array sch(*) schcode02-schcode10;
do i = 1 to 9;
schcode=sch(i);
year=2001+i;
if not missing(schcode) then output;
end;
drop schcode02-schcode10 i;
run;
proc sort data=std_vertical;
by schcode year;
run;
data sch_vertical (keep=schcode
year
pctFRlunch
AvgParEduc
PctHisp
PctTotELs
PctELSpan
PctELsPLS
PctELsL1);
set SchData;
array pctFRlunc(*) pctFRlunch_02-pctFRlunch_04; /*-pctFRlunch_10*/;
array AvgParEdu(*) AvgParEduc_02-AvgParEduc_04; /*-AvgParEduc_10*/;
array PctHis(*) PctHisp_02-PctHisp_04; /*-PctHisp_10*/;
array PctTotEL(*) PctTotELs_02-PctTotELs_04; /*-PctTotELs_10*/;
array PctELSpa(*) PctELSpan_02-PctELSpan_04; /*-PctELSpan_10*/;
array PctELsPL(*) PctELsPLS_02-PctELsPLS_04; /*-PctELsPLS_10*/;
array PctELsL(*) PctELsL1_02-PctELsL1_04; /*-PctELsL1_10*/;
do i = 1 to 3; /*9*/
pctFRlunch=pctFRlunc(i);
AvgParEduc=AvgParEdu(i);
PctHisp=PctHis(i);
PctTotELs=PctTotEL(i);
PctELSpan=PctELSpa(i);
PctELsPLS=PctELsPL(i);
PctELsL1=PctELsL(i);
year=2001+i;
output;
end;
run;
proc sort data=sch_vertical;
by schcode year;
run;
data student_school;
merge sch_vertical (in=a) Std_vertical (in=b);
by schcode year;
if a and b;
run;
proc sort data=student_school;
by studentID year;
run;
%macro transpose(libname_in=work,
libname_out=work,
filename_in =have,
filename_out=want,
by_variable=id,
vars=ind1-ind4,
prefix=,
idlabel=date,
idlabel_format=yymon7.,
idlabel_prefix=,
guessingrows=1000);
data _temp;
set &libname_in..&filename_in. (obs=1 keep=&vars.);
run;
%let vars_char="";
%let varlist_char="";
%let vars_num="";
%let varlist_num="";
proc sql noprint;
select name
into :vars_char separated by " "
from dictionary.columns
where libname="WORK" and
memname="_TEMP" and
type="char"
;
select name
into :vars_num separated by " "
from dictionary.columns
where libname="WORK" and
memname="_TEMP" and
type="num"
;
%if &vars_char ne "" %then %do;
select distinct
%do i=1 %to %sysfunc(countw("&vars_char."));
%if &i. lt %sysfunc(countw("&vars_char.")) %then %do;
" "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||
"&idlabel_prefix."||
put(&idlabel.,&idlabel_format)||
%end;
%else %do;
" "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||
"&idlabel_prefix."||
put(&idlabel.,&idlabel_format),
%end;
%end;
&idlabel.
into :varlist_char separated by " ",
:junk
from &libname_in..&filename_in. (obs=&guessingrows.)
order by &idlabel.
;
%let num_charvars=&sqlobs.;
%end;
%if &vars_num ne "" %then %do;
select distinct
%do i=1 %to %sysfunc(countw("&vars_num."));
%if &i. lt %sysfunc(countw("&vars_num.")) %then %do;
" "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||
"&idlabel_prefix."||
put(&idlabel.,&idlabel_format)||
%end;
%else %do;
" "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||
"&idlabel_prefix."||
put(&idlabel.,&idlabel_format),
%end;
%end;
&idlabel.
into :varlist_num separated by " ",
:junk
from &libname_in..&filename_in. (obs=&guessingrows.)
order by &idlabel.
;
%let num_numvars=&sqlobs.;
%end;
create table _for_format as
select distinct &idlabel. as start
from &libname_in..&filename_in. (obs=&guessingrows.)
order by &idlabel.
;
%let num_numlabels=&sqlobs.;
quit;
data _for_format;
set _for_format;
retain fmtname "labelfmt" type "N";
label=_n_-1;
run;
proc format cntlin = _for_format;
run ;
filename sascode temp;
data _null_;
file sascode;
length var $32;
put "data &libname_out..&filename_out.;";
put " set &libname_in..&filename_in.;";
put " by &by_variable.;";
%if &vars_char. ne "" %then %do;
put " array want_char(*) $";
%do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_char.")));
var=scan("&varlist_char.",&i.);
put @7 var;
%end;
put @5 ";";
put " array have_char(*) $ &vars_char.;";
put " retain want_char;";
put " if first.&by_variable. then call missing(of want_char(*));";
put " _nchar=put(&idlabel.,labelfmt.)*dim(have_char);";
put " do _i=1 to dim(have_char);";
put " want_char(_nchar+_i)=have_char(_i);";
put " end;";
%end;
%if &vars_num. ne "" %then %do;
put " array want_num(*) ";
%do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_num.")));
var=scan("&varlist_num.",&i.);
put @7 var;
%end;
put @5 ";";
put " array have_num(*) &vars_num.;";
put " retain want_num;";
put " if first.&by_variable. then call missing(of want_num(*));";
put " _nnum=put(&idlabel.,labelfmt.)*dim(have_num);";
put " do _i=1 to dim(have_num);";
put " want_num(_nnum+_i)=have_num(_i);";
put " end;";
%end;
put " drop &idlabel. _: &vars.;";
put " if last.&by_variable. then output;";
put "run;";
run;
%include sascode;
%mend transpose;
options NOQUOTELENMAX;
%transpose(filename_in =student_school,
by_variable=StudentId,
vars=SchCode pctFRlunch--PctELsL1,
idlabel=year,
idlabel_format=4.,
guessingrows=1000)
Hi,
Please use the below code to get the output you desire. I have given the comments which code does what. Hope it will help you to understand the code.
Please try and let me know if it works.
%*------------------------------------------------------------*;
%* to import the data from xls, incase data is
is in the excel format*;
%*------------------------------------------------------------*;
proc import datafile="location.xls" out=StdData
dbms=excel replace;
getnames=yes;
sheet=sheet1;
run;
proc import datafile="location.xls" out=SchData
dbms=excel replace;
getnames=yes;
sheet=sheet2;
run;
%*------------------------------------------------------------*;
%* to convert the horizontal student data to vertical *;
%*------------------------------------------------------------*;
data vertical;
set StdData;
array sch(9) schcode02-schcode10;
do i = 1 to 9;
schcode=sch(i);
output;
end;
drop schcode02-schcode10 i;
run;
%*------------------------------------------------------------*;
%* to remove the duplicate records from vertical*;
%*------------------------------------------------------------*;
proc sort data=vertical nodup;
by schcode;
run;
proc sort data=SchData;
by schcode;
run;
%*------------------------------------------------------------*;
%* to get the records of school from schdata *;
%*------------------------------------------------------------*;
data student_school(rename=(schcode=schcode02));
merge vertical(in=a) SchData(in=b);
by schcode;
if a;
run;
%*------------------------------------------------------------*;
%* now we have the studentid with specific school codes *;
%*------------------------------------------------------------*;
proc sort data=StdData out=StdData;
by studentid schcode02;
run;
proc sort data=student_school out=student_school_;
by studentid schcode02;
run;
%*------------------------------------------------------------*;
%* to get the final dataset *;
%*------------------------------------------------------------*;
data want;
merge StdData(in=a) student_school_(in=b);
by studentid schcode02;
if a;
run;
%*------------------------------------------------------------*;
%* end *;
%*------------------------------------------------------------*;
Thanks,
Jagadish
Thanks very much for your response. I appreciate it. There was a problem with the code. Everything ran without error; however, the resulting file is incorrect. The school-level data is incorrect. In the final dataset, there are a lot of missing data in places where the data are not actually missing (in the original file). And the school-level data that is there, is incorrect.
I think at least one problem is with the vertical step and the subsequent elimination of duplicates. It seems this step doesn't account for missing data.
Thoughts?
I think that Jagadish simply didn't carry the idea out far enough. The following uses a macro that I and some others just finished developing and will be presenting at some upcoming SAS conferences.
The bulk of the work was done in making the file tall. The macro is simply a way of doing the transpose in one step. The only difference between its result and your desired result is the order of the output variables. You could use proc transpose do make the file wide, but that would require a number of steps.
Let me know if this works for you, as I'm just now starting to try to get some feedback regarding the macro.
proc import datafile="c:\art\location.xls" out=StdData
dbms=excel replace;
getnames=yes;
sheet=sheet1;
run;
proc import datafile="c:\art\location.xls" out=SchData
dbms=excel replace;
getnames=yes;
sheet=sheet2;
run;
data std_vertical;
set StdData;
array sch(*) schcode02-schcode10;
do i = 1 to 9;
schcode=sch(i);
year=2001+i;
if not missing(schcode) then output;
end;
drop schcode02-schcode10 i;
run;
proc sort data=std_vertical;
by schcode year;
run;
data sch_vertical (keep=schcode
year
pctFRlunch
AvgParEduc
PctHisp
PctTotELs
PctELSpan
PctELsPLS
PctELsL1);
set SchData;
array pctFRlunc(*) pctFRlunch_02-pctFRlunch_04; /*-pctFRlunch_10*/;
array AvgParEdu(*) AvgParEduc_02-AvgParEduc_04; /*-AvgParEduc_10*/;
array PctHis(*) PctHisp_02-PctHisp_04; /*-PctHisp_10*/;
array PctTotEL(*) PctTotELs_02-PctTotELs_04; /*-PctTotELs_10*/;
array PctELSpa(*) PctELSpan_02-PctELSpan_04; /*-PctELSpan_10*/;
array PctELsPL(*) PctELsPLS_02-PctELsPLS_04; /*-PctELsPLS_10*/;
array PctELsL(*) PctELsL1_02-PctELsL1_04; /*-PctELsL1_10*/;
do i = 1 to 3; /*9*/
pctFRlunch=pctFRlunc(i);
AvgParEduc=AvgParEdu(i);
PctHisp=PctHis(i);
PctTotELs=PctTotEL(i);
PctELSpan=PctELSpa(i);
PctELsPLS=PctELsPL(i);
PctELsL1=PctELsL(i);
year=2001+i;
output;
end;
run;
proc sort data=sch_vertical;
by schcode year;
run;
data student_school;
merge sch_vertical (in=a) Std_vertical (in=b);
by schcode year;
if a and b;
run;
proc sort data=student_school;
by studentID year;
run;
%macro transpose(libname_in=work,
libname_out=work,
filename_in =have,
filename_out=want,
by_variable=id,
vars=ind1-ind4,
prefix=,
idlabel=date,
idlabel_format=yymon7.,
idlabel_prefix=,
guessingrows=1000);
data _temp;
set &libname_in..&filename_in. (obs=1 keep=&vars.);
run;
%let vars_char="";
%let varlist_char="";
%let vars_num="";
%let varlist_num="";
proc sql noprint;
select name
into :vars_char separated by " "
from dictionary.columns
where libname="WORK" and
memname="_TEMP" and
type="char"
;
select name
into :vars_num separated by " "
from dictionary.columns
where libname="WORK" and
memname="_TEMP" and
type="num"
;
%if &vars_char ne "" %then %do;
select distinct
%do i=1 %to %sysfunc(countw("&vars_char."));
%if &i. lt %sysfunc(countw("&vars_char.")) %then %do;
" "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||
"&idlabel_prefix."||
put(&idlabel.,&idlabel_format)||
%end;
%else %do;
" "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||
"&idlabel_prefix."||
put(&idlabel.,&idlabel_format),
%end;
%end;
&idlabel.
into :varlist_char separated by " ",
:junk
from &libname_in..&filename_in. (obs=&guessingrows.)
order by &idlabel.
;
%let num_charvars=&sqlobs.;
%end;
%if &vars_num ne "" %then %do;
select distinct
%do i=1 %to %sysfunc(countw("&vars_num."));
%if &i. lt %sysfunc(countw("&vars_num.")) %then %do;
" "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||
"&idlabel_prefix."||
put(&idlabel.,&idlabel_format)||
%end;
%else %do;
" "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||
"&idlabel_prefix."||
put(&idlabel.,&idlabel_format),
%end;
%end;
&idlabel.
into :varlist_num separated by " ",
:junk
from &libname_in..&filename_in. (obs=&guessingrows.)
order by &idlabel.
;
%let num_numvars=&sqlobs.;
%end;
create table _for_format as
select distinct &idlabel. as start
from &libname_in..&filename_in. (obs=&guessingrows.)
order by &idlabel.
;
%let num_numlabels=&sqlobs.;
quit;
data _for_format;
set _for_format;
retain fmtname "labelfmt" type "N";
label=_n_-1;
run;
proc format cntlin = _for_format;
run ;
filename sascode temp;
data _null_;
file sascode;
length var $32;
put "data &libname_out..&filename_out.;";
put " set &libname_in..&filename_in.;";
put " by &by_variable.;";
%if &vars_char. ne "" %then %do;
put " array want_char(*) $";
%do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_char.")));
var=scan("&varlist_char.",&i.);
put @7 var;
%end;
put @5 ";";
put " array have_char(*) $ &vars_char.;";
put " retain want_char;";
put " if first.&by_variable. then call missing(of want_char(*));";
put " _nchar=put(&idlabel.,labelfmt.)*dim(have_char);";
put " do _i=1 to dim(have_char);";
put " want_char(_nchar+_i)=have_char(_i);";
put " end;";
%end;
%if &vars_num. ne "" %then %do;
put " array want_num(*) ";
%do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_num.")));
var=scan("&varlist_num.",&i.);
put @7 var;
%end;
put @5 ";";
put " array have_num(*) &vars_num.;";
put " retain want_num;";
put " if first.&by_variable. then call missing(of want_num(*));";
put " _nnum=put(&idlabel.,labelfmt.)*dim(have_num);";
put " do _i=1 to dim(have_num);";
put " want_num(_nnum+_i)=have_num(_i);";
put " end;";
%end;
put " drop &idlabel. _: &vars.;";
put " if last.&by_variable. then output;";
put "run;";
run;
%include sascode;
%mend transpose;
options NOQUOTELENMAX;
%transpose(filename_in =student_school,
by_variable=StudentId,
vars=SchCode pctFRlunch--PctELsL1,
idlabel=year,
idlabel_format=4.,
guessingrows=1000)
Thank you very much for this. Again, I really appreciate it.
This code worked in that within the ‘want’ file, students are matched up with the correct, year-specific school variables. This worked very well. However, there are only 81,687 observations (students) in the ‘want’ file, and in the original ‘StdData’ file there are 139,909 students. Also, in the ‘want’ file only the school variables 2002 through 2004 are included (rather than through 2010).
The error(s) could be mine. I created several more school variables and included them in the ‘SchData’ file. I modified your code to include these variables. The only other changes I made were to call the unique student identifier “StdPseudoId” rather than “StudentId” as this is how it is in the original file (I mistakenly wrote ‘StudentId’ in my posted question); and a couple of other variable names were slightly different in my file from what I wrote in the post, so I changed the code in those places as well. It does not seem that variable names are the issue, though, because for 2002, 2003, 2004 all variables are perfect in the ‘want’ file.
Could you take a look at my log (below) and let me know if you understand the error? Thanks very much for your help.
NOTE: The import data set has 139909 observations and 10 variables.
NOTE: Compressing data set WORK.STDDATA decreased size by 17.89 percent.
Compressed is 1138 pages; un-compressed would require 1386 pages.
NOTE: WORK.STDDATA data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.35 seconds
user cpu time 0.04 seconds
system cpu time 0.20 seconds
memory 225.01k
OS Memory 18108.00k
Timestamp 01/21/2013 05:06:15 PM
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.06 seconds
user cpu time 0.06 seconds
system cpu time 0.01 seconds
memory 296.54k
OS Memory 18108.00k
Timestamp 01/21/2013 05:06:22 PM
NOTE: The import data set has 599 observations and 170 variables.
NOTE: Compressing data set WORK.SCHDATA decreased size by 17.31 percent.
Compressed is 43 pages; un-compressed would require 52 pages.
NOTE: WORK.SCHDATA data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.05 seconds
user cpu time 0.03 seconds
system cpu time 0.01 seconds
memory 575.53k
OS Memory 18108.00k
Timestamp 01/21/2013 05:06:41 PM
PROC CONTENTS DATA = StdData VARNUM; run;
PROC CONTENTS DATA = SchData VARNUM; run;
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.16 seconds
user cpu time 0.06 seconds
system cpu time 0.03 seconds
memory 416.18k
OS Memory 18108.00k
Timestamp 01/21/2013 05:06:49 PM
509 data std_vertical;
510 set StdData;
511 array sch(*) schcode02-schcode10;
512 do i = 1 to 9;
513 schcode=sch(i);
514 year=2001+i;
515 if not missing(schcode) then output;
516 end;
517 drop schcode02-schcode10 i;
518 run;
NOTE: There were 139909 observations read from the data set WORK.STDDATA.
NOTE: The data set WORK.STD_VERTICAL has 810540 observations and 3 variables.
NOTE: Compressing data set WORK.STD_VERTICAL increased size by 27.96 percent.
Compressed is 6174 pages; un-compressed would require 4825 pages.
NOTE: DATA statement used (Total process time):
real time 0.91 seconds
user cpu time 0.56 seconds
system cpu time 0.25 seconds
memory 301.56k
OS Memory 18108.00k
Timestamp 01/21/2013 05:07:53 PM
519
520 proc sort data=std_vertical;
521 by schcode year;
522 run;
NOTE: There were 810540 observations read from the data set WORK.STD_VERTICAL.
NOTE: The data set WORK.STD_VERTICAL has 810540 observations and 3 variables.
NOTE: Compressing data set WORK.STD_VERTICAL increased size by 27.96 percent.
Compressed is 6174 pages; un-compressed would require 4825 pages.
NOTE: PROCEDURE SORT used (Total process time):
real time 11.29 seconds
user cpu time 1.88 seconds
system cpu time 1.54 seconds
memory 45409.18k
OS Memory 64188.00k
Timestamp 01/21/2013 05:08:04 PM
523
524 data sch_vertical (keep=schcode
525 year
526 PctFRlunch
527 PctRespParEd
528 AvgParEd
529 SCI
530 TotEnr
531 AfrAmerPct
532 HispPct
533 WhitePct
534 PctTotELs
535 PctELsSpan
536 PctELsELDorSDAIE
537 PctELsPLS
538 PctELsL1
539 SpanTstoSsRatio
540 SpanAidestoSsRatio
541 SpanTsandAidestoSsRatio
542 EngTstoSsRatio
543 PctTotReclassYr);
544 set SchData;
545 array PctFRlunc(*) PctFRlunch_02-PctFRlunch_04; /*-PctFRlunch_10*/;
546 array PctRespParE(*) PctRespParEd_02-PctRespParEd_04; /*-PctRespParEd_10*/;
547 array AvgParE(*) AvgParEd_02-AvgParEd_04; /*-AvgParEd_10*/;
548 array SC(*) SCI_02-SCI_04; /*-SCI_10*/;
549 array TotEn(*) TotEnr_02-TotEnr_04; /*-TotEnr_10*/;
550 array AfrAmerPc(*) AfrAmerPct_02-AfrAmerPct_04; /*-AfrAmerPct_10*/;
551 array HispPc(*) HispPct_02-HispPct_04; /*-HispPct_10*/;
552 array WhitePc(*) WhitePct_02-WhitePct_04; /*-WhitePct_10*/;
553 array PctTotEL(*) PctTotELs_02-PctTotELs_04; /*-PctTotELs_10*/;
554 array PctELsSpa(*) PctELsSpan_02-PctELsSpan_04; /*-PctELsSpan_10*/;
555 array PctELsELDorSDAI(*) PctELsELDorSDAIE_02-PctELsELDorSDAIE_04; /*-PctELsELDorSDAIE_10*/;
556 array PctELsPL(*) PctELsPLS_02-PctELsPLS_04; /*-PctELsPLS_10*/;
557 array PctELsL(*) PctELsL1_02-PctELsL1_04; /*-PctELsL1_10*/;
558 array SpanTstoSsRati(*) SpanTstoSsRatio_02-SpanTstoSsRatio_04; /*-SpanTstoSsRatio_10*/;
559 array SpanAidestoSsRati(*) SpanAidestoSsRatio_02-SpanAidestoSsRatio_04;
559! /*-SpanAidestoSsRatio_10*/;
560 array SpanTsandAidestoSsRati(*) SpanTsandAidestoSsRatio_02-SpanTsandAidestoSsRatio_04;
560! /*-SpanTsandAidestoSsRatio_10*/;
561 array EngTstoSsRati(*) EngTstoSsRatio_02-EngTstoSsRatio_04; /*-EngTstoSsRatio_10*/;
562 array PctTotReclassY(*) PctTotReclassYr_02-PctTotReclassYr_04; /*-PctTotReclassYr_10*/;
563 do i = 1 to 3; /*9*/
564 PctFRlunch=PctFRlunc(i);
565 PctRespParEd=PctRespParE(i);
566 AvgParEd=AvgParE(i);
567 SCI=SC(i);
568 TotEnr=TotEn(i);
569 AfrAmerPct=AfrAmerPc(i);
570 HispPct=HispPc(i);
571 WhitePct=WhitePc(i);
572 PctTotELs=PctTotEL(i);
573 PctELsSpan=PctELsSpa(i);
574 PctELsELDorSDAIE=PctELsELDorSDAI(i);
575 PctELsPLS=PctELsPL(i);
576 PctELsL1=PctELsL(i);
577 SpanTstoSsRatio=SpanTstoSsRati(i);
578 SpanAidestoSsRatio=SpanAidestoSsRati(i);
579 SpanTsandAidestoSsRatio=SpanTsandAidestoSsRati(i);
580 EngTstoSsRatio=EngTstoSsRati(i);
581 PctTotReclassYr=PctTotReclassY(i);
582 year=2001+i;
583 output;
584 end;
585 run;
NOTE: There were 599 observations read from the data set WORK.SCHDATA.
NOTE: The data set WORK.SCH_VERTICAL has 1797 observations and 20 variables.
NOTE: Compressing data set WORK.SCH_VERTICAL decreased size by 11.11 percent.
Compressed is 32 pages; un-compressed would require 36 pages.
NOTE: DATA statement used (Total process time):
real time 0.14 seconds
user cpu time 0.04 seconds
system cpu time 0.07 seconds
memory 606.57k
OS Memory 18364.00k
Timestamp 01/21/2013 05:08:04 PM
586
587 proc sort data=sch_vertical;
588 by schcode year;
589 run;
NOTE: There were 1797 observations read from the data set WORK.SCH_VERTICAL.
NOTE: The data set WORK.SCH_VERTICAL has 1797 observations and 20 variables.
NOTE: Compressing data set WORK.SCH_VERTICAL decreased size by 11.11 percent.
Compressed is 32 pages; un-compressed would require 36 pages.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.02 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 561.84k
OS Memory 18364.00k
Timestamp 01/21/2013 05:08:04 PM
590
591 data student_school;
592 merge sch_vertical (in=a) Std_vertical (in=b);
593 by schcode year;
594 if a and b;
595 run;
NOTE: There were 1797 observations read from the data set WORK.SCH_VERTICAL.
NOTE: There were 810540 observations read from the data set WORK.STD_VERTICAL.
NOTE: The data set WORK.STUDENT_SCHOOL has 170700 observations and 21 variables.
NOTE: Compressing data set WORK.STUDENT_SCHOOL decreased size by 9.28 percent.
Compressed is 3227 pages; un-compressed would require 3557 pages.
NOTE: DATA statement used (Total process time):
real time 0.81 seconds
user cpu time 0.48 seconds
system cpu time 0.28 seconds
memory 543.12k
OS Memory 18364.00k
Timestamp 01/21/2013 05:08:05 PM
596
597 proc sort data=student_school;
598 by StdPseudoId year;
599 run;
NOTE: There were 170700 observations read from the data set WORK.STUDENT_SCHOOL.
NOTE: The data set WORK.STUDENT_SCHOOL has 170700 observations and 21 variables.
NOTE: Compressing data set WORK.STUDENT_SCHOOL decreased size by 9.28 percent.
Compressed is 3227 pages; un-compressed would require 3557 pages.
NOTE: PROCEDURE SORT used (Total process time):
real time 1.43 seconds
user cpu time 0.51 seconds
system cpu time 0.51 seconds
memory 34453.26k
OS Memory 51388.00k
Timestamp 01/21/2013 05:08:07 PM
600
601 %macro transpose(libname_in=work,
602 libname_out=work,
603 filename_in =have,
604 filename_out=want,
605 by_variable=id,
606 vars=ind1-ind4,
607 prefix=,
608 idlabel=date,
609 idlabel_format=yymon7.,
610 idlabel_prefix=,
611 guessingrows=1000);
612
613 data _temp;
614 set &libname_in..&filename_in. (obs=1 keep=&vars.);
615 run;
616
617 %let vars_char="";
618 %let varlist_char="";
619 %let vars_num="";
620 %let varlist_num="";
621
622 proc sql noprint;
623 select name
624 into :vars_char separated by " "
625 from dictionary.columns
626 where libname="WORK" and
627 memname="_TEMP" and
628 type="char"
629 ;
630 select name
631 into :vars_num separated by " "
632 from dictionary.columns
633 where libname="WORK" and
634 memname="_TEMP" and
635 type="num"
636 ;
637
638 %if &vars_char ne "" %then %do;
639 select distinct
640 %do i=1 %to %sysfunc(countw("&vars_char."));
641 %if &i. lt %sysfunc(countw("&vars_char.")) %then %do;
642 " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||
643 "&idlabel_prefix."||
644 put(&idlabel.,&idlabel_format)||
645 %end;
646 %else %do;
647 " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||
648 "&idlabel_prefix."||
649 put(&idlabel.,&idlabel_format),
650 %end;
651 %end;
652 &idlabel.
653 into :varlist_char separated by " ",
654 :junk
655 from &libname_in..&filename_in. (obs=&guessingrows.)
656 order by &idlabel.
657 ;
658 %let num_charvars=&sqlobs.;
659 %end;
660
661 %if &vars_num ne "" %then %do;
662 select distinct
663 %do i=1 %to %sysfunc(countw("&vars_num."));
664 %if &i. lt %sysfunc(countw("&vars_num.")) %then %do;
665 " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||
666 "&idlabel_prefix."||
667 put(&idlabel.,&idlabel_format)||
668 %end;
669 %else %do;
670 " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||
671 "&idlabel_prefix."||
672 put(&idlabel.,&idlabel_format),
673 %end;
674 %end;
675 &idlabel.
676 into :varlist_num separated by " ",
677 :junk
678 from &libname_in..&filename_in. (obs=&guessingrows.)
679 order by &idlabel.
680 ;
681 %let num_numvars=&sqlobs.;
682 %end;
683
684 create table _for_format as
685 select distinct &idlabel. as start
686 from &libname_in..&filename_in. (obs=&guessingrows.)
687 order by &idlabel.
688 ;
689 %let num_numlabels=&sqlobs.;
690
691 quit;
692
693 data _for_format;
694 set _for_format;
695 retain fmtname "labelfmt" type "N";
696 label=_n_-1;
697 run;
698
699
700 proc format cntlin = _for_format;
701 run ;
702
703 filename sascode temp;
704 data _null_;
705 file sascode;
706 length var $32;
707 put "data &libname_out..&filename_out.;";
708 put " set &libname_in..&filename_in.;";
709 put " by &by_variable.;";
710 %if &vars_char. ne "" %then %do;
711 put " array want_char(*) $";
712 %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_char.")));
713 var=scan("&varlist_char.",&i.);
714 put @7 var;
715 %end;
716 put @5 ";";
717 put " array have_char(*) $ &vars_char.;";
718 put " retain want_char;";
719 put " if first.&by_variable. then call missing(of want_char(*));";
720 put " _nchar=put(&idlabel.,labelfmt.)*dim(have_char);";
721 put " do _i=1 to dim(have_char);";
722 put " want_char(_nchar+_i)=have_char(_i);";
723 put " end;";
724 %end;
725 %if &vars_num. ne "" %then %do;
726 put " array want_num(*) ";
727 %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_num.")));
728 var=scan("&varlist_num.",&i.);
729 put @7 var;
730 %end;
731 put @5 ";";
732 put " array have_num(*) &vars_num.;";
733 put " retain want_num;";
734 put " if first.&by_variable. then call missing(of want_num(*));";
735 put " _nnum=put(&idlabel.,labelfmt.)*dim(have_num);";
736 put " do _i=1 to dim(have_num);";
737 put " want_num(_nnum+_i)=have_num(_i);";
738 put " end;";
739 %end;
740 put " drop &idlabel. _: &vars.;";
741 put " if last.&by_variable. then output;";
742 put "run;";
743 run;
744 %include sascode;
745 %mend transpose;
746
747 options NOQUOTELENMAX;
748 %transpose(filename_in =student_school,
749 by_variable=StdPseudoId,
750 vars=SchCode PctFRlunch--PctTotReclassYr,
751 idlabel=year,
752 idlabel_format=4.,
753 guessingrows=1000)
NOTE: There were 1 observations read from the data set WORK.STUDENT_SCHOOL.
NOTE: The data set WORK._TEMP has 1 observations and 19 variables.
NOTE: Compressing data set WORK._TEMP increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
real time 0.22 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 349.00k
OS Memory 18364.00k
Timestamp 01/21/2013 05:08:08 PM
NOTE: No rows were selected.
NOTE: Compression was disabled for data set WORK._FOR_FORMAT because compression overhead would
increase the size of the data set.
NOTE: Table WORK._FOR_FORMAT created, with 3 rows and 1 columns.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.20 seconds
user cpu time 0.07 seconds
system cpu time 0.03 seconds
memory 1460.71k
OS Memory 18364.00k
Timestamp 01/21/2013 05:08:09 PM
NOTE: There were 3 observations read from the data set WORK._FOR_FORMAT.
NOTE: The data set WORK._FOR_FORMAT has 3 observations and 4 variables.
NOTE: Compressing data set WORK._FOR_FORMAT increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 299.71k
OS Memory 18364.00k
Timestamp 01/21/2013 05:08:09 PM
NOTE: Format LABELFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.06 seconds
user cpu time 0.01 seconds
system cpu time 0.01 seconds
memory 169.15k
OS Memory 18364.00k
Timestamp 01/21/2013 05:08:09 PM
NOTE: There were 3 observations read from the data set WORK._FOR_FORMAT.
NOTE: 72 records were written to the file SASCODE.
The minimum record length was 4.
The maximum record length was 249.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
user cpu time 0.03 seconds
system cpu time 0.01 seconds
memory 963.56k
OS Memory 19436.00k
Timestamp 01/21/2013 05:08:09 PM
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
819:11
NOTE: There were 170700 observations read from the data set WORK.STUDENT_SCHOOL.
NOTE: The data set WORK.WANT has 81687 observations and 58 variables.
NOTE: Compressing data set WORK.WANT decreased size by 18.97 percent.
Compressed is 1892 pages; un-compressed would require 2335 pages.
NOTE: DATA statement used (Total process time):
real time 0.87 seconds
user cpu time 0.37 seconds
system cpu time 0.26 seconds
memory 548.03k
OS Memory 18876.00k
Timestamp 01/21/2013 05:08:10 PM
You didn't change all of the code to reflect 10 years of data. Since your example only had four years of school data, my code only reflected those four years. I tried to show, in comments, where they "should" have been changed to 9 or 10 based on how they were applied.
However, the code you just posted, didn't reflect those changes, thus you are only getting 4 years of results.
Try it by running the following. BTW, the macro (while included below) hasn't changed from my previous post, just the datasteps used to prepare the data:
data std_vertical;
set StdData;
array sch(*) schcode02-schcode10;
do i = 1 to 9;
schcode=sch(i);
year=2001+i;
if not missing(schcode) then output;
end;
drop schcode02-schcode10 i;
run;
proc sort data=std_vertical;
by schcode year;
run;
data sch_vertical (keep=schcode
year
pctFRlunch
AvgParEduc
PctHisp
PctTotELs
PctELSpan
PctELsPLS
PctELsL1);
set SchData;
array pctFRlunc(*) pctFRlunch_02-pctFRlunch_10;
array AvgParEdu(*) AvgParEduc_02-AvgParEduc_10;
array PctHis(*) PctHisp_02-PctHisp_10;
array PctTotEL(*) PctTotELs_02-PctTotELs_10;
array PctELSpa(*) PctELSpan_02-PctELSpan_10;
array PctELsPL(*) PctELsPLS_02-PctELsPLS_10;
array PctELsL(*) PctELsL1_02-PctELsL1_10;
do i = 1 to 9;
pctFRlunch=pctFRlunc(i);
AvgParEduc=AvgParEdu(i);
PctHisp=PctHis(i);
PctTotELs=PctTotEL(i);
PctELSpan=PctELSpa(i);
PctELsPLS=PctELsPL(i);
PctELsL1=PctELsL(i);
year=2001+i;
output;
end;
run;
proc sort data=sch_vertical;
by schcode year;
run;
data student_school;
merge sch_vertical (in=a) Std_vertical (in=b);
by schcode year;
if a and b;
run;
proc sort data=student_school;
by studentID year;
run;
%macro transpose(libname_in=work,
libname_out=work,
filename_in =have,
filename_out=want,
by_variable=id,
vars=ind1-ind4,
prefix=,
idlabel=date,
idlabel_format=yymon7.,
idlabel_prefix=,
guessingrows=1000);
data _temp;
set &libname_in..&filename_in. (obs=1 keep=&vars.);
run;
%let vars_char="";
%let varlist_char="";
%let vars_num="";
%let varlist_num="";
proc sql noprint;
select name
into :vars_char separated by " "
from dictionary.columns
where libname="WORK" and
memname="_TEMP" and
type="char"
;
select name
into :vars_num separated by " "
from dictionary.columns
where libname="WORK" and
memname="_TEMP" and
type="num"
;
%if &vars_char ne "" %then %do;
select distinct
%do i=1 %to %sysfunc(countw("&vars_char."));
%if &i. lt %sysfunc(countw("&vars_char.")) %then %do;
" "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||
"&idlabel_prefix."||
put(&idlabel.,&idlabel_format)||
%end;
%else %do;
" "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||
"&idlabel_prefix."||
put(&idlabel.,&idlabel_format),
%end;
%end;
&idlabel.
into :varlist_char separated by " ",
:junk
from &libname_in..&filename_in. (obs=&guessingrows.)
order by &idlabel.
;
%let num_charvars=&sqlobs.;
%end;
%if &vars_num ne "" %then %do;
select distinct
%do i=1 %to %sysfunc(countw("&vars_num."));
%if &i. lt %sysfunc(countw("&vars_num.")) %then %do;
" "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||
"&idlabel_prefix."||
put(&idlabel.,&idlabel_format)||
%end;
%else %do;
" "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||
"&idlabel_prefix."||
put(&idlabel.,&idlabel_format),
%end;
%end;
&idlabel.
into :varlist_num separated by " ",
:junk
from &libname_in..&filename_in. (obs=&guessingrows.)
order by &idlabel.
;
%let num_numvars=&sqlobs.;
%end;
create table _for_format as
select distinct &idlabel. as start
from &libname_in..&filename_in. (obs=&guessingrows.)
order by &idlabel.
;
%let num_numlabels=&sqlobs.;
quit;
data _for_format;
set _for_format;
retain fmtname "labelfmt" type "N";
label=_n_-1;
run;
proc format cntlin = _for_format;
run ;
filename sascode temp;
data _null_;
file sascode;
length var $32;
put "data &libname_out..&filename_out.;";
put " set &libname_in..&filename_in.;";
put " by &by_variable.;";
%if &vars_char. ne "" %then %do;
put " array want_char(*) $";
%do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_char.")));
var=scan("&varlist_char.",&i.);
put @7 var;
%end;
put @5 ";";
put " array have_char(*) $ &vars_char.;";
put " retain want_char;";
put " if first.&by_variable. then call missing(of want_char(*));";
put " _nchar=put(&idlabel.,labelfmt.)*dim(have_char);";
put " do _i=1 to dim(have_char);";
put " want_char(_nchar+_i)=have_char(_i);";
put " end;";
%end;
%if &vars_num. ne "" %then %do;
put " array want_num(*) ";
%do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_num.")));
var=scan("&varlist_num.",&i.);
put @7 var;
%end;
put @5 ";";
put " array have_num(*) &vars_num.;";
put " retain want_num;";
put " if first.&by_variable. then call missing(of want_num(*));";
put " _nnum=put(&idlabel.,labelfmt.)*dim(have_num);";
put " do _i=1 to dim(have_num);";
put " want_num(_nnum+_i)=have_num(_i);";
put " end;";
%end;
put " drop &idlabel. _: &vars.;";
put " if last.&by_variable. then output;";
put "run;";
run;
%include sascode;
%mend transpose;
options NOQUOTELENMAX;
%transpose(filename_in =student_school,
by_variable=StudentId,
vars=SchCode pctFRlunch--PctELsL1,
idlabel=year,
idlabel_format=4.,
guessingrows=1000)
ah! I had made my edits to the code (regarding additional variables) in Word… and then read through the log to try and figure out the error... and I mistook the comments for code. An embarrassing mistake! I should have examined the code in the editor window – the color-coding would have helped me.
Your comments were actually very clear. I see now. And this time the resulting 'want' dataset was correct.
There were, however, 139,616 observations rather than the original 139,909. I figured out this is because 293 students attended schools that are, for some reason, not included in the school file. So those students were dropped from the ‘want’ dataset. This isn’t a problem, as I can just add them back in, but I figured I’d let you know in case it’s relevant for your macro.
Thank you!!
I would be interested to know how much real and CPU time the process required for your data. In our tests, thus far, the %transpose macro runs almost 40 times faster than doing similar complex jobs using PROC TRANSPOSE.
It was fast. I've pasted the log below so you can see the time for the whole process. But, you should know I ran this on a MacBook Pro running SAS via Virtual Machine. Had I booted up into Windows directly (my hard drive is partitioned), I imagine it would have been faster still.
1818
1819
1820 data std_vertical;
1821 set StdData;
1822 array sch(*) schcode02-schcode10;
1823 do i = 1 to 9;
1824 schcode=sch(i);
1825 year=2001+i;
1826 if not missing(schcode) then output;
1827 end;
1828 drop schcode02-schcode10 i;
1829 run;
NOTE: There were 139909 observations read from the data set WORK.STDDATA.
NOTE: The data set WORK.STD_VERTICAL has 810540 observations and 3 variables.
NOTE: Compressing data set WORK.STD_VERTICAL increased size by 27.96 percent.
Compressed is 6174 pages; un-compressed would require 4825 pages.
NOTE: DATA statement used (Total process time):
real time 0.93 seconds
user cpu time 0.29 seconds
system cpu time 0.32 seconds
memory 346.03k
OS Memory 18620.00k
Timestamp 01/21/2013 11:31:01 PM
1830
1831 proc sort data=std_vertical;
1832 by schcode year;
1833 run;
NOTE: There were 810540 observations read from the data set WORK.STD_VERTICAL.
NOTE: The data set WORK.STD_VERTICAL has 810540 observations and 3 variables.
NOTE: Compressing data set WORK.STD_VERTICAL increased size by 27.96 percent.
Compressed is 6174 pages; un-compressed would require 4825 pages.
NOTE: PROCEDURE SORT used (Total process time):
real time 1.48 seconds
user cpu time 0.99 seconds
system cpu time 0.43 seconds
memory 45408.96k
OS Memory 64700.00k
Timestamp 01/21/2013 11:31:03 PM
1834
1835 data sch_vertical (keep=schcode
1836 year
1837 PctFRlunch
1838 PctRespParEd
1839 AvgParEd
1840 SCI
1841 TotEnr
1842 AfrAmerPct
1843 HispPct
1844 WhitePct
1845 PctTotELs
1846 PctELsSpan
1847 PctELsELDorSDAIE
1848 PctELsPLS
1849 PctELsL1
1850 SpanTstoSsRatio
1851 SpanAidestoSsRatio
1852 SpanTsandAidestoSsRatio
1853 EngTstoSsRatio
1854 PctTotReclassYr);
1855 set SchData;
1856 array PctFRlunc(*) PctFRlunch_02-PctFRlunch_10; /*-PctFRlunch_10*/;
1857 array PctRespParE(*) PctRespParEd_02-PctRespParEd_10; /*-PctRespParEd_10*/;
1858 array AvgParE(*) AvgParEd_02-AvgParEd_10; /*-AvgParEd_10*/;
1859 array SC(*) SCI_02-SCI_10; /*-SCI_10*/;
1860 array TotEn(*) TotEnr_02-TotEnr_10; /*-TotEnr_10*/;
1861 array AfrAmerPc(*) AfrAmerPct_02-AfrAmerPct_10; /*-AfrAmerPct_10*/;
1862 array HispPc(*) HispPct_02-HispPct_10; /*-HispPct_10*/;
1863 array WhitePc(*) WhitePct_02-WhitePct_10; /*-WhitePct_10*/;
1864 array PctTotEL(*) PctTotELs_02-PctTotELs_10; /*-PctTotELs_10*/;
1865 array PctELsSpa(*) PctELsSpan_02-PctELsSpan_10; /*-PctELsSpan_10*/;
1866 array PctELsELDorSDAI(*) PctELsELDorSDAIE_02-PctELsELDorSDAIE_10; /*-PctELsELDorSDAIE_10*/;
1867 array PctELsPL(*) PctELsPLS_02-PctELsPLS_10; /*-PctELsPLS_10*/;
1868 array PctELsL(*) PctELsL1_02-PctELsL1_10; /*-PctELsL1_10*/;
1869 array SpanTstoSsRati(*) SpanTstoSsRatio_02-SpanTstoSsRatio_10; /*-SpanTstoSsRatio_10*/;
1870 array SpanAidestoSsRati(*) SpanAidestoSsRatio_02-SpanAidestoSsRatio_10;
1870 ! /*-SpanAidestoSsRatio_10*/;
1871 array SpanTsandAidestoSsRati(*) SpanTsandAidestoSsRatio_02-SpanTsandAidestoSsRatio_10;
1871 ! /*-SpanTsandAidestoSsRatio_10*/;
1872 array EngTstoSsRati(*) EngTstoSsRatio_02-EngTstoSsRatio_10; /*-EngTstoSsRatio_10*/;
1873 array PctTotReclassY(*) PctTotReclassYr_02-PctTotReclassYr_10; /*-PctTotReclassYr_10*/;
1874 do i = 1 to 9; /*9*/
1875 PctFRlunch=PctFRlunc(i);
1876 PctRespParEd=PctRespParE(i);
1877 AvgParEd=AvgParE(i);
1878 SCI=SC(i);
1879 TotEnr=TotEn(i);
1880 AfrAmerPct=AfrAmerPc(i);
1881 HispPct=HispPc(i);
1882 WhitePct=WhitePc(i);
1883 PctTotELs=PctTotEL(i);
1884 PctELsSpan=PctELsSpa(i);
1885 PctELsELDorSDAIE=PctELsELDorSDAI(i);
1886 PctELsPLS=PctELsPL(i);
1887 PctELsL1=PctELsL(i);
1888 SpanTstoSsRatio=SpanTstoSsRati(i);
1889 SpanAidestoSsRatio=SpanAidestoSsRati(i);
1890 SpanTsandAidestoSsRatio=SpanTsandAidestoSsRati(i);
1891 EngTstoSsRatio=EngTstoSsRati(i);
1892 PctTotReclassYr=PctTotReclassY(i);
1893 year=2001+i;
1894 output;
1895 end;
1896 run;
NOTE: There were 599 observations read from the data set WORK.SCHDATA.
NOTE: The data set WORK.SCH_VERTICAL has 5391 observations and 20 variables.
NOTE: Compressing data set WORK.SCH_VERTICAL decreased size by 14.02 percent.
Compressed is 92 pages; un-compressed would require 107 pages.
NOTE: DATA statement used (Total process time):
real time 0.06 seconds
user cpu time 0.01 seconds
system cpu time 0.04 seconds
memory 718.37k
OS Memory 18620.00k
Timestamp 01/21/2013 11:31:03 PM
1897
1898 proc sort data=sch_vertical;
1899 by schcode year;
1900 run;
NOTE: There were 5391 observations read from the data set WORK.SCH_VERTICAL.
NOTE: The data set WORK.SCH_VERTICAL has 5391 observations and 20 variables.
NOTE: Compressing data set WORK.SCH_VERTICAL decreased size by 14.02 percent.
Compressed is 92 pages; un-compressed would require 107 pages.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 1240.29k
OS Memory 18620.00k
Timestamp 01/21/2013 11:31:03 PM
1901
1902 data student_school;
1903 merge sch_vertical (in=a) Std_vertical (in=b);
1904 by schcode year;
1905 if a and b;
1906 run;
NOTE: There were 5391 observations read from the data set WORK.SCH_VERTICAL.
NOTE: There were 810540 observations read from the data set WORK.STD_VERTICAL.
NOTE: The data set WORK.STUDENT_SCHOOL has 804830 observations and 21 variables.
NOTE: Compressing data set WORK.STUDENT_SCHOOL decreased size by 12.13 percent.
Compressed is 14734 pages; un-compressed would require 16768 pages.
NOTE: DATA statement used (Total process time):
real time 2.83 seconds
user cpu time 0.67 seconds
system cpu time 1.09 seconds
memory 543.43k
OS Memory 18620.00k
Timestamp 01/21/2013 11:31:06 PM
1907
1908 proc sort data=student_school;
1909 by StdPseudoId year;
1910 run;
NOTE: There were 804830 observations read from the data set WORK.STUDENT_SCHOOL.
NOTE: The data set WORK.STUDENT_SCHOOL has 804830 observations and 21 variables.
NOTE: Compressing data set WORK.STUDENT_SCHOOL decreased size by 12.12 percent.
Compressed is 14736 pages; un-compressed would require 16768 pages.
NOTE: PROCEDURE SORT used (Total process time):
real time 5.08 seconds
user cpu time 1.65 seconds
system cpu time 1.13 seconds
memory 161260.60k
OS Memory 181428.00k
Timestamp 01/21/2013 11:31:11 PM
1911
1912 %macro transpose(libname_in=work,
1913 libname_out=work,
1914 filename_in =have,
1915 filename_out=want,
1916 by_variable=id,
1917 vars=ind1-ind4,
1918 prefix=,
1919 idlabel=date,
1920 idlabel_format=yymon7.,
1921 idlabel_prefix=,
1922 guessingrows=1000);
1923
1924 data _temp;
1925 set &libname_in..&filename_in. (obs=1 keep=&vars.);
1926 run;
1927
1928 %let vars_char="";
1929 %let varlist_char="";
1930 %let vars_num="";
1931 %let varlist_num="";
1932
1933 proc sql noprint;
1934 select name
1935 into :vars_char separated by " "
1936 from dictionary.columns
1937 where libname="WORK" and
1938 memname="_TEMP" and
1939 type="char"
1940 ;
1941 select name
1942 into :vars_num separated by " "
1943 from dictionary.columns
1944 where libname="WORK" and
1945 memname="_TEMP" and
1946 type="num"
1947 ;
1948
1949 %if &vars_char ne "" %then %do;
1950 select distinct
1951 %do i=1 %to %sysfunc(countw("&vars_char."));
1952 %if &i. lt %sysfunc(countw("&vars_char.")) %then %do;
1953 " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||
1954 "&idlabel_prefix."||
1955 put(&idlabel.,&idlabel_format)||
1956 %end;
1957 %else %do;
1958 " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||
1959 "&idlabel_prefix."||
1960 put(&idlabel.,&idlabel_format),
1961 %end;
1962 %end;
1963 &idlabel.
1964 into :varlist_char separated by " ",
1965 :junk
1966 from &libname_in..&filename_in. (obs=&guessingrows.)
1967 order by &idlabel.
1968 ;
1969 %let num_charvars=&sqlobs.;
1970 %end;
1971
1972 %if &vars_num ne "" %then %do;
1973 select distinct
1974 %do i=1 %to %sysfunc(countw("&vars_num."));
1975 %if &i. lt %sysfunc(countw("&vars_num.")) %then %do;
1976 " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||
1977 "&idlabel_prefix."||
1978 put(&idlabel.,&idlabel_format)||
1979 %end;
1980 %else %do;
1981 " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||
1982 "&idlabel_prefix."||
1983 put(&idlabel.,&idlabel_format),
1984 %end;
1985 %end;
1986 &idlabel.
1987 into :varlist_num separated by " ",
1988 :junk
1989 from &libname_in..&filename_in. (obs=&guessingrows.)
1990 order by &idlabel.
1991 ;
1992 %let num_numvars=&sqlobs.;
1993 %end;
1994
1995 create table _for_format as
1996 select distinct &idlabel. as start
1997 from &libname_in..&filename_in. (obs=&guessingrows.)
1998 order by &idlabel.
1999 ;
2000 %let num_numlabels=&sqlobs.;
2001
2002 quit;
2003
2004 data _for_format;
2005 set _for_format;
2006 retain fmtname "labelfmt" type "N";
2007 label=_n_-1;
2008 run;
2009
2010
2011 proc format cntlin = _for_format;
2012 run ;
2013
2014 filename sascode temp;
2015 data _null_;
2016 file sascode;
2017 length var $32;
2018 put "data &libname_out..&filename_out.;";
2019 put " set &libname_in..&filename_in.;";
2020 put " by &by_variable.;";
2021 %if &vars_char. ne "" %then %do;
2022 put " array want_char(*) $";
2023 %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_char.")));
2024 var=scan("&varlist_char.",&i.);
2025 put @7 var;
2026 %end;
2027 put @5 ";";
2028 put " array have_char(*) $ &vars_char.;";
2029 put " retain want_char;";
2030 put " if first.&by_variable. then call missing(of want_char(*));";
2031 put " _nchar=put(&idlabel.,labelfmt.)*dim(have_char);";
2032 put " do _i=1 to dim(have_char);";
2033 put " want_char(_nchar+_i)=have_char(_i);";
2034 put " end;";
2035 %end;
2036 %if &vars_num. ne "" %then %do;
2037 put " array want_num(*) ";
2038 %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_num.")));
2039 var=scan("&varlist_num.",&i.);
2040 put @7 var;
2041 %end;
2042 put @5 ";";
2043 put " array have_num(*) &vars_num.;";
2044 put " retain want_num;";
2045 put " if first.&by_variable. then call missing(of want_num(*));";
2046 put " _nnum=put(&idlabel.,labelfmt.)*dim(have_num);";
2047 put " do _i=1 to dim(have_num);";
2048 put " want_num(_nnum+_i)=have_num(_i);";
2049 put " end;";
2050 %end;
2051 put " drop &idlabel. _: &vars.;";
2052 put " if last.&by_variable. then output;";
2053 put "run;";
2054 run;
2055 %include sascode;
2056 %mend transpose;
2057
2058 options NOQUOTELENMAX;
2059 %transpose(filename_in =student_school,
2060 by_variable=StdPseudoId,
2061 vars=SchCode PctFRlunch--PctTotReclassYr,
2062 idlabel=year,
2063 idlabel_format=4.,
2064 guessingrows=1000)
NOTE: There were 1 observations read from the data set WORK.STUDENT_SCHOOL.
NOTE: The data set WORK._TEMP has 1 observations and 19 variables.
NOTE: Compressing data set WORK._TEMP increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
real time 0.10 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 349.37k
OS Memory 18620.00k
Timestamp 01/21/2013 11:31:11 PM
NOTE: No rows were selected.
NOTE: Compression was disabled for data set WORK._FOR_FORMAT because compression overhead would
increase the size of the data set.
NOTE: Table WORK._FOR_FORMAT created, with 9 rows and 1 columns.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.18 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 1446.71k
OS Memory 18620.00k
Timestamp 01/21/2013 11:31:11 PM
NOTE: There were 9 observations read from the data set WORK._FOR_FORMAT.
NOTE: The data set WORK._FOR_FORMAT has 9 observations and 4 variables.
NOTE: Compressing data set WORK._FOR_FORMAT increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 302.03k
OS Memory 18620.00k
Timestamp 01/21/2013 11:31:11 PM
NOTE: Format LABELFMT is already on the library.
NOTE: Format LABELFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 169.37k
OS Memory 18620.00k
Timestamp 01/21/2013 11:31:11 PM
NOTE: There were 9 observations read from the data set WORK._FOR_FORMAT.
NOTE: 186 records were written to the file SASCODE.
The minimum record length was 4.
The maximum record length was 249.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 2035.60k
OS Memory 20180.00k
Timestamp 01/21/2013 11:31:11 PM
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
2244:11
NOTE: There were 804830 observations read from the data set WORK.STUDENT_SCHOOL.
NOTE: The data set WORK.WANT has 139616 observations and 172 variables.
NOTE: Compressing data set WORK.WANT decreased size by 27.67 percent.
Compressed is 9181 pages; un-compressed would require 12694 pages.
NOTE: DATA statement used (Total process time):
real time 6.66 seconds
user cpu time 0.74 seconds
system cpu time 0.65 seconds
memory 660.89k
OS Memory 18620.00k
Timestamp 01/21/2013 11:31:18 PM
Hi.
How about this?
Using some of Jagadish's code:
(the 'vertical' data set that he created)
data long;
set std;
array sch(9) schcode02-schcode10;
do i = 1 to 9;
schcode=sch(i);
output;
end;
drop schcode02-schcode10 i;
run;
*sort by studentId so we can merge back with the student data;
proc sort data = std;by studentId;
proc sort data = long;by studentId;
data temp;
merge std long;
by studentId;
run;
*sort by school code, so we can merge;
proc sort data = temp;by schCode;
proc sort data = school;by schCode;
data want;
merge temp school;
by schCode;
drop schCode02--SchCode10;
if schCode ne .;
run;
proc sort data = want;by studentId;run;
Good luck!
Anca.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.