Help using Base SAS procedures

Combining two longitudinal datasets (one student-level and one school-level)

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Combining two longitudinal datasets (one student-level and one school-level)

Hi folks,

I would greatly appreciate help with the following task...

I have two longitudinal datasets containing data from one school district over the years 2002-2010.  One of the datasets contains student-level data for approximately 130,000 students (each with a unique ID). The other dataset contains school-level data for approximately 600 schools (each with a unique school code). I would like to combine the two datasets.

What I HAVE:

Data = StdData (see Table 1)

The dataset with student-level data includes columns identifying which schools a given student attended throughout the years. For example, student 100145 attended school 8543 from 2002 – 2007, and school 2455 from 2008 – 2010.

Data = SchData  (see Table 2)

The dataset with school-level data includes 7 variables per year (2002-2010), for a total of 63 variables:

PctFRlunch_02

AvgParEd_02

PctHisp_02

PctTotELs_02

PctELsSpan_02

PctELsPLS_02

PctELsL1_02

PctFRlunch_03

AvgParEd_03

PctHisp_03

PctTotELs_03

PctELsSpan_03

PctELsPLS_03

PctELsL1_03

(Etc. for each year through 2010, for a total of 63 variables)

What I WANT:

Data = want (see Table 3)

When the datasets are combined, each observation will contain data for one student, and year-specific data for the schools which that student attended each year.


---------------------------------------------------------------------

Table 1

Data = StdData

Student-level data - which school(s) each student attended each year (There are missing values when there are missing data, or when a given student is not enrolled in this school district.)

StudentID

SchCode02

SchCode03

SchCode04

SchCode05

SchCode06

SchCode07

SchCode08

SchCode09

SchCode10

100145

8543

8543

8543

8543

8543

8543

2455

2455

2455

100362

8543

8543

9652

3402

3402

3402

4656

4656

.

100595

.

.

3402

3402

3402

3402

6548

6548

.

100788

2104

2104

2104

2104

2104

.

.

.

.

100900

.

.

.

.

.

.

.

7502

7502

102044

3402

3402

3402

3402

3402

3402

3402

3402

3402

105305

.

4645

4645

4645

4645

4645

4645

4645

6548

108643

.

.

.

8543

8543

8543

8543

8543

8543

109224

3402

3402

2455

2455

2455

.

.

.

.

Etc.

Table 2

Data = SchData

Time-varying school-level variables (There are missing values when there are missing data for a given school-year, or if a given school did not exist in a given year.)

SchCodePctFRlunch_02AvgParEduc_02PctHisp_02PctTotELs_02PctELsSpan_02PctELsPLS_02PctELsL1_02PctFRlunch_03AvgParEduc_03PctHisp_03PctTotELs_03PctELsSpan_03PctELsPLS_03PctELsL1_03PctFRlunch_04AvgParEduc_04PctHisp_04PctTotELs_04PctELsSpan_04PctELsPLS_04PctELsL1_04  Etc. 
2104653.076.49092.2312100.00009.09.00100293.172757.429796.503532.876.991001.7586.219157.950593.292728.6628.66
2455912.2589.389941.379397.435997.44.007427.78212.918386.666713.33.00772.686.57373.585777.7778.00.00
34021001.6599.229367.2447100.000061.0319.7792288.917544.329995.348858.72.00912.3289.682545.767295.953857.2357.23
4645513.0937.918919.224076.146867.89.00100298.842863.452399.848051.9819.451001.6398.445165.1118100.000068.6668.66
4656992.3095.489768.427896.798579.47.0051339.388123.135880.165341.32.00573.1735.497022.109576.146826.6126.61
65481001.9978.992462.832799.243690.17.0099295.596667.329597.257463.71.00982.2896.648066.480497.268924.1624.16
75021001.7991.560470.657596.805690.69.00100282.358866.303399.252690.28.001002.3483.263270.526399.70158.668.66
85431001.7392.800076.2909100.000065.1129.55100290.255464.333096.176585.29.00961.8490.690762.462595.673170.3570.35
9652842.6074.451144.311485.585677.93.00100291.534080.3184100.000074.0521.981001.7592.831881.1214100.000061.8561.85
Etc.

Table 3

Data = want

StudentIDSchCode02SchCode03SchCode04SchCode05SchCode06SchCode07SchCode08SchCode09SchCode10PctFRlunch_02AvgParEduc_02PctHisp_02PctTotELs_02PctELsSpan_02PctELsPLS_02PctELsL1_02PctFRlunch_03AvgParEduc_03PctHisp_03PctTotELs_03PctELsSpan_03PctELsPLS_03PctELsL1_03PctFRlunch_04AvgParEduc_04PctHisp_04PctTotELs_04PctELsSpan_04PctELsPLS_04PctELsL1_04   Etc.
1001458543854385438543854385432455245524551001.7392.800076.2909100.000065.1129.55100290.255464.333096.176585.29.00961.8490.690762.462595.673170.3570.35
10036285438543965234023402340246564656.1001.7392.800076.2909100.000065.1129.55100290.255464.333096.176585.29.001001.7592.831881.1214100.000061.8561.85
100595..340234023402340265486548...............912.3289.682545.767295.953857.2357.23
10078821042104210421042104....653.076.49092.2312100.00009.09.00100293.172757.429796.503532.876.991001.7586.219157.950593.292728.6628.66
100900.......75027502.....................
1020443402340234023402340234023402340234021001.6599.229367.2447100.000061.0319.7792288.917544.329995.348858.72.00912.3289.682545.767295.953857.2357.23
105305.46454645464546454645464546456548.......100298.842863.452399.848051.9819.451001.6398.445165.1118100.000068.6668.66
108643...854385438543854385438543.....................
10922434023402245524552455....1001.6599.229367.2447100.000061.0319.7792288.917544.329995.348858.72.00772.686.57373.585777.7778.00.00
Etc.

Accepted Solutions
Solution
‎01-20-2013 08:20 PM
PROC Star
Posts: 7,363

Re: Combining two longitudinal datasets (one student-level and one school-level)

I think that Jagadish simply didn't carry the idea out far enough.  The following uses a macro that I and some others just finished developing and will be presenting at some upcoming SAS conferences.

The bulk of the work was done in making the file tall.  The macro is simply a way of doing the transpose in one step.  The only difference between its result and your desired result is the order of the output variables.  You could use proc transpose do make the file wide, but that would require a number of steps.

Let me know if this works for you, as I'm just now starting to try to get some feedback regarding the macro.

proc import datafile="c:\art\location.xls" out=StdData

    dbms=excel replace;

    getnames=yes;

    sheet=sheet1;

run;

proc import datafile="c:\art\location.xls" out=SchData

    dbms=excel replace;

    getnames=yes;

    sheet=sheet2;

run;

data std_vertical;

  set StdData;

  array sch(*) schcode02-schcode10;

  do i = 1 to 9;

    schcode=sch(i);

    year=2001+i;

    if not missing(schcode) then output;

  end;

  drop schcode02-schcode10 i;

run;

proc sort data=std_vertical;

    by schcode year;

run;

data sch_vertical (keep=schcode

                        year

                        pctFRlunch

                        AvgParEduc

                        PctHisp

                        PctTotELs

                        PctELSpan

                        PctELsPLS

                        PctELsL1);

  set SchData;

  array pctFRlunc(*) pctFRlunch_02-pctFRlunch_04; /*-pctFRlunch_10*/;

  array AvgParEdu(*) AvgParEduc_02-AvgParEduc_04; /*-AvgParEduc_10*/;

  array PctHis(*) PctHisp_02-PctHisp_04; /*-PctHisp_10*/;

  array PctTotEL(*) PctTotELs_02-PctTotELs_04; /*-PctTotELs_10*/;

  array PctELSpa(*) PctELSpan_02-PctELSpan_04; /*-PctELSpan_10*/;

  array PctELsPL(*) PctELsPLS_02-PctELsPLS_04; /*-PctELsPLS_10*/;

  array PctELsL(*) PctELsL1_02-PctELsL1_04; /*-PctELsL1_10*/;

  do i = 1 to 3; /*9*/

    pctFRlunch=pctFRlunc(i);

  AvgParEduc=AvgParEdu(i);

    PctHisp=PctHis(i);

    PctTotELs=PctTotEL(i);

    PctELSpan=PctELSpa(i);

    PctELsPLS=PctELsPL(i);

    PctELsL1=PctELsL(i);

    year=2001+i;

    output;

  end;

run;

proc sort data=sch_vertical;

    by schcode year;

run;

data student_school;

    merge sch_vertical (in=a) Std_vertical (in=b);

    by schcode year;

    if a and b;

run;

proc sort data=student_school;

  by studentID year;

run;

%macro transpose(libname_in=work,

                 libname_out=work,

                 filename_in =have,

                 filename_out=want,

                 by_variable=id,

                 vars=ind1-ind4,

                 prefix=,

                 idlabel=date,

                 idlabel_format=yymon7.,

                 idlabel_prefix=,

                 guessingrows=1000);

  data _temp;

    set &libname_in..&filename_in. (obs=1 keep=&vars.);

  run;

  %let vars_char="";

  %let varlist_char="";

  %let vars_num="";

  %let varlist_num="";

  proc sql noprint;

    select name

      into :vars_char separated by " "

        from dictionary.columns

          where libname="WORK" and

              memname="_TEMP" and

              type="char"

    ;

    select name

      into :vars_num separated by " "

        from dictionary.columns

          where libname="WORK" and

              memname="_TEMP" and

              type="num"

    ;

  %if &vars_char ne "" %then %do;

    select distinct

    %do i=1 %to %sysfunc(countw("&vars_char."));

      %if &i. lt %sysfunc(countw("&vars_char.")) %then %do;

        " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||

          "&idlabel_prefix."||

        put(&idlabel.,&idlabel_format)||

      %end;

      %else %do;

        " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||

          "&idlabel_prefix."||

        put(&idlabel.,&idlabel_format),

      %end;

    %end;

        &idlabel.

          into :varlist_char separated by " ",

               :junk

        from &libname_in..&filename_in. (obs=&guessingrows.)

          order by &idlabel.

    ;

    %let num_charvars=&sqlobs.;

  %end;

  %if &vars_num ne "" %then %do;

    select distinct

    %do i=1 %to %sysfunc(countw("&vars_num."));

      %if &i. lt %sysfunc(countw("&vars_num.")) %then %do;

        " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||

          "&idlabel_prefix."||

        put(&idlabel.,&idlabel_format)||

      %end;

      %else %do;

        " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||

          "&idlabel_prefix."||

        put(&idlabel.,&idlabel_format),

      %end;

    %end;

        &idlabel.

          into :varlist_num separated by " ",

               :junk

        from &libname_in..&filename_in. (obs=&guessingrows.)

          order by &idlabel.

    ;

    %let num_numvars=&sqlobs.;

  %end;

  create table _for_format as

    select distinct &idlabel. as start

      from &libname_in..&filename_in. (obs=&guessingrows.)

        order by &idlabel.

    ;

  %let num_numlabels=&sqlobs.;

  quit;

  data _for_format;

    set _for_format;

    retain fmtname "labelfmt" type "N";

    label=_n_-1;

  run;

  proc format cntlin = _for_format;

  run ;

  filename sascode temp;

  data _null_;

    file sascode;

    length var $32;

    put  "data &libname_out..&filename_out.;";

    put  "  set &libname_in..&filename_in.;";

    put  "  by &by_variable.;";

    %if &vars_char. ne "" %then %do;

      put  "    array want_char(*) $";

      %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_char.")));

        var=scan("&varlist_char.",&i.);

        put  @7 var;

      %end;

      put @5 ";";

      put  "    array have_char(*) $ &vars_char.;";

      put  "    retain want_char;";

      put  "    if first.&by_variable. then call missing(of want_char(*));";

      put  "    _nchar=put(&idlabel.,labelfmt.)*dim(have_char);";

      put  "    do _i=1 to dim(have_char);";

      put  "      want_char(_nchar+_i)=have_char(_i);";

      put  "    end;";

    %end;

    %if &vars_num. ne "" %then %do;

      put  "    array want_num(*) ";

      %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_num.")));

        var=scan("&varlist_num.",&i.);

        put  @7 var;

      %end;

      put @5 ";";

      put  "    array have_num(*) &vars_num.;";

      put  "    retain want_num;";

      put  "    if first.&by_variable. then call missing(of want_num(*));";

      put  "    _nnum=put(&idlabel.,labelfmt.)*dim(have_num);";

      put  "    do _i=1 to dim(have_num);";

      put  "      want_num(_nnum+_i)=have_num(_i);";

      put  "    end;";

    %end;

      put  "  drop &idlabel. _: &vars.;";

      put  "  if last.&by_variable. then output;";

      put  "run;";

  run;

  %include sascode;

%mend transpose;

options NOQUOTELENMAX;

%transpose(filename_in =student_school,

           by_variable=StudentId,

           vars=SchCode pctFRlunch--PctELsL1,

           idlabel=year,

           idlabel_format=4.,

           guessingrows=1000)

View solution in original post


All Replies
Trusted Advisor
Posts: 1,128

Re: Combining two longitudinal datasets (one student-level and one school-level)

Hi,

Please use the below code to get the output you desire. I have given the comments which code does what. Hope it will help you to understand the code.

Please try and let me know if it works.

%*------------------------------------------------------------*;

%* to import the data from xls, incase data is

               is in the excel format*;

%*------------------------------------------------------------*;

proc import datafile="location.xls" out=StdData

    dbms=excel replace;

    getnames=yes;

    sheet=sheet1;

run;

proc import datafile="location.xls" out=SchData

    dbms=excel replace;

    getnames=yes;

    sheet=sheet2;

run;

%*------------------------------------------------------------*;

%* to convert the horizontal student data to vertical *;

%*------------------------------------------------------------*;

data vertical;

    set StdData;

    array sch(9) schcode02-schcode10;

    do i = 1 to 9;

    schcode=sch(i);

    output;

    end;

    drop schcode02-schcode10 i;

run;

%*------------------------------------------------------------*;

%* to remove the duplicate records from vertical*;

%*------------------------------------------------------------*;

proc sort data=vertical nodup;

    by schcode;

run;

proc sort data=SchData;

    by schcode;

run;

%*------------------------------------------------------------*;

%* to get the records of school from schdata *;

%*------------------------------------------------------------*;

data student_school(rename=(schcode=schcode02));

    merge vertical(in=a) SchData(in=b);

    by schcode;

    if a;

run;

%*------------------------------------------------------------*;

%* now we have the studentid with specific school codes *;

%*------------------------------------------------------------*;

proc sort data=StdData out=StdData;

    by studentid schcode02;

run;

proc sort data=student_school out=student_school_;

    by studentid schcode02;

run;

%*------------------------------------------------------------*;

%* to get the final dataset *;

%*------------------------------------------------------------*;

data want;

    merge StdData(in=a) student_school_(in=b);

    by studentid schcode02;

    if a;

run;

%*------------------------------------------------------------*;

%* end *;

%*------------------------------------------------------------*;

Thanks,

Jagadish

Thanks,
Jag
Occasional Contributor
Posts: 7

Re: Combining two longitudinal datasets (one student-level and one school-level)

Thanks very much for your response. I appreciate it.  There was a problem with the code.  Everything ran without error; however, the resulting file is incorrect. The school-level data is incorrect.  In the final dataset, there are a lot of missing data in places where the data are not actually missing (in the original file).  And the school-level data that is there, is incorrect.

I think at least one problem is with the vertical step and the subsequent elimination of duplicates. It seems this step doesn't account for missing data.

Thoughts?

Solution
‎01-20-2013 08:20 PM
PROC Star
Posts: 7,363

Re: Combining two longitudinal datasets (one student-level and one school-level)

I think that Jagadish simply didn't carry the idea out far enough.  The following uses a macro that I and some others just finished developing and will be presenting at some upcoming SAS conferences.

The bulk of the work was done in making the file tall.  The macro is simply a way of doing the transpose in one step.  The only difference between its result and your desired result is the order of the output variables.  You could use proc transpose do make the file wide, but that would require a number of steps.

Let me know if this works for you, as I'm just now starting to try to get some feedback regarding the macro.

proc import datafile="c:\art\location.xls" out=StdData

    dbms=excel replace;

    getnames=yes;

    sheet=sheet1;

run;

proc import datafile="c:\art\location.xls" out=SchData

    dbms=excel replace;

    getnames=yes;

    sheet=sheet2;

run;

data std_vertical;

  set StdData;

  array sch(*) schcode02-schcode10;

  do i = 1 to 9;

    schcode=sch(i);

    year=2001+i;

    if not missing(schcode) then output;

  end;

  drop schcode02-schcode10 i;

run;

proc sort data=std_vertical;

    by schcode year;

run;

data sch_vertical (keep=schcode

                        year

                        pctFRlunch

                        AvgParEduc

                        PctHisp

                        PctTotELs

                        PctELSpan

                        PctELsPLS

                        PctELsL1);

  set SchData;

  array pctFRlunc(*) pctFRlunch_02-pctFRlunch_04; /*-pctFRlunch_10*/;

  array AvgParEdu(*) AvgParEduc_02-AvgParEduc_04; /*-AvgParEduc_10*/;

  array PctHis(*) PctHisp_02-PctHisp_04; /*-PctHisp_10*/;

  array PctTotEL(*) PctTotELs_02-PctTotELs_04; /*-PctTotELs_10*/;

  array PctELSpa(*) PctELSpan_02-PctELSpan_04; /*-PctELSpan_10*/;

  array PctELsPL(*) PctELsPLS_02-PctELsPLS_04; /*-PctELsPLS_10*/;

  array PctELsL(*) PctELsL1_02-PctELsL1_04; /*-PctELsL1_10*/;

  do i = 1 to 3; /*9*/

    pctFRlunch=pctFRlunc(i);

  AvgParEduc=AvgParEdu(i);

    PctHisp=PctHis(i);

    PctTotELs=PctTotEL(i);

    PctELSpan=PctELSpa(i);

    PctELsPLS=PctELsPL(i);

    PctELsL1=PctELsL(i);

    year=2001+i;

    output;

  end;

run;

proc sort data=sch_vertical;

    by schcode year;

run;

data student_school;

    merge sch_vertical (in=a) Std_vertical (in=b);

    by schcode year;

    if a and b;

run;

proc sort data=student_school;

  by studentID year;

run;

%macro transpose(libname_in=work,

                 libname_out=work,

                 filename_in =have,

                 filename_out=want,

                 by_variable=id,

                 vars=ind1-ind4,

                 prefix=,

                 idlabel=date,

                 idlabel_format=yymon7.,

                 idlabel_prefix=,

                 guessingrows=1000);

  data _temp;

    set &libname_in..&filename_in. (obs=1 keep=&vars.);

  run;

  %let vars_char="";

  %let varlist_char="";

  %let vars_num="";

  %let varlist_num="";

  proc sql noprint;

    select name

      into :vars_char separated by " "

        from dictionary.columns

          where libname="WORK" and

              memname="_TEMP" and

              type="char"

    ;

    select name

      into :vars_num separated by " "

        from dictionary.columns

          where libname="WORK" and

              memname="_TEMP" and

              type="num"

    ;

  %if &vars_char ne "" %then %do;

    select distinct

    %do i=1 %to %sysfunc(countw("&vars_char."));

      %if &i. lt %sysfunc(countw("&vars_char.")) %then %do;

        " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||

          "&idlabel_prefix."||

        put(&idlabel.,&idlabel_format)||

      %end;

      %else %do;

        " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||

          "&idlabel_prefix."||

        put(&idlabel.,&idlabel_format),

      %end;

    %end;

        &idlabel.

          into :varlist_char separated by " ",

               :junk

        from &libname_in..&filename_in. (obs=&guessingrows.)

          order by &idlabel.

    ;

    %let num_charvars=&sqlobs.;

  %end;

  %if &vars_num ne "" %then %do;

    select distinct

    %do i=1 %to %sysfunc(countw("&vars_num."));

      %if &i. lt %sysfunc(countw("&vars_num.")) %then %do;

        " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||

          "&idlabel_prefix."||

        put(&idlabel.,&idlabel_format)||

      %end;

      %else %do;

        " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||

          "&idlabel_prefix."||

        put(&idlabel.,&idlabel_format),

      %end;

    %end;

        &idlabel.

          into :varlist_num separated by " ",

               :junk

        from &libname_in..&filename_in. (obs=&guessingrows.)

          order by &idlabel.

    ;

    %let num_numvars=&sqlobs.;

  %end;

  create table _for_format as

    select distinct &idlabel. as start

      from &libname_in..&filename_in. (obs=&guessingrows.)

        order by &idlabel.

    ;

  %let num_numlabels=&sqlobs.;

  quit;

  data _for_format;

    set _for_format;

    retain fmtname "labelfmt" type "N";

    label=_n_-1;

  run;

  proc format cntlin = _for_format;

  run ;

  filename sascode temp;

  data _null_;

    file sascode;

    length var $32;

    put  "data &libname_out..&filename_out.;";

    put  "  set &libname_in..&filename_in.;";

    put  "  by &by_variable.;";

    %if &vars_char. ne "" %then %do;

      put  "    array want_char(*) $";

      %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_char.")));

        var=scan("&varlist_char.",&i.);

        put  @7 var;

      %end;

      put @5 ";";

      put  "    array have_char(*) $ &vars_char.;";

      put  "    retain want_char;";

      put  "    if first.&by_variable. then call missing(of want_char(*));";

      put  "    _nchar=put(&idlabel.,labelfmt.)*dim(have_char);";

      put  "    do _i=1 to dim(have_char);";

      put  "      want_char(_nchar+_i)=have_char(_i);";

      put  "    end;";

    %end;

    %if &vars_num. ne "" %then %do;

      put  "    array want_num(*) ";

      %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_num.")));

        var=scan("&varlist_num.",&i.);

        put  @7 var;

      %end;

      put @5 ";";

      put  "    array have_num(*) &vars_num.;";

      put  "    retain want_num;";

      put  "    if first.&by_variable. then call missing(of want_num(*));";

      put  "    _nnum=put(&idlabel.,labelfmt.)*dim(have_num);";

      put  "    do _i=1 to dim(have_num);";

      put  "      want_num(_nnum+_i)=have_num(_i);";

      put  "    end;";

    %end;

      put  "  drop &idlabel. _: &vars.;";

      put  "  if last.&by_variable. then output;";

      put  "run;";

  run;

  %include sascode;

%mend transpose;

options NOQUOTELENMAX;

%transpose(filename_in =student_school,

           by_variable=StudentId,

           vars=SchCode pctFRlunch--PctELsL1,

           idlabel=year,

           idlabel_format=4.,

           guessingrows=1000)

Occasional Contributor
Posts: 7

Re: Combining two longitudinal datasets (one student-level and one school-level)

Thank you very much for this.  Again, I really appreciate it.

This code worked in that within the ‘want’ file, students are matched up with the correct, year-specific school variables.  This worked very well.  However, there are only 81,687 observations (students) in the ‘want’ file, and in the original ‘StdData’ file there are 139,909 students.  Also, in the ‘want’ file only the school variables 2002 through 2004 are included (rather than through 2010).

The error(s) could be mine.  I created several more school variables and included them in the ‘SchData’ file.  I modified your code to include these variables. The only other changes I made were to call the unique student identifier “StdPseudoId” rather than “StudentId” as this is how it is in the original file (I mistakenly wrote ‘StudentId’ in my posted question); and a couple of other variable names were slightly different in my file from what I wrote in the post, so I changed the code in those places as well. It does not seem that variable names are the issue, though, because for 2002, 2003, 2004 all variables are perfect in the ‘want’ file.

Could you take a look at my log (below) and let me know if you understand the error?  Thanks very much for your help.

NOTE: The import data set has 139909 observations and 10 variables.

NOTE: Compressing data set WORK.STDDATA decreased size by 17.89 percent.

Compressed is 1138 pages; un-compressed would require 1386 pages.

NOTE: WORK.STDDATA data set was successfully created.

NOTE: PROCEDURE IMPORT used (Total process time):

real time           0.35 seconds

user cpu time       0.04 seconds

system cpu time     0.20 seconds

memory              225.01k

OS Memory           18108.00k

Timestamp           01/21/2013 05:06:15 PM

NOTE: PROCEDURE CONTENTS used (Total process time):

real time           0.06 seconds

user cpu time       0.06 seconds

system cpu time     0.01 seconds

memory              296.54k

OS Memory           18108.00k

Timestamp           01/21/2013 05:06:22 PM

NOTE: The import data set has 599 observations and 170 variables.

NOTE: Compressing data set WORK.SCHDATA decreased size by 17.31 percent.

Compressed is 43 pages; un-compressed would require 52 pages.

NOTE: WORK.SCHDATA data set was successfully created.

NOTE: PROCEDURE IMPORT used (Total process time):

real time           0.05 seconds

user cpu time       0.03 seconds

system cpu time     0.01 seconds

memory              575.53k

OS Memory           18108.00k

Timestamp           01/21/2013 05:06:41 PM

     PROC CONTENTS DATA = StdData VARNUM; run;

     PROC CONTENTS DATA = SchData VARNUM; run;

NOTE: PROCEDURE CONTENTS used (Total process time):

real time           0.16 seconds

user cpu time       0.06 seconds

system cpu time     0.03 seconds

memory              416.18k

OS Memory           18108.00k

Timestamp           01/21/2013 05:06:49 PM

509 data std_vertical;

510 set StdData;

511 array sch(*) schcode02-schcode10;

512 do i = 1 to 9;

513 schcode=sch(i);

514 year=2001+i;

515 if not missing(schcode) then output;

516 end;

517 drop schcode02-schcode10 i;

518 run;

NOTE: There were 139909 observations read from the data set WORK.STDDATA.

NOTE: The data set WORK.STD_VERTICAL has 810540 observations and 3 variables.

NOTE: Compressing data set WORK.STD_VERTICAL increased size by 27.96 percent.

Compressed is 6174 pages; un-compressed would require 4825 pages.

NOTE: DATA statement used (Total process time):

real time           0.91 seconds

user cpu time       0.56 seconds

system cpu time     0.25 seconds

memory              301.56k

OS Memory           18108.00k

Timestamp           01/21/2013 05:07:53 PM

519

520 proc sort data=std_vertical;

521 by schcode year;

522 run;

NOTE: There were 810540 observations read from the data set WORK.STD_VERTICAL.

NOTE: The data set WORK.STD_VERTICAL has 810540 observations and 3 variables.

NOTE: Compressing data set WORK.STD_VERTICAL increased size by 27.96 percent.

Compressed is 6174 pages; un-compressed would require 4825 pages.

NOTE: PROCEDURE SORT used (Total process time):

real time           11.29 seconds

user cpu time       1.88 seconds

system cpu time     1.54 seconds

memory              45409.18k

OS Memory           64188.00k

Timestamp           01/21/2013 05:08:04 PM

523

524 data sch_vertical (keep=schcode

525 year

526 PctFRlunch

527 PctRespParEd

528 AvgParEd

529 SCI

530 TotEnr

531 AfrAmerPct

532 HispPct

533 WhitePct

534 PctTotELs

535 PctELsSpan

536 PctELsELDorSDAIE

537 PctELsPLS

538 PctELsL1

539 SpanTstoSsRatio

540 SpanAidestoSsRatio

541 SpanTsandAidestoSsRatio

542 EngTstoSsRatio

543 PctTotReclassYr);

544 set SchData;

545 array PctFRlunc(*) PctFRlunch_02-PctFRlunch_04; /*-PctFRlunch_10*/;

546 array PctRespParE(*) PctRespParEd_02-PctRespParEd_04; /*-PctRespParEd_10*/;

547 array AvgParE(*) AvgParEd_02-AvgParEd_04; /*-AvgParEd_10*/;

548 array SC(*) SCI_02-SCI_04; /*-SCI_10*/;

549 array TotEn(*) TotEnr_02-TotEnr_04; /*-TotEnr_10*/;

550   array AfrAmerPc(*) AfrAmerPct_02-AfrAmerPct_04; /*-AfrAmerPct_10*/;

551 array HispPc(*) HispPct_02-HispPct_04; /*-HispPct_10*/;

552 array WhitePc(*) WhitePct_02-WhitePct_04; /*-WhitePct_10*/;

553 array PctTotEL(*) PctTotELs_02-PctTotELs_04; /*-PctTotELs_10*/;

554 array PctELsSpa(*) PctELsSpan_02-PctELsSpan_04; /*-PctELsSpan_10*/;

555 array PctELsELDorSDAI(*) PctELsELDorSDAIE_02-PctELsELDorSDAIE_04; /*-PctELsELDorSDAIE_10*/;

556 array PctELsPL(*) PctELsPLS_02-PctELsPLS_04; /*-PctELsPLS_10*/;

557 array PctELsL(*) PctELsL1_02-PctELsL1_04; /*-PctELsL1_10*/;

558 array SpanTstoSsRati(*) SpanTstoSsRatio_02-SpanTstoSsRatio_04; /*-SpanTstoSsRatio_10*/;

559 array SpanAidestoSsRati(*) SpanAidestoSsRatio_02-SpanAidestoSsRatio_04;

559! /*-SpanAidestoSsRatio_10*/;

560 array SpanTsandAidestoSsRati(*) SpanTsandAidestoSsRatio_02-SpanTsandAidestoSsRatio_04;

560! /*-SpanTsandAidestoSsRatio_10*/;

561 array EngTstoSsRati(*) EngTstoSsRatio_02-EngTstoSsRatio_04; /*-EngTstoSsRatio_10*/;

562 array PctTotReclassY(*) PctTotReclassYr_02-PctTotReclassYr_04; /*-PctTotReclassYr_10*/;

563 do i = 1 to 3; /*9*/

564 PctFRlunch=PctFRlunc(i);

565 PctRespParEd=PctRespParE(i);

566 AvgParEd=AvgParE(i);

567 SCI=SC(i);

568 TotEnr=TotEn(i);

569 AfrAmerPct=AfrAmerPc(i);

570 HispPct=HispPc(i);

571 WhitePct=WhitePc(i);

572 PctTotELs=PctTotEL(i);

573 PctELsSpan=PctELsSpa(i);

574 PctELsELDorSDAIE=PctELsELDorSDAI(i);

575 PctELsPLS=PctELsPL(i);

576 PctELsL1=PctELsL(i);

577 SpanTstoSsRatio=SpanTstoSsRati(i);

578 SpanAidestoSsRatio=SpanAidestoSsRati(i);

579 SpanTsandAidestoSsRatio=SpanTsandAidestoSsRati(i);

580 EngTstoSsRatio=EngTstoSsRati(i);

581 PctTotReclassYr=PctTotReclassY(i);

582 year=2001+i;

583 output;

584 end;

585 run;

NOTE: There were 599 observations read from the data set WORK.SCHDATA.

NOTE: The data set WORK.SCH_VERTICAL has 1797 observations and 20 variables.

NOTE: Compressing data set WORK.SCH_VERTICAL decreased size by 11.11 percent.

Compressed is 32 pages; un-compressed would require 36 pages.

NOTE: DATA statement used (Total process time):

real time           0.14 seconds

user cpu time       0.04 seconds

system cpu time     0.07 seconds

memory              606.57k

OS Memory           18364.00k

Timestamp           01/21/2013 05:08:04 PM

586

587 proc sort data=sch_vertical;

588 by schcode year;

589 run;

NOTE: There were 1797 observations read from the data set WORK.SCH_VERTICAL.

NOTE: The data set WORK.SCH_VERTICAL has 1797 observations and 20 variables.

NOTE: Compressing data set WORK.SCH_VERTICAL decreased size by 11.11 percent.

Compressed is 32 pages; un-compressed would require 36 pages.

NOTE: PROCEDURE SORT used (Total process time):

real time           0.02 seconds

user cpu time       0.01 seconds

system cpu time     0.00 seconds

memory              561.84k

OS Memory           18364.00k

Timestamp           01/21/2013 05:08:04 PM

590

591 data student_school;

592 merge sch_vertical (in=a) Std_vertical (in=b);

593 by schcode year;

594 if a and b;

595 run;

NOTE: There were 1797 observations read from the data set WORK.SCH_VERTICAL.

NOTE: There were 810540 observations read from the data set WORK.STD_VERTICAL.

NOTE: The data set WORK.STUDENT_SCHOOL has 170700 observations and 21 variables.

NOTE: Compressing data set WORK.STUDENT_SCHOOL decreased size by 9.28 percent.

Compressed is 3227 pages; un-compressed would require 3557 pages.

NOTE: DATA statement used (Total process time):

real time           0.81 seconds

user cpu time       0.48 seconds

system cpu time     0.28 seconds

memory              543.12k

OS Memory           18364.00k

Timestamp           01/21/2013 05:08:05 PM

596

597 proc sort data=student_school;

598 by StdPseudoId year;

599 run;

NOTE: There were 170700 observations read from the data set WORK.STUDENT_SCHOOL.

NOTE: The data set WORK.STUDENT_SCHOOL has 170700 observations and 21 variables.

NOTE: Compressing data set WORK.STUDENT_SCHOOL decreased size by 9.28 percent.

Compressed is 3227 pages; un-compressed would require 3557 pages.

NOTE: PROCEDURE SORT used (Total process time):

real time           1.43 seconds

user cpu time       0.51 seconds

system cpu time     0.51 seconds

memory              34453.26k

OS Memory           51388.00k

Timestamp           01/21/2013 05:08:07 PM

600

601 %macro transpose(libname_in=work,

602                   libname_out=work,

603                   filename_in =have,

604                   filename_out=want,

605                   by_variable=id,

606                   vars=ind1-ind4,

607                   prefix=,

608                   idlabel=date,

609                   idlabel_format=yymon7.,

610                   idlabel_prefix=,

611                   guessingrows=1000);

612

613 data _temp;

614 set &libname_in..&filename_in. (obs=1 keep=&vars.);

615 run;

616

617 %let vars_char="";

618 %let varlist_char="";

619 %let vars_num="";

620 %let varlist_num="";

621

622 proc sql noprint;

623 select name

624 into :vars_char separated by " "

625          from dictionary.columns

626            where libname="WORK" and

627                memname="_TEMP" and

628                type="char"

629 ;

630 select name

631 into :vars_num separated by " "

632          from dictionary.columns

633            where libname="WORK" and

634                memname="_TEMP" and

635                type="num"

636 ;

637

638 %if &vars_char ne "" %then %do;

639 select distinct

640 %do i=1 %to %sysfunc(countw("&vars_char."));

641 %if &i. lt %sysfunc(countw("&vars_char.")) %then %do;

642          " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||

643            "&idlabel_prefix."||

644          put(&idlabel.,&idlabel_format)||

645 %end;

646 %else %do;

647          " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||

648            "&idlabel_prefix."||

649 put(&idlabel.,&idlabel_format),

650 %end;

651 %end;

652          &idlabel.

653            into :varlist_char separated by " ",

654                 :junk

655          from &libname_in..&filename_in. (obs=&guessingrows.)

656            order by &idlabel.

657 ;

658 %let num_charvars=&sqlobs.;

659 %end;

660

661 %if &vars_num ne "" %then %do;

662 select distinct

663 %do i=1 %to %sysfunc(countw("&vars_num."));

664 %if &i. lt %sysfunc(countw("&vars_num.")) %then %do;

665          " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||

666            "&idlabel_prefix."||

667 put(&idlabel.,&idlabel_format)||

668 %end;

669 %else %do;

670          " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||

671            "&idlabel_prefix."||

672          put(&idlabel.,&idlabel_format),

673 %end;

674 %end;

675          &idlabel.

676            into :varlist_num separated by " ",

677                 :junk

678          from &libname_in..&filename_in. (obs=&guessingrows.)

679            order by &idlabel.

680 ;

681 %let num_numvars=&sqlobs.;

682 %end;

683

684 create table _for_format as

685 select distinct &idlabel. as start

686 from &libname_in..&filename_in. (obs=&guessingrows.)

687          order by &idlabel.

688 ;

689 %let num_numlabels=&sqlobs.;

690

691 quit;

692

693 data _for_format;

694 set _for_format;

695 retain fmtname "labelfmt" type "N";

696 label=_n_-1;

697 run;

698

699

700 proc format cntlin = _for_format;

701 run ;

702

703 filename sascode temp;

704 data _null_;

705 file sascode;

706 length var $32;

707 put  "data &libname_out..&filename_out.;";

708 put  "  set &libname_in..&filename_in.;";

709 put  "  by &by_variable.;";

710 %if &vars_char. ne "" %then %do;

711 put  "    array want_char(*) $";

712 %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_char.")));

713 var=scan("&varlist_char.",&i.);

714          put @7 var;

715 %end;

716 put @5 ";";

717 put  "    array have_char(*) $ &vars_char.;";

718 put  "    retain want_char;";

719 put  "    if first.&by_variable. then call missing(of want_char(*));";

720 put  "    _nchar=put(&idlabel.,labelfmt.)*dim(have_char);";

721 put  "    do _i=1 to dim(have_char);";

722 put  " want_char(_nchar+_i)=have_char(_i);";

723 put  "    end;";

724 %end;

725 %if &vars_num. ne "" %then %do;

726 put  "    array want_num(*) ";

727        %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_num.")));

728 var=scan("&varlist_num.",&i.);

729          put @7 var;

730 %end;

731 put @5 ";";

732 put  "    array have_num(*) &vars_num.;";

733 put  "    retain want_num;";

734 put  "    if first.&by_variable. then call missing(of want_num(*));";

735 put  " _nnum=put(&idlabel.,labelfmt.)*dim(have_num);";

736 put  "    do _i=1 to dim(have_num);";

737 put  "      want_num(_nnum+_i)=have_num(_i);";

738 put  "    end;";

739 %end;

740 put  "  drop &idlabel. _: &vars.;";

741 put  "  if last.&by_variable. then output;";

742 put  "run;";

743 run;

744 %include sascode;

745 %mend transpose;

746

747 options NOQUOTELENMAX;

748 %transpose(filename_in =student_school,

749             by_variable=StdPseudoId,

750             vars=SchCode PctFRlunch--PctTotReclassYr,

751             idlabel=year,

752             idlabel_format=4.,

753             guessingrows=1000)

NOTE: There were 1 observations read from the data set WORK.STUDENT_SCHOOL.

NOTE: The data set WORK._TEMP has 1 observations and 19 variables.

NOTE: Compressing data set WORK._TEMP increased size by 100.00 percent.

Compressed is 2 pages; un-compressed would require 1 pages.

NOTE: DATA statement used (Total process time):

real time           0.22 seconds

user cpu time       0.00 seconds

system cpu time     0.01 seconds

memory              349.00k

OS Memory           18364.00k

Timestamp           01/21/2013 05:08:08 PM

NOTE: No rows were selected.

NOTE: Compression was disabled for data set WORK._FOR_FORMAT because compression overhead would

increase the size of the data set.

NOTE: Table WORK._FOR_FORMAT created, with 3 rows and 1 columns.

NOTE: PROCEDURE SQL used (Total process time):

real time           0.20 seconds

user cpu time       0.07 seconds

system cpu time     0.03 seconds

memory              1460.71k

OS Memory           18364.00k

Timestamp           01/21/2013 05:08:09 PM

NOTE: There were 3 observations read from the data set WORK._FOR_FORMAT.

NOTE: The data set WORK._FOR_FORMAT has 3 observations and 4 variables.

NOTE: Compressing data set WORK._FOR_FORMAT increased size by 100.00 percent.

Compressed is 2 pages; un-compressed would require 1 pages.

NOTE: DATA statement used (Total process time):

real time           0.04 seconds

user cpu time       0.01 seconds

system cpu time     0.00 seconds

memory              299.71k

OS Memory           18364.00k

Timestamp           01/21/2013 05:08:09 PM

NOTE: Format LABELFMT has been output.

NOTE: PROCEDURE FORMAT used (Total process time):

real time           0.06 seconds

user cpu time       0.01 seconds

system cpu time     0.01 seconds

memory              169.15k

OS Memory           18364.00k

Timestamp           01/21/2013 05:08:09 PM

NOTE: There were 3 observations read from the data set WORK._FOR_FORMAT.

NOTE: 72 records were written to the file SASCODE.

The minimum record length was 4.

The maximum record length was 249.

NOTE: DATA statement used (Total process time):

real time           0.04 seconds

user cpu time       0.03 seconds

system cpu time     0.01 seconds

memory              963.56k

OS Memory           19436.00k

Timestamp           01/21/2013 05:08:09 PM

NOTE: Character values have been converted to numeric values at the places given by:

(Line)Smiley SadColumn).

819:11

NOTE: There were 170700 observations read from the data set WORK.STUDENT_SCHOOL.

NOTE: The data set WORK.WANT has 81687 observations and 58 variables.

NOTE: Compressing data set WORK.WANT decreased size by 18.97 percent.

Compressed is 1892 pages; un-compressed would require 2335 pages.

NOTE: DATA statement used (Total process time):

real time           0.87 seconds

user cpu time       0.37 seconds

system cpu time     0.26 seconds

memory              548.03k

OS Memory           18876.00k

Timestamp           01/21/2013 05:08:10 PM

PROC Star
Posts: 7,363

Re: Combining two longitudinal datasets (one student-level and one school-level)

You didn't change all of the code to reflect 10 years of data.  Since your example only had four years of school data, my code only reflected those four years.  I tried to show, in comments, where they "should" have been changed to 9 or 10 based on how they were applied.

However, the code you just posted, didn't reflect those changes, thus you are only getting 4 years of results.

Try it by running the following.  BTW, the macro (while included below) hasn't changed from my previous post, just the datasteps used to prepare the data:

data std_vertical;

  set StdData;

  array sch(*) schcode02-schcode10;

  do i = 1 to 9;

    schcode=sch(i);

    year=2001+i;

    if not missing(schcode) then output;

  end;

  drop schcode02-schcode10 i;

run;

proc sort data=std_vertical;

    by schcode year;

run;

data sch_vertical (keep=schcode

                        year

                        pctFRlunch

                        AvgParEduc

                        PctHisp

                        PctTotELs

                        PctELSpan

                        PctELsPLS

                        PctELsL1);

  set SchData;

  array pctFRlunc(*) pctFRlunch_02-pctFRlunch_10;

  array AvgParEdu(*) AvgParEduc_02-AvgParEduc_10;

  array PctHis(*) PctHisp_02-PctHisp_10;

  array PctTotEL(*) PctTotELs_02-PctTotELs_10;

  array PctELSpa(*) PctELSpan_02-PctELSpan_10;

  array PctELsPL(*) PctELsPLS_02-PctELsPLS_10;

  array PctELsL(*) PctELsL1_02-PctELsL1_10;

  do i = 1 to 9;

    pctFRlunch=pctFRlunc(i);

    AvgParEduc=AvgParEdu(i);

    PctHisp=PctHis(i);

    PctTotELs=PctTotEL(i);

    PctELSpan=PctELSpa(i);

    PctELsPLS=PctELsPL(i);

    PctELsL1=PctELsL(i);

    year=2001+i;

    output;

  end;

run;

proc sort data=sch_vertical;

    by schcode year;

run;

data student_school;

    merge sch_vertical (in=a) Std_vertical (in=b);

    by schcode year;

    if a and b;

run;

proc sort data=student_school;

  by studentID year;

run;

%macro transpose(libname_in=work,

                 libname_out=work,

                 filename_in =have,

                 filename_out=want,

                 by_variable=id,

                 vars=ind1-ind4,

                 prefix=,

                 idlabel=date,

                 idlabel_format=yymon7.,

                 idlabel_prefix=,

                 guessingrows=1000);

  data _temp;

    set &libname_in..&filename_in. (obs=1 keep=&vars.);

  run;

  %let vars_char="";

  %let varlist_char="";

  %let vars_num="";

  %let varlist_num="";

  proc sql noprint;

    select name

      into :vars_char separated by " "

        from dictionary.columns

          where libname="WORK" and

              memname="_TEMP" and

              type="char"

    ;

    select name

      into :vars_num separated by " "

        from dictionary.columns

          where libname="WORK" and

              memname="_TEMP" and

              type="num"

    ;

  %if &vars_char ne "" %then %do;

    select distinct

    %do i=1 %to %sysfunc(countw("&vars_char."));

      %if &i. lt %sysfunc(countw("&vars_char.")) %then %do;

        " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||

          "&idlabel_prefix."||

        put(&idlabel.,&idlabel_format)||

      %end;

      %else %do;

        " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||

          "&idlabel_prefix."||

        put(&idlabel.,&idlabel_format),

      %end;

    %end;

        &idlabel.

          into :varlist_char separated by " ",

               :junk

        from &libname_in..&filename_in. (obs=&guessingrows.)

          order by &idlabel.

    ;

    %let num_charvars=&sqlobs.;

  %end;

  %if &vars_num ne "" %then %do;

    select distinct

    %do i=1 %to %sysfunc(countw("&vars_num."));

      %if &i. lt %sysfunc(countw("&vars_num.")) %then %do;

        " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||

          "&idlabel_prefix."||

        put(&idlabel.,&idlabel_format)||

      %end;

      %else %do;

        " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||

          "&idlabel_prefix."||

        put(&idlabel.,&idlabel_format),

      %end;

    %end;

        &idlabel.

          into :varlist_num separated by " ",

               :junk

        from &libname_in..&filename_in. (obs=&guessingrows.)

          order by &idlabel.

    ;

    %let num_numvars=&sqlobs.;

  %end;

  create table _for_format as

    select distinct &idlabel. as start

      from &libname_in..&filename_in. (obs=&guessingrows.)

        order by &idlabel.

    ;

  %let num_numlabels=&sqlobs.;

  quit;

  data _for_format;

    set _for_format;

    retain fmtname "labelfmt" type "N";

    label=_n_-1;

  run;

  proc format cntlin = _for_format;

  run ;

  filename sascode temp;

  data _null_;

    file sascode;

    length var $32;

    put  "data &libname_out..&filename_out.;";

    put  "  set &libname_in..&filename_in.;";

    put  "  by &by_variable.;";

    %if &vars_char. ne "" %then %do;

      put  "    array want_char(*) $";

      %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_char.")));

        var=scan("&varlist_char.",&i.);

        put  @7 var;

      %end;

      put @5 ";";

      put  "    array have_char(*) $ &vars_char.;";

      put  "    retain want_char;";

      put  "    if first.&by_variable. then call missing(of want_char(*));";

      put  "    _nchar=put(&idlabel.,labelfmt.)*dim(have_char);";

      put  "    do _i=1 to dim(have_char);";

      put  "      want_char(_nchar+_i)=have_char(_i);";

      put  "    end;";

    %end;

    %if &vars_num. ne "" %then %do;

      put  "    array want_num(*) ";

      %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_num.")));

        var=scan("&varlist_num.",&i.);

        put  @7 var;

      %end;

      put @5 ";";

      put  "    array have_num(*) &vars_num.;";

      put  "    retain want_num;";

      put  "    if first.&by_variable. then call missing(of want_num(*));";

      put  "    _nnum=put(&idlabel.,labelfmt.)*dim(have_num);";

      put  "    do _i=1 to dim(have_num);";

      put  "      want_num(_nnum+_i)=have_num(_i);";

      put  "    end;";

    %end;

      put  "  drop &idlabel. _: &vars.;";

      put  "  if last.&by_variable. then output;";

      put  "run;";

  run;

  %include sascode;

%mend transpose;

options NOQUOTELENMAX;

%transpose(filename_in =student_school,

           by_variable=StudentId,

           vars=SchCode pctFRlunch--PctELsL1,

           idlabel=year,

           idlabel_format=4.,

           guessingrows=1000)


https://communities.sas.com/message/151025/edit

Occasional Contributor
Posts: 7

Re: Combining two longitudinal datasets (one student-level and one school-level)

ah! I had made my edits to the code (regarding additional variables) in Word… and then read through the log to try and figure out the error...  and I mistook the comments for code. An embarrassing mistake! I should have examined the code in the editor window – the color-coding would have helped me.

Your comments were actually very clear. I see now.  And this time the resulting 'want' dataset was correct.

There were, however, 139,616 observations rather than the original 139,909.  I figured out this is because 293 students attended schools that are, for some reason, not included in the school file. So those students were dropped from the ‘want’ dataset.  This isn’t a problem, as I can just add them back in, but I figured I’d let you know in case it’s relevant for your macro.

Thank you!!

PROC Star
Posts: 7,363

Re: Combining two longitudinal datasets (one student-level and one school-level)

I would be interested to know how much real and CPU time the process required for your data.  In our tests, thus far, the %transpose macro runs almost 40 times faster than doing similar complex jobs using PROC TRANSPOSE.

Occasional Contributor
Posts: 7

Re: Combining two longitudinal datasets (one student-level and one school-level)

It was fast.  I've pasted the log below so you can see the time for the whole process. But, you should know I ran this on a MacBook Pro running SAS via Virtual Machine. Had I booted up into Windows directly (my hard drive is partitioned), I imagine it would have been faster still.

1818

1819

1820   data std_vertical;

1821     set StdData;

1822     array sch(*) schcode02-schcode10;

1823     do i = 1 to 9;

1824       schcode=sch(i);

1825       year=2001+i;

1826       if not missing(schcode) then output;

1827     end;

1828     drop schcode02-schcode10 i;

1829   run;

NOTE: There were 139909 observations read from the data set WORK.STDDATA.

NOTE: The data set WORK.STD_VERTICAL has 810540 observations and 3 variables.

NOTE: Compressing data set WORK.STD_VERTICAL increased size by 27.96 percent.

      Compressed is 6174 pages; un-compressed would require 4825 pages.

NOTE: DATA statement used (Total process time):

      real time           0.93 seconds

      user cpu time       0.29 seconds

      system cpu time     0.32 seconds

      memory              346.03k

      OS Memory           18620.00k

      Timestamp           01/21/2013 11:31:01 PM

1830

1831   proc sort data=std_vertical;

1832       by schcode year;

1833   run;

NOTE: There were 810540 observations read from the data set WORK.STD_VERTICAL.

NOTE: The data set WORK.STD_VERTICAL has 810540 observations and 3 variables.

NOTE: Compressing data set WORK.STD_VERTICAL increased size by 27.96 percent.

      Compressed is 6174 pages; un-compressed would require 4825 pages.

NOTE: PROCEDURE SORT used (Total process time):

      real time           1.48 seconds

      user cpu time       0.99 seconds

      system cpu time     0.43 seconds

      memory              45408.96k

      OS Memory           64700.00k

      Timestamp           01/21/2013 11:31:03 PM

1834

1835   data sch_vertical (keep=schcode

1836   year

1837   PctFRlunch

1838   PctRespParEd

1839   AvgParEd

1840   SCI

1841   TotEnr

1842   AfrAmerPct

1843   HispPct

1844   WhitePct

1845   PctTotELs

1846   PctELsSpan

1847   PctELsELDorSDAIE

1848   PctELsPLS

1849   PctELsL1

1850   SpanTstoSsRatio

1851   SpanAidestoSsRatio

1852   SpanTsandAidestoSsRatio

1853   EngTstoSsRatio

1854   PctTotReclassYr);

1855    set SchData;

1856    array PctFRlunc(*) PctFRlunch_02-PctFRlunch_10; /*-PctFRlunch_10*/;

1857    array PctRespParE(*) PctRespParEd_02-PctRespParEd_10; /*-PctRespParEd_10*/;

1858    array AvgParE(*) AvgParEd_02-AvgParEd_10; /*-AvgParEd_10*/;

1859    array SC(*) SCI_02-SCI_10; /*-SCI_10*/;

1860    array TotEn(*) TotEnr_02-TotEnr_10; /*-TotEnr_10*/;

1861    array AfrAmerPc(*) AfrAmerPct_02-AfrAmerPct_10; /*-AfrAmerPct_10*/;

1862    array HispPc(*) HispPct_02-HispPct_10; /*-HispPct_10*/;

1863    array WhitePc(*) WhitePct_02-WhitePct_10; /*-WhitePct_10*/;

1864    array PctTotEL(*) PctTotELs_02-PctTotELs_10; /*-PctTotELs_10*/;

1865    array PctELsSpa(*) PctELsSpan_02-PctELsSpan_10; /*-PctELsSpan_10*/;

1866    array PctELsELDorSDAI(*) PctELsELDorSDAIE_02-PctELsELDorSDAIE_10; /*-PctELsELDorSDAIE_10*/;

1867     array PctELsPL(*) PctELsPLS_02-PctELsPLS_10; /*-PctELsPLS_10*/;

1868     array PctELsL(*) PctELsL1_02-PctELsL1_10; /*-PctELsL1_10*/;

1869     array SpanTstoSsRati(*) SpanTstoSsRatio_02-SpanTstoSsRatio_10; /*-SpanTstoSsRatio_10*/;

1870     array SpanAidestoSsRati(*) SpanAidestoSsRatio_02-SpanAidestoSsRatio_10;

1870 ! /*-SpanAidestoSsRatio_10*/;

1871     array SpanTsandAidestoSsRati(*) SpanTsandAidestoSsRatio_02-SpanTsandAidestoSsRatio_10;

1871 ! /*-SpanTsandAidestoSsRatio_10*/;

1872     array EngTstoSsRati(*) EngTstoSsRatio_02-EngTstoSsRatio_10; /*-EngTstoSsRatio_10*/;

1873     array PctTotReclassY(*) PctTotReclassYr_02-PctTotReclassYr_10; /*-PctTotReclassYr_10*/;

1874     do i = 1 to 9; /*9*/

1875       PctFRlunch=PctFRlunc(i);

1876       PctRespParEd=PctRespParE(i);

1877       AvgParEd=AvgParE(i);

1878       SCI=SC(i);

1879       TotEnr=TotEn(i);

1880       AfrAmerPct=AfrAmerPc(i);

1881       HispPct=HispPc(i);

1882       WhitePct=WhitePc(i);

1883       PctTotELs=PctTotEL(i);

1884       PctELsSpan=PctELsSpa(i);

1885       PctELsELDorSDAIE=PctELsELDorSDAI(i);

1886       PctELsPLS=PctELsPL(i);

1887       PctELsL1=PctELsL(i);

1888       SpanTstoSsRatio=SpanTstoSsRati(i);

1889       SpanAidestoSsRatio=SpanAidestoSsRati(i);

1890       SpanTsandAidestoSsRatio=SpanTsandAidestoSsRati(i);

1891       EngTstoSsRatio=EngTstoSsRati(i);

1892       PctTotReclassYr=PctTotReclassY(i);

1893       year=2001+i;

1894       output;

1895     end;

1896   run;

NOTE: There were 599 observations read from the data set WORK.SCHDATA.

NOTE: The data set WORK.SCH_VERTICAL has 5391 observations and 20 variables.

NOTE: Compressing data set WORK.SCH_VERTICAL decreased size by 14.02 percent.

      Compressed is 92 pages; un-compressed would require 107 pages.

NOTE: DATA statement used (Total process time):

      real time           0.06 seconds

      user cpu time       0.01 seconds

      system cpu time     0.04 seconds

      memory              718.37k

      OS Memory           18620.00k

      Timestamp           01/21/2013 11:31:03 PM

1897

1898   proc sort data=sch_vertical;

1899       by schcode year;

1900   run;

NOTE: There were 5391 observations read from the data set WORK.SCH_VERTICAL.

NOTE: The data set WORK.SCH_VERTICAL has 5391 observations and 20 variables.

NOTE: Compressing data set WORK.SCH_VERTICAL decreased size by 14.02 percent.

      Compressed is 92 pages; un-compressed would require 107 pages.

NOTE: PROCEDURE SORT used (Total process time):

      real time           0.01 seconds

      user cpu time       0.01 seconds

      system cpu time     0.00 seconds

      memory              1240.29k

      OS Memory           18620.00k

      Timestamp           01/21/2013 11:31:03 PM

1901

1902   data student_school;

1903       merge sch_vertical (in=a) Std_vertical (in=b);

1904       by schcode year;

1905       if a and b;

1906   run;

NOTE: There were 5391 observations read from the data set WORK.SCH_VERTICAL.

NOTE: There were 810540 observations read from the data set WORK.STD_VERTICAL.

NOTE: The data set WORK.STUDENT_SCHOOL has 804830 observations and 21 variables.

NOTE: Compressing data set WORK.STUDENT_SCHOOL decreased size by 12.13 percent.

      Compressed is 14734 pages; un-compressed would require 16768 pages.

NOTE: DATA statement used (Total process time):

      real time           2.83 seconds

      user cpu time       0.67 seconds

      system cpu time     1.09 seconds

      memory              543.43k

      OS Memory           18620.00k

      Timestamp           01/21/2013 11:31:06 PM

1907

1908   proc sort data=student_school;

1909     by StdPseudoId year;

1910   run;

NOTE: There were 804830 observations read from the data set WORK.STUDENT_SCHOOL.

NOTE: The data set WORK.STUDENT_SCHOOL has 804830 observations and 21 variables.

NOTE: Compressing data set WORK.STUDENT_SCHOOL decreased size by 12.12 percent.

      Compressed is 14736 pages; un-compressed would require 16768 pages.

NOTE: PROCEDURE SORT used (Total process time):

      real time           5.08 seconds

      user cpu time       1.65 seconds

      system cpu time     1.13 seconds

      memory              161260.60k

      OS Memory           181428.00k

      Timestamp           01/21/2013 11:31:11 PM

1911

1912   %macro transpose(libname_in=work,

1913                    libname_out=work,

1914                    filename_in =have,

1915                    filename_out=want,

1916                    by_variable=id,

1917                    vars=ind1-ind4,

1918                    prefix=,

1919                    idlabel=date,

1920                    idlabel_format=yymon7.,

1921                    idlabel_prefix=,

1922                    guessingrows=1000);

1923

1924     data _temp;

1925       set &libname_in..&filename_in. (obs=1 keep=&vars.);

1926     run;

1927

1928     %let vars_char="";

1929     %let varlist_char="";

1930     %let vars_num="";

1931     %let varlist_num="";

1932

1933     proc sql noprint;

1934       select name

1935         into :vars_char separated by " "

1936           from dictionary.columns

1937             where libname="WORK" and

1938                 memname="_TEMP" and

1939                 type="char"

1940       ;

1941       select name

1942         into :vars_num separated by " "

1943           from dictionary.columns

1944             where libname="WORK" and

1945                 memname="_TEMP" and

1946                 type="num"

1947       ;

1948

1949     %if &vars_char ne "" %then %do;

1950       select distinct

1951       %do i=1 %to %sysfunc(countw("&vars_char."));

1952         %if &i. lt %sysfunc(countw("&vars_char.")) %then %do;

1953           " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||

1954             "&idlabel_prefix."||

1955           put(&idlabel.,&idlabel_format)||

1956         %end;

1957         %else %do;

1958           " "||"&prefix."||strip(scan("&vars_char.",&i.))||"_"||

1959             "&idlabel_prefix."||

1960           put(&idlabel.,&idlabel_format),

1961         %end;

1962       %end;

1963           &idlabel.

1964             into :varlist_char separated by " ",

1965                  :junk

1966           from &libname_in..&filename_in. (obs=&guessingrows.)

1967             order by &idlabel.

1968       ;

1969       %let num_charvars=&sqlobs.;

1970     %end;

1971

1972     %if &vars_num ne "" %then %do;

1973       select distinct

1974       %do i=1 %to %sysfunc(countw("&vars_num."));

1975         %if &i. lt %sysfunc(countw("&vars_num.")) %then %do;

1976           " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||

1977             "&idlabel_prefix."||

1978           put(&idlabel.,&idlabel_format)||

1979         %end;

1980         %else %do;

1981           " "||"&prefix."||strip(scan("&vars_num.",&i.))||"_"||

1982             "&idlabel_prefix."||

1983           put(&idlabel.,&idlabel_format),

1984         %end;

1985       %end;

1986           &idlabel.

1987             into :varlist_num separated by " ",

1988                  :junk

1989           from &libname_in..&filename_in. (obs=&guessingrows.)

1990             order by &idlabel.

1991       ;

1992       %let num_numvars=&sqlobs.;

1993     %end;

1994

1995     create table _for_format as

1996       select distinct &idlabel. as start

1997         from &libname_in..&filename_in. (obs=&guessingrows.)

1998           order by &idlabel.

1999       ;

2000     %let num_numlabels=&sqlobs.;

2001

2002     quit;

2003

2004     data _for_format;

2005       set _for_format;

2006       retain fmtname "labelfmt" type "N";

2007       label=_n_-1;

2008     run;

2009

2010

2011     proc format cntlin = _for_format;

2012     run ;

2013

2014     filename sascode temp;

2015     data _null_;

2016       file sascode;

2017       length var $32;

2018       put  "data &libname_out..&filename_out.;";

2019       put  "  set &libname_in..&filename_in.;";

2020       put  "  by &by_variable.;";

2021       %if &vars_char. ne "" %then %do;

2022         put  "    array want_char(*) $";

2023         %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_char.")));

2024           var=scan("&varlist_char.",&i.);

2025           put  @7 var;

2026         %end;

2027         put @5 ";";

2028         put  "    array have_char(*) $ &vars_char.;";

2029         put  "    retain want_char;";

2030         put  "    if first.&by_variable. then call missing(of want_char(*));";

2031         put  "    _nchar=put(&idlabel.,labelfmt.)*dim(have_char);";

2032         put  "    do _i=1 to dim(have_char);";

2033         put  "      want_char(_nchar+_i)=have_char(_i);";

2034         put  "    end;";

2035       %end;

2036       %if &vars_num. ne "" %then %do;

2037         put  "    array want_num(*) ";

2038         %do i=1 %to %eval(&num_numlabels.*%sysfunc(countw("&vars_num.")));

2039           var=scan("&varlist_num.",&i.);

2040           put  @7 var;

2041         %end;

2042         put @5 ";";

2043         put  "    array have_num(*) &vars_num.;";

2044         put  "    retain want_num;";

2045         put  "    if first.&by_variable. then call missing(of want_num(*));";

2046         put  "    _nnum=put(&idlabel.,labelfmt.)*dim(have_num);";

2047         put  "    do _i=1 to dim(have_num);";

2048         put  "      want_num(_nnum+_i)=have_num(_i);";

2049         put  "    end;";

2050       %end;

2051         put  "  drop &idlabel. _: &vars.;";

2052         put  "  if last.&by_variable. then output;";

2053         put  "run;";

2054     run;

2055     %include sascode;

2056   %mend transpose;

2057

2058   options NOQUOTELENMAX;

2059   %transpose(filename_in =student_school,

2060              by_variable=StdPseudoId,

2061              vars=SchCode PctFRlunch--PctTotReclassYr,

2062              idlabel=year,

2063              idlabel_format=4.,

2064              guessingrows=1000)

NOTE: There were 1 observations read from the data set WORK.STUDENT_SCHOOL.

NOTE: The data set WORK._TEMP has 1 observations and 19 variables.

NOTE: Compressing data set WORK._TEMP increased size by 100.00 percent.

      Compressed is 2 pages; un-compressed would require 1 pages.

NOTE: DATA statement used (Total process time):

      real time           0.10 seconds

      user cpu time       0.00 seconds

      system cpu time     0.00 seconds

      memory              349.37k

      OS Memory           18620.00k

      Timestamp           01/21/2013 11:31:11 PM

NOTE: No rows were selected.

NOTE: Compression was disabled for data set WORK._FOR_FORMAT because compression overhead would

      increase the size of the data set.

NOTE: Table WORK._FOR_FORMAT created, with 9 rows and 1 columns.

NOTE: PROCEDURE SQL used (Total process time):

      real time           0.18 seconds

      user cpu time       0.01 seconds

      system cpu time     0.00 seconds

      memory              1446.71k

      OS Memory           18620.00k

      Timestamp           01/21/2013 11:31:11 PM

NOTE: There were 9 observations read from the data set WORK._FOR_FORMAT.

NOTE: The data set WORK._FOR_FORMAT has 9 observations and 4 variables.

NOTE: Compressing data set WORK._FOR_FORMAT increased size by 100.00 percent.

      Compressed is 2 pages; un-compressed would require 1 pages.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      user cpu time       0.00 seconds

      system cpu time     0.00 seconds

      memory              302.03k

      OS Memory           18620.00k

      Timestamp           01/21/2013 11:31:11 PM

NOTE: Format LABELFMT is already on the library.

NOTE: Format LABELFMT has been output.

NOTE: PROCEDURE FORMAT used (Total process time):

      real time           0.00 seconds

      user cpu time       0.00 seconds

      system cpu time     0.00 seconds

      memory              169.37k

      OS Memory           18620.00k

      Timestamp           01/21/2013 11:31:11 PM

NOTE: There were 9 observations read from the data set WORK._FOR_FORMAT.

NOTE: 186 records were written to the file SASCODE.

      The minimum record length was 4.

      The maximum record length was 249.

NOTE: DATA statement used (Total process time):

      real time           0.04 seconds

      user cpu time       0.00 seconds

      system cpu time     0.01 seconds

      memory              2035.60k

      OS Memory           20180.00k

      Timestamp           01/21/2013 11:31:11 PM

NOTE: Character values have been converted to numeric values at the places given by:

      (Line)Smiley SadColumn).

      2244:11

NOTE: There were 804830 observations read from the data set WORK.STUDENT_SCHOOL.

NOTE: The data set WORK.WANT has 139616 observations and 172 variables.

NOTE: Compressing data set WORK.WANT decreased size by 27.67 percent.

      Compressed is 9181 pages; un-compressed would require 12694 pages.

NOTE: DATA statement used (Total process time):

      real time           6.66 seconds

      user cpu time       0.74 seconds

      system cpu time     0.65 seconds

      memory              660.89k

      OS Memory           18620.00k

      Timestamp           01/21/2013 11:31:18 PM

Super Contributor
Posts: 543

Re: Combining two longitudinal datasets (one student-level and one school-level)

Hi.

How about this?

Using some of Jagadish's code:

(the 'vertical' data set that he created)


data long;

    set std;

    array sch(9) schcode02-schcode10;

    do i = 1 to 9;

    schcode=sch(i);

    output;

    end;

    drop schcode02-schcode10 i;

run;


*sort by studentId so we can merge back with the student data;

proc sort data = std;by studentId;

proc sort data = long;by studentId;

data temp;

  merge std long;

by studentId;

run;

*sort by school code, so we can merge;

proc sort data = temp;by schCode;

proc sort data = school;by schCode;

data want;

  merge temp school;

by schCode;

drop schCode02--SchCode10;

if schCode ne .;

run;

proc sort data = want;by studentId;run;

Good luck!

Anca.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 316 views
  • 0 likes
  • 4 in conversation