Solved: Combining two tables with same variables and % difference - Page 2

data_null__ · Posted 08-24-2012 05:56 PM

Here is the "complete" program I think. You will still need to have proper SAS names when the extra text is added. The technique of creating a set of dummy observations to control how TRANSPOSE create variable names has been discussed here in the past. If you write many SAS program will will probably need it in future.

Edit:

It was brought to my attention that I was using the wrong _TYPE_ for the difference.

data fy11;

input Agency:$10. (Safety_Score Recreation_score Contacts_score)(:percent.);

cards;

Agency1 98% 95% 95%

Agency2 100% 96% 96%

Agency3 94% 91% 91%

Agency4 92% 92% 92%

Agency5 95% 95% 95%

Agency6 96% 83% 83%

Agency7 99% 87% 87%

Agency8 93% 88% 88%

Agency9 93% 86% 86%

Agency10 96% 87% 87%

;;;;

run;

proc sort sortseq=linguistic(numeric_collation=on);

by agency;

run;

data fy12;

input Agency:$10. (Safety_Score Recreation_score Contacts_score)(:percent.);

cards;

Agency1 98% 93% 93%

Agency2 100% 96% 96%

Agency3 94% 93% 91%

Agency4 93% 92% 100%

Agency5 95% 95% 95%

Agency6 96% 83% 83%

Agency7 99% 100% 87%

Agency8 93% 88% 88%

Agency9 93% 86% 93%

Agency10 100% 87% 87%

;;;;

run;

proc sort sortseq=linguistic(numeric_collation=on);

by agency;

run;

%let var=Safety_Score--Contacts_score;

proc compare noprint base=fy11 compare=fy12 out=outall outall;

by agency;

var &var;

run;

proc print;

run;

proc format;

value $tcode 'BASE'='FY11' 'COMPARE'='FY12' 'DIF'='PDIF';

run;

proc transpose data=outall out=tall;

where _type_ ne 'PERCENT';

by agency _type_ notsorted;

var &var;

run;

proc transpose data=outall(obs=0) out=varlist;

var &var;

run;

data varlist;

set varlist;

if 0 then set outall(keep=_type_);

do _type_ = 'BASE','COMPARE','DIF';

output;

end;

run;

data tallV / view=tallV;

set varlist tall;

run;

proc transpose

data=tallV

out=wide(where=(not missing(agency)))

delim=_;

by agency notsorted;

var col1;

id _type_ _name_;

format _type_ $tcode.;

run;

proc print;

run;

Message was edited by: data _null_

MikeZdeb · Posted 08-26-2012 12:32 PM

hi ... here's another method with one data step (assumes data sets are sorted by agency and the variables are in the same order in the two data sets)

it uses the SET BY idea posted by MKEINTZ (and the INPUT buffer idea from PGStats posted a couple weeks ago)

filename nosee dummy;

data ax (drop=x: y: j);

length type $4;

file nosee;

* set up input buffer ... from PGStats;

if _n_ eq 1 then input @;

do until (last.agency);

* set two data sets with by ... from mkeintz;

set fy11 fy12 indsname=ds;

by agency;

type = scan(ds,2);

output;

array z(*) _numeric_;

* write values to output buffer;

put (_numeric_) (:) @;

end;

* read values, compute difference;

_infile_ = _file_;

input @1 x1-x3 y1-y3 @@;

array x(3);

array y(3);

do j=1 to dim(z);

z(j) = y(j) - x(j);

end;

type='PDIF';

output;

* clear output buffer;

put;

datalines;

*

;

* place AGENCY10 at the end ... data _null_'s idea;

proc sort data=ax sortseq=linguistic(numeric_collation=on);

by agency;

run;

partial output ...

Safety_ Recreation_ Contacts_

Obs type agency Score score score

1 FY11 Agency1 0.98 0.95 0.95

2 FY12 Agency1 0.98 0.93 0.93

3 PDIF Agency1 0.00 -0.02 -0.02

4 FY11 Agency2 1.00 0.96 0.96

5 FY12 Agency2 1.00 0.96 0.96

6 PDIF Agency2 0.00 0.00 0.00

7 FY11 Agency3 0.94 0.91 0.91

8 FY12 Agency3 0.94 0.93 0.91

mkeintz · Posted 08-27-2012 12:18 AM

Very interesting, but you still need another step to have one wide row per agency which I think is what the OP was looking for.

But I think that the more important concern about the technique of writing to _FILE_ as a temporary storage location is that you need to be certain that no precision is lost in the PUT statement (but of course you could use the RB8.0 format to preserve precision) In this example of course there's no problem of that sort.

Also the OP mentioned 100's of vars, which I presume means either a much longer INPUT statement, or the creation of a macrovar to use in the INPUT statement. (Struck out this sentence - it is wrong - MK).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Linlin · Posted 08-23-2012 06:28 PM

Hi,

the line "%put &list;" produced the error messages. it doesn't affect the outcome. I have deleted "%put &list4" from the code.

Ksharp · Posted 08-24-2012 01:50 AM

Since you have so many and so long variables, Maybe you need substr it to fit the requirement of 32 long variable .

data FY11;
input Agency $ (Safety_Score Recreation_score Contacts_score) (: percent.) ;
cards;
Agency1 98% 95% 95%
Agency2 100% 96% 96%
;
run;
data FY12;
input Agency $ (Safety_Score Recreation_score Contacts_score) (: percent.)  ;
cards;
Agency1 98% 93% 93%
Agency2 100% 96% 96%
;
run;
proc sql noprint;
select substr(cats('DIFF_',name),1,32) into : list separated by ' '
 from dictionary.columns
  where libname='WORK' and memname eq 'FY11' and upcase(name) ne 'AGENCY' ;
quit;
data _null_;
 set sashelp.vcolumn(keep=libname memname name where=(libname='WORK' and memname in ('FY11' 'FY12') and upcase(name) ne 'AGENCY')) end=last;
 if _n_ eq 1 then call execute('proc datasets library=work nolist; ');
 if memname ne lag(memname) then call execute('modify '||memname||'; ');
 call execute('rename '||name||'='||substr(cats(memname,'_',name),1,32)||';');
 if last then call execute('quit;');
run;

data want;
 merge FY11 FY12;
 by agency;
 array _a{*} FY11: ;
 array _b{*} FY12: ;
 array _c{*} &list  ;
 do _n_=1 to dim(_a);
  _c{_n_}=_a{_n_}-_b{_n_};
 end;
run;

Ksharp

mkeintz · Posted 08-26-2012 12:46 AM

I think this is taks where a good case can be made for the simplification of using SET with BY, as in:

%let vars=safety_score--contacts_score;

proc transpose data=fy11 (keep=&vars obs=0) out=vnames;
run;

data _null_;

set vnames end=end_of_names;

length allnames $32000;

retain allnames ' ';

allnames=catx(' ',allnames,cats('FY11_',_name_),cats('FY12_',_name_),cats('PDIF_',_name_));

if end_of_names then call symput('allvars',trim(allnames));

run;

data final (drop=&vars);

do until (last.agency);

set fy11 (in=in11) fy12 (in=in12);

by agency;

array av {*} &allvars;

array v {*} &vars;

if in11 then do _n_=1 to dim(v);

av{3*_n_-2}=v{_n_};

end;

else if in12 then do _n_=1 to dim(v);

av{3*_n_-1}=v{_n_};

av{3*_n_} = av{3*_n_-1} - av{3*_n_-2};

end;

run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Re: Combining two tables with same variables and % difference

Re: Combining two tables with same variables and % difference

Re: Combining two tables with same variables and % difference

Re: Combining two tables with same variables and % difference

Re: Combining two tables with same variables and % difference

Re: Combining two tables with same variables and % difference

Re: Combining two tables with same variables and % difference

Re: Combining two tables with same variables and % difference

Re: Combining two tables with same variables and % difference

Re: Combining two tables with same variables and % difference

Re: Combining two tables with same variables and % difference

Re: Combining two tables with same variables and % difference

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!