Solved: Re: Combining two tables with same variables and % difference

RobertNYC · Posted 08-23-2012 03:09 PM

Hi all—

So, I need a little help regarding a joining two tables together and then rename variables. Any assistance will be greatly appreciated.

I start with two tables that are similar to the examples below. All the variable names are the same in each table but there are over 100 variables in each table.

FY11 SCORES

Agency	Safety_Score	Recreation_score	Contacts_score
Agency1	98%	95%	95%
Agency2	100%	96%	96%
Agency3	94%	91%	91%
Agency4	92%	92%	92%
Agency5	95%	95%	95%
Agency6	96%	83%	83%
Agency7	99%	87%	87%
Agency8	93%	88%	88%
Agency9	93%	86%	86%
Agency10	96%	87%	87%

FY12 SCORES

Agency	Safety_Score	Recreation_score	Contacts_score
Agency1	98%	93%	93%
Agency2	100%	96%	96%
Agency3	94%	93%	91%
Agency4	93%	92%	100%
Agency5	95%	95%	95%
Agency6	96%	83%	83%
Agency7	99%	100%	87%
Agency8	93%	88%	88%
Agency9	93%	86%	93%
Agency10	100%	87%	87%

The final product need should look like this:

Agency	FY11 _Safety_Score	FY12 _Safety_Score	% Difference	FY11_Recreation_score	FY12_Recreation_score	% Difference	FY11_Contacts_score	FY12_Contacts_score	% Difference
Agency1	98%	95%	-3%	98%	96%	-2%	95%	95%	0%
Agency2	100%	96%	-4%	99%	96%	-3%	96%	96%	0%
Agency3	94%	91%	-3%	91%	88%	-3%	91%	91%	0%
Agency4	92%	92%	0%	91%	94%	3%	92%	92%	0%
Agency5	95%	95%	0%	95%	94%	-1%	95%	95%	0%
Agency6	96%	83%	-13%	94%	73%	-21%	83%	83%	0%
Agency7	99%	87%	-12%	98%	83%	-15%	87%	87%	0%
Agency8	93%	88%	-5%	95%	89%	-6%	88%	88%	0%
Agency9	93%	86%	-7%	93%	79%	-14%	86%	86%	0%
Agency10	96%	87%	-9%	92%	84%	-8%	87%	87%	0%

What I need to do is:

Add a FY11_ or FY_12 prefix to all the variables to identify which are they are from.
Combine the tables so I have FY11 and FY12 scores side by side.
Calculate the % cent difference for each paring.

Now, I have a program that adds the prefixes to the varibles (see below) but the issue is some of the variable titles are too long so when the program trys to attach the prefix it doent work and I get error meaages. Is there a way to delete some of the charters in the variable names which are too long?

proc sql;

select cat(name, ' = ', cats('FY11_', name ))

into :renstr separated by ' '

from

dictionary.columns where libname = 'WORK' and memname='FY11_data';

quit;

data Fy11_data_new;

set Fy11_data (rename = (&renstr));

run;

You can see why combine these tables just by cutting and pasting in excel would be very difficult and take forever.

Again , any help with this is greatly appreciated!!!!!!!!!

data_null__ · Posted 08-23-2012 05:31 PM

You have a lot of your data stored in the meta data, the variables names. I would transpose the data and extract the metadata from the variable names into varariables. Then you can easily join(merge) data with many rows but only one column for each fiscal year and compute the %dif. If you want to make it wide again that will be easy and you can construct new var names with the proper attributes. Your var names look like labels.

It is easy to go from tall to wide and tall is much easier to work with.

With regards to you exact question "How to DIFF matching variables from two data sets" that can be easily done with PROC COMPARE.

data fy11;

input Agency:$10. (Safety_Score Recreation_score Contacts_score)(percent.);

cards;

Agency1 98% 95% 95%

Agency2 100% 96% 96%

Agency3 94% 91% 91%

Agency4 92% 92% 92%

Agency5 95% 95% 95%

Agency6 96% 83% 83%

Agency7 99% 87% 87%

Agency8 93% 88% 88%

Agency9 93% 86% 86%

Agency10 96% 87% 87%

;;;;

run;

proc sort sortseq=linguistic(numeric_collation=on);

by agency;

run;

data fy12;

input Agency:$10. (Safety_Score Recreation_score Contacts_score)(percent.);

cards;

Agency1 98% 93% 93%

Agency2 100% 96% 96%

Agency3 94% 93% 91%

Agency4 93% 92% 100%

Agency5 95% 95% 95%

Agency6 96% 83% 83%

Agency7 99% 100% 87%

Agency8 93% 88% 88%

Agency9 93% 86% 93%

Agency10 100% 87% 87%

;;;;

run;

proc sort sortseq=linguistic(numeric_collation=on);

by agency;

run;

proc compare noprint base=fy11 compare=fy12 out=outall outall;

by agency;

var Safety_Score--Contacts_score;

run;

proc format;

value $tcode 'BASE'='FY11' 'COMPARE'='FY12' 'PERCENT'='PDIF';

run;

proc print data=outall(drop=_obs_);

where _type_ ne 'DIF';

format _type_ $tcode.;

run;

Safety_ Recreation_ Contacts_

Obs _TYPE_ Agency Score score score

1 FY11 Agency1 0.98000 0.9500 0.95000

2 FY12 Agency1 0.98000 0.9300 0.93000

4 PDIF Agency1 0.00000 -2.1053 -2.10526

5 FY11 Agency2 1.00000 0.9600 0.96000

6 FY12 Agency2 1.00000 0.9600 0.96000

8 PDIF Agency2 0.00000 0.0000 0.00000

9 FY11 Agency3 0.94000 0.9100 0.91000

10 FY12 Agency3 0.94000 0.9300 0.91000

12 PDIF Agency3 0.00000 2.1978 0.00000

13 FY11 Agency4 0.92000 0.9200 0.92000

14 FY12 Agency4 0.93000 0.9200 1.00000

16 PDIF Agency4 1.08696 0.0000 8.69565

17 FY11 Agency5 0.95000 0.9500 0.95000

18 FY12 Agency5 0.95000 0.9500 0.95000

20 PDIF Agency5 0.00000 0.0000 0.00000

21 FY11 Agency6 0.96000 0.8300 0.83000

22 FY12 Agency6 0.96000 0.8300 0.83000

24 PDIF Agency6 0.00000 0.0000 0.00000

25 FY11 Agency7 0.99000 0.8700 0.87000

26 FY12 Agency7 0.99000 1.0000 0.87000

28 PDIF Agency7 0.00000 14.9425 0.00000

29 FY11 Agency8 0.93000 0.8800 0.88000

30 FY12 Agency8 0.93000 0.8800 0.88000

32 PDIF Agency8 0.00000 0.0000 0.00000

33 FY11 Agency9 0.93000 0.8600 0.86000

34 FY12 Agency9 0.93000 0.8600 0.93000

36 PDIF Agency9 0.00000 0.0000 8.13953

37 FY11 Agency10 0.96000 0.8700 0.87000

38 FY12 Agency10 1.00000 0.8700 0.87000

40 PDIF Agency10 4.16667 0.0000 0.00000

Message was edited by: data _null_

View solution in original post

Linlin · Posted 08-23-2012 03:21 PM

Hi,

try the modified code:

proc sql;

select catx('=',name, cats('FY11_', name ))

into :renstr separated by ' '

from dictionary.columns where libname = 'WORK' and memname='FY11_data';

quit;

data Fy11_data_new;

set Fy11_data (rename = (&renstr));

run;

RobertNYC · Posted 08-23-2012 03:46 PM

Great!!! I will try this out. do you have any advice for combining the tables and calculating the % difference, again I have a large amount of paris of varibles I have to do this for.

THANKS!!!

Astounding · Posted 08-23-2012 04:02 PM

Additional advice: Think about the process of combining and possibly make adjustments now. For example:

Should you add to the WHERE clause: and upcase(name) ne 'AGENCY'

Arrays in a DATA step might be a good tool to accomplish computing the differences.

Do you know for sure that the structure of the two data sets is 100% identical (order of the variables, capitalization of their names, for example)? If not, you might want to pull all changes from one data set. For example, you might pull the renaming of FY12 variables from the FY11 data set to make sure the order of variable names will be preserved when you get to computing differences.

RobertNYC · Posted 08-23-2012 04:03 PM

Hi again Linlin--

I tried the program and I got the same error messages. Any thoughts

ERROR 213-322: Variable name FY11_FP_s_Ability_to_Ensure_Child_Hea is longer than 32 characters.

ERROR 23-7: Invalid value for the RENAME option.

23 ! FP_s_Ability_to_Ensure_Child_Hea=FY11_FP_s_Ability_to_Ensure_Child_Hea Adoption_Support=FY11_Adoption_Support

23 ! Foster_Parent_s_Reported_Relatio=FY11_Foster_Parent_s_Reported_Relatio Foster_Parent_Relationship_with=FY11_Foster_Parent_Relationship_with

-------------------------------------

213

ERROR 213-322: Variable name FY11_Foster_Parent_s_Reported_Relatio is longer than 32 characters.

23 ! Foster_Parent_s_Reported_Relatio=FY11_Foster_Parent_s_Reported_Relatio Foster_Parent_Relationship_with=FY11_Foster_Parent_Relationship_with

------------------------------------

213

ERROR 213-322: Variable name FY11_Foster_Parent_Relationship_with is longer than 32 characters.

NOTE: Line generated by the macro variable "RENSTR".

24 Support_For_Family_to_Family_Mod=FY11_Support_For_Family_to_Family_Mod Foster_Parent_Support_Total=FY11_Foster_Parent_Support_Total

-------------------------------------

213

23

ERROR 213-322: Variable name FY11_Support_For_Family_to_Family_Mod is longer than 32 characters.

ERROR 23-7: Invalid value for the RENAME option.

Linlin · Posted 08-23-2012 04:34 PM

Hi,

I think you have to shorten your variables first.

the example code may be helpful:

data class1;

set sashelp.class;

data class2;

set sashelp.class;

age=age+1;

weight=weight-1;

height=height+2;

run;

proc sql;

select catx('=',name,cats('FY11_',name)) into : list1 separated by ' '

from dictionary.columns

where libname='SASHELP' and memname='CLASS' and type='num';

select catx('=',name,cats('FY12_',name)) into : list2 separated by ' '

from dictionary.columns

where libname='SASHELP' and memname='CLASS' and type='num';

select cat(cats('diff_',name),'=',catx('-',cats('FY12_',name),cats('FY11_',name))) into :list4 separated by ';'

from dictionary.columns

where libname='SASHELP' and memname='CLASS' and type='num';

quit;

proc datasets nolist;

modify class1;

rename &list1;

run;

proc datasets nolist;

modify class2;

rename &list2;

quit;

data want;

merge class1 class2;

by name;

&list4;

proc print data=want;run;

/*log file: */

NOTE: SAS initialization used:

real time 1.12 seconds

cpu time 0.77 seconds

1 data class1;

2 set sashelp.class;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.

NOTE: The data set WORK.CLASS1 has 19 observations and 5 variables.

NOTE: DATA statement used (Total process time):

real time 0.05 seconds

cpu time 0.00 seconds

3 data class2;

4 set sashelp.class;

5 age=age+1;

6 weight=weight-1;

7 height=height+2;

8 run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.

NOTE: The data set WORK.CLASS2 has 19 observations and 5 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

9

10 proc sql;

11 select catx('=',name,cats('FY11_',name)) into : list1 separated by ' '

12 from dictionary.columns

13 where libname='SASHELP' and memname='CLASS' and type='num';

14 select catx('=',name,cats('FY12_',name)) into : list2 separated by ' '

15 from dictionary.columns

16 where libname='SASHELP' and memname='CLASS' and type='num';

17

18 select

18 ! cat(cats('diff_',name),'=',catx('-',cats('FY12_',name),cats('FY11_',name)))

18 ! into :list4 separated by ';'

19 from dictionary.columns

20 where libname='SASHELP' and memname='CLASS' and type='num';

21 quit;

NOTE: PROCEDURE SQL used (Total process time):

real time 0.10 seconds

cpu time 0.06 seconds

22

23 proc datasets nolist;

24 modify class1;

25 rename &list1;

NOTE: Renaming variable Age to FY11_Age.

NOTE: Renaming variable Height to FY11_Height.

NOTE: Renaming variable Weight to FY11_Weight.

26 run;

NOTE: MODIFY was successful for WORK.CLASS1.DATA.

NOTE: PROCEDURE DATASETS used (Total process time):

real time 0.02 seconds

cpu time 0.00 seconds

27 proc datasets nolist;

28 modify class2;

29 rename &list2;

NOTE: Renaming variable Age to FY12_Age.

NOTE: Renaming variable Height to FY12_Height.

NOTE: Renaming variable Weight to FY12_Weight.

30 quit;

NOTE: MODIFY was successful for WORK.CLASS2.DATA.

NOTE: PROCEDURE DATASETS used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

31 data want;

32 merge class1 class2;

33 by name;

34 &list4;

NOTE: There were 19 observations read from the data set WORK.CLASS1.

NOTE: There were 19 observations read from the data set WORK.CLASS2.

NOTE: The data set WORK.WANT has 19 observations and 11 variables.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.01 seconds

35 proc print data=want;run;

NOTE: There were 19 observations read from the data set WORK.WANT.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.00 seconds

cpu time 0.01 seconds

Message was edited by: Linlin

RobertNYC · Posted 08-23-2012 04:59 PM

Linlin!! Thanks you so much for your help!!!1 I think we are almost there. I am still getting error messages though.

947 %put &list4;

diff_Age=FY12_Age-FY11_Age diff_Height=FY12_Height-FY11_Height diff_Weight=FY12_Weight-FY11_Weight

948 proc datasets nolist;

949 modify class1;

950 rename &list1;

NOTE: Renaming variable Age to FY11_Age.

NOTE: Renaming variable Height to FY11_Height.

NOTE: Renaming variable Weight to FY11_Weight.

951 run;

NOTE: MODIFY was successful for WORK.CLASS1.DATA.

NOTE: PROCEDURE DATASETS used (Total process time):

real time 15.39 seconds

cpu time 0.09 seconds

952 proc datasets nolist;

953 modify class2;

954 rename &list2;

NOTE: Renaming variable Age to FY12_Age.

NOTE: Renaming variable Height to FY12_Height.

NOTE: Renaming variable Weight to FY12_Weight.

955 quit;

NOTE: MODIFY was successful for WORK.CLASS2.DATA.

NOTE: PROCEDURE DATASETS used (Total process time):

real time 0.06 seconds

cpu time 0.01 seconds

956 data want;

957 merge class1 class2;

958 by name;

959 &list4;

NOTE: Line generated by the macro variable "LIST4".

1 diff_Age=FY12_Age-FY11_Age diff_Height=FY12_Height-FY11_Height diff_Weight=FY12_Weight-FY11_Weight

----------- -----------

22 22

ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, -, /, ;, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT, IN, LE, LT,

MAX, MIN, NE, NG, NL, NOTIN, OR, [, ^=, {, |, ||, ~=.

960 %put &list4;

diff_Age=FY12_Age-FY11_Age diff_Height=FY12_Height-FY11_Height diff_Weight=FY12_Weight-FY11_Weight

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 11 variables.

NOTE: DATA statement used (Total process time):

real time 1:05.50

cpu time 0.39 seconds

961 data want;

962 merge class1 class2;

963 by name;

964 &list4;

NOTE: Line generated by the macro variable "LIST4".

1 diff_Age=FY12_Age-FY11_Age diff_Height=FY12_Height-FY11_Height diff_Weight=FY12_Weight-FY11_Weight

----------- -----------

22 22

ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, -, /, ;, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT, IN, LE, LT,

MAX, MIN, NE, NG, NL, NOTIN, OR, [, ^=, {, |, ||, ~=.

Astounding · Posted 08-23-2012 05:24 PM

Linlin's code looks good. As in her example, notice you have to switch on the last SELECT statement from:

separated by ' '

to:

separated by ';'

That should handle the latest error.

data_null__ · Posted 08-23-2012 05:31 PM

You have a lot of your data stored in the meta data, the variables names. I would transpose the data and extract the metadata from the variable names into varariables. Then you can easily join(merge) data with many rows but only one column for each fiscal year and compute the %dif. If you want to make it wide again that will be easy and you can construct new var names with the proper attributes. Your var names look like labels.

It is easy to go from tall to wide and tall is much easier to work with.

With regards to you exact question "How to DIFF matching variables from two data sets" that can be easily done with PROC COMPARE.

data fy11;

input Agency:$10. (Safety_Score Recreation_score Contacts_score)(percent.);

cards;

Agency1 98% 95% 95%

Agency2 100% 96% 96%

Agency3 94% 91% 91%

Agency4 92% 92% 92%

Agency5 95% 95% 95%

Agency6 96% 83% 83%

Agency7 99% 87% 87%

Agency8 93% 88% 88%

Agency9 93% 86% 86%

Agency10 96% 87% 87%

;;;;

run;

proc sort sortseq=linguistic(numeric_collation=on);

by agency;

run;

data fy12;

input Agency:$10. (Safety_Score Recreation_score Contacts_score)(percent.);

cards;

Agency1 98% 93% 93%

Agency2 100% 96% 96%

Agency3 94% 93% 91%

Agency4 93% 92% 100%

Agency5 95% 95% 95%

Agency6 96% 83% 83%

Agency7 99% 100% 87%

Agency8 93% 88% 88%

Agency9 93% 86% 93%

Agency10 100% 87% 87%

;;;;

run;

proc sort sortseq=linguistic(numeric_collation=on);

by agency;

run;

proc compare noprint base=fy11 compare=fy12 out=outall outall;

by agency;

var Safety_Score--Contacts_score;

run;

proc format;

value $tcode 'BASE'='FY11' 'COMPARE'='FY12' 'PERCENT'='PDIF';

run;

proc print data=outall(drop=_obs_);

where _type_ ne 'DIF';

format _type_ $tcode.;

run;

Safety_ Recreation_ Contacts_

Obs _TYPE_ Agency Score score score

1 FY11 Agency1 0.98000 0.9500 0.95000

2 FY12 Agency1 0.98000 0.9300 0.93000

4 PDIF Agency1 0.00000 -2.1053 -2.10526

5 FY11 Agency2 1.00000 0.9600 0.96000

6 FY12 Agency2 1.00000 0.9600 0.96000

8 PDIF Agency2 0.00000 0.0000 0.00000

9 FY11 Agency3 0.94000 0.9100 0.91000

10 FY12 Agency3 0.94000 0.9300 0.91000

12 PDIF Agency3 0.00000 2.1978 0.00000

13 FY11 Agency4 0.92000 0.9200 0.92000

14 FY12 Agency4 0.93000 0.9200 1.00000

16 PDIF Agency4 1.08696 0.0000 8.69565

17 FY11 Agency5 0.95000 0.9500 0.95000

18 FY12 Agency5 0.95000 0.9500 0.95000

20 PDIF Agency5 0.00000 0.0000 0.00000

21 FY11 Agency6 0.96000 0.8300 0.83000

22 FY12 Agency6 0.96000 0.8300 0.83000

24 PDIF Agency6 0.00000 0.0000 0.00000

25 FY11 Agency7 0.99000 0.8700 0.87000

26 FY12 Agency7 0.99000 1.0000 0.87000

28 PDIF Agency7 0.00000 14.9425 0.00000

29 FY11 Agency8 0.93000 0.8800 0.88000

30 FY12 Agency8 0.93000 0.8800 0.88000

32 PDIF Agency8 0.00000 0.0000 0.00000

33 FY11 Agency9 0.93000 0.8600 0.86000

34 FY12 Agency9 0.93000 0.8600 0.93000

36 PDIF Agency9 0.00000 0.0000 8.13953

37 FY11 Agency10 0.96000 0.8700 0.87000

38 FY12 Agency10 1.00000 0.8700 0.87000

40 PDIF Agency10 4.16667 0.0000 0.00000

Message was edited by: data _null_

RobertNYC · Posted 08-24-2012 11:54 AM

Hey Data_null_, thanks so much. Question: I have never seen these statements used in a proc sort statement before.

proc sort sortseq=linguistic(numeric_collation=on);

by agency;

run;

What do they mean and do?

Thanks!

data_null__ · Posted 08-24-2012 12:06 PM

http://support.sas.com/documentation/cdl/en/proc/63079/HTML/default/viewer.htm#p02bhn81rn4u64n1b6l00...

Sorts AGENCY where 10 follows 9 not 1. Very usefull for sorting SAS Enumerated Variable names.

Usually I would not include the redundent "Agency" text in the value of the variable AGENCY.

RobertNYC · Posted 08-24-2012 01:23 PM

Thanks everyone!

Last question: Is there a way to transpose the data set after Data_null_'s step is run, so it looks like this.

Agency	FY11_Safety_Score	FY12_Safety_Score	PDIF	FY11_Recreation_Score	FY12_Recreation_Score	PDIF	FY11_Contacts_Score	FY12_Contacts_Score	PDIF
Agency1	98.00%	98.00%	0.00%	95.00%	93.00%	-2.00%	95.00%	93.00%	-2.00%
Agency2	100.00%	100.00%	0.00%	96.00%	96.00%	0.00%	96.00%	96.00%	0.00%
Agency3	94.00%	94.00%	0.00%	94.00%	94.00%	0.00%	94.00%	95.00%	1.00%

THANKS!!!

data_null__ · Posted 08-24-2012 03:43 PM

proc transpose data=outall out=tall;

where _type_ ne 'DIF';

by agency _type_ notsorted;

var &var;

run;

proc transpose data=tall out=wide delim=_;

by agency notsorted;

var col1;

id _type_ _name_;

format _type_ $tcode.;

run;

data_null__ · Posted 08-24-2012 03:48 PM

These won't be in same order as your example. Is that important? Because it can be done with a bit more code of course.

RobertNYC · Posted 08-24-2012 03:52 PM

Yes, they do actually have to be in that order. Thoughts?

Registration is open

SAS Training: Just a Click Away