Hi all—
So, I need a little help regarding a joining two tables together and then rename variables. Any assistance will be greatly appreciated.
I start with two tables that are similar to the examples below. All the variable names are the same in each table but there are over 100 variables in each table.
FY11 SCORES
Agency | Safety_Score | Recreation_score | Contacts_score |
Agency1 | 98% | 95% | 95% |
Agency2 | 100% | 96% | 96% |
Agency3 | 94% | 91% | 91% |
Agency4 | 92% | 92% | 92% |
Agency5 | 95% | 95% | 95% |
Agency6 | 96% | 83% | 83% |
Agency7 | 99% | 87% | 87% |
Agency8 | 93% | 88% | 88% |
Agency9 | 93% | 86% | 86% |
Agency10 | 96% | 87% | 87% |
FY12 SCORES
Agency | Safety_Score | Recreation_score | Contacts_score |
Agency1 | 98% | 93% | 93% |
Agency2 | 100% | 96% | 96% |
Agency3 | 94% | 93% | 91% |
Agency4 | 93% | 92% | 100% |
Agency5 | 95% | 95% | 95% |
Agency6 | 96% | 83% | 83% |
Agency7 | 99% | 100% | 87% |
Agency8 | 93% | 88% | 88% |
Agency9 | 93% | 86% | 93% |
Agency10 | 100% | 87% | 87% |
The final product need should look like this:
Agency | FY11 _Safety_Score | FY12 _Safety_Score | % Difference | FY11_Recreation_score | FY12_Recreation_score | % Difference | FY11_Contacts_score | FY12_Contacts_score | % Difference |
Agency1 | 98% | 95% | -3% | 98% | 96% | -2% | 95% | 95% | 0% |
Agency2 | 100% | 96% | -4% | 99% | 96% | -3% | 96% | 96% | 0% |
Agency3 | 94% | 91% | -3% | 91% | 88% | -3% | 91% | 91% | 0% |
Agency4 | 92% | 92% | 0% | 91% | 94% | 3% | 92% | 92% | 0% |
Agency5 | 95% | 95% | 0% | 95% | 94% | -1% | 95% | 95% | 0% |
Agency6 | 96% | 83% | -13% | 94% | 73% | -21% | 83% | 83% | 0% |
Agency7 | 99% | 87% | -12% | 98% | 83% | -15% | 87% | 87% | 0% |
Agency8 | 93% | 88% | -5% | 95% | 89% | -6% | 88% | 88% | 0% |
Agency9 | 93% | 86% | -7% | 93% | 79% | -14% | 86% | 86% | 0% |
Agency10 | 96% | 87% | -9% | 92% | 84% | -8% | 87% | 87% | 0% |
What I need to do is:
Now, I have a program that adds the prefixes to the varibles (see below) but the issue is some of the variable titles are too long so when the program trys to attach the prefix it doent work and I get error meaages. Is there a way to delete some of the charters in the variable names which are too long?
proc sql;
select cat(name, ' = ', cats('FY11_', name ))
into :renstr separated by ' '
from
quit;
data Fy11_data_new;
set Fy11_data (rename = (&renstr));
run;
You can see why combine these tables just by cutting and pasting in excel would be very difficult and take forever.
Again , any help with this is greatly appreciated!!!!!!!!!
You have a lot of your data stored in the meta data, the variables names. I would transpose the data and extract the metadata from the variable names into varariables. Then you can easily join(merge) data with many rows but only one column for each fiscal year and compute the %dif. If you want to make it wide again that will be easy and you can construct new var names with the proper attributes. Your var names look like labels.
It is easy to go from tall to wide and tall is much easier to work with.
With regards to you exact question "How to DIFF matching variables from two data sets" that can be easily done with PROC COMPARE.
data fy11;
input Agency:$10. (Safety_Score Recreation_score Contacts_score)(percent.);
cards;
Agency1 98% 95% 95%
Agency2 100% 96% 96%
Agency3 94% 91% 91%
Agency4 92% 92% 92%
Agency5 95% 95% 95%
Agency6 96% 83% 83%
Agency7 99% 87% 87%
Agency8 93% 88% 88%
Agency9 93% 86% 86%
Agency10 96% 87% 87%
;;;;
run;
proc sort sortseq=linguistic(numeric_collation=on);
by agency;
run;
data fy12;
input Agency:$10. (Safety_Score Recreation_score Contacts_score)(percent.);
cards;
Agency1 98% 93% 93%
Agency2 100% 96% 96%
Agency3 94% 93% 91%
Agency4 93% 92% 100%
Agency5 95% 95% 95%
Agency6 96% 83% 83%
Agency7 99% 100% 87%
Agency8 93% 88% 88%
Agency9 93% 86% 93%
Agency10 100% 87% 87%
;;;;
run;
proc sort sortseq=linguistic(numeric_collation=on);
by agency;
run;
proc compare noprint base=fy11 compare=fy12 out=outall outall;
by agency;
var Safety_Score--Contacts_score;
run;
proc format;
value $tcode 'BASE'='FY11' 'COMPARE'='FY12' 'PERCENT'='PDIF';
run;
proc print data=outall(drop=_obs_);
where _type_ ne 'DIF';
format _type_ $tcode.;
run;
Safety_ Recreation_ Contacts_
Obs _TYPE_ Agency Score score score
1 FY11 Agency1 0.98000 0.9500 0.95000
2 FY12 Agency1 0.98000 0.9300 0.93000
4 PDIF Agency1 0.00000 -2.1053 -2.10526
5 FY11 Agency2 1.00000 0.9600 0.96000
6 FY12 Agency2 1.00000 0.9600 0.96000
8 PDIF Agency2 0.00000 0.0000 0.00000
9 FY11 Agency3 0.94000 0.9100 0.91000
10 FY12 Agency3 0.94000 0.9300 0.91000
12 PDIF Agency3 0.00000 2.1978 0.00000
13 FY11 Agency4 0.92000 0.9200 0.92000
14 FY12 Agency4 0.93000 0.9200 1.00000
16 PDIF Agency4 1.08696 0.0000 8.69565
17 FY11 Agency5 0.95000 0.9500 0.95000
18 FY12 Agency5 0.95000 0.9500 0.95000
20 PDIF Agency5 0.00000 0.0000 0.00000
21 FY11 Agency6 0.96000 0.8300 0.83000
22 FY12 Agency6 0.96000 0.8300 0.83000
24 PDIF Agency6 0.00000 0.0000 0.00000
25 FY11 Agency7 0.99000 0.8700 0.87000
26 FY12 Agency7 0.99000 1.0000 0.87000
28 PDIF Agency7 0.00000 14.9425 0.00000
29 FY11 Agency8 0.93000 0.8800 0.88000
30 FY12 Agency8 0.93000 0.8800 0.88000
32 PDIF Agency8 0.00000 0.0000 0.00000
33 FY11 Agency9 0.93000 0.8600 0.86000
34 FY12 Agency9 0.93000 0.8600 0.93000
36 PDIF Agency9 0.00000 0.0000 8.13953
37 FY11 Agency10 0.96000 0.8700 0.87000
38 FY12 Agency10 1.00000 0.8700 0.87000
40 PDIF Agency10 4.16667 0.0000 0.00000
Message was edited by: data _null_
Hi,
try the modified code:
proc sql;
select catx('=',name, cats('FY11_', name ))
into :renstr separated by ' '
from dictionary.columns where libname = 'WORK' and memname='FY11_data';
quit;
data Fy11_data_new;
set Fy11_data (rename = (&renstr));
run;
Great!!! I will try this out. do you have any advice for combining the tables and calculating the % difference, again I have a large amount of paris of varibles I have to do this for.
THANKS!!!
Additional advice: Think about the process of combining and possibly make adjustments now. For example:
Should you add to the WHERE clause: and upcase(name) ne 'AGENCY'
Arrays in a DATA step might be a good tool to accomplish computing the differences.
Do you know for sure that the structure of the two data sets is 100% identical (order of the variables, capitalization of their names, for example)? If not, you might want to pull all changes from one data set. For example, you might pull the renaming of FY12 variables from the FY11 data set to make sure the order of variable names will be preserved when you get to computing differences.
Hi again Linlin--
I tried the program and I got the same error messages. Any thoughts
ERROR 213-322: Variable name FY11_FP_s_Ability_to_Ensure_Child_Hea is longer than 32 characters.
ERROR 23-7: Invalid value for the RENAME option.
23 ! FP_s_Ability_to_Ensure_Child_Hea=FY11_FP_s_Ability_to_Ensure_Child_Hea Adoption_Support=FY11_Adoption_Support
23 ! Foster_Parent_s_Reported_Relatio=FY11_Foster_Parent_s_Reported_Relatio Foster_Parent_Relationship_with=FY11_Foster_Parent_Relationship_with
-------------------------------------
213
ERROR 213-322: Variable name FY11_Foster_Parent_s_Reported_Relatio is longer than 32 characters.
23 ! Foster_Parent_s_Reported_Relatio=FY11_Foster_Parent_s_Reported_Relatio Foster_Parent_Relationship_with=FY11_Foster_Parent_Relationship_with
------------------------------------
213
ERROR 213-322: Variable name FY11_Foster_Parent_Relationship_with is longer than 32 characters.
NOTE: Line generated by the macro variable "RENSTR".
24 Support_For_Family_to_Family_Mod=FY11_Support_For_Family_to_Family_Mod Foster_Parent_Support_Total=FY11_Foster_Parent_Support_Total
-------------------------------------
213
23
ERROR 213-322: Variable name FY11_Support_For_Family_to_Family_Mod is longer than 32 characters.
ERROR 23-7: Invalid value for the RENAME option.
Hi,
I think you have to shorten your variables first.
the example code may be helpful:
data class1;
set sashelp.class;
data class2;
set sashelp.class;
age=age+1;
weight=weight-1;
height=height+2;
run;
proc sql;
select catx('=',name,cats('FY11_',name)) into : list1 separated by ' '
from dictionary.columns
where libname='SASHELP' and memname='CLASS' and type='num';
select catx('=',name,cats('FY12_',name)) into : list2 separated by ' '
from dictionary.columns
where libname='SASHELP' and memname='CLASS' and type='num';
select cat(cats('diff_',name),'=',catx('-',cats('FY12_',name),cats('FY11_',name))) into :list4 separated by ';'
from dictionary.columns
where libname='SASHELP' and memname='CLASS' and type='num';
quit;
proc datasets nolist;
modify class1;
rename &list1;
run;
proc datasets nolist;
modify class2;
rename &list2;
quit;
data want;
merge class1 class2;
by name;
&list4;
proc print data=want;run;
/*log file: */
NOTE: SAS initialization used:
real time 1.12 seconds
cpu time 0.77 seconds
1 data class1;
2 set sashelp.class;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS1 has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.00 seconds
3 data class2;
4 set sashelp.class;
5 age=age+1;
6 weight=weight-1;
7 height=height+2;
8 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS2 has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
9
10 proc sql;
11 select catx('=',name,cats('FY11_',name)) into : list1 separated by ' '
12 from dictionary.columns
13 where libname='SASHELP' and memname='CLASS' and type='num';
14 select catx('=',name,cats('FY12_',name)) into : list2 separated by ' '
15 from dictionary.columns
16 where libname='SASHELP' and memname='CLASS' and type='num';
17
18 select
18 ! cat(cats('diff_',name),'=',catx('-',cats('FY12_',name),cats('FY11_',name)))
18 ! into :list4 separated by ';'
19 from dictionary.columns
20 where libname='SASHELP' and memname='CLASS' and type='num';
21 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.10 seconds
cpu time 0.06 seconds
22
23 proc datasets nolist;
24 modify class1;
25 rename &list1;
NOTE: Renaming variable Age to FY11_Age.
NOTE: Renaming variable Height to FY11_Height.
NOTE: Renaming variable Weight to FY11_Weight.
26 run;
NOTE: MODIFY was successful for WORK.CLASS1.DATA.
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
27 proc datasets nolist;
28 modify class2;
29 rename &list2;
NOTE: Renaming variable Age to FY12_Age.
NOTE: Renaming variable Height to FY12_Height.
NOTE: Renaming variable Weight to FY12_Weight.
30 quit;
NOTE: MODIFY was successful for WORK.CLASS2.DATA.
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
31 data want;
32 merge class1 class2;
33 by name;
34 &list4;
NOTE: There were 19 observations read from the data set WORK.CLASS1.
NOTE: There were 19 observations read from the data set WORK.CLASS2.
NOTE: The data set WORK.WANT has 19 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
35 proc print data=want;run;
NOTE: There were 19 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
Message was edited by: Linlin
Linlin!! Thanks you so much for your help!!!1 I think we are almost there. I am still getting error messages though.
947 %put &list4;
diff_Age=FY12_Age-FY11_Age diff_Height=FY12_Height-FY11_Height diff_Weight=FY12_Weight-FY11_Weight
948 proc datasets nolist;
949 modify class1;
950 rename &list1;
NOTE: Renaming variable Age to FY11_Age.
NOTE: Renaming variable Height to FY11_Height.
NOTE: Renaming variable Weight to FY11_Weight.
951 run;
NOTE: MODIFY was successful for WORK.CLASS1.DATA.
NOTE: PROCEDURE DATASETS used (Total process time):
real time 15.39 seconds
cpu time 0.09 seconds
952 proc datasets nolist;
953 modify class2;
954 rename &list2;
NOTE: Renaming variable Age to FY12_Age.
NOTE: Renaming variable Height to FY12_Height.
NOTE: Renaming variable Weight to FY12_Weight.
955 quit;
NOTE: MODIFY was successful for WORK.CLASS2.DATA.
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.06 seconds
cpu time 0.01 seconds
956 data want;
957 merge class1 class2;
958 by name;
959 &list4;
NOTE: Line generated by the macro variable "LIST4".
1 diff_Age=FY12_Age-FY11_Age diff_Height=FY12_Height-FY11_Height diff_Weight=FY12_Weight-FY11_Weight
----------- -----------
22 22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, -, /, ;, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT, IN, LE, LT,
MAX, MIN, NE, NG, NL, NOTIN, OR, [, ^=, {, |, ||, ~=.
960 %put &list4;
diff_Age=FY12_Age-FY11_Age diff_Height=FY12_Height-FY11_Height diff_Weight=FY12_Weight-FY11_Weight
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 1:05.50
cpu time 0.39 seconds
961 data want;
962 merge class1 class2;
963 by name;
964 &list4;
NOTE: Line generated by the macro variable "LIST4".
1 diff_Age=FY12_Age-FY11_Age diff_Height=FY12_Height-FY11_Height diff_Weight=FY12_Weight-FY11_Weight
----------- -----------
22 22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, -, /, ;, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT, IN, LE, LT,
MAX, MIN, NE, NG, NL, NOTIN, OR, [, ^=, {, |, ||, ~=.
Linlin's code looks good. As in her example, notice you have to switch on the last SELECT statement from:
separated by ' '
to:
separated by ';'
That should handle the latest error.
You have a lot of your data stored in the meta data, the variables names. I would transpose the data and extract the metadata from the variable names into varariables. Then you can easily join(merge) data with many rows but only one column for each fiscal year and compute the %dif. If you want to make it wide again that will be easy and you can construct new var names with the proper attributes. Your var names look like labels.
It is easy to go from tall to wide and tall is much easier to work with.
With regards to you exact question "How to DIFF matching variables from two data sets" that can be easily done with PROC COMPARE.
data fy11;
input Agency:$10. (Safety_Score Recreation_score Contacts_score)(percent.);
cards;
Agency1 98% 95% 95%
Agency2 100% 96% 96%
Agency3 94% 91% 91%
Agency4 92% 92% 92%
Agency5 95% 95% 95%
Agency6 96% 83% 83%
Agency7 99% 87% 87%
Agency8 93% 88% 88%
Agency9 93% 86% 86%
Agency10 96% 87% 87%
;;;;
run;
proc sort sortseq=linguistic(numeric_collation=on);
by agency;
run;
data fy12;
input Agency:$10. (Safety_Score Recreation_score Contacts_score)(percent.);
cards;
Agency1 98% 93% 93%
Agency2 100% 96% 96%
Agency3 94% 93% 91%
Agency4 93% 92% 100%
Agency5 95% 95% 95%
Agency6 96% 83% 83%
Agency7 99% 100% 87%
Agency8 93% 88% 88%
Agency9 93% 86% 93%
Agency10 100% 87% 87%
;;;;
run;
proc sort sortseq=linguistic(numeric_collation=on);
by agency;
run;
proc compare noprint base=fy11 compare=fy12 out=outall outall;
by agency;
var Safety_Score--Contacts_score;
run;
proc format;
value $tcode 'BASE'='FY11' 'COMPARE'='FY12' 'PERCENT'='PDIF';
run;
proc print data=outall(drop=_obs_);
where _type_ ne 'DIF';
format _type_ $tcode.;
run;
Safety_ Recreation_ Contacts_
Obs _TYPE_ Agency Score score score
1 FY11 Agency1 0.98000 0.9500 0.95000
2 FY12 Agency1 0.98000 0.9300 0.93000
4 PDIF Agency1 0.00000 -2.1053 -2.10526
5 FY11 Agency2 1.00000 0.9600 0.96000
6 FY12 Agency2 1.00000 0.9600 0.96000
8 PDIF Agency2 0.00000 0.0000 0.00000
9 FY11 Agency3 0.94000 0.9100 0.91000
10 FY12 Agency3 0.94000 0.9300 0.91000
12 PDIF Agency3 0.00000 2.1978 0.00000
13 FY11 Agency4 0.92000 0.9200 0.92000
14 FY12 Agency4 0.93000 0.9200 1.00000
16 PDIF Agency4 1.08696 0.0000 8.69565
17 FY11 Agency5 0.95000 0.9500 0.95000
18 FY12 Agency5 0.95000 0.9500 0.95000
20 PDIF Agency5 0.00000 0.0000 0.00000
21 FY11 Agency6 0.96000 0.8300 0.83000
22 FY12 Agency6 0.96000 0.8300 0.83000
24 PDIF Agency6 0.00000 0.0000 0.00000
25 FY11 Agency7 0.99000 0.8700 0.87000
26 FY12 Agency7 0.99000 1.0000 0.87000
28 PDIF Agency7 0.00000 14.9425 0.00000
29 FY11 Agency8 0.93000 0.8800 0.88000
30 FY12 Agency8 0.93000 0.8800 0.88000
32 PDIF Agency8 0.00000 0.0000 0.00000
33 FY11 Agency9 0.93000 0.8600 0.86000
34 FY12 Agency9 0.93000 0.8600 0.93000
36 PDIF Agency9 0.00000 0.0000 8.13953
37 FY11 Agency10 0.96000 0.8700 0.87000
38 FY12 Agency10 1.00000 0.8700 0.87000
40 PDIF Agency10 4.16667 0.0000 0.00000
Message was edited by: data _null_
Hey Data_null_, thanks so much. Question: I have never seen these statements used in a proc sort statement before.
proc sort sortseq=linguistic(numeric_collation=on);
by agency;
run;
What do they mean and do?
Thanks!
Sorts AGENCY where 10 follows 9 not 1. Very usefull for sorting SAS Enumerated Variable names.
Usually I would not include the redundent "Agency" text in the value of the variable AGENCY.
Thanks everyone!
Last question: Is there a way to transpose the data set after Data_null_'s step is run, so it looks like this.
Agency | FY11_Safety_Score | FY12_Safety_Score | PDIF | FY11_Recreation_Score | FY12_Recreation_Score | PDIF | FY11_Contacts_Score | FY12_Contacts_Score | PDIF |
Agency1 | 98.00% | 98.00% | 0.00% | 95.00% | 93.00% | -2.00% | 95.00% | 93.00% | -2.00% |
Agency2 | 100.00% | 100.00% | 0.00% | 96.00% | 96.00% | 0.00% | 96.00% | 96.00% | 0.00% |
Agency3 | 94.00% | 94.00% | 0.00% | 94.00% | 94.00% | 0.00% | 94.00% | 95.00% | 1.00% |
THANKS!!!
proc transpose data=outall out=tall;
where _type_ ne 'DIF';
by agency _type_ notsorted;
var &var;
run;
proc transpose data=tall out=wide delim=_;
by agency notsorted;
var col1;
id _type_ _name_;
format _type_ $tcode.;
run;
These won't be in same order as your example. Is that important? Because it can be done with a bit more code of course.
Yes, they do actually have to be in that order. Thoughts?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.