BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RobertNYC
Obsidian | Level 7

Hi all—


So, I need a little help regarding a joining two tables together and then rename variables. Any assistance will be greatly appreciated.

I start with two tables that are similar to the examples below. All the variable names are the same in each table but there are over 100 variables in each table.

FY11 SCORES

Agency

Safety_Score

Recreation_score

Contacts_score

Agency1

98%

95%

95%

Agency2

100%

96%

96%

Agency3

94%

91%

91%

Agency4

92%

92%

92%

Agency5

95%

95%

95%

Agency6

96%

83%

83%

Agency7

99%

87%

87%

Agency8

93%

88%

88%

Agency9

93%

86%

86%

Agency10

96%

87%

87%

 

FY12 SCORES

Agency

Safety_Score

Recreation_score

Contacts_score

Agency1

98%

93%

93%

Agency2

100%

96%

96%

Agency3

94%

93%

91%

Agency4

93%

92%

100%

Agency5

95%

95%

95%

Agency6

96%

83%

83%

Agency7

99%

100%

87%

Agency8

93%

88%

88%

Agency9

93%

86%

93%

Agency10

100%

87%

87%

The final product need should look like this:

AgencyFY11 _Safety_ScoreFY12 _Safety_Score% DifferenceFY11_Recreation_score FY12_Recreation_score % DifferenceFY11_Contacts_score FY12_Contacts_score % Difference
Agency198%95%-3%98%96%-2%95%95%0%
Agency2100%96%-4%99%96%-3%96%96%0%
Agency394%91%-3%91%88%-3%91%91%0%
Agency492%92%0%91%94%3%92%92%0%
Agency595%95%0%95%94%-1%95%95%0%
Agency696%83%-13%94%73%-21%83%83%0%
Agency799%87%-12%98%83%-15%87%87%0%
Agency893%88%-5%95%89%-6%88%88%0%
Agency993%86%-7%93%79%-14%86%86%0%
Agency1096%87%-9%92%84%-8%87%87%0%

What I need to do is:

  1. Add a FY11_ or FY_12 prefix to all the variables to identify which are they are from.
  2. Combine the tables so I have FY11 and FY12 scores side by side.
  3. Calculate the % cent difference for each paring.

Now, I have a program that adds the prefixes to the varibles (see below) but the issue is some of the variable titles are too long so  when the program trys to attach the  prefix it doent work and I get error meaages. Is there a way to delete some of the charters in the variable names which are too long?  

proc sql;

select cat(name, ' = ', cats('FY11_', name ))

into :renstr separated by ' '

from

  1. dictionary.columns where libname = 'WORK' and memname='FY11_data';

quit;

data Fy11_data_new;

  set Fy11_data (rename = (&renstr));

run;

You can see why combine these tables just by cutting and pasting in excel would be very difficult and take forever.

Again , any help with this is greatly appreciated!!!!!!!!!

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

You have a lot of your data stored in the meta data, the variables names.  I would transpose the data and extract the metadata from the variable names into varariables.  Then you can easily join(merge) data with many rows but only one column for each fiscal year and compute the %dif.  If you want to make it wide again that will be easy and you can construct new var names with the proper attributes.  Your var names look like labels.

It is easy to go from tall to wide and tall is much easier to work with.

With regards to you exact question "How to DIFF matching variables from two data sets" that can be easily done with PROC COMPARE.

data fy11;

   input Agency:$10.    (Safety_Score     Recreation_score Contacts_score)(percent.);

   cards;

Agency1     98%   95%   95%

Agency2     100%  96%   96%

Agency3     94%   91%   91%

Agency4     92%   92%   92%

Agency5     95%   95%   95%

Agency6     96%   83%   83%

Agency7     99%   87%   87%

Agency8     93%   88%   88%

Agency9     93%   86%   86%

Agency10    96%   87%   87%

;;;;

   run;

proc sort sortseq=linguistic(numeric_collation=on);

   by agency;

   run;

data fy12;

   input Agency:$10.    (Safety_Score     Recreation_score Contacts_score)(percent.);

   cards;

Agency1     98%   93%   93%

Agency2     100%  96%   96%

Agency3     94%   93%   91%

Agency4     93%   92%   100%

Agency5     95%   95%   95%

Agency6     96%   83%   83%

Agency7     99%   100%  87%

Agency8     93%   88%   88%

Agency9     93%   86%   93%

Agency10    100%  87%   87%

;;;;

   run;

proc sort sortseq=linguistic(numeric_collation=on);

   by agency;

   run;

proc compare noprint base=fy11 compare=fy12 out=outall outall;

   by agency;

   var Safety_Score--Contacts_score;

   run;

proc format;

   value $tcode 'BASE'='FY11' 'COMPARE'='FY12' 'PERCENT'='PDIF';

   run;

proc print data=outall(drop=_obs_);

   where _type_ ne 'DIF';

   format _type_ $tcode.;

   run;

                             Safety_    Recreation_    Contacts_

Obs    _TYPE_     Agency      Score        score         score

  1     FY11     Agency1     0.98000        0.9500       0.95000

  2     FY12     Agency1     0.98000        0.9300       0.93000

  4     PDIF     Agency1     0.00000       -2.1053      -2.10526

  5     FY11     Agency2     1.00000        0.9600       0.96000

  6     FY12     Agency2     1.00000        0.9600       0.96000

  8     PDIF     Agency2     0.00000        0.0000       0.00000

  9     FY11     Agency3     0.94000        0.9100       0.91000

10     FY12     Agency3     0.94000        0.9300       0.91000

12     PDIF     Agency3     0.00000        2.1978       0.00000

13     FY11     Agency4     0.92000        0.9200       0.92000

14     FY12     Agency4     0.93000        0.9200       1.00000

16     PDIF     Agency4     1.08696        0.0000       8.69565

17     FY11     Agency5     0.95000        0.9500       0.95000

18     FY12     Agency5     0.95000        0.9500       0.95000

20     PDIF     Agency5     0.00000        0.0000       0.00000

21     FY11     Agency6     0.96000        0.8300       0.83000

22     FY12     Agency6     0.96000        0.8300       0.83000

24     PDIF     Agency6     0.00000        0.0000       0.00000

25     FY11     Agency7     0.99000        0.8700       0.87000

26     FY12     Agency7     0.99000        1.0000       0.87000

28     PDIF     Agency7     0.00000       14.9425       0.00000

29     FY11     Agency8     0.93000        0.8800       0.88000

30     FY12     Agency8     0.93000        0.8800       0.88000

32     PDIF     Agency8     0.00000        0.0000       0.00000

33     FY11     Agency9     0.93000        0.8600       0.86000

34     FY12     Agency9     0.93000        0.8600       0.93000

36     PDIF     Agency9     0.00000        0.0000       8.13953

37     FY11     Agency10    0.96000        0.8700       0.87000

38     FY12     Agency10    1.00000        0.8700       0.87000

40     PDIF     Agency10    4.16667        0.0000       0.00000

Message was edited by: data _null_

View solution in original post

20 REPLIES 20
Linlin
Lapis Lazuli | Level 10

Hi,

try the modified code:

proc sql;

select catx('=',name, cats('FY11_', name ))

into :renstr separated by ' '

from dictionary.columns where libname = 'WORK' and memname='FY11_data';

quit;

 

data Fy11_data_new;

  set Fy11_data (rename = (&renstr));

run;

RobertNYC
Obsidian | Level 7

Great!!! I will try this out. do you have any advice for combining the tables and calculating the % difference, again I have a large amount of paris of varibles I have to do this for.

THANKS!!!

Astounding
PROC Star

Additional advice:  Think about the process of combining and possibly make adjustments now.  For example:

Should you add to the WHERE clause:  and upcase(name) ne 'AGENCY'

Arrays in a DATA step might be a good tool to accomplish computing the differences.

Do you know for sure that the structure of the two data sets is 100% identical (order of the variables, capitalization of their names, for example)?  If not, you might want to pull all changes from one data set.  For example, you might pull the renaming of FY12 variables from the FY11 data set to make sure the order of variable names will be preserved when you get to computing differences.

RobertNYC
Obsidian | Level 7

Hi again Linlin--

I tried the program and I got the same error messages.  Any thoughts

ERROR 213-322: Variable name FY11_FP_s_Ability_to_Ensure_Child_Hea is longer than 32 characters.

ERROR 23-7: Invalid value for the RENAME option.

23 !  FP_s_Ability_to_Ensure_Child_Hea=FY11_FP_s_Ability_to_Ensure_Child_Hea Adoption_Support=FY11_Adoption_Support

23 ! Foster_Parent_s_Reported_Relatio=FY11_Foster_Parent_s_Reported_Relatio Foster_Parent_Relationship_with=FY11_Foster_Parent_Relationship_with

                                      -------------------------------------

                                      213

ERROR 213-322: Variable name FY11_Foster_Parent_s_Reported_Relatio is longer than 32 characters.

23 ! Foster_Parent_s_Reported_Relatio=FY11_Foster_Parent_s_Reported_Relatio Foster_Parent_Relationship_with=FY11_Foster_Parent_Relationship_with

                                                                                                            ------------------------------------

                                                                                                            213

ERROR 213-322: Variable name FY11_Foster_Parent_Relationship_with is longer than 32 characters.

NOTE: Line generated by the macro variable "RENSTR".

24    Support_For_Family_to_Family_Mod=FY11_Support_For_Family_to_Family_Mod Foster_Parent_Support_Total=FY11_Foster_Parent_Support_Total

                                       -------------------------------------

                                       213

                                       23

ERROR 213-322: Variable name FY11_Support_For_Family_to_Family_Mod is longer than 32 characters.

ERROR 23-7: Invalid value for the RENAME option.

Linlin
Lapis Lazuli | Level 10

Hi,

I think you have to shorten your variables first.

the example code may be helpful:

data class1;

  set sashelp.class;

data class2;

  set sashelp.class;

  age=age+1;

  weight=weight-1;

  height=height+2;

run;

proc sql;

  select catx('=',name,cats('FY11_',name)) into : list1 separated by ' '

     from dictionary.columns

    where libname='SASHELP' and memname='CLASS' and type='num';

   select catx('=',name,cats('FY12_',name)) into : list2 separated by ' '

     from dictionary.columns

    where libname='SASHELP' and memname='CLASS' and type='num';

 

    select cat(cats('diff_',name),'=',catx('-',cats('FY12_',name),cats('FY11_',name))) into :list4 separated by ';'

     from dictionary.columns

    where libname='SASHELP' and memname='CLASS' and type='num';

quit;

proc datasets nolist;

modify class1;

rename &list1;

run;

proc datasets nolist;

modify class2;

rename &list2;

quit;

data want;

  merge class1 class2;

  by name;

  &list4;

proc print data=want;run;

/*log file: */

NOTE: SAS initialization used:

      real time           1.12 seconds

      cpu time            0.77 seconds

1    data class1;

2      set sashelp.class;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.

NOTE: The data set WORK.CLASS1 has 19 observations and 5 variables.

NOTE: DATA statement used (Total process time):

      real time           0.05 seconds

      cpu time            0.00 seconds

3    data class2;

4      set sashelp.class;

5      age=age+1;

6      weight=weight-1;

7      height=height+2;

8    run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.

NOTE: The data set WORK.CLASS2 has 19 observations and 5 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

9

10   proc sql;

11     select catx('=',name,cats('FY11_',name)) into : list1 separated by ' '

12        from dictionary.columns

13       where libname='SASHELP' and memname='CLASS' and type='num';

14      select catx('=',name,cats('FY12_',name)) into : list2 separated by ' '

15        from dictionary.columns

16       where libname='SASHELP' and memname='CLASS' and type='num';

17

18       select

18 ! cat(cats('diff_',name),'=',catx('-',cats('FY12_',name),cats('FY11_',name)))

18 ! into :list4 separated by ';'

19        from dictionary.columns

20       where libname='SASHELP' and memname='CLASS' and type='num';

21   quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           0.10 seconds

      cpu time            0.06 seconds

22

23   proc datasets nolist;

24   modify class1;

25   rename &list1;

NOTE: Renaming variable Age to FY11_Age.

NOTE: Renaming variable Height to FY11_Height.

NOTE: Renaming variable Weight to FY11_Weight.

26   run;

NOTE: MODIFY was successful for WORK.CLASS1.DATA.

NOTE: PROCEDURE DATASETS used (Total process time):

      real time           0.02 seconds

      cpu time            0.00 seconds

27   proc datasets nolist;

28   modify class2;

29   rename &list2;

NOTE: Renaming variable Age to FY12_Age.

NOTE: Renaming variable Height to FY12_Height.

NOTE: Renaming variable Weight to FY12_Weight.

30   quit;

NOTE: MODIFY was successful for WORK.CLASS2.DATA.

NOTE: PROCEDURE DATASETS used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

31   data want;

32     merge class1 class2;

33     by name;

34     &list4;

NOTE: There were 19 observations read from the data set WORK.CLASS1.

NOTE: There were 19 observations read from the data set WORK.CLASS2.

NOTE: The data set WORK.WANT has 19 observations and 11 variables.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.01 seconds

35   proc print data=want;run;

NOTE: There were 19 observations read from the data set WORK.WANT.

NOTE: PROCEDURE PRINT used (Total process time):

      real time           0.00 seconds

      cpu time            0.01 seconds

Message was edited by: Linlin

RobertNYC
Obsidian | Level 7

Linlin!! Thanks you so much for your help!!!1 I think we are almost there. I am still getting error messages though.

947  %put &list4;

diff_Age=FY12_Age-FY11_Age diff_Height=FY12_Height-FY11_Height diff_Weight=FY12_Weight-FY11_Weight

948  proc datasets nolist;

949  modify class1;

950  rename &list1;

NOTE: Renaming variable Age to FY11_Age.

NOTE: Renaming variable Height to FY11_Height.

NOTE: Renaming variable Weight to FY11_Weight.

951  run;

NOTE: MODIFY was successful for WORK.CLASS1.DATA.

NOTE: PROCEDURE DATASETS used (Total process time):

      real time           15.39 seconds

      cpu time            0.09 seconds

952  proc datasets nolist;

953  modify class2;

954  rename &list2;

NOTE: Renaming variable Age to FY12_Age.

NOTE: Renaming variable Height to FY12_Height.

NOTE: Renaming variable Weight to FY12_Weight.

955  quit;

NOTE: MODIFY was successful for WORK.CLASS2.DATA.

NOTE: PROCEDURE DATASETS used (Total process time):

      real time           0.06 seconds

      cpu time            0.01 seconds

956  data want;

957    merge class1 class2;

958    by name;

959    &list4;

NOTE: Line generated by the macro variable "LIST4".

1       diff_Age=FY12_Age-FY11_Age diff_Height=FY12_Height-FY11_Height diff_Weight=FY12_Weight-FY11_Weight

                                   -----------                         -----------

                                   22                                  22

ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, -, /, ;, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT, IN, LE, LT,

              MAX, MIN, NE, NG, NL, NOTIN, OR, [, ^=, {, |, ||, ~=.

960  %put &list4;

diff_Age=FY12_Age-FY11_Age diff_Height=FY12_Height-FY11_Height diff_Weight=FY12_Weight-FY11_Weight

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0 observations and 11 variables.

NOTE: DATA statement used (Total process time):

      real time           1:05.50

      cpu time            0.39 seconds

961  data want;

962    merge class1 class2;

963    by name;

964    &list4;

NOTE: Line generated by the macro variable "LIST4".

1       diff_Age=FY12_Age-FY11_Age diff_Height=FY12_Height-FY11_Height diff_Weight=FY12_Weight-FY11_Weight

                                   -----------                         -----------

                                   22                                  22

ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, -, /, ;, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT, IN, LE, LT,

              MAX, MIN, NE, NG, NL, NOTIN, OR, [, ^=, {, |, ||, ~=.

Astounding
PROC Star

Linlin's code looks good.  As in her example, notice you have to switch on the last SELECT statement from:

separated by ' '

to:

separated by ';'

That should handle the latest error.

data_null__
Jade | Level 19

You have a lot of your data stored in the meta data, the variables names.  I would transpose the data and extract the metadata from the variable names into varariables.  Then you can easily join(merge) data with many rows but only one column for each fiscal year and compute the %dif.  If you want to make it wide again that will be easy and you can construct new var names with the proper attributes.  Your var names look like labels.

It is easy to go from tall to wide and tall is much easier to work with.

With regards to you exact question "How to DIFF matching variables from two data sets" that can be easily done with PROC COMPARE.

data fy11;

   input Agency:$10.    (Safety_Score     Recreation_score Contacts_score)(percent.);

   cards;

Agency1     98%   95%   95%

Agency2     100%  96%   96%

Agency3     94%   91%   91%

Agency4     92%   92%   92%

Agency5     95%   95%   95%

Agency6     96%   83%   83%

Agency7     99%   87%   87%

Agency8     93%   88%   88%

Agency9     93%   86%   86%

Agency10    96%   87%   87%

;;;;

   run;

proc sort sortseq=linguistic(numeric_collation=on);

   by agency;

   run;

data fy12;

   input Agency:$10.    (Safety_Score     Recreation_score Contacts_score)(percent.);

   cards;

Agency1     98%   93%   93%

Agency2     100%  96%   96%

Agency3     94%   93%   91%

Agency4     93%   92%   100%

Agency5     95%   95%   95%

Agency6     96%   83%   83%

Agency7     99%   100%  87%

Agency8     93%   88%   88%

Agency9     93%   86%   93%

Agency10    100%  87%   87%

;;;;

   run;

proc sort sortseq=linguistic(numeric_collation=on);

   by agency;

   run;

proc compare noprint base=fy11 compare=fy12 out=outall outall;

   by agency;

   var Safety_Score--Contacts_score;

   run;

proc format;

   value $tcode 'BASE'='FY11' 'COMPARE'='FY12' 'PERCENT'='PDIF';

   run;

proc print data=outall(drop=_obs_);

   where _type_ ne 'DIF';

   format _type_ $tcode.;

   run;

                             Safety_    Recreation_    Contacts_

Obs    _TYPE_     Agency      Score        score         score

  1     FY11     Agency1     0.98000        0.9500       0.95000

  2     FY12     Agency1     0.98000        0.9300       0.93000

  4     PDIF     Agency1     0.00000       -2.1053      -2.10526

  5     FY11     Agency2     1.00000        0.9600       0.96000

  6     FY12     Agency2     1.00000        0.9600       0.96000

  8     PDIF     Agency2     0.00000        0.0000       0.00000

  9     FY11     Agency3     0.94000        0.9100       0.91000

10     FY12     Agency3     0.94000        0.9300       0.91000

12     PDIF     Agency3     0.00000        2.1978       0.00000

13     FY11     Agency4     0.92000        0.9200       0.92000

14     FY12     Agency4     0.93000        0.9200       1.00000

16     PDIF     Agency4     1.08696        0.0000       8.69565

17     FY11     Agency5     0.95000        0.9500       0.95000

18     FY12     Agency5     0.95000        0.9500       0.95000

20     PDIF     Agency5     0.00000        0.0000       0.00000

21     FY11     Agency6     0.96000        0.8300       0.83000

22     FY12     Agency6     0.96000        0.8300       0.83000

24     PDIF     Agency6     0.00000        0.0000       0.00000

25     FY11     Agency7     0.99000        0.8700       0.87000

26     FY12     Agency7     0.99000        1.0000       0.87000

28     PDIF     Agency7     0.00000       14.9425       0.00000

29     FY11     Agency8     0.93000        0.8800       0.88000

30     FY12     Agency8     0.93000        0.8800       0.88000

32     PDIF     Agency8     0.00000        0.0000       0.00000

33     FY11     Agency9     0.93000        0.8600       0.86000

34     FY12     Agency9     0.93000        0.8600       0.93000

36     PDIF     Agency9     0.00000        0.0000       8.13953

37     FY11     Agency10    0.96000        0.8700       0.87000

38     FY12     Agency10    1.00000        0.8700       0.87000

40     PDIF     Agency10    4.16667        0.0000       0.00000

Message was edited by: data _null_

RobertNYC
Obsidian | Level 7

Hey Data_null_, thanks so much. Question: I have never seen these statements used in a proc sort statement before.  


proc sort sortseq=linguistic(numeric_collation=on);

   by agency;

   run;


What do they mean and do?


Thanks!

data_null__
Jade | Level 19

http://support.sas.com/documentation/cdl/en/proc/63079/HTML/default/viewer.htm#p02bhn81rn4u64n1b6l00...

Sorts AGENCY where 10 follows 9 not 1.    Very usefull for sorting SAS Enumerated Variable names.

Usually I would not include the redundent "Agency" text in the value of the variable AGENCY.

RobertNYC
Obsidian | Level 7

Thanks everyone!

Last question: Is there a way to transpose the data set after Data_null_'s step is run, so it looks like this.

Agency FY11_Safety_Score      FY12_Safety_Score      PDIF     FY11_Recreation_ScoreFY12_Recreation_ScorePDIF     FY11_Contacts_ScoreFY12_Contacts_ScorePDIF    
Agency1 98.00%98.00%0.00%95.00%93.00%-2.00%95.00%93.00%-2.00%
Agency2 100.00%100.00%0.00%96.00%96.00%0.00%96.00%96.00%0.00%
Agency3 94.00%94.00%0.00%94.00%94.00%0.00%94.00%95.00%1.00%

THANKS!!!

data_null__
Jade | Level 19

proc transpose data=outall out=tall;

   where _type_ ne 'DIF';

   by agency _type_ notsorted;

   var &var;

   run;

proc transpose data=tall out=wide delim=_;

   by agency notsorted;

   var col1;

   id _type_ _name_;

   format _type_ $tcode.;

   run;

data_null__
Jade | Level 19

These won't be in same order as your example.  Is that important?  Because it can be done with a bit more code of course.

RobertNYC
Obsidian | Level 7

Yes, they do actually have to be in that order. Thoughts?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 4362 views
  • 0 likes
  • 7 in conversation