BookmarkSubscribeRSS Feed
altadata1
Obsidian | Level 7

Hello all, 

 

I'm using this macro syntax which works very well :

 

%macro QWR(var=,sheet=);

                                           DATA MD_QWR_GWG; SET MD_pooledyrsF; KEEP md_compyr sex lnwage realwages real_wklyearn md_stdwght_Pool

                                                                                                                                                                                            &var;

                                          RUN;

 

                                           PROC SORT DATA = MD_QWR_GWG; by sex md_compyr &var; run;

                                           PROC MEANS DATA = MD_QWR_GWG NOPRINT;

                                                                        VAR lnwage realwages real_wklyearn ;

                                                                        BY sex md_compyr &var;

                                                                        WEIGHT md_stdwght_Pool;

                                                                        OUTPUT OUT = wages MEAN(lnwage realwages real_wklyearn ) = lnwage hourly_earn weekly_earn;

                                           RUN;

 

                                          DATA men women;

                                                          SET wages;

                                                          IF sex = 1 THEN output men;

                                                          ELSE IF sex = 2 then output women;

                                           RUN;

 

                                           DATA MEN2; SET men;  lnwage_M = lnwage; hrly_M = hourly_earn; wkly_M = weekly_earn; Nobs_M = _FREQ_;

                                                                                                                                 KEEP md_compyr lnwage_M hrly_M  wkly_M Nobs_M &var; RUN;

 

                                           DATA WOMEN2; SET women; lnwage_F = lnwage; hrly_W = hourly_earn; wkly_W = weekly_earn; Nobs_W = _FREQ_;

                                                                                                                                 KEEP md_compyr lnwage_F hrly_W wkly_W Nobs_W &var; RUN;

 

                                           DATA GWG_QWR_All;

                                                          MERGE MEN2 WOMEN2; by md_compyr;

                                                          LOG_GAP = lnwage_M - lnwage_F;

                                                          GWR_hrly = hrly_W / hrly_M; /* gender ratio */

                                                          GWR_wkly = wkly_W / wkly_M;

                                                          GWG_hrly = 1 - GWR_hrly;          /* gender gaps */

                                                          GWG_wkly = 1 - GWR_wkly;

                                           RUN;

                                           PROC PRINT DATA = GWG_QWR_ALL; TITLE ' Gender wage ratio, by demographic and job related characteristics, 2007-8 and 2021-2022 - QWR_All';  RUN;

 

                                           /* saves as an excel spreadsheet */

 

                                           PROC EXPORT DATA= GWG_QWR_All

                                                       OUTFILE="my_data\QWR\All.xlsx"

                                                       DBMS=EXCELCS REPLACE;

                                                                            SHEET = "&sheet.";

                                           RUN;

 

                             %mend QWR;

                             options symbolgen;

                             %QWR(var=AGEGROUP5yr, sheet=AGEGROUP5yr);

                             %QWR(var=md_educlev,     sheet=md_educlev);

                             %QWR(var=dv_regionalt,   sheet=dv_regionalt);

                             %QWR(var=dv_ROQUE,       sheet=dv_ROQUE);

                             %QWR(var=dv_ontR,        sheet=dv_ontR);

 

Then, I add another variable (Group4) to the BY statement. 

 

              /** By 4-group population **/                                                                                                                                        

                             %macro QWR_gr(var=,sheet=);

                                           DATA MD_QWR_GWG; SET MD_pooledyrsF; KEEP md_compyr sex GROUP4 lnwage realwages real_wklyearn md_stdwght_Pool

                                                                                                                                                                                            &var;

                                          RUN;

 

                                           PROC SORT DATA = MD_QWR_GWG; by sex md_compyr GROUP4 &var; run;

                                           PROC MEANS DATA = MD_QWR_GWG NOPRINT;

                                                                        VAR lnwage realwages real_wklyearn ;

                                                                        BY sex md_compyr GROUP4 &var;

                                                                        WEIGHT md_stdwght_Pool;

                                                                        OUTPUT OUT = wages MEAN(lnwage realwages real_wklyearn ) = lnwage hourly_earn weekly_earn;

                                           RUN;

 

                                          DATA men women;

                                                          SET wages;

                                                          IF sex = 1 THEN output men;

                                                          ELSE IF sex = 2 then output women;

                                           RUN;

 

                                           DATA MEN2; SET men;  WHERE Group4=1 ; lnwage_M = lnwage; hrly_M = hourly_earn; wkly_M = weekly_earn; Nobs_M = _FREQ_;

                                                                                                                                 KEEP md_compyr GROUP4 lnwage_M hrly_M  wkly_M Nobs_M &var; RUN;

 

                                           DATA WOMEN2; SET women; lnwage_F = lnwage; hrly_W = hourly_earn; wkly_W = weekly_earn; Nobs_W = _FREQ_;

                                                                                                                                 KEEP md_compyr GROUP4 lnwage_F hrly_W wkly_W Nobs_W &var; RUN;

 

                                           DATA GWG_QWR_group;

                                                          MERGE MEN2 WOMEN2; by md_compyr;

                                                          LOG_GAP = lnwage_M - lnwage_F;

                                                          GWR_hrly = hrly_W / hrly_M; /* gender ratio */

                                                          GWR_wkly = wkly_W / wkly_M;

                                                          GWG_hrly = 1 - GWR_hrly;          /* gender gaps */

                                                          GWG_wkly = 1 - GWR_wkly;

                                           RUN;

                                           PROC PRINT DATA = GWG_QWR_group; TITLE ' Gender wage ratio, by demographic and job related characteristics, 2007-8 and 2021-2022 - QWR_group';  RUN;

 

                                           /* saves as an excel spreadsheet */

                                           PROC EXPORT DATA= GWG_QWR_group

                                                       OUTFILE="my_data\QWR\group.xlsx"

                                                       DBMS=EXCELCS REPLACE;

                                                                            SHEET = "&sheet.";

                                           RUN;

 

                             %mend QWR_gr;

                             options symbolgen;

                             %QWR_gr(var=AGEGROUP5yr, sheet=AGEGROUP5yr);

                             %QWR_gr(var=md_educlev,     sheet=md_educlev);

                             %QWR_gr(var=dv_regionalt,   sheet=dv_regionalt);

                             %QWR_gr(var=dv_ROQUE,       sheet=dv_ROQUE);

                             %QWR_gr(var=dv_ontR,        sheet=dv_ontR);

   

Here, the data looks like this:

 

OBS          md_compyr       Group4      AGEGROUP5yr    lnwage_M     hrly_M .............

1                2007                  1                1                              4.91          28.22

2                2007                  1                2                              2.17          17.11

3                2007                  1                3                              3.45          29.89

4                2007                  1                4                              3.55          28.63

5                2007                  1                5                              3.12          28.03

6                2007                  1                6                              2.98          20.13

7                2007                  1                7                              3.55          28.63

8                2007                  2                1                              3.55          28.63

9                2007                  2                2                              3.55          28.63

10              2007                  2                3                              3.55          28.63

11              2007                  2                4                              3.55          28.63

12              2007                  2                5                              3.55          28.63

13              2007                  2                6                              3.55          28.63

14              2007                  2                7                              3.55          28.63

15              2008                  1                1                              3.91          30.22

16              2008                  1                2                              3.50          30.43

17              2008                  1                3                              3.19          29.80

18              2008                  1                4                              2.89          27.63

19              2008                  1                5                              3.22          28.13

20              2008                  1                6                              2.88          20.10

21              2008                  1                7                              3.25          28.13

22              2008                  2                1                              3.25          28.13

23              2008                  2                2                              3.25          28.13

24              2008                  2                3                              3.25          28.13

25              2008                  2                4                              3.25          28.13

26              2008                  2                5                              3.25          28.13

27              2008                  2                6                              3.25          28.13

28              2008                  2                7                              3.25          28.13

 

 

for each year group, the estimations of age groups are repeated after first category of 7. It's the same for other variables. 

Could somebody tell me why?     

Thanks.  

Note: It has nothing to do with the macro. I tested it. 

12 REPLIES 12
PaigeMiller
Diamond | Level 26

for each year group, the estimations of age groups are repeated after first category of 7. It's the same for other variables.

 

There are 7 values for age group, and 2 values for variable GROUP4, that's why.

 

Note: It has nothing to do with the macro. I tested it.

 

I don't really know what this means. I don't know what you tested or how you tested it, but it does have to do with the macro, the code in the macro tells the macro to do this.


From now on, please copy and paste your SAS code into the window that appears when you click on the "little running man" icon. Please consider this mandatory from now on.

--
Paige Miller
altadata1
Obsidian | Level 7

The first macro works perfectly fine. The issue is with THE SECOND MACRO, and the only difference between two syntax is adding the variable "Group4" to BY statement. 

By saying "I tested it", I mean I ran the syntax by replacing "&var" with "AGEGROUP5yr" and no macro. I get the same issue with the output. 

Thank you for the advice on SAS box. 

 

 

altadata1
Obsidian | Level 7

"There are 7 values for age group, and 2 values for variable GROUP4, that's why"

 

I have 3 values for "md_comyr"  (I didn't show full output. The md_compyr has 3 categories of 2007, 2008 and 2021) , and it is perfectly fine with the first macro. 

 

Thanks. 

 

altadata1
Obsidian | Level 7

I have the same problem with this syntax- when I said I had replaced &var with AGEGROUP5yr and no macro. 

DATA MD_QWR_GWG; 
   SET MD_pooledyrsF; 
   KEEP md_compyr sex MD_GROUP4 lnwage realwages real_wklyearn md_stdwght_Pool MD_AGEGROUP5yr;
RUN;

   PROC SORT DATA = MD_QWR_GWG; by sex md_compyr MD_GROUP4 MD_AGEGROUP5yr; 
   RUN; 
   PROC MEANS DATA = MD_QWR_GWG NOPRINT;
			     VAR lnwage realwages real_wklyearn ;
			     BY sex md_compyr MD_GROUP4 MD_AGEGROUP5yr;
			     WEIGHT md_stdwght_Pool;
			     OUTPUT OUT = wages MEAN(lnwage realwages real_wklyearn ) = lnwage hourly_earn weekly_earn;
   RUN;

		 	DATA men women; 
				SET wages;
				IF sex = 1 THEN output men; 
				ELSE IF sex = 2 then output women;
			RUN;

			DATA MEN2; 
SET men;
WHERE MD_Group4=1 ;
lnwage_M = lnwage;
hrly_M = hourly_earn;
wkly_M = weekly_earn;
Nobs_M = _FREQ_; KEEP md_compyr MD_GROUP4 MD_AGEGROUP5yr lnwage_M hrly_M wkly_M Nobs_M ;
RUN; DATA WOMEN2;
SET women;
lnwage_F = lnwage;
hrly_W = hourly_earn;
wkly_W = weekly_earn;
Nobs_W = _FREQ_; KEEP md_compyr MD_GROUP4 MD_AGEGROUP5yr lnwage_F hrly_W wkly_W Nobs_W ;
RUN; DATA GWG_QWR_group; MERGE MEN2 WOMEN2; by md_compyr; LOG_GAP = lnwage_M - lnwage_F; GWR_hrly = hrly_W / hrly_M; /* gender ratio */ GWR_wkly = wkly_W / wkly_M; GWG_hrly = 1 - GWR_hrly; /* gender gaps */ GWG_wkly = 1 - GWR_wkly; RUN; PROC PRINT DATA = GWG_QWR_group; RUN; /* saves as an excel spreadsheet */ PROC EXPORT DATA= GWG_QWR_group OUTFILE="\\fld8filer\Research\Mandana\IWG\QWR\test\test3.xlsx" DBMS=EXCELCS REPLACE; RUN;
Tom
Super User Tom
Super User

So you generate means by a number of variables:

PROC MEANS DATA = MD_QWR_GWG NOPRINT;
...
  BY sex md_compyr MD_GROUP4 MD_AGEGROUP5yr;
 ...

Split the data into MEN and WOMEN .

And then attempt to MERGE them back using only ONE of the original BY variables.

DATA GWG_QWR_group;
  MERGE MEN2 WOMEN2; 
  by md_compyr;
...

That last step needs to merge by the SAME variables as you used in the PROC MEANS (except for SEX since you removed that by splitting the data into MEN and WOMEN datasets).

 BY md_compyr MD_GROUP4 MD_AGEGROUP5yr;

PS Do you have a cat that likes to walk across your keyboard but somehow only steps on the space bar?

ballardw
Super User

Your code references 5 data sets after proc means. Which one are you talking about.

What does the input data look like, as in number of unique levels for each variable used on the BY statement?

 

Very Strong Hint: Supply a small example data set and what the result should look like.

 

Does your log show a message similar to this:

 

NOTE: MERGE statement has more than one data set with repeats of BY values.

That is very likely from your code.

 

 

When you use this:

 PROC MEANS DATA = MD_QWR_GWG NOPRINT;
      VAR lnwage realwages real_wklyearn ;
       BY sex md_compyr &var;

So you have at least 2 more variables than md_compyr in the results.

 

Then you combine those results with

 

 DATA GWG_QWR_All;
     MERGE MEN2 WOMEN2; 
    by md_compyr;

so for every value of md_compyr is going to be repeated based on the number of levels of sex and any variable passed as your macro. You might also look at the way you make Men2 and Women2 in your second set of code as there is WHERE that is applied to MEN that is not applied to Women. So the Women2 data set likely has more records than the Men2 so the merge behaves differently than expected. I suggest that you reconsider that style of coding with multiple statements on a single line as it can be much harder to miss differences like this.

 

 

DATA MEN2; 
SET men; WHERE Group4=1 ; lnwage_M = lnwage;
hrly_M = hourly_earn; wkly_M = weekly_earn; Nobs_M = _FREQ_; KEEP md_compyr GROUP4 lnwage_M hrly_M wkly_M Nobs_M &var; RUN; DATA WOMEN2; SET women; lnwage_F = lnwage; hrly_W = hourly_earn; wkly_W = weekly_earn; Nobs_W = _FREQ_; KEEP md_compyr GROUP4 lnwage_F hrly_W wkly_W Nobs_W &var; RUN;

 

 

The data step merge when there are repeats of by group values in both data sets, as your Men2 and Women2 when there are different numbers of records is unlikely to get the desired result.

 

Suggestion: provide a small example data set, or use a data set like SASHELP.Class and show what you expect for a result.

 

Forum suggestion: Paste code into text or code box opened on the forum with the </> or "running man" icons that appear above the message window. The way the forum treats tabs makes your code pretty ugly and hard to follow.

altadata1
Obsidian | Level 7

"Your code references 5 data sets after proc means. Which one are you talking about." 

      Sorry, I don't understand what you exactly mean. 

"What does the input data look like, as in number of unique levels for each variable used on the BY statement?"

Something like this (each row is an individual observation)

 

md_compyr       sex        Groups4     AGEGROUP5yr  ........................

2007                   1              1                  1

2007                   1              1                  1

2007                   1              1                  1

2007                   1              1                  1

2007                   1              1                  1

2007                   1              1                  1

2007                   1              1                  1

2007                   1              1                  2

2007                   1              1                  2

2007                   1              1                  2

2007                   1              1                  2

2007                   1              1                  2

2007                   1              1                  2

2007                   1              1                  3

2007                   1              1                  3

2007                   1              1                  3

2007                   1              1                  3

2007                   1              1                  3

2007                   1              1                  3

2007                   1              1                  3

.

.

.

2007                   1              1                  7

2007                   1              1                  7

2007                   1              1                  7

2007                   1              1                  7

2007                   1              1                  7

.

.

.

.

.

2007                   1              2                  1

2007                   1              2                  1

2007                   1              2                  1

2007                   1              2                  1

2007                   1              2                  1

2007                   1              2                  1

2007                   1              2                  1

2007                   1              2                  2

2007                   1              2                  2

2007                   1              2                  2

2007                   1              2                  2

2007                   1              2                  2

2007                   1              2                  2

2007                   1              2                  3

2007                   1              2                  3

2007                   1              2                  3

2007                   1              2                  3

2007                   1              2                  3

2007                   1              2                  3

2007                   1              2                  3

.

.

.

2007                   1              2                 7

2007                   1              2                  7

2007                   1              2                  7

2007                   1              2                  7

2007                   1              2                  7

 

then the category of "2008"  for md_compyr.  After the obs for "2021", it starts with sex 2, 

Sorry, I'm not allowed to share the dataset. 

altadata1
Obsidian | Level 7

'Does your log show a message similar to this:

 

NOTE: MERGE statement has more than one data set with repeats of BY values.

That is very likely from your code.-'

 

you mean after following your "strong hit"?

altadata1
Obsidian | Level 7

I have only group=1 for men dataset, because i calculate the rates for women/men while only values for men in group1 are the denominator. As I mentioned, the same syntax works very well without "Group4". It happens when I add Groups4 to the BY statement.  

altadata1
Obsidian | Level 7

You are right. It's about the group4. I have only group=1 in Men2 but group=1 and group=2 in Women2,  and it is problematic once I merge men2 and women2 .The values of lnwage_M hrly_M and wkly_M must be repeated for calculation of rates for group=2 in women2.  I cannot figure out how.  

ballardw
Super User

@altadata1 wrote:

You are right. It's about the group4. I have only group=1 in Men2 but group=1 and group=2 in Women2,  and it is problematic once I merge men2 and women2 .The values of lnwage_M hrly_M and wkly_M must be repeated for calculation of rates for group=2 in women2.  I cannot figure out how.  


Which is why I suggested showing us some data and the expected results.

I am not really sure what you intend.

Did you have working code without the macros? That is the first step in any macro development: make sure everything works without the macro language and then turn that into a macro.

altadata1
Obsidian | Level 7

Hello. Sorry I got sick and was away from work for two weeks. 

What I need to do is to calculate the wage ratio for women in test group 1 and test group 2 vs men in test group 1. I'm working on a code to work properly and them will create a macro. But I couldn't come up with a proper code yet. 

 

Thanks. 

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1236 views
  • 0 likes
  • 4 in conversation