BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jason2020
Obsidian | Level 7

Hi,

I have a large data set of 17,033 subjects originating of a survey of schools. The data shown below is part of the data set, and it includes the subject id (ID), province code (PROVINCE), school code (SCHOOL), the gender of the student (GENDER), and the total number of schools in each province (TOT_SCHOOL). In order to calculate the sampling probabilities I need to add to this data set 2 new variables: number of schools sampled in each province (N_SCHOOL), and proportion of subjects of each gender within each province (PROP_GENDER). The number of schools sampled in each province varies. Below I include the HAVE data set, and the WANT data set. The values of N_SCHOOL and PROP_GENDER are from Proc Freq (tables shown below) and entered manually.

I greatly appreciate any programming help (proc sql or otherwise) so that I can apply the solution to this data and future surveys.

 

Data HAVE

  Obs   ID   province  School   Gender  tot_school

    1    11189    Aga     ElHli         2        53

    2    11190    Aga     ElHli         2        53

    3    11191    Aga     ElHli         2        53

    4    11192    Aga     ElHli         2        53

    5    11193    Aga     ElHli         2        53

    6    11194    Aga     ElHli         2        53

    7    11195    Aga     ElHli         2        53

    8    11196    Aga     ElHli         2        53

    9    11197    Aga     ElHli         2        53

   10   11198    Aga     ElHli         2        53

.

.

17031     3844    kal     Tast          1        46

17032     3845    kal     Tast          2        46

17033     3846    kal     Tast          1        46

17034     3847    kal     Tast          2        46

 

 

Data WANT

 Obs   ID    province  School   Gender  tot_school n_school  prop_gender

    1    11189    Aga     ElHli         2        53         3         41.91

    2    11190    Aga     ElHli         2        53         3         41.91

    3    11191    Aga     ElHli         2        53         3         41.91

    4    11192    Aga     ElHli         2        53         3         41.91

    5    11193    Aga     ElHli         2        53         3         41.91

    6    11194    Aga     ElHli         2        53         3         41.91

    7    11195    Aga     ElHli         2        53         3         41.91

    8    11196    Aga     ElHli         2        53         3         41.91

    9    11197    Aga     ElHli         2        53         3         41.91

   10   11198    Aga     ElHli         2        53         3         41.91

.

.

17031     3844    kal     Tast          1        46         3         49.89

17032     3845    kal     Tast          2        46         3         50.11

17033     3846    kal     Tast          1        46         3         50.11

17034     3847    kal     Tast          2        46         3         49.89

 

proc freq;

tables province * gender/nofreq nopercent nocol;

run;

Table of province by Gender

Province             Gender

 

Row Pct          ‚       1‚       2‚  Total

---------------------------------------------------------

Aga              ‚  58.09 ‚  41.91 ‚

Ben              ‚  50.50 ‚  49.50 ‚

Ber              ‚  48.47 ‚  51.53 ‚

.

.

kal              ‚  49.89 ‚  50.11 ‚

-------------------------------------------------------

Total              8130     8903    17033

 

proc freq;

tables province * school /list nofreq nocol;

run;

 

The FREQ Procedure

                                                                Cumulative    Cumulative

province  School   Frequency   Percent   Frequency     Percent

------------------------------------------------------------------------------------------

Aga       ElHli            200        1.17           200         1.17

Aga       Lycla            498        2.92           698         4.10

Aga       Nab              242        1.42           940         5.52

Ben       Ibndoun       374        2.20          1314         7.71

Ben       Siddim          89        0.52          1403         8.24

Ben       ZIA              439        2.58          1842        10.81

Ber       Elfa             229        1.34          2071        12.16

Ber       Elq              238        1.40          2309        13.56

Ber       Ibn              220        1.29          2529        14.85

.

.

kal       Abo              119        0.70         16710        98.10

kal       Kha              224        1.32         16934        99.41

kal       Tast             100        0.59         17034       100.00

 

1 ACCEPTED SOLUTION

Accepted Solutions
PhilC
Rhodochrosite | Level 12

Ill give an example of your latter statistic, prop_gender.  Since you are familiar with PROG FREQ, if you want your numbers that the procedure outputs, you may use these options:

proc freq data=SASHELP.CLASS;
  tables age * sex/ noprint OUTPCT 
                    out=result(where=(sex="M") 
                               drop=count PERCENT	PCT_COL
                              ) ;
run;

 And this data set you can merge in with your original data.

Age Sex PCT_ROW
11 M 50
12 M 60
13 M 33.33333
14 M 50
15 M 50
16 M 100

View solution in original post

3 REPLIES 3
PhilC
Rhodochrosite | Level 12

Ill give an example of your latter statistic, prop_gender.  Since you are familiar with PROG FREQ, if you want your numbers that the procedure outputs, you may use these options:

proc freq data=SASHELP.CLASS;
  tables age * sex/ noprint OUTPCT 
                    out=result(where=(sex="M") 
                               drop=count PERCENT	PCT_COL
                              ) ;
run;

 And this data set you can merge in with your original data.

Age Sex PCT_ROW
11 M 50
12 M 60
13 M 33.33333
14 M 50
15 M 50
16 M 100
Jason2020
Obsidian | Level 7

Thank you very much.

What is the equivalent of this in Proc SQL?

PhilC
Rhodochrosite | Level 12

For you N_School stat, which is the cardinality of School within each Province, I'll use SASHELP.SHOES :

PROC SQL;
   SELECT Region, 
          (COUNT(DISTINCT(Subsidiary))) AS COUNT_DISTINCT_of_Subsidiary
      FROM SASHELP.SHOES 
      GROUP BY Region;
QUIT;

And you can check the dataset to verify these numbers are true counts:

Region COUNT_DISTINCT_of_Subsidiary
Africa 8
Asia 3
Canada 5
Central America/Caribbean 4
Eastern Europe 4
Middle East 3
Pacific 6
South America 7
United States 5
Western Europe 8

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1495 views
  • 1 like
  • 2 in conversation