Solved: Count number of groups and add to data set

Jason2020 · Posted 03-09-2021 05:07 PM

Hi,

I have a large data set of 17,033 subjects originating of a survey of schools. The data shown below is part of the data set, and it includes the subject id (ID), province code (PROVINCE), school code (SCHOOL), the gender of the student (GENDER), and the total number of schools in each province (TOT_SCHOOL). In order to calculate the sampling probabilities I need to add to this data set 2 new variables: number of schools sampled in each province (N_SCHOOL), and proportion of subjects of each gender within each province (PROP_GENDER). The number of schools sampled in each province varies. Below I include the HAVE data set, and the WANT data set. The values of N_SCHOOL and PROP_GENDER are from Proc Freq (tables shown below) and entered manually.

I greatly appreciate any programming help (proc sql or otherwise) so that I can apply the solution to this data and future surveys.

Data HAVE

Obs ID province School Gender tot_school

1 11189 Aga ElHli 2 53

2 11190 Aga ElHli 2 53

3 11191 Aga ElHli 2 53

4 11192 Aga ElHli 2 53

5 11193 Aga ElHli 2 53

6 11194 Aga ElHli 2 53

7 11195 Aga ElHli 2 53

8 11196 Aga ElHli 2 53

9 11197 Aga ElHli 2 53

10 11198 Aga ElHli 2 53

.

17031 3844 kal Tast 1 46

17032 3845 kal Tast 2 46

17033 3846 kal Tast 1 46

17034 3847 kal Tast 2 46

Data WANT

Obs ID province School Gender tot_school n_school prop_gender

1 11189 Aga ElHli 2 53 3 41.91

2 11190 Aga ElHli 2 53 3 41.91

3 11191 Aga ElHli 2 53 3 41.91

4 11192 Aga ElHli 2 53 3 41.91

5 11193 Aga ElHli 2 53 3 41.91

6 11194 Aga ElHli 2 53 3 41.91

7 11195 Aga ElHli 2 53 3 41.91

8 11196 Aga ElHli 2 53 3 41.91

9 11197 Aga ElHli 2 53 3 41.91

10 11198 Aga ElHli 2 53 3 41.91

.

17031 3844 kal Tast 1 46 3 49.89

17032 3845 kal Tast 2 46 3 50.11

17033 3846 kal Tast 1 46 3 50.11

17034 3847 kal Tast 2 46 3 49.89

proc freq;

tables province * gender/nofreq nopercent nocol;

run;

Table of province by Gender

Province Gender

Row Pct ‚ 1‚ 2‚ Total

---------------------------------------------------------

Aga ‚ 58.09 ‚ 41.91 ‚

Ben ‚ 50.50 ‚ 49.50 ‚

Ber ‚ 48.47 ‚ 51.53 ‚

.

kal ‚ 49.89 ‚ 50.11 ‚

-------------------------------------------------------

Total 8130 8903 17033

proc freq;

tables province * school /list nofreq nocol;

run;

The FREQ Procedure

Cumulative Cumulative

province School Frequency Percent Frequency Percent

------------------------------------------------------------------------------------------

Aga ElHli 200 1.17 200 1.17

Aga Lycla 498 2.92 698 4.10

Aga Nab 242 1.42 940 5.52

Ben Ibndoun 374 2.20 1314 7.71

Ben Siddim 89 0.52 1403 8.24

Ben ZIA 439 2.58 1842 10.81

Ber Elfa 229 1.34 2071 12.16

Ber Elq 238 1.40 2309 13.56

Ber Ibn 220 1.29 2529 14.85

.

kal Abo 119 0.70 16710 98.10

kal Kha 224 1.32 16934 99.41

kal Tast 100 0.59 17034 100.00

PhilC · Posted 03-12-2021 02:19 PM

Ill give an example of your latter statistic, prop_gender. Since you are familiar with PROG FREQ, if you want your numbers that the procedure outputs, you may use these options:

proc freq data=SASHELP.CLASS;
  tables age * sex/ noprint OUTPCT 
                    out=result(where=(sex="M") 
                               drop=count PERCENT	PCT_COL
                              ) ;
run;

And this data set you can merge in with your original data.

Age	Sex	PCT_ROW
11	M	50
12	M	60
13	M	33.33333
14	M	50
15	M	50
16	M	100

View solution in original post

PhilC · Posted 03-12-2021 02:19 PM

Ill give an example of your latter statistic, prop_gender. Since you are familiar with PROG FREQ, if you want your numbers that the procedure outputs, you may use these options:

proc freq data=SASHELP.CLASS;
  tables age * sex/ noprint OUTPCT 
                    out=result(where=(sex="M") 
                               drop=count PERCENT	PCT_COL
                              ) ;
run;

And this data set you can merge in with your original data.

Age	Sex	PCT_ROW
11	M	50
12	M	60
13	M	33.33333
14	M	50
15	M	50
16	M	100

Jason2020 · Posted 03-18-2021 10:07 AM

Thank you very much.

What is the equivalent of this in Proc SQL?

PhilC · Posted 03-12-2021 02:24 PM

For you N_School stat, which is the cardinality of School within each Province, I'll use SASHELP.SHOES :

PROC SQL;
   SELECT Region, 
          (COUNT(DISTINCT(Subsidiary))) AS COUNT_DISTINCT_of_Subsidiary
      FROM SASHELP.SHOES 
      GROUP BY Region;
QUIT;

And you can check the dataset to verify these numbers are true counts:

Region	COUNT_DISTINCT_of_Subsidiary
Africa	8
Asia	3
Canada	5
Central America/Caribbean	4
Eastern Europe	4
Middle East	3
Pacific	6
South America	7
United States	5
Western Europe	8

Count number of groups and add to data set

Re: Count number of groups and add to data set

Re: Count number of groups and add to data set

Re: Count number of groups and add to data set

Re: Count number of groups and add to data set

Count number of groups and add to data set

Re: Count number of groups and add to data set

Re: Count number of groups and add to data set

Re: Count number of groups and add to data set

Re: Count number of groups and add to data set

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!