Solved: Re: How to count students taking unique subjects

dakshu92 · Posted 08-14-2023 06:05 PM

Data have

S. id  Subject
1      Math
1      English
1      Biology
2      Math
3      English
3      Math
3      French
4      Science
4      English
4      History
5      Math
6      Business
6      French
7      Math

Want

Students taking only math, no other subjects

Subject  N
Math     3

And how to get this:

ID  Math
1   0
2   1
3   0
4   0
5   1
6   0
7   1

Quentin · Posted 08-15-2023 10:04 AM

@dakshu92 wrote:

What I have done so far:

data want;
set have;
if subject=math then math=1;
else math=0;
run;

For count:

PROC SQL;
create table math_only as
select id
min(math) as onlymath
from have
group by id;
quit;

proc freq data=math_only;
tables only math;
run;

With this I am getting the count, but I am not sure whether it counts only the ids taking only math.

I see some typos in your code that should be generating errors, but I believe the logic is correct.

data want;
  set have;
  if subject='Math' then math=1;     /*math needs to be in quotes, and capitalized*/
  else math=0;
run;

PROC SQL;
  create table math_only as 
  select id
  ,min(math) as onlymath  /*need a comma between columns*/
  from want               /*read WANT, which has the MATH flag; HAVE does not*/
  group by id;
quit;

proc freq data=math_only;
 tables onlymath;
run;
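
To spell out why the logic holds: MATH is 0 or 1 on every row, so min(math) within an id is 1 only when every row for that id is Math. As a minimal sketch, assuming MATH_ONLY was created by the corrected code above, the single count asked for in the question can also be pulled with a sum. The label text is only illustrative.

```sas
proc sql;
  /* onlymath is 1 for ids whose every row is Math, so summing it counts those ids */
  select sum(onlymath) as N label='Students taking only Math'
  from math_only;
quit;
```

With the data posted in the question, ids 2, 5 and 7 are the only ones with onlymath=1, so N comes out as 3.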

Patrick · Posted 08-14-2023 09:23 PM

This feels very much like an exercise and as such we shouldn't just provide the full answer. What have you tried so far? What approaches can you think of?

It's also OK to post code that is not working yet and ask for help.

dakshu92 · Posted 08-14-2023 09:45 PM

What I have done so far:

data want;
set have;
if subject=math then math=1;
else math=0;
run;

For count:

PROC SQL;
create table math_only as
select id
min(math) as onlymath
from have
group by id;
quit;

proc freq data=math_only;
tables only math;
run;

With this I am getting the count, but I am not sure whether it counts only the ids taking only math.

Tom · Posted 08-15-2023 09:18 AM

Your IF statement is not going to work right as posted.

if subject=math

is testing if the value of the variable named SUBJECT matches the value of the variable named MATH.

But your dataset does not have a variable named MATH. It does have some values of SUBJECT that contain the string Math, but none that would match the lowercase string math, because character comparisons in SAS are case-sensitive.

So code:

if subject='Math' then math=1;
else math=0;

Or, since SAS evaluates boolean expressions to 1 for TRUE and 0 for FALSE, you could just use:

math = (subject='Math');
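
A small self-contained sketch of both points, the case-sensitive comparison and the boolean assignment. The dataset DEMO and the variables EXACT and ANY_CASE are made-up names for illustration; UPCASE() is one way to make the test ignore case if the real data is inconsistently capitalized.

```sas
data demo;
  length subject $9;
  input subject $;
  exact    = (subject = 'Math');          /* 1 only for the exact spelling Math */
  any_case = (upcase(subject) = 'MATH');  /* 1 for Math, math and MATH */
  datalines;
Math
math
MATH
English
;
run;

proc print data=demo;
run;
```

Only the first row gets exact=1, while the first three rows get any_case=1.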

dakshu92 · Posted 08-21-2023 07:28 PM

Thank you everyone! I was able to get the count for the subject. However, how do I make a new dummy variable?

What I tried:

data want;
set have;
if math=max(math) then math_first = 1;
else math_first=0;
run;

Tom · Posted 08-21-2023 08:16 PM

Your current code is just going to set MATH_FIRST to 1 on every observation.

That is because you are comparing the current value of the variable MATH (which will be created as missing if it does not already exist in HAVE) to the result of MAX(MATH), which is that same value. The MAX() function returns the largest value from the list of values you pass it. For example, if you called MAX() like this:

biggest = max(10,20,30,40);

then BIGGEST will be set to 40, since it is larger than any of 10, 20, or 30.

Since you only passed in the value of the variable MATH, the largest value in that single-item list is by definition that same value.
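
To make the contrast concrete, here is a hedged sketch. In a DATA step, MAX() only ever sees the arguments on the current row. In PROC SQL, MAX() or MIN() with a single argument acts as a summary function across the rows of each group. The second step assumes WANT already holds the 0/1 variable MATH from earlier in the thread; ROWWISE, ID_FLAGS, ANY_MATH and ONLY_MATH are hypothetical names, and whether either flag is what you need depends on your answer to the question below.

```sas
/* Row-wise: MAX() compares only the values passed on the current row */
data rowwise;
  biggest = max(10, 20, 30, 40);   /* biggest is 40 */
run;

/* Across rows: with one argument, MAX() and MIN() in PROC SQL summarize each group */
proc sql;
  create table id_flags as
  select id
        ,max(math) as any_math    /* 1 if the id has at least one Math row */
        ,min(math) as only_math   /* 1 if every row for the id is Math */
  from want
  group by id;
quit;
```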

What do you want the new variable to indicate? Can you describe in words what you want? Can you create an example input dataset and show the values you want for MATH_FIRST on every observation of that input data?

dakshu92 · Posted 08-21-2023 11:17 PM

Thanks for your help! I actually figured it out. The code I used earlier worked.

mkeintz · Posted 08-14-2023 10:09 PM

data have;
  input id  Subject :$9. ;
datalines;
1                      Math
1                      English
1                      Biology
2                      Math  
2                      Math
2                      Math  
3                      English
3                      Math
3                      French
4                      Science
4                      English
4                      History
5                      Math
6                      Business
6                      French
7                      Math
7                      Math
run;

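data math_only;
  /* Merge the Math rows of HAVE with its non-Math rows, matched by ID (HAVE must be sorted by ID) */
  merge  have (where=(subject='Math') in=inmath )  
         have (where=(subject^='Math') in=notmath)  ;
  by id;
  if last.id;                      /* keep one row per ID */
  math= (inmath=1 and notmath=0);  /* 1 only when the ID has Math rows and no other rows */
run;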

andreas_lds · Posted 08-15-2023 06:11 AM

There are many ways to solve this problem:

- one data step with by-group processing and retain
- @mkeintz's suggestion
- two proc freqs and a merge
- ...

proc freq data=have noprint;
   table Id * Subject / out=Math(drop= Percent where= (Subject = 'Math') rename= (Count = Math));
   table Id / out=Total(drop= Percent rename= (Count = Total));
run;

data want2;
   merge Total Math;
   by Id;

   Math = (Total = Math);   /* 1 when every row for the Id is Math; 0 otherwise, including Ids with no Math rows */

   drop Total Subject;
run;
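
The first approach in the list, one data step with by-group processing and retain, is not shown in this thread. A minimal sketch of what it might look like follows, assuming HAVE has the variables id and Subject as in the other replies. The names HAVE_SORTED and WANT_RETAIN are made up here, and the sort is only needed if HAVE is not already ordered by id.

```sas
proc sort data=have out=have_sorted;
  by id;
run;

data want_retain (keep=id math);
  set have_sorted;
  by id;
  retain math;
  if first.id then math = 1;           /* assume Math-only until another subject appears */
  if subject ne 'Math' then math = 0;  /* any non-Math row clears the flag for this id */
  if last.id then output;              /* write one row per id */
run;
```

Because MATH is retained, a 0 set on any row of an id carries through to the last row of that id, where the single output row is written.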

Ksharp · Posted 08-15-2023 07:34 AM

data have;
input  id          Subject $;
cards;
1                      Math
1                      English
1                      Biology
2                      Math  
2                      Math
2                      Math  
3                      English
3                      Math
3                      French
4                      Science
4                      English
4                      History
5                      Math
6                      Business
6                      French
7                      Math
7                      Math
;

proc sql;
create table want as
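/* count(Subject) = non-missing rows per id; sum(Subject='Math') = Math rows per id; equal only when every row is Math */
select id,count(Subject)=sum(Subject='Math') as Math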
 from have
  group by id;
quit;
