Data have
S. id Subject
1 Math
1 English
1 Biology
2 Math
2 Math
2 Math
3 English
3 Math
3 French
4 Science
4 English
4 History
5 Math
6 Business
6 French
7 Math
7 Math
Want
Students taking only math, no other subjects
Subject N
Math 3
And how do I get this indicator for each ID (1 = takes only Math):
ID Math
1 0
2 1
3 0
4 0
5 1
6 0
7 1
@dakshu92 wrote:
What I have done till now:
data want;
set have;
if subject=math then math=1;
else math=0;
run;
For count:
PROC SQL;
create table math_only as
select id
min(math) as onlymath
from have
group by id;
quit;
proc freq data=math_only;
tables only math;
run;
With this I am getting a count, but I am not sure whether it counts only the IDs taking only Math.
I see some typos in your code that should be generating errors, but I believe the logic is correct.
data want;
set have;
if subject='Math' then math=1; /*math needs to be in quotes, and capitalized*/
else math=0;
run;
PROC SQL;
create table math_only as
select id
,min(math) as onlymath /*need a comma between columns*/
from want
group by id;
quit;
proc freq data=math_only;
tables onlymath;
run;
This feels very much like an exercise and as such we shouldn't just provide the full answer. What have you tried so far? What approaches can you think of?
It's also o.k. to post some not yet working code and ask for help.
What I have done till now:
data want;
set have;
if subject=math then math=1;
else math=0;
run;
For count:
PROC SQL;
create table math_only as
select id
min(math) as onlymath
from have
group by id;
quit;
proc freq data=math_only;
tables only math;
run;
With this I am getting a count, but I am not sure whether it counts only the IDs taking only Math.
Your IF statement is not going to work right as posted.
if subject=math
is testing if the value of the variable named SUBJECT matches the value of the variable named MATH.
But your dataset does not have a variable named MATH. It does have some values of SUBJECT that contain the string Math. But it does not have any values of SUBJECT that would match the string math.
So code:
if subject='Math' then math=1;
else math=0;
Or since SAS will evaluate boolean expressions to 1 for TRUE and 0 for FALSE you could just use:
math = (subject='Math');
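Putting the flag and the per-ID summary together, here is a minimal sketch. It assumes HAVE contains the variables ID and SUBJECT as shown in the question; the dataset names FLAGGED and MATH_ONLY are only illustrative.

```sas
/* Flag each row: 1 when the subject is exactly 'Math', 0 otherwise. */
data flagged;
  set have;
  math = (subject = 'Math');
run;

/* An ID takes only Math when every one of its rows is flagged,   */
/* i.e. when the minimum of the flag within that ID is 1.         */
proc sql;
  create table math_only as
  select id, min(math) as onlymath
  from flagged
  group by id;
quit;
```

The comparison is case-sensitive. If the data might also contain values such as math or MATH, comparing upcase(subject) to 'MATH' is one way to treat them alike.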
Thank you everyone! I was able to get the count of the subject, however how do I make a new dummy variable?
What I tried:
data want;
set have;
if math=max(math) then math_first = 1;
else math_first=0;
run;
Your current code is just going to set MATH_FIRST to 1 on every observation.
That is because you are comparing the current value of the variable MATH (which will be created as missing if it does not already exist in HAVE) to itself. The MAX() function returns the largest value from the list of values you pass it. For example, if you called MAX() like this:
biggest = max(10,20,30,40);
then BIGGEST will be set to 40, since it is larger than any of 10, 20 or 30.
Since you only passed in the value of the variable MATH, the largest of that single value is by definition the same value.
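To illustrate the difference, here is a small sketch contrasting MAX() as a DATA step function with MAX() as a PROC SQL summary function. It assumes the HAVE dataset from the question; the name ANY_MATH is made up for the example.

```sas
/* DATA step: MAX() compares the arguments passed on the current row only. */
data _null_;
  biggest = max(10, 20, 30, 40);
  put biggest=;   /* writes biggest=40 to the log */
run;

/* PROC SQL: with a single argument, MAX() is a summary function and  */
/* returns the largest value across the rows of each group.           */
proc sql;
  create table any_math as
  select id, max(subject = 'Math') as any_math
  from have
  group by id;
quit;
```

In the SQL step ANY_MATH is 1 for any ID with at least one Math row, which is a different question from "only Math"; it is shown here only to make the across-rows behaviour visible.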
What do you want the new variable to indicate? Can you describe in words what you want? Can you create an example input dataset and show the values you want for MATH_FIRST on every observation of that input data?
Thanks for your help! I actually figured it out. The code I used earlier worked.
data have;
input id Subject :$9. ;
datalines;
1 Math
1 English
1 Biology
2 Math
2 Math
2 Math
3 English
3 Math
3 French
4 Science
4 English
4 History
5 Math
6 Business
6 French
7 Math
7 Math
run;
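data math_only;
/* Merge the Math rows with the non-Math rows of the same dataset. */
/* HAVE must be sorted by ID, as it is in the example data.        */
merge have (where=(subject='Math') in=inmath )
have (where=(subject^='Math') in=notmath) ;
by id;
if last.id; /* keep one row per ID */
math= (inmath=1 and notmath=0); /* 1 only when the ID has Math rows and no other rows */
run;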
There are many ways to solve this problem:
proc freq data=have noprint;
table Id * Subject / out=Math(drop= Percent where= (Subject = 'Math') rename= (Count = Math));
table Id / out=Total(drop= Percent rename= (Count = Total));
run;
data want2;
merge Total Math;
by Id;
Math = (Total = Math); /* 1 when every row for the ID is Math; a missing Math count gives 0 */
drop Total Subject;
run;
data have;
input id Subject $;
cards;
1 Math
1 English
1 Biology
2 Math
2 Math
2 Math
3 English
3 Math
3 French
4 Science
4 English
4 History
5 Math
6 Business
6 French
7 Math
7 Math
;
proc sql;
create table want as
select id,count(Subject)=sum(Subject='Math') as Math
from have
group by id;
quit;
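To get from the per-ID flag to the single count asked for in the question, one option is to sum the flag. This is a sketch that assumes the WANT table created by the PROC SQL step just above, with one row per ID and a 0/1 variable Math.

```sas
/* Summing a 0/1 flag counts the IDs where the flag is 1. */
proc sql;
  select 'Math' as Subject, sum(Math) as N
  from want;
quit;
```

With the sample data, IDs 2, 5 and 7 take only Math, so N is 3.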