I have two data sets with a common id variable. The first one has car_type and the second one has a flag variable. My goal is to use PROC SQL to count the ids where car_type is "A" (in data set 1) and flag = 1 (in data set 2). For example, for the data sets below, the count would be 1: ids 1, 2, and 6 have car_type A (in data set 1), and of those only id 1 has flag = 1 (in data set 2).
Thanks, see data code below:
data old_1;
input id car_type $;
datalines;
1 A
2 A
3 B
4 C
5 D
6 A
;
data old_2;
input id flag;
datalines;
1 1
2 0
3 1
4 1
;
Let's start with what you have tried so far...
Since this is clearly homework or practice you should give it an attempt first, or at least show what you've tried if you've already done that.
You need to join the data by ID
See this example:
And then filter via a WHERE clause.
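As a rough illustration of the join-then-filter idea, here is a minimal sketch using the old_1 and old_2 data sets from the question. The aliases a and b are arbitrary, and the sketch assumes id is unique within each data set; with repeated ids the join would multiply rows and inflate the count.

```sas
proc sql;
  /* Inner join keeps only the ids that appear in both data sets */
  select count(*) as count
  from old_1 as a
       inner join old_2 as b
       on a.id = b.id
  /* Filter the joined rows on both conditions */
  where a.car_type = 'A'
    and b.flag = 1;
quit;
```

For the sample data, only id 1 satisfies both conditions, so this should report a count of 1.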
I tried and am almost there. I just haven't found a way to show only the count. So far the code works, but the result always comes out as a table rather than a single number.
data old_1;
input id car_type $;
datalines;
1 A
2 A
3 B
4 C
5 D
6 A
;
data old_2;
input id flag;
datalines;
1 1
2 0
3 1
4 1
;
proc sql;
create table test as
select
count(*) as count,
t1.id
from old_2 t1
where t1.id in (select t2.id from old_1 t2 where car_type = 'A') and t1.flag
;
quit;
You are almost there indeed. But did you notice the note
NOTE: The query requires remerging summary statistics back with the original data.
in the log? It says that you asked for a summary function (count) but also for a variable that is not summarized (id). Without a GROUP BY, the count is computed over all selected rows and then copied next to every value of id in the result. Just remove the mention of id, and the problem disappears.
proc sql;
create table test as
select
count(*) as count
from old_2
where id in (select id from old_1 where car_type = 'A') and flag
;
quit;
(I also removed unnecessary aliases)
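If the aim is a single number rather than a table, one possible variation is to store the count in a macro variable instead of creating a data set. This is only a sketch: the macro variable name a_flag_count is made up for illustration, and the TRIMMED option assumes SAS 9.3 or later.

```sas
proc sql noprint;
  /* INTO writes the single result into a macro variable
     (a_flag_count is a hypothetical name) */
  select count(*) into :a_flag_count trimmed
  from old_2
  where id in (select id from old_1 where car_type = 'A') and flag = 1;
quit;

/* Write the value to the log */
%put count=&a_flag_count;
```

Alternatively, dropping the create table test as line from the query above makes PROC SQL print the one-row result to the output window instead of saving it.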
Great! Thanks.