I have a dataset that consists of 10-12 variables. The relevant variables are student IDs, college information, specialization and deployment time. The data look like as follows :
ID College Specialization Time
1 ABC Java 1
2 ABC SAS 3
3 XYZ Java 4
4 XYZ SAS 4
5 XYZ SAS 3
6 FAB Java 4
7 FAB SPSS 10
8 FAB SAS 3
Task is to calculate average time for unique names of college and specialization. If the average time for college is less than or equal to 3, we create a new column named college_flag and set it to 1 else 0.
The same logic applies to specialization. If the average time for specialization is less than or equal to 3, we create a new column named specializatiom_flag and set it to 1 else 0. How it is possible via PROC SQL or Data Step?
I would actually use proc summary as it will provide multiple levels of the summaries and then go through a data step to add the flag.
the data summary would look:
proc summary data=have;
class college specialization;
var time;
output out=CollegeSummary mean=MeanTime;
run;
There will be a variable called _type_ that is set for different combinations of the class variables. Look at the data and either drop combinations your aren't interested in or add the flags;
data want;
set CollegeSummary (where=(_type_ in (1,2)));
if _type_=1 then specilization_flag=(meanTime le 3);
If _type_=2 then college_flag = (meantime le 3);
run;
Thanks a bunch for you comment. It creates multiple datasets for it. I want these two columns to be added to the master file (original dataset file) only . I must apologize for not showing the desired output while explaining the problem. The result should look like as follows :
ID College Specialization Time specialization_flag college_flag
1 ABC Java 1 1 1
2 ABC SAS 3 0 1
3 XYZ Java 4 1 0
4 XYZ SAS 4 0 0
5 XYZ SAS 3 0 0
6 FAB Java 4 1 0
7 FAB SPSS 10 0 0
8 FAB SAS 3 0 0
Try this !
data y1;
input id college $ speci $ time;
cards;
1 ABC Java 1
2 ABC SAS 3
3 XYZ Java 4
4 XYZ SAS 4
5 XYZ SAS 3
6 FAB Java 4
7 FAB SPSS 10
8 FAB SAS 3
;
run;
proc sql;
create table y2 as
select id, college, speci, time,
avg(time) as avg_time,
case when calculated
avg_time<=3 then 1
else 0
end as college_flg
from y1
group by college;
quit;
proc sql;
create table y3(drop=avg_time) as
select id, college, speci, time,
college_flg, avg(time) as avg_time,
case when calculated
avg_time<=3 then 1
else 0
end as speci_flg
from y2
group by speci
order by id;
quit;
Final table y3 looks excatly like your desired output.
proc sql;
select a.id,a.college,a.Specialization, a.Time,case when b.avg_time_sp<=3 then 1 else 0 end as specialization_flag,
case when c.avg_time_col<=3 then 1 else 0 end as college_flag from have a
inner join (select Specialization,avg(time) as avg_time_sp
from have group by Specialization) b
on a.Specialization=b.Specialization
inner join (select college,avg(time) as avg_time_col
from have group by college) c
on a.college=c.college;
quit;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.