Help using Base SAS procedures

Data Manipulation

Reply
Regular Contributor
Posts: 182

Data Manipulation

I have a dataset that consists of 10-12 variables. The relevant variables are student IDs, college information, specialization and deployment time. The data look like as follows :

ID College Specialization Time

1  ABC      Java               1

2  ABC      SAS               3

3  XYZ       Java               4

4  XYZ       SAS               4

5  XYZ       SAS               3

6  FAB       Java               4

7  FAB      SPSS             10

8  FAB       SAS               3

Task is to calculate average time for unique names of college and specialization. If the average time for college is less than or equal to 3, we create a new column named college_flag and set it to 1 else 0.

The same logic applies to specialization. If the average time for specialization is less than or equal to 3, we create a new column named specializatiom_flag and set it to 1 else 0. How it is possible via PROC SQL or Data Step?

Super User
Posts: 10,871

Re: Data Manipulation

I would actually use proc summary as it will provide multiple levels of the summaries and then go through a data step to add the flag.

the data summary would look:

proc summary data=have;

     class college specialization;

     var time;

     output out=CollegeSummary mean=MeanTime;

run;

There will be a variable called _type_ that is set for different combinations of the class variables. Look at the data and either drop combinations your aren't interested in or add the flags;

data want;

     set CollegeSummary (where=(_type_ in (1,2)));

     if _type_=1 then specilization_flag=(meanTime le 3);

     If _type_=2 then college_flag = (meantime le 3);

    

run;

Regular Contributor
Posts: 182

Re: Data Manipulation

Thanks a bunch for you comment. It creates multiple datasets for it. I want these two columns to be added to the master file (original dataset file) only . I must apologize for not showing the desired output while explaining the problem. The result should look like as follows :

ID College Specialization Time specialization_flag college_flag

1  ABC      Java               1       1     1

2  ABC      SAS               3      0     1

3  XYZ       Java               4     1     0

4  XYZ       SAS               4      0   0

5  XYZ       SAS               3     0   0

6  FAB       Java               4      1  0

7  FAB      SPSS             10    0  0

8  FAB       SAS               3    0  0

Contributor
Posts: 22

Re: Data Manipulation

Try this !

data y1;

input id college $ speci $ time;

cards;

1 ABC Java 1

2 ABC SAS 3

3 XYZ Java 4

4 XYZ SAS 4

5 XYZ SAS 3

6 FAB Java 4

7 FAB SPSS 10

8 FAB SAS 3

;

run;

proc sql;

create table y2 as

select id, college, speci, time,
avg(time)
as avg_time,

case when calculated
avg_time<=
3 then 1

else 0

end as college_flg

from y1

group by college;

quit;

proc sql;

create table y3(drop=avg_time) as

select id, college, speci, time,
college_flg, avg(time)
as avg_time,

case when calculated
avg_time<=
3 then 1

else 0

end as speci_flg

from y2

group by speci

order by id;

quit;

Final table y3 looks excatly like your desired output.

Trusted Advisor
Posts: 1,222

Re: Data Manipulation

proc sql;

select a.id,a.college,a.Specialization, a.Time,case when b.avg_time_sp<=3 then 1 else 0 end as specialization_flag,

case when c.avg_time_col<=3 then 1 else 0 end as college_flag from have a

inner join (select Specialization,avg(time) as avg_time_sp

from have group by Specialization) b

on a.Specialization=b.Specialization

inner join (select college,avg(time) as avg_time_col

from have group by college) c

on a.college=c.college;

quit;

Ask a Question
Discussion stats
  • 4 replies
  • 251 views
  • 1 like
  • 4 in conversation