BookmarkSubscribeRSS Feed
xliu1
Quartz | Level 8

I need to classify each unique ID based on two columns. One is academic level, and 2nd one is Intensity_FLG. So within each Academic_Level, if the sum of INTENSITY_FLG >0, then this ID should be classified as Full-time, otherwise, this ID will be classified as Part-time. Not sure what goes wrong, the output only populates partial values for that created column ENR_Intensity, and some IDs are still classified based on each row. The Intensity_FLG is based on each record for each ID. For example, U24654 has three records on Intensity_FLG, one of them is 1, so this ID should be classified as Full-time for all three rows. How should I change my code? Thanks.

DATA ENR_2;
SET ENR;
IF SUM(LEVEL_FLG) <0 THEN Academic_Level='NonDegree';
ELSE IF SUM(LEVEL_FLG) >0 THEN Academic_Level= 'Graduate';
ELSE Academic_Level= 'Undergraduate';

IF Academic_Level='NonDegree' AND SUM (INTENSITY_FLG)>0 THEN ENR_Intensity='Full-Time';
ELSE IF Academic_Level='NonDegree' AND SUM (INTENSITY_FLG)=0 THEN ENR_Intensity='Part-Time';
ELSE IF Academic_Level='Undergraduate' AND SUM (INTENSITY_FLG)>0 THEN ENR_Intensity='Full-Time';
ELSE IF Academic_Level='Undergraduate' AND SUM (INTENSITY_FLG)=0 THEN ENR_Intensity='Part-Time';
ELSE IF Academic_Level='Graduate' AND SUM (INTENSITY_FLG)>0 THEN ENR_Intensity='Full-Time';
ELSE IF Academic_Level='Graduate' AND SUM (INTENSITY_FLG)=0 THEN ENR_Intensity='Part-Time';
run;

 

UID INTENSITY_FLG Academic_Level ENR_Intensity
U13873 0 NonDegree Part-Time
U24654 0 Graduate Part-Time
U24654 1 Graduate Full-Time
U24654 0 Graduate Part-Time
U25049 1 NonDegree Full-Time
U45431 0 Graduate Part-Time
U45431 0 Graduate Part-Time
U64514 0 NonDegree Part-Time
U66605 0 Graduate Part-Time
U66605 1 Graduate Full-Time
U66605 0 Graduate Part-Time
U70374 0 Graduate Part-Time
U70374 0 Graduate Part-Time
U70374 0 Graduate Part-Time
U81899 1 Undergrad  
U82709 1 Undergrad  
U82709 0 Undergrad  
4 REPLIES 4
Reeza
Super User

Sort your data so that intensity flag is 1 in the first record and retain the flag. SAS does operate on each row individually, so otherwise you need to either calculate the value separately and merge it in or you need to use SQL.

 

proc sort data=have;
by uid  academic_level descending intensity_flg;
run;

data want;
set have;
by uid academic_level;
retain enr_intensity;
if first.academic_level and intensity_flag=1 then enr_intensity='Full-Time';
else enr_intensity='Part-Time';
run;

Or add the sum as a new column and use that to classify.

 

proc sql;
create table want as
select *, sum(intensity_flg) as total_intensity
/*,case when total_intensity >=1 then 'Full-time'
else 'Part-Time' end as ENR_INTENSITY*/
from have
group by 
uid, intensity_flg;
quit;

@xliu1 wrote:

I need to classify each unique ID based on two columns. One is academic level, and 2nd one is Intensity_FLG. So within each Academic_Level, if the sum of INTENSITY_FLG >0, then this ID should be classified as Full-time, otherwise, this ID will be classified as Part-time. Not sure what goes wrong, the output only populates partial values for that created column ENR_Intensity, and some IDs are still classified based on each row. The Intensity_FLG is based on each record for each ID. For example, U24654 has three records on Intensity_FLG, one of them is 1, so this ID should be classified as Full-time for all three rows. How should I change my code? Thanks.

DATA ENR_2;
SET ENR;
IF SUM(LEVEL_FLG) <0 THEN Academic_Level='NonDegree';
ELSE IF SUM(LEVEL_FLG) >0 THEN Academic_Level= 'Graduate';
ELSE Academic_Level= 'Undergraduate';

IF Academic_Level='NonDegree' AND SUM (INTENSITY_FLG)>0 THEN ENR_Intensity='Full-Time';
ELSE IF Academic_Level='NonDegree' AND SUM (INTENSITY_FLG)=0 THEN ENR_Intensity='Part-Time';
ELSE IF Academic_Level='Undergraduate' AND SUM (INTENSITY_FLG)>0 THEN ENR_Intensity='Full-Time';
ELSE IF Academic_Level='Undergraduate' AND SUM (INTENSITY_FLG)=0 THEN ENR_Intensity='Part-Time';
ELSE IF Academic_Level='Graduate' AND SUM (INTENSITY_FLG)>0 THEN ENR_Intensity='Full-Time';
ELSE IF Academic_Level='Graduate' AND SUM (INTENSITY_FLG)=0 THEN ENR_Intensity='Part-Time';
run;

 

UID INTENSITY_FLG Academic_Level ENR_Intensity
U13873 0 NonDegree Part-Time
U24654 0 Graduate Part-Time
U24654 1 Graduate Full-Time
U24654 0 Graduate Part-Time
U25049 1 NonDegree Full-Time
U45431 0 Graduate Part-Time
U45431 0 Graduate Part-Time
U64514 0 NonDegree Part-Time
U66605 0 Graduate Part-Time
U66605 1 Graduate Full-Time
U66605 0 Graduate Part-Time
U70374 0 Graduate Part-Time
U70374 0 Graduate Part-Time
U70374 0 Graduate Part-Time
U81899 1 Undergrad  
U82709 1 Undergrad  
U82709 0 Undergrad  

 

ballardw
Super User

First, lets post something actually readable:

IF Academic_Level='NonDegree' AND SUM (INTENSITY_FLG)>0 THEN ENR_Intensity='Full-Time';
ELSE IF Academic_Level='NonDegree' AND SUM (INTENSITY_FLG)=0 THEN ENR_Intensity='Part-Time';
ELSE IF Academic_Level='Undergraduate' AND SUM (INTENSITY_FLG)>0 THEN ENR_Intensity='Full-Time';
ELSE IF Academic_Level='Undergraduate' AND SUM (INTENSITY_FLG)=0 THEN ENR_Intensity='Part-Time';
ELSE IF Academic_Level='Graduate' AND SUM (INTENSITY_FLG)>0 THEN ENR_Intensity='Full-Time';
ELSE IF Academic_Level='Graduate' AND SUM (INTENSITY_FLG)=0 THEN ENR_Intensity='Part-Time'

If you expect the SUM to be adding values from multiple observations then you are misunderstanding how a DATASTEP SUM (or Mean, STD, Max, Min) function works. You are only referencing one variable, so only the value of that single variable on that single observation is used so "SUM (INTENSITY_FLG)" is exactly equal to INTENSITY_FLG.

 

Your other main problem likely comes in from this:

IF SUM(LEVEL_FLG) <0 THEN Academic_Level='NonDegree';
ELSE IF SUM(LEVEL_FLG) >0 THEN Academic_Level= 'Graduate';
ELSE Academic_Level= 'Undergraduate';

If the variable Academic_level does not exist on the input data set then it is created on the first line of that IF. Since you do not specify a length for the variable before creation SAS uses the length of the first value "NonDegree" to set the length.

Which is shorter than Undergraduate. So the value is truncated to "Undergrad", matching the length of NonDegree.

When attempt to compare the value of the Academic_level = "Undergraduate" that fails because that is not the value of the variable and is never true.

add this line BEFORE the If to specify a length.

Length Academic_Level $ 13;

 

Example input data in the form of a data step and what you expect for output given that data step might give us a chance to provide what you need.

HB
Barite | Level 11 HB
Barite | Level 11

Your description doesn't mention undergrad, grad, and nondegree. It's hard to understand what you have and what you want.

With data like

data have;
   input uid $6. intensity_flg;
datalines;
U13873 0
U24654 0
U24654 1
U24654 0
U25049 1
U45431 0
U45431 0
U64514 0
U66605 0
U66605 1
U66605 0
U70374 0
U70374 0
U70374 0
U81899 1
U82709 1
U82709 0
;
run;

most people here would use the data step, sort, loop, and use first.  I'm an SQL guy so I use SQL.
This:

proc sql;
	select a.uid, a.intensity_flg, b.flag
	from have a left join 
		(select uid, max(intensity_flg) as flag
		from have
		group by uid) b on a.uid = b.uid;
quit;

give this:

uid intensity_flg flag
U13873 0 0
U24654 0 1
U24654 1 1
U24654 0 1
U25049 1 1
U45431 0 0
U45431 0 0
U64514 0 0
U66605 0 1
U66605 1 1
U66605 0 1
U70374 0 0
U70374 0 0
U70374 0 0
U81899 1 1
U82709 0 1
U82709 1 1

 

xliu1
Quartz | Level 8

Thank you all for your prompt responses. I was able to get this out. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 749 views
  • 0 likes
  • 4 in conversation