Solved: How to group observations with logic?

hx · Posted 12-02-2016 03:01 PM

Hi, I have a dataset as follows, the variable 'w' has already been sorted, and I want to group them according to the values of 'value' and generate a new variable 'nw', and 'nw' should be given as the last w whose value is less than or equal to 200, and for those values large than 200, stay the same. It's really hard to describe here.

w	value
abate	200
abbreviations	50
abdomen	20
abdominal	110
abelcet	100
aberrant	100
abiraterone	100
abl1	100
ablation	210
abnormal	30
abo	780
abscission	260
absence	100
absenteeism	140
absorption	130
abstinence	150
abstracting	110
academia	210
academic	30
academy	600
acceleration	180
access	130
accessibility	100
accident	200
accidents	230
accountability	50
accounting	100
accreditation	230
acculturation	30
ace	600
acetates	920
acetazolamide	310
acetic	110
acetone	210
acetyl	50
acetylaminofluorene	200
acetylatio	230
acetylation	200
bupropion	550
burden	560
burkitt's	230
burn	300
burns	140
businesses	550
butanoic	800
butterflies	500
butyrates	100
butyric	450
butyryl	800
bypass	330
bystander	300
c	300
c2	130
c2h2	150
c3h	250
c57bl	720

That is to say, I want to have something like:

w	value	nw
abate	200	abl1
abbreviations	50	abl1
abdomen	20	abl1
abdominal	110	abl1
abelcet	100	abl1
aberrant	100	abl1
abiraterone	100	abl1
abl1	100	abl1
ablation	210	ablation
abnormal	30	abnormal
abo	780	abo
abscission	260	abscission
absence	100	abstracting
absenteeism	140	abstracting
absorption	130	abstracting
abstinence	150	abstracting
abstracting	110	abstracting
academia	210	academia
academic	30	academic
academy	600	academy
acceleration	180	accident
access	130	accident
accessibility	100	accident
accident	200	accident
accidents	230	accidents
accountability	50	accounting
accounting	100	accounting
accreditation	230	accreditation
acculturation	30	acculturation
ace	600	ace
acetates	920	acetates
acetazolamide	310	acetazolamide
acetic	110	acetic
acetone	210	acetone
acetyl	50	acetylaminofluorene
acetylaminofluorene	200	acetylaminofluorene
acetylatio	230	acetylatio
acetylation	200	acetylation
bupropion	550	bupropion
burden	560	burden
burkitt's	230	burkitt's
burn	300	burn
burns	140	burns
businesses	550	businesses
butanoic	800	butanoic
butterflies	500	butterflies
butyrates	100	butyrates
butyric	450	butyric
butyryl	800	butyryl
bypass	330	bypass
bystander	300	bystander
c	300	c
c2	130	c2h2
c2h2	150	c2h2
c3h	250	c3h
c57bl	720	c57bl

All the things that came into my mind is to use sort but I don't think it worked. It seems like I cannot sort by the 'values' since I have sorted the data by'w', can someone help me out? Really appreciated.

Reeza · Posted 12-02-2016 04:47 PM

This should get you close, I haven't fully verified it works though.

PS - in the future please post your same data as a data step - like the first step in my proc.

data have;
informat w $30. value 8.;
input w	$ value;
cards;
abate   200
abbreviations   50
abdomen 20
abdominal   110
abelcet 100
aberrant    100
abiraterone 100
abl1    100
ablation    210
abnormal    30
abo 780
abscission  260
absence 100
absenteeism 140
absorption  130
abstinence  150
abstracting 110
academia    210
academic    30
academy 600
acceleration    180
access  130
accessibility   100
accident    200
accidents   230
accountability  50
accounting  100
accreditation   230
acculturation   30
ace 600
acetates    920
acetazolamide   310
acetic  110
acetone 210
acetyl  50
acetylaminofluorene 200
acetylatio  230
acetylation 200
bupropion   550
burden  560
burkitt's   230
burn    300
burns   140
businesses  550
butanoic    800
butterflies 500
butyrates   100
butyric 450
butyryl 800
bypass  330
bystander   300
c   300
c2  130
c2h2    150
c3h 250
c57bl   720
;
run;

data add_order;
set have;
order=_n_;
run;

proc sort data=add_order;
by descending order;
run;

data want;
set add_order;
retain prev_group group;
if value>=200 then prev_group=1;
if prev_group=1 and value<200 then do;
	group = w;
	prev_group=0;
end;
else if prev_group=1 and value>=200 then group=w;

run;

proc sort data=want;
by order;
run;

View solution in original post

mkeintz · Posted 12-02-2016 03:23 PM

Why does w=accident/value=200 get nw=accident, yet w=abate/value=200 get nw=abl1?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hx · Posted 12-02-2016 03:44 PM

Thank you for your reply, mkeintz!
w=abate/value=200 get nw=abl1 because I wanted to get the nw="the last w of the values that are <=200", for instance w=accident/value=200 get nw=accident because accidents/value=230 which is larger than 200. Does that make sense? sorry I didn't describe it clearly.

Reeza · Posted 12-02-2016 03:47 PM

Can you reverse the order of your data? So you're not looking forward but looking backward and reversing your logic. Its much easier and cleaner to lag in SAS.

hx · Posted 12-02-2016 03:59 PM

Do you mean reverse them by w alphabetically? But I still want to get the new variable based on w/values, how does that work by reversing?

Reeza · Posted 12-02-2016 04:05 PM

I don't know how you created your order in the first place.

But if you reverse your entire data you can reverse your logic.

So it becomes if the value is over 200 then assign the value from column 1. Use retain to keep the value from row to row.

data want;
set have;
retain group;
if X > 200 then group=column1;
run;

Reeza · Posted 12-02-2016 04:12 PM

Nevermind, I didn't understand your logic. You're handling the values greater than 200 differently than I thought.

hx · Posted 12-02-2016 04:17 PM

Yeah, but your code did give me some thinking, by using the group did give me something like what I want, yet still can't. Thank you anyway, I'm still trying

Reeza · Posted 12-02-2016 04:20 PM

You'd have to set a flag and then check the value of the flag so you need two retained variables.

hx · Posted 12-02-2016 04:40 PM

How can I set the value of the flag? Should I use dummies or 'w'?

Reeza · Posted 12-02-2016 04:47 PM

This should get you close, I haven't fully verified it works though.

PS - in the future please post your same data as a data step - like the first step in my proc.

data have;
informat w $30. value 8.;
input w	$ value;
cards;
abate   200
abbreviations   50
abdomen 20
abdominal   110
abelcet 100
aberrant    100
abiraterone 100
abl1    100
ablation    210
abnormal    30
abo 780
abscission  260
absence 100
absenteeism 140
absorption  130
abstinence  150
abstracting 110
academia    210
academic    30
academy 600
acceleration    180
access  130
accessibility   100
accident    200
accidents   230
accountability  50
accounting  100
accreditation   230
acculturation   30
ace 600
acetates    920
acetazolamide   310
acetic  110
acetone 210
acetyl  50
acetylaminofluorene 200
acetylatio  230
acetylation 200
bupropion   550
burden  560
burkitt's   230
burn    300
burns   140
businesses  550
butanoic    800
butterflies 500
butyrates   100
butyric 450
butyryl 800
bypass  330
bystander   300
c   300
c2  130
c2h2    150
c3h 250
c57bl   720
;
run;

data add_order;
set have;
order=_n_;
run;

proc sort data=add_order;
by descending order;
run;

data want;
set add_order;
retain prev_group group;
if value>=200 then prev_group=1;
if prev_group=1 and value<200 then do;
	group = w;
	prev_group=0;
end;
else if prev_group=1 and value>=200 then group=w;

run;

proc sort data=want;
by order;
run;

hx · Posted 12-02-2016 05:05 PM

Thank you so much Reeza! it did work! I got to look deep into your code to get better understanding.

Astounding · Posted 12-02-2016 04:44 PM

This is untested, but should be at least most of the way there:

data want;

do until (done1 or value > 200);

set have end=done1;

if value <= 200 then nw = w;

end;

do until (done2 or value > 200);

set have end=done2;

if value > 200 then nw = w;

output;

end;

run;

hx · Posted 12-02-2016 05:14 PM

I tried but it seemed that new variable nw only has one value which is the last w in the set.

Astounding · Posted 12-02-2016 05:18 PM

Sounds like something is a little off. While I can't see your log, try checking "done1" and "done2" to make sure they appear in the proper places.

How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Re: How to group observations with logic?

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away