turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- How to group observations with logic?

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 03:01 PM

Hi, I have a dataset as follows, the variable 'w' has already been sorted, and I want to group them according to the values of 'value' and generate a new variable 'nw', and 'nw' should be given as the last w whose value is less than or equal to 200, and for those values large than 200, stay the same. It's really hard to describe here.

w | value |

abate | 200 |

abbreviations | 50 |

abdomen | 20 |

abdominal | 110 |

abelcet | 100 |

aberrant | 100 |

abiraterone | 100 |

abl1 | 100 |

ablation | 210 |

abnormal | 30 |

abo | 780 |

abscission | 260 |

absence | 100 |

absenteeism | 140 |

absorption | 130 |

abstinence | 150 |

abstracting | 110 |

academia | 210 |

academic | 30 |

academy | 600 |

acceleration | 180 |

access | 130 |

accessibility | 100 |

accident | 200 |

accidents | 230 |

accountability | 50 |

accounting | 100 |

accreditation | 230 |

acculturation | 30 |

ace | 600 |

acetates | 920 |

acetazolamide | 310 |

acetic | 110 |

acetone | 210 |

acetyl | 50 |

acetylaminofluorene | 200 |

acetylatio | 230 |

acetylation | 200 |

bupropion | 550 |

burden | 560 |

burkitt's | 230 |

burn | 300 |

burns | 140 |

businesses | 550 |

butanoic | 800 |

butterflies | 500 |

butyrates | 100 |

butyric | 450 |

butyryl | 800 |

bypass | 330 |

bystander | 300 |

c | 300 |

c2 | 130 |

c2h2 | 150 |

c3h | 250 |

c57bl | 720 |

That is to say, I want to have something like:

w | value | nw |

abate | 200 | abl1 |

abbreviations | 50 | abl1 |

abdomen | 20 | abl1 |

abdominal | 110 | abl1 |

abelcet | 100 | abl1 |

aberrant | 100 | abl1 |

abiraterone | 100 | abl1 |

abl1 | 100 | abl1 |

ablation | 210 | ablation |

abnormal | 30 | abnormal |

abo | 780 | abo |

abscission | 260 | abscission |

absence | 100 | abstracting |

absenteeism | 140 | abstracting |

absorption | 130 | abstracting |

abstinence | 150 | abstracting |

abstracting | 110 | abstracting |

academia | 210 | academia |

academic | 30 | academic |

academy | 600 | academy |

acceleration | 180 | accident |

access | 130 | accident |

accessibility | 100 | accident |

accident | 200 | accident |

accidents | 230 | accidents |

accountability | 50 | accounting |

accounting | 100 | accounting |

accreditation | 230 | accreditation |

acculturation | 30 | acculturation |

ace | 600 | ace |

acetates | 920 | acetates |

acetazolamide | 310 | acetazolamide |

acetic | 110 | acetic |

acetone | 210 | acetone |

acetyl | 50 | acetylaminofluorene |

acetylaminofluorene | 200 | acetylaminofluorene |

acetylatio | 230 | acetylatio |

acetylation | 200 | acetylation |

bupropion | 550 | bupropion |

burden | 560 | burden |

burkitt's | 230 | burkitt's |

burn | 300 | burn |

burns | 140 | burns |

businesses | 550 | businesses |

butanoic | 800 | butanoic |

butterflies | 500 | butterflies |

butyrates | 100 | butyrates |

butyric | 450 | butyric |

butyryl | 800 | butyryl |

bypass | 330 | bypass |

bystander | 300 | bystander |

c | 300 | c |

c2 | 130 | c2h2 |

c2h2 | 150 | c2h2 |

c3h | 250 | c3h |

c57bl | 720 | c57bl |

All the things that came into my mind is to use sort but I don't think it worked. It seems like I cannot sort by the 'values' since I have sorted the data by'w', can someone help me out? Really appreciated.

Accepted Solutions

Solution

12-02-2016
05:05 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 04:47 PM

This should get you close, I haven't fully verified it works though.

PS - in the future please post your same data as a data step - like the first step in my proc.

```
data have;
informat w $30. value 8.;
input w $ value;
cards;
abate 200
abbreviations 50
abdomen 20
abdominal 110
abelcet 100
aberrant 100
abiraterone 100
abl1 100
ablation 210
abnormal 30
abo 780
abscission 260
absence 100
absenteeism 140
absorption 130
abstinence 150
abstracting 110
academia 210
academic 30
academy 600
acceleration 180
access 130
accessibility 100
accident 200
accidents 230
accountability 50
accounting 100
accreditation 230
acculturation 30
ace 600
acetates 920
acetazolamide 310
acetic 110
acetone 210
acetyl 50
acetylaminofluorene 200
acetylatio 230
acetylation 200
bupropion 550
burden 560
burkitt's 230
burn 300
burns 140
businesses 550
butanoic 800
butterflies 500
butyrates 100
butyric 450
butyryl 800
bypass 330
bystander 300
c 300
c2 130
c2h2 150
c3h 250
c57bl 720
;
run;
data add_order;
set have;
order=_n_;
run;
proc sort data=add_order;
by descending order;
run;
data want;
set add_order;
retain prev_group group;
if value>=200 then prev_group=1;
if prev_group=1 and value<200 then do;
group = w;
prev_group=0;
end;
else if prev_group=1 and value>=200 then group=w;
run;
proc sort data=want;
by order;
run;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 03:23 PM

Why does w=accident/value=200 get nw=accident, yet w=abate/value=200 get nw=abl1?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 03:44 PM

Thank you for your reply, mkeintz!

w=abate/value=200 get nw=abl1 because I wanted to get the nw="the last w of the values that are <=200", for instance w=accident/value=200 get nw=accident because accidents/value=230 which is larger than 200. Does that make sense? sorry I didn't describe it clearly.

w=abate/value=200 get nw=abl1 because I wanted to get the nw="the last w of the values that are <=200", for instance w=accident/value=200 get nw=accident because accidents/value=230 which is larger than 200. Does that make sense? sorry I didn't describe it clearly.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 03:47 PM

Can you reverse the order of your data? So you're not looking forward but looking backward and reversing your logic. Its much easier and cleaner to lag in SAS.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 03:59 PM

Do you mean reverse them by w alphabetically? But I still want to get the new variable based on w/values, how does that work by reversing?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 04:05 PM

I don't know how you created your order in the first place.

But if you reverse your entire data you can reverse your logic.

So it becomes if the value is over 200 then assign the value from column 1. Use retain to keep the value from row to row.

```
data want;
set have;
retain group;
if X > 200 then group=column1;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 04:12 PM

Nevermind, I didn't understand your logic. You're handling the values greater than 200 differently than I thought.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 04:17 PM

Yeah, but your code did give me some thinking, by using the group did give me something like what I want, yet still can't. Thank you anyway, I'm still trying

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 04:20 PM

You'd have to set a flag and then check the value of the flag so you need two retained variables.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 04:40 PM

How can I set the value of the flag? Should I use dummies or 'w'?

Solution

12-02-2016
05:05 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 04:47 PM

This should get you close, I haven't fully verified it works though.

PS - in the future please post your same data as a data step - like the first step in my proc.

```
data have;
informat w $30. value 8.;
input w $ value;
cards;
abate 200
abbreviations 50
abdomen 20
abdominal 110
abelcet 100
aberrant 100
abiraterone 100
abl1 100
ablation 210
abnormal 30
abo 780
abscission 260
absence 100
absenteeism 140
absorption 130
abstinence 150
abstracting 110
academia 210
academic 30
academy 600
acceleration 180
access 130
accessibility 100
accident 200
accidents 230
accountability 50
accounting 100
accreditation 230
acculturation 30
ace 600
acetates 920
acetazolamide 310
acetic 110
acetone 210
acetyl 50
acetylaminofluorene 200
acetylatio 230
acetylation 200
bupropion 550
burden 560
burkitt's 230
burn 300
burns 140
businesses 550
butanoic 800
butterflies 500
butyrates 100
butyric 450
butyryl 800
bypass 330
bystander 300
c 300
c2 130
c2h2 150
c3h 250
c57bl 720
;
run;
data add_order;
set have;
order=_n_;
run;
proc sort data=add_order;
by descending order;
run;
data want;
set add_order;
retain prev_group group;
if value>=200 then prev_group=1;
if prev_group=1 and value<200 then do;
group = w;
prev_group=0;
end;
else if prev_group=1 and value>=200 then group=w;
run;
proc sort data=want;
by order;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 05:05 PM

Thank you so much Reeza! it did work! I got to look deep into your code to get better understanding.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 04:44 PM

This is untested, but should be at least most of the way there:

data want;

do until (done1 or value > 200);

set have end=done1;

if value <= 200 then nw = w;

end;

do until (done2 or value > 200);

set have end=done2;

if value > 200 then nw = w;

output;

end;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 05:14 PM

I tried but it seemed that new variable nw only has one value which is the last w in the set.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-02-2016 05:18 PM

Sounds like something is a little off. While I can't see your log, try checking "done1" and "done2" to make sure they appear in the proper places.