12-28-2017 02:15 AM
I am trying to do loop over the same observations in a big dataset. I have used existing codes in the communities , however, they did not work.
here is a part of dataset:
Id year1 year2 year3
1 20 . .
1 . 30 .
1 . . 15
2 20 . .
I want to apply some conditions , for example:
do until (last.Id);
set have ;
if year1=year2=year3 then delete;end; run;
what is the problem with this code?
your help will be appreciated
12-28-2017 03:24 AM
2 problems at least:
1. The set statement will keep the latest values, so . . 15 for ID=1
You can try to use UPDATE instead to ignore missing values.
Maybe you can replace the set statement with (I never used UPDATE in a do loop so that's a test) :
update HAVE(obs=0) HAVE;
You dont need a loop really. Using the above statement without a do loop and testing end of group with function lag() would work for sure.
Another option is to store the non-missing values in different, retained, variables.
2. The test year1=year2=year3 should be year1=year2 and year2=year3
12-28-2017 06:10 AM
Reconsider your data model. Having the years in the columns is why you have this challenge. A normalized table would have three columns: id, year and value. This would make any logic or reporting much easier. And just imagine what would be the consequence if you have to add another year. Look at proc transpose or diy a datastep.
Hope this helps,
12-28-2017 06:19 AM
If the intent is to keep ID=1, but delete ID=2, while getting all the data onto a single observation for each ID, you could try:
update have (obs=0) have;
if year1=year2=year3 then delete;
If that's not the intent, you will need to spell it out. We can't really figure out why the code doesn't work if we don't know what it is supposed to accomplish.