- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all, I have a data set consisting of an ID and six ordered numerical columns (Largest, Second, Third, ... Sixth).
Have:
ID Largest Second Third Fourth Fifth Sixth
1 98 97 96 85 70 1
2 77 0 0 0 0 0
3 61 55 0 0 0 0
4 44 30 12 0 0 0
Is there a way, in a Data step (or some other means), to get the non-zero values to replace the zero values in order, and in some cases repeat? See example below. Also, apologies if this has been answered elsewhere! My search fu may be a little lacking.
Want:
ID Largest Second Third Fourth Fifth Sixth
1 98 97 96 85 70 1
2 77 77 77 77 77 77
3 61 55 61 55 61 55
4 44 30 12 44 30 12
I've tried some IF THEN ELSE IF logic in a Data step which doesn't seem to stick. Any help is certainly appreciated!
DATA Want;
SET Have;
IF SECOND=0 AND THIRD=0 AND FOURTH=0 AND FIFTH=0 AND SIXTH=0
THEN SECOND=Largest
AND THIRD=Largest
AND FOURTH=Largest
AND FIFTH=Largest
AND SIXTH=Largest;
ELSE IF SECOND ne 0 AND THIRD=0 AND FOURTH=0 AND FIFTH=0 AND SIXTH=0
THEN THIRD=Largest
AND FOURTH=SECOND
AND FIFTH=Largest
AND SIXTH=SECOND;
ELSE IF SECOND ne 0 AND THIRD ne 0 AND FOURTH=0 AND FIFTH=0 AND SIXTH=0
THEN FOURTH=Largest
AND FIFTH=SECOND
AND SIXTH=THIRD;
ELSE IF SECOND ne 0 AND THIRD ne 0 AND FOURTH ne 0 AND FIFTH=0 AND SIXTH=0
THEN FIFTH=Largest
AND SIXTH=SECOND;
ELSE IF SECOND ne 0 AND THIRD ne 0 AND FOURTH ne 0 AND FIFTH ne 0 AND SIXTH=0
THEN SIXTH=Largest;
RUN;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I should also note that the variables have a suffix that I didn't include in my example, such that the functions of Largest, Second, etc. are not actually being called. Sorry for the confusion.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here's one example. Note how to post example data in datastep.
data have; input ID Largest Second Third Fourth Fifth Sixth; datalines; 1 98 97 96 85 70 1 2 77 0 0 0 0 0 3 61 55 0 0 0 0 4 44 30 12 0 0 0 ; run; data want; set have; array v Largest Second Third Fourth Fifth Sixth; zpos = whichn(0,of v(*)); if zpos > 1 then do i=zpos to dim(v); r=v[zpos-1]; v[i] = r; end; /* drop zpos i r; after debugging uncomment*/ run;
Replace Largest etc with your variable names. It is up to you to sort the values for this to work.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is my tested code:
data have; input ID Largest Second Third Fourth Fifth Sixth; datalines; 1 98 97 96 85 70 1 2 77 0 0 0 0 0 3 61 55 0 0 0 0 4 44 30 12 0 0 0 ;run; data want; set have; array vx Largest Second Third Fourth Fifth Sixth; do i=1 to dim(vx); if vx(i) = 0 then do; if i=1 then delete; /* when starts with 0 - ??? */ else do; m=i-1; link fulfill; end; end; end; drop i j m; return; fulfill: j=1; do until(m=0); vx(i) = vx(j); i+1; if i > dim(vx) then m=0; else do; j+1; if j > m then j=1; end; end; return; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
If you want to make several variable affectations in your "THEN" clause, then separating them with "AND" is not the way to do it.
(X=Y AND W=Z) is a boolean that will be 1 if the expression is true and 0 otherwise.
You have to enclose your instructions with "DO" and "END"
if (...) then do;
...
...
end;
A better option here would be to use arrays.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi
Assuming that zero values always are in the end, here is a simple solution:
data want; set have;
array values Largest Second Third Fourth Fifth Sixth;
drop i j firstempty;
* find first zero-value column - i.e. length of series to repeat + 1;
do i = 2 to 6;
if values{i} = 0 then do;
firstempty = i;
leave;
end;
end;
* replace with value from corresponding element in non-zero series;
if firstempty ne . then do j = firstempty to 6;
values{j} = values{j - firstempty + 1};
end;
run;