I have a file with a case number that is sporadically filled in. The file is 500k plus rows, If the next time a case # is filled in happens to be the same case # as before I want to fill in the blanks. I thought of using lag() but some of the missing rows are pretty large. If the next case # populated is different the rows between should be left blank.
data have;
infile cards dsd;
input case$ ;
cards;
500
,
,
,
500
,
600
,
,
700
700
,
700
;
run;
data want;
infile cards dsd;
input case$ ;
cards;
500
500
500
500
500
,
600
,
,
700
700
700
700
;
run;
Try this
data have;
infile cards dsd;
input case ;
cards;
500
,
,
,
500
,
600
,
,
700
700
,
700
;
run;
data want;
do _N_ = 1 by 1 until (case);
if _N_ = 1 then c = case;
set have;
end;
flag = (c = case);
do _N_ = 1 to _N_;
set have;
if flag then case = c;
output;
end;
run;
There is also an ID field,
I can use:
data want;
set have;
by id;
retain _case;
if not missing(case_id) then _case = case_id;
run;
that gets me part of the way, but it fills in the last rows when the next value will not be the same. I suppose I could sort the other way and delete some.
Try this
data have;
infile cards dsd;
input case ;
cards;
500
,
,
,
500
,
600
,
,
700
700
,
700
;
run;
data want;
do _N_ = 1 by 1 until (case);
if _N_ = 1 then c = case;
set have;
end;
flag = (c = case);
do _N_ = 1 to _N_;
set have;
if flag then case = c;
output;
end;
run;
this looks great, can up modify this for case as a character?
I see the issue, case is before the set statement so it's a number. I added format case $500.; after the data step and this works. thanks for the help.
Next code can create a dataset where cases are in next format:
case1 case2 case3
case2 case3 case4
case3 case4 case5
...
case(n-2) case(n-1) case (n)
case (n-1) case(n)
case(n) /* <<< the last one */
The code is not tested:
data temp;
merge have (rename=(case=case_prev)) /* firstobs=1 */
have (firstobs=2 rename=(case=case_current))
have (firstobs=3 reanme=(case=case_next);
by;
run;
data want (keep=case);
set temp end=eof;
retain previous_case;
if _N_=1 then do;
case = case_prev; output;
if case_next = case_curr then case=.; /* missing value as blank */
else case = case_curr;
output;
previous_case = case;
end; else do;
if case_curr = previous_case then case=.;
else case = case_curr;
output;
previous_case = case;
end;
if eof then do; case = case_prev; output; end;
run;
Just for fun .
data have;
infile cards dsd;
n+1;
input case ;
cards;
500
,
,
,
500
,
600
,
,
700
700
,
700
;
run;
data have;
set have;
retain x1;
if not missing(case) then x1=case;
run;
proc sort data=have;by descending n;run;
data have;
set have;
retain x2;
if not missing(case) then x2=case;
run;
proc sort data=have;by n;run;
data want;
set have;
if x1=x2 then case=x1;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.