Putting this question again because the formatting seems to be messed up
Dear All: My data set is as follows Date ST IDa IDb Cat JUN012018 500 A AA I JUN022018 500 A AA I JUN032018 500 A AA I JUN012018 501 A AA JUN022018 501 A AA JUN032018 501 A AA JUN012018 502 A AA I JUN012018 502 A AA I JUN012018 500 A AAB JUN022018 500 A AAB JUN032018 500 A AAB JUN012018 501 A AAB D JUN022018 501 A AAB D JUN032018 501 A AAB D JUN012018 502 A AAB JUN012018 502 A AAB and so on The data set should look like this Date ST IDa IDb Cat JUN012018 500 A AA I JUN022018 500 A AA I JUN032018 500 A AA I JUN012018 501 A AA I JUN022018 501 A AA I JUN032018 501 A AA I JUN012018 502 A AA I JUN012018 502 A AA I JUN012018 500 A AAB D JUN022018 500 A AAB D JUN032018 500 A AAB D JUN012018 501 A AAB D JUN022018 501 A AAB D JUN032018 501 A AAB D JUN012018 502 A AAB D JUN012018 502 A AAB D Thanks Stan
hat's the difference between the two?
No difference. I saw that the formatting was messed up, so I posted it again. OI hope that the data and the question is clearer now.
The start and end look the same to me.
Here’s some instructions on posting sample data. Also, don’t edit the info in the code block in the main editor, that’s messes up the line breaks for some reason.
@RandyStan wrote:
No difference. I saw that the formatting was messed up, so I posted it again. OI hope that the data and the question is clearer now.
Please check the CAT column. In the first data set the variables are missing. In the second, the data set I want, they column CAT is populated.
Thanks so much
Given the way the data is formatted it’s really hard to see that or what the logic is.
@RandyStan wrote:
Please check the CAT column. In the first data set the variables are missing. In the second, the data set I want, they column CAT is populated.
Thanks so much
Below picture created by combining your have and want data.
Please explain the logic how you'd populate "CAT_WANT".
Below how you'd best post sample data if you want to motivate people to post answers with working code.
data have;
infile datalines dlm=' ' truncover;
format Date date9.;
input _dt1 $1-3 _dt2 $4-5 _dt3 $6-9 (ST IDa IDb Cat_Have) ($);
date=input(cats(_dt2,_dt1,_dt3),date9.);
drop _dt:;
datalines;
JUN012018 500 A AA I
JUN022018 500 A AA I
JUN032018 500 A AA I
JUN012018 501 A AA
JUN022018 501 A AA
JUN032018 501 A AA
JUN012018 502 A AA I
JUN012018 502 A AA I
JUN012018 500 A AAB
JUN022018 500 A AAB
JUN032018 500 A AAB
JUN012018 501 A AAB D
JUN022018 501 A AAB D
JUN032018 501 A AAB D
JUN012018 502 A AAB
JUN012018 502 A AAB
;
run;
Use a double DO UNTIL() loop.The first loop records the last non-missing Cat value in the IDb-group, the second loop forces that value where Cat is missing :
data have;
length Date $10 ST 8 IDa $1 IDb $3 Cat $1;
infile datalines truncover;
input Date ST IDa IDb Cat;
datalines;
JUN012018 500 A AA I
JUN022018 500 A AA I
JUN032018 500 A AA I
JUN012018 501 A AA
JUN022018 501 A AA
JUN032018 501 A AA
JUN012018 502 A AA I
JUN012018 502 A AA I
JUN012018 500 A AAB
JUN022018 500 A AAB
JUN032018 500 A AAB
JUN012018 501 A AAB D
JUN022018 501 A AAB D
JUN032018 501 A AAB D
JUN012018 502 A AAB
JUN012018 502 A AAB
;
data want;
do until(last.IDb);
set have; by IDb notsorted;
if not missing(Cat) then groupCat = Cat;
end;
do until(last.IDb);
set have; by IDb notsorted;
if missing(Cat) then Cat = groupCat;
output;
end;
drop groupCat;
run;
proc print data=want noobs; run;
This problem is very amenable to a self merge, where the first instance of HAVE has all the records and variables. The second instance has only the BY-variable(s) (IDB in your case) and the new variable of interest (cat_want, renamed from cat_have), and it only has records with non-blank values of cat_want:
data want;
merge have
have (keep=idb cat_have rename=(cat_have=cat_want) where=(cat_want^=' '));
by idb;
run;
data have;
length Date $10 ST 8 IDa $1 IDb $3 Cat $1;
infile datalines truncover;
input Date ST IDa IDb Cat;
datalines;
JUN012018 500 A AA I
JUN022018 500 A AA I
JUN032018 500 A AA I
JUN012018 501 A AA
JUN022018 501 A AA
JUN032018 501 A AA
JUN012018 502 A AA I
JUN012018 502 A AA I
JUN012018 500 A AAB
JUN022018 500 A AAB
JUN032018 500 A AAB
JUN012018 501 A AAB D
JUN022018 501 A AAB D
JUN032018 501 A AAB D
JUN012018 502 A AAB
JUN012018 502 A AAB
;
proc sql;
create table want(drop=cat) as
select *,max(cat) as cat1
from have
group by idb
order by idb,st,date;
quit;
data have;
infile datalines truncover;
input Date : $10. ST IDa $ IDb $ Cat $;
datalines;
JUN012018 500 A AA I
JUN022018 500 A AA I
JUN032018 500 A AA I
JUN012018 501 A AA
JUN022018 501 A AA
JUN032018 501 A AA
JUN012018 502 A AA I
JUN012018 502 A AA I
JUN012018 500 A AAB
JUN022018 500 A AAB
JUN032018 500 A AAB
JUN012018 501 A AAB D
JUN022018 501 A AAB D
JUN032018 501 A AAB D
JUN012018 502 A AAB
JUN012018 502 A AAB
;
data want;
set have;
if (_n_ = 1) then do;
if 0 then set have(keep=idb cat);
declare hash h(dataset: "have(keep=idb cat where=(not missing(cat))", duplicate: "r");
h.definekey('idb');
h.definedata('cat');
h.definedone();
end;
rc=h.find();
drop rc;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.