DATA Step, Macro, Functions and more

Populating columns with missing variables

Reply
Contributor
Posts: 23

Populating columns with missing variables

Dear All My data is as follows Date ST IDa IDb Cat JUN012018 500 A AA I JUN022018 500 A AA I JUN032018 500 A AA I JUN012018 501 A AA JUN022018 501 A AA JUN032018 501 A AA JUN012018 502 A AA I JUN012018 502 A AA I JUN012018 500 A AAB JUN022018 500 A AAB JUN032018 500 A AAB JUN012018 501 A AAB D JUN022018 501 A AAB D JUN032018 501 A AAB D JUN012018 502 A AAB JUN012018 502 A AAB and so on The data set should look like this Date ST IDa IDb Cat JUN012018 500 A AA I JUN022018 500 A AA I JUN032018 500 A AA I JUN012018 501 A AA I JUN022018 501 A AA I JUN032018 501 A AA I JUN012018 502 A AA I JUN012018 502 A AA I JUN012018 500 A AAB D JUN022018 500 A AAB D JUN032018 500 A AAB D JUN012018 501 A AAB D JUN022018 501 A AAB D JUN032018 501 A AAB D JUN012018 502 A AAB D JUN012018 502 A AAB D Thanks Stan
Contributor
Posts: 23

Re: Populating columns with missing variables

Posted in reply to RandyStan

Putting this question again because the formatting seems to be messed up

 

Dear All:
My data set is as follows

Date                            ST                        IDa                         IDb              Cat
JUN012018               500                          A                           AA                I
JUN022018               500                          A                           AA                I
JUN032018               500                          A                           AA                I

JUN012018                501                         A                           AA 

JUN022018               501                          A                           AA

JUN032018                501                         A                           AA

JUN012018                502                         A                           AA                 I

JUN012018                502                          A                           AA                I

JUN012018               500                          A                           AAB               
JUN022018               500                          A                           AAB                
JUN032018               500                          A                           AAB                

JUN012018                501                         A                           AAB              D

JUN022018               501                          A                           AAB              D

JUN032018                501                         A                           AAB              D

JUN012018                502                         A                           AAB                 

JUN012018                502                          A                           AAB                

 

 

and so on

The data set should look like this

 

 

Date                            ST                        IDa                         IDb              Cat
JUN012018               500                          A                           AA                I
JUN022018               500                          A                           AA                I
JUN032018               500                          A                           AA                I

JUN012018                501                         A                           AA                I

JUN022018               501                          A                           AA                I

JUN032018                501                         A                           AA                I

JUN012018                502                         A                           AA                 I

JUN012018                502                          A                           AA                I

JUN012018               500                          A                           AAB             D              
JUN022018               500                          A                           AAB             D            
JUN032018               500                          A                           AAB             D   

JUN012018                501                         A                           AAB              D

JUN022018               501                          A                           AAB              D

JUN032018                501                         A                           AAB              D

JUN012018                502                         A                           AAB              D      

JUN012018                502                          A                           AAB             D

 

 

Thanks 

Stan          

  

Regular Contributor
Posts: 167

Re: Populating columns with missing variables

Posted in reply to RandyStan

hat's the difference between the two?

--------------
blog: papersandprograms.com
Contributor
Posts: 23

Re: Populating columns with missing variables

Posted in reply to PaulBrownPhD

No difference.  I saw that the formatting was messed up, so I posted it again.  OI hope that the data and the question is clearer now.

Super User
Posts: 23,778

Re: Populating columns with missing variables

Posted in reply to RandyStan

The start and end look the same to me. 

 

Here’s  some instructions on posting sample data. Also, don’t edit the info in the code block in the main editor, that’s messes up the line breaks for some reason. 

 


@RandyStan wrote:

No difference.  I saw that the formatting was messed up, so I posted it again.  OI hope that the data and the question is clearer now.


 

Contributor
Posts: 23

Re: Populating columns with missing variables

Please check the CAT column.  In the first data set the variables are missing.  In the second, the data set I want, they column CAT is populated.

   Thanks so much

Super User
Posts: 23,778

Re: Populating columns with missing variables

Posted in reply to RandyStan

Given the way the data is formatted it’s really hard to see that or what the logic is. 

 


@RandyStan wrote:

Please check the CAT column.  In the first data set the variables are missing.  In the second, the data set I want, they column CAT is populated.

   Thanks so much


 

Respected Advisor
Posts: 4,743

Re: Populating columns with missing variables

[ Edited ]
Posted in reply to RandyStan

@RandyStan

Below picture created by combining your have and want data.

Please explain the logic how you'd populate "CAT_WANT".

Capture.JPG

 

Below how you'd best post sample data if you want to motivate people to post answers with working code.

data have;
  infile datalines dlm=' ' truncover;
  format Date date9.;
  input _dt1 $1-3 _dt2 $4-5 _dt3 $6-9 (ST IDa IDb Cat_Have) ($);
  date=input(cats(_dt2,_dt1,_dt3),date9.);
  drop _dt:;
  datalines;
JUN012018 500 A AA I
JUN022018 500 A AA I
JUN032018 500 A AA I
JUN012018 501 A AA 
JUN022018 501 A AA
JUN032018 501 A AA
JUN012018 502 A AA I
JUN012018 502 A AA I
JUN012018 500 A AAB 
JUN022018 500 A AAB 
JUN032018 500 A AAB 
JUN012018 501 A AAB D
JUN022018 501 A AAB D
JUN032018 501 A AAB D
JUN012018 502 A AAB 
JUN012018 502 A AAB 
;
run;

 

Esteemed Advisor
Posts: 5,543

Re: Populating columns with missing variables

Posted in reply to RandyStan

Use a double DO UNTIL() loop.The first loop records the last non-missing Cat value in the IDb-group, the second loop forces that value where Cat is missing :

 

data have;
length Date $10 ST 8 IDa $1 IDb $3 Cat $1;
infile datalines truncover;
input Date ST IDa IDb Cat;
datalines;
JUN012018 500 A AA I
JUN022018 500 A AA I
JUN032018 500 A AA I
JUN012018 501 A AA 
JUN022018 501 A AA
JUN032018 501 A AA
JUN012018 502 A AA I
JUN012018 502 A AA I
JUN012018 500 A AAB 
JUN022018 500 A AAB 
JUN032018 500 A AAB 
JUN012018 501 A AAB D
JUN022018 501 A AAB D
JUN032018 501 A AAB D
JUN012018 502 A AAB 
JUN012018 502 A AAB 
;

data want;
do until(last.IDb);
    set have; by IDb notsorted;
    if not missing(Cat) then groupCat = Cat;
    end;
do until(last.IDb);
    set have; by IDb notsorted;
    if missing(Cat) then Cat = groupCat;
    output;
    end;
drop groupCat;
run;

proc print data=want noobs; run;

 

 

PG
Trusted Advisor
Posts: 1,346

Re: Populating columns with missing variables

Posted in reply to RandyStan

This problem is very amenable to a self merge, where the first instance of HAVE has all the records and variables.  The second instance has only the BY-variable(s) (IDB in your case) and the new variable of interest (cat_want, renamed from cat_have), and it only has records with non-blank values of cat_want:

 

data want;
  merge have
        have (keep=idb cat_have rename=(cat_have=cat_want) where=(cat_want^=' '));
  by idb;
run;
PROC Star
Posts: 1,845

Re: Populating columns with missing variables

[ Edited ]
Posted in reply to RandyStan
data have;
length Date $10 ST 8 IDa $1 IDb $3 Cat $1;
infile datalines truncover;
input Date ST IDa IDb Cat;
datalines;
JUN012018 500 A AA I
JUN022018 500 A AA I
JUN032018 500 A AA I
JUN012018 501 A AA 
JUN022018 501 A AA
JUN032018 501 A AA
JUN012018 502 A AA I
JUN012018 502 A AA I
JUN012018 500 A AAB 
JUN022018 500 A AAB 
JUN032018 500 A AAB 
JUN012018 501 A AAB D
JUN022018 501 A AAB D
JUN032018 501 A AAB D
JUN012018 502 A AAB 
JUN012018 502 A AAB 
;


proc sql;
create table want(drop=cat) as
select *,max(cat) as cat1
from have
group by idb
order by idb,st,date;
quit;

PROC Star
Posts: 1,845

Re: Populating columns with missing variables

Posted in reply to novinosrin
data have;
infile datalines truncover;
input Date : $10. ST  IDa $ IDb $ Cat $;
datalines;
JUN012018 500 A AA I
JUN022018 500 A AA I
JUN032018 500 A AA I
JUN012018 501 A AA 
JUN022018 501 A AA
JUN032018 501 A AA
JUN012018 502 A AA I
JUN012018 502 A AA I
JUN012018 500 A AAB 
JUN022018 500 A AAB 
JUN032018 500 A AAB 
JUN012018 501 A AAB D
JUN022018 501 A AAB D
JUN032018 501 A AAB D
JUN012018 502 A AAB 
JUN012018 502 A AAB 
;

data want;
set have;
if (_n_ = 1) then do;
if 0 then set have(keep=idb cat);
    declare hash h(dataset: "have(keep=idb cat where=(not missing(cat))", duplicate: "r");
    h.definekey('idb');
    h.definedata('cat');
    h.definedone();
 end;
 rc=h.find();
 drop rc;
 run;

Ask a Question
Discussion stats
  • 11 replies
  • 216 views
  • 1 like
  • 7 in conversation