DATA Step, Macro, Functions and more

Subset

Reply
Super Contributor
Posts: 673

Subset

product         altid                           cc_code
abc             1234                           A1
abc             1234
abc             1234                          B1

abc             2345                          A1
abc             2345                          A1
abc             2345                          B1

abc             3456
abc             3456

abc             4567                          A1
abc             4567                          A1

From the above dataset 
How to get a dataset  which has the following records


product         altid                        cc_code
abc             1234                           A1
abc             1234                           B1
abc             2345                           A1
abc             2345                           B1
abc             3456
abc             4567                           A1

Respected Advisor
Posts: 3,156

Subset

Try this one:

data have;

infile cards truncover;

input (product altid cc_code) (:$10.);

cards;

abc             1234                           A1

abc             1234

abc             1234                          B1

abc             2345                          A1

abc             2345                          A1

abc             2345                          B1

abc             3456

abc             3456

abc             4567                          A1

abc             4567                          A1

;

proc sql;

create table want as

select distinct a.product, a.altid, b.cc_code from have a

left join

(select distinct * from have where not missing (cc_code))b

on a.product=b.product and

    a.altid=b.altid

order by a.product, a.altid, b.cc_code;

quit;

proc print;run;

Regards,

Haikuo

PROC Star
Posts: 7,492

Subset

or, using proc sort and a datastep:

data have;

  input product $ altid cc_code $;

  cards;

abc 1234 A1

abc 1234 .

abc 1234 B1

abc 2345 A1

abc 2345 A1

abc 2345 B1

abc 3456 .

abc 3456 .

abc 4567 A1

abc 4567 A1

;

proc sort data=have out=want nodupkey;

  by product altid cc_code;

run;

data want;

  set want;

  by product altid;

  if (last.altid and missing(cc_code)) or

    not missing(cc_code);

run;

Super Contributor
Posts: 673

Subset

small change in data structure

data have;

infile cards truncover;

input (product altid cc_code dxden rank) (:$10.);

cards;

abc             1234                          A1     42390     2

abc             1234                                   49392    

abc             1234                          B1     79902      1

abc             1234                          B1     79902      2

abc             2345                          A1     42390      2

abc             2345                          A1     42390      1

abc             2345                          B1     79902      1

abc             3456                                   49392    

abc             3456                                   49393

abc             4567                          A1     42390       1  

abc             4567                          A1      42390      1

;

this is expected:

abc             1234                          A1     42390     2

abc             1234                          B1     79902     1

abc             2345                          A1     42390     1

abc             3456                                   49392

abc             4567                          A1     42390     1

PROC Star
Posts: 7,492

Subset

Why isn't the following identified record included in the desired output?

abc             1234                          A1     42390     2

abc             1234                                   49392   

abc             1234                          B1     79902      1

abc             1234                          B1     79902      2

abc             2345                          A1     42390      2

abc             2345                          A1     42390      1

--------------> abc             2345                          B1     79902      1

abc             3456                                   49392   

abc             3456                                   49393

abc             4567                          A1     42390       1 

abc             4567                          A1      42390      1

If that was just an oversight, using the proc sort method would still accomplish the task by adding one additional sort (but I can't test it at the moment).  i.e.:

proc sort data=have out=want;

  by product altid cc_code dxden rank;

run;

proc sort data=want nodupkey;

  by product altid cc_code;

run;

data want;

  set want;

  by product altid;

  if (last.altid and missing(cc_code)) or

    not missing(cc_code);

run;

Respected Advisor
Posts: 3,156

Subset

Agree with Art, the rule is not consistent. A tweak on my original code will do as well, you need another join though.

data have;

infile cards truncover;

input (product altid cc_code dxden rank) (:$10.);

cards;

abc             1234                          A1     42390     2

abc             1234                           .        49392

abc             1234                          B1     79902      1

abc             1234                          B1     79902      2

abc             2345                          A1     42390      2

abc             2345                          A1     42390      1

abc             2345                          B1     79902      1

abc             3456                           .        49392

abc             3456                           .        49393

abc             4567                          A1     42390       1

abc             4567                          A1      42390      1

;

proc sql;

create table want as

select distinct a.product, a.altid, b.cc_code, coalescec(b.dxden,c.dxden) as deden, b.rank from have a

left join

(select product, altid, cc_code, dxden ,min(rank) as rank from have where not missing (cc_code)

group by product, altid ,cc_code, dxden)b

on a.product=b.product and

    a.altid=b.altid

left join (select product, altid, min(dxden) as dxden from have

group by product, altid) c

on a.product=c.product and

   a.altid=c.altid

order by a.product, a.altid, b.cc_code;

quit;

proc print;run;

Super Contributor
Posts: 1,636

Subset

data have;

infile cards missover;

input product$          :altid $          : cc_code$ ;

cards;

abc             1234                           A1

abc             1234

abc             1234                          B1

abc             2345                          A1

abc             2345                          A1

abc             2345                          B1

abc             3456

abc             3456

abc             4567                          A1

abc             4567                          A1

;

data have;

  set have;

  by product altid;

  if first.altid or last.altid;

proc sql;

   create table want as select distinct * from have;

quit;

proc print;run;

                                 obs    product    altid    cc_code

                                 1       abc      1234       A1

                                 2       abc      1234       B1

                                 3       abc      2345       A1

                                 4       abc      2345       B1

                                 5       abc      3456

                                 6       abc      4567       A1

Super User
Posts: 10,044

Subset

I have the same question Art described for you.

data have;
infile cards truncover;
input (product altid cc_code dxden rank) (:$10.);
cards;
abc             1234                          A1     42390     2
abc             1234                           .        49392
abc             1234                          B1     79902      1
abc             1234                          B1     79902      2
abc             2345                          A1     42390      2
abc             2345                          A1     42390      1
abc             2345                          B1     79902      1
abc             3456                           .        49392
abc             3456                           .        49393
abc             4567                          A1     42390       1
abc             4567                          A1      42390      1
;
run;
proc sort data=have ;by product altid cc_code;run;;
data x;
 set have;
 by product altid cc_code;
 if first.cc_code and last.cc_code and missing(cc_code) then delete;
run;
proc sort data=x out=want nodupkey;by product altid cc_code;run;


Ksharp

Ask a Question
Discussion stats
  • 7 replies
  • 389 views
  • 4 likes
  • 5 in conversation