DATA Step, Macro, Functions and more

select only specific group

Reply
Occasional Contributor
Posts: 16

select only specific group

I want to select those providers that have combination of rec_id not just one (adj or orig)…need to have both types of Rec ID.

Here is example of my data

PROVIDER                           REC_ID

1234                                       ADJ

1234                                       ADJ

1234                                       ADJ

1234                                       ORIG

1234                                       ORIG

5556                                       ADJ

5556                                       ADJ

5556                                       ADJ

Any ideas on how I can get that? From above example I simply would just have 1234 provider in my final data set and I would drop 5556 provider since he has only ADJ type of REC_ID

PROC Star
Posts: 7,356

select only specific group

You could do it fairly easily with proc sql.  e.g.,

proc sql;

  select distinct provider

    from have

      group by provider

        having min(rec_id) ne max(rec_id)

  ;

quit;

PROC Star
Posts: 7,356

select only specific group

Or, since you mentioned "final data set", if you want all of the records for cases that meet the criteria you could use something like:

proc sql noprint;

  create table want as

    select *

      from have

        group by provider

          having min(rec_id) ne max(rec_id)

  ;

quit;

Regular Contributor
Posts: 184

select only specific group

This is also a good place to utilize the use of DISTINCT with summary functions:

   having count(distinct rec_id) GT 1

art297 wrote:

Or, since you mentioned "final data set", if you want all of the records for cases that meet the criteria you could use something like:

proc sql noprint;

  create table want as

    select *

      from have

        group by provider

          having min(rec_id) ne max(rec_id)

  ;

quit;

N/A
Posts: 1

select only specific group

I was having a similar problem and your solution worked for me. Thank you.

Valued Guide
Posts: 2,174

select only specific group

I think a simple data step is enough for this data

data result ;

call missing( adj_ct, ori_ct ) ;

do until( last.provider ) ;

set your.data ;

by provider ;

adj_ct +( rec_id='ADJ' ) ;

ori_ct +( rec_id='ORIG' ) ;

end ;

if adj_ct and ori_ct ;

drop adj_ct ori_ct ;

run ;

Super User
Posts: 9,662

select only specific group

But .Peter

I tested your code. It is not right.

if adj_ct and ori_ct ; only can  hold the last.provider.

See the following:

data temp;
input PROVIDER     $                      REC_ID $;
cards;
1234                                       ADJ
1234                                       ADJ
1234                                       ADJ
1234                                       ORIG
1234                                       ORIG
5556                                       ADJ
5556                                       ADJ
5556                                       ADJ
;
run;
data result ;
call missing( adj_ct, ori_ct ) ;
do until( last.provider ) ;
set temp ;
by provider ;
adj_ct +( rec_id='ADJ' ) ;
ori_ct +( rec_id='ORIG' ) ;
end ;
if adj_ct and ori_ct ;
drop adj_ct ori_ct ;
run ;

Ksharp

Valued Guide
Posts: 2,174

select only specific group

Ksharp

I thought one row was the required answer : indicating the provider with both (distinct) REC_ID

For that purpose my code seems appropriate:

adding another provider with rows like

789   ORIG

789   ADJ

my data step creates two rows

I can offer a very simple variation if ALL rows for providers of both REC_ID, but the original posting was asking only about PROVIDERS

Super User
Posts: 9,662

select only specific group

Hi.

Peter.

Maye I or You misunderstood what op's mean.

Not sure Op want only one provider or muli-providers.

Ksharp

Occasional Contributor
Posts: 6

select only specific group

Hi Karo,

I made an exemple that make what you need. The same example can be build in SQL sintax.

Please let me know if it works !

Regards.

Kassim

/******

*

* START THE CODE HERE !!!!

*

*******/

/* CREATE A TABLE TO TEST */

data teste;

    input PROVIDER 4. REC_ID $3.;

datalines;

1234ADJ

1234ADJ

1234ADJ

1234ORIG

1234ORIG

5556ADJ

5556ADJ

5556ADJ

;

run;

/* SORT IT BY THE KEY FIELDS */

proc sort data=teste;

    by provider rec_id;

run;

/* COUNT HOW MANY KEYS HAS THE KEY FIELD */

data teste1;

    set teste;

    by provider rec_id;

    retain cont type;

    if first.provider then do;

        type = "";

        cont = 0;

    end;

    if first.rec_id then do;

        if type = "" or type ne rec_id then do;

            type = rec_id;

            cont = sum(cont, 1);

        end;

    end;

run;

/* MAKE A JOIN TO RETAIN THE IMPORTANT DATA */

proc sql;

    create table teste2 as

        select tb0.*

        from teste tb0

        inner join (select distinct * from teste1 where cont gt 1) tb1

        on tb0.provider = tb1.provider       

    ;

quit;

Ask a Question
Discussion stats
  • 9 replies
  • 176 views
  • 0 likes
  • 7 in conversation