Solved: 1 to 1 & 1 to many w/2 variables (repeating)

Dsrountree · Posted 08-06-2013 08:59 AM

SAS Enterprise Guide

Version 4.3 (4.3.0.11182)Hot fix 14

Windows XP

Service Pack 3

SAS 9.2

/*

I NEED TO CREATE A DATA STEP THAT WILL PARSE TWO VARIABLES LOOKING

FOR A ONE TO ONE RELATIONSHIP EVEN IF THE DATA IS REPEATED IN DIFFERENT

COLUMNS(ROWS) EXAMPLE C 3 IS CONSIDERED ONE TO ONE. ANY TIME A DISTINCT

NPI NUMBER SHARES THE SAME TAX_ID - THIS IS CONSIDERED A ONE TO MANY

RELATIONSHIP AND SHOULD BE STORED IN THE TAX_ID_NPI_ONE_2_MANY TABLE.

ANYTIME AN NPI NUMBER IS REPEATED SHARING THE SAME TAX_ID NUMBER THIS IS

CONSIDER ONE TO ONE RELATIONSHIP AND SHOULD BE STORED IN TAX_ID_NPI_ONE_2_ONE.

I'M NOT SURE IF THE FIRST/LAST FUNCTION WITH AN IF WAS THE CORRECT ROUTE.

I AM NEW TO THIS SAS THING SO ANY HELP IS GREATLY APPRECIATED. :smileyconfused:

*/

data TEST_DATA;

input TAX_ID $ NPI;

datalines;

A 1

B 2

C 3

D 4

D 5

D 6

E 3

F 4

G 5

G 6

;

data TAX_ID_NPI_ONE_2_ONE TAX_ID_NPI_ONE_2_MANY;

set WORK.TEST_DATA;

keep TAX_ID

NPI;

by TAX_ID NPI;

if ((FIRST.TAX_ID) AND (FIRST.NPI OR LAST.NPI))

OR ((LAST.TAX_ID) AND (FIRST.NPI OR LAST.NPI)) THEN

OUTPUT TAX_ID_NPI_ONE_2_ONE;

ELSE OUTPUT TAX_ID_NPI_ONE_2_MANY;

run;

/*

RESULTS:

TAX_ID_NPI_ONE_2_ONE

A 1 - CORRECT

B 2 - CORRECT

C 3 - CORRECT

- INCORRECT MISSING THIRD C3

D 4 - INCORRECT

D 6 - INCORRECT

E 3 - CORRECT

F 4 - CORRECT

G 5 - INCORRECT HAS ONE TO MANY RELATIONSHIP

G 6 - INCORRECT HAS ONE TO MANY RELATIONSHIP

TAX_ID_NPI_ONE_2_MANY

- MISSING D 4

- MISSING D 6

- MISSING G 5

- MISSING G 6

C 3 - INCORRECT HAS ONE TO ONE RELATIONSHIP

D 5 - CORRECT

*/

Tom · Posted 08-06-2013 09:17 AM

You need some way to count the number of distinct NPI values for each TAX_ID. You could do it with PROC SQL, but here is a method that takes advantage of the fact that your data is already sorted.

data test_data;

input tax_id $ npi @@;

datalines;

A 1 B 2 C 3 C 3 C 3 D 4 D 5 D 6 E 3 F 4 F 4 G 5 G 6

;;;;

data tax_id_npi_one_2_one tax_id_npi_one_2_many ;

do until (last.tax_id);

set test_data;

by tax_id npi;

n_npi=sum(n_npi,first.npi);

end;

do until (last.tax_id);

set test_data;

by tax_id npi;

if n_npi=1 then output tax_id_npi_one_2_one;

else output tax_id_npi_one_2_many ;

end;

run;

TAX_ID_NPI_ONE_2_ONE;

tax_id=A npi=1 n_npi=1

tax_id=B npi=2 n_npi=1

tax_id=C npi=3 n_npi=1

tax_id=E npi=3 n_npi=1

tax_id=F npi=4 n_npi=1

TAX_ID_NPI_ONE_2_MANY;

tax_id=D npi=4 n_npi=3

tax_id=D npi=5 n_npi=3

tax_id=D npi=6 n_npi=3

tax_id=G npi=5 n_npi=2

tax_id=G npi=6 n_npi=2

View solution in original post

Tom · Posted 08-06-2013 09:17 AM

You need some way to count the number of distinct NPI values for each TAX_ID. You could do it with PROC SQL, but here is a method that takes advantage of the fact that your data is already sorted.

data test_data;

input tax_id $ npi @@;

datalines;

A 1 B 2 C 3 C 3 C 3 D 4 D 5 D 6 E 3 F 4 F 4 G 5 G 6

;;;;

data tax_id_npi_one_2_one tax_id_npi_one_2_many ;

do until (last.tax_id);

set test_data;

by tax_id npi;

n_npi=sum(n_npi,first.npi);

end;

do until (last.tax_id);

set test_data;

by tax_id npi;

if n_npi=1 then output tax_id_npi_one_2_one;

else output tax_id_npi_one_2_many ;

end;

run;

TAX_ID_NPI_ONE_2_ONE;

tax_id=A npi=1 n_npi=1

tax_id=B npi=2 n_npi=1

tax_id=C npi=3 n_npi=1

tax_id=E npi=3 n_npi=1

tax_id=F npi=4 n_npi=1

TAX_ID_NPI_ONE_2_MANY;

tax_id=D npi=4 n_npi=3

tax_id=D npi=5 n_npi=3

tax_id=D npi=6 n_npi=3

tax_id=G npi=5 n_npi=2

tax_id=G npi=6 n_npi=2

Tom · Posted 08-06-2013 09:50 AM

Here is an SQL solution. Note that SQL cannot generate multiple output data sets in one step the way that a data step can.

proc sql ;

create table TAX_ID_NPI_ONE_2_ONE as

select * from test_data

group by tax_id

having count(distinct npi) = 1

;

create table TAX_ID_NPI_ONE_2_MANY as

select * from test_data

group by tax_id

having count(distinct npi) > 1

;

quit;

Dsrountree · Posted 08-06-2013 10:02 AM

Very helpful and when I re-read and validated the code - your two examples made it easy to understand and re-code accordinlgy. I actually learned something instead of copying and editing your code.

Dsrountree · Posted 08-06-2013 11:47 AM

How would I include a third, fourth, etc variable if I was looking at a different Ex: Provider ID that shared the same TAX-ID & NPI. C 3 121 and C 3 122 are the same excluding the third column. Same tax_id, same npi, but he has a different provider id. If I wanted to distinguish the variance would I use a CASE within the same data set?

A 1 999

B 2 111

C 3 121

C 3 121

C 3 122

D 4 555

D 5 567

D 6 568

E 3 678

F 4 679

G 5 680

G 6 681

UrvishShah · Posted 08-07-2013 01:52 AM

How about following one...

proc sql;

create table want as

select tax_id,max(npi) as max,min(npi) as min,

case

when calculated max EQ calculated min then "one_2_one"

else "one_2_many"

end as relation

from have as a

group by tax_id;

quit;

proc sql;

create table one_2_one as

select a.*

from have as a, want as b

where a.tax_id = b.tax_id and

relation = "one_2_one";

create table one_2_many as

select a.*

from have as a, want as b

where a.tax_id = b.tax_id and

relation = "one_2_many";

quit;

-Urvish

1 to 1 & 1 to many w/2 variables (repeating)

Re: 1 to 1 & 1 to many w/2 variables (repeating)

Re: 1 to 1 & 1 to many w/2 variables (repeating)

Re: 1 to 1 & 1 to many w/2 variables (repeating)

Re: 1 to 1 & 1 to many w/2 variables (repeating)

Re: 1 to 1 & 1 to many w/2 variables (repeating)

Re: 1 to 1 & 1 to many w/2 variables (repeating)

Classroom Training Available!