SAS Enterprise Guide
Version 4.3 (4.3.0.11182)Hot fix 14
Windows XP
Service Pack 3
SAS 9.2
/*
I NEED TO CREATE A DATA STEP THAT WILL PARSE TWO VARIABLES LOOKING
FOR A ONE TO ONE RELATIONSHIP EVEN IF THE DATA IS REPEATED IN DIFFERENT
COLUMNS(ROWS) EXAMPLE C 3 IS CONSIDERED ONE TO ONE. ANY TIME A DISTINCT
NPI NUMBER SHARES THE SAME TAX_ID - THIS IS CONSIDERED A ONE TO MANY
RELATIONSHIP AND SHOULD BE STORED IN THE TAX_ID_NPI_ONE_2_MANY TABLE.
ANYTIME AN NPI NUMBER IS REPEATED SHARING THE SAME TAX_ID NUMBER THIS IS
CONSIDER ONE TO ONE RELATIONSHIP AND SHOULD BE STORED IN TAX_ID_NPI_ONE_2_ONE.
I'M NOT SURE IF THE FIRST/LAST FUNCTION WITH AN IF WAS THE CORRECT ROUTE.
I AM NEW TO THIS SAS THING SO ANY HELP IS GREATLY APPRECIATED. :smileyconfused:
*/
data TEST_DATA;
input TAX_ID $ NPI;
datalines;
A 1
B 2
C 3
C 3
C 3
D 4
D 5
D 6
E 3
F 4
F 4
G 5
G 6
;
data TAX_ID_NPI_ONE_2_ONE TAX_ID_NPI_ONE_2_MANY;
set WORK.TEST_DATA;
keep TAX_ID
NPI;
by TAX_ID NPI;
if ((FIRST.TAX_ID) AND (FIRST.NPI OR LAST.NPI))
OR ((LAST.TAX_ID) AND (FIRST.NPI OR LAST.NPI)) THEN
OUTPUT TAX_ID_NPI_ONE_2_ONE;
ELSE OUTPUT TAX_ID_NPI_ONE_2_MANY;
run;
/*
RESULTS:
TAX_ID_NPI_ONE_2_ONE
A 1 - CORRECT
B 2 - CORRECT
C 3 - CORRECT
C 3 - CORRECT
- INCORRECT MISSING THIRD C3
D 4 - INCORRECT
D 6 - INCORRECT
E 3 - CORRECT
F 4 - CORRECT
F 4 - CORRECT
G 5 - INCORRECT HAS ONE TO MANY RELATIONSHIP
G 6 - INCORRECT HAS ONE TO MANY RELATIONSHIP
TAX_ID_NPI_ONE_2_MANY
- MISSING D 4
- MISSING D 6
- MISSING G 5
- MISSING G 6
C 3 - INCORRECT HAS ONE TO ONE RELATIONSHIP
D 5 - CORRECT
*/
You need some way to count the number of distinct NPI values for each TAX_ID. You could do it with PROC SQL, but here is a method that takes advantage of the fact that your data is already sorted.
data test_data;
input tax_id $ npi @@;
datalines;
A 1 B 2 C 3 C 3 C 3 D 4 D 5 D 6 E 3 F 4 F 4 G 5 G 6
;;;;
data tax_id_npi_one_2_one tax_id_npi_one_2_many ;
do until (last.tax_id);
set test_data;
by tax_id npi;
n_npi=sum(n_npi,first.npi);
end;
do until (last.tax_id);
set test_data;
by tax_id npi;
if n_npi=1 then output tax_id_npi_one_2_one;
else output tax_id_npi_one_2_many ;
end;
run;
TAX_ID_NPI_ONE_2_ONE;
tax_id=A npi=1 n_npi=1
tax_id=B npi=2 n_npi=1
tax_id=C npi=3 n_npi=1
tax_id=C npi=3 n_npi=1
tax_id=C npi=3 n_npi=1
tax_id=E npi=3 n_npi=1
tax_id=F npi=4 n_npi=1
tax_id=F npi=4 n_npi=1
TAX_ID_NPI_ONE_2_MANY;
tax_id=D npi=4 n_npi=3
tax_id=D npi=5 n_npi=3
tax_id=D npi=6 n_npi=3
tax_id=G npi=5 n_npi=2
tax_id=G npi=6 n_npi=2
You need some way to count the number of distinct NPI values for each TAX_ID. You could do it with PROC SQL, but here is a method that takes advantage of the fact that your data is already sorted.
data test_data;
input tax_id $ npi @@;
datalines;
A 1 B 2 C 3 C 3 C 3 D 4 D 5 D 6 E 3 F 4 F 4 G 5 G 6
;;;;
data tax_id_npi_one_2_one tax_id_npi_one_2_many ;
do until (last.tax_id);
set test_data;
by tax_id npi;
n_npi=sum(n_npi,first.npi);
end;
do until (last.tax_id);
set test_data;
by tax_id npi;
if n_npi=1 then output tax_id_npi_one_2_one;
else output tax_id_npi_one_2_many ;
end;
run;
TAX_ID_NPI_ONE_2_ONE;
tax_id=A npi=1 n_npi=1
tax_id=B npi=2 n_npi=1
tax_id=C npi=3 n_npi=1
tax_id=C npi=3 n_npi=1
tax_id=C npi=3 n_npi=1
tax_id=E npi=3 n_npi=1
tax_id=F npi=4 n_npi=1
tax_id=F npi=4 n_npi=1
TAX_ID_NPI_ONE_2_MANY;
tax_id=D npi=4 n_npi=3
tax_id=D npi=5 n_npi=3
tax_id=D npi=6 n_npi=3
tax_id=G npi=5 n_npi=2
tax_id=G npi=6 n_npi=2
Here is an SQL solution. Note that SQL cannot generate multiple output data sets in one step the way that a data step can.
proc sql ;
create table TAX_ID_NPI_ONE_2_ONE as
select * from test_data
group by tax_id
having count(distinct npi) = 1
;
create table TAX_ID_NPI_ONE_2_MANY as
select * from test_data
group by tax_id
having count(distinct npi) > 1
;
quit;
Very helpful and when I re-read and validated the code - your two examples made it easy to understand and re-code accordinlgy. I actually learned something instead of copying and editing your code.
How would I include a third, fourth, etc variable if I was looking at a different Ex: Provider ID that shared the same TAX-ID & NPI. C 3 121 and C 3 122 are the same excluding the third column. Same tax_id, same npi, but he has a different provider id. If I wanted to distinguish the variance would I use a CASE within the same data set?
A 1 999
B 2 111
C 3 121
C 3 121
C 3 122
D 4 555
D 5 567
D 6 568
E 3 678
F 4 679
F 4 679
G 5 680
G 6 681
How about following one...
proc sql;
create table want as
select tax_id,max(npi) as max,min(npi) as min,
case
when calculated max EQ calculated min then "one_2_one"
else "one_2_many"
end as relation
from have as a
group by tax_id;
quit;
proc sql;
create table one_2_one as
select a.*
from have as a, want as b
where a.tax_id = b.tax_id and
relation = "one_2_one";
create table one_2_many as
select a.*
from have as a, want as b
where a.tax_id = b.tax_id and
relation = "one_2_many";
quit;
-Urvish
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.