SAS Enterprise Guide
Version 4.3 (4.3.0.11182)Hot fix 14
Windows XP
Service Pack 3
SAS 9.2
/*
I NEED TO CREATE A DATA STEP THAT WILL PARSE TWO VARIABLES LOOKING
FOR A ONE TO ONE RELATIONSHIP EVEN IF THE DATA IS REPEATED IN DIFFERENT
COLUMNS(ROWS) EXAMPLE C 3 IS CONSIDERED ONE TO ONE. ANY TIME A DISTINCT
NPI NUMBER SHARES THE SAME TAX_ID - THIS IS CONSIDERED A ONE TO MANY
RELATIONSHIP AND SHOULD BE STORED IN THE TAX_ID_NPI_ONE_2_MANY TABLE.
ANYTIME AN NPI NUMBER IS REPEATED SHARING THE SAME TAX_ID NUMBER THIS IS
CONSIDER ONE TO ONE RELATIONSHIP AND SHOULD BE STORED IN TAX_ID_NPI_ONE_2_ONE.
I'M NOT SURE IF THE FIRST/LAST FUNCTION WITH AN IF WAS THE CORRECT ROUTE.
I AM NEW TO THIS SAS THING SO ANY HELP IS GREATLY APPRECIATED. :smileyconfused:
*/
data TEST_DATA;
input TAX_ID $ NPI;
datalines;
A 1
B 2
C 3
C 3
C 3
D 4
D 5
D 6
E 3
F 4
F 4
G 5
G 6
;
data TAX_ID_NPI_ONE_2_ONE TAX_ID_NPI_ONE_2_MANY;
set WORK.TEST_DATA;
keep TAX_ID
NPI;
by TAX_ID NPI;
if ((FIRST.TAX_ID) AND (FIRST.NPI OR LAST.NPI))
OR ((LAST.TAX_ID) AND (FIRST.NPI OR LAST.NPI)) THEN
OUTPUT TAX_ID_NPI_ONE_2_ONE;
ELSE OUTPUT TAX_ID_NPI_ONE_2_MANY;
run;
/*
RESULTS:
TAX_ID_NPI_ONE_2_ONE
A 1 - CORRECT
B 2 - CORRECT
C 3 - CORRECT
C 3 - CORRECT
- INCORRECT MISSING THIRD C3
D 4 - INCORRECT
D 6 - INCORRECT
E 3 - CORRECT
F 4 - CORRECT
F 4 - CORRECT
G 5 - INCORRECT HAS ONE TO MANY RELATIONSHIP
G 6 - INCORRECT HAS ONE TO MANY RELATIONSHIP
TAX_ID_NPI_ONE_2_MANY
- MISSING D 4
- MISSING D 6
- MISSING G 5
- MISSING G 6
C 3 - INCORRECT HAS ONE TO ONE RELATIONSHIP
D 5 - CORRECT
*/
You need some way to count the number of distinct NPI values for each TAX_ID. You could do it with PROC SQL, but here is a method that takes advantage of the fact that your data is already sorted.
data test_data;
input tax_id $ npi @@;
datalines;
A 1 B 2 C 3 C 3 C 3 D 4 D 5 D 6 E 3 F 4 F 4 G 5 G 6
;;;;
data tax_id_npi_one_2_one tax_id_npi_one_2_many ;
do until (last.tax_id);
set test_data;
by tax_id npi;
n_npi=sum(n_npi,first.npi);
end;
do until (last.tax_id);
set test_data;
by tax_id npi;
if n_npi=1 then output tax_id_npi_one_2_one;
else output tax_id_npi_one_2_many ;
end;
run;
TAX_ID_NPI_ONE_2_ONE;
tax_id=A npi=1 n_npi=1
tax_id=B npi=2 n_npi=1
tax_id=C npi=3 n_npi=1
tax_id=C npi=3 n_npi=1
tax_id=C npi=3 n_npi=1
tax_id=E npi=3 n_npi=1
tax_id=F npi=4 n_npi=1
tax_id=F npi=4 n_npi=1
TAX_ID_NPI_ONE_2_MANY;
tax_id=D npi=4 n_npi=3
tax_id=D npi=5 n_npi=3
tax_id=D npi=6 n_npi=3
tax_id=G npi=5 n_npi=2
tax_id=G npi=6 n_npi=2
You need some way to count the number of distinct NPI values for each TAX_ID. You could do it with PROC SQL, but here is a method that takes advantage of the fact that your data is already sorted.
data test_data;
input tax_id $ npi @@;
datalines;
A 1 B 2 C 3 C 3 C 3 D 4 D 5 D 6 E 3 F 4 F 4 G 5 G 6
;;;;
data tax_id_npi_one_2_one tax_id_npi_one_2_many ;
do until (last.tax_id);
set test_data;
by tax_id npi;
n_npi=sum(n_npi,first.npi);
end;
do until (last.tax_id);
set test_data;
by tax_id npi;
if n_npi=1 then output tax_id_npi_one_2_one;
else output tax_id_npi_one_2_many ;
end;
run;
TAX_ID_NPI_ONE_2_ONE;
tax_id=A npi=1 n_npi=1
tax_id=B npi=2 n_npi=1
tax_id=C npi=3 n_npi=1
tax_id=C npi=3 n_npi=1
tax_id=C npi=3 n_npi=1
tax_id=E npi=3 n_npi=1
tax_id=F npi=4 n_npi=1
tax_id=F npi=4 n_npi=1
TAX_ID_NPI_ONE_2_MANY;
tax_id=D npi=4 n_npi=3
tax_id=D npi=5 n_npi=3
tax_id=D npi=6 n_npi=3
tax_id=G npi=5 n_npi=2
tax_id=G npi=6 n_npi=2
Here is an SQL solution. Note that SQL cannot generate multiple output data sets in one step the way that a data step can.
proc sql ;
create table TAX_ID_NPI_ONE_2_ONE as
select * from test_data
group by tax_id
having count(distinct npi) = 1
;
create table TAX_ID_NPI_ONE_2_MANY as
select * from test_data
group by tax_id
having count(distinct npi) > 1
;
quit;
Very helpful and when I re-read and validated the code - your two examples made it easy to understand and re-code accordinlgy. I actually learned something instead of copying and editing your code.
How would I include a third, fourth, etc variable if I was looking at a different Ex: Provider ID that shared the same TAX-ID & NPI. C 3 121 and C 3 122 are the same excluding the third column. Same tax_id, same npi, but he has a different provider id. If I wanted to distinguish the variance would I use a CASE within the same data set?
A 1 999
B 2 111
C 3 121
C 3 121
C 3 122
D 4 555
D 5 567
D 6 568
E 3 678
F 4 679
F 4 679
G 5 680
G 6 681
How about following one...
proc sql;
create table want as
select tax_id,max(npi) as max,min(npi) as min,
case
when calculated max EQ calculated min then "one_2_one"
else "one_2_many"
end as relation
from have as a
group by tax_id;
quit;
proc sql;
create table one_2_one as
select a.*
from have as a, want as b
where a.tax_id = b.tax_id and
relation = "one_2_one";
create table one_2_many as
select a.*
from have as a, want as b
where a.tax_id = b.tax_id and
relation = "one_2_many";
quit;
-Urvish
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.