BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Dsrountree
Obsidian | Level 7

SAS Enterprise Guide

Version 4.3 (4.3.0.11182)Hot fix 14

Windows XP

Service Pack 3

SAS 9.2

/*

I NEED TO CREATE A DATA STEP THAT WILL PARSE TWO VARIABLES LOOKING

FOR A ONE TO ONE RELATIONSHIP EVEN IF THE DATA IS REPEATED IN DIFFERENT

COLUMNS(ROWS) EXAMPLE C 3 IS CONSIDERED ONE TO ONE. ANY TIME A DISTINCT

NPI NUMBER SHARES THE SAME TAX_ID - THIS IS CONSIDERED A ONE TO MANY

RELATIONSHIP AND SHOULD BE STORED IN THE TAX_ID_NPI_ONE_2_MANY TABLE.

ANYTIME AN NPI NUMBER IS REPEATED SHARING THE SAME TAX_ID NUMBER THIS IS

CONSIDER ONE TO ONE RELATIONSHIP AND SHOULD BE STORED IN TAX_ID_NPI_ONE_2_ONE.

I'M NOT SURE IF THE FIRST/LAST FUNCTION WITH AN IF WAS THE CORRECT ROUTE.

I AM NEW TO THIS SAS THING SO ANY HELP IS GREATLY APPRECIATED. :smileyconfused:

*/

data TEST_DATA;

     input TAX_ID $ NPI;

     datalines;

A 1

B 2

C 3

C 3

C 3

D 4

D 5

D 6

E 3

F 4

F 4

G 5

G 6

;

data TAX_ID_NPI_ONE_2_ONE TAX_ID_NPI_ONE_2_MANY;

     set WORK.TEST_DATA;

     keep     TAX_ID

           NPI;

     by TAX_ID NPI;

     if ((FIRST.TAX_ID) AND (FIRST.NPI OR LAST.NPI))

           OR ((LAST.TAX_ID) AND (FIRST.NPI OR LAST.NPI)) THEN

           OUTPUT     TAX_ID_NPI_ONE_2_ONE;

     ELSE OUTPUT TAX_ID_NPI_ONE_2_MANY;

run;

/*

RESULTS:

TAX_ID_NPI_ONE_2_ONE

A    1 - CORRECT

B    2 - CORRECT

C    3 - CORRECT

C    3 - CORRECT

       - INCORRECT MISSING THIRD C3

D    4 - INCORRECT

D    6 - INCORRECT

E    3 - CORRECT

F    4 - CORRECT

F    4 - CORRECT

G    5 - INCORRECT HAS ONE TO MANY RELATIONSHIP

G    6 - INCORRECT HAS ONE TO MANY RELATIONSHIP

TAX_ID_NPI_ONE_2_MANY

- MISSING D 4

- MISSING D 6

- MISSING G 5

- MISSING G 6

C    3 - INCORRECT HAS ONE TO ONE RELATIONSHIP

D    5 - CORRECT

*/

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You need some way to count the number of distinct NPI values for each TAX_ID.  You could do it with PROC SQL, but here is a method that takes advantage of the fact that your data is already sorted.

data test_data;

  input tax_id $ npi @@;

datalines;

A 1 B 2 C 3 C 3 C 3 D 4 D 5 D 6 E 3 F 4 F 4 G 5 G 6

;;;;

data tax_id_npi_one_2_one tax_id_npi_one_2_many ;

  do until (last.tax_id);

    set test_data;

    by tax_id npi;

    n_npi=sum(n_npi,first.npi);

  end;

  do until (last.tax_id);

    set test_data;

    by tax_id npi;

    if n_npi=1 then output tax_id_npi_one_2_one;

    else output tax_id_npi_one_2_many ;

  end;

run;

TAX_ID_NPI_ONE_2_ONE;

tax_id=A npi=1 n_npi=1

tax_id=B npi=2 n_npi=1

tax_id=C npi=3 n_npi=1

tax_id=C npi=3 n_npi=1

tax_id=C npi=3 n_npi=1

tax_id=E npi=3 n_npi=1

tax_id=F npi=4 n_npi=1

tax_id=F npi=4 n_npi=1

TAX_ID_NPI_ONE_2_MANY;

tax_id=D npi=4 n_npi=3

tax_id=D npi=5 n_npi=3

tax_id=D npi=6 n_npi=3

tax_id=G npi=5 n_npi=2

tax_id=G npi=6 n_npi=2

View solution in original post

5 REPLIES 5
Tom
Super User Tom
Super User

You need some way to count the number of distinct NPI values for each TAX_ID.  You could do it with PROC SQL, but here is a method that takes advantage of the fact that your data is already sorted.

data test_data;

  input tax_id $ npi @@;

datalines;

A 1 B 2 C 3 C 3 C 3 D 4 D 5 D 6 E 3 F 4 F 4 G 5 G 6

;;;;

data tax_id_npi_one_2_one tax_id_npi_one_2_many ;

  do until (last.tax_id);

    set test_data;

    by tax_id npi;

    n_npi=sum(n_npi,first.npi);

  end;

  do until (last.tax_id);

    set test_data;

    by tax_id npi;

    if n_npi=1 then output tax_id_npi_one_2_one;

    else output tax_id_npi_one_2_many ;

  end;

run;

TAX_ID_NPI_ONE_2_ONE;

tax_id=A npi=1 n_npi=1

tax_id=B npi=2 n_npi=1

tax_id=C npi=3 n_npi=1

tax_id=C npi=3 n_npi=1

tax_id=C npi=3 n_npi=1

tax_id=E npi=3 n_npi=1

tax_id=F npi=4 n_npi=1

tax_id=F npi=4 n_npi=1

TAX_ID_NPI_ONE_2_MANY;

tax_id=D npi=4 n_npi=3

tax_id=D npi=5 n_npi=3

tax_id=D npi=6 n_npi=3

tax_id=G npi=5 n_npi=2

tax_id=G npi=6 n_npi=2

Tom
Super User Tom
Super User

Here is an SQL solution. Note that SQL cannot generate multiple output data sets in one step the way that a data step can.

proc sql ;

  create table TAX_ID_NPI_ONE_2_ONE as

    select * from test_data

    group by tax_id

    having count(distinct npi) = 1

  ;

  create table TAX_ID_NPI_ONE_2_MANY as

    select * from test_data

    group by tax_id

    having count(distinct npi) > 1

  ;

quit;

Dsrountree
Obsidian | Level 7

Very helpful and when I re-read and validated the code - your two examples made it easy to understand and re-code accordinlgy.  I actually learned something instead of copying and editing your code.  Smiley Happy

Dsrountree
Obsidian | Level 7

How would I include a third, fourth, etc variable if I was looking at a different Ex: Provider ID that shared the same TAX-ID & NPI.  C 3 121 and C 3 122 are the same excluding the third column.  Same tax_id, same npi, but he has a different provider id.  If I wanted to distinguish the variance would I use a CASE within the same data set?

A 1 999

B 2 111

C 3 121

C 3 121

C 3 122

D 4 555

D 5 567

D 6 568

E 3 678

F 4 679

F 4 679

G 5 680

G 6 681

UrvishShah
Fluorite | Level 6

How about following one...

proc sql;

   create table want as

   select tax_id,max(npi) as max,min(npi) as min,

          case

                  when calculated max EQ calculated min then "one_2_one"

                    else "one_2_many"

              end as relation

   from have as a

   group by tax_id;

quit;

proc sql;

   create table one_2_one as

   select a.*

   from have as a, want as b

   where a.tax_id = b.tax_id and

         relation = "one_2_one";

   create table one_2_many as

   select a.*

   from have as a, want as b

   where a.tax_id = b.tax_id and

         relation = "one_2_many";

quit;

-Urvish

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 970 views
  • 10 likes
  • 3 in conversation