Solved: How to create repeated observations of two variables for each value wi...

imcbczm · Posted 11-10-2015 11:57 AM

Hi,

I have a dataset but want to create a new one which will repeat two variables (here are 'ID', 'dx') for each unique value in the third variable (here is 'test'). I guess it would be simple but I could not get it done at this moment.

For expample:

------------------

data have;

input Id dx $ test;

datalines;

8210 no 1

8210 yes 20

8310 no 30

8321 yes 11

8321 no 2

;

run;

---------------

I want:

----------

8210 no 1

8210 no 20

8210 no 30

8210 yes 1

8210 yes 20

8210 yes 30

8210 no 1

8210 no 20

8210 no 30

8321 yes 11

8321 yes 2

8321 no 11

8321 no 2

-----------------

Thank you for help!

Z

hbi · Posted 11-10-2015 12:29 PM

Below is a PROC SQL version of the same. I think "want_version1" is what you are seeking.

Comment: I included "want_version2", but it removes duplicates and returns only distinct observations.

data have;
input ID dx $ test;
datalines;
8210 no   1
8210 yes 20
8210 no  30
8321 yes 11
8321 no   2
;
run;

/* want_version1 will give you exactly what you want */
PROC SQL;
  CREATE TABLE want_version1 AS 
  SELECT FirstQuery.ID
       , dx
       , test
  FROM (SELECT ID, dx
        FROM have) AS FirstQuery
  INNER JOIN (SELECT ID, test
              FROM have) AS SecondQuery
    ON FirstQuery.ID = SecondQuery.ID;
QUIT;

/* want_version2 is a different solution that removes duplicates */
PROC SQL;
  CREATE TABLE want_version2 AS 
  SELECT DISTINCT FirstQuery.ID
       , dx
       , test
  FROM (SELECT DISTINCT ID, dx
        FROM have) AS FirstQuery
  INNER JOIN (SELECT DISTINCT ID, test
              FROM have) AS SecondQuery
    ON FirstQuery.ID = SecondQuery.ID;
QUIT;

View solution in original post

billfish · Posted 11-10-2015 12:09 PM

for the results you want , you have a type in

8310 no 30

It should be

8210 no 30.

A solution amongst others (I assume the input dataset is sorted w/r/t the variable id):

data have;

input Id dx $ test;

datalines;

8210 no 1

8210 yes 20

8210 no 30

8321 yes 11

8321 no 2

;

run;

data want;

do until (last.id);

set have;

by id;

output;

end;

do until (last.id);

set have;

by id;

output;

end;

do until (last.id);

set have;

by id;

output;

end;

run;

proc print data=want; run;

imcbczm · Posted 11-10-2015 12:26 PM

I have tried the similar code. It seems not right. BTW, I may have many unique values in 'test'.

thanks,

Z

hbi · Posted 11-10-2015 12:29 PM

Below is a PROC SQL version of the same. I think "want_version1" is what you are seeking.

Comment: I included "want_version2", but it removes duplicates and returns only distinct observations.

data have;
input ID dx $ test;
datalines;
8210 no   1
8210 yes 20
8210 no  30
8321 yes 11
8321 no   2
;
run;

/* want_version1 will give you exactly what you want */
PROC SQL;
  CREATE TABLE want_version1 AS 
  SELECT FirstQuery.ID
       , dx
       , test
  FROM (SELECT ID, dx
        FROM have) AS FirstQuery
  INNER JOIN (SELECT ID, test
              FROM have) AS SecondQuery
    ON FirstQuery.ID = SecondQuery.ID;
QUIT;

/* want_version2 is a different solution that removes duplicates */
PROC SQL;
  CREATE TABLE want_version2 AS 
  SELECT DISTINCT FirstQuery.ID
       , dx
       , test
  FROM (SELECT DISTINCT ID, dx
        FROM have) AS FirstQuery
  INNER JOIN (SELECT DISTINCT ID, test
              FROM have) AS SecondQuery
    ON FirstQuery.ID = SecondQuery.ID;
QUIT;

imcbczm · Posted 11-10-2015 07:17 PM

Great, thank you so much!

Astounding · Posted 11-10-2015 12:31 PM

In a way, this is an all combinations join so it's probably easier for SQL to handle:

proc sql;

create table want as select a.id, a.dx, b.test from have a, have b where a.id=b.id;

quit;

Good luck.

imcbczm · Posted 11-10-2015 07:19 PM

It works, thank you!

slchen · Posted 11-10-2015 12:32 PM

data want;
set have end=last ;
output;
if last then do;
do i=1 to nobs;
set have point=i nobs=nobs;
if dx='no' then dx='yes';
else dx='no';
output;
end;
stop;
end;
run;

imcbczm · Posted 11-10-2015 07:20 PM

sorry, not really what i want.

Steelers_In_DC · Posted 11-10-2015 03:40 PM

Here is another solutions:

data have;
input Id dx $ test;
datalines;
8210 no 1
8210 yes 20
8210 no 30
8321 yes 11
8321 no 2
;

proc sql;
create table want as
select /*distinct?*/
a.id,
a.dx,
b.test
from have a full join
have b on
a.id = b.id;

How to create repeated observations of two variables for each value within ID

Re: How to create repeated observations of two variables for each value within ID

Re: How to create repeated observations of two variables for each value within ID

Re: How to create repeated observations of two variables for each value within ID

Re: How to create repeated observations of two variables for each value within ID

Re: How to create repeated observations of two variables for each value within ID

Re: How to create repeated observations of two variables for each value within ID

Re: How to create repeated observations of two variables for each value within ID

Re: How to create repeated observations of two variables for each value within ID

Re: How to create repeated observations of two variables for each value within ID

Re: How to create repeated observations of two variables for each value within ID

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away