Hi,
I have a dataset but want to create a new one which will repeat two variables (here are 'ID', 'dx') for each unique value in the third variable (here is 'test'). I guess it would be simple but I could not get it done at this moment.
For expample:
------------------
data have;
input Id dx $ test;
datalines;
8210 no 1
8210 yes 20
8310 no 30
8321 yes 11
8321 no 2
;
run;
---------------
I want:
----------
8210 no 1
8210 no 20
8210 no 30
8210 yes 1
8210 yes 20
8210 yes 30
8210 no 1
8210 no 20
8210 no 30
8321 yes 11
8321 yes 2
8321 no 11
8321 no 2
-----------------
Thank you for help!
Z
Below is a PROC SQL version of the same. I think "want_version1" is what you are seeking.
Comment: I included "want_version2", but it removes duplicates and returns only distinct observations.
data have;
input ID dx $ test;
datalines;
8210 no 1
8210 yes 20
8210 no 30
8321 yes 11
8321 no 2
;
run;
/* want_version1 will give you exactly what you want */
PROC SQL;
CREATE TABLE want_version1 AS
SELECT FirstQuery.ID
, dx
, test
FROM (SELECT ID, dx
FROM have) AS FirstQuery
INNER JOIN (SELECT ID, test
FROM have) AS SecondQuery
ON FirstQuery.ID = SecondQuery.ID;
QUIT;
/* want_version2 is a different solution that removes duplicates */
PROC SQL;
CREATE TABLE want_version2 AS
SELECT DISTINCT FirstQuery.ID
, dx
, test
FROM (SELECT DISTINCT ID, dx
FROM have) AS FirstQuery
INNER JOIN (SELECT DISTINCT ID, test
FROM have) AS SecondQuery
ON FirstQuery.ID = SecondQuery.ID;
QUIT;
for the results you want , you have a type in
8310 no 30
It should be
8210 no 30.
A solution amongst others (I assume the input dataset is sorted w/r/t the variable id):
data have;
input Id dx $ test;
datalines;
8210 no 1
8210 yes 20
8210 no 30
8321 yes 11
8321 no 2
;
run;
data want;
do until (last.id);
set have;
by id;
output;
end;
do until (last.id);
set have;
by id;
output;
end;
do until (last.id);
set have;
by id;
output;
end;
run;
proc print data=want; run;
I have tried the similar code. It seems not right. BTW, I may have many unique values in 'test'.
thanks,
Z
Below is a PROC SQL version of the same. I think "want_version1" is what you are seeking.
Comment: I included "want_version2", but it removes duplicates and returns only distinct observations.
data have;
input ID dx $ test;
datalines;
8210 no 1
8210 yes 20
8210 no 30
8321 yes 11
8321 no 2
;
run;
/* want_version1 will give you exactly what you want */
PROC SQL;
CREATE TABLE want_version1 AS
SELECT FirstQuery.ID
, dx
, test
FROM (SELECT ID, dx
FROM have) AS FirstQuery
INNER JOIN (SELECT ID, test
FROM have) AS SecondQuery
ON FirstQuery.ID = SecondQuery.ID;
QUIT;
/* want_version2 is a different solution that removes duplicates */
PROC SQL;
CREATE TABLE want_version2 AS
SELECT DISTINCT FirstQuery.ID
, dx
, test
FROM (SELECT DISTINCT ID, dx
FROM have) AS FirstQuery
INNER JOIN (SELECT DISTINCT ID, test
FROM have) AS SecondQuery
ON FirstQuery.ID = SecondQuery.ID;
QUIT;
In a way, this is an all combinations join so it's probably easier for SQL to handle:
proc sql;
create table want as select a.id, a.dx, b.test from have a, have b where a.id=b.id;
quit;
Good luck.
data want;
set have end=last ;
output;
if last then do;
do i=1 to nobs;
set have point=i nobs=nobs;
if dx='no' then dx='yes';
else dx='no';
output;
end;
stop;
end;
run;
Here is another solutions:
data have;
input Id dx $ test;
datalines;
8210 no 1
8210 yes 20
8210 no 30
8321 yes 11
8321 no 2
;
proc sql;
create table want as
select /*distinct?*/
a.id,
a.dx,
b.test
from have a full join
have b on
a.id = b.id;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.