Hi,
I have a dataset but want to create a new one which will repeat two variables (here are 'ID', 'dx') for each unique value in the third variable (here is 'test'). I guess it would be simple but I could not get it done at this moment.
For expample:
------------------
data have;
input Id dx $ test;
datalines;
8210 no 1
8210 yes 20
8310 no 30
8321 yes 11
8321 no 2
;
run;
---------------
I want:
----------
8210 no 1
8210 no 20
8210 no 30
8210 yes 1
8210 yes 20
8210 yes 30
8210 no 1
8210 no 20
8210 no 30
8321 yes 11
8321 yes 2
8321 no 11
8321 no 2
-----------------
Thank you for help!
Z
Below is a PROC SQL version of the same. I think "want_version1" is what you are seeking.
Comment: I included "want_version2", but it removes duplicates and returns only distinct observations.
data have;
input ID dx $ test;
datalines;
8210 no 1
8210 yes 20
8210 no 30
8321 yes 11
8321 no 2
;
run;
/* want_version1 will give you exactly what you want */
PROC SQL;
CREATE TABLE want_version1 AS
SELECT FirstQuery.ID
, dx
, test
FROM (SELECT ID, dx
FROM have) AS FirstQuery
INNER JOIN (SELECT ID, test
FROM have) AS SecondQuery
ON FirstQuery.ID = SecondQuery.ID;
QUIT;
/* want_version2 is a different solution that removes duplicates */
PROC SQL;
CREATE TABLE want_version2 AS
SELECT DISTINCT FirstQuery.ID
, dx
, test
FROM (SELECT DISTINCT ID, dx
FROM have) AS FirstQuery
INNER JOIN (SELECT DISTINCT ID, test
FROM have) AS SecondQuery
ON FirstQuery.ID = SecondQuery.ID;
QUIT;
for the results you want , you have a type in
8310 no 30
It should be
8210 no 30.
A solution amongst others (I assume the input dataset is sorted w/r/t the variable id):
data have;
input Id dx $ test;
datalines;
8210 no 1
8210 yes 20
8210 no 30
8321 yes 11
8321 no 2
;
run;
data want;
do until (last.id);
set have;
by id;
output;
end;
do until (last.id);
set have;
by id;
output;
end;
do until (last.id);
set have;
by id;
output;
end;
run;
proc print data=want; run;
I have tried the similar code. It seems not right. BTW, I may have many unique values in 'test'.
thanks,
Z
Below is a PROC SQL version of the same. I think "want_version1" is what you are seeking.
Comment: I included "want_version2", but it removes duplicates and returns only distinct observations.
data have;
input ID dx $ test;
datalines;
8210 no 1
8210 yes 20
8210 no 30
8321 yes 11
8321 no 2
;
run;
/* want_version1 will give you exactly what you want */
PROC SQL;
CREATE TABLE want_version1 AS
SELECT FirstQuery.ID
, dx
, test
FROM (SELECT ID, dx
FROM have) AS FirstQuery
INNER JOIN (SELECT ID, test
FROM have) AS SecondQuery
ON FirstQuery.ID = SecondQuery.ID;
QUIT;
/* want_version2 is a different solution that removes duplicates */
PROC SQL;
CREATE TABLE want_version2 AS
SELECT DISTINCT FirstQuery.ID
, dx
, test
FROM (SELECT DISTINCT ID, dx
FROM have) AS FirstQuery
INNER JOIN (SELECT DISTINCT ID, test
FROM have) AS SecondQuery
ON FirstQuery.ID = SecondQuery.ID;
QUIT;
In a way, this is an all combinations join so it's probably easier for SQL to handle:
proc sql;
create table want as select a.id, a.dx, b.test from have a, have b where a.id=b.id;
quit;
Good luck.
data want;
set have end=last ;
output;
if last then do;
do i=1 to nobs;
set have point=i nobs=nobs;
if dx='no' then dx='yes';
else dx='no';
output;
end;
stop;
end;
run;
Here is another solutions:
data have;
input Id dx $ test;
datalines;
8210 no 1
8210 yes 20
8210 no 30
8321 yes 11
8321 no 2
;
proc sql;
create table want as
select /*distinct?*/
a.id,
a.dx,
b.test
from have a full join
have b on
a.id = b.id;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.