I need to transpose each subset of subject. But after transpose I want each subject to have unique columns thus resulting in a wide data set. I don't think I can accomplish this with transpose alone.
data have;
input subject count start end change;
datalines;
12345 1 20 35 1
12345 1 36 39 2
12345 1 40 45 0
23456 2 15 19 0
23456 2 20 24 1
34567 3 33 44 1
34567 3 45 50 2
45678 1 15 18 0
45678 1 19 22 1
run;
So for example below if the first subject occupies columns st_1, ed_1, chg_1 through to st_3, ed_3, chg_3
Then the second subject will commence with st_4, ed_4, chg_4 through st_5, ed_5, chg_5 etc. But columns can be reused once the count variable resets to 1. Count resets after every 3 distinct subjects.
Desired output;
subject | count | st_1 | ed_1 | chg_1 | st_2 | ed_2 | chg_2 | st_3 | ed_3 | chg_3 | st_4 | ed_4 | chg_4 | st_5 | ed_5 | chg_5 | st_6 | ed_6 | chg_6 | st_7 | ed_7 | chg_7 |
12345 | 1 | 20 | 35 | 1 | 36 | 39 | 2 | 40 | 45 | 0 | ||||||||||||
23456 | 2 | 15 | 19 | 0 | 20 | 24 | 1 | |||||||||||||||
34567 | 3 | 33 | 44 | 1 | 45 | 50 | 2 | |||||||||||||||
45678 | 1 | 15 | 18 | 0 | 19 | 22 | 1 |
Any suggestions?
Since we want to preserve variable names, we need to do a transpose for each variable separately; we also need a preliminary step that creates the repeat count within each group, and a final macro to reorder the columns as wanted:
data have;
input subject count start end change;
datalines;
12345 1 20 35 1
12345 1 36 39 2
12345 1 40 45 0
23456 2 15 19 0
23456 2 20 24 1
34567 3 33 44 1
34567 3 45 50 2
45678 1 15 18 0
45678 1 19 22 1
;
run;
data int;
set have;
by count notsorted;
retain vcount group 0;
if first.count and count = 1
then vcount = 1;
else vcount + 1;
run;
proc transpose data=int out=want1 (drop=_name_) prefix=start;
by subject;
id vcount;
var start;
run;
proc transpose data=int out=want2 (drop=_name_) prefix=end;
by subject;
id vcount;
var end;
run;
proc transpose data=int out=want3 (drop=_name_) prefix=change;
by subject;
id vcount;
var change;
run;
data want;
merge
want1
want2
want3
;
by subject;
run;
proc sql noprint;
select max(vcount) into :vcount from int;
quit;
%macro order_vars;
data want;
format subject
%do i = 1 %to &vcount;
start&i
end&i
change&i
%end;
;
set want;
run;
%mend;
%order_vars
Since we want to preserve variable names, we need to do a transpose for each variable separately; we also need a preliminary step that creates the repeat count within each group, and a final macro to reorder the columns as wanted:
data have;
input subject count start end change;
datalines;
12345 1 20 35 1
12345 1 36 39 2
12345 1 40 45 0
23456 2 15 19 0
23456 2 20 24 1
34567 3 33 44 1
34567 3 45 50 2
45678 1 15 18 0
45678 1 19 22 1
;
run;
data int;
set have;
by count notsorted;
retain vcount group 0;
if first.count and count = 1
then vcount = 1;
else vcount + 1;
run;
proc transpose data=int out=want1 (drop=_name_) prefix=start;
by subject;
id vcount;
var start;
run;
proc transpose data=int out=want2 (drop=_name_) prefix=end;
by subject;
id vcount;
var end;
run;
proc transpose data=int out=want3 (drop=_name_) prefix=change;
by subject;
id vcount;
var change;
run;
data want;
merge
want1
want2
want3
;
by subject;
run;
proc sql noprint;
select max(vcount) into :vcount from int;
quit;
%macro order_vars;
data want;
format subject
%do i = 1 %to &vcount;
start&i
end&i
change&i
%end;
;
set want;
run;
%mend;
%order_vars
I found a better method, using a double transpose:
(start with the int dataset from my previous post)
proc transpose data=int out=int2;
by subject vcount;
var start end change;
run;
data int3;
set int2;
_name_ = cats(_name_,strip(put(vcount,best.)));
run;
proc transpose data=int3 out=want (drop=_name_);
by subject;
var col1;
id _name_;
run;
no macro programming needed, everything sorts out by itself.
your solution works great for the sample data i provided. I think I always oversimplify my sample data. I tried to get to work for my data and it gives me errors. I have posted more realistic data below now. The count is assigned for first 15 unique (cohort,subject) and then it resets.
data have;
input cohort subject count start end change;
datalines;
xxx_020 679910101 1 0 43 0
xxx_040 119910102 2 0 60 0
xxx_040 679910102 3 0 43 0
xxx_060 119930101 4 0 15 0
xxx_060 119930101 4 15 35 1
xxx_060 119930101 4 35 50 0
xxx_060 119930101 4 50 57 1
xxx_060 119930101 4 57 64 0
xxx_060 119930101 4 64 71 1
xxx_060 119930101 4 71 78 0
xxx_060 119930101 4 78 84 1
xxx_060 119930101 4 84 99 0
xxx_060 119930101 4 99 141 1
xxx_060 119930101 4 141 148 2
xxx_060 119930101 4 148 168 1
xxx_060 119930101 4 168 176 2
xxx_060 119930101 4 176 197 1
xxx_060 119930101 4 197 204 2
xxx_060 119930101 4 204 225 1
xxx_060 119930101 4 225 268 2
xxx_060 139910101 5 0 23 0
xxx_100 119910103 6 0 22 0
xxx_100 119910103 6 22 36 1
xxx_100 119910103 6 36 58 2
xxx_100 119910103 6 58 67 0
xxx_100 119910103 6 67 71 1
xxx_100 119910103 6 71 78 0
xxx_100 119910103 6 78 87 1
xxx_100 119910103 6 87 108 0
xxx_100 119910103 6 108 134 1
xxx_100 119910103 6 134 149 0
xxx_100 119910103 6 149 155 2
xxx_100 119910103 6 155 169 0
xxx_100 119910103 6 169 184 1
xxx_100 119910103 6 184 190 2
xxx_100 119910103 6 190 197 0
xxx_100 119910103 6 197 211 1
xxx_100 119910103 6 211 218 0
xxx_100 119910103 6 218 233 1
xxx_100 139910102 7 0 163 0
xxx_100 139910102 7 163 169 2
xxx_100 139910102 7 169 176 1
xxx_100 139910102 7 176 183 0
xxx_100 139910102 7 183 190 2
xxx_100 139910102 7 190 205 1
xxx_100 139910102 7 205 211 2
xxx_100 139910102 7 211 218 0
xxx_100 139910102 7 218 233 2
xxx_100 139910102 7 233 234 0
xxx_140 119910104 8 0 7 0
xxx_140 119910104 8 7 15 2
xxx_140 119910104 8 15 22 3
xxx_140 119910104 8 22 36 2
xxx_140 119910104 8 36 38 3
xxx_140 119910104 8 38 43 4
xxx_140 119910104 8 43 50 1
xxx_140 119910104 8 50 57 2
xxx_140 119910104 8 57 64 3
xxx_140 119910104 8 64 78 1
xxx_140 119910104 8 78 92 2
xxx_140 119910104 8 92 113 3
xxx_140 119910104 8 113 120 2
xxx_140 119910104 8 120 148 0
xxx_140 119910104 8 148 149 3
xxx_140 119910105 9 0 15 0
xxx_140 119910105 9 15 29 1
xxx_140 119910105 9 29 36 2
xxx_140 119910105 9 36 51 1
xxx_140 119910105 9 51 97 0
xxx_140 119910106 10 0 16 0
xxx_140 119910106 10 16 30 2
xxx_140 119910106 10 30 37 4
xxx_140 119910106 10 37 52 0
xxx_140 139910104 11 0 8 0
xxx_140 139910104 11 8 10 3
xxx_140 139910104 11 10 17 4
xxx_140 139910104 11 17 20 2
xxx_140 139910104 11 20 29 0
xxx_140 139910104 11 29 31 3
xxx_140 139910104 11 31 34 0
xxx_140 139910105 12 0 8 0
xxx_140 139910105 12 8 22 2
xxx_140 139910105 12 22 26 3
xxx_140 139910105 12 26 36 0
xxx_140 139910105 12 36 43 2
xxx_140 139910105 12 43 50 3
xxx_140 139910105 12 50 86 2
xxx_140 139910105 12 86 87 0
xxx_140 139910106 13 0 8 0
xxx_140 139910106 13 8 15 1
xxx_140 139910106 13 15 23 2
xxx_140 139910106 13 23 30 1
xxx_140 139910106 13 30 44 2
xxx_140 139910106 13 44 45 3
xxx_140 679910104 14 0 7 0
xxx_140 679910104 14 7 28 2
xxx_140 679910104 14 28 36 3
xxx_140 679910104 14 36 42 0
xxx_140 679910104 14 42 49 2
xxx_140 679910104 14 49 105 0
xxx_140 679910104 14 105 112 2
xxx_140 679910104 14 112 126 0
xxx_140 679910104 14 126 133 1
xxx_140 679910104 14 133 140 2
xxx_140 679910104 14 140 148 0
xxx_140 679910105 15 0 24 0
xxx_140 679910105 15 24 28 1
xxx_140 679910105 15 28 42 2
xxx_140 679910105 15 42 92 0
xxx_160 119910107 1 0 8 0
xxx_160 119910107 1 8 15 1
xxx_160 119910107 1 15 22 2
xxx_160 119910107 1 22 25 0
xxx_160 119910107 1 25 29 1
xxx_160 119910107 1 29 30 0
xxx_160 119910108 2 0 1 0
xxx_160 119910108 2 1 10 1
xxx_160 119910108 2 10 17 3
xxx_160 119910108 2 17 39 2
xxx_160 139910107 3 0 17 0
xxx_160 139910107 3 17 31 2
xxx_160 139910107 3 31 38 0
xxx_160 139910107 3 38 43 2
xxx_160 139910107 3 43 48 0
;
run;
To me it looks like you have to use cohort wherever you also use subject.
Please fix your data step code so that it works, I get all kinds of messages and incomplete records. Make sure your delimiters are consistent (either all blanks or all tabs). I can only test code against solid data.
data have;
input subject count start end change;
datalines;
12345 1 20 35 1
12345 1 36 39 2
12345 1 40 45 0
23456 2 15 19 0
23456 2 20 24 1
34567 3 33 44 1
34567 3 45 50 2
45678 1 15 18 0
45678 1 19 22 1
;
run;
proc sql noprint;
select max(n) into : n
from (select count(*) as n from have group by subject,count);
quit;
proc summary data=have;
by subject count;
output out=want idgroup(out[&n] (start end change)=);
run;
proc print noobs;run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.