I need to transpose each subset of subject. But after transpose I want each subject to have unique columns thus resulting in a wide data set. I don't think I can accomplish this with transpose alone.
data have;
input subject count start end change;
datalines;
12345 1 20 35 1
12345 1 36 39 2
12345 1 40 45 0
23456 2 15 19 0
23456 2 20 24 1
34567 3 33 44 1
34567 3 45 50 2
45678 1 15 18 0
45678 1 19 22 1
run;
So for example below if the first subject occupies columns st_1, ed_1, chg_1 through to st_3, ed_3, chg_3
Then the second subject will commence with st_4, ed_4, chg_4 through st_5, ed_5, chg_5 etc. But columns can be reused once the count variable resets to 1. Count resets after every 3 distinct subjects.
Desired output;
subject | count | st_1 | ed_1 | chg_1 | st_2 | ed_2 | chg_2 | st_3 | ed_3 | chg_3 | st_4 | ed_4 | chg_4 | st_5 | ed_5 | chg_5 | st_6 | ed_6 | chg_6 | st_7 | ed_7 | chg_7 |
12345 | 1 | 20 | 35 | 1 | 36 | 39 | 2 | 40 | 45 | 0 | ||||||||||||
23456 | 2 | 15 | 19 | 0 | 20 | 24 | 1 | |||||||||||||||
34567 | 3 | 33 | 44 | 1 | 45 | 50 | 2 | |||||||||||||||
45678 | 1 | 15 | 18 | 0 | 19 | 22 | 1 |
Any suggestions?
Since we want to preserve variable names, we need to do a transpose for each variable separately; we also need a preliminary step that creates the repeat count within each group, and a final macro to reorder the columns as wanted:
data have;
input subject count start end change;
datalines;
12345 1 20 35 1
12345 1 36 39 2
12345 1 40 45 0
23456 2 15 19 0
23456 2 20 24 1
34567 3 33 44 1
34567 3 45 50 2
45678 1 15 18 0
45678 1 19 22 1
;
run;
data int;
set have;
by count notsorted;
retain vcount group 0;
if first.count and count = 1
then vcount = 1;
else vcount + 1;
run;
proc transpose data=int out=want1 (drop=_name_) prefix=start;
by subject;
id vcount;
var start;
run;
proc transpose data=int out=want2 (drop=_name_) prefix=end;
by subject;
id vcount;
var end;
run;
proc transpose data=int out=want3 (drop=_name_) prefix=change;
by subject;
id vcount;
var change;
run;
data want;
merge
want1
want2
want3
;
by subject;
run;
proc sql noprint;
select max(vcount) into :vcount from int;
quit;
%macro order_vars;
data want;
format subject
%do i = 1 %to &vcount;
start&i
end&i
change&i
%end;
;
set want;
run;
%mend;
%order_vars
Since we want to preserve variable names, we need to do a transpose for each variable separately; we also need a preliminary step that creates the repeat count within each group, and a final macro to reorder the columns as wanted:
data have;
input subject count start end change;
datalines;
12345 1 20 35 1
12345 1 36 39 2
12345 1 40 45 0
23456 2 15 19 0
23456 2 20 24 1
34567 3 33 44 1
34567 3 45 50 2
45678 1 15 18 0
45678 1 19 22 1
;
run;
data int;
set have;
by count notsorted;
retain vcount group 0;
if first.count and count = 1
then vcount = 1;
else vcount + 1;
run;
proc transpose data=int out=want1 (drop=_name_) prefix=start;
by subject;
id vcount;
var start;
run;
proc transpose data=int out=want2 (drop=_name_) prefix=end;
by subject;
id vcount;
var end;
run;
proc transpose data=int out=want3 (drop=_name_) prefix=change;
by subject;
id vcount;
var change;
run;
data want;
merge
want1
want2
want3
;
by subject;
run;
proc sql noprint;
select max(vcount) into :vcount from int;
quit;
%macro order_vars;
data want;
format subject
%do i = 1 %to &vcount;
start&i
end&i
change&i
%end;
;
set want;
run;
%mend;
%order_vars
I found a better method, using a double transpose:
(start with the int dataset from my previous post)
proc transpose data=int out=int2;
by subject vcount;
var start end change;
run;
data int3;
set int2;
_name_ = cats(_name_,strip(put(vcount,best.)));
run;
proc transpose data=int3 out=want (drop=_name_);
by subject;
var col1;
id _name_;
run;
no macro programming needed, everything sorts out by itself.
your solution works great for the sample data i provided. I think I always oversimplify my sample data. I tried to get to work for my data and it gives me errors. I have posted more realistic data below now. The count is assigned for first 15 unique (cohort,subject) and then it resets.
data have;
input cohort subject count start end change;
datalines;
xxx_020 679910101 1 0 43 0
xxx_040 119910102 2 0 60 0
xxx_040 679910102 3 0 43 0
xxx_060 119930101 4 0 15 0
xxx_060 119930101 4 15 35 1
xxx_060 119930101 4 35 50 0
xxx_060 119930101 4 50 57 1
xxx_060 119930101 4 57 64 0
xxx_060 119930101 4 64 71 1
xxx_060 119930101 4 71 78 0
xxx_060 119930101 4 78 84 1
xxx_060 119930101 4 84 99 0
xxx_060 119930101 4 99 141 1
xxx_060 119930101 4 141 148 2
xxx_060 119930101 4 148 168 1
xxx_060 119930101 4 168 176 2
xxx_060 119930101 4 176 197 1
xxx_060 119930101 4 197 204 2
xxx_060 119930101 4 204 225 1
xxx_060 119930101 4 225 268 2
xxx_060 139910101 5 0 23 0
xxx_100 119910103 6 0 22 0
xxx_100 119910103 6 22 36 1
xxx_100 119910103 6 36 58 2
xxx_100 119910103 6 58 67 0
xxx_100 119910103 6 67 71 1
xxx_100 119910103 6 71 78 0
xxx_100 119910103 6 78 87 1
xxx_100 119910103 6 87 108 0
xxx_100 119910103 6 108 134 1
xxx_100 119910103 6 134 149 0
xxx_100 119910103 6 149 155 2
xxx_100 119910103 6 155 169 0
xxx_100 119910103 6 169 184 1
xxx_100 119910103 6 184 190 2
xxx_100 119910103 6 190 197 0
xxx_100 119910103 6 197 211 1
xxx_100 119910103 6 211 218 0
xxx_100 119910103 6 218 233 1
xxx_100 139910102 7 0 163 0
xxx_100 139910102 7 163 169 2
xxx_100 139910102 7 169 176 1
xxx_100 139910102 7 176 183 0
xxx_100 139910102 7 183 190 2
xxx_100 139910102 7 190 205 1
xxx_100 139910102 7 205 211 2
xxx_100 139910102 7 211 218 0
xxx_100 139910102 7 218 233 2
xxx_100 139910102 7 233 234 0
xxx_140 119910104 8 0 7 0
xxx_140 119910104 8 7 15 2
xxx_140 119910104 8 15 22 3
xxx_140 119910104 8 22 36 2
xxx_140 119910104 8 36 38 3
xxx_140 119910104 8 38 43 4
xxx_140 119910104 8 43 50 1
xxx_140 119910104 8 50 57 2
xxx_140 119910104 8 57 64 3
xxx_140 119910104 8 64 78 1
xxx_140 119910104 8 78 92 2
xxx_140 119910104 8 92 113 3
xxx_140 119910104 8 113 120 2
xxx_140 119910104 8 120 148 0
xxx_140 119910104 8 148 149 3
xxx_140 119910105 9 0 15 0
xxx_140 119910105 9 15 29 1
xxx_140 119910105 9 29 36 2
xxx_140 119910105 9 36 51 1
xxx_140 119910105 9 51 97 0
xxx_140 119910106 10 0 16 0
xxx_140 119910106 10 16 30 2
xxx_140 119910106 10 30 37 4
xxx_140 119910106 10 37 52 0
xxx_140 139910104 11 0 8 0
xxx_140 139910104 11 8 10 3
xxx_140 139910104 11 10 17 4
xxx_140 139910104 11 17 20 2
xxx_140 139910104 11 20 29 0
xxx_140 139910104 11 29 31 3
xxx_140 139910104 11 31 34 0
xxx_140 139910105 12 0 8 0
xxx_140 139910105 12 8 22 2
xxx_140 139910105 12 22 26 3
xxx_140 139910105 12 26 36 0
xxx_140 139910105 12 36 43 2
xxx_140 139910105 12 43 50 3
xxx_140 139910105 12 50 86 2
xxx_140 139910105 12 86 87 0
xxx_140 139910106 13 0 8 0
xxx_140 139910106 13 8 15 1
xxx_140 139910106 13 15 23 2
xxx_140 139910106 13 23 30 1
xxx_140 139910106 13 30 44 2
xxx_140 139910106 13 44 45 3
xxx_140 679910104 14 0 7 0
xxx_140 679910104 14 7 28 2
xxx_140 679910104 14 28 36 3
xxx_140 679910104 14 36 42 0
xxx_140 679910104 14 42 49 2
xxx_140 679910104 14 49 105 0
xxx_140 679910104 14 105 112 2
xxx_140 679910104 14 112 126 0
xxx_140 679910104 14 126 133 1
xxx_140 679910104 14 133 140 2
xxx_140 679910104 14 140 148 0
xxx_140 679910105 15 0 24 0
xxx_140 679910105 15 24 28 1
xxx_140 679910105 15 28 42 2
xxx_140 679910105 15 42 92 0
xxx_160 119910107 1 0 8 0
xxx_160 119910107 1 8 15 1
xxx_160 119910107 1 15 22 2
xxx_160 119910107 1 22 25 0
xxx_160 119910107 1 25 29 1
xxx_160 119910107 1 29 30 0
xxx_160 119910108 2 0 1 0
xxx_160 119910108 2 1 10 1
xxx_160 119910108 2 10 17 3
xxx_160 119910108 2 17 39 2
xxx_160 139910107 3 0 17 0
xxx_160 139910107 3 17 31 2
xxx_160 139910107 3 31 38 0
xxx_160 139910107 3 38 43 2
xxx_160 139910107 3 43 48 0
;
run;
To me it looks like you have to use cohort wherever you also use subject.
Please fix your data step code so that it works, I get all kinds of messages and incomplete records. Make sure your delimiters are consistent (either all blanks or all tabs). I can only test code against solid data.
data have;
input subject count start end change;
datalines;
12345 1 20 35 1
12345 1 36 39 2
12345 1 40 45 0
23456 2 15 19 0
23456 2 20 24 1
34567 3 33 44 1
34567 3 45 50 2
45678 1 15 18 0
45678 1 19 22 1
;
run;
proc sql noprint;
select max(n) into : n
from (select count(*) as n from have group by subject,count);
quit;
proc summary data=have;
by subject count;
output out=want idgroup(out[&n] (start end change)=);
run;
proc print noobs;run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.