Need to transpose an imported data set using only DATA step (if you wonder why it is required for class)
I have the original dataset with patients and visits
it is required to create a new set with the weights separated by visit, while using the retain option it does not program for missing values and maintains the value until a patient with more visits appears.
I have tried the null statement, missing statement, changing the visit value, changing weight per visit to missing if value is missing, and also setting weight value to missing
Current code looks something like this
DATA fat (keep= patient gender group weight1 weight2 weight3 weight4);
set icdb.clinical;
by patient visit;
if visit = '1' then weight1 = weight;
else if visit = '2' then weight2 = weight;
else if visit = '3' then weight3 = weight;
else if visit = '4' then weight4 = weight;
if last.patient then output;
retain weight1 weight2 weight3 weight4;*keeping the new variables until next patient*;
RUN;
Besides of the need to set all the retained variable to missing after the output statement...
if last.patient then do; output; call missing(of weight1-weight4); end;
...consider if you need to transpose your data at all.
Keeping the data in a long and narrow structure is often better and a lot of SAS Proc's are very "happy" with by group processing and categorical variables for grouping (often defined via a CLASS statement in the Proc).
After you output, you must also set the 4 weight variables to missing.
Besides of the need to set all the retained variable to missing after the output statement...
if last.patient then do; output; call missing(of weight1-weight4); end;
...consider if you need to transpose your data at all.
Keeping the data in a long and narrow structure is often better and a lot of SAS Proc's are very "happy" with by group processing and categorical variables for grouping (often defined via a CLASS statement in the Proc).
this is what I end up doing to restart the retained value, homework are small exercises to test what was learned in class during the week. In this case it was output and retain functions, arrays are coming on the next unit so it will simplify doing this long process
Understand this is requested for class.
Show what the expected result actually should look like for that example as I am not understanding.
I used to get into trouble with teachers by asking "why" when assigned to do something I was pretty sure was, let us say 'sub-optimal' at best, and would ask where the proposed changed structure is better.
If the word "spreadsheet" or "Excel" is in the answer run screaming into the night.
I really hope that your teacher explains, that transposing perfectly formed data is something to be avoided.
Have you worked with arrays? This is a perfect scenario to use them, because the variable "visit" can be used as index.
/* Array version */
data fat(keep= patient gender group weight1 weight2 weight3 weight4);
set icdb.clinical;
by patient; /* you have multiple obs per patient, not per visit */
length weight1-weight4 8;
retain weight1-weight4;
array weights [4] weight1-weight4;
if first.patient then do;
call missing(of weights); /* reset all weights */
end;
weights[visit] = weight;
if last.patient then do;
output;
end;
run;
Basically, your teacher wants you to recreate what PROC TRANSPOSE does in a long-to-wide transformation.
This requires two steps:
In your case, we first need to determine the maximum number of visits present in the dataset:
proc sql noprint;
select max(visit) into :maxvisit trimmed from have;
quit;
Then you use an array in the data step:
data want;
if 0 then set have; /* retrieves variable metadata */
array weights {&maxvisit.} weight1-weight&maxvisit.;
do until (last.patient);
set have;
by patient;
weights{visit} = weight;
end;
keep patient gender weight1-weight&maxvisit.;
run;
Using a DO loop for each patient makes sure that the variables in the array are set to missing before each patient.
This code is untested; for tested code, please provide source data in a data step with datalines, so we can recreate your dataset with a simple copy/paste and submit. Do not post pictures of data.
Like this:
data have;
input patient visit v_date :date9. weight gender $;
format v_date yymmdd10.;
datalines;
1 1 11aug2006 158 Female
1 2 18sep2006 160 Female
;
My footnotes contain a link to a macro that can convert an existing dataset to such code automatically.
Use the "little running man" button next to the one indicated for posting SAS code:
The indicated button is for logs and other text where the content and formatting must not be changed upon posting.
PS and if your teacher wants you to actually not even use an array, then a step that works without losing data (as you can't know the maximum number of visits beforehand in the real world) will need macro coding to dynamically create the assignment statements for the weightX variables:
%macro transpose;
data want;
if 0 then set have;
do until (last.patient);
set have;
by patient;
select (visit);
%do i = 1 %to &maxvisit.;
when (&i) weight&i. = weight;
%end;
end;
end;
keep patient gender weight1-weight&maxvisit.;
run;
%mend;
%transpose
This code is tested based on the made-up data.
Perfectly understandable that I would need to add the counter in the real world.
As it was just a little set I was able to see it before working on it, I have learnt so much about arrays with this community, this lesson was about OUTPUT and RETAIN so unfortunately the array is not an option until the next lesson. Also was required to be done within a single data step. The then-do did the trick for me
You can easily un-macro the code to work with a fixed set:
data want;
if 0 then set have;
do until (last.patient);
set have;
by patient;
select (visit);
when (1) weight1 = weight;
when (2) weight2 = weight;
when (3) weight3 = weight;
when (4) weight4 = weight;
end;
end;
keep patient gender weight1-weight4;
run;
The nice thing about reading a group in a DO loop is that you do not have to worry when to reset the variables to missing, as the data step does it for you.
I was going to make a comment about how this assignment wants the user to do things the hard way, and then achieve a data structure that is harder to work with than the original data structure. But I'm not going to say that.
Try my MERGE skill:
https://support.sas.com/resources/papers/proceedings15/2785-2015.pdf
data have;
input patient visit v_date :date9. weight gender $;
format v_date yymmdd10.;
datalines;
1 1 11aug2006 158 Female
1 2 18sep2006 160 Female
;
proc sql;
select distinct catt('have(where=(visit=',visit,')
rename=(weight=weight_',visit,' v_date=v_date_',visit,'))') into : merge separated by ' '
from have;
quit;
data want;
merge &merge;
by patient gender;
drop visit;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.