Hopefully, this should be the last change to the requirement i have asked on this post. I would like to transpose the data. I tried proc transpose by ID and SP_MD. But the proc transpose did not work for me. I believe DO Loops and arrays are the best bet, but it has not worked well for me. I worked on the last Do loop and tried modifying what art had posted, but i keep getting error messages. I would like do have the data set set up just like i did on the model sheet the first sheet in the Excel file.
Thanks,
Thomas
Thomas,
One step you could take is to eliminate the DATA steps. The WHERE statements could be added to PROC TRANSPOSE.
But I'm a little curious about how you intend to put the data sets together at the end. You can't do what SAS calls a "merge" because you'll have the same variable names in both data sets. One set of variables will overwrite the other. So do you want to combine them side by side (but changing the variable names for MD vs. SP), or do you want to keep the same variable names and create twice as many observations (while keeping SP_vs_MD to differentiate)?
Use proc tranpsose instead of an array.
See a good example w/ code here
Ahh, much better now that you fixed the output structure. This utilizes a neat feature that Art used last week:
data before;
set before;
_name_ = compress( 'st' || put(response,1.) || question, ' .');
run;
proc transpose data=before out=after;
by idnum;
var response;
run;
If you have responses that could be 2 digits long, replace the 1. format with the z2. format.
Good luck.
or, yet another way of doing it:
data have;
input IDNUM $ Question $ Response;
cards;
001 Q.1. 1
001 Q.2. 3
001 Q.5. 4
002 Q.1. 0
002 Q.3. 3
002 Q.4. 2
003 Q.5. 3
003 Q.1. 0
003 Q.2. 5
;
data temp;
set have;
question=compress(question,'.');
run;
proc transpose data=temp out=want (drop=_:) prefix=st1;
by idnum;
var response;
id question;
idlabel question;
run;
Art,
Yes, depending on what is really needed as output I probably should have used:
_name_ = 'st1' || compress(question, ' .');
It's not 100% clear to me whether the new variable names should start with "st1" or whether the 3rd character should vary depending on the response. Given the follow-up information, it should probably remain "st1".
astounding. I appreciate your help. I missed out a critical step. Some of the IDNUM have 3-4 Q.1. How would i average the response in the code?
It all really depends on what you want to do with the data.
But you can use PROC SUMMARY to create an average of each question.
proc summary data=have nway missing ;
class idnum question ;
var response;
output out=temp mean= ;
run;
If you combine our solutions, it should work.
First, apply Tom's summary.
Then use Art's code (or my revised code with the updated assignment for _name_), using the output data set from PROC SUMMARY as the input to PROC TRANSPOSE.
Here's a more complete version:
proc summary data=before nway missing;
class idnum question;
var response;
where question ne 'CanM';
output out=temp (keep=idnum question response) mean=;
run;
data temp;
set temp;
_name_ = 'st1' || compress(question, ' .');
run;
proc transpose data=before out=after;
by idnum;
var response;
run;
I don't understand why you got duplicate ids, but Astounding's approach should avoid that. How about if you try:
data have;
input IDNUM $ Question $ Response;
cards;
001 Q.1. 1
001 Q.1. 2
001 Q.1. 3
001 Q.2. 3
001 Q.5. 4
002 Q.1. 0
002 Q.3. 3
002 Q.4. 2
002 CanM 4
003 Q.5. 3
003 Q.1. 0
003 Q.2. 5
;
proc summary data=have nway;
class idnum question;
var response;
output out=averages (drop=_:) mean=;
run;
data averages;
set averages;
_name_ = 'st1' || compress(question, ' .');
run;
proc transpose data=averages out=want (drop=_:);
by idnum;
var response;
run;
data want;
retain idnum st1Q1-st1Q5 st1CanM;
set want;
run;
Art,
Looks like our solutions are converging.
Those duplicate ids must have come from a test that didn't use PROC SUMMARY first.
astounding i have a question. This program that you and art created only calculates the average, but i need standard deviation too?cos i cannot use the proc means step after this, as it will do an averge over an average.
PROC MEANS can generate standard deviations in the output data set, as well as means.
Part of the answer depends on what you want your report to look like. Should the standard deviations form a second row for each IDNUM, or should they form additional columns on the existing row?
Also note, you will have a lot of missing standard deviations when there is only one response.
Thomas,
I'm afraid that work beckons, and I won't be able to give you a complete answer here, just a few notes.
First, it should be possible to run all 7 STs at once. You might skip this at first, getting it to work for a single ST as a starting point. But it would certainly be possible to code statements like these at the proper points:
_name_ = 'st' || put(st,1.) || compress(question, ' .');
class st idnum question;
But without attempting that hurdle, first, in the PROC SUMMARY, modify the OUTPUT statement to get both statistics:
output out=temp (keep=idnum question mean std) mean=mean std=std;
Then in the DATA step that is needed anyway, construct the combination variable:
data temp;
set temp;
_name_ = 'st1' || compress(question, ' .');
new_response = put(mean, 3.1) || '+-' || put(std, 3.1);
run;
Then use new_response instead of response within PROC TRANSPOSE.
Good luck.
Thomas,
There is probably an easier way to accomplish this, but I think that the result at least is what you are looking for:
data have;
input IDNUM $ Question $ Response;
cards;
001 Q.1. 1
001 Q.1. 2
001 Q.1. 3
001 Q.2. 3
001 Q.2. 4
002 Q.1. 0
002 Q.1. 1
002 Q.3. 3
002 Q.3. 2
002 Q.4. 2
002 Q.4. 4
002 CanM 4
002 CanM 5
003 Q.5. 3
003 Q.5. 1
003 Q.1. 0
003 Q.1. 2
003 Q.2. 5
003 Q.2. 7
;
proc summary data=have nway;
class idnum question;
var response;
output out=averages (drop=_:) mean= std= /autoname;
run;
data averages;
set averages;
_name_ = 'sm1' || compress(question, ' .');
run;
proc transpose data=averages (rename=(response_mean=response))
out=part1 (drop=_:);
by idnum;
var response;
run;
data averages;
set averages;
_name_ = 'sd1' || compress(question, ' .');
run;
proc transpose data=averages
out=part2 (drop=_:);
by idnum;
var response_stddev;
run;
proc sql noprint;
select 'st'||trim(substr(name,3))||'='||
'catt(put('||trim(name)||',4.2),"+",put(sd'||
trim(substr(name,3))||',4.2));',
'if strip(st'||trim(substr(name,3))||
')=".+ ." then call missing('||
'st'||trim(substr(name,3))||');',
'length st'||trim(substr(name,3))||' $10;'
into :calcs separated by ' ',
:miss separated by ' ',
:lengths separated by ' '
from dictionary.columns
where libname="WORK" and
memname="PART1" and
name ne "IDNUM"
;
quit;
data want (drop=sm: sd:);
&lengths.;
set part1;
set part2 (drop=idnum);
&calcs.;
&miss.;
run;
Thomas,
If you look at the code I sent, last, the creation of _name_ was in a different place:
proc summary data=have nway;
class idnum question;
var response;
output out=averages (drop=_:) mean=;
run;
data averages;
set averages;
_name_ = 'st1' || compress(question, ' .');
run;
proc transpose data=averages out=want (drop=_:);
by idnum;
var response;
run;
data want;
retain idnum st1Q1-st1Q5 st1CanM;
set want;
run;
I'll take a look at your latest need now.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.