Hello all,
Many thanks to help me to transfer the below design to the expected design.
record_id event Q1…….. Q24
1 base
1 pre_arm
1 post_arm
2 base
2 pre_arm
2 post_arm
.
.
.
20 base
20 pre_arm
20 post_arm
Expected design:
record_id event b_Q1……..b_Q24 event pre_Q1……..pre_Q24 event post_Q1……..post_Q24
1 base pre_arm post_arm
2
.
.
.
20
I think it is not only proc transpose, not sure.
Thank you in advance
data have;
input record_id event $ Q1 $ Q2 $;
cards;
1 base 1 1
1 pre_arm 2 2
1 post_arm 3 3
2 base 3 3
2 pre_arm 2 2
2 post_arm 1 1
;
proc sql noprint;
select distinct catt('have(where=(record_id=',record_id,' and event_',event,'="',event,'")
rename=(event=event_',event,' Q1=',event,'_Q1 Q2=',event,'_Q2))') into :merge separated by ' '
from have;
quit;
data want;
merge &merge.;
by record_id;
run;
Data in long form is almost always preferable to data in wide format. Preferable because almost all SAS PROCs are designed to work on long data sets. So, how to do this? Don't do it, not only is that easier but you will be better off. Maxim 19.
What possible benefit is there of this wide format that you want compared to the long format? What is the next step in your work, is it some sort of data analysis, plotting, graphings, creating a table? Please tell us in detail so we can best advise you.
Thank you for your help; this is a survey questionnaire with a Likert scale. I wanted to run the Wilcoxson sign rank test using proc univariate to do a few comparisons, such as comparing pre_arm with post_arm.
In the first design, if I create a column named total, which is the Sum(of Q1-Q24) for each patient_id, then how do I calculate (total score of post_arm – total score of pre_arm) for each id to place that in proc univariate?
but with the second design, I can create two columns named a total of pre_arm, which is Sum(of preQ1-preQ24), and a total of post_arm, which is Sum(of postQ1-postQ24), then calculate the difference b/n these measures to place that in proc univariate.
It seems that there would be easier way with long format, is n’t it?
From the long dataset I proposed, you need one query:
proc sql;
create table want as
select
record_id,
sum(case when event="pre_arm" then col1 else 0 end) as pre_sum,
sum(case when event = "post_arm" then col1 else 0 end) as post_sum,
calculated post_sum - calculated pre_sum as diff
from long
group by record_id
;
quit;
Note that the example provided in the online documentation of Proc NPAR1WAY for an exact two-sample test uses a long data set with the Class variable using the two levels.
So to perform such a test on your example data;
data have; input record_id event $ Q1 Q2 ; cards; 1 base 1 1 1 pre_arm 2 2 1 post_arm 3 3 2 base 3 3 2 pre_arm 2 2 2 post_arm 1 1 ; proc npar1way data=have; where event in ('pre_arm' 'post_arm'); class event; var q1 q2; exact wilcoxon; run;
The WHERE statement reduces it to two levels, as stated in your desire.
Thanks for your answer, I think before doing any stat test, even proc npar1way, need to calculate the total score in my first design, and I created that like
data have;
set have;
if event = "pre_arm" then total_w=Sum(of &Q1-Q24);
if event= "post_arm" then total_g=Sum(of &Q1-Q24);
run;
then, there would be two columns like the below, due to this missing, proc npar1way doesn’t work here
event total_w total_g
pre_arm 2 .
post_arm . 4
pre_arm 3 .
.
.
.
@bhr-q wrote:
Thanks for your answer, I think before doing any stat test, even proc npar1way, need to calculate the total score in my first design, and I created that like
data have; set have; if event = "pre_arm" then total_w=Sum(of &Q1-Q24); if event= "post_arm" then total_g=Sum(of &Q1-Q24);
run;then, there would be two columns like the below, due to this missing, proc npar1way doesn’t work here
event total_w total_g
pre_arm 2 .
post_arm . 4
pre_arm 3 .
.
.
.
Not sure what you are saying but if you just make ONE variable with the total it should work.
data want;
set have;
total=sum(of Q1-Q24);
run;
Do not overwrite your input dataset (you might lose information and have to remake if you made a logic error).
The & in your code should not be there. That would cause the macro processor to look for a macro variable named Q1 to expand before passing the resulting statement on to SAS to add to the data step.
One variable is correct, but the proc npar1way that you proposed needs two variables, and I believe proc npar1way doesn't work here.
@bhr-q wrote:
One variable is correct, but the proc npar1way that you proposed needs two variables, and I believe proc npar1way doesn't work here.
Huh. You have two variables. TOTAL is the numeric variable to use the VAR statement. The other is variable that has two values to use the CLASS statement.
https://www.stat.purdue.edu/~tqin/system101/method/method_wilcoxon_rank_sum_sas.htm
I know I have two variables, but with the first design, as I show above , proc npar1way doesn't work here.
my design is a Wilcoxson sign rank test not Wilcoxson rank sum test.
@bhr-q wrote:
Thank you for your help; this is a survey questionnaire with a Likert scale. I wanted to run the Wilcoxson sign rank test using proc univariate to do a few comparisons, such as comparing pre_arm with post_arm.
In the first design, if I create a column named total, which is the Sum(of Q1-Q24) for each patient_id, then how do I calculate (total score of post_arm – total score of pre_arm) for each id to place that in proc univariate?
but with the second design, I can create two columns named a total of pre_arm, which is Sum(of preQ1-preQ24), and a total of post_arm, which is Sum(of postQ1-postQ24), then calculate the difference b/n these measures to place that in proc univariate.
It seems that there would be easier way with long format, is n’t it?
This is good information. It should be in your original post, otherwise we have to guess why you are doing this confusing transformation. Without knowing this, we can't provide good answers. Anyway, @Kurt_Bremser (and possibly others) have shown how to do this with long data set, and no one could have provided the way to do this with the information in your original post.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.