Hi everyone,
I have a mock data that collect in 3 years: 2002, 2003 2005.
Variables are: fd, cl, hs, hl, all coded to 1 (yes) or 0 (no); with sub_id.
data test;
input id $ fd cl hs hl 2002 2003 2005;
datalines;
1 1 1 0 . 1 0 0 . 1 1 0 1
2 1 0 1 1 0 0 1 1 1 0 1 1
3 1 . 0 0 1 . 0 0 1 1 . 0
5 1 1 1 0 0 1 1 1 . 1 0 0
...
;
run;
I want to have outcomes as below:
a) Highly severe: In EACH surveyed year, if a subject responded "yes" to ALL fd, cl, hs, hl.
b) Severe: In EACH surveyed year, if a subject responded "yes" at least TWO-THREE to fd, cl, hs, hl.
c) Moderate: In EACH surveyed year, if a subject responded "yes" at least ONE to fd, cl, hs, hl.
d) Mild: In ANY surveyed year, if a subject responded "yes" at least ONE to fd, cl, hs, hl.
e) None: In EACH surveyed year, if a subject responded "no" to ALL fd, cl, hs, hl.
Also, the data step, should I do them in wide data or long data.
Much appreciate for your help.
Phan S.
Like this?
data HAVE;
input ID @;
do YEAR=2002 to 2004;
input FD CL HS HL @;
output;
end;
cards;
1 1 1 0 . 1 0 0 . 1 1 0 1
2 1 0 1 1 0 0 1 1 1 0 1 1
3 0 . 0 0 0 . 0 0 0 0 . 0
5 1 1 1 1 1 1 1 1 1 1 1 1
run;
proc sql;
create table WANT as
select ID
, case when min(NB_YES)=4 then 'Highly severe'
when min(NB_YES)>1 then 'Severe'
when min(NB_YES)=1 then 'Moderate'
when min(NB_YES)=0 then 'None'
else 'Mild' end as OUTCOME
from (
select ID
, YEAR
, sum(FD, CL, HS, HL) as NB_YES
from HAVE
group by ID, YEAR
)
group by ID;
quit;
ID OUTCOME
1 Moderate
2 Severe
3 None
5 Highly severe
I'm not getting a feel for your data. You have more variables listed than in your datelines and you can't name variables 2003 so that wouldn't work anyways.
Create at least two arrays, one for years, one for data. I'm assuming that if there is no survey then it's missing and not zero.
a) Highly severe: In EACH surveyed year, if a subject responded "yes" to ALL fd, cl, hs, hl. -> Add them all together and see if they add to the expected number, sum of years * number of variables you're checking.
b) Severe: In EACH surveyed year, if a subject responded "yes" at least TWO-THREE to fd, cl, hs, hl. -> add them all together and check if sum to higher than the minimum required number, 2*nYears
That should give you an idea of how these can be calculated.
@PhanS wrote:
Hi everyone,
I have a mock data that collect in 3 years: 2002, 2003 2005.
Variables are: fd, cl, hs, hl, all coded to 1 (yes) or 0 (no); with sub_id.
data test; input id $ fd cl hs hl 2002 2003 2005; datalines; 1 1 1 0 . 1 0 0 . 1 1 0 1 2 1 0 1 1 0 0 1 1 1 0 1 1 3 1 . 0 0 1 . 0 0 1 1 . 0 5 1 1 1 0 0 1 1 1 . 1 0 0 ... ; run;
I want to have outcomes as below:
a) Highly severe: In EACH surveyed year, if a subject responded "yes" to ALL fd, cl, hs, hl.
b) Severe: In EACH surveyed year, if a subject responded "yes" at least TWO-THREE to fd, cl, hs, hl.
c) Moderate: In EACH surveyed year, if a subject responded "yes" at least ONE to fd, cl, hs, hl.
d) Mild: In ANY surveyed year, if a subject responded "yes" at least ONE to fd, cl, hs, hl.
e) None: In EACH surveyed year, if a subject responded "no" to ALL fd, cl, hs, hl.
Also, the data step, should I do them in wide data or long data.
Much appreciate for your help.
Phan S.
Like this?
data HAVE;
input ID @;
do YEAR=2002 to 2004;
input FD CL HS HL @;
output;
end;
cards;
1 1 1 0 . 1 0 0 . 1 1 0 1
2 1 0 1 1 0 0 1 1 1 0 1 1
3 0 . 0 0 0 . 0 0 0 0 . 0
5 1 1 1 1 1 1 1 1 1 1 1 1
run;
proc sql;
create table WANT as
select ID
, case when min(NB_YES)=4 then 'Highly severe'
when min(NB_YES)>1 then 'Severe'
when min(NB_YES)=1 then 'Moderate'
when min(NB_YES)=0 then 'None'
else 'Mild' end as OUTCOME
from (
select ID
, YEAR
, sum(FD, CL, HS, HL) as NB_YES
from HAVE
group by ID, YEAR
)
group by ID;
quit;
ID OUTCOME
1 Moderate
2 Severe
3 None
5 Highly severe
Hello ChrisNZ,
Sorry for my delay. I accept your code as the solution. There is an issue, the outcomes came out only four categories. I played around but still give me only four.
You are welcome to test. BTW, I thank you for you time.
data HAVE;
input ID @;
do YEAR=2002 to 2004;
input FD CL HS HL @;
output;
end;
cards;
1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 . 1 1 0 . 1 1 0 1
3 1 0 0 0 0 0 1 1 1 0 . 1
4 1 0 0 . 0 0 0 . 1 . 0 0
5 0 . 0 0 0 . 0 0 0 0 . 0
6 1 1 1 1 1 1 1 1 1 1 1 1
7 1 1 1 . 1 1 0 . 1 1 0 1
8 1 0 0 0 0 0 1 1 1 0 . 1
9 1 0 0 . 0 0 0 . 1 . 0 0
10 0 . 0 0 0 . 0 0 0 0 . 0
run;
proc sql;
create table WANT as
select ID
, case when min(NB_YES)=4 then 'Highly severe'
when 2=min(NB_YES)<4 then 'Severe'
when min(NB_YES)=1 then 'Moderate'
when min(NB_YES)=0 then 'None'
else 'Mild' end as OUTCOME_2
from (
select ID
, YEAR
, sum(FD, CL, HS, HL) as NB_YES
from HAVE
group by ID, YEAR
)
group by ID;
quit;
2=min(NB_YES)<4
is always true as 0 or 1 are < 4
Stick to the original syntax.
Dear Chris,
my apology to disturb you. I re-run the code, but I am not sure why "outcome" does exist in work.outcome.data.
I am looking for your advice. Thank you.
Phan S.
Code and log because we can't see your computer.
9653 data HAVE;
9654 input ID @;
9655 do YEAR=2002 to 2004;
9656 input FD CL HS HL @;
9657 output;
9658 end;
9659 cards;
NOTE: The data set WORK.HAVE has 30 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.09 seconds
cpu time 0.00 seconds
9670 run;
9671
9672 proc sql;
9673 create table WANT as
9674 select ID
9675 , case when min(NB_YES)=4 then 'Highly severe'
9676 when min(NB_YES)>1 then 'Severe'
9677 when min(NB_YES)=1 then 'Moderate'
9678 when min(NB_YES)=0 then 'None'
9679 else 'Mild'
9680 end as OUTCOME
9681 from (
9682 select ID
9683 , YEAR
9684 , sum(FD, CL, HS, HL) as NB_YES
9685 from HAVE
9686 group by ID, YEAR
9687 )
9688 group by ID;
NOTE: A GROUP BY clause has been discarded because neither the SELECT clause nor the
optional HAVING clause of the associated table-expression referenced a summary
function.
NOTE: Table WORK.WANT created, with 10 rows and 2 columns.
9689 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.28 seconds
cpu time 0.04 seconds
9690 proc freq data=outcome;
ERROR: File WORK.OUTCOME.DATA does not exist.
9691 tables nb_yes;
ERROR: No data set open to look up variables.
9692 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
Hello, Reeza: Please kindly see the log and code below.
data HAVE;
input ID @;
do YEAR=2002 to 2004;
input FD CL HS HL @;
output;
end;
cards;
1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 . 1 1 0 . 1 1 0 1
3 1 0 0 0 0 0 1 1 1 0 . 1
4 1 0 0 . 0 0 0 . 1 . 0 0
5 0 . 0 0 0 . 0 0 0 0 . 0
6 1 1 1 1 1 1 1 1 1 1 1 1
7 1 1 1 . 1 1 0 . 1 1 0 1
8 1 0 0 0 0 0 1 1 1 0 . 1
9 1 0 0 . 0 0 0 . 1 . 0 0
10 0 . 0 0 0 . 0 0 0 0 . 0
run;
proc sql;
create table WANT as
select ID
, case when min(NB_YES)=4 then 'Highly severe'
when min(NB_YES)>1 then 'Severe'
when min(NB_YES)=1 then 'Moderate'
when min(NB_YES)=0 then 'None'
else 'Mild'
end as OUTCOME
from (
select ID
, YEAR
, sum(FD, CL, HS, HL) as NB_YES
from HAVE
group by ID, YEAR
)
group by ID;
quit;
proc sort data=outcome;
proc print data=want;
var id;
run;
proc freq data=outcome;
tables nb_yes;
run;
Really?
The table is called WANT in your code, not OUTCOME.
PROC SORT has no BY statement either.
I don't understand what the issue is.
Also: Wide data is seldom a good idea. I created a long table.
Appear not working.
How can I run proc freq from proc SQL?
Also, can you tell me how can I random extract a data?
Example, I want to randomize extract and have a data of 50 subjects out from the original data of 2000+ subject.
@PhanS wrote:
Appear not working.
What does that mean? Did SAS explode?
How can I run proc freq from proc SQL?
You can't and more importantly shouldn't.
Also, can you tell me how can I random extract a data?
Example, I want to randomize extract and have a data of 50 subjects out from the original data of 2000+ subject.
PROC SURVEYSELECT will select Random Samples.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.