DATA Step, Macro, Functions and more

Longitudinal data - programming

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 83
Accepted Solution

Longitudinal data - programming

Hi everyone,

I have a mock data that collect in 3 years: 2002, 2003 2005.

Variables are: fd, cl, hs, hl, all coded to 1 (yes) or 0 (no); with sub_id.

data test;
input id $ fd cl hs hl 2002 2003 2005;
datalines; 
1 1 1 0 . 1 0 0 . 1 1 0 1 
2 1 0 1 1 0 0 1 1 1 0 1 1
3 1 . 0 0 1 . 0 0 1 1 . 0 
5 1 1 1 0 0 1 1 1 . 1 0 0
...
;
run;

 

I want to have outcomes as below:

a) Highly severe: In EACH surveyed year, if a subject responded "yes" to ALL fd, cl, hs, hl.

b) Severe: In EACH surveyed year, if a subject responded "yes" at least TWO-THREE to fd, cl, hs, hl.

c) Moderate: In EACH surveyed year, if a subject responded "yes" at least ONE to fd, cl, hs, hl.

d) Mild: In ANY surveyed year, if a subject responded "yes" at least ONE to fd, cl, hs, hl.

e) None: In EACH surveyed year, if a subject responded "no" to ALL fd, cl, hs, hl.

 

Also, the data step, should I do them in wide data or long data. 

 

Much appreciate for your help.

 

Phan S.

 


Accepted Solutions
Solution
‎02-14-2018 06:58 PM
PROC Star
Posts: 2,233

Re: Longitudinal data - programming

Like this?

data HAVE; 
  input ID @; 
  do YEAR=2002 to 2004;
    input FD CL HS HL @;    
    output; 
  end;
cards; 
1 1 1 0 . 1 0 0 . 1 1 0 1 
2 1 0 1 1 0 0 1 1 1 0 1 1
3 0 . 0 0 0 . 0 0 0 0 . 0 
5 1 1 1 1 1 1 1 1 1 1 1 1
run;
proc sql;
  create table WANT as 
  select ID
       , case when min(NB_YES)=4 then 'Highly severe'
              when min(NB_YES)>1 then 'Severe'
              when min(NB_YES)=1 then 'Moderate'
              when min(NB_YES)=0 then 'None'
              else                    'Mild' end as OUTCOME
  from (
    select ID
         , YEAR
         , sum(FD, CL, HS, HL) as NB_YES
    from HAVE 
    group by ID, YEAR
       )
  group by ID;
quit;

ID OUTCOME
1  Moderate
2  Severe
3  None
5  Highly severe

View solution in original post


All Replies
Super User
Posts: 22,875

Re: Longitudinal data - programming

I'm not getting a feel for your data. You have more variables listed than in your datelines and you can't name variables 2003 so that wouldn't work anyways. 

 

Create at least two arrays, one for years, one for data. I'm assuming that if there is no survey then it's missing and not zero. 

 

a) Highly severe: In EACH surveyed year, if a subject responded "yes" to ALL fd, cl, hs, hl. -> Add them all together and see if they add to the expected number, sum of years * number of variables you're checking. 

b) Severe: In EACH surveyed year, if a subject responded "yes" at least TWO-THREE to fd, cl, hs, hl. -> add them all together and check if sum to higher than the minimum required number, 2*nYears

 

 

That should give you an idea of how these can be calculated.

 


PhanS wrote:

Hi everyone,

I have a mock data that collect in 3 years: 2002, 2003 2005.

Variables are: fd, cl, hs, hl, all coded to 1 (yes) or 0 (no); with sub_id.

data test;
input id $ fd cl hs hl 2002 2003 2005;
datalines; 
1 1 1 0 . 1 0 0 . 1 1 0 1 
2 1 0 1 1 0 0 1 1 1 0 1 1
3 1 . 0 0 1 . 0 0 1 1 . 0 
5 1 1 1 0 0 1 1 1 . 1 0 0
...
;
run;

 

I want to have outcomes as below:

a) Highly severe: In EACH surveyed year, if a subject responded "yes" to ALL fd, cl, hs, hl.

b) Severe: In EACH surveyed year, if a subject responded "yes" at least TWO-THREE to fd, cl, hs, hl.

c) Moderate: In EACH surveyed year, if a subject responded "yes" at least ONE to fd, cl, hs, hl.

d) Mild: In ANY surveyed year, if a subject responded "yes" at least ONE to fd, cl, hs, hl.

e) None: In EACH surveyed year, if a subject responded "no" to ALL fd, cl, hs, hl.

 

Also, the data step, should I do them in wide data or long data. 

 

Much appreciate for your help.

 

Phan S.

 


 

Solution
‎02-14-2018 06:58 PM
PROC Star
Posts: 2,233

Re: Longitudinal data - programming

Like this?

data HAVE; 
  input ID @; 
  do YEAR=2002 to 2004;
    input FD CL HS HL @;    
    output; 
  end;
cards; 
1 1 1 0 . 1 0 0 . 1 1 0 1 
2 1 0 1 1 0 0 1 1 1 0 1 1
3 0 . 0 0 0 . 0 0 0 0 . 0 
5 1 1 1 1 1 1 1 1 1 1 1 1
run;
proc sql;
  create table WANT as 
  select ID
       , case when min(NB_YES)=4 then 'Highly severe'
              when min(NB_YES)>1 then 'Severe'
              when min(NB_YES)=1 then 'Moderate'
              when min(NB_YES)=0 then 'None'
              else                    'Mild' end as OUTCOME
  from (
    select ID
         , YEAR
         , sum(FD, CL, HS, HL) as NB_YES
    from HAVE 
    group by ID, YEAR
       )
  group by ID;
quit;

ID OUTCOME
1  Moderate
2  Severe
3  None
5  Highly severe

Frequent Contributor
Posts: 83

Re: Longitudinal data - programming

Hello ChrisNZ,

 

Sorry for my delay. I accept your code as the solution. There is an issue, the outcomes came out only four categories. I played around but still give me only four. 

You are welcome to test. BTW, I thank you for you time.

 

data HAVE; 
  input ID @; 
  do YEAR=2002 to 2004;
    input FD CL HS HL @;    
    output; 
  end;
cards; 
1 1 1 1 1  1 1 1 1  1 1 1 1
2 1 1 1 .  1 1 0 .  1 1 0 1 
3 1 0 0 0  0 0 1 1  1 0 . 1
4 1 0 0 .  0 0 0 .  1 . 0 0
5 0 . 0 0  0 . 0 0  0 0 . 0 
6 1 1 1 1  1 1 1 1  1 1 1 1
7 1 1 1 .  1 1 0 .  1 1 0 1 
8 1 0 0 0  0 0 1 1  1 0 . 1
9 1 0 0 .  0 0 0 .  1 . 0 0
10 0 . 0 0  0 . 0 0  0 0 . 0 
run;
proc sql;
  create table WANT as 
  select ID
       , case when min(NB_YES)=4 then 'Highly severe'
              when 2=min(NB_YES)<4 then 'Severe'
              when min(NB_YES)=1 then 'Moderate'
              when min(NB_YES)=0 then 'None'
              else                    'Mild' end as OUTCOME_2
  from (
    select ID
         , YEAR
         , sum(FD, CL, HS, HL) as NB_YES
    from HAVE 
    group by ID, YEAR
       )
  group by ID;
quit;
PROC Star
Posts: 2,233

Re: Longitudinal data - programming

2=min(NB_YES)<4

is always true as 0 or 1 are < 4

Stick to the original syntax.

Frequent Contributor
Posts: 83

Re: Longitudinal data - programming

Dear Chris,

my apology to disturb you. I re-run the code, but I am not sure why "outcome" does exist in work.outcome.data.

 

I am looking for your advice. Thank you.

 

Phan S.   

 

Super User
Posts: 22,875

Re: Longitudinal data - programming

Code and log because we can't see your computer. 

Frequent Contributor
Posts: 83

Re: Longitudinal data - programming

9653  data HAVE;
9654    input ID @;
9655    do YEAR=2002 to 2004;
9656      input FD CL HS HL @;
9657      output;
9658    end;
9659  cards;

NOTE: The data set WORK.HAVE has 30 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.09 seconds
      cpu time            0.00 seconds


9670  run;
9671
9672  proc sql;
9673    create table WANT as
9674    select ID
9675         , case when min(NB_YES)=4 then 'Highly severe'
9676                when min(NB_YES)>1 then 'Severe'
9677                when min(NB_YES)=1 then 'Moderate'
9678                when min(NB_YES)=0 then 'None'
9679                else                    'Mild'
9680    end as OUTCOME
9681    from (
9682      select ID
9683           , YEAR
9684           , sum(FD, CL, HS, HL) as NB_YES
9685      from HAVE
9686      group by ID, YEAR
9687         )
9688    group by ID;
NOTE: A GROUP BY clause has been discarded because neither the SELECT clause nor the
      optional HAVING clause of the associated table-expression referenced a summary
      function.
NOTE: Table WORK.WANT created, with 10 rows and 2 columns.

9689  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.28 seconds
      cpu time            0.04 seconds


9690  proc freq data=outcome;
ERROR: File WORK.OUTCOME.DATA does not exist.
9691  tables nb_yes;
ERROR: No data set open to look up variables.
9692  run;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

Hello, Reeza: Please kindly see the log and code below.

data HAVE; 
  input ID @; 
  do YEAR=2002 to 2004;
    input FD CL HS HL @;    
    output; 
  end;
cards; 
1 1 1 1 1  1 1 1 1  1 1 1 1
2 1 1 1 .  1 1 0 .  1 1 0 1 
3 1 0 0 0  0 0 1 1  1 0 . 1
4 1 0 0 .  0 0 0 .  1 . 0 0
5 0 . 0 0  0 . 0 0  0 0 . 0 
6 1 1 1 1  1 1 1 1  1 1 1 1
7 1 1 1 .  1 1 0 .  1 1 0 1 
8 1 0 0 0  0 0 1 1  1 0 . 1
9 1 0 0 .  0 0 0 .  1 . 0 0
10 0 . 0 0  0 . 0 0  0 0 . 0 
run;

proc sql;
  create table WANT as 
  select ID
       , case when min(NB_YES)=4 then 'Highly severe'
              when min(NB_YES)>1 then 'Severe'
              when min(NB_YES)=1 then 'Moderate'
              when min(NB_YES)=0 then 'None'
              else                    'Mild' 
  end as OUTCOME
  from (
    select ID
         , YEAR
         , sum(FD, CL, HS, HL) as NB_YES
    from HAVE 
    group by ID, YEAR
       )
  group by ID;
quit;

proc sort data=outcome;
proc print data=want;
var id;
run;

proc freq data=outcome;
tables nb_yes;
run;
PROC Star
Posts: 2,233

Re: Longitudinal data - programming

Really?

 

The table is called WANT in your code, not OUTCOME.

 

 

Super User
Posts: 22,875

Re: Longitudinal data - programming

PROC SORT has no BY statement either. 

 

Super User
Posts: 22,875

Re: Longitudinal data - programming

And the variable with the results is called OUTCOME not nb_yes, that was the original variable.
PROC Star
Posts: 2,233

Re: Longitudinal data - programming

I don't understand what the issue is.

PROC Star
Posts: 2,233

Re: Longitudinal data - programming

Also: Wide data is seldom a good idea. I created a long table.

Frequent Contributor
Posts: 83

Re: Longitudinal data - programming

Appear not working.

 

How can I run proc freq from proc SQL?

 

 

Also, can you tell me how can I random extract a data?

Example, I want to randomize extract and have a data of 50 subjects out from the original data of 2000+ subject.   

Super User
Posts: 22,875

Re: Longitudinal data - programming


PhanS wrote:

Appear not working.

What does that mean? Did SAS explode?

How can I run proc freq from proc SQL?

 


You can't and more importantly shouldn't.

 


 

Also, can you tell me how can I random extract a data?

Example, I want to randomize extract and have a data of 50 subjects out from the original data of 2000+ subject.   


 

PROC SURVEYSELECT will select Random Samples.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 187 views
  • 0 likes
  • 4 in conversation