Solved: Longitudinal data - programming

PhanS · Posted 02-12-2018 10:07 PM

Hi everyone,

I have a mock data that collect in 3 years: 2002, 2003 2005.

Variables are: fd, cl, hs, hl, all coded to 1 (yes) or 0 (no); with sub_id.

data test;
input id $ fd cl hs hl 2002 2003 2005;
datalines; 
1 1 1 0 . 1 0 0 . 1 1 0 1 
2 1 0 1 1 0 0 1 1 1 0 1 1
3 1 . 0 0 1 . 0 0 1 1 . 0 
5 1 1 1 0 0 1 1 1 . 1 0 0
...
;
run;

I want to have outcomes as below:

a) Highly severe: In EACH surveyed year, if a subject responded "yes" to ALL fd, cl, hs, hl.

b) Severe: In EACH surveyed year, if a subject responded "yes" at least TWO-THREE to fd, cl, hs, hl.

c) Moderate: In EACH surveyed year, if a subject responded "yes" at least ONE to fd, cl, hs, hl.

d) Mild: In ANY surveyed year, if a subject responded "yes" at least ONE to fd, cl, hs, hl.

e) None: In EACH surveyed year, if a subject responded "no" to ALL fd, cl, hs, hl.

Also, the data step, should I do them in wide data or long data.

Much appreciate for your help.

Phan S.

ChrisNZ · Posted 02-12-2018 10:33 PM

Like this?

data HAVE; 
  input ID @; 
  do YEAR=2002 to 2004;
    input FD CL HS HL @;    
    output; 
  end;
cards; 
1 1 1 0 . 1 0 0 . 1 1 0 1 
2 1 0 1 1 0 0 1 1 1 0 1 1
3 0 . 0 0 0 . 0 0 0 0 . 0 
5 1 1 1 1 1 1 1 1 1 1 1 1
run;
proc sql;
  create table WANT as 
  select ID
       , case when min(NB_YES)=4 then 'Highly severe'
              when min(NB_YES)>1 then 'Severe'
              when min(NB_YES)=1 then 'Moderate'
              when min(NB_YES)=0 then 'None'
              else                    'Mild' end as OUTCOME
  from (
    select ID
         , YEAR
         , sum(FD, CL, HS, HL) as NB_YES
    from HAVE 
    group by ID, YEAR
       )
  group by ID;
quit;

ID OUTCOME
1 Moderate
2 Severe
3 None
5 Highly severe

High-Performance SAS Coding - Third Edition

View solution in original post

Reeza · Posted 02-12-2018 10:19 PM

I'm not getting a feel for your data. You have more variables listed than in your datelines and you can't name variables 2003 so that wouldn't work anyways.

Create at least two arrays, one for years, one for data. I'm assuming that if there is no survey then it's missing and not zero.

a) Highly severe: In EACH surveyed year, if a subject responded "yes" to ALL fd, cl, hs, hl. -> Add them all together and see if they add to the expected number, sum of years * number of variables you're checking.

b) Severe: In EACH surveyed year, if a subject responded "yes" at least TWO-THREE to fd, cl, hs, hl. -> add them all together and check if sum to higher than the minimum required number, 2*nYears

That should give you an idea of how these can be calculated.

@PhanS wrote:

Hi everyone,

I have a mock data that collect in 3 years: 2002, 2003 2005.

Variables are: fd, cl, hs, hl, all coded to 1 (yes) or 0 (no); with sub_id.
data test;
input id $ fd cl hs hl 2002 2003 2005;
datalines; 
1 1 1 0 . 1 0 0 . 1 1 0 1 
2 1 0 1 1 0 0 1 1 1 0 1 1
3 1 . 0 0 1 . 0 0 1 1 . 0 
5 1 1 1 0 0 1 1 1 . 1 0 0
...
;
run;
I want to have outcomes as below:

a) Highly severe: In EACH surveyed year, if a subject responded "yes" to ALL fd, cl, hs, hl.

b) Severe: In EACH surveyed year, if a subject responded "yes" at least TWO-THREE to fd, cl, hs, hl.

c) Moderate: In EACH surveyed year, if a subject responded "yes" at least ONE to fd, cl, hs, hl.

d) Mild: In ANY surveyed year, if a subject responded "yes" at least ONE to fd, cl, hs, hl.

e) None: In EACH surveyed year, if a subject responded "no" to ALL fd, cl, hs, hl.

Also, the data step, should I do them in wide data or long data.

Much appreciate for your help.

Phan S.

ChrisNZ · Posted 02-12-2018 10:33 PM

Like this?

data HAVE; 
  input ID @; 
  do YEAR=2002 to 2004;
    input FD CL HS HL @;    
    output; 
  end;
cards; 
1 1 1 0 . 1 0 0 . 1 1 0 1 
2 1 0 1 1 0 0 1 1 1 0 1 1
3 0 . 0 0 0 . 0 0 0 0 . 0 
5 1 1 1 1 1 1 1 1 1 1 1 1
run;
proc sql;
  create table WANT as 
  select ID
       , case when min(NB_YES)=4 then 'Highly severe'
              when min(NB_YES)>1 then 'Severe'
              when min(NB_YES)=1 then 'Moderate'
              when min(NB_YES)=0 then 'None'
              else                    'Mild' end as OUTCOME
  from (
    select ID
         , YEAR
         , sum(FD, CL, HS, HL) as NB_YES
    from HAVE 
    group by ID, YEAR
       )
  group by ID;
quit;

ID OUTCOME
1 Moderate
2 Severe
3 None
5 Highly severe

High-Performance SAS Coding - Third Edition

PhanS · Posted 02-14-2018 07:02 PM

Hello ChrisNZ,

Sorry for my delay. I accept your code as the solution. There is an issue, the outcomes came out only four categories. I played around but still give me only four.

You are welcome to test. BTW, I thank you for you time.

data HAVE; 
  input ID @; 
  do YEAR=2002 to 2004;
    input FD CL HS HL @;    
    output; 
  end;
cards; 
1 1 1 1 1  1 1 1 1  1 1 1 1
2 1 1 1 .  1 1 0 .  1 1 0 1 
3 1 0 0 0  0 0 1 1  1 0 . 1
4 1 0 0 .  0 0 0 .  1 . 0 0
5 0 . 0 0  0 . 0 0  0 0 . 0 
6 1 1 1 1  1 1 1 1  1 1 1 1
7 1 1 1 .  1 1 0 .  1 1 0 1 
8 1 0 0 0  0 0 1 1  1 0 . 1
9 1 0 0 .  0 0 0 .  1 . 0 0
10 0 . 0 0  0 . 0 0  0 0 . 0 
run;
proc sql;
  create table WANT as 
  select ID
       , case when min(NB_YES)=4 then 'Highly severe'
              when 2=min(NB_YES)<4 then 'Severe'
              when min(NB_YES)=1 then 'Moderate'
              when min(NB_YES)=0 then 'None'
              else                    'Mild' end as OUTCOME_2
  from (
    select ID
         , YEAR
         , sum(FD, CL, HS, HL) as NB_YES
    from HAVE 
    group by ID, YEAR
       )
  group by ID;
quit;

ChrisNZ · Posted 02-14-2018 07:46 PM

2=min(NB_YES)<4

is always true as 0 or 1 are < 4

Stick to the original syntax.

High-Performance SAS Coding - Third Edition

PhanS · Posted 02-17-2018 12:54 PM

Dear Chris,

my apology to disturb you. I re-run the code, but I am not sure why "outcome" does exist in work.outcome.data.

I am looking for your advice. Thank you.

Phan S.

Reeza · Posted 02-17-2018 04:26 PM

Code and log because we can't see your computer.

PhanS · Posted 02-18-2018 10:04 PM

9653  data HAVE;
9654    input ID @;
9655    do YEAR=2002 to 2004;
9656      input FD CL HS HL @;
9657      output;
9658    end;
9659  cards;

NOTE: The data set WORK.HAVE has 30 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.09 seconds
      cpu time            0.00 seconds


9670  run;
9671
9672  proc sql;
9673    create table WANT as
9674    select ID
9675         , case when min(NB_YES)=4 then 'Highly severe'
9676                when min(NB_YES)>1 then 'Severe'
9677                when min(NB_YES)=1 then 'Moderate'
9678                when min(NB_YES)=0 then 'None'
9679                else                    'Mild'
9680    end as OUTCOME
9681    from (
9682      select ID
9683           , YEAR
9684           , sum(FD, CL, HS, HL) as NB_YES
9685      from HAVE
9686      group by ID, YEAR
9687         )
9688    group by ID;
NOTE: A GROUP BY clause has been discarded because neither the SELECT clause nor the
      optional HAVING clause of the associated table-expression referenced a summary
      function.
NOTE: Table WORK.WANT created, with 10 rows and 2 columns.

9689  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.28 seconds
      cpu time            0.04 seconds


9690  proc freq data=outcome;
ERROR: File WORK.OUTCOME.DATA does not exist.
9691  tables nb_yes;
ERROR: No data set open to look up variables.
9692  run;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

Hello, Reeza: Please kindly see the log and code below.

data HAVE; 
  input ID @; 
  do YEAR=2002 to 2004;
    input FD CL HS HL @;    
    output; 
  end;
cards; 
1 1 1 1 1  1 1 1 1  1 1 1 1
2 1 1 1 .  1 1 0 .  1 1 0 1 
3 1 0 0 0  0 0 1 1  1 0 . 1
4 1 0 0 .  0 0 0 .  1 . 0 0
5 0 . 0 0  0 . 0 0  0 0 . 0 
6 1 1 1 1  1 1 1 1  1 1 1 1
7 1 1 1 .  1 1 0 .  1 1 0 1 
8 1 0 0 0  0 0 1 1  1 0 . 1
9 1 0 0 .  0 0 0 .  1 . 0 0
10 0 . 0 0  0 . 0 0  0 0 . 0 
run;

proc sql;
  create table WANT as 
  select ID
       , case when min(NB_YES)=4 then 'Highly severe'
              when min(NB_YES)>1 then 'Severe'
              when min(NB_YES)=1 then 'Moderate'
              when min(NB_YES)=0 then 'None'
              else                    'Mild' 
  end as OUTCOME
  from (
    select ID
         , YEAR
         , sum(FD, CL, HS, HL) as NB_YES
    from HAVE 
    group by ID, YEAR
       )
  group by ID;
quit;

proc sort data=outcome;
proc print data=want;
var id;
run;

proc freq data=outcome;
tables nb_yes;
run;

ChrisNZ · Posted 02-18-2018 10:11 PM

Really?

The table is called WANT in your code, not OUTCOME.

High-Performance SAS Coding - Third Edition

Reeza · Posted 02-18-2018 10:36 PM

PROC SORT has no BY statement either.

Reeza · Posted 02-18-2018 10:38 PM

And the variable with the results is called OUTCOME not nb_yes, that was the original variable.

ChrisNZ · Posted 02-18-2018 09:44 PM

I don't understand what the issue is.

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 02-12-2018 10:41 PM

Also: Wide data is seldom a good idea. I created a long table.

High-Performance SAS Coding - Third Edition

PhanS · Posted 02-13-2018 09:20 AM

Appear not working.

How can I run proc freq from proc SQL?

Also, can you tell me how can I random extract a data?

Example, I want to randomize extract and have a data of 50 subjects out from the original data of 2000+ subject.

Reeza · Posted 02-13-2018 10:46 AM

@PhanS wrote:

Appear not working.

What does that mean? Did SAS explode?

How can I run proc freq from proc SQL?

You can't and more importantly shouldn't.

Also, can you tell me how can I random extract a data?

Example, I want to randomize extract and have a data of 50 subjects out from the original data of 2000+ subject.

PROC SURVEYSELECT will select Random Samples.

Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Re: Longitudinal data - programming

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away