BookmarkSubscribeRSS Feed
CathyVI
Pyrite | Level 9

Hello all, 

I have initially posted this inappropriately, am sorry. I am creating a better representation of the issue so I can get help on this. 

I have a dataset with category (teachers, students) and a list of forms (form2, event46, case32) which they are expected to fill within 60days after the school resume. Event46 has a group variable (grp=A, B, C). That is someone can be in any of the group. Among the teachers and students, some did not fill the form (0), some filled the form within 30 days (1), and some filled the form within 60 days (2). 

The 0,1,2 represents the days that the form was filled below:

No form filled=0

Form filled within 0-30days=1

Form filled within 31-60days=2

 

This is what I want

 

Table 1: 

1. If a teacher or student only filled form2 within 30days and did not fill event46 or case32 they should be in the form2 only column. Same apply for other forms.

If they filled the form after 30 days, they will be in the 31-60 days count by category. 

2. If anyone filled more than one(1) forms either within 30 days or within 60 days or both, they should be in the multiple column.

 

Table 2:

3. Finally, I want a total of those who did not fill any form in both category(teacher or student).

 

I have created a sample output and a demo dataset. Thanks so much for your help.

data have;
input category $ form2 case32 event46 grp $;
datalines;
Teacher 1 0 0 .
Student 1 0 0 .
Teacher 0 1 0 .
Teacher 1 0 1 A
Teacher 0 0 1 C 
Teacher 0 0 1 B
Teacher 1 0 0 .
Student 0 1 1 C
Student 0 0 1 C
Student 0 0 1 C
Student 1 0 0 .
Student 0 1 0 .
Student 0 2 0 .
Teacher 1 1 1 B
Teacher 0 0 2 A 
Teacher 0 0 2 B
Teacher 2 0 0 .
Student 0 1 1 C
Teacher 0 0 0 .
Student 0 0 0 .
Student 0 0 1 B 
Teacher 0 0 1 B 
Teacher 0 2 0 .
Teacher 2 2 0 .
Student 0 1 0 .
Teacher 2 0 1 A
Student 0 0 1 A 
Teacher 0 0 0 .
Student 2 0 0 .
Teacher 0 0 2 C
Student 0 0 2 C
Student 0 0 2 B
Student 0 0 2 A
;
run;

The expected output should look like this:

 

Form2

only

Case32 only

Event46 A

only

Event46 B

only

Event46 C

Only

Multiple

Total

Teacher fill form within 0-30 days

 

 

 

 

 

 

 

Teacher fill form within 31-60 days

 

 

 

 

 

 

 

Student fill form within 0-30days

 

 

 

 

 

 

 

Student fill form within 31-60days

 

 

 

 

 

 

 

Total

 

 

 

 

 

 

 

 

 

 Total

Teacher who did not complete any form within 0-60 days

 

Student who did not complete any form within 0-60 days

 

Total

 

 

8 REPLIES 8
Patrick
Opal | Level 21

One way to go is to reshape your data so it's easy to use with Proc's like Tabulate or Report.

Below code is fully working with the sample data you shared.

data prep;
  set have;
  array vars{*} form2 case32 event46;
  length form_name $32;
  length report_form_name $32;

  do _i=1 to dim(vars);
    form_cnt=sum(form_cnt, vars(_i)>0);
    max_day_grp=max(max_day_grp,vars(_i));
  end;

  if form_cnt=0 then 
    do;
      max_day_grp=99;
      output;
    end;

  else
  if form_cnt>0 then
    do;
      do form_id=1 to dim(vars);
        if vars(form_id)>0 then
          do;
            form_name=propcase(vname(vars(form_id)));
            day_grp=vars(form_id);
            if form_cnt=1 then report_form_id=form_id;
            else report_form_id=99;

            if form_cnt=1 then 
              do;
                report_form_name=catx('_',form_name,grp);
                report_grp=grp;
              end;
            else 
              do;
                report_form_name='Multiple';
                report_grp=' ';
              end;
            output;
          end;
      end;
  end;

  drop form2 case32 event46 _i;
run;

proc format;
  value form_type
    1 ='Form2'
    2 ='Case32'
    3 ='Event46'
    99='Multiple'
    ;
  value day_grp
    1 ='0-30 days'
    2 ='31-60 days'
    99='Did not complete any form within 0-60 days'
    ;
  value form_cnt
    1     ='Single'
    2-high='Multi-Form'
    ;
run;

%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';

title 'Table 1: Option 1';
proc tabulate data=prep(where=(form_cnt>0)) missing;
  class category report_grp report_form_id form_cnt max_day_grp;
  format max_day_grp day_grp. report_form_id form_type. form_cnt form_cnt.;
  table
    category=' '*max_day_grp=' ' all='Total',
    (form_cnt=' '*report_grp=' '*report_form_id=' ' all='Total')*n=' '
    ;
run;
title;

title 'Table 1: Option 2';
proc tabulate data=prep(where=(form_cnt>0)) missing;
  class category report_form_name form_cnt max_day_grp;
  format max_day_grp day_grp. report_form_id form_type. form_cnt form_cnt.;
  table
    category=' '*max_day_grp=' ' all='Total',
    (form_cnt=' '*report_form_name=' ' all='Total')*n=' '
    ;
run;
title;

title 'Table 2';
proc tabulate data=prep(where=(form_cnt=0)) missing;
  class category max_day_grp;
  format max_day_grp day_grp.;
  table
    category=' '*max_day_grp=' ' all='Total',
    all='Total'*n=' '
    ;
run;
title;

options &sv_missing;

Patrick_0-1714353759018.png

 

 

CathyVI
Pyrite | Level 9

@Patrick  Why has the observation increases from 33 to 40?

Patrick
Opal | Level 21

@CathyVI wrote:

@Patrick  Why has the observation increases from 33 to 40?


That's because the data gets transformed from a wide to a long structure with one row per form - and a bit special a "placeholder" row for individuals that didn't provide any form.

Patrick
Opal | Level 21

To just transform the data from wide to long (=1 row per form whether filled out or not) without special treatment for cases with no form provided, you could also consider below approach. The main difference to the previous version is the addition of a form_ind variable (0 or 1) that then gets used in Proc Tabulate for SUM instead of N (row count).

data long;
  set have;
  array vars{*} form2 case32 event46;
  length form_name $32;
  length report_form_name $32;

  do _i=1 to dim(vars);
    form_cnt=sum(0,form_cnt, vars(_i)>0);
    max_day_grp=max(max_day_grp,vars(_i));
  end;

  if form_cnt=0 then max_day_grp=99;

  do form_id=1 to dim(vars);
    form_ind  =vars(form_id)>0;
    form_name =propcase(vname(vars(form_id)));
    if form_cnt=0 then report_form_name='No form';
    else if form_cnt>1 then report_form_name='Multiple Forms';
    else if form_name='Event46' then report_form_name=catx('_',form_name,grp);
    else report_form_name=form_name;
    day_grp=vars(form_id);
    output;
  end;

  drop form2 case32 event46 _i;
run;

proc format;
  value form_type
    1 ='Form2'
    2 ='Case32'
    3 ='Event46'
    99='Multiple'
    ;
  value day_grp
    0 ='No form'
    1 ='0-30 days'
    2 ='31-60 days'
    99='Did not complete any form within 0-60 days'
    ;
  value form_cnt
    1     ='Single'
    2-high='Multi-Form'
    ;
run;

%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';

title 'Table 1';
proc tabulate data=long(where=(form_ind=1));
  class category report_form_name form_cnt max_day_grp;
  var form_ind;
  format form_cnt form_cnt. max_day_grp day_grp.;
  table
    category=' '*max_day_grp=' ' all='Total',
    (report_form_name=' ' all='Total')*form_ind=' '*sum=' '*f=best32.0
    ;
run;
title;

title 'Table 2';
proc tabulate data=long(where=(form_cnt=0 and form_id=1));
  class category max_day_grp;
  format max_day_grp day_grp.;
  table
    category=' '*max_day_grp=' ' all='Total',
    all='Total'*n=' '
    ;
run;
title;
options &sv_missing;

 

CathyVI
Pyrite | Level 9

@Patrick Can does blank spaces in table 1 be converted to 0. So I can as well have the count of missing in table 1 although I will actually be counting it in table 2. Thanks

Patrick
Opal | Level 21

To display missings as zero just use:

options missing='0';

 

CathyVI
Pyrite | Level 9

@Patrick  How can anyone with a value of 1 be counted as a single case. For example row 33 in our data. When form2=2,case32=0 and event46=1, the report_form_name should be Event46A instead of multiple. 

 

Anyone with 2 double digits e.g., 110, 220 should remain as multiple.

 

Patrick
Opal | Level 21

@CathyVI wrote:

@Patrick  How can anyone with a value of 1 be counted as a single case. For example row 33 in our data. When form2=2,case32=0 and event46=1, the report_form_name should be Event46A instead of multiple. 

 

Anyone with 2 double digits e.g., 110, 220 should remain as multiple.

 


So now it's about counting cases and no more forms which totally changes the story!

How would you want to report cases 112, 121, 211, 122, 212, 221? Please be specific and define all your cases. Provide both the variable they should appear under AND the time slot if there are multiple.

 

It would help if you could provide below table fully populated. With this data one can create formats which then should make the rest of the coding rather simple.

Multiple patterns can of course belong to the same case (case_no). It would then also help if you provide sample data for all the patterns.

Obs

pattern case_no time_slot description
1 000      
2 001      
3 002      
4 010      
5 011      
6 012      
7 020      
8 021      
9 022      
10 100      
11 101      
12 102      
13 110      
14 111      
15 112      
16 120      
17 121      
18 122      
19 200      
20 201      
21 202      
22 210      
23 211      
24 212      
25 220      
26 221      
27 222      

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 374 views
  • 2 likes
  • 2 in conversation