DATA Step, Macro, Functions and more

How to aggregate individual responses into cluster level - complicate cases

Reply
New Contributor
Posts: 4

How to aggregate individual responses into cluster level - complicate cases

Hi,

 

I have a dataset as below. There are different number of responses nested within each cluster, each response came from a subject with same or different job levels,  I would like to aggregate the information of V1 and V2 to the cluster level using the following criteria:

1. Use the values of V1 and V2 from the subject with the highest job level (1>2>3).

2. If there are more than one responses from the highest job level, the cluster level variable will be 1 if any response from the highest job level is 1.

3. If there are more than one responses from the highest job level and some of them are missing, the cluster level variable will use the response from the nonmissing values if it is 0.

4. If V1 or V2 from the highest job level are all missing, then move to the next job level.  


data a;
input clusterid joblevel v1 v2;
datalines;
1 1 . 1
1 1 . 1
1 3 0 0
1 3 0 .
1 3 1 0
2 1 1 1
2 2 0 .
2 3 . 1
2 3 0 0
2 3 1 .
3 1 . 1
3 1 0 0
3 2 1 1
3 2 1 0
3 3 0 0
3 3 . .
3 3 0 1
;
run;

 

I am hoping to get a data like this:

clusterid V1 V2

1 1 1

2 1 1

3 0 1

 

Thanks A lot~

PROC Star
Posts: 552

Re: How to aggregate individual responses into cluster level - complicate cases

Im not sure I understand your logic completely Smiley Happy

 

You write

 

3. If there are more than one responses from the highest job level and some of them are missing, the cluster level variable will use the response from the nonmissing values if it is 0.

 

But isn't that exactly the case in the first case where clusterid = 1? then you want to look only at responses from joblevel 1, where you have missing values, so you want to use the response from the nonmissing if it is zero? but the nonmissing value there is not zero, it is 1? Smiley Happy

 

New Contributor
Posts: 4

Re: How to aggregate individual responses into cluster level - complicate cases

Sorry it is confusing.  I should delete 'if it is 0'. It should be:

3. If there are more than one responses from the highest job level and some of them are missing, the cluster level variable will use the response from the nonmissing values. 

 

Cluster 3 is the scenario for the criteria. There are two responses from job level 1, The first record of joblevel 1 has missing on V1, but the second has value 0. So for cluster3 V1, we use the value 0.

 

Among Cluster 1, the two records from the highest joblevel (1) are all missing on V1, therefore we need to use records from joblevel 3 (next level).   

 

Does this make sense? 

Super User
Posts: 10,500

Re: How to aggregate individual responses into cluster level - complicate cases

If often helps to show what the actual output for your example input data would be.

New Contributor
Posts: 4

Re: How to aggregate individual responses into cluster level - complicate cases

Yes. The data I want is at the end of the original post.
Valued Guide
Posts: 797

Re: How to aggregate individual responses into cluster level - complicate cases

Question:  Is the program supposed to tread V1 and V2 independently?

 

mk

New Contributor
Posts: 4

Re: How to aggregate individual responses into cluster level - complicate cases

Yes. So there will be two cluster level variable V1 and V2.
Valued Guide
Posts: 797

Re: How to aggregate individual responses into cluster level - complicate cases

If appears that within each clusterid, you data is sorted by joblevel.  So if the desired result for V1 is produced within a given job level, then V1 is finished, and subsequent job levels should not be used.  Same for V2.  The program below nests of loop over each joblevel within its clusterid.  It

  1. At the end of processing the clusterid, it copies result values back to the original vars, ready for output.
  2. At the end of each joblevel, if the result is non-missing it declare that variable as finished.
  3. For every record in every subsequent joblevel, if a variable is marked finished, it is not processed further
  4. The CONTINUE statement tells the loop to skip directly to the next iteration, so that one var may be marked finished, but the other var is still processed. 
data want (drop=result: finished: I);
  
  do until (last.clusterid);
    do until (last.joblevel);
      set a;
      by clusterid joblevel;

      array v{2} v1 v2; 
      array result{2};  
      array finished{2}; 

      do I=1 to 2;
        if finished{I}=1 then continue;
        if v{I}=1 then result{I}=1;
        else if result{I}=. then result{I}=v{I};
      end; 
    end;   

    do I=1 to 2;
      if not(missing(result{I})) then finished{I}=1;
    end;
  end;     
  do I=1 to 2;
    v{I}=result{I};
  end;
  put (clusterid joblevel v1 v2) (:);
run;

 

Regards,

Mark

Super User
Posts: 9,681

Re: How to aggregate individual responses into cluster level - complicate cases

Your question is kind of hard to understand.



data a;
input clusterid joblevel v1 v2;
datalines;
1 1 . 1
1 1 . 1
1 3 0 0
1 3 0 .
1 3 1 0
2 1 1 1
2 2 0 .
2 3 . 1
2 3 0 0
2 3 1 .
3 1 . 1
3 1 0 0
3 2 1 1
3 2 1 0
3 3 0 0
3 3 . .
3 3 0 1
;
run;
data want;
do until(last.joblevel);
 set a;
 by clusterid joblevel notsorted;
 if first.clusterid then first=1;
 new_v1=coalesce(new_v1,v1);
 new_v2=coalesce(new_v2,v2);
end;
 new_v1=coalesce(new_v1,1);
 new_v2=coalesce(new_v2,1);
if first then output;
drop first;
run;

Ask a Question
Discussion stats
  • 8 replies
  • 239 views
  • 0 likes
  • 5 in conversation