Re: How to aggregate individual responses into cluster level - complic...

yqsmm · Posted 11-17-2016 01:54 PM

Hi,

I have a dataset as below. There are different number of responses nested within each cluster, each response came from a subject with same or different job levels, I would like to aggregate the information of V1 and V2 to the cluster level using the following criteria:

1. Use the values of V1 and V2 from the subject with the highest job level (1>2>3).

2. If there are more than one responses from the highest job level, the cluster level variable will be 1 if any response from the highest job level is 1.

3. If there are more than one responses from the highest job level and some of them are missing, the cluster level variable will use the response from the nonmissing values if it is 0.

4. If V1 or V2 from the highest job level are all missing, then move to the next job level.

data a;
input clusterid joblevel v1 v2;
datalines;
1 1 . 1
1 1 . 1
1 3 0 0
1 3 0 .
1 3 1 0
2 1 1 1
2 2 0 .
2 3 . 1
2 3 0 0
2 3 1 .
3 1 . 1
3 1 0 0
3 2 1 1
3 2 1 0
3 3 0 0
3 3 . .
3 3 0 1
;
run;

I am hoping to get a data like this:

clusterid V1 V2

1 1 1

2 1 1

3 0 1

Thanks A lot~

PeterClemmensen · Posted 11-17-2016 02:34 PM

Im not sure I understand your logic completely 🙂

You write

3. If there are more than one responses from the highest job level and some of them are missing, the cluster level variable will use the response from the nonmissing values if it is 0.

But isn't that exactly the case in the first case where clusterid = 1? then you want to look only at responses from joblevel 1, where you have missing values, so you want to use the response from the nonmissing if it is zero? but the nonmissing value there is not zero, it is 1? 🙂

The DATA to DATA Step Macro
Blog: SASnrd

yqsmm · Posted 11-17-2016 03:45 PM

Sorry it is confusing. I should delete 'if it is 0'. It should be:

3. If there are more than one responses from the highest job level and some of them are missing, the cluster level variable will use the response from the nonmissing values.

Cluster 3 is the scenario for the criteria. There are two responses from job level 1, The first record of joblevel 1 has missing on V1, but the second has value 0. So for cluster3 V1, we use the value 0.

Among Cluster 1, the two records from the highest joblevel (1) are all missing on V1, therefore we need to use records from joblevel 3 (next level).

Does this make sense?

ballardw · Posted 11-17-2016 04:15 PM

If often helps to show what the actual output for your example input data would be.

yqsmm · Posted 11-17-2016 05:33 PM

Yes. The data I want is at the end of the original post.

mkeintz · Posted 11-17-2016 04:30 PM

Question: Is the program supposed to tread V1 and V2 independently?

mk

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

yqsmm · Posted 11-17-2016 05:33 PM

Yes. So there will be two cluster level variable V1 and V2.

mkeintz · Posted 11-17-2016 08:58 PM

If appears that within each clusterid, you data is sorted by joblevel. So if the desired result for V1 is produced within a given job level, then V1 is finished, and subsequent job levels should not be used. Same for V2. The program below nests of loop over each joblevel within its clusterid. It

At the end of processing the clusterid, it copies result values back to the original vars, ready for output.
At the end of each joblevel, if the result is non-missing it declare that variable as finished.
For every record in every subsequent joblevel, if a variable is marked finished, it is not processed further
The CONTINUE statement tells the loop to skip directly to the next iteration, so that one var may be marked finished, but the other var is still processed.

data want (drop=result: finished: I);
  
  do until (last.clusterid);
    do until (last.joblevel);
      set a;
      by clusterid joblevel;

      array v{2} v1 v2; 
      array result{2};  
      array finished{2}; 

      do I=1 to 2;
        if finished{I}=1 then continue;
        if v{I}=1 then result{I}=1;
        else if result{I}=. then result{I}=v{I};
      end; 
    end;   

    do I=1 to 2;
      if not(missing(result{I})) then finished{I}=1;
    end;
  end;     
  do I=1 to 2;
    v{I}=result{I};
  end;
  put (clusterid joblevel v1 v2) (:);
run;

Regards,

Mark

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 11-18-2016 04:24 AM

Your question is kind of hard to understand.



data a;
input clusterid joblevel v1 v2;
datalines;
1 1 . 1
1 1 . 1
1 3 0 0
1 3 0 .
1 3 1 0
2 1 1 1
2 2 0 .
2 3 . 1
2 3 0 0
2 3 1 .
3 1 . 1
3 1 0 0
3 2 1 1
3 2 1 0
3 3 0 0
3 3 . .
3 3 0 1
;
run;
data want;
do until(last.joblevel);
 set a;
 by clusterid joblevel notsorted;
 if first.clusterid then first=1;
 new_v1=coalesce(new_v1,v1);
 new_v2=coalesce(new_v2,v2);
end;
 new_v1=coalesce(new_v1,1);
 new_v2=coalesce(new_v2,1);
if first then output;
drop first;
run;

How to aggregate individual responses into cluster level - complicate cases

Re: How to aggregate individual responses into cluster level - complicate cases

Re: How to aggregate individual responses into cluster level - complicate cases

Re: How to aggregate individual responses into cluster level - complicate cases

Re: How to aggregate individual responses into cluster level - complicate cases

Re: How to aggregate individual responses into cluster level - complicate cases

Re: How to aggregate individual responses into cluster level - complicate cases

Re: How to aggregate individual responses into cluster level - complicate cases

Re: How to aggregate individual responses into cluster level - complicate cases

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away