turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- How to aggregate individual responses into cluster...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-17-2016 01:54 PM

Hi,

I have a dataset as below. There are different number of responses nested within each cluster, each response came from a subject with same or different job levels, I would like to aggregate the information of V1 and V2 to the cluster level using the following criteria:

1. Use the values of V1 and V2 from the subject with the highest job level (1>2>3).

2. If there are more than one responses from the highest job level, the cluster level variable will be 1 if any response from the highest job level is 1.

3. If there are more than one responses from the highest job level and some of them are missing, the cluster level variable will use the response from the nonmissing values if it is 0.

4. If V1 or V2 from the highest job level are all missing, then move to the next job level.

data a;

input clusterid joblevel v1 v2;

datalines;

1 1 . 1

1 1 . 1

1 3 0 0

1 3 0 .

1 3 1 0

2 1 1 1

2 2 0 .

2 3 . 1

2 3 0 0

2 3 1 .

3 1 . 1

3 1 0 0

3 2 1 1

3 2 1 0

3 3 0 0

3 3 . .

3 3 0 1

;

run;

I am hoping to get a data like this:

clusterid V1 V2

1 1 1

2 1 1

3 0 1

Thanks A lot~

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-17-2016 02:34 PM

Im not sure I understand your logic completely

You write

3. If there are more than one responses from the highest job level and some of them are missing, the cluster level variable will use the response from the nonmissing values if it is 0.

But isn't that exactly the case in the first case where clusterid = 1? then you want to look only at responses from joblevel 1, where you have missing values, so you want to use the response from the nonmissing if it is zero? but the nonmissing value there is not zero, it is 1?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-17-2016 03:45 PM

Sorry it is confusing. I should delete 'if it is 0'. It should be:

3. If there are more than one responses from the highest job level and some of them are missing, the cluster level variable will use the response from the nonmissing values.

Cluster 3 is the scenario for the criteria. There are two responses from job level 1, The first record of joblevel 1 has missing on V1, but the second has value 0. So for cluster3 V1, we use the value 0.

Among Cluster 1, the two records from the highest joblevel (1) are all missing on V1, therefore we need to use records from joblevel 3 (next level).

Does this make sense?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-17-2016 04:15 PM

If often helps to show what the actual output for your example input data would be.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-17-2016 05:33 PM

Yes. The data I want is at the end of the original post.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-17-2016 04:30 PM

Question: Is the program supposed to tread V1 and V2 independently?

mk

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-17-2016 05:33 PM

Yes. So there will be two cluster level variable V1 and V2.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-17-2016 08:58 PM

If appears that within each clusterid, you data is sorted by joblevel. So if the desired result for V1 is produced within a given job level, then V1 is finished, and subsequent job levels should not be used. Same for V2. The program below nests of loop over each joblevel within its clusterid. It

- At the end of processing the clusterid, it copies result values back to the original vars, ready for output.
- At the end of each joblevel, if the result is non-missing it declare that variable as finished.
- For every record in every subsequent joblevel, if a variable is marked finished, it is not processed further
- The CONTINUE statement tells the loop to skip directly to the next iteration, so that one var may be marked finished, but the other var is still processed.

```
data want (drop=result: finished: I);
do until (last.clusterid);
do until (last.joblevel);
set a;
by clusterid joblevel;
array v{2} v1 v2;
array result{2};
array finished{2};
do I=1 to 2;
if finished{I}=1 then continue;
if v{I}=1 then result{I}=1;
else if result{I}=. then result{I}=v{I};
end;
end;
do I=1 to 2;
if not(missing(result{I})) then finished{I}=1;
end;
end;
do I=1 to 2;
v{I}=result{I};
end;
put (clusterid joblevel v1 v2) (:);
run;
```

Regards,

Mark

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-18-2016 04:24 AM

Your question is kind of hard to understand. data a; input clusterid joblevel v1 v2; datalines; 1 1 . 1 1 1 . 1 1 3 0 0 1 3 0 . 1 3 1 0 2 1 1 1 2 2 0 . 2 3 . 1 2 3 0 0 2 3 1 . 3 1 . 1 3 1 0 0 3 2 1 1 3 2 1 0 3 3 0 0 3 3 . . 3 3 0 1 ; run; data want; do until(last.joblevel); set a; by clusterid joblevel notsorted; if first.clusterid then first=1; new_v1=coalesce(new_v1,v1); new_v2=coalesce(new_v2,v2); end; new_v1=coalesce(new_v1,1); new_v2=coalesce(new_v2,1); if first then output; drop first; run;