Do-loops that loop over groups, and not observations?

ChrilleB · Posted 04-13-2010 09:32 AM

Hi!

I really need help with some SAS coding (and some logic thinking perhaps?) 🙂

I have a dataset which contains some variables, this one is named followUp_complete. From this dataset I extracted the variable ID to create a new dataset, this one is named bipolar.nr_followUpTemp.

If I group my dataset (bipolar.nr_followUpTemp) after ID, the groups are at different sizes (going from 1 to 5 observations per group). My next step is to make the groups the same size, so that only the two first observations recorded are saved, and discard the other observations (i.e. if there are obs. no. 3 and/or 4 and/or 5).

My thoughts was to create a loop that made a count for each observation within the group. The loop starts over again when a new group comes in the dataset. I would discard observations with a value above 2 - the only problem is that I cant create this loop on my own. All I get is crap 🙂

This is what I'm working with:

*Counts the number of observation each group has;
PROC SQL;
CREATE TABLE bipolar.nr_followUpTemp as
SELECT id, count(ID) AS nrFollowUp
FROM bipolar.followUp_complete
GROUP BY ID;
RUN;

*Sorts the complete dataset which contains stuff which I'd like to include in my final dataset;
PROC SORT DATA = bipolar.followUp_complete;
BY id;
RUN;

*Sort the number of observations in each group by ID;
PROC SORT DATA = bipolar.nr_followUpTemp;
BY id;
RUN;

/*Mergin the two datasets and also create a loop... The loop clearly doesnt work since it makes a count for each observation.
*/
DATA bipolar.test;
MERGE bipolar.followUp_complete bipolar.nr_followUpTemp;
BY id;
DO counter = 1 TO nrFollowUp;
OUTPUT;
END;
RUN;

Is it possible to adjust the loop somehow so it only does a run for each group, and not each observation?

Thank you for your time!

/Christian.

RickM · Posted 04-13-2010 10:14 AM

You most likely want to look into using "first."

Patrick · Posted 04-13-2010 10:42 AM

Your merge statement combines all records from bipolar.followUp_complete with bipolar.nr_followUpTemp. This is a many to one merge and the result set contains still all records of bipolar.followUp_complete.

Now you add a loop from 1 to the number of observations in this ID (nrFollowUp) and you write every single observation several times (nrFollowUp) to the output dataset. May be not quite what you had in mind...

As Rick proposes: You might want to use "first".

Something like the following (untested) code might do:

PROC SORT DATA = bipolar.followUp_complete;
BY id;
RUN;

data bipolar.test;
set bipolar.followUp_complete;
by id;
retain counter;
if first.id then counter=0; /* reset counter if a new by group starts */
counter+1; /* increment counter */
/* write obs to output dataset if first or second obs in by group */
if counter LE 2 then output;
run;

HTH
Patrick LE instead of "bracket equal" as this site has some issues with this...

Message was edited by: Patrick

ChrilleB · Posted 04-14-2010 05:02 AM

Thank you! It works like a charm 🙂
Much appreciated!

Do-loops that loop over groups, and not observations?

Re: Do-loops that loop over groups, and not observations?

Re: Do-loops that loop over groups, and not observations?

Re: Do-loops that loop over groups, and not observations?

Catch up on SAS Innovate 2026

Do-loops that loop over groups, and not observations?

Re: Do-loops that loop over groups, and not observations?

Re: Do-loops that loop over groups, and not observations?

Re: Do-loops that loop over groups, and not observations?

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away