BookmarkSubscribeRSS Feed
ChrilleB
Calcite | Level 5
Hi!

I really need help with some SAS coding (and some logic thinking perhaps?) 🙂

I have a dataset which contains some variables, this one is named followUp_complete. From this dataset I extracted the variable ID to create a new dataset, this one is named bipolar.nr_followUpTemp.

If I group my dataset (bipolar.nr_followUpTemp) after ID, the groups are at different sizes (going from 1 to 5 observations per group). My next step is to make the groups the same size, so that only the two first observations recorded are saved, and discard the other observations (i.e. if there are obs. no. 3 and/or 4 and/or 5).

My thoughts was to create a loop that made a count for each observation within the group. The loop starts over again when a new group comes in the dataset. I would discard observations with a value above 2 - the only problem is that I cant create this loop on my own. All I get is crap 🙂


This is what I'm working with:

*Counts the number of observation each group has;
PROC SQL;
CREATE TABLE bipolar.nr_followUpTemp as
SELECT id, count(ID) AS nrFollowUp
FROM bipolar.followUp_complete
GROUP BY ID;
RUN;

*Sorts the complete dataset which contains stuff which I'd like to include in my final dataset;
PROC SORT DATA = bipolar.followUp_complete;
BY id;
RUN;

*Sort the number of observations in each group by ID;
PROC SORT DATA = bipolar.nr_followUpTemp;
BY id;
RUN;

/*Mergin the two datasets and also create a loop... The loop clearly doesnt work since it makes a count for each observation.
*/
DATA bipolar.test;
MERGE bipolar.followUp_complete bipolar.nr_followUpTemp;
BY id;
DO counter = 1 TO nrFollowUp;
OUTPUT;
END;
RUN;

Is it possible to adjust the loop somehow so it only does a run for each group, and not each observation?

Thank you for your time!

/Christian.
3 REPLIES 3
RickM
Fluorite | Level 6
You most likely want to look into using "first."
Patrick
Opal | Level 21
Your merge statement combines all records from bipolar.followUp_complete with bipolar.nr_followUpTemp. This is a many to one merge and the result set contains still all records of bipolar.followUp_complete.

Now you add a loop from 1 to the number of observations in this ID (nrFollowUp) and you write every single observation several times (nrFollowUp) to the output dataset. May be not quite what you had in mind...

As Rick proposes: You might want to use "first".

Something like the following (untested) code might do:

PROC SORT DATA = bipolar.followUp_complete;
BY id;
RUN;

data bipolar.test;
set bipolar.followUp_complete;
by id;
retain counter;
if first.id then counter=0; /* reset counter if a new by group starts */
counter+1; /* increment counter */
/* write obs to output dataset if first or second obs in by group */
if counter LE 2 then output;
run;

HTH
Patrick LE instead of "bracket equal" as this site has some issues with this...


Message was edited by: Patrick
ChrilleB
Calcite | Level 5
Thank you! It works like a charm 🙂
Much appreciated!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 4985 views
  • 0 likes
  • 3 in conversation