BookmarkSubscribeRSS Feed
ChrilleB
Calcite | Level 5
Hi!

I really need help with some SAS coding (and some logic thinking perhaps?) 🙂

I have a dataset which contains some variables, this one is named followUp_complete. From this dataset I extracted the variable ID to create a new dataset, this one is named bipolar.nr_followUpTemp.

If I group my dataset (bipolar.nr_followUpTemp) after ID, the groups are at different sizes (going from 1 to 5 observations per group). My next step is to make the groups the same size, so that only the two first observations recorded are saved, and discard the other observations (i.e. if there are obs. no. 3 and/or 4 and/or 5).

My thoughts was to create a loop that made a count for each observation within the group. The loop starts over again when a new group comes in the dataset. I would discard observations with a value above 2 - the only problem is that I cant create this loop on my own. All I get is crap 🙂


This is what I'm working with:

*Counts the number of observation each group has;
PROC SQL;
CREATE TABLE bipolar.nr_followUpTemp as
SELECT id, count(ID) AS nrFollowUp
FROM bipolar.followUp_complete
GROUP BY ID;
RUN;

*Sorts the complete dataset which contains stuff which I'd like to include in my final dataset;
PROC SORT DATA = bipolar.followUp_complete;
BY id;
RUN;

*Sort the number of observations in each group by ID;
PROC SORT DATA = bipolar.nr_followUpTemp;
BY id;
RUN;

/*Mergin the two datasets and also create a loop... The loop clearly doesnt work since it makes a count for each observation.
*/
DATA bipolar.test;
MERGE bipolar.followUp_complete bipolar.nr_followUpTemp;
BY id;
DO counter = 1 TO nrFollowUp;
OUTPUT;
END;
RUN;

Is it possible to adjust the loop somehow so it only does a run for each group, and not each observation?

Thank you for your time!

/Christian.
3 REPLIES 3
RickM
Fluorite | Level 6
You most likely want to look into using "first."
Patrick
Opal | Level 21
Your merge statement combines all records from bipolar.followUp_complete with bipolar.nr_followUpTemp. This is a many to one merge and the result set contains still all records of bipolar.followUp_complete.

Now you add a loop from 1 to the number of observations in this ID (nrFollowUp) and you write every single observation several times (nrFollowUp) to the output dataset. May be not quite what you had in mind...

As Rick proposes: You might want to use "first".

Something like the following (untested) code might do:

PROC SORT DATA = bipolar.followUp_complete;
BY id;
RUN;

data bipolar.test;
set bipolar.followUp_complete;
by id;
retain counter;
if first.id then counter=0; /* reset counter if a new by group starts */
counter+1; /* increment counter */
/* write obs to output dataset if first or second obs in by group */
if counter LE 2 then output;
run;

HTH
Patrick LE instead of "bracket equal" as this site has some issues with this...


Message was edited by: Patrick
ChrilleB
Calcite | Level 5
Thank you! It works like a charm 🙂
Much appreciated!

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 5322 views
  • 0 likes
  • 3 in conversation