- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 04-13-2010 09:32 AM
(5150 views)
Hi!
I really need help with some SAS coding (and some logic thinking perhaps?) 🙂
I have a dataset which contains some variables, this one is named followUp_complete. From this dataset I extracted the variable ID to create a new dataset, this one is named bipolar.nr_followUpTemp.
If I group my dataset (bipolar.nr_followUpTemp) after ID, the groups are at different sizes (going from 1 to 5 observations per group). My next step is to make the groups the same size, so that only the two first observations recorded are saved, and discard the other observations (i.e. if there are obs. no. 3 and/or 4 and/or 5).
My thoughts was to create a loop that made a count for each observation within the group. The loop starts over again when a new group comes in the dataset. I would discard observations with a value above 2 - the only problem is that I cant create this loop on my own. All I get is crap 🙂
This is what I'm working with:
*Counts the number of observation each group has;
PROC SQL;
CREATE TABLE bipolar.nr_followUpTemp as
SELECT id, count(ID) AS nrFollowUp
FROM bipolar.followUp_complete
GROUP BY ID;
RUN;
*Sorts the complete dataset which contains stuff which I'd like to include in my final dataset;
PROC SORT DATA = bipolar.followUp_complete;
BY id;
RUN;
*Sort the number of observations in each group by ID;
PROC SORT DATA = bipolar.nr_followUpTemp;
BY id;
RUN;
/*Mergin the two datasets and also create a loop... The loop clearly doesnt work since it makes a count for each observation.
*/
DATA bipolar.test;
MERGE bipolar.followUp_complete bipolar.nr_followUpTemp;
BY id;
DO counter = 1 TO nrFollowUp;
OUTPUT;
END;
RUN;
Is it possible to adjust the loop somehow so it only does a run for each group, and not each observation?
Thank you for your time!
/Christian.
I really need help with some SAS coding (and some logic thinking perhaps?) 🙂
I have a dataset which contains some variables, this one is named followUp_complete. From this dataset I extracted the variable ID to create a new dataset, this one is named bipolar.nr_followUpTemp.
If I group my dataset (bipolar.nr_followUpTemp) after ID, the groups are at different sizes (going from 1 to 5 observations per group). My next step is to make the groups the same size, so that only the two first observations recorded are saved, and discard the other observations (i.e. if there are obs. no. 3 and/or 4 and/or 5).
My thoughts was to create a loop that made a count for each observation within the group. The loop starts over again when a new group comes in the dataset. I would discard observations with a value above 2 - the only problem is that I cant create this loop on my own. All I get is crap 🙂
This is what I'm working with:
*Counts the number of observation each group has;
PROC SQL;
CREATE TABLE bipolar.nr_followUpTemp as
SELECT id, count(ID) AS nrFollowUp
FROM bipolar.followUp_complete
GROUP BY ID;
RUN;
*Sorts the complete dataset which contains stuff which I'd like to include in my final dataset;
PROC SORT DATA = bipolar.followUp_complete;
BY id;
RUN;
*Sort the number of observations in each group by ID;
PROC SORT DATA = bipolar.nr_followUpTemp;
BY id;
RUN;
/*Mergin the two datasets and also create a loop... The loop clearly doesnt work since it makes a count for each observation.
*/
DATA bipolar.test;
MERGE bipolar.followUp_complete bipolar.nr_followUpTemp;
BY id;
DO counter = 1 TO nrFollowUp;
OUTPUT;
END;
RUN;
Is it possible to adjust the loop somehow so it only does a run for each group, and not each observation?
Thank you for your time!
/Christian.
3 REPLIES 3
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You most likely want to look into using "first."
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your merge statement combines all records from bipolar.followUp_complete with bipolar.nr_followUpTemp. This is a many to one merge and the result set contains still all records of bipolar.followUp_complete.
Now you add a loop from 1 to the number of observations in this ID (nrFollowUp) and you write every single observation several times (nrFollowUp) to the output dataset. May be not quite what you had in mind...
As Rick proposes: You might want to use "first".
Something like the following (untested) code might do:
PROC SORT DATA = bipolar.followUp_complete;
BY id;
RUN;
data bipolar.test;
set bipolar.followUp_complete;
by id;
retain counter;
if first.id then counter=0; /* reset counter if a new by group starts */
counter+1; /* increment counter */
/* write obs to output dataset if first or second obs in by group */
if counter LE 2 then output;
run;
HTH
Patrick LE instead of "bracket equal" as this site has some issues with this...
Message was edited by: Patrick
Now you add a loop from 1 to the number of observations in this ID (nrFollowUp) and you write every single observation several times (nrFollowUp) to the output dataset. May be not quite what you had in mind...
As Rick proposes: You might want to use "first".
Something like the following (untested) code might do:
PROC SORT DATA = bipolar.followUp_complete;
BY id;
RUN;
data bipolar.test;
set bipolar.followUp_complete;
by id;
retain counter;
if first.id then counter=0; /* reset counter if a new by group starts */
counter+1; /* increment counter */
/* write obs to output dataset if first or second obs in by group */
if counter LE 2 then output;
run;
HTH
Patrick LE instead of "bracket equal" as this site has some issues with this...
Message was edited by: Patrick
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you! It works like a charm 🙂
Much appreciated!
Much appreciated!