Hi,
From the data below, how can I get the first record for each pid and disease combination? (It should apply only to pid/disease groups that have duplicate records.)
data abc;
input pid age disease :$12. country :$12. sno; /* :$12. avoids truncating "pneumonia" to the default 8 characters */
cards;
101 23 sarcoma US 1
101 23 sarcoma US 2
102 43 pneumonia China 1
103 56 syphilis Russia 2
103 52 syphilis Russia 1
103 57 Cpox Russia 3
103 58 Cpox Russia 4
103 59 Cpox Russia 5
103 59 Rpox Russia 6
104 75 Spox Uzbek 1
104 82 Spox Uzbek 2
104 12 Asthma Uzbek 3
104 13 Asthma Uzbek 4
104 14 Asthma Uzbek 5
;
Desired output:
pid | age | disease | country | sno |
101 | 23 | sarcoma | US | 1 |
103 | 52 | syphilis | Russia | 1 |
103 | 57 | Cpox | Russia | 3 |
104 | 75 | Spox | Uzbek | 1 |
104 | 12 | Asthma | Uzbek | 3 |
If you are allowed to sort your data, that would be the safest beginning:
proc sort data=have;
by pid disease sno;
run;
After that, you could continue:
data want;
set have;
by pid disease;
if first.disease=1 and last.disease=0;
run;
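To see why that condition picks the first record of multi-record groups only, it may help to print the BY-group flags. This is just a sketch: the data set name demo, the variables f and l, and the :$12. informat are my own choices for illustration, and the rows are a subset of the posted data.

```sas
/* Subset of the posted data; demo is a hypothetical name */
data demo;
input pid age disease :$12. country :$12. sno;
cards;
101 23 sarcoma US 1
101 23 sarcoma US 2
102 43 pneumonia China 1
103 57 Cpox Russia 3
103 58 Cpox Russia 4
103 59 Cpox Russia 5
;
run;

proc sort data=demo;
by pid disease sno;
run;

data _null_;
set demo;
by pid disease;
/* Copy the automatic flags into ordinary variables (f and l are hypothetical names) */
f = first.disease;
l = last.disease;
/* Write one line per record to the log */
put pid= disease= sno= f= l=;
run;
```

The single pneumonia record should show f=1 and l=1, so it fails the test last.disease=0 and is dropped. In the sarcoma and Cpox groups, only the first record has f=1 with l=0.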
If you are not allowed to sort your data, you have to assume that they are properly grouped (all records for the same pid/disease appear together, in order). You could then code:
data want;
set have;
by pid disease notsorted;
if first.disease=1 and last.disease=0;
run;
So the result is safer if you are allowed to sort. But it's still obtainable if you can't sort, as long as the data behave.
Good luck.
UNTESTED CODE
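proc sort data=abc;
by pid disease sno; /* sno added so the lowest sno comes first within each group */
run;
data abc2;
set abc;
by pid disease;
/* keep the first record, but only for groups with more than one record */
if first.disease and not last.disease;
run;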
I have never heard of a situation where you are not allowed to sort your data.
Does that actually happen? For what reasons would you not be allowed to sort your data?
Paige,
There are situations where you wouldn't want to sort a data set ... size of the data set, existence of indices. But even without good reason, the world of SAS provides many sources of incredible situations. Here are just a few I have either encountered or heard about from others.
One supervisor would not allow a MERGE statement, forcing programmers to use IF/THEN instead. MERGE is just too difficult to master.
Student comments and questions ... well real life is stranger than you could imagine.
x=2;
Student question: Why would you want to do that?
a = b + c;
Student comment/question: You can't add letters.
Sometimes students even produce code that works (or at least generates no errors) but is suitable for a puzzle:
if a = 1 or 2 then b=3 and c=4;
In SAS, as in real life, the truth can be stranger than fiction.
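For anyone puzzling over that last statement, here is a sketch of how SAS reads it. The starting values of a and c are hypothetical, chosen only for illustration.

```sas
data _null_;
a = 5;   /* hypothetical value; the condition below is true for any a */
c = 4;   /* hypothetical value */
/* The condition is (a = 1) or 2; the constant 2 is nonzero, so it is always true */
/* The action is the assignment b = (3 and (c = 4)), a logical result of 1 or 0 */
if a = 1 or 2 then b=3 and c=4;
put a= b= c=;
run;
```

With these values, b should end up as 1 and c stays 4. Nothing is ever assigned to c, and b never becomes 3.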
To complement @Astounding's ideas, if your records are properly grouped (all records for the same pid/disease appear together) but not always in order, you could do:
data want;
    do until(last.disease);
        set abc; by pid disease notsorted;
        firstSno = min(firstSno, sno);
    end;
    /* To skip groups with a single record */
    if first.disease then call missing(firstSno);
    do until(last.disease);
        set abc; by pid disease notsorted;
        if sno = firstSno then do;
            output;
            /* To get only the first record of ties */
            call missing(firstSno);
        end;
    end;
    drop firstSno;
run;