Hi all,
Have
id Date Subject 1 Subject 2
1 July 10, 2020 Math Science
1 July 10, 2020 . Science
1 July 10, 2020 . Science
2 Feb 10, 2020 History Physics
2 Feb 10, 2020 Biology Physics
3 Jun 24, 2020 . Math
3 Jun 24, 2020 . Math
4 May 7, 2020 History Math
4 May 7, 2020 English Math
4 May 7, 2020 . Math
5 Feb 2, 2020 . French
5 Feb 2, 2020 . French
Want
id Date Subject
1 July 10, 2020 Math
2 Feb 10, 2020 History
2 Feb 10, 2020 Biology
3 Jun 24, 2020 Math
4 May 7, 2020 History
4 May 7, 2020 English
5 Feb 2, 2020 French
So, basically, I want to replace subject 1 with subject 2 only when there is no subject mentioned for each student id.
Any help is appreciated.
data have;
input (id Date Subject1 Subject2 ) (& $40.);
cards;
1 July 10, 2020 Math Science
1 July 10, 2020 . Science
1 July 10, 2020 . Science
2 Feb 10, 2020 History Physics
2 Feb 10, 2020 Biology Physics
3 Jun 24, 2020 . Math
3 Jun 24, 2020 . Math
4 May 7, 2020 History Math
4 May 7, 2020 English Math
4 May 7, 2020 . Math
5 Feb 2, 2020 . French
5 Feb 2, 2020 . French
;
proc sql;
create table want as
select id,date,Subject1 as Subject
from have
where Subject1 is not missing
union
select id,date,max(Subject2)
from have
group by id,date
having n(Subject1)=0;
quit;
Please explain why ID 1 has 3 rows in the input data set and one row in the output data set. Please explain why ID 4 has 3 rows in the input data set and 2 rows in the output data set. Please explain why ID 3 has 2 rows in the input data set and 1 row in the output data set.
For the final data that I want each unique subject 1 needs to be there in the dataset. Since ID1 has Math and 2 missing values in subject 1 the final data will have only math (unique values from subject 1).
For ID3 since subject 1 has 2 unique values I want to keep both of them.
And only if there is no information on subject 1 I want to replace it with subject 2.
Thank you!
Does the DATE value actually have any impact in this?
I understand your objective as this:
It appears that whenever condition 2 is the case, that subject2 is a constant for that ID.
No guarantees below, in the absence of sample data presented in the form of a working data step:
data want (drop=_:);
set have;
by id;
if first.id then call missing(_subject1_count);
_subject1_count+ (subject1^=' ');
if subject1^=' ' or (last.id=1 and _subject1_count=0);
subject=coalescec(subject1,subject2);
run;
Edited note: And it also appears that each ID has only a single value for DATE, which makes condition2 easier to honor.
data have;
input (id Date Subject1 Subject2 ) (& $40.);
cards;
1 July 10, 2020 Math Science
1 July 10, 2020 . Science
1 July 10, 2020 . Science
2 Feb 10, 2020 History Physics
2 Feb 10, 2020 Biology Physics
3 Jun 24, 2020 . Math
3 Jun 24, 2020 . Math
4 May 7, 2020 History Math
4 May 7, 2020 English Math
4 May 7, 2020 . Math
5 Feb 2, 2020 . French
5 Feb 2, 2020 . French
;
proc sql;
create table want as
select id,date,Subject1 as Subject
from have
where Subject1 is not missing
union
select id,date,max(Subject2)
from have
group by id,date
having n(Subject1)=0;
quit;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.