Hi all,
Have
id Date Subject 1 Subject 2
1 July 10, 2020 Math Science
1 July 10, 2020 . Science
1 July 10, 2020 . Science
2 Feb 10, 2020 History Physics
2 Feb 10, 2020 Biology Physics
3 Jun 24, 2020 . Math
3 Jun 24, 2020 . Math
4 May 7, 2020 History Math
4 May 7, 2020 English Math
4 May 7, 2020 . Math
5 Feb 2, 2020 . French
5 Feb 2, 2020 . French
Want
id Date Subject
1 July 10, 2020 Math
2 Feb 10, 2020 History
2 Feb 10, 2020 Biology
3 Jun 24, 2020 Math
4 May 7, 2020 History
4 May 7, 2020 English
5 Feb 2, 2020 French
So, basically, I want to replace subject 1 with subject 2 only when there is no subject mentioned for each student id.
Any help is appreciated.
data have;
input (id Date Subject1 Subject2 ) (& $40.);
cards;
1 July 10, 2020 Math Science
1 July 10, 2020 . Science
1 July 10, 2020 . Science
2 Feb 10, 2020 History Physics
2 Feb 10, 2020 Biology Physics
3 Jun 24, 2020 . Math
3 Jun 24, 2020 . Math
4 May 7, 2020 History Math
4 May 7, 2020 English Math
4 May 7, 2020 . Math
5 Feb 2, 2020 . French
5 Feb 2, 2020 . French
;
proc sql;
create table want as
select id,date,Subject1 as Subject
from have
where Subject1 is not missing
union
select id,date,max(Subject2)
from have
group by id,date
having n(Subject1)=0;
quit;
Please explain why ID 1 has 3 rows in the input data set and one row in the output data set. Please explain why ID 4 has 3 rows in the input data set and 2 rows in the output data set. Please explain why ID 3 has 2 rows in the input data set and 1 row in the output data set.
For the final data that I want each unique subject 1 needs to be there in the dataset. Since ID1 has Math and 2 missing values in subject 1 the final data will have only math (unique values from subject 1).
For ID3 since subject 1 has 2 unique values I want to keep both of them.
And only if there is no information on subject 1 I want to replace it with subject 2.
Thank you!
Does the DATE value actually have any impact in this?
I understand your objective as this:
It appears that whenever condition 2 is the case, that subject2 is a constant for that ID.
No guarantees below, in the absence of sample data presented in the form of a working data step:
data want (drop=_:);
set have;
by id;
if first.id then call missing(_subject1_count);
_subject1_count+ (subject1^=' ');
if subject1^=' ' or (last.id=1 and _subject1_count=0);
subject=coalescec(subject1,subject2);
run;
Edited note: And it also appears that each ID has only a single value for DATE, which makes condition2 easier to honor.
data have;
input (id Date Subject1 Subject2 ) (& $40.);
cards;
1 July 10, 2020 Math Science
1 July 10, 2020 . Science
1 July 10, 2020 . Science
2 Feb 10, 2020 History Physics
2 Feb 10, 2020 Biology Physics
3 Jun 24, 2020 . Math
3 Jun 24, 2020 . Math
4 May 7, 2020 History Math
4 May 7, 2020 English Math
4 May 7, 2020 . Math
5 Feb 2, 2020 . French
5 Feb 2, 2020 . French
;
proc sql;
create table want as
select id,date,Subject1 as Subject
from have
where Subject1 is not missing
union
select id,date,max(Subject2)
from have
group by id,date
having n(Subject1)=0;
quit;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.