Hi, I am working with a dataset that looks like this
Subject Base1 Base2 CHG1 CHG2 VisitID
1 4 5 1
1 4 5 2 3 2
2 4 5 1
3 4 5 1
I'd like to create a new dataset from this data where I can isolate the CHG variables without losing Subject 1 VisitID 1 observation, so that it would look like this
Subject Base1 Base2 CHG1 CHG2 VisitID
1 4 5 1
1 4 5 2 3 2
Any help would be much appreciated. Thank you!
Still not at all clear what you want. Your words say you just want to include records that have visitid=1 or non-missing CHGx values.
if CHG1 ne . or CHG2 ne . or visitid=1;
But that is not the output that you posted.
To get that you would change the last condition to be just the VISITID=1 to for Subject 1
if CHG1 ne . or CHG2 ne . or (subject=1 and visitid=1);
Note you didn't specify if VISITID or SUBJECT are numbers or character strings. I have coded as if they are numbers, but in most instances such identifiers would be character strings since you never need to run statistics on them.
What is it your trying to do here?
I would like to create a dataset with all the subjects that have observations for the CHG variables without losing my VisitID=1 observations.
I've been running this code
data final;
set t4;
if CHG1 ne .;
run;
which obviously doesn't work because then I am losing some of the observations for those subjects, but I'm not sure how else to do it
Still not at all clear what you want. Your words say you just want to include records that have visitid=1 or non-missing CHGx values.
if CHG1 ne . or CHG2 ne . or visitid=1;
But that is not the output that you posted.
To get that you would change the last condition to be just the VISITID=1 to for Subject 1
if CHG1 ne . or CHG2 ne . or (subject=1 and visitid=1);
Note you didn't specify if VISITID or SUBJECT are numbers or character strings. I have coded as if they are numbers, but in most instances such identifiers would be character strings since you never need to run statistics on them.
data have;
infile cards truncover;
input Subject Base1 Base2 CHG1 CHG2 VisitID;
cards;
1 4 5 . . 1
1 4 5 2 3 2
2 4 5 . . 1
3 4 5 . . 1
;
proc sql;
create table want as
select *
from have
group by subject
having max((n(CHG1 ,CHG2)>0))=1;
quit;
@kmardinian I think the above will work
Thank you! Do you mind explaining what this statement is doing?
max((n(CHG1 ,CHG2)>0))=1;
n function returns the number of non missing values, that's a check and the boolean value is returned is 0 or 1 basing on true of false. And then we filter using having clause to output only true (1) records
I am trying to do it with one CHG variable at a time, but for some reason it is giving me this error 😕
536
537 proc sql;
538 create table final_all as select * from t4
539 group by subnum having max((n(chg1)>0))=1;
ERROR: Summary functions nested in this way are not supported.
540 quit;
Not sure what's your requirement , now you seem to change from your initial. Can you clarify your have and want plz and make it comprehensive for the community
Sorry, it's difficult to properly explain what I was looking for, but Tom's line of code below is exactly what I needed.
Thank you so much for all your help!
if CHG1 ne . or CHG2 ne . or (subject=1 and visitid=1);
Maybe a subquery might help
proc sql;
select * from have
where subject in (select subject from have where chg1 is not null and chg2 is not null);
quit;
Thank you, I will try it this way as well!
@kmardinian wrote:
I am trying to do it with one CHG variable at a time, but for some reason it is giving me this error 😕
536
537 proc sql;
538 create table final_all as select * from t4
539 group by subnum having max((n(chg1)>0))=1;
ERROR: Summary functions nested in this way are not supported.
540 quit;
The reason is: The N function is both a SAS function (to be applied to values within an observation) and an SQL summary function (to be applied to values within a column). As long as there were two arguments it was clear that you meant the SAS function. Now with only one argument it could be the SQL function as well and indeed this is the preferred interpretation by the compiler. Therefore, n(chg1) is already the number of non-missing CHG1 values in the group and the additional SQL summary function MAX is unnecessary and even inappropriate (hence the error).
So, one of several possible corrections is:
having n(chg1);
(Similarly, the original HAVING clause could have been written as having n(CHG1) | n(CHG2).)
data have;
infile cards truncover;
input Subject Base1 Base2 CHG1 CHG2 VisitID;
cards;
1 4 5 . . 1
1 4 5 2 3 2
2 4 5 . . 1
3 4 5 . . 1
;
proc sort data=have out=want nodupkey;
by Base1 Base2 CHG1 CHG2 VisitID;
run;
proc print;run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.