Solved: Re: Creating a new datset with missing values

kmardinian · Posted 09-18-2018 02:09 PM

Hi, I am working with a dataset that looks like this

Subject Base1 Base2 CHG1 CHG2 VisitID

1 4 5 1

1 4 5 2 3 2

2 4 5 1

3 4 5 1

I'd like to create a new dataset from this data where I can isolate the CHG variables without losing Subject 1 VisitID 1 observation, so that it would look like this

Subject Base1 Base2 CHG1 CHG2 VisitID

1 4 5 1

1 4 5 2 3 2

Any help would be much appreciated. Thank you!

Tom · Posted 09-18-2018 02:45 PM

Still not at all clear what you want. Your words say you just want to include records that have visitid=1 or non-missing CHGx values.

    if CHG1 ne . or CHG2 ne . or visitid=1;

But that is not the output that you posted.

To get that you would change the last condition to be just the VISITID=1 to for Subject 1

   if CHG1 ne . or CHG2 ne . or (subject=1 and visitid=1);

Note you didn't specify if VISITID or SUBJECT are numbers or character strings. I have coded as if they are numbers, but in most instances such identifiers would be character strings since you never need to run statistics on them.

View solution in original post

SuryaKiran · Posted 09-18-2018 02:30 PM

What is it your trying to do here?

Thanks,
Suryakiran

kmardinian · Posted 09-18-2018 02:34 PM

I would like to create a dataset with all the subjects that have observations for the CHG variables without losing my VisitID=1 observations.

I've been running this code

data final;
set t4;
if CHG1 ne .;
run;

which obviously doesn't work because then I am losing some of the observations for those subjects, but I'm not sure how else to do it

Tom · Posted 09-18-2018 02:45 PM

Still not at all clear what you want. Your words say you just want to include records that have visitid=1 or non-missing CHGx values.

    if CHG1 ne . or CHG2 ne . or visitid=1;

But that is not the output that you posted.

To get that you would change the last condition to be just the VISITID=1 to for Subject 1

   if CHG1 ne . or CHG2 ne . or (subject=1 and visitid=1);

Note you didn't specify if VISITID or SUBJECT are numbers or character strings. I have coded as if they are numbers, but in most instances such identifiers would be character strings since you never need to run statistics on them.

novinosrin · Posted 09-18-2018 02:33 PM

data have;
infile cards truncover;
input Subject    Base1 Base2 CHG1 CHG2   VisitID;
cards;
1                4           5           . .                      1
1                4           5         2            3         2
2                4           5         . .                         1
3                4           5           .  .                       1
;


proc sql;
create table want as
select *
from have
group by subject
having max((n(CHG1 ,CHG2)>0))=1;
quit;

@kmardinian I think the above will work

kmardinian · Posted 09-18-2018 02:39 PM

Thank you! Do you mind explaining what this statement is doing?

max((n(CHG1 ,CHG2)>0))=1;

novinosrin · Posted 09-18-2018 02:41 PM

n function returns the number of non missing values, that's a check and the boolean value is returned is 0 or 1 basing on true of false. And then we filter using having clause to output only true (1) records

kmardinian · Posted 09-18-2018 02:45 PM

I am trying to do it with one CHG variable at a time, but for some reason it is giving me this error 😕

536
537 proc sql;
538 create table final_all as select * from t4
539 group by subnum having max((n(chg1)>0))=1;
ERROR: Summary functions nested in this way are not supported.
540 quit;

novinosrin · Posted 09-18-2018 02:47 PM

Not sure what's your requirement , now you seem to change from your initial. Can you clarify your have and want plz and make it comprehensive for the community

kmardinian · Posted 09-18-2018 02:51 PM

Sorry, it's difficult to properly explain what I was looking for, but Tom's line of code below is exactly what I needed.

Thank you so much for all your help!

   if CHG1 ne . or CHG2 ne . or (subject=1 and visitid=1);

SuryaKiran · Posted 09-18-2018 02:52 PM

Maybe a subquery might help

proc sql;
select * from have
where subject in (select subject from have where chg1 is not null and chg2 is not null);
quit;

Thanks,
Suryakiran

kmardinian · Posted 09-18-2018 02:59 PM

Thank you, I will try it this way as well!

FreelanceReinh · Posted 09-18-2018 05:29 PM

@kmardinian wrote:

I am trying to do it with one CHG variable at a time, but for some reason it is giving me this error 😕

536
537 proc sql;
538 create table final_all as select * from t4
539 group by subnum having max((n(chg1)>0))=1;
ERROR: Summary functions nested in this way are not supported.
540 quit;

The reason is: The N function is both a SAS function (to be applied to values within an observation) and an SQL summary function (to be applied to values within a column). As long as there were two arguments it was clear that you meant the SAS function. Now with only one argument it could be the SQL function as well and indeed this is the preferred interpretation by the compiler. Therefore, n(chg1) is already the number of non-missing CHG1 values in the group and the additional SQL summary function MAX is unnecessary and even inappropriate (hence the error).

So, one of several possible corrections is:

having n(chg1);

(Similarly, the original HAVING clause could have been written as having n(CHG1) | n(CHG2).)

Ksharp · Posted 09-19-2018 10:22 AM

data have;
infile cards truncover;
input Subject    Base1 Base2 CHG1 CHG2   VisitID;
cards;
1                4           5           . .                      1
1                4           5         2            3         2
2                4           5         . .                         1
3                4           5           .  .                       1
;

proc sort data=have out=want nodupkey;
by  Base1 Base2 CHG1 CHG2   VisitID;
run;
proc print;run;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away