Hi experts, i have a dataset similar to the dataset(with 5000 subjids) below.
i need to extract the unique subjid who doesn't have cnsdt missing for vis1 only for which i have written a sql code below. kindly let me know the possible reason behind group by is getting converted in to order by as the warning appears in the log and the values i'm getting are all wrong.
data a;
input subjid event cnsdt;
01001 vis1 12/03/2018
01001 vis2
01001 vis3
01001 vis4
01002 vis1 08/08/2018
01002 vis2
01002 vis3
01002 vis4
01003 vis1 .
01003 vis2
01003 vis3
01003 vis4
01004 vis1 23/02/2018
01004 vis2
01004 vis3
01004 vis4
01005 vis1 .
01005 vis2
01005 vis3
01005 vis4
;
run;
proc sql;
create table missconsdt as select distinct subjid from a where cnsdt ^=. group by subjid;
quit;
any other better ways to do this in datastep, without using sql?Kindly suggest
data a;
infile cards truncover;
input subjid $ event $ cnsdt :ddmmyy10.;
format cnsdt ddmmyy10.;
cards;
01001 vis1 12/03/2018
01001 vis2
01001 vis3
01001 vis4
01002 vis1 08/08/2018
01002 vis2
01002 vis3
01002 vis4
01003 vis1 .
01003 vis2
01003 vis3
01003 vis4
01004 vis1 23/02/2018
01004 vis2
01004 vis3
01004 vis4
01005 vis1 .
01005 vis2
01005 vis3
01005 vis4
;
run;
proc sql;
create table missconsdt as
select subjid
from a
where (event='vis1')*cnsdt ;
quit;
proc print noobs;run;
data missconsdt;
set a;
where (event='vis1')*cnsdt ;
run;
proc print noobs;run;
subjid |
---|
01001 |
01002 |
01004 |
Thanks @novinosrin for the wonderful response. it's working perfectly . before i close your response as a solution, i would like to understand the syntax behind using the line
"where (event='vis1')*cnsdt "
Does cnsdt ^=. will be directly interpreted like that by SAS, by mentioning just the cnsdt (which i can see though from the results ,but what if i would like to do the vice versa , where (event='vis1')*-cnsdt, will this work )? or am i missing something.
Looking forward to having your response. Any paper for reference can be shared from your end would be helpful.
where (event='vis1')*cnsdt ;
is a boolean expression equivalent to
where event='vis1' and not missing(dnsdt);
Boolean expressions are mathematical expressions that standardises values into binary 1's and 0' for true and false. I use these a lot considering the utility is phenomenal in the data prep for logistic regression for my credit risk team. You will pick up by practice. The people who inspired me to use this are @SASKiwi , @hashman and @data_null__ They showed me the usage in terms of brevity but it furthered to get me using in Probability models. So I would recommend reading their posts.
PS Honestly they were gracious to lend me their time and I think they have gotten old now to do the same for today's newbies. lol 😁
A slight, but important correction:
where event='vis1' and cnsdt not in (0,.);
A non-missing value of zero will cause the multiplication to also result in zero, which is considered "false".
Indeed that's sharp. Agreed 🙂
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.