Hi,
I'm a new user of SAS 9.4 program and I am working with a data of my research. I have this character column which looks like:
Var1
XY_001
XY_021
XY_001
XY_000
XY_015 .......etc
it is around 633 obsrvations and have also 577 missings! All observations are 1210
and if I run a proc freq for this variable the frequencies of the readings are different ranging from 1 to 38 and the missings
Now I want to create a new variable called Var2 I want to make it binary (0/1). for those categories who have 1 as a frequency to be 0
and those categories who have frequency more or equal than 2 to be 1. and the missings as missing
and to call 0: not group
and 1: group
How can I create the program??
thank you
Let's start by extending the PROC FREQ that you ran, so it also creates an output data set:
proc freq data=have;
tables var1 / out=counts (keep=var1 count);
run;
Next, sort your original data set:
proc sort data=have;
by var1;
run;
Finally, merge the PROC FREQ results with your data:
data want;
merge have
counts (where=(var1 > ' '));
by var1;
if count=1 then flag=0;
else if count > 1 then flag=1;
drop count;
run;
Let's start by extending the PROC FREQ that you ran, so it also creates an output data set:
proc freq data=have;
tables var1 / out=counts (keep=var1 count);
run;
Next, sort your original data set:
proc sort data=have;
by var1;
run;
Finally, merge the PROC FREQ results with your data:
data want;
merge have
counts (where=(var1 > ' '));
by var1;
if count=1 then flag=0;
else if count > 1 then flag=1;
drop count;
run;
Hi Astounding,
Thank you for your reply I have an issue:
I run this code:
/* to create new cluster status variable*/
proc freq data=genotyped_1210;
tables clustername / out=counts (keep=clustername count);
run;
/*Next, sort your original data set:*/
Proc sort data= genotyped_1210;
by clustername;
run;
The 1st part wet well, the sorting part gave me this error:
80 Proc sort data= genotyped_1210;
81 by clustername;
82 run;
ERROR: You cannot open WORK.GENOTYPED_1210.DATA for output access with member-level control because
WORK.GENOTYPED_1210.DATA is in use by you in resource environment SORT.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
What do you suggest?
Thank you
You probably have the original data set open in a data viewer (EG? Check Tools->View Open Data Sets.)
To avoid, you can work from a copy by sorting the original into another file and work forward from there -- or else just make sure all data grid views are closed before running the next step.
/* to create new cluster status variable*/
proc freq data=genotyped_1210;
tables clustername / out=counts (keep=clustername count);
run;
/*Next, sort your original data set:*/
Proc sort data= genotyped_1210 out=sorted_genotyped;
by clustername;
run;
Thank you ChrisHemedinger,
I closed all the data viewer and it worked!
However, when doing the next program as follows
/*Finally, merge the PROC FREQ results with your data:*/
data TB;
merge sorted_genotyped
counts (where=(clustername > ' '));
by clustername;
if count=1 then unique=0;
else if count > 1 then cluster=1;
drop count;
run;
The following error was:
58 data TB;
59 merge sorted_genotyped
60 counts (where=(clustername > ' '));
ERROR: The value is not a valid SAS name.
61 by clustername;
62 if count=1 then unique=0;
63 else if count > 1 then cluster=1;
64 drop count;
65 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TB may be incomplete. When this step was stopped there were 0
observations and 239 variables.
WARNING: Data set WORK.TB was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
attached picture of example of the character values of the variable: clustername
This won't explain the error, but it is a small required fix. You now have:
if count=1 then unique=0;
else if count > 1 then cluster=1;
That needs to become:
if count=1 then unique=0;
else if count > 1 then unique=1;
The error message itself it not really explainable. You can try changing the WHERE condition from this:
(where=(clustername > ' '))
to this:
(where=(clustername ne ' '))
Also make sure the quotes around the blank space are really quotes ... you can try deleting them, then typing them back in again. It shouldn't make a difference, but you never know.
Thanks all for your help
The problem still didn't solve it could be because I have an ID numbers in the original file thatcouldaffect the merge step ?!
I will try to use R instead to see if this could solve the problem!
I appreciatr all your suggestions and programs you provided I tried them all and it didn't work.
Thank you so much
What is it that you want as your output?
data have ;
length Var1 $10;
input Var1 @@ ;
cards;
XY_001 XY_021 . XY_001 . XY_000 . XY_015 .
;
proc sql noprint ;
create table want as
select var1,count(*) as nobs
, (calculated nobs > 1) as var2
from have
group by 1
;
quit;
proc print; run;
Obs Var1 nobs var2 1 4 1 2 XY_000 1 0 3 XY_001 2 1 4 XY_015 1 0 5 XY_021 1 0
Thank you for your all contributions the problem is solved by R!
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.