DATA Step, Macro, Functions and more

How to create binary variable from existing categorical variable acording to frquency of categories

Reply
Occasional Contributor
Posts: 5

How to create binary variable from existing categorical variable acording to frquency of categories

Hi, 

 

I'm a new user of SAS 9.4 program and I am working with a data of my research. I have this character column which looks like: 

Var1 

XY_001

XY_021

XY_001 

XY_000

XY_015 .......etc

 

it is around 633 obsrvations and have also 577 missings! All observations are 1210

 

and if I run a proc freq for this variable the frequencies of the readings are different ranging from 1 to 38 and  the missings 

 

Now I want to create a new variable called Var2 I want to make it binary (0/1). for those categories who have 1 as a frequency to be 0 

and those categories who have frequency more or equal than 2 to be 1. and the missings as missing

and to call 0: not group 

and 1: group

 

How can I create the program??

 

thank you 

 

 

 

 

Super User
Posts: 5,509

Re: How to create binary variable from existing categorical variable acording to frquency of categor

Let's start by extending the PROC FREQ that you ran, so it also creates an output data set:

 

proc freq data=have;

tables var1 / out=counts (keep=var1 count);

run;

 

Next, sort your original data set:

 

proc sort data=have;

by var1;

run;

 

Finally, merge the PROC FREQ results with your data:

 

data want;

merge have

       counts (where=(var1 > ' '));

by var1;

if count=1 then flag=0;

else if count > 1 then flag=1;

drop count;

run;

Occasional Contributor
Posts: 5

Re: How to create binary variable from existing categorical variable acording to frquency of categor

Posted in reply to Astounding

Hi Astounding, 

 

Thank you for your reply I have an issue: 

 

I run this code: 

/* to create new cluster status variable*/
proc freq data=genotyped_1210;
tables clustername / out=counts (keep=clustername count);
run;
 
/*Next, sort your original data set:*/
 
Proc sort data= genotyped_1210;
by clustername;
run;
 

The 1st part wet well, the sorting part gave me this error: 


80   Proc sort data= genotyped_1210;
81   by clustername;
82   run;

ERROR: You cannot open WORK.GENOTYPED_1210.DATA for output access with member-level control because
WORK.GENOTYPED_1210.DATA is in use by you in resource environment SORT.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

 

What do you suggest?

Thank you 

Community Manager
Posts: 2,954

Re: How to create binary variable from existing categorical variable acording to frquency of categor

You probably have the original data set open in a data viewer (EG? Check Tools->View Open Data Sets.)

 

To avoid, you can work from a copy by sorting the original into another file and work forward from there -- or else just make sure all data grid views are closed before running the next step.

 

/* to create new cluster status variable*/
proc freq data=genotyped_1210;
tables clustername / out=counts (keep=clustername count);
run;
 
/*Next, sort your original data set:*/
 
Proc sort data= genotyped_1210 out=sorted_genotyped;
by clustername;
run;
Occasional Contributor
Posts: 5

Re: How to create binary variable from existing categorical variable acording to frquency of categor

Posted in reply to ChrisHemedinger

Thank you ChrisHemedinger,

 

I closed all the data viewer and it worked! 

However, when doing the next program as follows 

/*Finally, merge the PROC FREQ results with your data:*/
 
data TB;
merge sorted_genotyped
       counts (where=(clustername > ' '));
by clustername;
if count=1 then unique=0;
else if count > 1 then cluster=1;
drop count;
run;

The following error was: 


58   data TB;
59   merge sorted_genotyped
60          counts (where=(clustername > ' '));
ERROR: The value        is not a valid SAS name.
61   by clustername;
62   if count=1 then unique=0;
63   else if count > 1 then cluster=1;
64   drop count;
65   run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TB may be incomplete.  When this step was stopped there were 0
         observations and 239 variables.
WARNING: Data set WORK.TB was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

 

attached picture of example of the character values of the variable: clusternameissue1.jpg

 

 

Super User
Posts: 5,509

Re: How to create binary variable from existing categorical variable acording to frquency of categor

This won't explain the error, but it is a small required fix.  You now have:

 

  if count=1 then unique=0;
  else if count > 1 then cluster=1;

 

That needs to become:


  if count=1 then unique=0;
  else if count > 1 then unique=1;

 

The error message itself it not really explainable.  You can try changing the WHERE condition from this:

 

 (where=(clustername > ' '))

to this:

 

 (where=(clustername ne ' '))

 

Also make sure the quotes around the blank space are really quotes ... you can try deleting them, then typing them back in again.  It shouldn't make a difference, but you never know.

Occasional Contributor
Posts: 5

Re: How to create binary variable from existing categorical variable acording to frquency of categor

Posted in reply to Astounding

Thanks all for your help

 

The problem still didn't solve it could be because I have an ID numbers in the original file thatcouldaffect the merge step ?!

 

I will try to use R instead to see if this could solve the problem!

 

I appreciatr all your suggestions and programs you provided I tried them all and it didn't work.

 

Thank you so much

 

Super User
Super User
Posts: 7,050

Re: How to create binary variable from existing categorical variable acording to frquency of categor

What is it that you want as your output?

data have ;
  length Var1 $10;
  input Var1 @@ ;
cards;  
XY_001 XY_021 . XY_001 . XY_000 . XY_015 .
;

proc sql noprint ;
  create table want as 
    select var1,count(*) as nobs
         , (calculated nobs > 1) as var2
    from have
    group by 1
  ;
quit;

proc print; run;
Obs     Var1     nobs    var2

 1                 4       1
 2     XY_000      1       0
 3     XY_001      2       1
 4     XY_015      1       0
 5     XY_021      1       0
Occasional Contributor
Posts: 5

Re: How to create binary variable from existing categorical variable acording to frquency of categor

Thank you for your all contributions the problem is solved by R! 

 

 

 

 

Ask a Question
Discussion stats
  • 8 replies
  • 151 views
  • 3 likes
  • 4 in conversation