BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
alamoubm
Fluorite | Level 6

Hi, 

 

I'm a new user of SAS 9.4 program and I am working with a data of my research. I have this character column which looks like: 

Var1 

XY_001

XY_021

XY_001 

XY_000

XY_015 .......etc

 

it is around 633 obsrvations and have also 577 missings! All observations are 1210

 

and if I run a proc freq for this variable the frequencies of the readings are different ranging from 1 to 38 and  the missings 

 

Now I want to create a new variable called Var2 I want to make it binary (0/1). for those categories who have 1 as a frequency to be 0 

and those categories who have frequency more or equal than 2 to be 1. and the missings as missing

and to call 0: not group 

and 1: group

 

How can I create the program??

 

thank you 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Let's start by extending the PROC FREQ that you ran, so it also creates an output data set:

proc freq data=have;
 tables var1 / out=counts (keep=var1 count);
run;

 

Next, sort your original data set:

proc sort data=have;
 by var1;
run;

 

Finally, merge the PROC FREQ results with your data:

data want;
merge have
      counts (where=(var1 > ' '));
by var1;
if count=1 then flag=0;
else if count > 1 then flag=1;
drop count;
run;

 

View solution in original post

9 REPLIES 9
Astounding
PROC Star

Let's start by extending the PROC FREQ that you ran, so it also creates an output data set:

proc freq data=have;
 tables var1 / out=counts (keep=var1 count);
run;

 

Next, sort your original data set:

proc sort data=have;
 by var1;
run;

 

Finally, merge the PROC FREQ results with your data:

data want;
merge have
      counts (where=(var1 > ' '));
by var1;
if count=1 then flag=0;
else if count > 1 then flag=1;
drop count;
run;

 

alamoubm
Fluorite | Level 6

Hi Astounding, 

 

Thank you for your reply I have an issue: 

 

I run this code: 

/* to create new cluster status variable*/
proc freq data=genotyped_1210;
tables clustername / out=counts (keep=clustername count);
run;
 
/*Next, sort your original data set:*/
 
Proc sort data= genotyped_1210;
by clustername;
run;
 

The 1st part wet well, the sorting part gave me this error: 


80   Proc sort data= genotyped_1210;
81   by clustername;
82   run;

ERROR: You cannot open WORK.GENOTYPED_1210.DATA for output access with member-level control because
WORK.GENOTYPED_1210.DATA is in use by you in resource environment SORT.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

 

What do you suggest?

Thank you 

ChrisHemedinger
Community Manager

You probably have the original data set open in a data viewer (EG? Check Tools->View Open Data Sets.)

 

To avoid, you can work from a copy by sorting the original into another file and work forward from there -- or else just make sure all data grid views are closed before running the next step.

 

/* to create new cluster status variable*/
proc freq data=genotyped_1210;
tables clustername / out=counts (keep=clustername count);
run;
 
/*Next, sort your original data set:*/
 
Proc sort data= genotyped_1210 out=sorted_genotyped;
by clustername;
run;
It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.
alamoubm
Fluorite | Level 6

Thank you ChrisHemedinger,

 

I closed all the data viewer and it worked! 

However, when doing the next program as follows 

/*Finally, merge the PROC FREQ results with your data:*/
 
data TB;
merge sorted_genotyped
       counts (where=(clustername > ' '));
by clustername;
if count=1 then unique=0;
else if count > 1 then cluster=1;
drop count;
run;

The following error was: 


58   data TB;
59   merge sorted_genotyped
60          counts (where=(clustername > ' '));
ERROR: The value        is not a valid SAS name.
61   by clustername;
62   if count=1 then unique=0;
63   else if count > 1 then cluster=1;
64   drop count;
65   run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TB may be incomplete.  When this step was stopped there were 0
         observations and 239 variables.
WARNING: Data set WORK.TB was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

 

attached picture of example of the character values of the variable: clusternameissue1.jpg

 

 

Astounding
PROC Star

This won't explain the error, but it is a small required fix.  You now have:

 

  if count=1 then unique=0;
  else if count > 1 then cluster=1;

 

That needs to become:


  if count=1 then unique=0;
  else if count > 1 then unique=1;

 

The error message itself it not really explainable.  You can try changing the WHERE condition from this:

 

 (where=(clustername > ' '))

to this:

 

 (where=(clustername ne ' '))

 

Also make sure the quotes around the blank space are really quotes ... you can try deleting them, then typing them back in again.  It shouldn't make a difference, but you never know.

alamoubm
Fluorite | Level 6

Thanks all for your help

 

The problem still didn't solve it could be because I have an ID numbers in the original file thatcouldaffect the merge step ?!

 

I will try to use R instead to see if this could solve the problem!

 

I appreciatr all your suggestions and programs you provided I tried them all and it didn't work.

 

Thank you so much

 

Tom
Super User Tom
Super User

What is it that you want as your output?

data have ;
  length Var1 $10;
  input Var1 @@ ;
cards;  
XY_001 XY_021 . XY_001 . XY_000 . XY_015 .
;

proc sql noprint ;
  create table want as 
    select var1,count(*) as nobs
         , (calculated nobs > 1) as var2
    from have
    group by 1
  ;
quit;

proc print; run;
Obs     Var1     nobs    var2

 1                 4       1
 2     XY_000      1       0
 3     XY_001      2       1
 4     XY_015      1       0
 5     XY_021      1       0
alamoubm
Fluorite | Level 6

Thank you for your all contributions the problem is solved by R! 

 

 

 

 

raheleh22
Obsidian | Level 7
Hi,
i have three categorical variables in my data set:
1. alcohol (current=1, other=0)
2. smk (current=1, other=0)
3.drug (current=1, other=0)
now i want to creat a variable that only includes level1 of these variables:
here is what I wrote:
if alcohol=1 & smk=1 & drug=1 then (new variable name)= 1; else (new variable)=0;
however this is not showing in my proc freq.
any advice is appreciated.
Thanks

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 5194 views
  • 3 likes
  • 5 in conversation