BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Pyrite | Level 9

Hi all suppose to have the following two cases: 

dataset1: 

ID   Condition    N_data_exp     N_data_dis    N_data_reg
 A        2           23             54             65
 B        1           27             23             67
 C        5           32             43             78
 D        3           2              23             70
 E        8           87             1              4
..       ..           ..             ..             ..

I would replace all values < 5 in columns starting with "N_" with 999. I used the

following code but it does not work. Of course I'm doing something wrong:

 

data output; 
  set dataset1; 
  if N_: < 5 then N_: = 999; 
run;

 I cannot use the condition "numeric" because there is the variable "Condition" that must not change.

 

 

Then the second case is the following: 

dataset2:

 

 ID   Condition    N_data_exp     N_data_dis    N_data_reg    data_exp_res  data_dis_res   ....
 A        2           23             54             65            345           34
 B        1           27             23             67            52            987
 C        5           32             43             78            675           321
 D        3           2              23             70            0.2           41
 E        8           87             1              4             34            0.001
..       ..           ..             ..             ..            ..            ..

 

I would replace all values < 5 in columns starting with "N_" with 999 in corresponding columns ending with "_res". 

 

Can anyone help me please? I have to manage the two cases separately. 

 

Really thank you in advance

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@NewUsrStat wrote:
Yes, so the condition should be verified in columns starting with N_ while the replacement should be done in columns (corresponding!!!!) ending with _res

So you want to make NEW variables with names that are derived from the names of other variables?

You can use an SQL query to help with that.

 

So get a list of the variables in your dataset. The easiest way is to use PROC CONTENTS.  So if your existing dataset is named HAVE you can make a dataset named CONTENTS with the names (and other metadata) for the variables.

proc contents noprint data=HAVE out=CONTENTS;
run;

Now you can use that dataset to find the variables you want and also generate their names into macro variables by using PROC SQL.

proc sql noprint;
select nliteral(name)
     , nliteral(cats(substr(name,3),'_res'))
  into :old separated by ' '
     , :new separated by ' '
  from contents
  where upcase(name) eqt 'N_'
    and type='num'
;
quit;

Now you can use those two macro variables to define arrays to allow you to run the same code for many variable pairs by indexing into the arrays.

 

You can then use a DO loop to iterate an index variable over the number of variables in the array and do what ever calculations you need.

 

It looks like you are trying to HIDE the values of the NEW list when the values in the OLD list are less than 5. 

 

So you might do something like this to replace such values with special missing .B (B for blinded).

data want;
  set have;
  array old &old;
  array new &new;
  do index=1 to dim(old);
      if 0 < old[index] < 5 then new[index]=.B ;
  end;
run;

 

View solution in original post

11 REPLIES 11
PaigeMiller
Diamond | Level 26

Are these numeric variables? If so, they cannot have the value "Outlier", that is character. Cannot be done.

 

But you can DISPLAY values less than 5 as some text string ... their value will still be a number, but it is displayed as Outlier. Would that work for you? The only reason I can think of for putting Outlier into these fields is to display the table for people.

 

proc format;
    value outlf low-5='Outlier';
run;
proc print data=have;
    var id condition n_:;
    format n_: outlf.;
run;

 

Note: if you do arithmetic on these variables, such as computing a sum, a number less than 5 will still be that number.

--
Paige Miller
NewUsrStat
Pyrite | Level 9
I will edit the post. Instead of "outlier" also 999 is ok
PaigeMiller
Diamond | Level 26

@NewUsrStat wrote:
I will edit the post. Instead of "outlier" also 999 is ok

NOOOOOO! NOOOOOO! NOOOOOO! NOOOOOO!

NOOOOOO! NOOOOOO! NOOOOOO! NOOOOOO!

 

You should never put a potentially valid number into a dataset to indicate outlier. Then any sum or average or other computation will not be correct.

 

Please discuss the purpose of doing this. Is it to display a table for people? Is it for some other reason? If so, what?


What is wrong with the PROC FORMAT solution I gave?

--
Paige Miller
NewUsrStat
Pyrite | Level 9

Nothing wrong with proc format. No computations will be done on the tables. They result from computations already. Will be only displayed.

PaigeMiller
Diamond | Level 26

Then PROC FORMAT is the solution I would recommend.

--
Paige Miller
PaigeMiller
Diamond | Level 26

So, if I am understanding you properly, you want only to apply this format to variables in DATASET2 whose name ends with a particular suffix.

 

Thus, you need a list of those variables. PROC SQL can create that list (as the macro variable &names, by finding names that end in _RES)

 

proc sql noprint;
    select name into :names separated by ' ' from dictionary.columns
    where reverse(trim(name)) eqt 'ser_' and libname='WORK' and memname='DATASET2';
quit;
%put &=names;

 

Then

 

proc print data=dataset2;
     format &names outf.;
run;

 

--
Paige Miller
NewUsrStat
Pyrite | Level 9
Yes, so the condition should be verified in columns starting with N_ while the replacement should be done in columns (corresponding!!!!) ending with _res
Tom
Super User Tom
Super User

@NewUsrStat wrote:
Yes, so the condition should be verified in columns starting with N_ while the replacement should be done in columns (corresponding!!!!) ending with _res

So you want to make NEW variables with names that are derived from the names of other variables?

You can use an SQL query to help with that.

 

So get a list of the variables in your dataset. The easiest way is to use PROC CONTENTS.  So if your existing dataset is named HAVE you can make a dataset named CONTENTS with the names (and other metadata) for the variables.

proc contents noprint data=HAVE out=CONTENTS;
run;

Now you can use that dataset to find the variables you want and also generate their names into macro variables by using PROC SQL.

proc sql noprint;
select nliteral(name)
     , nliteral(cats(substr(name,3),'_res'))
  into :old separated by ' '
     , :new separated by ' '
  from contents
  where upcase(name) eqt 'N_'
    and type='num'
;
quit;

Now you can use those two macro variables to define arrays to allow you to run the same code for many variable pairs by indexing into the arrays.

 

You can then use a DO loop to iterate an index variable over the number of variables in the array and do what ever calculations you need.

 

It looks like you are trying to HIDE the values of the NEW list when the values in the OLD list are less than 5. 

 

So you might do something like this to replace such values with special missing .B (B for blinded).

data want;
  set have;
  array old &old;
  array new &new;
  do index=1 to dim(old);
      if 0 < old[index] < 5 then new[index]=.B ;
  end;
run;

 

NewUsrStat
Pyrite | Level 9
Unfortunately it does not work. The output of proc sql is empty
PaigeMiller
Diamond | Level 26

Never say "it doesn't work" and provide no further information. We don't know what you did; we don't know what happened.

 

Always show us the code you used. Always show us (a portion of) the data set as WORKING data step code (examples and instructions). If there are errors in the log, provide the ENTIRE log for this PROC, not just the error messages.

--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 945 views
  • 1 like
  • 3 in conversation