- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dear all,
if I have a dataset that looks like this:
data have;
infile datalines;
input var1 $ var2 $ var3 $ var4 $ ;
datalines;
Y Y N N
N Y N Y
N N N N
N Y Y Y
N Y N Y
Y Y Y Y
;
run;
I want to select only the N values of each var. Can anyone help?
This is how the result should look like
data want;
infile datalines truncover;
input var1 $ var2 $ var3 $ var4 $ ;
datalines;
N N
N N
N N N N
N
N N
;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try this
data have;
infile datalines;
input var1 $ var2 $ var3 $ var4 $ ;
datalines;
Y Y N N
N Y N Y
N N N N
N Y Y Y
N Y N Y
Y Y Y Y
;
run;
data want;
set have;
array v var:;
do over v;
if v = 'Y' then v = '';
end;
run;
Result:
var1 var2 var3 var4 N N N N N N N N N N N
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@PeterClemmensen , thanks for the quick reply. I have a question does this select all variables whose corresponding to values of N or does this just delect the value Y. I want to know this because I have over 1000 variables in the dataset and I just need only the values of all variables having var1-var4 = N
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Anita_n This code goes through all variables with the prefix 'var' and sets the value to ' ' when the original value is 'Y'.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@PeterClemmensen okay, I tested that and realise it just delects Y but I want it to select N and its corresponding values like for example if this were to be my data have
data have;
infile datalines;
input var1 $ num1 var2 $ num2 var3 $ num3 var4 $ num4;
datalines;
Y 1 Y 7 N 1 N 8
N 2 Y 2 N 2 Y 7
N 3 N 1 N 5 N 6
N 4 Y 2 Y 5 Y 7
N 5 Y 3 N 4 Y 4
Y 6 Y 4 Y 3 Y 9
;
run;
My data want should be like this:
data want;
infile datalines truncover;
input var1 $ num1 var2 $ num2 var3 $ num3 var4 $ num4;
datalines;
N 1 N 8
N 2 N 2
N 3 N 1 N 5 N 6
N 4
N 5 N 4
;
run;
ed
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Anita_n wrote:
@PeterClemmensen okay, I tested that and realise it just delects Y but I want it to select N and its corresponding values like for example if this were to be my data have
If a variable is in a SAS data set at all then there will be a value for it, which means missing with what you show for a "want".
The variable exists so is must have something.
It is much easier to set to missing for the unwanted than to attempt to "keep" the desired values.
data want; set have; array v (*) var1- var4; array nn (*) num1- num4; do i=1 to dim(v); if v[i]='Y' then call missing(v[i],nn[i]); end; drop i; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is there no way to do that using proc sql. I don't know if arrays can be used in proc sql. I know using OR or AND doesn't work but aybe there is a specific way doing that
proc sql;
create table want as select * from have where var1="N" or var2="N" or var3="N" and var4="N";
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Anita_n wrote:
Is there no way to do that using proc sql. I don't know if arrays can be used in proc sql. I know using OR or AND doesn't work but aybe there is a specific way doing that
proc sql; create table want as select * from have where var1="N" or var2="N" or var3="N" and var4="N"; quit;
SQL does not support arrays.
You would need to write 8 Case statements, one for each variable involved, testing the Var variable to set the value of the Num variable.
If you had a nice variable for Joining on you might get away with 4 bits like (select id, var1,num1 from set where Var1='N').
Multiple variables requiring similar treatment easily is not an SQL strength.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
okay, thankyou
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
May I ask what the actual application is?
I hope that the real data has som kind of row identifier.
If, so you could transpose the data, and then jsut do the transformation on one column, or just add a format that hides the "Y"
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@LinusH Is all about patients who received certain treatments on each visit (let's say visits 1 to 10), But there a certain patients who took part in a clinical trial who should be excluded from this project. Now for each visit there is a variable which indicate if they took part in a clinical trial at that specific visit (if yes then the variable= Y and if no then it's =N)
That was what I tried to depict above with var1 to var4 . Now I need to select all patients who did not take take in a clinical trial for all visits 1 to 10 (that means var1 to var10 has to be N, there shouldn't be any Y's in the dataset)
All this selection should include all data of the patients like the id, age, sex, diagnosis date, deathdate, type of treatment etc.....
I did this with sas macro that looks like this and later merged them:
%macro test (num);
proc sql,
create table want&num as select id, therapy&num, visit&num, clintrial&num where clintrial&num="N";
quit;
%mend;
%test(1); %test(2); %test(3); %test(4); %test(5);%test(6); %test(7); %test(8); %test(9); %test(10);*merge all 10 datasets;data want,merge want:;by id;run;
My problem here is I can't select all variables, since if I do that for test1 for example then I will have test2 to test10 with undesired values in there. So I was thinking there is a better way doing that. But it seems to be complicated
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not sure if removing the value "Y" will help you with this logic (if understand your use case correcrtly).
If your flags were 0/1 instead a simple
if sum(of var1-var10) = 0 then delete;
suffice.
I guess array as described earlier in the thread is a valid way of doing it, but just add some logic to delete the record if all elements are "N".