Hi, I have a dataset that looks like shown below.
I want to put 1 if name of colume (data1 data2 and data3 is 4 column in the example (col1, col2, col3,col4) if column doesn't match i want to put 0
Dataset I have:
id | data1 | data2 | data3 | col1 | col2 | col3 | col4 | fname | lname |
100 | data1 | data2 | data3 | apple | asdf | ||||
200 | data2 | ball | asdf | ||||||
300 | data1 | cat | asdf | ||||||
400 | data2 | data | asdf | ||||||
500 | data | asdf | |||||||
600 | data1 | data3 | asdfasd | asdf | |||||
700 | asdfasd | asdf |
Result I want:
id | data1 | data2 | data3 | col1 | col2 | col3 | col4 | fname | lname |
100 | 1 | 1 | 1 | data1 | data2 | data3 | apple | asdf | |
200 | 0 | 1 | 0 | data2 | ball | asdf | |||
300 | 1 | 0 | 0 | data1 | cat | asdf | |||
400 | 0 | 1 | 0 | data2 | data | asdf | |||
500 | 0 | 0 | 0 | data | asdf | ||||
600 | 1 | 0 | 1 | data1 | data3 | asdfasd | asdf | ||
700 | 0 | 1 | 1 | data2 | data3 | asdfasd | asdf |
How can I compare few columns with data in another columns? This is a sample so data1. data2, data3 could be in 100s and col1 and col2 can also be in 100s. I am looking into millions of rows of data. Any help or suggestion would be really great.
Thank you,
Shone
I think this may get you started:
data have; input id data1 data2 data3 col1 $ col2 $ col3 $; datalines ; 100 . . . data1 . data2 200 . . . . data2 . 300 . . . . data1 . 400 . . . data1 . data3 ; run; data want; set have; array c col1-col3; array d data1-data3; length v $ 32; do i= 1 to dim(d); v= vname(d[i]); d[i] = whichc(v, of c(*))>0; end; drop v i; run;
You may have to play some games with the CASE of the variable v and the col values to ensure comparsion as if your data may have
"Data1" that does not match "data1" so you may have to ensure that all of your Col variables are all lower case or uppercase and that the V is set the same.
I think this may get you started:
data have; input id data1 data2 data3 col1 $ col2 $ col3 $; datalines ; 100 . . . data1 . data2 200 . . . . data2 . 300 . . . . data1 . 400 . . . data1 . data3 ; run; data want; set have; array c col1-col3; array d data1-data3; length v $ 32; do i= 1 to dim(d); v= vname(d[i]); d[i] = whichc(v, of c(*))>0; end; drop v i; run;
You may have to play some games with the CASE of the variable v and the col values to ensure comparsion as if your data may have
"Data1" that does not match "data1" so you may have to ensure that all of your Col variables are all lower case or uppercase and that the V is set the same.
Thank you so much. What does length v $ 32; do?
Thank you,
Shone
How can I set those 0s and 1 be set as number instead of character? Thank you!!
@Shone wrote:
How can I set those 0s and 1 be set as number instead of character? Thank you!!
In the code I wrote and run then the data1 - date3 values are numeric. You would have to show your actual data and code to see why you are getting character values. Likely your initial variable data1-data3 are character.
The Length v $ 32; sets the maximum length of the variable to hold a single variable name to 32 characters, the maximum possible currently in SAS. The function VNAME would assign a length of 200 by default. Which if you keep the value, such as during debugging, can mean much wasted space in whatever tool is used to examine the values.
Generally it is a good idea to set lengths of character variables as the default assignments could lead to truncation. Examine the code below. Determine what you think the final value of X is going to be. Then run the code and view or print the result. Does X meet that expected value?
data example; y = 3; x = 'abc'; if y=3 then x='Should be 3'; run;
Fully dynamic approach that doesn't require you to know anything about the data beyond the variable names.
data have;
input id data1 data2 data3 col1 $ col2 $ col3 $;
datalines;
100 . . . data1 . data2
200 . . . . data2 .
300 . . . . data1 .
400 . . . data1 . data3
;
run;
*flip to long;
proc transpose data=have (Drop=data1-data3) out=long;
by Id;
var col1-col3;
run;
*remove missing and add 1;
data long2;
set long (where=(not missing(col1)));
val=1;
run;
proc sort data=long2;
by id col1;
run;
proc transpose data=long2 out=wide;
by id;
var val;
id col1;
run;
@Shone wrote:
Hi, I have a dataset that looks like shown below.
I want to put 1 if name of colume (data1 data2 and data3 is 4 column in the example (col1, col2, col3,col4) if column doesn't match i want to put 0
Dataset I have:
id data1 data2 data3 col1 col2 col3 col4 fname lname 100 data1 data2 data3 apple asdf 200 data2 ball asdf 300 data1 cat asdf 400 data2 data asdf 500 data asdf 600 data1 data3 asdfasd asdf 700 asdfasd asdf
Result I want:
id data1 data2 data3 col1 col2 col3 col4 fname lname 100 1 1 1 data1 data2 data3 apple asdf 200 0 1 0 data2 ball asdf 300 1 0 0 data1 cat asdf 400 0 1 0 data2 data asdf 500 0 0 0 data asdf 600 1 0 1 data1 data3 asdfasd asdf 700 0 1 1 data2 data3 asdfasd asdf
How can I compare few columns with data in another columns? This is a sample so data1. data2, data3 could be in 100s and col1 and col2 can also be in 100s. I am looking into millions of rows of data. Any help or suggestion would be really great.
Thank you,
Shone
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.