Solved: If column name matches data in other column map 1 else map 0

Shone · Posted 12-05-2018 07:05 PM

Hi, I have a dataset that looks like shown below.

I want to put 1 if name of colume (data1 data2 and data3 is 4 column in the example (col1, col2, col3,col4) if column doesn't match i want to put 0

Dataset I have:

id	data1	data2	data3	col1	col2	col3	col4	fname	lname
100				data1		data2	data3	apple	asdf
200					data2			ball	asdf
300					data1			cat	asdf
400						data2		data	asdf
500								data	asdf
600					data1		data3	asdfasd	asdf
700								asdfasd	asdf

Result I want:

id	data1	data2	data3	col1	col2	col3	col4	fname	lname
100	1	1	1	data1		data2	data3	apple	asdf
200	0	1	0		data2			ball	asdf
300	1	0	0		data1			cat	asdf
400	0	1	0			data2		data	asdf
500	0	0	0					data	asdf
600	1	0	1		data1		data3	asdfasd	asdf
700	0	1	1			data2	data3	asdfasd	asdf

How can I compare few columns with data in another columns? This is a sample so data1. data2, data3 could be in 100s and col1 and col2 can also be in 100s. I am looking into millions of rows of data. Any help or suggestion would be really great.

Thank you,

Shone

ballardw · Posted 12-05-2018 07:46 PM

I think this may get you started:

data have;
   input id data1 data2 data3 col1 $ col2 $ col3 $;
datalines ;
100 . . . data1 . data2
200 . . . .  data2 .
300 . . . .  data1 .
400 . . . data1 . data3
;
run;

data want;
   set have;
   array c  col1-col3;
   array d data1-data3;
   length v $ 32;
   do i= 1 to dim(d);
      v= vname(d[i]);
      d[i] = whichc(v, of c(*))>0;
   end;
   drop v i;
run;

You may have to play some games with the CASE of the variable v and the col values to ensure comparsion as if your data may have

"Data1" that does not match "data1" so you may have to ensure that all of your Col variables are all lower case or uppercase and that the V is set the same.

View solution in original post

ballardw · Posted 12-05-2018 07:46 PM

I think this may get you started:

data have;
   input id data1 data2 data3 col1 $ col2 $ col3 $;
datalines ;
100 . . . data1 . data2
200 . . . .  data2 .
300 . . . .  data1 .
400 . . . data1 . data3
;
run;

data want;
   set have;
   array c  col1-col3;
   array d data1-data3;
   length v $ 32;
   do i= 1 to dim(d);
      v= vname(d[i]);
      d[i] = whichc(v, of c(*))>0;
   end;
   drop v i;
run;

You may have to play some games with the CASE of the variable v and the col values to ensure comparsion as if your data may have

"Data1" that does not match "data1" so you may have to ensure that all of your Col variables are all lower case or uppercase and that the V is set the same.

Shone · Posted 12-05-2018 08:09 PM

Thank you so much. What does length v $ 32; do?

Thank you,

Shone

Shone · Posted 12-06-2018 10:54 AM

How can I set those 0s and 1 be set as number instead of character? Thank you!!

ballardw · Posted 12-06-2018 12:34 PM

@Shone wrote:

How can I set those 0s and 1 be set as number instead of character? Thank you!!

In the code I wrote and run then the data1 - date3 values are numeric. You would have to show your actual data and code to see why you are getting character values. Likely your initial variable data1-data3 are character.

The Length v $ 32; sets the maximum length of the variable to hold a single variable name to 32 characters, the maximum possible currently in SAS. The function VNAME would assign a length of 200 by default. Which if you keep the value, such as during debugging, can mean much wasted space in whatever tool is used to examine the values.

Generally it is a good idea to set lengths of character variables as the default assignments could lead to truncation. Examine the code below. Determine what you think the final value of X is going to be. Then run the code and view or print the result. Does X meet that expected value?

data example;
   y = 3;
   x = 'abc';
   if y=3 then x='Should be 3';
run;

Reeza · Posted 12-05-2018 08:00 PM

Fully dynamic approach that doesn't require you to know anything about the data beyond the variable names.

data have;
    input id data1 data2 data3 col1 $ col2 $ col3 $;
    datalines;
100 . . . data1 . data2
200 . . . .  data2 .
300 . . . .  data1 .
400 . . . data1 . data3
;
run;

*flip to long;
proc transpose data=have (Drop=data1-data3) out=long;
    by Id;
    var col1-col3;
run;

*remove missing and add 1;
data long2;
    set long (where=(not missing(col1)));
    val=1;
run;

proc sort data=long2;
    by id col1;
run;

proc transpose data=long2 out=wide;
    by id;
    var val;
    id col1;
run;

@Shone wrote:

Hi, I have a dataset that looks like shown below.

I want to put 1 if name of colume (data1 data2 and data3 is 4 column in the example (col1, col2, col3,col4) if column doesn't match i want to put 0

Dataset I have:

id data1 data2 data3 col1 col2 col3 col4 fname lname

100 data1 data2 data3 apple asdf

200 data2 ball asdf

300 data1 cat asdf

400 data2 data asdf

500 data asdf

600 data1 data3 asdfasd asdf

700 asdfasd asdf

Result I want:

id data1 data2 data3 col1 col2 col3 col4 fname lname

100 1 1 1 data1 data2 data3 apple asdf

200 0 1 0 data2 ball asdf

300 1 0 0 data1 cat asdf

400 0 1 0 data2 data asdf

500 0 0 0 data asdf

600 1 0 1 data1 data3 asdfasd asdf

700 0 1 1 data2 data3 asdfasd asdf

How can I compare few columns with data in another columns? This is a sample so data1. data2, data3 could be in 100s and col1 and col2 can also be in 100s. I am looking into millions of rows of data. Any help or suggestion would be really great.

Thank you,

Shone

If column name matches data in other column map 1 else map 0

Re: If column name matches data in other column map 1 else map 0

Re: If column name matches data in other column map 1 else map 0

Re: If column name matches data in other column map 1 else map 0

Re: If column name matches data in other column map 1 else map 0

Re: If column name matches data in other column map 1 else map 0

Re: If column name matches data in other column map 1 else map 0

If column name matches data in other column map 1 else map 0

Re: If column name matches data in other column map 1 else map 0

Re: If column name matches data in other column map 1 else map 0

Re: If column name matches data in other column map 1 else map 0

Re: If column name matches data in other column map 1 else map 0

Re: If column name matches data in other column map 1 else map 0

Re: If column name matches data in other column map 1 else map 0

Registration is open