Re: How to compare values within the variable

don21 · Posted 07-13-2015 05:01 AM

Hello,

Am in middle of some project and am trying to find a logic for the below. Please advise.

Dataset

col1 col2 col3

123

123 001

456

456 101

123

123 033

123 101

456

456 001

there are set of numbers in col1 and some 3 dig codes in col2 which starts with a blank value. I need the unique set of codes(col2) which are either repetitive or non repetitve. all the unique sets needs to be printed either into new variable or a new dataset. Please advise.

RW9 · Posted 07-13-2015 06:48 AM

Hi,

Not sure I am quite following you. Numbers cannot have a blank value? For duplicates, use:

proc sort data=have out=want (keep=col2) nodupkey;

by col2;

run;

ballardw · Posted 07-13-2015 11:41 AM

Please provide what you would expect the output to look like as there are several ways to interpret your written requirement.

don21 · Posted 07-14-2015 06:10 AM

Hello,

The dataset I have looks like:

Number	Set_of_codes
123456	.
123456	001
123456	001
456789	.
456789	022
456789	001
456789	339
456789	010
456789	022
456789	110
456789	110
789102	.
789102	111
789102	110
789102	022
123456	.
123456	022
123456	001
123456	339
123456	010
123456	022
123456	110
123456	110
456789	.
456789	001
456789	001

Expected Output is :

Number	Set_of_codes	unique_set_of_codes	Count
123456	.	.	3
123456	001	001
123456	001	001
456789	.	.
456789	022	022
456789	001	001
456789	339	339
456789	010	010
456789	022	022
456789	110	110
456789	110	110
789102	.	.
789102	111	111
789102	110	110
789102	022	022
123456	.
123456	022
123456	001
123456	339
123456	010
123456	022
123456	110
123456	110
456789	.
456789	001
456789	001

Every sequence of code starts with a blank. Need to identify such unique sets based on card numbers in col1 and the unique set count.

Many thanks in advance.

dcruik · Posted 07-13-2015 11:57 AM

In addition to RW9's response using the nodupkey in the sort procedure you can also use the select distinct in a sql procedure:

proc sql;

create table want as

select unique col2

from have;

quit;

don21 · Posted 07-13-2015 11:19 PM

Hello,

sorry for the confusion, Below is the detailed explanation:

Number	Set_of_codes
123456	.
123456	001
123456	022
456789	.
456789	022
456789	022
456789	339
456789	068
456789	110
456789	001
123456	.
123456	010
123456	022
123456	339
123456	068
3598754	.
3598754	001
3598754	022
456789	.
456789	339
3598754	.
3598754	339

Above table contains number and some set of codes(starts with missing values). one set of code is for one card number (number). I need to find the unique set of codes in that column (set of codes) as below:

expected:

unique_set_of_codes	Unique_Count
.	4
001
022
.
022
022
339
068
110
001
.
010
022
339
068
.
339

Thanks in advance.

don21 · Posted 07-14-2015 06:12 AM

Hello,

The dataset that I have looks like:

Number	Set_of_codes
123456	.
123456	001
123456	001
456789	.
456789	022
456789	001
456789	339
456789	010
456789	022
456789	110
456789	110
789102	.
789102	111
789102	110
789102	022
123456	.
123456	022
123456	001
123456	339
123456	010
123456	022
123456	110
123456	110
456789	.
456789	001
456789	001

Expected Output is :

Number	Set_of_codes	unique_set_of_codes	Count
123456	.	.	3
123456	001	001
123456	001	001
456789	.	.
456789	022	022
456789	001	001
456789	339	339
456789	010	010
456789	022	022
456789	110	110
456789	110	110
789102	.	.
789102	111	111
789102	110	110
789102	022	022
123456	.
123456	022
123456	001
123456	339
123456	010
123456	022
123456	110
123456	110
456789	.
456789	001
456789	001

Every sequence of code starts with a blank. Need to identify such unique sets based on card numbers in col1 and the unique set count.

Note: There might be double missing values in sequence in between the data (col2) instead of single missing value.

Many thanks in advance.

Patrick · Posted 07-14-2015 08:17 AM

Your expected output has the same number of rows like your "have" data. The values for "Set_of_codes" are identical to "unique_set_of_codes". I don't understand what you want to do here. I also don't understand what you count of 3 is based on.

don21 · Posted 07-15-2015 12:06 AM

I need only col3 and col4 in the output. the count 3 is nothing but the count of unique set of codes.

SAS Training: Just a Click Away