- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Am in middle of some project and am trying to find a logic for the below. Please advise.
Dataset
col1 col2 col3
123
123 001
123 001
456
456 101
123
123 033
123 101
456
456 001
456 001
there are set of numbers in col1 and some 3 dig codes in col2 which starts with a blank value. I need the unique set of codes(col2) which are either repetitive or non repetitve. all the unique sets needs to be printed either into new variable or a new dataset. Please advise.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Not sure I am quite following you. Numbers cannot have a blank value? For duplicates, use:
proc sort data=have out=want (keep=col2) nodupkey;
by col2;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please provide what you would expect the output to look like as there are several ways to interpret your written requirement.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
The dataset I have looks like:
Number | Set_of_codes |
123456 | . |
123456 | 001 |
123456 | 001 |
456789 | . |
456789 | 022 |
456789 | 001 |
456789 | 339 |
456789 | 010 |
456789 | 022 |
456789 | 110 |
456789 | 110 |
789102 | . |
789102 | 111 |
789102 | 110 |
789102 | 022 |
123456 | . |
123456 | 022 |
123456 | 001 |
123456 | 339 |
123456 | 010 |
123456 | 022 |
123456 | 110 |
123456 | 110 |
456789 | . |
456789 | 001 |
456789 | 001 |
Expected Output is :
Number | Set_of_codes | unique_set_of_codes | Count |
123456 | . | . | 3 |
123456 | 001 | 001 | |
123456 | 001 | 001 | |
456789 | . | . | |
456789 | 022 | 022 | |
456789 | 001 | 001 | |
456789 | 339 | 339 | |
456789 | 010 | 010 | |
456789 | 022 | 022 | |
456789 | 110 | 110 | |
456789 | 110 | 110 | |
789102 | . | . | |
789102 | 111 | 111 | |
789102 | 110 | 110 | |
789102 | 022 | 022 | |
123456 | . | ||
123456 | 022 | ||
123456 | 001 | ||
123456 | 339 | ||
123456 | 010 | ||
123456 | 022 | ||
123456 | 110 | ||
123456 | 110 | ||
456789 | . | ||
456789 | 001 | ||
456789 | 001 |
Every sequence of code starts with a blank. Need to identify such unique sets based on card numbers in col1 and the unique set count.
Many thanks in advance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In addition to RW9's response using the nodupkey in the sort procedure you can also use the select distinct in a sql procedure:
proc sql;
create table want as
select unique col2
from have;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
sorry for the confusion, Below is the detailed explanation:
Number | Set_of_codes |
123456 | . |
123456 | 001 |
123456 | 022 |
456789 | . |
456789 | 022 |
456789 | 022 |
456789 | 339 |
456789 | 068 |
456789 | 110 |
456789 | 001 |
123456 | . |
123456 | 010 |
123456 | 022 |
123456 | 339 |
123456 | 068 |
3598754 | . |
3598754 | 001 |
3598754 | 022 |
456789 | . |
456789 | 339 |
3598754 | . |
3598754 | 339 |
Above table contains number and some set of codes(starts with missing values). one set of code is for one card number (number). I need to find the unique set of codes in that column (set of codes) as below:
expected:
unique_set_of_codes | Unique_Count |
. | 4 |
001 | |
022 | |
. | |
022 | |
022 | |
339 | |
068 | |
110 | |
001 | |
. | |
010 | |
022 | |
339 | |
068 | |
. | |
339 |
Thanks in advance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
The dataset that I have looks like:
Number | Set_of_codes |
123456 | . |
123456 | 001 |
123456 | 001 |
456789 | . |
456789 | 022 |
456789 | 001 |
456789 | 339 |
456789 | 010 |
456789 | 022 |
456789 | 110 |
456789 | 110 |
789102 | . |
789102 | 111 |
789102 | 110 |
789102 | 022 |
123456 | . |
123456 | 022 |
123456 | 001 |
123456 | 339 |
123456 | 010 |
123456 | 022 |
123456 | 110 |
123456 | 110 |
456789 | . |
456789 | 001 |
456789 | 001 |
Expected Output is :
Number | Set_of_codes | unique_set_of_codes | Count |
123456 | . | . | 3 |
123456 | 001 | 001 | |
123456 | 001 | 001 | |
456789 | . | . | |
456789 | 022 | 022 | |
456789 | 001 | 001 | |
456789 | 339 | 339 | |
456789 | 010 | 010 | |
456789 | 022 | 022 | |
456789 | 110 | 110 | |
456789 | 110 | 110 | |
789102 | . | . | |
789102 | 111 | 111 | |
789102 | 110 | 110 | |
789102 | 022 | 022 | |
123456 | . | ||
123456 | 022 | ||
123456 | 001 | ||
123456 | 339 | ||
123456 | 010 | ||
123456 | 022 | ||
123456 | 110 | ||
123456 | 110 | ||
456789 | . | ||
456789 | 001 | ||
456789 | 001 |
Every sequence of code starts with a blank. Need to identify such unique sets based on card numbers in col1 and the unique set count.
Note: There might be double missing values in sequence in between the data (col2) instead of single missing value.
Many thanks in advance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your expected output has the same number of rows like your "have" data. The values for "Set_of_codes" are identical to "unique_set_of_codes". I don't understand what you want to do here. I also don't understand what you count of 3 is based on.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I need only col3 and col4 in the output. the count 3 is nothing but the count of unique set of codes.