Hi,
I'm looking for a way to compare data between rows, and only show the distinct values, the idea is to create a repot that shows the all the possible contact numbers. I have the data from multiple sources and the same number is used as another contact number. The idea is to have all the unique contact numbers.
Is there a way to achieve this?, any help would be appreciated
Example of the data:
id | ph1 | ph2 | ph3 | ph4 | ph5 |
---|---|---|---|---|---|
1 | 111 | 111 | |||
1 | 222 | ||||
1 | 222 | ||||
2 | 333 | ||||
2 | 444 | 555 | |||
2 | 555 | ||||
3 | 666 | ||||
3 | 777 | ||||
3 | 888 | 888 |
Desired Output
id | ph1 | ph2 | ph3 | ph4 | ph5 |
---|---|---|---|---|---|
1 | 111 | 222 | |||
2 | 444 | 333 | 555 | ||
3 | 666 | 888 | 777 |
Mmm, seems to be a lot of these questions at the moment. You are struggling due to the nature of the thinking. The data does not need to be transposed in this way, you will find life much easier in a normalised table. Consider this, the second datastep creates a normalised structure with the results going down the page. This can then easily be nodupkey sorted to give the unique values.
data have;
retain id ph1-ph5;
id=1;
ph1=111;
ph4=111;
output;
ph1=.;
ph4=.;
ph3=222;
output;
ph2=222;
output;
run;
data want (keep=id res);
set have;
array ph{5};
do i=1 to 5;
if ph{i} ne . then do;
res=ph{i};
output;
end;
end;
run;
proc sort data=want nodupkey;
by id res;
run;
Hi, thank you for your response, I had that though and I was able to convert the data to use the same column for the phone number, and another column to display the type of phone number:
Example:
id | number | type |
---|---|---|
1 | 111 | ph1 |
1 | 222 | ph2 |
1 | 111 | ph5 |
2 | 333 | ph3 |
2 | 444 | ph4 |
My problem here is removing the duplicate numbers, and since every ID has at least 3 numbers, I have a lot of data. The result was would not fit on an excel spreadsheet.
Is there a way to "clean" the data in this format?
Why does the type matter? In your example in the first post you take 222 from ph3, why not from ph2? Doesn't seem to make sense to me. You basically just want to find a list of unique numbers per id, so drop type and nodupkey sort it, i.e. id=1 only has 2, 1 is 111, the other 222.
On the second point, you don't really want to use Excel, I mean what reason would you have. PDF for output, CSV or XML for transfer.
I'm being asked to display the type of contact information, for example if it is the house number, a cell, fax, etc, etc.
If there is a duplicate number, I should take just one, it does not matter witch type I choose.
So, just proc sort, nodupkey, by id and number. This will give you 1 row per unique id+number, with a ype (probably the first which appears in the dataset.
Thank you, I'm generating the information, so I can try what you suggested. I will let you know of the results.
Using the proc sort nodupkey by id and number, i was able to get 1 row, per id and number, Thanks!
I'm still removing numbers the users type just to fill in the fields, then I will need to transpose the result.
Thank you for your help!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.