Hi.
I need to remove repeating data that are being generated to fill in blanks in a data set. I have several numeric vars and characters that correlate to each X coded (using only a few in the example). Vars D and E below are the ones filling in blanks. The numeric vars are not in any order, including var X. So some ids might have X going from 1 to 10 others only 1 and 2, others 10 and 11 etc X is 1 to to N. Dates repeat and it is okay to repeat, as I need the dates to be linked to each of these variables that are linked to one another. But D and E should be "cleaned" and be only in the first record of each ID. This does not work as I need to keep the repeated Ids.
if not first.id then call missing (D, E);
Is there a way to remove the duplicates in this case? Thank you!
Example
ID X A B C D E
1 3 date3 1 ‘character varB1’ 101 ‘character not linked to varB’
1 3 date 3 2 ‘character varB2’ 101 ‘character not linked to varB’
1 10 date10 5 ‘character varB5’ 300 ‘character not linked to varB’
2 1 date1 13 ‘character var13’ 500 ‘character not linked to varB’
2 1 date1 10 ‘character varB10’ 500 ‘character not linked to varB’
2 2 date2 10 ‘character varB10’ 700 ‘character not linked to varB’
2 2 date2 1 ‘character varB1’ 700 ‘character not linked to varB’
3 10 date10 5 ‘character varB5’ 400 ‘character not linked to varB’
3 11 date11 6 ‘character varB6’ 600 ‘character not linked to varB’
3 11 date11 3 ‘character varB3’ 600 ‘character not linked to varB’
Should look like:
ID X A B C D E
1 3 date3 1 ‘charactervarB1’ 101 ‘character not linked to varB’
1 3 date 3 2 ‘character varB2’
1 10 date10 5 ‘character varB5’ 300 ‘character not linked to varB’
2 1 date1 13 ‘character varB13’ 500 ‘character not linked to varB’
2 1 date1 10 ‘character varB10’
2 2 date2 10 ‘character varB10’ 700 ‘character not linked to varB’
2 2 date2 1 ‘character varB1’
3 10 date10 5 ‘character varB5’ 400 ‘character not linked to varB’
3 11 date11 6 'character varB6’ 600 ‘character not linked to varB’
3 11 date11 3 'character varB3’
It appears that you want to use A not ID for identifying duplicates.
BY ID A;
if not first.A then call missing(D, E);
maybe. You don't show actual values for date an may not work.
Without an actual data set can't test.
Thank you. I can't use A, as I need all the dates, even the repeating ones.
When I use A all goes to missing except the very first row:
ID X A B C D E
1 1 2/25/2004 2 ‘yellow’ 100 ‘long text1’
1 1 2/25/2004 14 ‘black’ 100 ‘long text1’
1 2 2/26/2004 2 ‘green’ 300 ‘longtext2’
1 2 2/26/2004 13 ‘white’ 300 ‘longtext2’
1 2 2/26/2004 14 ‘blue’ 300 ‘longtext2’
@Mscarboncopy wrote:
When I use A all goes to missing except the very first row:
ID X A B C D E
1 1 2/25/2004 2 ‘yellow’ 100 ‘long text1’
1 1 2/25/2004 14 ‘black’ 100 ‘long text1’
1 2 2/26/2004 2 ‘green’ 300 ‘longtext2’
1 2 2/26/2004 13 ‘white’ 300 ‘longtext2’
1 2 2/26/2004 14 ‘blue’ 300 ‘longtext2’
IF that is output you need to show the corresponding INPUT and the actual code you ran, not one line of code but the entire datastep. Your example here isn't very good at showing "all goes to missing".
I don't have an input in my sas code, as I am using a very long file that is given to me in SPSS format which I transform it to a sas file. What I showed previously is my input for ID 1. I am trying to have the following output:
ID X A B C D E
1 1 2/25/2004 2 ‘yellow’ 100 ‘long text1’
1 1 2/25/2004 14 ‘black’
1 2 2/26/2004 2 ‘green’ 300 ‘longtext2’
1 2 2/26/2004 13 ‘white’
1 2 2/26/2004 14 ‘blue’
Thank you.
@Mscarboncopy wrote:
I don't have an input in my sas code, as I am using a very long file that is given to me in SPSS format which I transform it to a sas file. What I showed previously is my input for ID 1. I am trying to have the following output:
ID X A B C D E
1 1 2/25/2004 2 ‘yellow’ 100 ‘long text1’
1 1 2/25/2004 14 ‘black’
1 2 2/26/2004 2 ‘green’ 300 ‘longtext2’
1 2 2/26/2004 13 ‘white’
1 2 2/26/2004 14 ‘blue’
Thank you.
And the code attempted? The entire data step please.
I used your suggestion but made a slight mistake. When Reeza mentioned the output would have been correct, I went back and found my mistake. I answered above with my apologies. Your suggestion worked perfectly. It is the solution. Thank you.
Then you aren't using BallardW code, I suspect you're still using If First.A
if not first.A then call missing(D, E);
Thank you Reeza. My apologies to both of you, as I made a mistake when I tried BallardW code. it worked. That is what I needed.
Thank you so much. I tried your suggestion but made a slight mistake at first.
It works perfectly.
Thank you. Yes, that would be another good way to go about it. I was under the impression that the data set could be cleaned, but it is okay if it is a limitation. I can definitely create reports that would take care of that as you suggested. Good point. Much appreciated.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.