BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wajmsu
Obsidian | Level 7

Hi,

I have a dataset about the blood sample withdrawal with more information than I need.  I want to reduce the duplicates and change the way data are presented.  Here is an example of the format I have data in (table 1) below and I want to change it as is shown in table 2 (both tables are not real datasets).

 

Thanks for your guidance!

Table 1: Existing format

subject_idlabelsample_addresssample_iddatesubj_datevisitsample type
122227/23/20042 0407231Plasma
122q47/23/20042 0407231Plasma
122e62/10/20052 0502102Plasma
122D,1182/10/20052 0502102Plasma
122s107/23/20042 0407231Plasma
256c204/3/20076 0704031Plasma
256x144/3/20076 0704031Plasma
256D,111612/10/20076 0712102Plasma
256y1812/10/20076 0712102Plasma

 

Table 2: The format I need

subject_idlabeldate of 1st sampledate of 2nd samplesamp_typ
1227/23/20042/10/2005Plasma
2564/3/200712/10/2007Plasma
     
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

What are your key variables that define a duplicate? It looks like ID, Label and Type?

 

So your proc sort would be something like:

 

proc sort data=have nodupkey out=deduped;
by id label type Date;
run;


*then run a proc transpose;

proc transpose data=deduped out=want prefix=date;
by id label type;
var date;
run;

View solution in original post

5 REPLIES 5
Reeza
Super User

Is 2 the maximum number of duplicates? Or will there be an indeterminate number of samples? 

 

You could do a proc sort, with the nodupkey option, and a proc transpose. 

wajmsu
Obsidian | Level 7

No, there are more than 2 duplicates.  Here is some problem.  The whole row is not duplicate; as you can see the column C and D are different.  I do not want them.  I want those variables which I showed in table 2. 

I tried to use sort procedure with no duplicates but it does not remove the duplicates, probably these are not 'real' duplicates as I mentioned above that one or two columns are different.

 

Reeza
Super User

What are your key variables that define a duplicate? It looks like ID, Label and Type?

 

So your proc sort would be something like:

 

proc sort data=have nodupkey out=deduped;
by id label type Date;
run;


*then run a proc transpose;

proc transpose data=deduped out=want prefix=date;
by id label type;
var date;
run;
wajmsu
Obsidian | Level 7

Hi Reeza,

I could reorganized the dat variables.  However, when I looked at the data found some IDs were with both dates and some were with one date.  How can I remove the IDs with one date?

 

thanks

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Would not something along the lines of (and no test data provided in a useable format to test this - i.e. a datastep so I don't have to type all that in) this work:

data want (keep=subject_id label date1 date2 samp_type);
  set have;
  by subject_id date;
  retain cnt date1 date2;
  if first.subject_id then do;
    cnt=1;
    date1=date;
  end;
  else if cnt=1 then do;
    cnt=2;
    date2=date;
  end;
  if last.subject_id then output;
  format date1 date2 date9.;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1765 views
  • 1 like
  • 3 in conversation