<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sorting observations with duplicate values into various bins in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Sorting-observations-with-duplicate-values-into-various-bins/m-p/580916#M165055</link>
    <description>&lt;P&gt;Not sure if you can do it in one step, but I tried this and it worked.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Only goes to one bin but it is a start&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SORT data=have;
  BY s_id mylastname descending c_id;
RUN;

DATA have0;  
  SET have;
  BY s_id mylastname descending c_id;
  RETAIN c_id1;
  IF first.mylastname THEN c_id1="   ";
  IF c_id ne "" THEN c_id1=c_id;
RUN;

PROC SORT data=have0;
  BY s_id c_id1 mylastname;
RUN;

DATA have1;
  SET have0;
  WHERE c_id1 ne "";
  BY s_id c_id1;
  IF first.c_id1;
RUN;

DATA have2;
  SET have0;
  WHERE c_id1="";
  BY s_id c_id1 mylastname;
  IF first.mylastname;
RUN;

DATA want;
  SET have1 have2;
RUN;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Tue, 13 Aug 2019 18:01:50 GMT</pubDate>
    <dc:creator>SwissC</dc:creator>
    <dc:date>2019-08-13T18:01:50Z</dc:date>
    <item>
      <title>Sorting observations with duplicate values into various bins</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-observations-with-duplicate-values-into-various-bins/m-p/580879#M165045</link>
      <description>&lt;P&gt;I have a data set with a number of observations that have duplicate unique values. Yes, I know- duplicate uniques.&amp;nbsp; Lol. The entire data set is pairs of records with duplicate s_ids. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Anyway, I need to sort the out the duplicates in such a way that I can match them to another data set.&amp;nbsp; Nice little separate bins would help my brain. The relevant fields are an s_id, a c_id and name as shown below:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data have;
input s_id c_id $ mylastname $ want $ record_num;
datalines;
111	Q11	JONES	Y	1
111	Q11	JONES	N	2
222	Q21	DOE	Y	3
222	Q21	SMITH	N	4
333	Q31	FRANKLIN	Y	5
333	Q32	MADISON	Y	6
444	Q41	POTTER	Y	7
444	Q42	POTTER	Y	8
555	.	CLYDE	Y	9
555	.	CLYDE	N	10
666	.	PITT	Y	11
666	.	JOLIE	Y	12
777	Q71	LIBERTY	Y	13
777	.	LIBERTY	N	14
888	.	TIMON	Y	15
888	Q81	PUUMBA	Y	16
;
run;
&lt;/PRE&gt;
&lt;P&gt;For this purpose, i have added the fields WANT and RECORD_NUM.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Walking through the pairs of records:&lt;/P&gt;
&lt;P&gt;I want record 1 but not record 2.&amp;nbsp; Those two records have the same s_id, the same c_id, and the same lastname.&amp;nbsp; I just need one of them.&amp;nbsp; I'll match this bin on s_id.&lt;/P&gt;
&lt;P&gt;I want record 3 but not record 4. &amp;nbsp;Those two records have the same s_id and the same c_id.&amp;nbsp; c_id trumps lastname. I'll match on this bin c_id.&lt;/P&gt;
&lt;P&gt;I want 5, 6, 7 and 8.&amp;nbsp; c_id's are all different and trump whatever is going on in last name.&amp;nbsp; I'll also match this bin on c_id.&lt;/P&gt;
&lt;P&gt;I want 9 but not 10.&amp;nbsp; No c_id to work with but the same lastname.&amp;nbsp; We'll skip 10. I'll match this on s_id.&lt;/P&gt;
&lt;P&gt;I'd like both 11 and 12.&amp;nbsp; Again no c_id to work with but different last names.&amp;nbsp; I might match on an s_id-name key. Don't know yet.&lt;/P&gt;
&lt;P&gt;I'd like 13 and not 14. They are likely the same person but 13 has a c_id and is a more complete record.&lt;/P&gt;
&lt;P&gt;I'd like 15.&amp;nbsp; Match on s_id.&lt;/P&gt;
&lt;P&gt;I'd like 16. &amp;nbsp; Match on c_id.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I decided I could sort the records and maybe use the properties of first. and last. to bin them out.&amp;nbsp; But I fell down doing it.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;proc sort data = have;
	by s_id c_id mylastname;
run;


data values;
	set have;
	by s_id c_id mylastname;
	uidf = first.s_id;
	uidl = last.s_id;
	cidf = first.c_id; 
	cidl = last.c_id; 
	namef = first.mylastname;
	namel = last.mylastname;
run;
&lt;/PRE&gt;
&lt;P&gt;Dataset VALUES (made above, shown below) shows what I am trying to say:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;s_id	c_id	mylastname	want	record_num	uidf	uidl	cidf	cidl	namef	namel
111	Q11	JONES		Y	1		1	0	1	0	1	0
111	Q11	JONES		N	2		0	1	0	1	0	1
222	Q21	DOE		Y	3		1	0	1	0	1	1
222	Q21	SMITH		N	4		0	1	0	1	1	1
333	Q31	FRANKLIN	Y	5		1	0	1	1	1	1
333	Q32	MADISON		Y	6		0	1	1	1	1	1
444	Q41	POTTER		Y	7		1	0	1	1	1	1
444	Q42	POTTER		Y	8		0	1	1	1	1	1
555		CLYDE		Y	9		1	0	1	0	1	0
555		CLYDE		N	10		0	1	0	1	0	1
666		JOLIE		Y	12		1	0	1	0	1	1
666		PITT		Y	11		0	1	0	1	1	1
777		LIBERTY		N	14		1	0	1	1	1	1
777	Q71	LIBERTY		Y	13		0	1	1	1	1	1
888		TIMON		Y	15		1	0	1	1	1	1
888	Q81	PUUMBA		Y	16		0	1	1	1	1	1
&lt;/PRE&gt;
&lt;P&gt;I then thought, okay i can pick up the 1,0 1,0 records and the 1,0 1,1 records and skip the 1,0 0,1 records etc .&amp;nbsp; So I started down that road:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data jones_doe potter_franklin_madison clyde;
	set have;
		by s_id c_id mylastname;

		*grabbing the first record of a pair where s_id and c_id are the same;
		if 
			not missing(c_id)
			and first.s_id = 1 
			and last.s_id = 0 
			and first.c_id = 1 
			and last.c_id = 0 
				then output jones_doe;
				* gets 1,3  leaves 2,4   match on s_id;

		* grabbing both records of a pair if s_id is the same and c_id is different;
		if 
			(not missing(c_id)
			and	first.s_id = 1 
			and last.s_id = 0 
			and first.c_id = 1 
			and last.c_id = 1) 
			or
			(not missing(c_id)
			and first.s_id = 0 
			and last.s_id = 1 
			and first.c_id = 1 
			and last.c_id = 1) 
				then output potter_franklin_madison;
				* gets 5,6,7,8  also picks up 13,16 but might be better to do them another way    match on c_id;

		*grabbing the first record of a pair where s_id and lname are the same and but c_id is missing;
		if 
			missing(c_id)
			and first.s_id = 1 
			and last.s_id = 0 
			and first.mylastname = 1 
			and last.mylastname = 0 
				then output clyde;
				* gets 9  leaves 10   match on s_id;

run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This actually works and then it doesn't.&amp;nbsp; It gets me a bin with 1 and 3 in it (jones_doe) skipping 2 and 4, a bin with 5, 6, 7, and 8 (potter_franklin_madison), and a bin with 9 (clyde) skipping 10.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But then I fall down.&amp;nbsp; I can't seem to pick up 11, 12, 13, 15, and 16 in such a way that 14 doesn't come along for the ride.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So....&lt;/P&gt;
&lt;P&gt;can you help?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Should I abandon this method and do something else/&amp;nbsp; What else?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can I tweak this method and yet make it work?&amp;nbsp; How to tweak?&lt;/P&gt;
&lt;P&gt;Have i been completely obtuse in explaining what i am trying to do and you need more info?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 13 Aug 2019 16:36:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-observations-with-duplicate-values-into-various-bins/m-p/580879#M165045</guid>
      <dc:creator>HB</dc:creator>
      <dc:date>2019-08-13T16:36:04Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting observations with duplicate values into various bins</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-observations-with-duplicate-values-into-various-bins/m-p/580891#M165051</link>
      <description>&lt;P&gt;Without testing it seems that you miss ELSE between IF statements:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
 set have;
      by s_id c_id ...;
      if &amp;lt;conditions-1&amp;gt; then ...; ELSE
      if &amp;lt;conditions-2&amp;gt; then ...; ELSE
      if &amp;lt;conditions-3&amp;gt; then ...;
         else do; ...???... ; end;
run;
      &lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 13 Aug 2019 17:04:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-observations-with-duplicate-values-into-various-bins/m-p/580891#M165051</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2019-08-13T17:04:36Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting observations with duplicate values into various bins</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-observations-with-duplicate-values-into-various-bins/m-p/580916#M165055</link>
      <description>&lt;P&gt;Not sure if you can do it in one step, but I tried this and it worked.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Only goes to one bin but it is a start&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SORT data=have;
  BY s_id mylastname descending c_id;
RUN;

DATA have0;  
  SET have;
  BY s_id mylastname descending c_id;
  RETAIN c_id1;
  IF first.mylastname THEN c_id1="   ";
  IF c_id ne "" THEN c_id1=c_id;
RUN;

PROC SORT data=have0;
  BY s_id c_id1 mylastname;
RUN;

DATA have1;
  SET have0;
  WHERE c_id1 ne "";
  BY s_id c_id1;
  IF first.c_id1;
RUN;

DATA have2;
  SET have0;
  WHERE c_id1="";
  BY s_id c_id1 mylastname;
  IF first.mylastname;
RUN;

DATA want;
  SET have1 have2;
RUN;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 13 Aug 2019 18:01:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-observations-with-duplicate-values-into-various-bins/m-p/580916#M165055</guid>
      <dc:creator>SwissC</dc:creator>
      <dc:date>2019-08-13T18:01:50Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting observations with duplicate values into various bins</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-observations-with-duplicate-values-into-various-bins/m-p/580927#M165059</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/285012"&gt;@SwissC&lt;/a&gt;&amp;nbsp;That gets close, I don't know why, but it does. Let me modify that, run it on the real data, and get back to you.&amp;nbsp; I think I can match your have1 on c_id (I'd prefer to do the jones case on s_id but c_id may work) and I'll try your have2 on s_id-name key.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 13 Aug 2019 19:08:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-observations-with-duplicate-values-into-various-bins/m-p/580927#M165059</guid>
      <dc:creator>HB</dc:creator>
      <dc:date>2019-08-13T19:08:53Z</dc:date>
    </item>
  </channel>
</rss>

