BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Etoo12121
Obsidian | Level 7

Hi all,

 

I have a bit of a tricky situation. In the dataset below, I have multiple ID's, dates and codes. I'm trying to figure out the best way to manipulate it such that:

 

for each ID and DATE that are the same, keep rows where CODE in (0441T, 0442T). 

OR

if ID and DATE are the same, keep rows that have a combination of both CODES 64640 AND C2618.

 

Data in bold is what should be kept if the logic works correctly.

 

CODEIDDATE
646401234506/30/2020
646401234506/30/2020
646402345602/10/2020
C26182345602/10/2020
C26183456706/24/2020
C26184567801/06/2020
0442T5678902/11/2020
C26186789002/11/2020
646407890102/14/2020
0441T8901202/10/2020
C26189012302/17/2020
C26181123401/22/2020
646402234502/10/2020
646402234502/10/2020
C26182234502/10/2020
646402234502/10/2020
646402234502/10/2020
646402234502/10/2020
0441T3345603/16/2020
C26183345603/16/2020
646404455503/22/2020
C26184455504/18/2020

 

Any ideas on how to approach this?

1 ACCEPTED SOLUTION

Accepted Solutions
jimbarbour
Meteorite | Level 14

Ah, well, you didn't say that in your original post, but it can be accommodated.  Take a look at the below code and results. 

 

I'm basically scanning through all the rows and setting Boolean flags for each of the four codes you specified.  When LAST. is true, I evaluate the flags to see what combination of codes has been encountered.  If the combination meets the criteria, a row is added to the Keys dataset which is later inner joined with the Have dataset to obtain our final results.  I've found the Boolean flag technique for situations like this.  It's worth knowing about.

 

Jim

 

DATA	Have;
	FORMAT	ID	$5.	Date	MMDDYY10.	Code	$5.;
	INFILE	DATALINES	DSD	DLM='09'X;
	INPUT	CODE	$	ID	$	DATE	:	MMDDYY10.;
DATALINES;
64640	12345	06/30/2020
64640	12345	06/30/2020
64640	23456	02/10/2020
C2618	23456	02/10/2020
C2618	34567	06/24/2020
C2618	45678	01/06/2020
0442T	56789	02/11/2020
C2618	67890	02/11/2020
64640	78901	02/14/2020
0441T	89012	02/10/2020
C2618	90123	02/17/2020
C2618	11234	01/22/2020
64640	22345	02/10/2020
64640	22345	02/10/2020
C2618	22345	02/10/2020
64640	22345	02/10/2020
64640	22345	02/10/2020
64640	22345	02/10/2020
0441T	33456	03/16/2020
C2618	33456	03/16/2020
64640	44555	03/22/2020
C2618	44555	04/18/2020
;
RUN;

PROC	SORT	DATA=Have;
	BY	ID	DATE;
RUN;

DATA	Keys;
	DROP	_:	Code;
	SET	Have;
		BY	ID	DATE;
	RETAIN	_CD0441T	0
			_CD0442T	0 
			_CD64640	0
			_CDC2618	0
			;

	SELECT	(UPCASE(CODE));
		WHEN	('0441T')
			_CD0441T	=	1;
		WHEN	('0442T')
			_CD0442T	=	1;
		WHEN	('64640')
			_CD64640	=	1;
		WHEN	('C2618')
			_CDC2618	=	1;
		OTHERWISE
			DO;
			END;
	END;

	IF	Last.DATE	THEN
		DO;
			IF	(_CD0441T	AND	NOT	(_CD64640	OR	_CDC2618))	OR
				(_CD0442T	AND	NOT	(_CD64640	OR	_CDC2618))	OR
				(_CD64640	AND	_CDC2618)	THEN
				DO;
					CALL	MISSING(_CD0441T,_CD0442T,_CD64640,_CDC2618);
					OUTPUT;
				END;
			ELSE
				DO;
					CALL	MISSING(_CD0441T,_CD0442T,_CD64640,_CDC2618);
					DELETE;
				END;
		END;
	ELSE
		DO;
			DELETE;
		END;
RUN;

PROC	SQL	NOPRINT;
	CREATE	TABLE	Want	AS
		SELECT	Have.*
			FROM	Have
			INNER	JOIN	Keys
				ON	Have.ID		=	Keys.ID	
				AND	Have.Date	=	Keys.Date
			;
QUIT;

Results:

jimbarbour_0-1628798051446.png

 

View solution in original post

5 REPLIES 5
jimbarbour
Meteorite | Level 14

OK, interesting problem.

 

But why are the following exluded?

0441T 33456 03/16/2020
C2618 33456 03/16/2020

I see a 0441T.  Shouldn't those two be included?

 

Here's a sample program, but it's going to pick up the above two rows because of the 0441T.  Results are below the code.

 

Jim

 

DATA	Have;
	FORMAT	ID	$5.	Date	MMDDYY10.	Code	$5.;
	INFILE	DATALINES	DSD	DLM='09'X;
	INPUT	CODE	$	ID	$	DATE	:	MMDDYY10.;
DATALINES;
64640	12345	06/30/2020
64640	12345	06/30/2020
64640	23456	02/10/2020
C2618	23456	02/10/2020
C2618	34567	06/24/2020
C2618	45678	01/06/2020
0442T	56789	02/11/2020
C2618	67890	02/11/2020
64640	78901	02/14/2020
0441T	89012	02/10/2020
C2618	90123	02/17/2020
C2618	11234	01/22/2020
64640	22345	02/10/2020
64640	22345	02/10/2020
C2618	22345	02/10/2020
64640	22345	02/10/2020
64640	22345	02/10/2020
64640	22345	02/10/2020
0441T	33456	03/16/2020
C2618	33456	03/16/2020
64640	44555	03/22/2020
C2618	44555	04/18/2020
;
RUN;

PROC	SORT	DATA=Have;
	BY	ID	DATE;
RUN;

DATA	Keys;
	DROP	_:	Code;
	SET	Have;
		BY	ID	DATE;
	RETAIN	_CD0441T	0
			_CD0442T	0 
			_CD64640	0
			_CDC2618	0
			;

	SELECT	(UPCASE(CODE));
		WHEN	('0441T')
			_CD0441T	=	1;
		WHEN	('0442T')
			_CD0442T	=	1;
		WHEN	('64640')
			_CD64640	=	1;
		WHEN	('C2618')
			_CDC2618	=	1;
		OTHERWISE
			DO;
			END;
	END;

	IF	Last.DATE	THEN
		DO;
			IF	_CD0441T	OR
				_CD0442T	OR
				(_CD64640	AND	_CDC2618)	THEN
				DO;
					CALL	MISSING(_CD0441T,_CD0442T,_CD64640,_CDC2618);
					OUTPUT;
				END;
			ELSE
				DO;
					CALL	MISSING(_CD0441T,_CD0442T,_CD64640,_CDC2618);
					DELETE;
				END;
		END;
	ELSE
		DO;
			DELETE;
		END;
RUN;

PROC	SQL	NOPRINT;
	CREATE	TABLE	Want	AS
		SELECT	Have.*
			FROM	Have
			INNER	JOIN	Keys
				ON	Have.ID		=	Keys.ID	
				AND	Have.Date	=	Keys.Date
				;
QUIT;

Results:

jimbarbour_0-1628794674415.png

 

Etoo12121
Obsidian | Level 7
Hi @jimbarbour

Those two should't be included because the logic for 0441T should be just that code alone. It can't be in combination with any other code. The only combination allowed should be 64640 and C2618.
jimbarbour
Meteorite | Level 14

Ah, well, you didn't say that in your original post, but it can be accommodated.  Take a look at the below code and results. 

 

I'm basically scanning through all the rows and setting Boolean flags for each of the four codes you specified.  When LAST. is true, I evaluate the flags to see what combination of codes has been encountered.  If the combination meets the criteria, a row is added to the Keys dataset which is later inner joined with the Have dataset to obtain our final results.  I've found the Boolean flag technique for situations like this.  It's worth knowing about.

 

Jim

 

DATA	Have;
	FORMAT	ID	$5.	Date	MMDDYY10.	Code	$5.;
	INFILE	DATALINES	DSD	DLM='09'X;
	INPUT	CODE	$	ID	$	DATE	:	MMDDYY10.;
DATALINES;
64640	12345	06/30/2020
64640	12345	06/30/2020
64640	23456	02/10/2020
C2618	23456	02/10/2020
C2618	34567	06/24/2020
C2618	45678	01/06/2020
0442T	56789	02/11/2020
C2618	67890	02/11/2020
64640	78901	02/14/2020
0441T	89012	02/10/2020
C2618	90123	02/17/2020
C2618	11234	01/22/2020
64640	22345	02/10/2020
64640	22345	02/10/2020
C2618	22345	02/10/2020
64640	22345	02/10/2020
64640	22345	02/10/2020
64640	22345	02/10/2020
0441T	33456	03/16/2020
C2618	33456	03/16/2020
64640	44555	03/22/2020
C2618	44555	04/18/2020
;
RUN;

PROC	SORT	DATA=Have;
	BY	ID	DATE;
RUN;

DATA	Keys;
	DROP	_:	Code;
	SET	Have;
		BY	ID	DATE;
	RETAIN	_CD0441T	0
			_CD0442T	0 
			_CD64640	0
			_CDC2618	0
			;

	SELECT	(UPCASE(CODE));
		WHEN	('0441T')
			_CD0441T	=	1;
		WHEN	('0442T')
			_CD0442T	=	1;
		WHEN	('64640')
			_CD64640	=	1;
		WHEN	('C2618')
			_CDC2618	=	1;
		OTHERWISE
			DO;
			END;
	END;

	IF	Last.DATE	THEN
		DO;
			IF	(_CD0441T	AND	NOT	(_CD64640	OR	_CDC2618))	OR
				(_CD0442T	AND	NOT	(_CD64640	OR	_CDC2618))	OR
				(_CD64640	AND	_CDC2618)	THEN
				DO;
					CALL	MISSING(_CD0441T,_CD0442T,_CD64640,_CDC2618);
					OUTPUT;
				END;
			ELSE
				DO;
					CALL	MISSING(_CD0441T,_CD0442T,_CD64640,_CDC2618);
					DELETE;
				END;
		END;
	ELSE
		DO;
			DELETE;
		END;
RUN;

PROC	SQL	NOPRINT;
	CREATE	TABLE	Want	AS
		SELECT	Have.*
			FROM	Have
			INNER	JOIN	Keys
				ON	Have.ID		=	Keys.ID	
				AND	Have.Date	=	Keys.Date
			;
QUIT;

Results:

jimbarbour_0-1628798051446.png

 

Etoo12121
Obsidian | Level 7
perfect!!!! Thank you so much @jimbarbour
Ksharp
Super User
DATA	Have;
	FORMAT	ID	$5.	Date	MMDDYY10.	Code	$5.;
	INFILE	DATALINES expandtabs truncover;
	INPUT	CODE	$	ID	$	DATE	:	MMDDYY10.;
DATALINES;
64640	12345	06/30/2020
64640	12345	06/30/2020
64640	23456	02/10/2020
C2618	23456	02/10/2020
C2618	34567	06/24/2020
C2618	45678	01/06/2020
0442T	56789	02/11/2020
C2618	67890	02/11/2020
64640	78901	02/14/2020
0441T	89012	02/10/2020
C2618	90123	02/17/2020
C2618	11234	01/22/2020
64640	22345	02/10/2020
64640	22345	02/10/2020
C2618	22345	02/10/2020
64640	22345	02/10/2020
64640	22345	02/10/2020
64640	22345	02/10/2020
0441T	33456	03/16/2020
C2618	33456	03/16/2020
64640	44555	03/22/2020
C2618	44555	04/18/2020
;
RUN;

proc sql;
create table want as
select *
 from have 
  group by id,date
   having sum(CODE in ('0441T' '0442T'))=count(*) or 
          (sum(code='64640') ne 0 and sum(code='C2618') ne 0 and count(distinct code)=2 );
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 862 views
  • 2 likes
  • 3 in conversation