BookmarkSubscribeRSS Feed
Hemamalini
Calcite | Level 5

Hello Everyone,

I am new to this forum and currently learning SAS. 

I want to do QC for data obtained from different round like 1,2,3,4 etc. by comparison each value for each round (round 2 vs round 1, round 3 vs round 2 vs round 1) and placed all mismatch value into a new column at the end as 'QC comment' based on the condition of PID and visit.

1)First the program should check whether any new PID is added or removed when compared review from round2~round1. If found, keep entire row as a single colour with comment as 'PID missing or added'

2)For other column value mismatch, program should display comment available in 'QC comment' column.

3)'QC comment' column should be added as last column as more than 200 columns are available for comparison and dynamic row to be added.

 

Attaching excel file for your reference.

 

data patients;
input PID	AGE	Visit	Round_review	AE $	DRUG$	FRUG21	S3	S4 $	S5	S6	S7	T1	T2 $ T3 $20.;
cards;
101	33	1	1	asthma	X	1	2	present		2	1	2	1	available	not showing positivity >2.4
101	45	2	1	pain	X	1	2	present		3	1	2	2	notavailable	not showing positivity >2.5
101	57	1	1	asthma	X	1	2	present		4	1	2	3	available	showing positivity >2.6
101	69	3	1	Vomit	X	1	2	present		5	1	2	1	notavailable	not showing positivity >2.7
102	81	1	1	asthma	X	1	2	present		3	1	2	2	available	not showing positivity >2.8
102	93	1	1	pain	X	1	2	present		7	1	2	3	notavailable	not showing positivity >2.9
103	21	2	1	pain	X	1	2	present		8	1	2	2	available	not showing positivity >2.10
104	24	3	1	pain	X	1	2	present		9	1	2	3	notavailable	not showing positivity >2.11
101	33	1	2	asthma	X	1	2	present		2	1	2	1	available	not showing positivity >2.4
101	45	2	2	pain	X	1	2	present		12		13	3	notavailable	not showing positivity >2.5
101	57	1	2	asthma	X	1	2	present		4	1	2	3	available	showing positivity >2.6
101	69	3	2	Vomit	X	1	2	present		5	1	2	1	notavailable	not showing positivity >2.7
102	81	1	2	asthma	X	1	2	present		6	1	2	2	available	not showing positivity >2.8
102	93	1	2	pain	X	1	2	present		7	1	2	3	notavailable	not showing positivity >2.9
103	21	2	2	pain	X	1	2	present		8	1	2	2	available	not showing positivity >2.10
105	24	3	2	Vomit	X	1	2	present		9	1	2	3	notavailable	not showing positivity >2.17
108	21	2	2	pain	X	1	2	present		8	1	2	2	available	not showing positivity >2.10
102	81	7	2	asthma	X	1	2	present		6	1	2	2	available	
101	57		3	asthma	X	1	2	present		4	1	2	3	available	showing positivity >2.6
101	69	3	3	Vomit	X	1	2	present		5	1	2	1	notavailable	not showing positivity >2.7
102	81	1	3	asthma	SALBUTAMOL	1	2	NOTFOUND		7	1	34	2	available	not showing positivity >2.8

;
run;

 

I tried using Proc sql and do loop but it is not working for every value data.

 

 

 

2 REPLIES 2
ballardw
Super User

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Some hints: Only include variables in the examples that are needed for your problem or only one or two with the values.

Only provide enough rows of data to demonstrate 2 or 3 use cases (such as identifiers and and enough "rounds" to show the results).

Provide an expected output for the given example data.

In your problem description refer to the variable names in the example data. We have to guess that "round" refers to "round_review.

Make sure that the code as pasted into the forum runs correctly. When I copy that data step and run it I get two Invalid data warnings. When you have MISSING values for data and use List input place a . where the data is missing. Also your Drug, S4, T2 and T3 variables are truncated making very few values for T3.

data patients;
infile datalines truncover;
input PID	AGE	Visit	Round_review	AE $	DRUG :$15. FRUG21	S3	S4 :$10.	S5	S6	S7	T1	T2 :$12. T3 $30.;
cards;
101	33	1	1	asthma	X	1	2	present		2	1	2	1	available	not showing positivity >2.4
101	45	2	1	pain	X	1	2	present		3	1	2	2	notavailable	not showing positivity >2.5
101	57	1	1	asthma	X	1	2	present		4	1	2	3	available	showing positivity >2.6
101	69	3	1	Vomit	X	1	2	present		5	1	2	1	notavailable	not showing positivity >2.7
102	81	1	1	asthma	X	1	2	present		3	1	2	2	available	not showing positivity >2.8
102	93	1	1	pain	X	1	2	present		7	1	2	3	notavailable	not showing positivity >2.9
103	21	2	1	pain	X	1	2	present		8	1	2	2	available	not showing positivity >2.10
104	24	3	1	pain	X	1	2	present		9	1	2	3	notavailable	not showing positivity >2.11
101	33	1	2	asthma	X	1	2	present		2	1	2	1	available	not showing positivity >2.4
101	45	2	2	pain	X	1	2	present		12	.	13	3	notavailable	not showing positivity >2.5
101	57	1	2	asthma	X	1	2	present		4	1	2	3	available	showing positivity >2.6
101	69	3	2	Vomit	X	1	2	present		5	1	2	1	notavailable	not showing positivity >2.7
102	81	1	2	asthma	X	1	2	present		6	1	2	2	available	not showing positivity >2.8
102	93	1	2	pain	X	1	2	present		7	1	2	3	notavailable	not showing positivity >2.9
103	21	2	2	pain	X	1	2	present		8	1	2	2	available	not showing positivity >2.10
105	24	3	2	Vomit	X	1	2	present		9	1	2	3	notavailable	not showing positivity >2.17
108	21	2	2	pain	X	1	2	present		8	1	2	2	available	not showing positivity >2.10
102	81	7	2	asthma	X	1	2	present		6	1	2	2	available	 .
101	57	.	3	asthma	X	1	2	present		4	1	2	3	available	showing positivity >2.6
101	69	3	3	Vomit	X	1	2	present		5	1	2	1	notavailable	not showing positivity >2.7
102	81	1	3	asthma	SALBUTAMOL	1	2	NOTFOUND		7	1	34	2	available	not showing positivity >2.8
;

 

Now, what variable is compared to which variable on which records. SHOW the output of that comparison for your example data. And exactly what will you use that added information for?

 

If you only want to compare the same variable on different records this is one way if most of the records are expected to be very similar except for the ROUND_REVIEW to compare one round with another. I am not going to place "values" into a "single column" as I don't know what you expect that to actually be. When you say you have 200 columns to compare then trying to compare those and place all of the information in a single variable could take thousands of characters to be useable in any form.

Proc sort data=patients;
   by pid round_review visit;
run;

proc compare base=patients (where=(round_review=1))
             compare=patients (where=(round_review=2))
;
   by pid;

run;

This creates a summary for each PID. If there are no matches then either the PID is new or there are no following rounds.

There are many options in Proc Compare for creating output data sets containing differing things. I'll leave that there for you to research.

 

Kurt_Bremser
Super User

Never use tabs in code, unless you're fond of finding such log lines:

71         input PIDAGEVisitRound_reviewAE $DRUG$FRUG21S3S4 $S5S6S7T1T2 $ T3 $20.;

which make debugging harder than necessary.

And you also have tabs in your CARDS block, so I needed to add this INFILE statement:

infile datalines dlm="09"x dsd truncover;

to get the data step to work correctly.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 353 views
  • 0 likes
  • 3 in conversation