Re: Comparing two values in a column

Anita_n · Posted 12-16-2021 12:16 PM

Dear all,

I have a data set that looks like this:

data test;
input patid  visit  var;
datalines;
1 1 5
2 1 2
5 3 9
1 2 10
3 1 8
7 4 6
1 3 15
2 2 4
5 1 14
5 2 11
3 2 16
;
run;

I wish to search for the patid's and their corresponding var 's in visit 1 which also occurs in visit 2 and find their mean, standard deviation, min, max, and N

In this case it will be the following patid's

data mydat;
input patid  visite  var;
datalines;
1 1 5
1 2 10
2 1 2
2 2 4
3 1 8
3 2 16
5 1 14
5 2 11
;
run;

I will appreciate any help. Thanks

PaigeMiller · Posted 12-16-2021 12:19 PM

Could you please explain this part?

"... and their corresponding var 's in visit 1 which also occurs in visit 2"

--
Paige Miller

Anita_n · Posted 12-16-2021 12:43 PM

I mean for example the patient with the id=1 has at visit 1 var=5 and at visit 2 var=10. I only wish to search for patients whose patid occurs in visit1 then search for exactly these patids in visit2. That means if a patient has visit 1 but does not occur in visit2, this is not considered. At the end, wish to do the above statistics on the column 'var'

data mydat;
input patid  visite  var;
datalines;
1 1 5
1 2 10
2 1 2
2 2 4
3 1 8
3 2 16
5 1 14
5 2 11
;
run;

I hope I could explain better

Reeza · Posted 12-16-2021 12:53 PM

Sounds like you want patients who have both a visit 1 and 2 and then want to summarize the variable VAR? Visits over two are disregarded?

ballardw · Posted 12-16-2021 12:41 PM

Which do you want, the filtered data set or the mean, std and such? The later doesn't require much work:

proc means data=test mean std min max n;
   where visit in (1 2);
   class patid;
   var var;
run;

The where statement in the proc means restricts the data to only visit numbers 1 and 2 and doesn't require any special coding to handle the cases where there are visit number 1 without a 2 , or possibly a visit number 2 without a visit 1 for each patient id.

Procedures Report and Tabulate would also do this.

If you need a data set with the summaries, say so.

Anita_n · Posted 12-16-2021 12:55 PM

@ballardw thanks for that. I will try that now to see if it works

Anita_n · Posted 12-16-2021 01:15 PM

@ballardw : I tried that your code but that is not exactly what I want. If I use your code as below I get N=8

proc means data=test mean std min max n;
   where visit in (1 2) ;
     var var;
run;

if I use the following code to select the number of patids in visit1, I get N=4

proc sql;
create table test2 as select patid as patid, var , visit from test
where visit =1;
quit;

That means my N can be less than 4 but not greater 4. From these 4. I now need to search if these 4 patid's also occur in visit 2. If yes then they will be included in my statistics. I hope you can understand what I mean

ballardw · Posted 12-16-2021 01:25 PM

@Anita_n wrote:

@ballardw : I tried that your code but that is not exactly what I want. If I use your code as below I get N=8
proc means data=test mean std min max n;
   where visit in (1 2) ;
     var var;
run;
if I use the following code to select the number of patids in visit1, I get N=4
proc sql;
create table test2 as select patid as patid, var , visit from test
where visit =1;
quit;
That means my N can be less than 4 but not greater 4. From these 4. I now need to search if these 4 patid's also occur in visit 2. If yes then they will be included in my statistics. I hope you can understand what I mean

Which means your example data is insufficient for your problem description. The N i got for each Id with your example data was 2 for each Patientid. AND your description appears very odd which is why @PaigeMiller asked for clarification of that phrase. Which means describe by selecting observations and showing the desired calculation.

The where clause supplied exactly matches your requested example Mydat data set:

data selected;
   set test;
   where visit in (1 2);
run;
proc sort data=selected;
  by patid visit;
run;

which results in:

patid    visit    var

  1        1        5
  1        2       10
  2        1        2
  2        2        4
  3        1        8
  3        2       16
  5        1       14
  5        2       11

Reeza · Posted 12-16-2021 01:02 PM

data test;
input patid  visit  var;
datalines;
1 1 5
2 1 2
5 3 9
1 2 10
3 1 8
7 4 6
1 3 15
2 2 4
5 1 14
5 2 11
3 2 16
;
run;

proc sort data=test;
by patid visit;
run;

proc sql;
create table valid_records as
select * 
from test
/*filters only visit=1 or 2 into data*/
where visit in (1, 2)
group by patid
/*ensures record has both a 1 & 2, ie 2 distinct values*/
having count(distinct visit)=2;
quit;

*summarizes across all values;
proc means data=valid_records n mean median std min max;
var var;
ods output summary = summary_visits;
run;

proc print data=summary_visits;
run;

*summarizes by patid values;
proc means data=valid_records NWAY n mean median std min max;
class patid;
var var;
ods output summary = summary_visits_ID;
run;

proc print data=summary_visits_ID;
run;

*Summarize by visit;
proc means data=valid_records NWAY n mean median std min max;
class visit;
var var;
ods output summary = summary_visits_by_visit;
run;


*transpose to a wide format for example - may make more sense to you in this format;
proc transpose data=test out=wide (where = (not missing(visit_2))) prefix=visit_;
where visit in (1, 2);
by patid;
id visit;
var var;
run;

proc means data=wide NWAY n mean median std min max;
var visit_:;
ods output summary = summary_visits2;
run;

proc print data=summary_visits2;
run;

HTH

@Anita_n wrote:

Dear all,

I have a data set that looks like this:
data test;
input patid  visit  var;
datalines;
1 1 5
2 1 2
5 3 9
1 2 10
3 1 8
7 4 6
1 3 15
2 2 4
5 1 14
5 2 11
3 2 16
;
run;
I wish to search for the patid's and their corresponding var 's in visit 1 which also occurs in visit 2 and find their mean, standard deviation, min, max, and N

In this case it will be the following patid's
data mydat;
input patid  visite  var;
datalines;
1 1 5
1 2 10
2 1 2
2 2 4
3 1 8
3 2 16
5 1 14
5 2 11
;
run;
I will appreciate any help. Thanks

Anita_n · Posted 12-16-2021 01:27 PM

@Reeza @ballardw @PaigeMiller : It should something Iike this, first of all search for the patids that occurs in visit1 then something like this statement:

if (patids in visit1 occurs in visit2) then include this in my dataset

PaigeMiller · Posted 12-16-2021 01:54 PM

@Anita_n wrote:

@Reeza @ballardw @PaigeMiller : It should something Iike this, first of all search for the patids that occurs in visit1 then something like this statement:

if (patids in visit1 occurs in visit2) then include this in my dataset

Honestly, this doesn't help me at all understand, as "visit1 occurs in visit2" means nothing to me. I think your earlier explanation was more helpful "That means if a patient has visit 1 but does not occur in visit2, this is not considered." But you still have not answered the questions from @Reeza:

Sounds like you want patients who have both a visit 1 and 2 and then want to summarize the variable VAR? Visits over two are disregarded?

--
Paige Miller

Anita_n · Posted 12-16-2021 02:09 PM

@PaigeMiller : Yes this statement from @Reeza :Sounds like you want patients who have both a visit 1 and 2 and then want to summarize the variable VAR? Visits over two are disregarded?

is right

Reeza · Posted 12-16-2021 03:00 PM

Have you had a chance to verify my solution?

Anita_n · Posted 12-16-2021 03:27 PM

@Reeza Yes I did, I found it okay till where you created the valid_records but then you summarized across all values which is not what I want, then you summarized by patid, I did not understand why.

Your tranposed dataset is close to what I want. But I would then wish to do a summarized statistics on these two cloumns (visit_1 and visit_2). Is there any way to do a summarized statistics on both columns?

Reeza · Posted 12-16-2021 03:39 PM

Please check the edited posts - note that my proc means was originally wrong, I had it pointed at your original data set not the filtered data. Try these ones and one should be the correct answer. Mostly because I don't think there's any possible way to slice the data 🙂

SAS Training: Just a Click Away