BookmarkSubscribeRSS Feed
Anita_n
Pyrite | Level 9

Dear all,

I have a data set that looks like this:

data test;
input patid  visit  var;
datalines;
1 1 5
2 1 2
5 3 9
1 2 10
3 1 8
7 4 6
1 3 15
2 2 4
5 1 14
5 2 11
3 2 16
;
run;

I wish to search for the patid's and their corresponding var 's in visit 1 which also occurs in visit 2 and find their mean, standard deviation, min, max, and N

 

In this case it will be the following patid's

data mydat;
input patid  visite  var;
datalines;
1 1 5
1 2 10
2 1 2
2 2 4
3 1 8
3 2 16
5 1 14
5 2 11
;
run;

I will appreciate any help. Thanks

 

19 REPLIES 19
PaigeMiller
Diamond | Level 26

Could you please explain this part?

 

"... and their corresponding var 's in visit 1 which also occurs in visit 2"

--
Paige Miller
Anita_n
Pyrite | Level 9

I mean for example the patient with the  id=1 has at visit 1 var=5 and at visit 2 var=10. I only wish to search for patients whose patid occurs in visit1 then search for exactly these patids in visit2. That means if a patient has visit 1 but does not occur in visit2, this is not considered. At the end, wish to do the above statistics on the column 'var'

data mydat;
input patid  visite  var;
datalines;
1 1 5
1 2 10
2 1 2
2 2 4
3 1 8
3 2 16
5 1 14
5 2 11
;
run;

I hope I could explain better

 

Reeza
Super User
Sounds like you want patients who have both a visit 1 and 2 and then want to summarize the variable VAR? Visits over two are disregarded?



ballardw
Super User

Which do you want, the filtered data set or the mean, std and such? The later doesn't require much work:

proc means data=test mean std min max n;
   where visit in (1 2);
   class patid;
   var var;
run;

The where statement in the proc means restricts the data to only visit numbers 1 and 2 and doesn't require any special coding to handle the cases where there are visit number 1 without a 2 , or possibly a visit number 2 without a visit 1 for each patient id.

Procedures Report and Tabulate would also do this.

If you need a data set with the summaries, say so.

 

 

Anita_n
Pyrite | Level 9

@ballardw thanks for that. I will try that now to see if it works

Anita_n
Pyrite | Level 9

@ballardw : I tried that your code but that is not exactly  what I want. If I use your code as below I get N=8

proc means data=test mean std min max n;
   where visit in (1 2) ;
     var var;
run;

if I use the following code to select the number of patids in visit1, I get N=4 

proc sql;
create table test2 as select patid as patid, var , visit from test
where visit =1;
quit;

That means my N can be less than 4 but not greater 4. From these 4. I now need to search if these 4 patid's also occur in visit 2. If yes then they will be included in my statistics. I hope you can understand what I mean

 

 

 

ballardw
Super User

@Anita_n wrote:

@ballardw : I tried that your code but that is not exactly  what I want. If I use your code as below I get N=8

proc means data=test mean std min max n;
   where visit in (1 2) ;
     var var;
run;

if I use the following code to select the number of patids in visit1, I get N=4 

proc sql;
create table test2 as select patid as patid, var , visit from test
where visit =1;
quit;

That means my N can be less than 4 but not greater 4. From these 4. I now need to search if these 4 patid's also occur in visit 2. If yes then they will be included in my statistics. I hope you can understand what I mean

 

 

 


Which means your example data is insufficient for your problem description. The N i got for each Id with your example data was 2 for each Patientid. AND your description appears very odd which is why @PaigeMiller asked for clarification of that phrase. Which means describe by selecting observations and showing the desired calculation.

The where clause supplied exactly matches your requested example Mydat data set:

data selected;
   set test;
   where visit in (1 2);
run;
proc sort data=selected;
  by patid visit;
run;

which results in:

patid    visit    var

  1        1        5
  1        2       10
  2        1        2
  2        2        4
  3        1        8
  3        2       16
  5        1       14
  5        2       11
Reeza
Super User
data test;
input patid  visit  var;
datalines;
1 1 5
2 1 2
5 3 9
1 2 10
3 1 8
7 4 6
1 3 15
2 2 4
5 1 14
5 2 11
3 2 16
;
run;

proc sort data=test;
by patid visit;
run;

proc sql;
create table valid_records as
select * 
from test
/*filters only visit=1 or 2 into data*/
where visit in (1, 2)
group by patid
/*ensures record has both a 1 & 2, ie 2 distinct values*/
having count(distinct visit)=2;
quit;

*summarizes across all values;
proc means data=valid_records n mean median std min max;
var var;
ods output summary = summary_visits;
run;

proc print data=summary_visits;
run;

*summarizes by patid values;
proc means data=valid_records NWAY n mean median std min max;
class patid;
var var;
ods output summary = summary_visits_ID;
run;

proc print data=summary_visits_ID;
run;

*Summarize by visit;
proc means data=valid_records NWAY n mean median std min max;
class visit;
var var;
ods output summary = summary_visits_by_visit;
run;


*transpose to a wide format for example - may make more sense to you in this format;
proc transpose data=test out=wide (where = (not missing(visit_2))) prefix=visit_;
where visit in (1, 2);
by patid;
id visit;
var var;
run;

proc means data=wide NWAY n mean median std min max;
var visit_:;
ods output summary = summary_visits2;
run;

proc print data=summary_visits2;
run;

HTH


@Anita_n wrote:

Dear all,

I have a data set that looks like this:

data test;
input patid  visit  var;
datalines;
1 1 5
2 1 2
5 3 9
1 2 10
3 1 8
7 4 6
1 3 15
2 2 4
5 1 14
5 2 11
3 2 16
;
run;

I wish to search for the patid's and their corresponding var 's in visit 1 which also occurs in visit 2 and find their mean, standard deviation, min, max, and N

 

In this case it will be the following patid's

data mydat;
input patid  visite  var;
datalines;
1 1 5
1 2 10
2 1 2
2 2 4
3 1 8
3 2 16
5 1 14
5 2 11
;
run;

I will appreciate any help. Thanks

 


 

Anita_n
Pyrite | Level 9

@Reeza @ballardw @PaigeMiller : It should something Iike this, first of all search for the patids that occurs in visit1 then something like this statement:

 

if (patids in visit1 occurs in visit2) then include this in my dataset

PaigeMiller
Diamond | Level 26

@Anita_n wrote:

@Reeza @ballardw @PaigeMiller : It should something Iike this, first of all search for the patids that occurs in visit1 then something like this statement:

 

if (patids in visit1 occurs in visit2) then include this in my dataset


Honestly, this doesn't help me at all understand, as "visit1 occurs in visit2" means nothing to me. I think your earlier explanation was more helpful "That means if a patient has visit 1 but does not occur in visit2, this is not considered." But you still have not answered the questions from @Reeza:

 

Sounds like you want patients who have both a visit 1 and 2 and then want to summarize the variable VAR? Visits over two are disregarded?

--
Paige Miller
Anita_n
Pyrite | Level 9

@PaigeMiller : Yes this statement from @Reeza :Sounds like you want patients who have both a visit 1 and 2 and then want to summarize the variable VAR? Visits over two are disregarded?

is right

Reeza
Super User
Have you had a chance to verify my solution?
Anita_n
Pyrite | Level 9

@Reeza Yes I did, I found it okay till where you created the valid_records but then you summarized across all values which is not what I want, then you summarized by patid, I did not understand why.

 

Your tranposed dataset is close to what I want. But I would then wish to do a summarized statistics on these two cloumns (visit_1 and visit_2). Is there any way to do a summarized statistics on both columns? 

Reeza
Super User
Please check the edited posts - note that my proc means was originally wrong, I had it pointed at your original data set not the filtered data. Try these ones and one should be the correct answer. Mostly because I don't think there's any possible way to slice the data 🙂

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 19 replies
  • 2990 views
  • 2 likes
  • 5 in conversation