Dear all,
I have a data set that looks like this:
data test;
input patid visit var;
datalines;
1 1 5
2 1 2
5 3 9
1 2 10
3 1 8
7 4 6
1 3 15
2 2 4
5 1 14
5 2 11
3 2 16
;
run;
I wish to search for the patid's and their corresponding var 's in visit 1 which also occurs in visit 2 and find their mean, standard deviation, min, max, and N
In this case it will be the following patid's
data mydat;
input patid visite var;
datalines;
1 1 5
1 2 10
2 1 2
2 2 4
3 1 8
3 2 16
5 1 14
5 2 11
;
run;
I will appreciate any help. Thanks
Could you please explain this part?
"... and their corresponding var 's in visit 1 which also occurs in visit 2"
I mean for example the patient with the id=1 has at visit 1 var=5 and at visit 2 var=10. I only wish to search for patients whose patid occurs in visit1 then search for exactly these patids in visit2. That means if a patient has visit 1 but does not occur in visit2, this is not considered. At the end, wish to do the above statistics on the column 'var'
data mydat;
input patid visite var;
datalines;
1 1 5
1 2 10
2 1 2
2 2 4
3 1 8
3 2 16
5 1 14
5 2 11
;
run;
I hope I could explain better
Which do you want, the filtered data set or the mean, std and such? The later doesn't require much work:
proc means data=test mean std min max n; where visit in (1 2); class patid; var var; run;
The where statement in the proc means restricts the data to only visit numbers 1 and 2 and doesn't require any special coding to handle the cases where there are visit number 1 without a 2 , or possibly a visit number 2 without a visit 1 for each patient id.
Procedures Report and Tabulate would also do this.
If you need a data set with the summaries, say so.
@ballardw thanks for that. I will try that now to see if it works
@ballardw : I tried that your code but that is not exactly what I want. If I use your code as below I get N=8
proc means data=test mean std min max n;
where visit in (1 2) ;
var var;
run;
if I use the following code to select the number of patids in visit1, I get N=4
proc sql;
create table test2 as select patid as patid, var , visit from test
where visit =1;
quit;
That means my N can be less than 4 but not greater 4. From these 4. I now need to search if these 4 patid's also occur in visit 2. If yes then they will be included in my statistics. I hope you can understand what I mean
@Anita_n wrote:
@ballardw : I tried that your code but that is not exactly what I want. If I use your code as below I get N=8
proc means data=test mean std min max n; where visit in (1 2) ; var var; run;
if I use the following code to select the number of patids in visit1, I get N=4
proc sql; create table test2 as select patid as patid, var , visit from test where visit =1; quit;
That means my N can be less than 4 but not greater 4. From these 4. I now need to search if these 4 patid's also occur in visit 2. If yes then they will be included in my statistics. I hope you can understand what I mean
Which means your example data is insufficient for your problem description. The N i got for each Id with your example data was 2 for each Patientid. AND your description appears very odd which is why @PaigeMiller asked for clarification of that phrase. Which means describe by selecting observations and showing the desired calculation.
The where clause supplied exactly matches your requested example Mydat data set:
data selected; set test; where visit in (1 2); run; proc sort data=selected; by patid visit; run;
which results in:
patid visit var 1 1 5 1 2 10 2 1 2 2 2 4 3 1 8 3 2 16 5 1 14 5 2 11
data test;
input patid visit var;
datalines;
1 1 5
2 1 2
5 3 9
1 2 10
3 1 8
7 4 6
1 3 15
2 2 4
5 1 14
5 2 11
3 2 16
;
run;
proc sort data=test;
by patid visit;
run;
proc sql;
create table valid_records as
select *
from test
/*filters only visit=1 or 2 into data*/
where visit in (1, 2)
group by patid
/*ensures record has both a 1 & 2, ie 2 distinct values*/
having count(distinct visit)=2;
quit;
*summarizes across all values;
proc means data=valid_records n mean median std min max;
var var;
ods output summary = summary_visits;
run;
proc print data=summary_visits;
run;
*summarizes by patid values;
proc means data=valid_records NWAY n mean median std min max;
class patid;
var var;
ods output summary = summary_visits_ID;
run;
proc print data=summary_visits_ID;
run;
*Summarize by visit;
proc means data=valid_records NWAY n mean median std min max;
class visit;
var var;
ods output summary = summary_visits_by_visit;
run;
*transpose to a wide format for example - may make more sense to you in this format;
proc transpose data=test out=wide (where = (not missing(visit_2))) prefix=visit_;
where visit in (1, 2);
by patid;
id visit;
var var;
run;
proc means data=wide NWAY n mean median std min max;
var visit_:;
ods output summary = summary_visits2;
run;
proc print data=summary_visits2;
run;
HTH
@Anita_n wrote:
Dear all,
I have a data set that looks like this:
data test; input patid visit var; datalines; 1 1 5 2 1 2 5 3 9 1 2 10 3 1 8 7 4 6 1 3 15 2 2 4 5 1 14 5 2 11 3 2 16 ; run;
I wish to search for the patid's and their corresponding var 's in visit 1 which also occurs in visit 2 and find their mean, standard deviation, min, max, and N
In this case it will be the following patid's
data mydat; input patid visite var; datalines; 1 1 5 1 2 10 2 1 2 2 2 4 3 1 8 3 2 16 5 1 14 5 2 11 ; run;
I will appreciate any help. Thanks
@Reeza @ballardw @PaigeMiller : It should something Iike this, first of all search for the patids that occurs in visit1 then something like this statement:
if (patids in visit1 occurs in visit2) then include this in my dataset
@Anita_n wrote:
@Reeza @ballardw @PaigeMiller : It should something Iike this, first of all search for the patids that occurs in visit1 then something like this statement:
if (patids in visit1 occurs in visit2) then include this in my dataset
Honestly, this doesn't help me at all understand, as "visit1 occurs in visit2" means nothing to me. I think your earlier explanation was more helpful "That means if a patient has visit 1 but does not occur in visit2, this is not considered." But you still have not answered the questions from @Reeza:
Sounds like you want patients who have both a visit 1 and 2 and then want to summarize the variable VAR? Visits over two are disregarded?
@PaigeMiller : Yes this statement from @Reeza :Sounds like you want patients who have both a visit 1 and 2 and then want to summarize the variable VAR? Visits over two are disregarded?
is right
@Reeza Yes I did, I found it okay till where you created the valid_records but then you summarized across all values which is not what I want, then you summarized by patid, I did not understand why.
Your tranposed dataset is close to what I want. But I would then wish to do a summarized statistics on these two cloumns (visit_1 and visit_2). Is there any way to do a summarized statistics on both columns?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.