Dear all,
I have a data set that looks like this:
data test;
input patid visit var;
datalines;
1 1 5
2 1 2
5 3 9
1 2 10
3 1 8
7 4 6
1 3 15
2 2 4
5 1 14
5 2 11
3 2 16
;
run;
I wish to search for the patid's and their corresponding var 's in visit 1 which also occurs in visit 2 and find their mean, standard deviation, min, max, and N
In this case it will be the following patid's
data mydat;
input patid visite var;
datalines;
1 1 5
1 2 10
2 1 2
2 2 4
3 1 8
3 2 16
5 1 14
5 2 11
;
run;
I will appreciate any help. Thanks
Could you please explain this part?
"... and their corresponding var 's in visit 1 which also occurs in visit 2"
I mean for example the patient with the id=1 has at visit 1 var=5 and at visit 2 var=10. I only wish to search for patients whose patid occurs in visit1 then search for exactly these patids in visit2. That means if a patient has visit 1 but does not occur in visit2, this is not considered. At the end, wish to do the above statistics on the column 'var'
data mydat;
input patid visite var;
datalines;
1 1 5
1 2 10
2 1 2
2 2 4
3 1 8
3 2 16
5 1 14
5 2 11
;
run;
I hope I could explain better
Which do you want, the filtered data set or the mean, std and such? The later doesn't require much work:
proc means data=test mean std min max n; where visit in (1 2); class patid; var var; run;
The where statement in the proc means restricts the data to only visit numbers 1 and 2 and doesn't require any special coding to handle the cases where there are visit number 1 without a 2 , or possibly a visit number 2 without a visit 1 for each patient id.
Procedures Report and Tabulate would also do this.
If you need a data set with the summaries, say so.
@ballardw thanks for that. I will try that now to see if it works
@ballardw : I tried that your code but that is not exactly what I want. If I use your code as below I get N=8
proc means data=test mean std min max n;
where visit in (1 2) ;
var var;
run;
if I use the following code to select the number of patids in visit1, I get N=4
proc sql;
create table test2 as select patid as patid, var , visit from test
where visit =1;
quit;
That means my N can be less than 4 but not greater 4. From these 4. I now need to search if these 4 patid's also occur in visit 2. If yes then they will be included in my statistics. I hope you can understand what I mean
@Anita_n wrote:
@ballardw : I tried that your code but that is not exactly what I want. If I use your code as below I get N=8
proc means data=test mean std min max n; where visit in (1 2) ; var var; run;if I use the following code to select the number of patids in visit1, I get N=4
proc sql; create table test2 as select patid as patid, var , visit from test where visit =1; quit;That means my N can be less than 4 but not greater 4. From these 4. I now need to search if these 4 patid's also occur in visit 2. If yes then they will be included in my statistics. I hope you can understand what I mean
Which means your example data is insufficient for your problem description. The N i got for each Id with your example data was 2 for each Patientid. AND your description appears very odd which is why @PaigeMiller asked for clarification of that phrase. Which means describe by selecting observations and showing the desired calculation.
The where clause supplied exactly matches your requested example Mydat data set:
data selected; set test; where visit in (1 2); run; proc sort data=selected; by patid visit; run;
which results in:
patid visit var 1 1 5 1 2 10 2 1 2 2 2 4 3 1 8 3 2 16 5 1 14 5 2 11
data test;
input patid visit var;
datalines;
1 1 5
2 1 2
5 3 9
1 2 10
3 1 8
7 4 6
1 3 15
2 2 4
5 1 14
5 2 11
3 2 16
;
run;
proc sort data=test;
by patid visit;
run;
proc sql;
create table valid_records as
select *
from test
/*filters only visit=1 or 2 into data*/
where visit in (1, 2)
group by patid
/*ensures record has both a 1 & 2, ie 2 distinct values*/
having count(distinct visit)=2;
quit;
*summarizes across all values;
proc means data=valid_records n mean median std min max;
var var;
ods output summary = summary_visits;
run;
proc print data=summary_visits;
run;
*summarizes by patid values;
proc means data=valid_records NWAY n mean median std min max;
class patid;
var var;
ods output summary = summary_visits_ID;
run;
proc print data=summary_visits_ID;
run;
*Summarize by visit;
proc means data=valid_records NWAY n mean median std min max;
class visit;
var var;
ods output summary = summary_visits_by_visit;
run;
*transpose to a wide format for example - may make more sense to you in this format;
proc transpose data=test out=wide (where = (not missing(visit_2))) prefix=visit_;
where visit in (1, 2);
by patid;
id visit;
var var;
run;
proc means data=wide NWAY n mean median std min max;
var visit_:;
ods output summary = summary_visits2;
run;
proc print data=summary_visits2;
run;
HTH
@Anita_n wrote:
Dear all,
I have a data set that looks like this:
data test; input patid visit var; datalines; 1 1 5 2 1 2 5 3 9 1 2 10 3 1 8 7 4 6 1 3 15 2 2 4 5 1 14 5 2 11 3 2 16 ; run;I wish to search for the patid's and their corresponding var 's in visit 1 which also occurs in visit 2 and find their mean, standard deviation, min, max, and N
In this case it will be the following patid's
data mydat; input patid visite var; datalines; 1 1 5 1 2 10 2 1 2 2 2 4 3 1 8 3 2 16 5 1 14 5 2 11 ; run;I will appreciate any help. Thanks
@Reeza @ballardw @PaigeMiller : It should something Iike this, first of all search for the patids that occurs in visit1 then something like this statement:
if (patids in visit1 occurs in visit2) then include this in my dataset
@Anita_n wrote:
@Reeza @ballardw @PaigeMiller : It should something Iike this, first of all search for the patids that occurs in visit1 then something like this statement:
if (patids in visit1 occurs in visit2) then include this in my dataset
Honestly, this doesn't help me at all understand, as "visit1 occurs in visit2" means nothing to me. I think your earlier explanation was more helpful "That means if a patient has visit 1 but does not occur in visit2, this is not considered." But you still have not answered the questions from @Reeza:
Sounds like you want patients who have both a visit 1 and 2 and then want to summarize the variable VAR? Visits over two are disregarded?
@PaigeMiller : Yes this statement from @Reeza :Sounds like you want patients who have both a visit 1 and 2 and then want to summarize the variable VAR? Visits over two are disregarded?
is right
@Reeza Yes I did, I found it okay till where you created the valid_records but then you summarized across all values which is not what I want, then you summarized by patid, I did not understand why.
Your tranposed dataset is close to what I want. But I would then wish to do a summarized statistics on these two cloumns (visit_1 and visit_2). Is there any way to do a summarized statistics on both columns?
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.