Hi, I ran this data step and SAS keeps reporting "log window full" and cannot go any further. I tried manually clearing the log window, but it still gets stuck on this step. Any thoughts on how to set an option so it won't happen again? Thanks.
data lab_bnp_ntbnp lab_trop; set lab;
monthyear_test=seq_date_time;
put monthyear_test=monyy7;
testname=lowcase(group_id);
if index(testname,'bnp')>0 then output lab_bnp_ntbnp;
if index(testname,'trop')>0 then output lab_trop;
run;
Usually this means you have flooded the log with messages.
If you really want to collect all the messages, you can use PROC PRINTTO to send the log to a file, instead of the log window.
proc printto log="C:\mylog.log" new;
run;
*code here;
proc printto log=log;
run;
Then you can open the log with a text editor.
Is there a reason you're using the PUT statement to print messages to the log? If your dataset has 1,000 records, this will print 1,000 lines to your log.
If you want to see a few rows of data you can do stuff like:
if _n_ <=10 then put monthyear_test= monyy7.;
Note I added the missing dot after monyy7. The dot is needed to tell SAS that monyy7 is the name of a format, not the name of a variable.
You have a PUT statement that writes to the LOG for every observation. Why?
You can limit the number of observations processed with the OBS= option and actually see what is in the log for just a few:
data lab_bnp_ntbnp lab_trop;
  set lab (obs=50);
  monthyear_test=seq_date_time;
  put monthyear_test=monyy7.;
  testname=lowcase(group_id);
  if index(testname,'bnp')>0 then output lab_bnp_ntbnp;
  if index(testname,'trop')>0 then output lab_trop;
run;
I suspect you will see a bunch of "monthyear_test=*******" lines. If seq_date_time is a datetime value, it holds a number of seconds, while a date format such as MONYY7. expects a number of days, so the value exceeds the range the format can display.
If that is the case and you want monthyear_test to hold a date value, use
monthyear_test= datepart(seq_date_time);
OR use a format intended to display the date portion of a datetime value
put monthyear_test= dtmonyy7. ;
Either way, the step will continue to fill up the LOG if the data set Lab has many observations, because you have told SAS to write a line for every observation.
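Putting these suggestions together, a minimal sketch of the step might look like the following. It assumes seq_date_time is a numeric datetime variable, which the original post does not confirm, and it limits the PUT to the first 10 observations so the log stays small.

```sas
data lab_bnp_ntbnp lab_trop;
  set lab;
  /* Assumption: seq_date_time is a datetime (seconds), so take the date part */
  monthyear_test = datepart(seq_date_time);
  format monthyear_test monyy7.;
  /* Write only the first 10 rows to the log instead of every row */
  if _n_ <= 10 then put monthyear_test=;
  testname = lowcase(group_id);
  if index(testname, 'bnp') > 0 then output lab_bnp_ntbnp;
  if index(testname, 'trop') > 0 then output lab_trop;
run;
```

Because the FORMAT statement attaches MONYY7. to the variable, the PUT statement does not need to name a format, and the output data sets display the value as month and year as well.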
I have the dataset with these variables
patient_id  pt_name   lab_date    lab_type  result  normal_range  visit_description  HD  date_HD
01234       Smith, J  2023-10-23  gfr       30      60-100        chest pain 1       0   .
01234       Smith, J  2024-02-23  gfr       20      60-100        dyspnea            0   .
01234       Smith, J  2024-10-21  gfr       10      60-100        asymptomatic       1   2023_10_30
01234       Smith, J  2023-07-23  trop      0.06    <0.010        chest pain 5       0   .
01234       Smith, J  2024-01-20  trop      0.08    <0.010        dyspnea            0   .
01226       wong, J   2023-04-23  gfr       15      <0.010        chest pain 4       1   2023_04_30
01226       wong, J   2024-2-20   trop      0.08    <0.010        dyspnea            0   .
The dataset has several million rows from several thousand patients.
I would like to organize the data so that each row summarizes one lab type in a given month, with the range and mean values, like this:
patient_id patient_name lab_month lab_type low high mean max date_max normal_range visit_des HD date date_hd
01234 Smith, J Oct2023 gfr 10 30 20 30 2023_10-23 60-100 chest pain 1, asymptomatic 1 2023-10-30
I have tried PROC SQL but could not get all of these results.
Many thanks for your help!
Your "example" doesn't show any repeated values for "lab month", so I am not quite sure what you mean. Your example output seems to imply that the high, low, and mean are calculated regardless of "month".
Here is a way to get the min, max, and mean, plus an identification date for the high value.
Please note that I made a LOT of guesses about the content of your data to make a working data step example. The data step is only needed to have some data that works with the code.
data work.have;
  infile datalines dlm='|' dsd;
  input patient_id :$6. pt_name :$10. lab_date :yymmdd10. lab_type :$5. result
        normal_range :$15. visit_description :$15. HD date_HD :yymmdd10.;
  format lab_date date_HD yymmdd10.;
datalines;
01234|Smith, J|2023-10-23|gfr|30|60-100|chest pain 1|0|.
01234|Smith, J|2024-02-23|gfr|20|60-100|dyspnea|0|.
01234|Smith, J|2024-10-21|gfr|10|60-100|asymptomatic|1|2023_10_30
01234|Smith, J|2023-07-23|trop|0.06|<0.010|chest pain 5|0|.
01234|Smith, J|2024-01-20|trop|0.08|<0.010|dyspnea|0|.
01226|wong, J|2023-04-23|gfr|15|<0.010|chest pain 4|1|2023_04_30
01226|wong, J|2024-2-20|trop|0.08|<0.010|dyspnea|0|.
;

proc sort data=work.have;
  by patient_id Pt_name lab_type lab_date;
run;

proc summary data=work.have nway;
  by patient_id Pt_name lab_type ;
  var result;
  output out=work.examplesummary (drop=_type_)
    min = mean= max=
    idgroup (max(result) out[1] (lab_date) = date_max)
    /autoname autolabel ;
run;
The sort groups the rows for each lab_type together within each patient, which appears to be what you want.
PROC SUMMARY computes the min/mean/max (and other statistics if desired, such as STD, RANGE, or percentiles) and uses the IDGROUP option to get the date of the maximum result.
AUTONAME means the output variables are named by appending the statistic to the variable name, so if you summarize several variables at once you don't have to spell out every name unless you want to. AUTONAME applies to any requested statistic or option output that you do not supply a name for. Without specific requests per variable, all VAR variables would have the MAX, MIN, and MEAN calculated.
IDGROUP lets you get one or more ID variables associated with one or more max or min values of the VAR variable(s).
I suspect that what you want is this data set, merged back to your original data by Patient_id, Pt_name, and Lab_type.
The only way to stick multiple values into a single variable is a data step, and that has been answered on this forum multiple times. So perhaps do that as a step by Patient_id, Pt_name, and Lab_type and combine this output with that result.
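As a rough sketch of that data step approach, the code below collects the visit_description values for each Patient_id, Pt_name, and Lab_type group into one variable and then merges the result with work.examplesummary. The names visit_list, work.visit_list, and work.want and the length of 200 are my own placeholders, and it assumes work.have is already sorted by those BY variables, as the PROC SORT above does.

```sas
/* Concatenate visit_description within each BY group */
data work.visit_list;
  length visit_list $200;   /* assumed length; adjust to your data */
  retain visit_list;
  set work.have;
  by patient_id pt_name lab_type;
  if first.lab_type then visit_list = ' ';
  /* CATX skips blank pieces and inserts the delimiter between values */
  visit_list = catx(', ', visit_list, visit_description);
  if last.lab_type then output;
  keep patient_id pt_name lab_type visit_list;
run;

/* Combine the concatenated text with the summary statistics */
data work.want;
  merge work.examplesummary work.visit_list;
  by patient_id pt_name lab_type;
run;
```

With several million rows, a group with many visits could overflow the assumed length, in which case CATX truncates the value, so check the longest case before settling on a size.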
Since you didn't provide any rules about what the HD variable is supposed to be, I am not sure where to place it. It might belong in the PROC SUMMARY as a VAR variable asking for the max, which would require some modification to the OUTPUT statement.
The DROP= option removes _TYPE_, the variable that indicates which combinations of the CLASS variables (if any) are in the output.
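If you do want the statistics per month, one possible variant is to derive a month variable first and add it to the BY statement. This is only a sketch under the same guesses about the data; lab_month, work.have_month, and work.monthsummary are names I made up for illustration.

```sas
/* Map each lab_date to the first day of its month */
data work.have_month;
  set work.have;
  lab_month = intnx('month', lab_date, 0);
  format lab_month monyy7.;
run;

proc sort data=work.have_month;
  by patient_id pt_name lab_type lab_month lab_date;
run;

/* Same summary as before, but one output row per patient, lab type, and month */
proc summary data=work.have_month;
  by patient_id pt_name lab_type lab_month;
  var result;
  output out=work.monthsummary (drop=_type_)
    min= mean= max=
    idgroup (max(result) out[1] (lab_date) = date_max)
    / autoname autolabel;
run;
```

The output keeps _FREQ_, which shows how many results went into each month's statistics.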