Hello,
So I am reading this comma delimited file and created it into a temporary dataset. Here is my code and the output:
%let clinical=P:\SAS\71342_example\Data\Clinical Trial Case Study;
libname clinical "&clinical";
filename ClinRaw "&clinical";
data visit_site2;
infile ClinRaw("Site 2, Baseline Visit.csv") dsd;
length sf_reason $20 p_units $9 pos $12;
input Subject sf_reason$ screen Sex$ dov$ notif_date$
sbp dbp bp_units$ pulse p_units$ pos$ temp temp_units$
weight weight_units$ pain ;
label sf_reason='Screen Failure Reason'
screen='Screening Flag, 0=Failure, 1=Pass'
dov='Date of Visit'
notif_date='Failure Notification Date'
sbp='Systolic Blood Pressure'
dbp='Diastolic Blood Pressure'
bp_units='Blood Pressure Units'
p_units='Pulse Units'
pos='Position'
temp='Temperature'
temp_units='Temperature Units';
run;
proc contents data=visit_site2;
run;
Sex | Char | 8 | |
Subject | Num | 8 | |
bp_units | Char | 8 | Blood Pressure Units |
dbp | Num | 8 | Diastolic Blood Pressure |
dov | Char | 8 | Date of Visit |
notif_date | Char | 8 | Failure Notification Date |
p_units | Char | 9 | Pulse Units |
pain | Num | 8 | |
pos | Char | 12 | Position |
pulse | Num | 8 | |
sbp | Num | 8 | Systolic Blood Pressure |
screen | Num | 8 | Screening Flag, 0=Failure, 1=Pass |
sf_reason | Char | 20 | Screen Failure Reason |
temp | Num | 8 | Temperature |
temp_units | Char | 8 | Temperature Units |
weight | Num | 8 | |
weight_units | Char | 8 |
The problem is that I need the dataset to look like the file I uploaded. What should I do especially in regards of formatting to DATE10.
Another issue I'm having is that I'll be combining these two files and I need to include two new variables (age at visit) and (number of days between). I have been looking at youtube videos, but I still do not know how to do that. I put the two files I'll be combining down here. This is the code I used to combine them:
%let clinical=P:\SAS\71342_example\Data\Clinical Trial Case Study;
libname clinical "&clinical";
filename ClinRaw "&clinical";
data visit_site2;
infile ClinRaw("Site 2, Baseline Visit.csv") dsd;
length sf_reason $20 p_units $9 pos $12;
input Subject sf_reason$ screen Sex$ dov$ notif_date$
sbp dbp bp_units$ pulse p_units$ pos$ temp temp_units$
weight weight_units$ pain ;
label sf_reason='Screen Failure Reason'
screen='Screening Flag, 0=Failure, 1=Pass'
dov='Date of Visit'
notif_date='Failure Notification Date'
sbp='Systolic Blood Pressure'
dbp='Diastolic Blood Pressure'
bp_units='Blood Pressure Units'
p_units='Pulse Units'
pos='Position'
temp='Temperature'
temp_units='Temperature Units';
run;
ods select position;
proc contents data=visit_site2 varnum;
run;
ods select all;
data random_site2;
infile ClinRaw("randomization_site2") dsd;
length sf_reason $20 p_units $9 pos $12;
input Subject sf_reason$ screen Sex$ dov$ notif_date$
sbp dbp bp_units$ pulse p_units$ pos$ temp temp_units$
weight weight_units$ pain ;
proc contents data=random_site2;
run;
data site2;
set visit_site2
random_site2;
length sf_reason $20 p_units $9 pos $12;
run;
Can someone help refine this code and how to add these new variables (age at demographics visit and number of days between (i.e., excluding the visit days) demographics and randomization visits)?
Assuming DOV is short for Date of Visit then you will need to rename one or the other to be able to have two different values in the same observation.
data site2;
merge clinical.demographics_site2(rename=(dov=demog_dov))
clinical.randomization_site2(rename=(dov=random_dov))
;
by subject;
days = random_dov - demog_dov ;
age_at_randomization = intck ('YEAR', DOB, random_dov, 'C');
run;
Why don't you read the dates as such? Currently, you are reading them as strings, which makes them nearly useless for later.
Use a numeric (date) informat in place of the dollar sign. Then you can also assign a numeric display format to your liking.
Please post a few lines of your csv file.
Open it with a text editor (not with Excel!) and copy/paste the contents into a window opened with the </> button.
1 1_1 M 19/01/2018 105 72 mm-hg 71 beats/min RECLINED 2.5 C 36 KG 1_2 2 Low Baseline Pain 0 F 21/01/2018 21/01/2018 115 89 mm-hg 80 beats/min SEATED 2.5 C 27 KG 0 3 1 M 23/01/2018 102 68 mm-hg 63 beats/min SEATED 2.8 C 38 KG 3 4 1 F 24/01/2018 114 88 mm-hg 85 beats/min SEATED 2.9 C 35 KG 2 5 Low Baseline Pain 0 F 23/01/2018 23/01/2018 101 83 mm-hg 76 beats/min SITTING 2.5 C 29 KG 0 6 1 F 24/01/2018 119 87 mm-hg 84 beats/min RECLINED 3.0 C 29 KG 1 7 Low Baseline Pain 0 M 30/01/2018 30/01/2018 110 90 mm-hg 81 beats/min SEATED 3.1 C 38 KG 0 8 HIGH BLOOD PRESSURE 0 M 31/01/2018 31/01/2018 141 106 mm-hg 102 beats/min SEATED 2.7 C 48 KG * 9 1 M 29/01/2018 91 66 mm-hg 60 beats/min RECLINED 2.8 C 33 KG 4 10 HIGH BLOOD PRESSURE 0 F 30/01/2018 30/01/2018 146 114 mm-hg 114 beats/min SITTING 2.5 C 37 KG * 11 1 F 02/02/2018 126 86 mm-hg 83 beats/min SEATED 2.8 C 38 KG 1 12 HIGH BLOOD PRESSURE 0 F 31/01/2018 31/01/2018 137 102 mm-hg 96 beats/min SITTING 2.4 C 35 KG * 13 1 F 07/02/2018 89 71 mm-hg 71 beats/min SITTING 2.8 C 27 KG 4 15 1 F 08/02/2018 116 89 mm-hg 83 beats/min SEATED 2.5 C 32 KG 3 16 1 M 07/02/2018 108 81 mm-hg 77 beats/min SITTING 2.6 C 43 KG 4 17 Low Baseline Pain 0 F 07/02/2018 07/02/2018 101 67 mm-hg 65 beats/min RECLINED 2.7 C 27 KG 0 18 1 F 09/02/2018 106 81 mm-hg 78 beats/min SITTING 2.8 C 30 KG 2 19 1 M 07/02/2018 125 87 mm-hg 84 beats/min SITTING 2.9 C 40 KG 3 20 1 M 10/02/2018 93 75 mm-hg 71 beats/min SEATED 3.1 C 37 KG 2 22 HIGH BLOOD PRESSURE 0 M 12/02/2018 12/02/2018 123 93 mm-hg 86 beats/min SEATED 2.7 C 45 KG * 23 Low Baseline Pain 0 M 11/02/2018 11/02/2018 96 78 mm-hg 69 beats/min SEATED 2.7 C 37 KG 0 24 Low Baseline Pain 0 M 10/02/2018 10/02/2018 89 70 mm-hg 65 beats/min SITTING 2.9 C 32 KG 0 25 1 M 09/02/2018 99 70 mm-hg 63 beats/min RECLINED 2.4 C 35 KG 1 26 1 M 13/02/2018 95 75 mm-hg 74 beats/min SITTING 2.8 C 38 KG 1 27 HIGH BLOOD PRESSURE 0 F 14/02/2018 14/02/2018 133 106 mm-hg 100 beats/min SITTING 2.7 C 31 KG * 28 HIGH BLOOD PRESSURE 0 M 11/02/2018 11/02/2018 136 96 mm-hg 90 beats/min RECLINED 2.8 C 54 KG * 29 1 F 11/02/2018 112 89 mm-hg 86 beats/min SITTING 2.9 C 36 KG 4 30 1 F 14/02/2018 88 56 mm-hg 50 beats/min SITTING 2.7 C 25 KG 3 31 HIGH BLOOD PRESSURE 0 F 12/02/2018 12/02/2018 153 104 mm-hg 100 beats/min SEATED 3.0 C 39 KG * 32 1 F 14/02/2018 108 76 mm-hg 69 beats/min SITTING 2.9 C 27 KG 3 33 HIGH BLOOD PRESSURE 0 M 14/02/2018 14/02/2018 137 99 mm-hg 90 beats/min SITTING 2.6 C 51 KG * 34 1 M 17/02/2018 107 82 mm-hg 73 beats/min SITTING 2.9 C 40 KG 3 35 HIGH BLOOD PRESSURE 0 F 16/02/2018 16/02/2018 134 98 mm-hg 94 beats/min SITTING 2.8 C 30 KG * 36 1 M 16/02/2018 100 71 mm-hg 67 beats/min SITTING 2.8 C 37 KG 4 37 1 F 16/02/2018 107 81 mm-hg 82 beats/min RECLINED 2.5 C 29 KG 2 39 1 F 16/02/2018 123 81 mm-hg 72 beats/min SITTING 2.6 C 31 KG 1 40 Low Baseline Pain 0 M 20/02/2018 20/02/2018 122 89 mm-hg 89 beats/min SITTING 2.6 C 46 KG 0 41 1 M 22/02/2018 117 85 mm-hg 78 beats/min SEATED 3.1 C 45 KG 2 43 Low Baseline Pain 0 F 18/02/2018 18/02/2018 105 82 mm-hg 79 beats/min RECLINED 2.8 C 29 KG 0 44 1 M 19/02/2018 102 71 mm-hg 69 beats/min SEATED 2.9 C 32 KG 4 45 1 F 22/02/2018 121 88 mm-hg 87 beats/min SEATED 3.0 C 30 KG 1 46 HIGH BLOOD PRESSURE 0 M 22/02/2018 22/02/2018 127 108 mm-hg 108 beats/min SEATED 2.7 C 52 KG * 47 HIGH BLOOD PRESSURE 0 M 22/02/2018 22/02/2018 122 97 mm-hg 95 beats/min SEATED 2.7 C 41 KG * 48 HIGH BLOOD PRESSURE 0 F 24/02/2018 24/02/2018 129 96 mm-hg 92 beats/min SITTING 3.1 C 33 KG * 49 HIGH BLOOD PRESSURE 0 M 26/02/2018 26/02/2018 119 91 mm-hg 86 beats/min SITTING 3.0 C 38 KG * 50 1 F 26/02/2018 105 85 mm-hg 81 beats/min SITTING 2.6 C 26 KG 3 51 Low Baseline Pain 0 F 25/02/2018 25/02/2018 112 84 mm-hg 81 beats/min RECLINED 3.0 C 24 KG 0 52 HIGH BLOOD PRESSURE 0 F 25/02/2018 25/02/2018 134 102 mm-hg 97 beats/min RECLINED 3.0 C 38 KG * 53 1 F 28/02/2018 81 70 mm-hg 70 beats/min SITTING 2.7 C 23 KG 2 54 Low Baseline Pain 0 M 01/03/2018 01/03/2018 113 90 mm-hg 83 beats/min SEATED 2.6 C 41 KG 0 55 1 F 27/02/2018 100 70 mm-hg 63 beats/min RECLINED 2.7 C 27 KG 4 56 1 F 01/03/2018 117 90 mm-hg 85 beats/min SEATED 2.5 C 31 KG 4 57 1 F 28/02/2018 115 81 mm-hg 73 beats/min SEATED 2.8 C 33 KG 3 58 HIGH BLOOD PRESSURE 0 M 28/02/2018 28/02/2018 128 99 mm-hg 96 beats/min SEATED 2.9 C 51 KG * 59 1 M 02/03/2018 123 85 mm-hg 84 beats/min RECLINED 2.7 C 47 KG 4 60 1 M 03/03/2018 108 75 mm-hg 72 beats/min SITTING 2.6 C 34 KG 3 61 Low Baseline Pain 0 F 05/03/2018 05/03/2018 98 75 mm-hg 70 beats/min SITTING 2.6 C 26 KG 0 62 1 F 03/03/2018 105 83 mm-hg 74 beats/min SEATED 2.8 C 28 KG 4 63 HIGH BLOOD PRESSURE 0 M 08/03/2018 08/03/2018 148 118 mm-hg 118 beats/min SITTING 2.5 C 55 KG * 64 HIGH BLOOD PRESSURE 0 M 07/03/2018 07/03/2018 116 92 mm-hg 84 beats/min SITTING 2.4 C 46 KG * 65 HIGH BLOOD PRESSURE 0 M 07/03/2018 07/03/2018 131 106 mm-hg 98 beats/min SITTING 2.7 C 52 KG * 66 HIGH BLOOD PRESSURE 0 F 07/03/2018 07/03/2018 138 99 mm-hg 95 beats/min SEATED 2.8 C 36 KG * 67 1 M 14/03/2018 105 77 mm-hg 72 beats/min SITTING 2.7 C 38 KG 2 69 HIGH BLOOD PRESSURE 0 M 12/03/2018 12/03/2018 151 110 mm-hg 105 beats/min SEATED 2.6 C 50 KG * 70 1 F 15/03/2018 105 89 mm-hg 89 beats/min RECLINED 2.9 C 27 KG 1 71 1 F 15/03/2018 105 73 mm-hg 73 beats/min SITTING 2.5 C 31 KG 3 72 1 M 15/03/2018 97 69 mm-hg 69 beats/min SITTING 2.4 C 36 KG 4 73 1 M 14/03/2018 111 77 mm-hg 76 beats/min RECLINED 2.5 C 37 KG 4 74 HIGH BLOOD PRESSURE 0 F 19/03/2018 19/03/2018 121 94 mm-hg 87 beats/min RECLINED 2.5 C 32 KG * 75 HIGH BLOOD PRESSURE 0 F 15/03/2018 15/03/2018 109 92 mm-hg 92 beats/min SITTING 2.8 C 27 KG * 76 HIGH BLOOD PRESSURE 0 F 17/03/2018 17/03/2018 121 98 mm-hg 94 beats/min SITTING 2.8 C 30 KG * 77 1 M 20/03/2018 107 83 mm-hg 79 beats/min SEATED 2.7 C 33 KG 4 78 1 F 17/03/2018 109 83 mm-hg 75 beats/min SITTING 2.8 C 29 KG 4 79 1 M 17/03/2018 119 84 mm-hg 79 beats/min SITTING 2.6 C 46 KG 4 80 1 F 17/03/2018 109 72 mm-hg 69 beats/min SEATED 2.8 C 33 KG 1 81 HIGH BLOOD PRESSURE 0 F 17/03/2018 17/03/2018 113 94 mm-hg 86 beats/min RECLINED 2.7 C 29 KG * 82 1 M 21/03/2018 102 81 mm-hg 80 beats/min SITTING 2.7 C 33 KG 1 83 1 M 22/03/2018 82 69 mm-hg 66 beats/min SITTING 2.8 C 32 KG 3 84 Low Baseline Pain 0 F 20/03/2018 20/03/2018 104 84 mm-hg 78 beats/min SEATED 3.1 C 28 KG 0 85 1 M 22/03/2018 115 89 mm-hg 83 beats/min SEATED 2.4 C 40 KG 4 86 1 F 27/03/2018 111 75 mm-hg 76 beats/min RECLINED 3.0 C 29 KG 4 87 HIGH BLOOD PRESSURE 0 M 23/03/2018 23/03/2018 120 99 mm-hg 92 beats/min SEATED 2.7 C 46 KG * 88 HIGH BLOOD PRESSURE 0 F 27/03/2018 27/03/2018 123 91 mm-hg 85 beats/min SITTING 2.8 C 32 KG * 90 1 F 28/03/2018 98 72 mm-hg 65 beats/min SEATED 2.8 C 21 KG 3 91 HIGH BLOOD PRESSURE 0 M 26/03/2018 26/03/2018 140 106 mm-hg 99 beats/min SEATED 2.6 C 52 KG * 92 HIGH BLOOD PRESSURE 0 F 27/03/2018 27/03/2018 144 115 mm-hg 110 beats/min SITTING 2.9 C 39 KG * 93 1 F 28/03/2018 113 88 mm-hg 80 beats/min SITTING 2.9 C 32 KG 1 95 HIGH BLOOD PRESSURE 0 F 01/04/2018 01/04/2018 124 98 mm-hg 98 beats/min SEATED 3.1 C 35 KG * 96 HIGH BLOOD PRESSURE 0 M 31/03/2018 31/03/2018 143 111 mm-hg 107 beats/min SITTING 2.9 C 50 KG * 97 Low Baseline Pain 0 F 01/04/2018 01/04/2018 81 66 mm-hg 66 beats/min SITTING 2.9 C 22 KG 0 98 1 F 03/04/2018 93 66 mm-hg 60 beats/min SITTING 2.3 C 28 KG 4 99 Low Baseline Pain 0 F 04/04/2018 04/04/2018 93 67 mm-hg 60 beats/min RECLINED 2.8 C 22 KG 0 100 HIGH BLOOD PRESSURE 0 F 06/04/2018 06/04/2018 124 99 mm-hg 95 beats/min SEATED 2.7 C 34 KG * 101 1 M 02/04/2018 103 69 mm-hg 61 beats/min SEATED 2.8 C 35 KG 2 102 HIGH BLOOD PRESSURE 0 M 07/04/2018 07/04/2018 123 106 mm-hg 106 beats/min RECLINED 2.9 C 51 KG * 103 1 F 04/04/2018 123 83 mm-hg 81 beats/min RECLINED 2.9 C 31 KG 3
I've also uploaded the csv file. For the sas files I uploaded above those are the ones I need to create new variables for and am unsure how to do. For the csv file after I read the file I am attempting to make my output like the image I uploaded, but it's completely different and I don't know why.
What you posted into the text of your message does not match what you attached. Is it a different file? If not why does it appear to have tabs (or spaces) instead of commas between the values? Here are the first 10 lines from the file you attached.
1,,1,M,19/01/2018,,105,72,mm-hg,71,beats/min,RECLINED,2.5,C,36,KG,1 2,Low Baseline Pain,0,F,21/01/2018,21/01/2018,115,89,mm-hg,80,beats/min,SEATED,2.5,C,27,KG,0 3,,1,M,23/01/2018,,102,68,mm-hg,63,beats/min,SEATED,2.8,C,38,KG,3 4,,1,F,24/01/2018,,114,88,mm-hg,85,beats/min,SEATED,2.9,C,35,KG,2 5,Low Baseline Pain,0,F,23/01/2018,23/01/2018,101,83,mm-hg,76,beats/min,SITTING,2.5,C,29,KG,0 6,,1,F,24/01/2018,,119,87,mm-hg,84,beats/min,RECLINED,3,C,29,KG,1 7,Low Baseline Pain,0,M,30/01/2018,30/01/2018,110,90,mm-hg,81,beats/min,SEATED,3.1,C,38,KG,0 8,HIGH BLOOD PRESSURE,0,M,31/01/2018,31/01/2018,141,106,mm-hg,102,beats/min,SEATED,2.7,C,48,KG, 9,,1,M,29/01/2018,,91,66,mm-hg,60,beats/min,RECLINED,2.8,C,33,KG,4 10,HIGH BLOOD PRESSURE,0,F,30/01/2018,30/01/2018,146,114,mm-hg,114,beats/min,SITTING,2.5,C,37,KG,
The file does not have a column header row so make sure the fields are in the order you expect.
A couple of fixes to your data step. Make sure to use TRUNCOVER option on the INFILE to prevent SAS from going to the next line looking for data if one line doesn't have every field. Make sure to use a date informat for date values. Looks like your file has dates in DMY order. Your posted example seemed to want to attach the DATE10 format for displaying dates.The DATE format will print dates with 7,9 or 11 characters, but a width of 10 doesn't make any sense. Either use DATE9 or DATE11 (don't use date7 as then it only use two digits for the year.) If you want to print numeric months don't use DMY or MDY order as those are confusing. You can use YYMMDD10 format to print numeric months in YMD order.
data visit_site2;
infile ClinRaw("Site 2, Baseline Visit.csv") dsd truncover;
length Subject 8 sf_reason $20 screen 8 Sex $1 dov 8 notif_date 8
sbp 8 dbp 8 bp_units $8 pulse 8 p_units $9 pos $12
temp 8 temp_units $8 weight 8 weight_units $8 pain 8
;
informat dov notif_date ddmmyy.;
format dov notif_date yymmdd10.;
input subject -- pain;
run;
AS for trying to define a new variable you need explain the logic you want to use to create the variable. Your code was just concatenating the two datasets together. If your logic for creating the new variable needs to use variables from both datasets then you will want to merge the two together instead. But then watch out that you don't have variable name collisions.
Hello,
Thank you! In the case of creating new variables my goal is to combine the two SAS datasets demographics_site2.sas7bdat and randomization_site2.sas7bdat to create the new dataset named site2.sas7bdat, which includes two new variables (age at demographics visit and number of days between (i.e., excluding the visit days) demographics and randomization visits).
Now demographics_site2 and randomization_site2 have very different variables in play, but I do need to connect them and create those two new variables. Actually one variable they do have that's the same is Subject. I need to combine and create this variable to create these:
Use the MERGE statement to combine two datasets based on matching values of key variable(s).
So if your key variable is SUBJID and in the DEMOG dataset you have a date variable named DEMOG_DATE and in the RANDOM dataset you have a date variable named RANDOM_DATE then your code looks likes:
data want;
merge demog random;
by subjid ;
days = random_date - demog_date;
run;
Well both the datasets have dov and subject so I've coded it this way. Which dov is important and since I am trying to find between two dates for age I used intck. What do you think?
data site2; merge clinical.demographics_site2 clinical.randomization_site2; by subject; days = dov - dov; age= intck ('YEAR', DOB, DOV, 'C'); run;
I am running into an issue with creating
I actually have the exact numbers! It's mostly an issue with formatting my output and code looks like this (sex can only be used as a by statement it gives me an error for everything else):
proc sort data= site2;
by sex;
run;
proc means data= site2 N Mean std nonobs
maxdec=2;
class race;
var age;
by sex;
run;
The MEANS Procedure sex=F Analysis Variable : age race N Mean Std Dev 1 44 44.05 10.93 2 9 47.44 12.89 3 5 53.80 4.66 4 2 40.00 4.24 5 2 52.50 6.36 6 2 47.00 14.14 7 5 53.20 17.36 sex=M Analysis Variable : age race N Mean Std Dev 1 30 46.57 11.29 2 12 45.25 9.47 3 6 42.83 14.66 4 1 52.00 . 5 1 53.00 . 6 2 43.50 0.71 7 1 36.00 .
So as you can see above my age variable worked it's just a formatting issue where they split F and M into separate means procedures, but my dov has issues, is there something I need to rename because I have dov in both demographics and randomization and my days variable is supposed to signify the days between them, but it's hard to do so because they're both dov which is why my bar graph looks like this.
My code looks like this:
proc univariate data=site2;
var days;
class sex;
histogram days / NORMAL;
ods select Histogram;
run;
When I want it to look like this:
This is why you are getting zero for all days:
days = dov - dov;
Assuming DOV is short for Date of Visit then you will need to rename one or the other to be able to have two different values in the same observation.
data site2;
merge clinical.demographics_site2(rename=(dov=demog_dov))
clinical.randomization_site2(rename=(dov=random_dov))
;
by subject;
days = random_dov - demog_dov ;
age_at_randomization = intck ('YEAR', DOB, random_dov, 'C');
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.