Dear community,
I ran two data steps, each creating a different new variable. Both steps read the same input dataset and write the new variable to the same output dataset. But the output dataset ends up with either the first variable or the second one, never both. I need both variables in one dataset.
This never happened to me before. What could be the problem? Does anybody know?
Thank you so much 🙂
data patients1;
set datensatz_1416_Erstanzeige;
format patients $20.;
if numberofpatients = . then patients="0";
if 1<= numberofpatients <= 100 then patients="1-100";
if 101<=numberofpatients<= 500 then patients="101-500";
if 501<=numberofpatients<=1000 then patients="501-1000";
if 1001<=numberofpatients<=10000 then patients="1001-10000";
if numberofpatients >10000 then patients= ">10000";
run;
data patients1;
set datensatz_1416_Erstanzeige;
format authorisation $8.;
if NIS_Nummer in (2628,3078,6791) then authorisation="11Jan2008";
if NIS_Nummer in (2408) then authorisation="28Aug2007";
if NIS_Nummer in (6759) then authorisation="18Sep2014";
if NIS_Nummer in (2311,5361,6887) then authorisation="23Apr2007";
if NIS_Nummer in (6687) then authorisation="03Aug2009";
if NIS_Nummer in (6667) then authorisation="27May2015";
if NIS_Nummer in (6657) then authorisation="08May2014";
if NIS_Nummer in (2075, 6756) then authorisation="26Aug2013";
run;
Seems pretty obvious if you look at it. You are creating dataset B from dataset A. You then overwrite it with a new version of B, again created from the original dataset A, so only the second step's variable survives. If you do want to do it in two data steps, have the second step read from the output of the first one instead of going back to the original data.
Your subject line is backwards. You cannot have a data step "in" a dataset. A dataset is the input to, and the output of, a data step.
Why are you attaching formats to your character variables? SAS already knows how to print character variables and does not need to have special formatting instructions attached to them. Perhaps you meant to use a LENGTH statement to set each variable's length before using it later in the data step?
There is no reason to use two data steps. You can calculate both new variables in the same data step.
data patients1;
set datensatz_1416_Erstanzeige;
format patients $20. authorisation $9.; /* "11Jan2008" has 9 characters; $8. would truncate it */
if numberofpatients = . then patients="0";
if 1<= numberofpatients <= 100 then patients="1-100";
if 101<=numberofpatients<= 500 then patients="101-500";
if 501<=numberofpatients<=1000 then patients="501-1000";
if 1001<=numberofpatients<=10000 then patients="1001-10000";
if numberofpatients >10000 then patients= ">10000";
if NIS_Nummer in (2628,3078,6791) then authorisation="11Jan2008";
if NIS_Nummer in (2408) then authorisation="28Aug2007";
if NIS_Nummer in (6759) then authorisation="18Sep2014";
if NIS_Nummer in (2311,5361,6887) then authorisation="23Apr2007";
if NIS_Nummer in (6687) then authorisation="03Aug2009";
if NIS_Nummer in (6667) then authorisation="27May2015";
if NIS_Nummer in (6657) then authorisation="08May2014";
if NIS_Nummer in (2075, 6756) then authorisation="26Aug2013";
run;
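If two data steps are still preferred, the approach described above could look roughly like the sketch below. It is only an illustration: the second step reads patients1 (the output of the first step) instead of the original dataset, and LENGTH is used in place of FORMAT to size the character variables. Only the first condition for each variable is written out; the rest would carry over unchanged.

```sas
/* Step 1: read the original data and add the first variable */
data patients1;
  set datensatz_1416_Erstanzeige;
  length patients $20;
  if numberofpatients = . then patients = "0";
  /* ... remaining numberofpatients conditions as above ... */
run;

/* Step 2: read the OUTPUT of step 1, so patients is kept */
data patients1;
  set patients1;
  length authorisation $9;   /* "11Jan2008" has 9 characters */
  if NIS_Nummer in (2628,3078,6791) then authorisation = "11Jan2008";
  /* ... remaining NIS_Nummer conditions as above ... */
run;
```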
Some advice ... in general this is not a good way to get groups like "1-100" for patients. Everything would work better if you used a format to group the number of patients instead of building a new character variable as you have. In addition, if you ARE going to do it as above, use IF-THEN-ELSE instead of repeated IF-THEN, so that once one condition matches the remaining ones are skipped.
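A minimal sketch of both suggestions, under some assumptions: patgrp is a made-up format name, patients1_else is a made-up dataset name, and the OTHER range is added so that values outside the listed ranges (for example 0) stay blank, as they do in the original code.

```sas
/* Suggestion 1: group with a user-defined format (patgrp is hypothetical) */
proc format;
  value patgrp
    .             = '0'
    1 - 100       = '1-100'
    101 - 500     = '101-500'
    501 - 1000    = '501-1000'
    1001 - 10000  = '1001-10000'
    10000 <- high = '>10000'     /* strictly greater than 10000 */
    other         = ' ';         /* anything else stays blank */
run;

data patients1;
  set datensatz_1416_Erstanzeige;
  length patients $20;
  patients = put(numberofpatients, patgrp.);
run;

/* Suggestion 2: the same grouping as an IF-THEN-ELSE chain */
data patients1_else;             /* hypothetical dataset name */
  set datensatz_1416_Erstanzeige;
  length patients $20;
  if numberofpatients = . then patients = "0";
  else if 1    <= numberofpatients <= 100   then patients = "1-100";
  else if 101  <= numberofpatients <= 500   then patients = "101-500";
  else if 501  <= numberofpatients <= 1000  then patients = "501-1000";
  else if 1001 <= numberofpatients <= 10000 then patients = "1001-10000";
  else if numberofpatients > 10000          then patients = ">10000";
run;
```

With the format defined, a new variable may not be needed at all: a statement such as format numberofpatients patgrp.; in a reporting procedure groups the values on the fly.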
Some advice part 2 ... there is usually no reason and no benefit to store dates as character strings as you are doing, such as "26Aug2013". A character string like this cannot be compared chronologically with other dates or used in date arithmetic. Instead, you want to use SAS date values, such as
if NIS_Nummer in (2075, 6756) then authorisation='26Aug2013'D;
The D at the end makes this a SAS date value, and then you can compare it to other dates and do math on it. You might also want to assign a date format to authorisation.
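As a rough sketch of what that makes possible: here authorisation is created as a numeric variable with the DATE9. format, and two derived variables show a comparison and a subtraction. The names before_2014 and days_to_end_2015 and the cut-off dates are invented for illustration; only two of the NIS_Nummer conditions are written out.

```sas
data patients1;
  set datensatz_1416_Erstanzeige;
  format authorisation date9.;   /* numeric variable, displayed like 26AUG2013 */
  if NIS_Nummer in (2075, 6756) then authorisation = '26Aug2013'd;
  else if NIS_Nummer in (6759)  then authorisation = '18Sep2014'd;
  /* ... remaining NIS_Nummer conditions ... */

  /* hypothetical derived variables, only for rows that have a date */
  if not missing(authorisation) then do;
    before_2014      = (authorisation < '01Jan2014'd);  /* 1 = true, 0 = false */
    days_to_end_2015 = '31Dec2015'd - authorisation;    /* difference in days */
  end;
run;
```

The missing-value check matters because SAS treats a missing numeric value as smaller than any number, so an unguarded comparison would flag rows without a date as "before 2014".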
First of all kudos for teaching yourself SAS! 🙂
1. Using formats is a very efficient way to group data. I would start with reading the User Defined Format Basics of the SAS Documentation on formats. Then work my way from there.
2. The numbers are not random, though they may seem so. 17406 is the number of days from 1 January 1960 to 28 August 2007. That is how SAS dates are defined.
In the following code, I define two variables with the exact same value of the date constant 28aug2008. I only format one of them. They appear different in the data set, though their value is exactly the same.
data test;
DateNotFormatted="28aug2008"d;
DateFormatted="28aug2008"d;
format DateFormatted date9.;
run;
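The same point can be checked in the log without creating a dataset. The sketch below uses 28Aug2007, the date behind the value 17406 mentioned above; the variable name d is arbitrary.

```sas
data _null_;
  d = '28Aug2007'd;
  put d=;          /* writes d=17406 */
  put d= date9.;   /* writes d=28AUG2007 */
run;
```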
SAS stores dates as the number of days since 1/1/1960. Unlike character variables, date values do need a format attached to them so that they display in a way that humans recognize. You could use DATE9. or any of the many other formats that SAS has for displaying dates. Make sure to define the variable as numeric instead of character.
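If the dates already exist as character strings, one possible way to get a numeric date is the INPUT function with the DATE9. informat. This is only a sketch: it assumes patients1 holds a character variable authorisation with complete values such as "26Aug2013", and the names dates_numeric and authorisation_date are invented.

```sas
data dates_numeric;              /* hypothetical dataset name */
  set patients1;
  /* authorisation is assumed to be character here, e.g. "26Aug2013" */
  authorisation_date = input(authorisation, date9.);
  format authorisation_date date9.;
run;
```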