I have deleted a couple of observations from my dataset. When I run the code it works fine and deletes the cases. However, when I close SAS and open it the next day to continue working, I run my LIBNAME statement and go to where I left off in my code. When I run new code, it adds the deleted cases back in. It is not saving that I deleted them earlier in my code. Do I have to rerun all my code every time I log in? Shouldn't I be able to just rerun the LIBNAME statement and pick up where I left off? Here is the code I used to delete the cases:
DATA hemo.PLAY;
SET hemo.hemo_mrg2;
IF _CaseNumber="10.02849" THEN DO;
DOB='29SEP2009'd;
_Age = DateofDeath - DOB;
END;
IF _CaseNumber="10.03520" THEN DO;
DOB='02AUG2010'd;
_Age = DateofDeath - DOB;
END;
IF UniqueKey in (23,24) THEN delete;
IF _CaseNumber=11.04184 THEN delete;
IF _CaseNumber=11.02221 THEN delete;
IF _CaseNumber=11.04744 THEN delete;
IF _CaseNumber=10.04942 THEN delete;
RUN;
The cases are deleted from, or probably better phrased, never written to, the output data set hemo.play. Since you have not deleted them from hemo.hemo_mrg2, they are still there the next time the code reads it.
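To see this on a small scale, here is a minimal sketch. The WORK data sets source and cleaned and the three sample rows are made up for illustration; only the variable name UniqueKey comes from the code above.

```sas
/* Hypothetical stand-in for hemo.hemo_mrg2: three made-up rows in WORK */
data work.source;
  input UniqueKey CaseNumber $;
  datalines;
22 10.00001
23 10.00002
24 10.00003
;
run;

/* DELETE only stops a row from being written to work.cleaned;
   it does not touch the input data set named on the SET statement */
data work.cleaned;
  set work.source;
  if UniqueKey in (23, 24) then delete;
run;

/* Count the rows in each data set: the source is unchanged */
proc sql;
  select count(*) as n_source  from work.source;
  select count(*) as n_cleaned from work.cleaned;
quit;
```

Any later step that reads work.source again starts from all three rows; to keep the deletions, later steps would read work.cleaned instead.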
What would you suggest? Would I need to do this?
DATA hemo.hemo_mrg2;
IF _CaseNumber="10.02849" THEN DO;
DOB='29SEP2009'd;
_Age = DateofDeath - DOB;
END;
IF _CaseNumber="10.03520" THEN DO;
DOB='02AUG2010'd;
_Age = DateofDeath - DOB;
END;
IF UniqueKey in (23,24) THEN delete;
IF _CaseNumber=11.04184 THEN delete;
IF _CaseNumber=11.02221 THEN delete;
IF _CaseNumber=11.04744 THEN delete;
IF _CaseNumber=10.04942 THEN delete;
RUN;
Data hemo.play;
set hemo.hemo_mrg2;
run;
Your original code appears correct, assuming it ran correctly.
Your hemo.play data set will exist with the deleted records removed. Your hemo.hemo_mrg2 data set will still have the records, as it was not modified.
Could you explain more clearly what you expect to happen that isn't happening?
So my original dataset has 78 observations in it. After running my code above and deleting 4 cases, my dataset now has 74 cases in it. However if I run:
proc print data=hemo.play;
var variablename;
run;
Then my data goes back to having 78 observations, so somehow the 4 observations that I deleted above are not being saved in hemo.play.
Can you post the full log, showing those results?
LIBNAME hemo "C:\Users\gajeski\Desktop\ALTE Study\Hemosiderin\CEHSCC pilot Grant 2012-Hemosiderin\Hemo SAS Data";
NOTE: Libref HEMO was successfully assigned as follows:
Engine: V9
Physical Name: C:\Users\gajeski\Desktop\ALTE Study\Hemosiderin\CEHSCC pilot Grant 2012-Hemosiderin\Hemo SAS Data
2 DATA hemo.PLAY;
3 SET hemo.hemo_mrg2;
4 IF _CaseNumber="10.02849" THEN DO;
5 DOB='29SEP2009'd;
6 _Age = DateofDeath - DOB;
7 END;
8 IF _CaseNumber="10.03520" THEN DO;
9 DOB='02AUG2010'd;
10 _Age = DateofDeath - DOB;
11 END;
12 IF UniqueKey in (23,24) THEN delete;
13 IF _CaseNumber=11.04184 THEN delete;
14 IF _CaseNumber=11.02221 THEN delete;
15 IF _CaseNumber=11.04744 THEN delete;
16 IF _CaseNumber=10.04942 THEN delete;
17 RUN;
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
13:8 14:8 15:8 16:8
NOTE: There were 78 observations read from the data set HEMO.HEMO_MRG2.
NOTE: The data set HEMO.PLAY has 74 observations and 477 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds
18 proc print data=hemo.play;
NOTE: Writing HTML Body file: sashtml.htm
19 format LastPlacedTimeTest time.;
WARNING: Variable LASTPLACEDTIMETEST not found in data set HEMO.PLAY.
20 run;
NOTE: There were 74 observations read from the data set HEMO.PLAY.
NOTE: PROCEDURE PRINT used (Total process time):
real time 2.36 seconds
cpu time 2.27 seconds
21 DATA hemo.play;
22 SET hemo.hemo_mrg2;
23 IF _CaseNumber= 11.02088 THEN Presenceofpetechiae="Not Reported";
24 Run;
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
23:8
NOTE: There were 78 observations read from the data set HEMO.HEMO_MRG2.
NOTE: The data set HEMO.PLAY has 78 observations and 477 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
The PROC PRINT shows that you have 74 observations in hemo.play. You then replace hemo.play with a copy of the original data set (hemo_mrg2), which still has 78 observations, so your next version of hemo.play has 78 observations.
The output is correct; the flaw is in your expectations. If you want to modify hemo_mrg2 permanently, overwrite it with the output, but then you lose the original table, which may or may not be okay.
I.e., the last step should be:
data hemo.play2;
set hemo.play;
/* your additional statements go here */
run;
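If overwriting the original really is what you want, a sketch could look like the following. work.master and work.master_backup are hypothetical names on made-up rows, standing in for hemo.hemo_mrg2. Taking a backup copy first is my own suggestion, not something the steps above require.

```sas
/* Hypothetical stand-in for the original table */
data work.master;
  input UniqueKey CaseNumber $;
  datalines;
22 10.00001
23 10.00002
24 10.00003
;
run;

/* Keep an untouched copy before changing anything (assumed precaution) */
data work.master_backup;
  set work.master;
run;

/* Same name on DATA and SET: the table is replaced by its filtered version */
data work.master;
  set work.master;
  if UniqueKey in (23, 24) then delete;
run;
```

After the last step, work.master permanently holds only the rows that were kept, and the removed rows survive only in work.master_backup.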
I made the changes as you suggested and I still seem to be having trouble. I have included my log file. It seems to run fine initially, but when I run a second IF-THEN step it overrides the previous one. So while my number of observations is at 74, like I want, it no longer recognizes the new variable I created.
LIBNAME hemo "C:\Users\gajeski\Desktop\ALTE Study\Hemosiderin\CEHSCC pilot Grant 2012
! -Hemosiderin\Hemo SAS Data";
NOTE: Libref HEMO was successfully assigned as follows:
Engine: V9
Physical Name: C:\Users\gajeski\Desktop\ALTE Study\Hemosiderin\CEHSCC pilot Grant 2012
-Hemosiderin\Hemo SAS Data
DATA hemo.PLAY;
SET hemo.hemo_mrg2;
IF _CaseNumber="10.02849" THEN DO;
DOB='29SEP2009'd;
_Age = DateofDeath - DOB;
END;
IF _CaseNumber="10.03520" THEN DO;
DOB='02AUG2010'd;
_Age = DateofDeath - DOB;
END;
IF UniqueKey in (23,24) THEN delete;
IF _CaseNumber=11.04184 THEN delete;
IF _CaseNumber=11.02221 THEN delete;
IF _CaseNumber=11.04744 THEN delete;
IF _CaseNumber=10.04942 THEN delete;
RUN;
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
13:8 14:8 15:8 16:8
NOTE: There were 78 observations read from the data set HEMO.HEMO_MRG2.
NOTE: The data set HEMO.PLAY has 74 observations and 477 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
Data hemo.play2;
Set hemo.play;
hemoscore=.;
IF (_CaseNumber=11.02088) OR (_CaseNumber=11.04817) OR (_CaseNumber=11.03348) OR
! (_CaseNumber=12.04653) Then hemoscore=3;
Else IF (_CaseNumber=10.01867) OR (_CaseNumber=10.02274) OR (_CaseNumber=11.01668) OR
! (_CaseNumber=11.01889) OR (_CaseNumber=11.03437) OR (_CaseNumber=12.00581) THEN hemoscore=2;
Else hemoscore=1;
RUN;
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
21:5 21:31 21:57 21:83 22:10 22:36 22:62 22:88 22:114 22:140
NOTE: There were 74 observations read from the data set HEMO.PLAY.
NOTE: The data set HEMO.PLAY2 has 74 observations and 478 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
Data hemo.play2;
set hemo.play;
IF _CaseNumber= 11.02088 THEN Presenceofpetechiae="Not Reported";
RUN;
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
54:4
NOTE: There were 74 observations read from the data set HEMO.PLAY.
NOTE: The data set HEMO.PLAY2 has 74 observations and 477 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
PROC FREQ data=hemo.play2;
tables hemoscore * Presenceofpetechiae;
ERROR: Variable HEMOSCORE not found.
RUN;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Why do you keep overwriting the same datasets?
1) DATA hemo.PLAY;
2) Data hemo.play2;
3) Data hemo.play2;
The computer is just doing what you tell it to do.
So I should not be using the set statement each time?
It does recognize the variable in your latest step; the one it can't find is a different variable, hemoscore.
Yes, you need to use a set statement each time.
In a data step:
The SET statement points to the input data; the DATA statement points to the output dataset. They do need to line up, though.
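As a rough illustration of "lining up", here is a sketch in which each step reads the data set the previous step wrote, so the changes accumulate. The data set names step0 to step3, the variable Petechiae, and the sample rows are all invented for the example; I am also assuming the case number is stored as character, which is what the conversion notes in the log suggest.

```sas
/* Hypothetical sample data; lengths declared so "Not Reported" fits */
data work.step0;
  length CaseNumber $8 Petechiae $12;
  input UniqueKey CaseNumber $ Petechiae $;
  datalines;
1 11.02088 Yes
2 10.01867 No
23 10.00002 No
;
run;

/* Step 1: drop unwanted rows */
data work.step1;
  set work.step0;
  if UniqueKey in (23, 24) then delete;
run;

/* Step 2: reads step1 and adds hemoscore */
data work.step2;
  set work.step1;
  if CaseNumber = "11.02088" then hemoscore = 3;
  else if CaseNumber = "10.01867" then hemoscore = 2;
  else hemoscore = 1;
run;

/* Step 3: reads step2 (not step1), so hemoscore is still present */
data work.step3;
  set work.step2;
  if CaseNumber = "11.02088" then Petechiae = "Not Reported";
run;

proc freq data=work.step3;
  tables hemoscore * Petechiae;
run;
```

If step 3 had used SET work.step1 instead, it would have rebuilt its output from a data set that never contained hemoscore. Comparing against quoted strings also avoids the character-to-numeric conversion notes when the variable is character.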
But I made hemoscore in the step above and it was recognized (see here)
Data hemo.play2;
Set hemo.play;
hemoscore=.;
IF (_CaseNumber=11.02088) OR (_CaseNumber=11.04817) OR (_CaseNumber=11.03348) OR
! (_CaseNumber=12.04653) Then hemoscore=3;
Else IF (_CaseNumber=10.01867) OR (_CaseNumber=10.02274) OR (_CaseNumber=11.01668) OR
! (_CaseNumber=11.01889) OR (_CaseNumber=11.03437) OR (_CaseNumber=12.00581) THEN hemoscore=2;
Else hemoscore=1;
RUN;
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
21:5 21:31 21:57 21:83 22:10 22:36 22:62 22:88 22:114 22:140
NOTE: There were 74 observations read from the data set HEMO.PLAY.
NOTE: The data set HEMO.PLAY2 has 74 observations and 478 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
It's when I then run this next step that it no longer recognizes that I made hemoscore:
Data hemo.play2;
set hemo.play;
IF _CaseNumber= 11.02088 THEN Presenceofpetechiae="Not Reported";
RUN;
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
54:4
NOTE: There were 74 observations read from the data set HEMO.PLAY.
NOTE: The data set HEMO.PLAY2 has 74 observations and 477 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
PROC FREQ data=hemo.play2;
tables hemoscore * Presenceofpetechiae;
ERROR: Variable HEMOSCORE not found.
RUN;
Draw some diagrams of your input/output data sets and I think you'll begin to see the issues.
OK, hopefully I now understand what I need to do and am no longer overwriting datasets. I ran the code below and was able to get the results I was looking for with no errors. Thank you all for your help!
Data temp;
Set hemo.hemo_mrg2;
hemoscore=1;
if _CaseNumber=10.01867 or _CaseNumber=10.02274 or _CaseNumber=11.01668 or _CaseNumber=11.01889 or _CaseNumber=11.03437 or _CaseNumber=12.00581 then hemoscore=2;
if _CaseNumber=11.02088 or _CaseNumber=11.04817 or _CaseNumber=11.03348 or _CaseNumber=12.04653 then hemoscore=3;
if _CaseNumber=11.04184 or _CaseNumber=11.02221 or _CaseNumber=11.04744 or _CaseNumber=10.04942 then delete;
if _CaseNumber= 11.02088 then Presenceofpetechiae="Not Reported";
run;
Data hemo.play2;
set temp;
run;
proc print data=hemo.play2;
var Presenceofpetechiae;
run;
PROC FREQ data=hemo.play2;
tables hemoscore * Presenceofpetechiae;
RUN;