Hi,
I've been using PROC GLIMMIX to estimate the effect of several predictor variables on my outcome, which follows a negative binomial distribution. For model comparison, when I run the following code I get the AIC in the output:
proc glimmix data=modelupdated;
  class metcat(ref="2") age_new(ref="1") sexnum(ref="2") race(ref="2");
  model Rate_12 = age_new sexnum race metcat / dist=nb link=log s;
run;
However, when I add a random effect for counties, the AIC no longer appears in the Fit Statistics table in the output.
proc glimmix data=modelupdated1;
  class metcat(ref="2") age_new(ref="1") sexnum(ref="2") race(ref="2");
  model Rate_12 = age_new sexnum race ave_uninsured metcat / dist=nb link=log s;
  random intercept / type=un subject=fips_all_digit_num;
run;
Any thoughts on this, and how can I get the AIC? I need it to compare the two models against each other.
Thank you,
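A possible direction, offered as a sketch rather than a confirmed fix: when a RANDOM statement is added to a non-normal model, PROC GLIMMIX defaults to a pseudo-likelihood estimation method, and the Fit Statistics table then reports pseudo-likelihood values instead of AIC. Requesting METHOD=LAPLACE (or METHOD=QUAD) switches to an approximated true likelihood, and AIC is reported again. The code below reuses the dataset and variable names from the post unchanged; whether the Laplace approximation is adequate for this data is an assumption to check.

```sas
/* Sketch: same random-intercept model as in the post, but fit by   */
/* maximum likelihood with a Laplace approximation so that AIC,     */
/* AICC and BIC appear in the Fit Statistics table.                 */
proc glimmix data=modelupdated1 method=laplace;
  class metcat(ref="2") age_new(ref="1") sexnum(ref="2") race(ref="2");
  model Rate_12 = age_new sexnum race ave_uninsured metcat
        / dist=nb link=log s;
  /* County-level random intercept, as in the original code */
  random intercept / type=un subject=fips_all_digit_num;
run;
```

For the AIC values to be comparable, both models would need to be fit to the same observations; the two calls in the post read different datasets (modelupdated and modelupdated1) and differ in ave_uninsured as well as in the random effect.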
Hello, I have a series of 10 Likert scale (1-3) variables that need to be recategorized using the same criteria. The variables are as follows:
a1_c_q1 - a1_c_q5, where 'a1' is administrator 1, 'c' is cognitive test, and 'q' represents the questionnaire number (1 through 5).
a2_c_q1 - a2_c_q5, where 'a2' is administrator 2, 'c' is cognitive test, and 'q' represents the questionnaire number (1 through 5).
For each of the 10 variables, I am looking to create a new recategorized variable where 1=3 and 3=1. Is there a way to use an array or loop in the code so the same lines do not need to be repeated 10 times?
DATA new1;
  SET new;
  a1_c_q1_new = a1_c_q1;
  IF a1_c_q1 = 1 THEN a1_c_q1_new = 3;
  ELSE IF a1_c_q1 = 3 THEN a1_c_q1_new = 1;
RUN;
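One way this could be done with arrays, as a minimal sketch: pair an array of the 10 original variables with an array of 10 new variables and loop over the index. The dataset names new and new1 and the _new suffix come from the post; the array names q_old and q_new and the index i are only illustrative.

```sas
DATA new1;
  SET new;
  /* Original items: numbered range lists expand to q1..q5 for each administrator */
  ARRAY q_old{10} a1_c_q1-a1_c_q5 a2_c_q1-a2_c_q5;
  /* New items are listed explicitly because their names do not end in a number */
  ARRAY q_new{10} a1_c_q1_new a1_c_q2_new a1_c_q3_new a1_c_q4_new a1_c_q5_new
                  a2_c_q1_new a2_c_q2_new a2_c_q3_new a2_c_q4_new a2_c_q5_new;
  DO i = 1 TO DIM(q_old);
    q_new{i} = q_old{i};                      /* copy, so 2 and missing stay as they are */
    IF q_old{i} = 1 THEN q_new{i} = 3;        /* swap 1 -> 3 */
    ELSE IF q_old{i} = 3 THEN q_new{i} = 1;   /* swap 3 -> 1 */
  END;
  DROP i;                                     /* loop index is not needed in the output */
RUN;
```

The two arrays must list the variables in the same order so that element i of q_new corresponds to element i of q_old.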
Hi, I have a PROC HTTP step in my script which worked without any problem. Recently we had to redirect this procedure to another server that is used for testing. After I changed the URL to use the IP address, which has no certified DNS name, I got this error message:
ERROR: Secure communications error status 807ff019 description "11.1.1.111(the IP of our test environment): SSL Error: Invalid subject name in partner's certificate. Subject name must match machine name."
ERROR: SSL Error: Invalid subject name in partner's certificate. Subject name must match machine name.
ERROR: Call to tcpSockContinueSSL failed.
I tried setting the system option SSLSNIHOSTNAME to a valid URL, but it doesn't work because our server is Windows + SAS 9.4M8, not Linux. Can any SAS expert figure out how to resolve this issue?
Many thanks!
KR., Ye
I created a yes/no/missing format. It works when I apply it to a variable that I constructed in my syntax, but not to variables that were already present in the data file. Could the existing variables already have a format that is being retained? And if so, how do I override it?
The syntax below will produce a freq table for Any_cancer with 1:Yes, 2: No, Inapplicable/Missing, in that order. This is a constructed variable, and the output is being put into a table. So having it in the correct order is important.
The table for HLT_OCSTROKE shows 2, 1:Yes, Inapplicable/Missing, in that order. This is an existing variable in the data, and there are about 10 more that are doing the same thing.
I would add sample data, but I can't see how it would fail to run properly with new data. The only thing that makes sense is that the formatting is retained from the original data.
LIBNAME temp "C:\SAStemp";
PROC FORMAT library=temp;
  VALUE yesfmt_r
    1  = '1:Yes'
    2  = '2:No'
    .  = 'Inapplicable/Missing'
    .R = 'Inapplicable/Missing'
    .D = 'Inapplicable/Missing'
    .N = 'Inapplicable/Missing';
RUN;
DATA temp.want;
  SET temp.have;
  FORMAT
    Any_cancer   yesfmt_r.
    HLT_OCSTROKE yesfmt_r.;
RUN;
PROC FREQ data=temp.want;
  TABLES Any_cancer HLT_OCSTROKE / missing;
RUN;
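To check whether a stored format is involved, something like the following sketch may help. It does not diagnose the cause; it only shows how to inspect the type and stored format of the two variables, how to clear a stored format, and how to assign the format directly in PROC FREQ. All dataset and variable names are taken from the post.

```sas
/* 1. Inspect type, length and any stored format of the two variables */
PROC CONTENTS data=temp.have(keep=Any_cancer HLT_OCSTROKE);
RUN;

/* 2. Remove any format stored on HLT_OCSTROKE in temp.want           */
/*    (a FORMAT statement listing a variable with no format clears it) */
PROC DATASETS library=temp nolist;
  MODIFY want;
  FORMAT HLT_OCSTROKE;
QUIT;

/* 3. Apply the format for this procedure only, overriding stored formats */
PROC FREQ data=temp.want;
  FORMAT Any_cancer HLT_OCSTROKE yesfmt_r.;
  TABLES Any_cancer HLT_OCSTROKE / missing;
RUN;
```

If PROC CONTENTS shows HLT_OCSTROKE as a character variable, a numeric format such as yesfmt_r. cannot be applied to it, which would be a different problem from a retained format.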
Hello SAS community,
I have a dataset with a variable number_images with values 1 or 2, which indicate whether an image was read for a set of variables (n=110) at T0 and/or T2.
This is what I get from PROC FREQ:
number_images  nmissT0  nmissT2
1              0        110
1              1        110
1              2        110
1              3        110
1              4        110
1              5        110
1              6        110
1              7        110
1              8        110
1              9        110
1              10       110
1              12       110
1              14       110
1              15       110
1              18       110
1              31       110
1              93       110
1              110      0
1              110      1
1              110      2
1              110      6
1              110      10
1              110      12
1              110      13
1              110      14
2              0        0
If number_images=1, the image could have been read at either of the timepoints. I want to keep number_images=1 but be able to distinguish those with readings at T0 only from those with readings at T2 only.
If nmissT0=110, then the image was only read at T2.
If nmissT2=110, then the image was only read at T0.
If nmiss is <110, then the image was read for at least some features at that timepoint. I am not concerned with that part for this question.
I would like to update number_images to have something like this, but I know this won't work:
proc format;
  value n_imagf
    1="1: V0 readings only"
    1="1: V2 readings only"
    2="2: V0, V2 readings"
What can I do instead so number of images still shows 1 and is numeric? if number_images=1 and nmissT0=110 then number_images=1??
Can anyone help?
Thanks, Maggie
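One possible approach, sketched under assumptions: because a format cannot map the same value 1 to two labels, number_images could stay untouched while a second numeric variable carries distinct codes whose labels still begin with "1:" or "2:". The dataset names have and want, the variable image_grp, and the codes 10, 11 and 20 are hypothetical; the nmissT0/nmissT2 rules are the ones stated in the post.

```sas
proc format;
  value n_imagf
    10 = "1: V0 readings only"
    11 = "1: V2 readings only"
    20 = "2: V0, V2 readings";
run;

data want;
  set have;
  /* number_images is left unchanged and stays numeric */
  if number_images = 1 and nmissT2 = 110 then image_grp = 10;       /* nothing read at T2: T0 only */
  else if number_images = 1 and nmissT0 = 110 then image_grp = 11;  /* nothing read at T0: T2 only */
  else if number_images = 2 then image_grp = 20;                    /* read at both timepoints */
  format image_grp n_imagf.;
run;

proc freq data=want;
  tables image_grp / missing;
run;
```

Any row with number_images=1 where neither nmiss count equals 110 would get a missing image_grp, which the MISSING option makes visible in the table.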