About FreelanceReinh

FreelanceReinh · ‎09-25-2024

Hi @Ryanb2,

You can also use the fact that the UNION operator of PROC SQL (without the CORRESPONDING option) aligns columns by position. Example:

    /* Create sample data for demonstration */

    data have1(rename=(name=abc)) have2(rename=(name=def)) have3(rename=(name=ghi));
      set sashelp.class;
    run;

    /* Stack HAVE1 - HAVE3, renaming the first column to Firstname */

    proc sql nowarn;
      create table base (Firstname char(8)); /* define name and length of the first column */
      create table want as
      select * from base
      union all
      select * from have1
      union all
      select * from have2
      union all
      select * from have3;
    quit;

Table BASE may contain more columns to be renamed. PROC SQL would automatically increase the length of character variables to avoid truncation (e.g., if variable DEF in dataset HAVE2 had length 11).

If you have an existing template dataset, you can use that instead:

    proc sql;
      create table want as
      select * from sashelp.class(obs=0)
      union all
      select * from have1
      union all
      select * from have2
      union all
      select * from have3;
    quit;

Obviously, you don't need an additional template dataset if the columns in question have the desired names in dataset HAVE1.

FreelanceReinh · ‎09-24-2024

@stataq wrote:
@FreelanceReinh Could you further explain

    _c+(stop & (~lag(stop) | first.id));

First of all, this is a sum statement, i.e., variable _c (the "counter") is incremented by the value in the outer parentheses. That increment is a Boolean value: either 1 or 0, depending on whether the logical expression involving the AND (&), OR (|) and NOT (~) operators is TRUE (1) or FALSE (0).

Non-zero, non-missing values of variable stop (in particular the value 1) are evaluated to TRUE. Zero and missing values are evaluated to FALSE. The LAG function in this DATA step is called once for each observation of dataset DS1, which means that it returns the value of stop from the previous observation (and a numeric missing value in the very first observation). The value of automatic variable first.id is 1 for the first observation of each id BY-group and 0 otherwise.

So, considering that stop has only values 1 or 0, the increment equals

    1 if the current observation has stop=1 AND (the previous observation has stop=0 OR the current observation is the first of the current id)
    0 otherwise.

This is exactly what we need: A new "block" of consecutive observations with stop=1 of an id must (obviously) start with an observation with stop=1, and the only exception to the requirement "the previous observation had stop=0" (avoiding an incrementation within a block) is that we are at the first observation of the id. In the latter case the previous observation may be the last of a "stop=1 block" of the previous id.

Also, for the very first observation of dataset DS1 there is no previous observation, lag(stop)=. (missing, i.e. FALSE), hence ~lag(stop)=1 (TRUE), but this is actually irrelevant because first.id=1 makes the subexpression (~lag(stop) | first.id) TRUE anyway.
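
The logic described above can be tried out on a small dataset. The following is only a minimal sketch: the contents of DS1 are hypothetical, and the condition is spelled out with a helper variable _prev (not part of the original code) so that LAG is visibly called once per observation.

```sas
/* Hypothetical sample data, sorted by id */
data ds1;
    input id stop;
    cards;
1 0
1 1
1 1
1 0
1 1
2 1
2 1
2 0
;

data want(drop=_prev);
    set ds1;
    by id;
    _prev = lag(stop);  /* unconditional call: value of stop from the previous observation */
    /* same condition as in the sum statement, written as IF-THEN */
    if stop and (not _prev or first.id) then _c + 1;
run;

proc print data=want;
run;
```

With these eight observations the counter _c takes the values 0, 1, 1, 1, 2, 3, 3, 3: it increases at the start of each block of consecutive stop=1 observations, including the block that begins with the first observation of id 2.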

FreelanceReinh · ‎09-24-2024

If you try a few different values for that bandwidth, some greater than 1.7, some less than 1.7, you'll get the impression that some are too small (e.g., produce peaks around the integer values 1, 2, 3, ..., as if x were a discrete variable) and some are too large (i.e., deviate considerably from parts of the histogram). That's how I ended up with 1.7. But your knowledge of the data, the subject matter and the scientific literature may suggest a different bandwidth. Or even a different type of density: With a bit more programming effort you can overlay an arbitrary density curve (e.g., exponential, gamma, etc.) on a histogram. See Rick Wicklin's blog post "How to overlay a custom density curve on a histogram in SAS" for details.
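
One way to compare bandwidths is to overlay several kernel density curves in the same plot. This is a sketch only: it reuses the frequency data from the same thread, and the alternative values c=0.5 and c=5 are arbitrary choices for illustration, not recommendations.

```sas
data have;
    input x y;
    cards;
1 830
2 155
3 65
4 45
5 52
6 35
7 20
8 15
9 10
10 5
;

proc sgplot data=have;
    histogram x / freq=y scale=count binwidth=1;
    /* one DENSITY statement per candidate bandwidth */
    density x / freq=y type=kernel(c=0.5) legendlabel="c=0.5";
    density x / freq=y type=kernel(c=1.7) legendlabel="c=1.7";
    density x / freq=y type=kernel(c=5)   legendlabel="c=5";
    xaxis values=(1 to 10) offsetmin=0.06 offsetmax=0.06;
run;
```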

FreelanceReinh · ‎09-23-2024

Hello @MHines,

You can transpose the list of persons for each row and remove the observations with missing values with a WHERE= option:

    data have;
      infile cards truncover;
      input Row CaseID :$11. (Person1-Person5) ($);
      cards;
    1 2024-000-01 Sally James
    2 2024-000-01 Gerry Natalie
    3 2024-000-01 Sam
    4 2024-000-02 Annie Holly Jaxson Ned Clark
    5 2024-000-03 Camille Henry
    6 2024-000-04 Alice Chris
    7 2024-000-05 Maria George
    8 2024-000-05 Ronnie Brenda
    ;

    proc transpose data=have out=want(drop=_: row rename=(col1=Person) where=(Person));
      by Row CaseID;
      var Person:;
    run;

If variable Row is not contained in your dataset HAVE or should be contained in dataset WANT (with new values), it can be created easily.

FreelanceReinh · ‎09-23-2024

Hi @mahi263 and welcome to the SAS Support Communities!

You need to specify the count variable in the FREQ= option of the DENSITY statement, as it wouldn't be used by default. Example:

    data have;
      input x y;
      cards;
    1 830
    2 155
    3 65
    4 45
    5 52
    6 35
    7 20
    8 15
    9 10
    10 5
    ;

    proc sgplot data=have;
      histogram x / freq=y scale=count binwidth=1;
      density x / freq=y type=kernel(c=1.7);
      xaxis values=(1 to 10) offsetmin=0.06 offsetmax=0.06;
    run;

FreelanceReinh · ‎09-23-2024

Hello @J111,

My solution from the previous thread can be generalized to BY groups as well:

    data want(drop=_:);
      _s=0;
      _v=0;
      do _n=1 by 1 until(last.gr);
        set available;
        by gr;
        _s+y;                               /* cumulative sum */
        _m=_s/_n;                           /* cumulative mean */
        _d=dif(_m);                         /* mean change */
        _q=(y-_m)**2;                       /* new term in sum of squares */
        if _n>1 then do;
          std=sqrt(_v+_d**2+_q/(_n-1));     /* cumulative standard deviation */
          _v=((_n-1)*(_v+_d**2)+_q)/_n;     /* cumulative population variance */
        end;
        output;
      end;
    run;

FreelanceReinh · ‎09-22-2024

Hello @J111,

You can also compute the cumulative standard deviation using Steiner's theorem:

    data want(drop=_:);
      set available;
      retain _v 0;
      _s+y;                                 /* cumulative sum */
      _m=_s/_n_;                            /* cumulative mean */
      _d=dif(_m);                           /* mean change */
      _q=(y-_m)**2;                         /* new term in sum of squares */
      if _n_>1 then do;
        std=sqrt(_v+_d**2+_q/(_n_-1));      /* cumulative standard deviation */
        _v=((_n_-1)*(_v+_d**2)+_q)/_n_;     /* cumulative population variance */
      end;
    run;

This calculation should take less than one second for a million observations.
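
The update step in this code can be written out as algebra. The notation below is introduced here for illustration and is not part of the original post: m_n is the cumulative mean after n observations, v_n the cumulative population variance and s_n the cumulative (sample) standard deviation; d_n and q_n correspond to the variables _d and _q.

```latex
\begin{align*}
d_n &= m_n - m_{n-1}, \qquad q_n = (y_n - m_n)^2, \\
\sum_{i=1}^{n-1} (y_i - m_n)^2 &= (n-1)\left(v_{n-1} + d_n^2\right)
  \quad \text{(Steiner's theorem applied to the first } n-1 \text{ values)}, \\
v_n &= \frac{(n-1)\left(v_{n-1} + d_n^2\right) + q_n}{n}, \\
s_n &= \sqrt{\frac{n}{n-1}\, v_n}
     = \sqrt{v_{n-1} + d_n^2 + \frac{q_n}{n-1}} \qquad (n > 1).
\end{align*}
```

The last line is the expression assigned to std, and the third line is the assignment to _v, so only the previous variance, the two means and the new value are needed at each step.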

FreelanceReinh · ‎09-19-2024

Hi @Discaboota,

The error message referring to "column 128" indicates that the array reference in the second IF-THEN statement is the culprit because that is the position of the first "Q" in "Q[WHICHN(31,OF Q[*])]". And this is indeed the odd one among the three similar statements: The IF condition

    ((61 IN Q AND 32 IN Q) OR (31 IN Q)) AND 1 NOT IN Q AND BILLING_CYCLE_DAY = 15

does not imply 31 in Q. But if 31 is not in Q, the above-mentioned call to the WHICHN function returns 0 and the resulting array reference Q[0] causes the error message. So you'll need to add programming logic for the case

    ((61 IN Q AND 32 IN Q) AND NOT (31 IN Q)) AND 1 NOT IN Q AND BILLING_CYCLE_DAY = 15
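
One possible way to guard against the out-of-range subscript is to store the result of WHICHN first and reference the array only when the value was found. This is a minimal sketch with hypothetical array contents and a hypothetical helper variable _pos; what should happen in the "not found" case depends on the business rule and is not specified here.

```sas
data _null_;
    array q[3] (61 32 45);        /* hypothetical values: 31 is not among them */
    _pos = whichn(31, of q[*]);   /* 0 if 31 does not occur in Q */
    if _pos > 0 then put 'Value 31 found at position ' _pos;
    else put 'Value 31 is not in Q, so Q[_pos] must not be referenced.';
run;
```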

FreelanceReinh · ‎09-19-2024

Hello @astudent,

@astudent wrote:
What I think is the interpretation is that isolated individuals are more likely to be secure compared to those who are to isolated. since the probability being modeled is HHFS2_short=0.

Why do you think that? The estimated odds ratio of 3.450 of "SocIsoSS 0 vs. 1" for being secure (HHFS2_short=0) and its lower confidence limit are clearly greater than one, so -- adjusted for the other covariates -- we would expect a significantly higher percentage of "secure" individuals in the subgroup "SocIsoSS=0" ("non-isolated") than in the subgroup "SocIsoSS=1" ("isolated").

This is consistent with your cross tabulation of SocIsoSS and HHFS2_short: The corresponding empirical odds ratio ignoring the other covariates is 151*125/(58*79)=4.119... (again, clearly greater than 1) and the percentages of "secure" individuals are 65.65 (among the "non-isolated") vs. 31.69 (among the "isolated").
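
The arithmetic behind these figures can be laid out as follows. The four counts are those quoted above; their assignment to the cells of the 2x2 table (151 secure and 79 not secure among the non-isolated, 58 secure and 125 not secure among the isolated) is an assumption inferred from the quoted percentages, since the cross tabulation itself is not reproduced here.

```latex
\begin{align*}
\text{OR} &= \frac{151/79}{58/125} = \frac{151 \cdot 125}{58 \cdot 79}
           = \frac{18875}{4582} \approx 4.119, \\
P(\text{secure} \mid \text{SocIsoSS}=0) &= \frac{151}{151+79} = \frac{151}{230} \approx 65.65\,\%, \\
P(\text{secure} \mid \text{SocIsoSS}=1) &= \frac{58}{58+125} = \frac{58}{183} \approx 31.69\,\%.
\end{align*}
```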

FreelanceReinh · ‎09-19-2024

Hello @NP2212, Glad to see that @ballardw's solution worked for you. Then it would be fair and help later readers if you marked his helpful reply as the accepted solution, not your own "thank you" post. Could you please change that? It's very easy: Select his post as the solution after clicking "Not the Solution" in the option menu (see icon below) of the current solution.

FreelanceReinh · ‎09-17-2024

Hi @AndersS, The DROPLINE statement is designed for this purpose.
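
A minimal sketch of what such a statement can look like is shown below. The dataset SASHELP.CLASS and the coordinates (60, 100) are stand-ins chosen for illustration and do not come from the original thread.

```sas
proc sgplot data=sashelp.class;
    scatter x=height y=weight;
    /* draw lines from the point (60, 100) to both axes */
    dropline x=60 y=100 / dropto=both lineattrs=(pattern=dash);
run;
```

DROPTO=BOTH requests the combination of a vertical and a horizontal line; DROPTO=X or DROPTO=Y would draw only one of them.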

FreelanceReinh · ‎09-16-2024

It worked with a local directory on my Windows workstation. What does the PUT statement write to the log if you add it like in the modified code below?

    data filebase;
      length fref $8 fname created lastmod $200;
      did = filename(fref,"&pathbase.");
      did = dopen(fref);
      do i = 1 to dnum(did);
        fname = dread(did,i);
        fid=mopen(did, fname);
        put fid=;
        created=finfo(fid,'Create Time');
        lastmod=finfo(fid,'Last Modified');
        rc=fclose(fid);
        output;
      end;
      did = dclose(did);
      did = filename(fref);
      keep fname i created lastmod;
    run;

I've also simplified the assignment statements for the two new variables by using their default (character!) format. Note the deletion of the FORMAT statement.

FreelanceReinh · ‎09-16-2024

Hi @febyinkasid,

You can use the MOPEN, FINFO and FCLOSE functions:

    /*FIND PATH PARENT*/
    %let pathbase=G:\Shared drives\;

    data filebase;
      length fref $8 fname $200;
      did = filename(fref,"&pathbase.");
      did = dopen(fref);
      do i = 1 to dnum(did);
        fname = dread(did,i);
        fid=mopen(did, fname);
        created=input(finfo(fid,'Create Time'),nldatm200.);
        lastmod=input(finfo(fid,'Last Modified'),nldatm200.);
        rc=fclose(fid);
        output;
      end;
      did = dclose(did);
      did = filename(fref);
      format created lastmod e8601dt.;
      keep fname i created lastmod;
    run;

FreelanceReinh · ‎09-13-2024

Another purpose of leading underscores in variable names (not in your example) is the avoidance of name conflicts. For example, consider a SAS macro which uses a DATA step to work with a user-supplied dataset. The developer of such a macro doesn't know the user's variable names in advance. But assuming that they don't start with an underscore or even two underscores he or she might use variable names like __i so as to make name conflicts unlikely. With the same rationale I sometimes use such names when I suggest code here in this forum. Similarly, some SAS procedures (e.g., PROC SUMMARY and PROC TRANSPOSE) create variable names with leading and trailing underscores. And there are the automatic variables _N_, _ERROR_, _IORC_, etc. (Anecdote: I remember a company's standard reporting macro which surprisingly didn't work well when a user applied it to a dataset containing temperature data in a variable TEMP. Why? The macro developer had carelessly used that same name for a variable to store some intermediate results temporarily.)
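
The naming convention can be illustrated with a small macro. Everything in this sketch is hypothetical (the macro name add_range, its parameters, the dataset WEATHER and the result variable range_so_far); it only shows how double-underscore work variables stay out of the way of a user variable such as TEMP.

```sas
%macro add_range(data=, out=, var=);
    data &out(drop=__:);              /* drop all helper variables at once */
        set &data;
        retain __min __max;           /* helper names unlikely to clash with user variables */
        __min = min(__min, &var);
        __max = max(__max, &var);
        range_so_far = __max - __min; /* result variable: still a potential name conflict */
    run;
%mend add_range;

data weather;
    input day temp;
    cards;
1 18.5
2 21.0
3 16.2
;

%add_range(data=weather, out=weather2, var=temp)
```

Had the macro used TEMP for an intermediate result, the user's temperature values would have been overwritten, which is the kind of problem described in the anecdote.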

FreelanceReinh · ‎09-12-2024

@ballardw wrote:
If you are splitting a large set into multiple smaller sets to select one at a time you are adding lots of time and complexity in general ...

Indeed. Your sample log file mentions 694 numbers of observations being processed in the various steps. These numbers range from 0 to 120 with an average of about 10. So it really seems like you're working with large numbers of tiny datasets.

But most of the time (in general) it is much more efficient to work with larger datasets possibly comprising the information of many of those small datasets in the form of BY groups. DATA steps and most procedures (such as PROC FREQ, which you are using) support BY-group processing. A single DATA or PROC step using a BY statement (e.g., by fips) can possibly replace many similar steps processing one BY value at a time (in your log indicated by WHERE statements of the form WHERE fips='xxxxx' with a lot of individual values xxxxx). In PROC SURVEYSELECT the STRATA statement plays the role of a BY statement, as ballardw has already pointed out.
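
As a rough sketch of this approach, the steps below process all groups in one pass each. SASHELP.CLASS with the grouping variable SEX is only a stand-in for the actual data and its fips variable, and the sample size and seed are arbitrary.

```sas
/* BY-group processing requires the data to be sorted (or indexed) by the BY variable */
proc sort data=sashelp.class out=class_sorted;
    by sex;
run;

/* one PROC FREQ step instead of one step per WHERE sex='...' */
proc freq data=class_sorted;
    by sex;
    tables age;
run;

/* in PROC SURVEYSELECT the STRATA statement takes the place of BY */
proc surveyselect data=class_sorted out=sample method=srs n=2 seed=12345;
    strata sex;
run;
```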

Re: Can I rename column 1 without knowing the name of column 1?

Re: how to assign grouping seq number

Re: Histogram with density using SAS SGPLOT

Re: How do I transpose data to long-form if the raw data is both long ...

Re: Histogram with density using SAS SGPLOT

Re: How to calculate std with group by , on a growing data

Re: how to calculate std on an increasing data, similar to commulative...

Re: Array Error :Array subscript out of range

Re: Logistic regression output interpretation

Re: how to add reference line in proc sgplot in survival analysis

Re: Comined vertical and horizontal lines in SGPLOT

Re: Read file and add last modified date

Re: Read file and add last modified date

Re: What does "_" in front of variable mean?

Re: Optimize SurveySelect Routine
