About mkeintz

mkeintz · ‎06-06-2025

Not only is a missing value still a value that can be used as a key, as @Kurt_Bremser says, but there are many other possible missing values. There are the 26 special missing values .A, .B, ..., .Z, and also ._ (dot underscore). These can be used for special purposes, if you want. I don't think users would want SAS to assume by default that such values should be ignored, as if they "have no value" in database tasks, even if they are "ignored" in statistical analysis.
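As a small illustration of special missing values, here is a minimal sketch. The dataset SURVEY, the variable SCORE, and the meanings attached to R ("refused") and N ("not asked") are hypothetical, made up only for this example.

```sas
/* Hypothetical data: SCORE uses .R for "refused" and .N for "not asked" */
data survey;
  missing R N;      /* let the raw data use the letters R and N as special missing values */
  input id score;
  datalines;
1 12
2 R
3 N
4 .
5 15
;

/* With the MISSING option, each kind of missing value is tabulated as its own level */
proc freq data=survey;
  tables score / missing;
run;

/* Statistical procedures ignore all of them: this should report N=2, NMISS=3, mean 13.5 */
proc means data=survey n nmiss mean;
  var score;
run;
```

So the three kinds of missing remain distinguishable as data values (and as keys), even though PROC MEANS leaves all of them out of the statistics.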

mkeintz · ‎06-06-2025

Your left join is generating a Cartesian cross of the instances of FORMID_FUP=. in both datasets if you have multiple instances of the missing value in the LEFT dataset. MERGE ... BY, on the other hand, does not do this.

And since you suggest, in the case of NON-missing FORMID_FUP, that using MERGE ...; BY FORMID_FUP; yields what you want (and what you get) from the PROC SQL ... LEFT JOIN, then it must be that the LEFT dataset has exactly one record per non-missing FORMID_FUP value.

My question is why do you want cases with missing FORMID_FUP? Why not exclude those cases from the join? ... As in (see the "where=" dataset name parameters below):

proc sql;
  create table fup_timing as
  select a.*,
         t.Procedure_Date as t_Proc_Date label = "Followup Timing: Date Procedure from Proc Form",
         t.Schedule as t_ScheduleCat label = "Followup Timing: Standard vs. Specialized"
  from followup (where=(formid_fup^=.)) as a
  left join followup_timing (where=(formid_fup^=.)) as t
    on a.FormID_FUP = t.FormID_FUP;
quit;
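The Cartesian effect on missing keys is easy to reproduce on toy data. This is only a sketch: the datasets FUP_DEMO and TIMING_DEMO and the variables ID, X and Y are hypothetical stand-ins, not the poster's data.

```sas
/* Hypothetical toy data: two rows with a missing key on each side */
data fup_demo;
  input id x;
  datalines;
. 1
. 2
7 3
;

data timing_demo;
  input id y;
  datalines;
. 10
. 20
7 30
;

proc sql;
  /* In SAS SQL a missing key equals a missing key, so the two      */
  /* missing rows on each side pair up 2 x 2 = 4 times; with the    */
  /* single ID=7 match the result has 5 rows.                       */
  create table joined_all as
  select a.id, a.x, t.y
  from fup_demo as a
  left join timing_demo as t
    on a.id = t.id;

  /* Filtering missing keys out of both inputs leaves only the ID=7 row */
  create table joined_nomiss as
  select a.id, a.x, t.y
  from fup_demo (where=(id^=.)) as a
  left join timing_demo (where=(id^=.)) as t
    on a.id = t.id;
quit;
```

Note that the where= filter on the left table also drops its missing-key rows from the output, which is the point of the question about whether those cases are wanted at all.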

mkeintz · ‎05-29-2025

@Prashan wrote: Consider SASHELP.CARS dataset and give the sequence number for MAKE variable. Ex:- in MAKE variable, I want sequence number for AUDI ... stuff deleted ... that too only with PROC SQL, not with data step, I know how to do with data step.

I think there are three statements relevant to this problem.

1. There is NO way to reliably reproduce within-group physical sequence numbers in PROC SQL using supported tools. By "reliable" I mean reproducible with certainty.
2. And even if you were to use monotonic(), there is no way to reproduce the results you would get in a DATA step (see my other note) unless the data were already sorted by the grouping variable. And the use of proc sql with monotonic() would require filtering the source dataset once for each group (i.e. 38 times in the case of MAKE from sashelp.cars). A big waste of resources.
3. But probably the most important advice is to resist the atavistic urge to use PROC SQL for a purpose that it is totally unsuited for.

mkeintz · ‎05-29-2025

But @dxiao2017, the dataset example you are using is already sorted by SEX, as presented by the OP. But that may be unlike most situations (and unlike sashelp.class, which is the source of the data example and is sorted by name, not by sex/name).

Besides, there is no need to sort the data by sex merely to generate within-sex sequence numbers. For instance:

data want (drop=_:);
  set sashelp.class;
  _nm+(sex='M');
  _nf+(sex='F');
  sequence=ifn(sex='M',_nm,_nf);
run;

And if one MUST use SQL, then the undocumented (read "unsupported") MONOTONIC() function (see MONOTONIC-function-in-PROC-SQL) can be used for each sex:

proc sql;
  create table want as
  select name, monotonic() as seq from sashelp.class where sex='M'
  union corr
  select name, monotonic() as seq from sashelp.class where sex='F'
  ;
quit;

But note that data order in the case of PROC SQL will almost certainly not be the same as in the original data set. And that the row order will actually change depending on the order of variables

mkeintz · ‎05-23-2025

This can be done in a single data step that first reads DATASET1 and stores it in a hash object (lookup table) with a lookup key created by modifying X (remove leading non-digits, shorten it to the last 5 digits if necessary, and remove leading zeroes). This is followed by reading DATASET2, where X is similarly modified, and a lookup is performed to see whether it is in the hash object, from which the PRODUCT value is retrieved:

data dataset1 (label='x with 3 or 5 digits');
  infile datalines missover;
  input product $4. x :$20. ;
datalines;
via1
via2 003
via3 014
via4 GA4
via5 GA015
via6 319
via7 23456
via8 10101010198765
run;

data dataset2;
  infile datalines ;
  input name $1. x :$20. ;
datalines;
a 2
b 3
c 14
d 4
e 15
f GF319
g 23456
h 98765
run;

data want (drop=rc);
  set dataset1 (in=in1) dataset2 (in=in2);
  where x^='';
  if _n_=1 then do;
    declare hash d1 ();
      d1.definekey('x');
      d1.definedata('product');
      d1.definedone();
  end;
  x=substr(x,anydigit(x));                      /*Remove leading non-digits*/
  if length(x)>5 then x=substr(x,length(x)-4);  /*If too long, take last 5 digits*/
  do while (x=:'0');                            /* Strip leading zeroes*/
    x=substr(x,2);
  end;
  if in1=1 then d1.add();
  if in2;
  rc=d1.find();
run;

mkeintz · ‎05-23-2025

You say that dataset1 can have:

- missing data
- 3 digits
- 3 or 5 digits
- 5 digits

Yet the fourth observation has X=GA4, which has one digit. Please be complete in describing the data and in clarifying the matching rules. Help us help you.

mkeintz · ‎05-23-2025

data need (keep=subject max_enddt) / view=need;
  do until (last.subject);
    set have (where=(enddt^=.));
    by subject;
    max_enddt=max(max_enddt,enddt);
  end;
run;

data want (drop=max_enddt);
  merge have need;
  by subject;
  cmenddt=coalesce(cmenddt,max_enddt);
run;

It's programmed as two steps, but because the first step is a data set view, it's only activated when the view (named NEED) is called for in the second step. This reduces disk activity.

mkeintz · ‎05-22-2025

Can X in dataset 2 ever have more than five digits? If so, then scenario 6 will be ambiguous. Dataset 1 has 10101010198765. Your dataset 2 has 98765, but it might also have 198765. If so, then which obs from dataset 2 is chosen?

mkeintz · ‎05-22-2025

Probably a single DATA step is the simplest approach. Generalizing a bit from @PaigeMiller, consider the use of arrays, as in:

data want (keep=locker_id locations person code);
  set have;
  array pers {*} person_a--person_c;
  array cods {*} person_a_code-- person_c_code;
  do i=1 to dim(pers);
    person=pers{i};
    code=cods{i};
    if person^=' ' then output;
  end;
run;

If you want to add a PID variable (="A", or "B", or "C", etc.) as done by @data_null__ , you can make a minor tweak:

data want (keep=locker_id locations person pid code);
  set have;
  array pers {*} person_a--person_c;
  array cods {*} person_a_code-- person_c_code;
  do i=1 to dim(pers);
    person=pers{i};
    pid=byte(i+64);   /*Byte(65)='A', Byte(66)='B', etc. */
    code=cods{i};
    if person^=' ' then output;
  end;
run;

mkeintz · ‎05-20-2025

If you already know the fixed values of variable TBL, then a merge of subsets of HAVE (one subset per TBL value), each with a rename, makes this into a single pass DATA step:

data have;
  input tbl $ field $;
cards;
tbl1 x
tbl1 y
tbl1 z
tbl2 w
tbl2 x
tbl2 v
tbl2 y
run;

data want;
  merge have (where=(tbl='tbl1') rename=(field=tlb1))
        have (where=(tbl='tbl2') rename=(field=tlb2)) ;
  drop tbl;
run;

mkeintz · ‎05-10-2025

You could compare two daily datasets at a time, but that would mean processing most of the datasets twice, once as the "before" date, and once as the "after".

But if each of the datasets is sorted by TKT, then you could process all of the datasets in a single pass. Something like (I have changed the daily dataset names to DATA_20250401, DATA_20250402, ... DATA_20250430):

data want;
  set data_202504: ;
  by tkt descending date;
  if first.tkt=0 and dif(date)^=-1 then output;
  else if first.tkt=1 and date^='30apr2025'd then output;
run;

If the data are not sorted by TKT and if sorting would be expensive, then read the datasets in reverse chronological order. You could use two hash objects to hold current and next daily data (NEXTDAY in the code below). If an incoming observation has a TKT not found in the NEXTDAY object, then output it. At the end of each day, clear the NEXTDAY object and copy the CURRDAY data into it, in preparation for the new current date.

data want;
  set data_202504: ;
  by descending date;
  if _n_=1 then do;
    declare hash currday();
      currday.definekey('tkt');
      currday.definedata('tkt','date');
      currday.definedone();
    declare hiter i ('currday');
    declare hash nextday();
      nextday.definekey('tkt');
      nextday.definedata('tkt','date');
      nextday.definedone();
  end;
  if date='30apr2025'd then do;
    nextday.add();
    return;
  end;
  currday.add();
  if nextday.check()^=0 then output;
  if last.date then do;   /*Replace NEXTDAY with CURRDAY hash object */
    nextday.clear();
    do while (i.next()=0);
      nextday.add();
    end;
    currday.clear();
  end;
run;

Note these programs assume there are no duplicate TKT values within each daily dataset.

mkeintz · ‎05-06-2025

In the absence of sample data in the form of a working DATA step, here is untested code. This code assumes every EOS record is preceded by a matching WBC record:

data want;
  set have;
  array var aval anrlo anrhi;
  array pct lborres lbornrlo lborrnrhi;
  do over var;
    var=ifn(param='EOS',sum(0,lag(var))*pct/100,var);
  end;
run;

The reason for the "sum(0,lag(var))" expression is to avoid a log message for the first observation, for which lag(var) is missing and would therefore cause a missing value result when multiplied by "pct/100".

You could test for a matching WBC record with something like:

data want;
  set have;
  array var aval anrlo anrhi;
  array pct lborres lbornrlo lborrnrhi;
  do over var;
    var=ifn(param='EOS' and lag(param)='WBC' and subjid=lag(subjid) and avisit=lag(avisit)
           ,sum(0,lag(var))*pct/100
           ,var);
  end;
run;

Again, untested.

mkeintz · ‎05-04-2025

I agree that much of the time the order of the variables I produce in a data set is not important. But there are two situations in which I do care about variable order:

1. I want a quick onscreen view of the data.
2. It is effective to use the double-dash syntax to declare a list of variables
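For readers unfamiliar with the double-dash syntax, here is a minimal sketch using SASHELP.CLASS, whose variables are stored in the order NAME, SEX, AGE, HEIGHT, WEIGHT. The choice of dataset and of the list endpoints is mine, purely for illustration.

```sas
/* NAME--HEIGHT selects every variable positioned from NAME through */
/* HEIGHT in the data set, i.e. NAME, SEX, AGE and HEIGHT here.     */
proc print data=sashelp.class (obs=5);
  var name--height;
run;
```

Because the list is resolved by position rather than by name, it depends directly on the order in which the variables were created.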

mkeintz · ‎05-04-2025

You want to know if you can form a table statement that generates one set of statistics for some rows (class vars) and other statistics for other rows (continuous vars). The answer is no. But the greater question is why? Are you trying to generate a specific sequence of variables in your report?

mkeintz · ‎05-04-2025

Let's say you want exactly 10 equal size groups, subject to:

- you don't know exactly how many obs are in the dataset
- you may have tied price values
- you want to keep tied prices in the same group

The program below is corrected by entering an explicit OUTPUT statement, which prevents a premature increment of the GROUP variable.

proc sort data=have out=need;
  by descending price;
run;

data want;
  set need nobs=n_need;
  by descending price;
  retain group 1;
  output;
  if last.price=1 and _n_ >= group*(n_need/10) then group+1;
run;
