yesterday
mkeintz
PROC Star
Member since
08-02-2012
- 4,481 Posts
- 497 Likes Given
- 714 Solutions
- 2,357 Likes Received
-
Latest posts by mkeintz
Subject Views Posted 259 Thursday 312 Wednesday 175 a week ago 265 2 weeks ago 253 2 weeks ago 3062 2 weeks ago 383 2 weeks ago 402 2 weeks ago 446 2 weeks ago 236 3 weeks ago -
Activity Feed for mkeintz
- Got a Like for Re: Efficiently Adding a Shifted Date Column to a Large SAS Dataset. Thursday
- Posted Re: Efficiently Adding a Shifted Date Column to a Large SAS Dataset on SAS Programming. Thursday
- Liked Re: Presorted not working with NoDupRec for FreelanceReinh. Thursday
- Got a Like for Re: Why is the SAS documentation website always so slow to load?. Wednesday
- Posted Re: Why is the SAS documentation website always so slow to load? on SAS Programming. Wednesday
- Got a Like for Re: creating episodes using multiple dates when not clean sorted. Sunday
- Posted Re: creating episodes using multiple dates when not clean sorted on SAS Programming. a week ago
- Got a Like for Re: IF _N_=1 Then SET. a week ago
- Got a Like for Re: Variable to identify individuals whose type of diabetes changes from one hospital visit to anot. a week ago
- Got a Like for Re: IF _N_=1 Then SET. 2 weeks ago
- Posted Re: How can i achieve the value based on last YEAR (previous row)? on SAS Programming. 2 weeks ago
- Posted Re: IF _N_=1 Then SET on SAS Programming. 2 weeks ago
- Posted Re: What is NADIR on SAS Programming. 2 weeks ago
- Posted Re: How to identify the numbers inside '||' and mark them on SAS Programming. 2 weeks ago
- Posted Re: How to identify the numbers inside '||' and mark them on SAS Programming. 2 weeks ago
- Got a Like for Re: I have different results in PROC GLIMMIX with the same dataset and code. 2 weeks ago
- Posted Re: I have different results in PROC GLIMMIX with the same dataset and code on Statistical Procedures. 2 weeks ago
- Liked Re: format date with underscore for PeterClemmensen. 3 weeks ago
- Posted Re: Washout out window for medications on SAS Programming. 3 weeks ago
- Got a Like for Re: How do I identify Subtypes based on specific algorithm. 3 weeks ago
-
Posts I Liked
Subject Likes Author Latest Post 4 1 1 1 1 -
My Liked Posts
Subject Likes Posted 1 Thursday 1 Wednesday 2 a week ago 3 2 weeks ago 1 2 weeks ago -
My Library Contributions
Subject Likes Author Latest Post 6
Thursday
1 Like
Welcome @abigel to the sas support community:
you wrote:
However, when I created a new table, the process finished in 30 minutes. How should I properly execute this process?
I would use the new table creation process. It's simply faster.
As to why this is the case, I write the below as a non-expert in PROC SQL:
I wonder if the ALTER process "modifies" one row at a time, as opposed to writing a collection of rows when creating a new table. That is, perhaps it utilizes much more input/output resources.
Also a question, Does your current table have integrity constraints defined? If so, then note from Creating and Using Integrity Constraints in a Table: (italics mine)
"Integrity constraints are rules that you specify to guarantee the accuracy, completeness, or consistency of data in tables. All integrity constraints are enforced when you insert, delete, or alter data values in the columns of a table for which integrity constraints have been defined. Before a constraint is added to a table that contains existing data, all the data is checked to determine that it satisfies the constraints."
This suggests to me that the ALTER statement would need to allocate time to apply those integrity constraints in modifying every observation in the table. OTOH, I presume that making a new table would not automatically do so using the code you have shown.
... View more
Wednesday
1 Like
I can not reproduce anything like the result you report. From my home computer (around 8pm EST) it took about 5 seconds to load the bookmarked link you cite.
More generally, I have not had speed issues in loading SAS documentation links.
... View more
a week ago
2 Likes
Please state a clear and complete rule for assigning stay_start and stay_end. I cannot conjure up a rule to satisfy all the results you show.
... View more
2 weeks ago
This is a good use case for embedding a LAG function inside an IFN (or IFC) function, as in:
data mohx_data;
input MOHX_ID $ MOHX_YEAR MOHX_SEQ MOHX_ENDST NC;
datalines;
A 2023 1 0 60
A 2023 2 0 30
A 2024 1 0 60
A 2025 1 0 60
B 2023 1 0 25
B 2023 2 0 35
B 2024 1 0 45
B 2025 1 0 60
run;
data want;
set mohx_data;
by mohx_id mohx_year;
retain prv_nc .;
if first.mohx_year then prv_nc=ifn(first.mohx_id,.,lag(nc));
run;
I assume when there are multiple obs for a single mohx_year (and therefore multiple possible NC values), that you want to carry forward the NC value for the first of those multiple obs.
... View more
2 weeks ago
3 Likes
You can use if "first.type then set type;" per @ChrisNZ's suggestion. That would be the most similar to your other code solution.
Or you can use the MERGE statement, as in:
proc sort data=sashelp.cars out=cars;
by type;
run;
data want;
merge cars tbl1;
by type;
run;
This many-to-one merge process is probably the most common use of the MERGE statement (accompanied by the BY statement).
If you don't like the idea of being required to sort sashelp.cars by TYPE, then you could load dataset TBL1 into a hash lookup table:
data want;
set sashelp.cars tbl1 (obs=0);
if _n_=1 then do;
declare hash h (dataset:'tbl1');
h.definekey('type');
h.definedata(all:'Y');
h.definedone();
end;
if h.find()^=0 then call missing(avg_invoice);
run;
... View more
2 weeks ago
I presume that, for each week, you want nadir to report the lowest weekly value of AVAL up through the week in hand. If so, then you can use the "retain nadir" statement:
data want;
set have;
by usubjid;
retain nadir;
if first.usubjid then nadir=aval;
else nadir=min(aval,nadir);
run;
As each new observation is read, nadir has the lowest aval through all preceding weeks, which then is compared to the current aval. Of course, for the beginning of each usubjid, nadir is set to aval.
... View more
2 weeks ago
@shawnchen0321 wrote:
@mkeintz @Thank you for your assistance.
I have a question. If the company with a big problem is 2 or 42, how can I avoid 2 by catching 2 instead of catching 12 and 42?
This is a perfect opportunity to do a test. Create a string with 12 (and no 2) and a string with 2 (and no 12). Then see whether the code generates a false positive from the string with 2.
... View more
2 weeks ago
You have a character variable with a list of problem identifiers, separated by pipe symbols. Treat those pipes as word separators, in the SCAN function. Use the SCAN function to examine each "word" in the string, to see if any has a value of "42" or "12":
data data;
input string$40.;
datalines;
|17|42|44|
|17|33|55|2|
|5|12|17|42|44|77|
run;
data want;
set data;
big_problem=0;
do w=1 to countw(string,'|') while (big_problem=0);
if strip(scan(string,w,'|')) in ('12','42') then big_problem=1;
end;
drop w;
run;
The STRIP function removes leading and trailing blanks, if there are any, in each constituent of the string variable.
... View more
2 weeks ago
1 Like
Whenever someone raises the problem of "different results ... from the same code and data", I would want to see the log's of both runs, to confirm that SAS also thinks it is the same code.
And, for those in this community who have familiarity with PROC GLIMMIX (I am not), you might benefit from showing the logs from both runs, as well as the results from each. Help the community help you.
... View more
3 weeks ago
You want to find the first observation following a date gap of 180 days or more. All obs starting with that obs have INCLUDE=1. All obs prior to it have INCLUDE=0. For the first obs of each PT_ID the gap is the number of days from 01jan2020 to PRES_DATE:
This is an untested program in the absence of sample data in the form of a working DATA step. Assuming data are sorted by PT_ID/PRES_DATE, then:
data want (drop=_:);
set have;
by pt_id;
retain include 0;
if first.pt_id then include=0;
_prior_date=ifn(first.pt_id,'01jan2020'd,lag(pres_date));
if pres_date-_prior_date>180 then include=1;
run;
... View more
3 weeks ago
What you apparently want is code that looks like this:
data want;
set values;
seq1 = sum(score1 ,score2 ,score3 ,score4 ,score5 ,score6 ,score7 ,score8 );
seq2 = sum(score1 ,score2 ,score3 ,score4 );
seq3 = sum(score1 ,score3 ,score6 ,score8 );
seq4 = sum(score3 ,score4 ,score7 );
run;
You can use a DATA _NULL_ step reading dataset SEQ to produce that code in a temporary file, which can then be INCLUDED in a subsequent DATA WANT step, as in:
filename sumcode temp;
data _null_;
set seq;
array sdummies {*} s1-s8 ;
file sumcode;
put seqnames '= sum(' @;
do i=1 to dim(sdummies);
if sdummies{i}=1 then put 'score' i ',' @;
end;
put + (-1) ');';
run;
data want;
set values;
%include sumcode / source2 ;
run;
The primary tool used here is the trailing @ used in the PUT statements, which holds the output column pointer in place, waiting for the next PUT statement. The last PUT has the pointer control '+ (-1)' which moves the pointer back one character, in order to replace the troublesome unnecessary comma appended to the last variable name with a closing paren and a semi-colon.
This program is untested in the absence of sample data in the form of working DATA steps.
I see other responders have utilized the character variable SEQ, instead of the S1 through S8 variables that I used. If those variables are not available, then change the DATA _NULL_ step to:
filename sumcode temp;
data _null_;
set seq;
file sumcode;
put seqnames '= sum(' @;
do i=1 to countw(seq);
addend=cats('score',scan(seq,i)) ;
if i<countw(seq) then addend=cats(addend,',');
put addend @;
end;
put ');';
run;
... View more
3 weeks ago
1 Like
Given that the data are already sorted by ID, then you can read each ID group with all the 'A' and 'B' types preceding all 'Unspecified' types, via
data want;
set have (where=(type in ('A','B')))
have (where=(type ='Unspecified')) ;
...
;
Then:
data want;
set have (where=(type in ('A','B')))
have (where=(type ='Unspecified')) ;
by id;
retain min_dist . true_type ' ' ;
_dist=min(abs(service_start-reference_date)
,abs(service_end-reference_date));
if first.id=1 then do;
min_dist=_dist;
true_type=type;
end;
else if type in ('A','B') and _dist<min_dist then do;
true_type=type;
min_dist=_dist;
end;
if last.id;
keep id true_type ;
run;
... View more
3 weeks ago
2 Likes
This is a case in which you need to have the lagged value of dose in hand (_last_dose below). Then, for any but the first observation for an ID, you can test the values of current dose and last dose to generate the four dummy variables of interest:
data have;
input id Month Drug_A_dose ;
datalines;
1 1 500
1 2 500
1 3 .
1 4 .
2 1 500
2 2 500
2 3 500
2 4 250
2 5 250
3 1 500
3 2 250
3 3 250
3 4 500
3 5 100
4 1 500
4 2 .
4 3 .
4 4 500
run;
data want;
set have;
by id;
_last_dose=lag(drug_a_dose);
if first.id=0 then do;
if . < drug_a_dose < _last_dose then drug_a_reduce=1; else
if drug_a_dose > _last_dose > . then drug_a_increase=1; else
if _last_dose^=. and drug_a_dose= . then drug_a_discontinue=1; else
if _last_dose=. and drug_a_dose^= . then drug_a_restart=1;
end;
run;
But why not make a single result variable, coded from 1 to 4 (1=reduce, 2=increase, 3=discontinue, 4=restart). This would probably be better for most types of subsequent analysis procedures. (Note forgotten line of code now where it belongs).
data want;
set have;
by id;
_last_dose=lag(drug_a_dose); /* Forgot to put this in*/
if first.id=0 then do;
if . < drug_a_dose < _last_dose then drug_a_change=1; else
if drug_a_dose > _last_dose > . then drug_a_change=2; else
if _last_dose^=. and drug_a_dose= . then drug_a_change=3; else
if _last_dose=. and drug_a_dose^= . then drug_a_change=4;
end;
run;
... View more
4 weeks ago
4 Likes
Untested, in the absence of sample data in the form of a working data step:
data want;
set have;
by account_n;
trans_cr=coalesce(trans_cr,0);
trans_dr=coalesce(trans_dr,0);
retain endday_balance;
if first.account_n then endday_balance=start_balance;
else endday_balance + trans_cr +(-1*trans_dr);
run;
The above assumes data are sorted by account_n/date, and that the first obs for each account has a valid start_balance value, but no values for trans_cr and trans_dr. Also no other obs has a valid start_balance value.
... View more
a month ago
This code flags the last-encountered triplet of observations that have any instance of ADY with the minimum value for a USUBJID/PARAMCD/AVISITN group.
The program assumes:
Data are sorted by USUBJID/PARAMCD/AVISITN.
Records come in groups of three consecutive observations.
Each such triplet starts with REPNUM=1 (and no other observation has REPNUM=1).
data have;
infile cards truncover expandtabs;
input USUBJID PARAMCD $ ADTM :$40. ADY AVISITN REPNUM;
cards;
3001 EG 11JAN2024:12:12:37 29 7 1
3001 EG 11JAN2024:12:13:30 29 7 2
3001 EG 11JAN2024:12:14:07 29 7 3
3001 EG 11JAN2024:14:02:35 29 7 1
3001 EG 11JAN2024:14:03:12 29 7 2
3001 EG 11JAN2024:14:03:50 29 7 3
3002 EG 25JAN2023:14:02:22 169 7 1
3002 EG 25JAN2023:14:05:28 169 7 2
3002 EG 25JAN2023:14:08:28 169 7 3
3002 EG 01FEB2023:13:55:43 176 7 1
3002 EG 01FEB2023:13:58:46 176 7 2
3002 EG 01FEB2023:14:01:47 176 7 3
run;
data want (drop=_:);
set have (in=firstpass) have (in=secondpass);
by usubjid paramcd avisitn;
if first.avisitn then call missing(_rep_group,_min_ady,_min_group);
retain _rep_group -- _min_group;
if firstpass then do;
if repnum=1 then _rep_group+1;
_min_ady=min(ady,_min_ady);
if ady=_min_ady then _min_group=_rep_group;
end;
if secondpass;
if repnum=1 then _min_group=_min_group-1;
if _min_group=0 then new_flag='Y';
run;
Each USUBJID/PARAMCD/AVISITN group is passed through twice. The first time finds the minimum ADY and identifies the most recent triplet (_MIN_GROUP) having that minimum. During the second pass _MIN_GROUP is decremented every time REPNUM=1 is encountered. When it becomes zero, that's the group you want to flag. After that group, the variable _MIN_GROUP will decrement to negative values for subsequent triplets. These were determined NOT to have the minimum ADY in the first pass.
... View more