About ErikLund_Jensen

ErikLund_Jensen · ‎03-21-2019

Hi @Pabster Are you sure that DLDTE is a year and not a date value?

ErikLund_Jensen · ‎03-21-2019

Hi @n6 In many cases "return to sender" is not an option, e.g. if the source is an application where data has been generated by many users over a long period, so we have to deal with garbage and make the best of it. The definition of "best" depends on the further use of the data, so it is not a technical matter to decide whether a partial date is best represented as a missing value or as an interpretation where a missing day component is set to 1 or 15 or whatever.

If you are working on your own project, you can make your own decisions, like "I only want years in my final analysis". But if you are just preparing data, you should always consult the requestor and make her define "best" instead of making your own interpretation and presenting the result as fact.

It is tempting to read dirty dates with an any-informat as shown in @PeterClemmensen's example. But be careful, because the result is an interpretation, where missing values are filled in and day/month may be swapped, as the following example shows:

data work.have;
  input VisitDate $20.;
  cards;
%
%/%/%
%/%/2008
%/%/2009
01/%/2010
01//2013
01/1/2010
02-03-2011
02-11-2011/
03/%/2008
03/%/2010
03/13/12
13/03/12
03/5/2013
/%
/%/%/%
%/%/2008/
%/%/2009
/01/%/2010
/01//2013
;

data work.w;
  set work.have;
  format date_any date_mdy mmddyy10.;
  /* ?? suppresses the invalid-data notes for unreadable values */
  date_any = input(VisitDate, ?? anydtdte12.);
  date_mdy = input(VisitDate, ?? mmddyy10.);
run;
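If the requestor prefers to keep partial dates as they are instead of having them interpreted, one possible approach is to split each value into its components and build a date only when all three are present. The following is a minimal sketch, not part of the original example: the output data set `work.parts`, the helper variables `p1`-`p3`, and the assumption that the values are in month/day/year order are all hypothetical.

```sas
/* Sketch: keep the components of a partial date instead of interpreting them. */
/* Assumption (not from the post): values are in month/day/year order.         */
data work.parts;
  input VisitDate $20.;
  length p1 p2 p3 $8;
  /* The 'm' modifier makes SCAN return a blank for an empty field, */
  /* so 01//2013 keeps its empty middle component.                   */
  p1 = scan(VisitDate, 1, '/-', 'm');
  p2 = scan(VisitDate, 2, '/-', 'm');
  p3 = scan(VisitDate, 3, '/-', 'm');
  /* ?? suppresses notes and _ERROR_ when a part is % or blank */
  month = input(p1, ?? 8.);
  day   = input(p2, ?? 8.);
  year  = input(p3, ?? 8.);
  /* Build a full date only when all three parts were readable */
  if nmiss(month, day, year) = 0 then full_date = mdy(month, day, year);
  format full_date yymmdd10.;
  drop p1 p2 p3;
  datalines;
%/%/2008
01/%/2010
01//2013
03/5/2013
02-03-2011
;
run;
```

The partial values end up with missing components and a missing `full_date`, so the decision on how to fill them can be left to the requestor.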

ErikLund_Jensen · ‎03-19-2019

Sorry, it is here.

ErikLund_Jensen · ‎03-14-2019

Hi @Srigyan Here is a bit of code to transform the override dataset.

data work.b;
  informat Override_date anydtdte9.;
  input Product $ Override_date price;
  datalines;
A 10/01/2018 11
A 11/01/2018 12
A 13/01/2018 9
A 14/01/2018 5
;
run;

data b (rename=(price=New_price));
  format Override_date End_date ddmmyyd10.;
  set b;
  by Product;
  /* look ahead: read the next observation's start date by direct access */
  p = _N_+1;
  if not last.Product then do;
    set b(drop=price rename=(Override_date=end_date)) point=p;
    /* the interval ends the day before the next override starts */
    end_date = end_date - 1;
  end;
  /* the last override per product is open-ended */
  else end_date = '31dec9999'd;
run;

ErikLund_Jensen · ‎03-14-2019

Hi @Srigyan I have a macro to handle problems like this. It is an interval merger that takes several intervals for the same ID in both datasets and accepts missing periods between intervals. It works with subgroups too. It is made to handle large data sets, so it is quite efficient.

It requires intervals in both data sets, and apart from the interval and ID variables it cannot accept variables with the same name in both data sets, so something must be done to add end dates to the override table and change the price variable name before calling the macro. Also, the documentation is written in Danish. It is big, so I have attached it as a file. You are free to use it, and I will be happy to help with any questions. It is used like this:

data work.a;
  informat St_date End_date anydtdte9.;
  input Product $ St_date End_date Price;
  datalines;
A 17/12/2017 09/01/2018 8
A 10/01/2018 15/01/2018 10
;
run;

data work.b;
  informat Override_date End_date anydtdte9.;
  input Product $ Override_date End_date new_price;
  datalines;
A 10/01/2018 10/01/2018 11
A 11/01/2018 12/01/2018 12
A 13/01/2018 13/01/2018 9
A 14/01/2018 31/12/9999 5
;
run;

%FletrensInterval(work.a, work.b, work.want,
  unita=Product, datefirsta=St_date, datelasta=End_date,
  unitb=Product, datefirstb=Override_date, datelastb=End_date,
  join=left);

ErikLund_Jensen · ‎03-14-2019

Kudos! - Your solution makes me feel like this: https://www.shutterstock.com/da/image-vector/simple-cartoon-businessman-knocking-his-head-443053690

ErikLund_Jensen · ‎03-14-2019

Hi @km0927 If you don't have IML licensed at your site ($$$$), I think the only way is to rearrange the data, so you have all occurrences of any two diagnoses per ID. Then you can get your wanted output with one proc freq. I tried and came up with this:

data have;
  input id fever vomiting redness swelling;
  datalines;
1 1 0 1 1
2 1 1 0 0
3 0 1 1 1
4 1 0 1 0
5 0 0 1 1
;
run;

/* one row per id and diagnosis: _name_ holds the diagnosis, col1 the 0/1 value */
proc transpose data=have out=temp1;
  by id;
run;

/* all pairs of diagnoses that are both present for the same id */
proc sql;
  create table temp2 as
  select a._name_ as diag1 label='',
         b._name_ as diag2 label=''
  from temp1 as a full outer join temp1 as b
    on a.id = b.id
  where a.col1 = 1 and b.col1 = 1;
quit;

proc freq data=temp2;
  table diag1 * diag2 / norow nocol nopercent;
run;

Here is the result. I left the totals out:

ErikLund_Jensen · ‎03-13-2019

Hi @noda6003 Long character variables have always been a problem in SAS, and setting new lengths based on an analysis of actual data is dangerous, because next day's actual data might be longer. Some variables have content with a defined max. length, like a postal code or social security number, and for these the length can be set safely. But give ample space to varying-length text, e.g. use 200 if actual data says 157. Consider compressing your SAS data with compress=char, if your concern is disk space.

An easy way to find the lengths in actual data is to export the troublesome variables to csv and import them to a work data set:

data have;
  length name1 name2 name3 $1024;
  name1 = 'Tom';
  name2 = 'W.';
  name3 = 'Jones';
run;

filename csvfile "%sysfunc(getoption(work))\w.csv";

proc export data=have outfile=csvfile dbms=csv replace;
run;

/* proc import sets the lengths from the values it finds in the file */
proc import datafile=csvfile out=want dbms=csv replace;
run;
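As an alternative to the CSV round trip, the longest actual value per variable can also be measured directly. This is a minimal sketch of a different technique than the one in the post, using the same three sample variables; the column aliases `name1_len` to `name3_len` are hypothetical names chosen for illustration.

```sas
/* Sample data as in the post: variables declared far longer than their content */
data have;
  length name1 name2 name3 $1024;
  name1 = 'Tom';
  name2 = 'W.';
  name3 = 'Jones';
run;

/* LENGTH ignores trailing blanks, so MAX(LENGTH(var)) is the longest actual value */
proc sql;
  select max(length(name1)) as name1_len,
         max(length(name2)) as name2_len,
         max(length(name3)) as name3_len
  from have;
quit;
```

The reported maxima are only a lower bound for a safe length; the advice about leaving ample space for varying-length text still applies.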

ErikLund_Jensen · ‎03-13-2019

Hi @Santt0sh A warning in SAS is something that causes the automatic macro variable SYSCC to be set to the value 4. It also gives a message in the SAS log, but the message is just a green text that explains the warning; it has nothing to do with job completion. You can - at your own peril - redirect the log and get rid of the message, but your job will still end with a return code 4.

In general, warnings in production jobs should not be accepted. There are special cases, e.g. when reading XML input using a map/XML filename, and content is longer than 32767 bytes, where a warning seems unavoidable, and in these cases the warning can be nullified by resetting SYSCC to 0, but then there is no reason to suppress the message.

Step with warning - gives message + return code:

data _null_;
  b = "&y";
run;
%put &=syscc;

121  data _null_;
122  b = "&y";
WARNING: Apparent symbolic reference Y not resolved.
123  run;
NOTE: There were 1 observations read from the data set WORK.A.
NOTE: The data set WORK.B has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time 0.02 seconds
      cpu time 0.00 seconds
124  %put &=syscc;
SYSCC=4

Step with warning and redirected log - no message, but return code:

filename tmp dummy;
proc printto log=tmp;
run;
data _null_;
  b = "&y";
run;
proc printto;
run;
%put &=syscc;

138  filename tmp dummy;
139  proc printto log=tmp;
140  run;
NOTE: PROCEDURE PRINTTO used (Total process time):
      real time 0.00 seconds
      cpu time 0.00 seconds
146  %put &=syscc;
SYSCC=4

Step with warning and reset of return code - gives message:

data _null_;
  b = "&y";
run;
%put &=syscc;
%if &syscc=4 %then %do;
  %let syscc = 0;
%end;
%put &=syscc;

148  data _null_;
149  b = "&y";
WARNING: Apparent symbolic reference Y not resolved.
150  run;
NOTE: DATA statement used (Total process time):
      real time 0.00 seconds
      cpu time 0.00 seconds
151  %put &=syscc;
SYSCC=4
152  %if &syscc=4 %then %do;
153  %let syscc = 0;
154  %end;
155  %put &=syscc;
SYSCC=0

ErikLund_Jensen · ‎03-11-2019

Hi @Ammu18 Never use macro coding if it is not necessary, and it is not in this case. As explained by @PaigeMiller, you cannot have more than one variable with the same name, and I cannot imagine why you would want that, because you would still have the same output data set with 360 variables. I think your request only makes sense if you want to output 12 variables with Month1 as the sum of day1-day30 and so on. If that is a usable result, you can get it in one data step using 2 arrays like this:

data have (drop= i j);
  array day day1-day360;
  do i = 1 to 10;
    do j = 1 to 360;
      day{j} = ranuni(1) > 0.5;
    end;
    output;
  end;
run;

data want (keep=month1-month12);
  set have;
  array day day1-day360;
  array month month1-month12;
  do i = 1 to 360;
    /* days 1-30 map to month 1, days 31-60 to month 2, and so on */
    m = int((i-1)/30)+1;
    month{m} = sum(month{m},day{i});
  end;
run;

Have has 360 variables with a random distribution of 0/1, and want is:

ErikLund_Jensen · ‎03-10-2019

A relevant question, because we don't know the actual problem. But I tried to imagine a situation where a partial overlap should be removed, like in "make a list of employees employed in the full months of April, May and June". I often do interval joins, but the requirement has always been either a boolean "overlap or not", a real interval join where the overlapping interval is the wanted output, or overlap on a given day like end-of-month. It was never "full overlap only", so I am just curious - is it something people often use "out there"?
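For comparison, a "full overlap only" requirement can be sketched by tightening the join condition so that the customer interval must cover the whole month. This is a minimal sketch under assumptions: it reuses the small A and B sample data posted in this thread, and the output table name `full_overlap` is hypothetical.

```sas
/* Month intervals, as in the thread's sample data */
data A;
  informat begin end mmddyy10.;
  format begin end mmddyy10.;
  input month $ begin end;
  datalines;
Jan19 1/1/2019 1/31/2019
Feb19 2/1/2019 2/28/2019
;
run;

/* Customer intervals, as in the thread's sample data */
data B;
  informat recur_start recur_end mmddyy10.;
  format recur_start recur_end mmddyy10.;
  input custid recur_start recur_end recur_amt;
  datalines;
1 10/15/2014 2/16/2019 150
2 2/18/2018 1/31/2019 150
3 12/15/2012 3/31/2021 100
;
run;

/* Full overlap: the customer interval starts on or before the first day */
/* of the month and ends on or after its last day.                       */
proc sql;
  create table full_overlap as
  select a.month, b.custid, b.recur_start, b.recur_end, b.recur_amt
  from A, B
  where b.recur_start <= a.begin
    and b.recur_end   >= a.end;
quit;
```

With this sample data, customer 1 is kept for Jan19 but not for Feb19, because the interval ends on 2/16/2019, whereas the plain overlap condition would keep both months.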

ErikLund_Jensen · ‎03-10-2019

Hi @Patrick Very elegant indeed - I didn't think of that. I had a hash lookup in mind, but (I am ashamed to admit) most of my toolbox is pre-V9, and after 15 years with V9 I am still not really familiar with hash objects, so it would take me too long to figure out.

But I could not resist running a test to compare the two solutions. I used 24 months in A and 5,000,000 observations in B with random intervals. SQL is the winner:

236  data A (drop=i s);
237  format begin end mmddyy10.;
238  s = '01dec2017'd;
239  do i = 1 to 24;
240  begin = intnx('month',s,i);
241  end = intnx('month',begin,1)-1;
242  month = put(begin,monyy.);
243  output;
244  end;
245  run;
NOTE: The data set WORK.A has 24 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time 0.02 seconds
      cpu time 0.00 seconds
246
247  data B;
248  format recur_start recur_end yymmdd10.;
249  do custid = 1 to 5000000;
250  recur_start = (ranuni(1) * 730) + 21000;
251  recur_end = min(recur_start + (ranuni(3) * 730),22100);
252  recur_amt = int(ranuni(5)*1000);
253  output;
254  end;
255  run;
NOTE: The data set WORK.B has 5000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time 0.45 seconds
      cpu time 0.45 seconds
256
257  data want(keep=month begin end recur_amt);
258  set b;
259  _last=0;
260  do _i=1 to _nobs;
261  set a point=_i nobs=_nobs;
262  if begin<=recur_end and end>=recur_start then output want;
263  end;
264  run;
NOTE: There were 5000000 observations read from the data set WORK.B.
NOTE: The data set WORK.WANT has 51607422 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time 17.74 seconds
      cpu time 16.90 seconds
265
266  proc sql;
267  create table want as
268  select a.month, b.custid, b.recur_start, b.recur_end, b.recur_amt
269  from A, B
270  where
271  a.begin <= b.recur_end
272  and a.end >= b.recur_start;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
NOTE: Table WORK.WANT created, with 51607422 rows and 5 columns.
273  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time 9.92 seconds
      cpu time 8.40 seconds

I ran this on a Lenovo PC with an SSD disk. I also tried it on a Linux Grid running a heavy batch load at the moment, and got almost the same figures:

21
22   data want(keep=month begin end recur_amt);
23   set b;
24   _last=0;
25   do _i=1 to _nobs;
26   set a point=_i nobs=_nobs;
27   if begin<=recur_end and end>=recur_start then output want;
28   end;
29   run;
NOTE: There were 5000000 observations read from the data set WORK.B.
NOTE: The data set WORK.WANT has 51607422 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time 14.18 seconds
      cpu time 13.16 seconds
30
31   proc sql;
32   create table want as
33   select a.month, b.custid, b.recur_start, b.recur_end, b.recur_amt
34   from A, B
35   where
36   a.begin <= b.recur_end
37   and a.end >= b.recur_start;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
NOTE: Table WORK.WANT created, with 51607422 rows and 5 columns.
38   quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time 8.39 seconds
      cpu time 8.39 seconds

ErikLund_Jensen · ‎03-10-2019

I wonder what sort of data you have in 6000 Excel documents, and it's not clear what you want to do. Do all your Excel sheets share a common set of columns, so they can be read with the same code? Do you want a single big SAS data set as output, or do you want individual SAS data sets per sheet? Do you have one or more sheets per Excel file? Is this a one-shot or something that should end as production code?

ErikLund_Jensen · ‎03-10-2019

Hi @redfishJAX One way is an SQL join. There might be more efficient solutions, but I think this would run pretty fast too, because the A data set is small, and it is simple.

data A;
  informat begin end mmddyy10.;
  format begin end mmddyy10.;
  input month $ begin end;
  datalines;
Jan19 1/1/2019 1/31/2019
Feb19 2/1/2019 2/28/2019
;
run;

data B;
  informat recur_start recur_end mmddyy10.;
  format recur_start recur_end mmddyy10.;
  input custid recur_start recur_end recur_amt;
  datalines;
1 10/15/2014 2/16/2019 150
2 2/18/2018 1/31/2019 150
3 12/15/2012 3/31/2021 100
;
run;

/* keep every month/customer pair whose intervals overlap by at least one day */
proc sql;
  create table C as
  select a.month, b.custid, b.recur_start, b.recur_end, b.recur_amt
  from A, B
  where a.begin <= b.recur_end
    and a.end >= b.recur_start;
quit;

ErikLund_Jensen · ‎03-10-2019

Hi @daft22 In your posted code you don't have a set statement in your recode step, so instead of recoding values in the original data set HW, you replace the existing HW with a new data set with one observation and one variable containing a missing value. But you refer to "none of the data", which suggests that you ran your code with a set statement, but didn't get the desired result. That is because Homework1 is already defined as numeric, so you cannot change it to 'A'. You get a missing value and a note about invalid numeric data. If you want to recode data instead of using a format, you must create new grade variables to hold the character values.

Please always post the log. I guess it looks like this:

21
22   DATA HW; set HW;
23   IF Homework1 >= 90 THEN DO;
24   Homework1 = 'A';
25   END;
26   RUN;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      24:13
NOTE: Invalid numeric data, 'A' , at line 24 column 13.
Homework1=. Homework2=30 _ERROR_=1 _N_=1
NOTE: There were 1 observations read from the data set WORK.HW.
NOTE: The data set WORK.HW has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time 4.55 seconds
      cpu time 0.14 seconds
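The recoding approach described above, with a new character variable holding the grade, could look like this. It is a minimal sketch, not the poster's actual program: the sample values, the output data set `HW_graded` and the variable `Grade1` are hypothetical, and only the 90-point threshold for 'A' comes from the posted code.

```sas
/* Hypothetical sample data: numeric homework scores */
data HW;
  input Homework1 Homework2;
  datalines;
95 30
72 88
;
run;

/* Leave the numeric score untouched and store the grade in a new character variable */
data HW_graded;
  set HW;
  length Grade1 $1;
  if Homework1 >= 90 then Grade1 = 'A';
run;
```

Writing to a new data set instead of overwriting HW also keeps the original scores available if the recoding has to be rerun.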


Re: SAS Metadata: list of all attributes you can use in METADATA_GETAT...

Re: SAS Metadata: list of all attributes you can use in METADATA_GETAT...

Re: Split name vertically

Re: How to flag the last AVALC = 'Y' record prior to this AVALC='N' re...

Re: The use of last.id by multiple groups

Re: combine datasets even when results are missing

Re: combine datasets even when results are missing

Re: How do i parse specific number combinations from a field in a tabl...

Re: How do i parse specific number combinations from a field in a tabl...

Re: Pseudonymization of sensitive data

Re: How to convert DATETIME to ISO8601 format ?

Re: create macro vars-End of month

Re: Flag the next value of a variable

Re: SAS Programming 1_Lesson 5_p105a03.sas: repeating the same code do...

Re: Write SAS Time fields to Excel - loses time format

Re: combine datasets even when results are missing

Re: Pseudonymization of sensitive data

Re: SAS Connect To ODBC with Big Query issue

Re: SAS code for assigning different values of race for an individual ...

Re: Unable to open a particular job in SAS DI Studio

Re: PROC SQL Merging error

Re: Date Cleaning

Re: range change

Re: range change

Re: range change

Re: n*n table with multiple variables

Re: n*n table with multiple variables

Re: how to resize datasets

Re: Automated jobs failing because SAS logs contains Warning messages.

Re: How to rename day1 to day30 as month 1

Re: Loop through large set multiple times based on matching values fro...

Re: Loop through large set multiple times based on matching values fro...

Re: How to exchange and calculate data in a large number of excel docu...

Re: Loop through large set multiple times based on matching values fro...

Re: Need assistance with how to alter data using a IF-THEN ELSE loop
