About Vince28_Statcan

Vince28_Statcan · ‎08-29-2013

I assume a.dtserv is your datetime variable. Since it already carries a datetime20. format, it is stored as a numeric value. Applying something like input(a.dtserv, 5.) therefore only truncates the numeric representation of your datetime, which makes it lose all of its meaning. If that is the case, you simply want datepart(a.dtserv), which returns a numeric date value of roughly five digits. I can't provide better help without details on your variables, or at least examples of their unformatted and formatted values.
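To make the difference concrete, here is a minimal sketch built on a made-up datetime literal (the value and the variable name d are assumptions, not taken from the original question). It prints the raw number behind a datetime and the date that DATEPART extracts from it.

```sas
data _null_;
  dtserv = '29AUG2013:14:30:00'dt;  /* hypothetical value: seconds since 01JAN1960 */
  d = datepart(dtserv);              /* days since 01JAN1960 */
  put dtserv= best12.;               /* raw datetime value, about ten digits */
  put d= best12.;                    /* raw date value, five digits */
  put d= date9.;                     /* the same date value, formatted */
run;
```

Truncating the ten-digit datetime to five characters gives a number that is neither a valid date nor a valid datetime, which is why DATEPART is the right tool here.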

Vince28_Statcan · ‎08-29-2013

No, datepart is meant to be applied directly to a datetime value. Datetime values are typically far larger than the 10000-99999 range. As I couldn't tell which of your variables was datetime20., I only provided a means to extract the date from a datetime value. I don't quite see where you have a datetime value in your dates. Could you provide an example of both variables (dtserv before the input statement, preferably) as well as their formats and informats?

Vince28_Statcan · ‎08-29-2013

Look up the DATEPART function (and its counterpart, TIMEPART).

    data _null_;
      dt = '01JAN2011'd;
      dttm = input('01JAN2011:00:00:00', datetime18.);
      dtpt = datepart(dttm);   /* date portion of the datetime */
      if dt = dtpt then put "Hello World!";
    run;

Vince28_Statcan · ‎08-29-2013

SAS only has 2 types of variables, num and char. That is, currencies, dates and times are all simply numeric. It's the format that dictates how they are output. So you would have to do some form of conditioning on the format and/or informat columns to achieve your desired results. You could create an additional variable like

    ...
    type,
    format,
    (case
       when type="char" then "string"
       when type="num" and format in ("DATE9.", "MONYY.", "DDMMYY8.") then "date"
       when type="num" and format in (<some currency format list>) then "currency"
       else "numeric"
     end) as derived_type

The issue is that of creating an exhaustive list that covers all the formats you encounter on a regular basis.
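As a rough, runnable sketch of that idea, the example below builds a small throwaway table and classifies its columns from dictionary.columns. The table work.sample, its variables, and the use of a LIKE pattern to catch DOLLAR formats are all assumptions made for illustration; a real format list would need to be far longer.

```sas
/* Hypothetical sample table with one column of each kind */
data work.sample;
  name = 'abc';
  d    = '01JAN2011'd;
  amt  = 12.5;
  n    = 3;
  format d date9. amt dollar12.2;
run;

proc sql;
  select name, type, format,
         case
           when type = 'char' then 'string'
           when type = 'num' and format in ('DATE9.', 'MONYY.', 'DDMMYY8.') then 'date'
           when type = 'num' and format like 'DOLLAR%' then 'currency'
           else 'numeric'
         end as derived_type
  from dictionary.columns
  where libname = 'WORK' and memname = 'SAMPLE';
quit;
```

Note that libname and memname are stored in upper case in the dictionary tables, so the literals in the WHERE clause must be upper case as well.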

Vince28_Statcan · ‎08-29-2013

As Ballardw said, it appears that you are dereferencing macro variables too many times for the way you've constructed them.

Main_&&&&MONTHj..._&&&DAY&k.._&&YEAR&i; would resolve as follows on the first pass:

&& resolves to &, && resolves to & (leaving &&monthj..._)
&& resolves to &, &DAY resolves to an error because there is no macro variable DAY, &k resolves to 20 (leaving &<some error>20._ to resolve)
&& resolves to &, &i resolves to 13 (leaving &year13 to resolve)

Your real desired result would be along the lines of what Ballardw posted. For each macro variable, you only need a double ampersand in front of month/day/year so that, on the first pass, && resolves to & while the indexes i, j and k are resolved properly. On the second pass, you are then left with &month1._&day2._&year3 to resolve.

However, there are definitely many "better" ways to achieve your desired results for this example, and depending on how you loop, you may or may not want to go one way or another. One approach would be as follows:

    %macro show();
      %do i=1 %to 13;
        %do j=1 %to 12;
          %do k=1 %to 31; /* just to give a broader spectrum */
            %put Main_month%sysfunc(putn(&j, z2.))_day%sysfunc(putn(&k, z2.))_year20%sysfunc(putn(&i, z2.));
          %end;
        %end;
      %end;
    %mend;
    %show();

That is, with %sysfunc(putn( , z2.)) there is no need to hard-type everything. You can just write month, day and year20 as text (instead of as macro variables) and let putn do the job of left-padding numbers below 10 with a zero.
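The two-pass resolution can be traced with a small sketch. The macro variable values below are hypothetical and chosen only so that the result is easy to read; the doubled dots are there because the first dot ends the index reference and the second ends the reference resolved on the second pass.

```sas
/* Hypothetical values, chosen only to trace the resolution */
%let i=3;
%let j=1;
%let k=2;
%let month1=01;
%let day2=02;
%let year3=2013;

/* Pass 1: && becomes &, and &j. &k. &i become 1 2 3,    */
/*         leaving Main_&month1._&day2._&year3           */
/* Pass 2: the remaining references resolve              */
%put Main_&&month&j.._&&day&k.._&&year&i;
```

With these values the log shows Main_01_02_2013.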

Vince28_Statcan · ‎08-29-2013

You can use SASHELP.Vcolumn and/or SASHELP.Vtable to retrieve the full table names and variable names. However, if you always have exactly 12 datasets, a fairly simple macro can loop over all 12 of them. Regardless of the approach, some macro code will be necessary.

If you always have exactly 12 months, you could go half hard-coded, half macro. Note that RENAME= is applied before WHERE=, so the WHERE= condition has to use the new name:

    %macro flagged(year=);
      data want;
        set have_&year.01(rename=(flag_1=flag) where=(flag='Y'))
            have_&year.02(rename=(flag_2=flag) where=(flag='Y'))
            ...
            have_&year.12(rename=(flag_12=flag) where=(flag='Y'))
        ;
        /* other processing if desired */
      run;
    %mend;
    %flagged(year=2012);

This is somewhat tedious as you have to write the 01/1 everywhere. What can be done instead is to use some more macro code:

    %macro flagged(year=);
      data want;
        set
        %do i=1 %to 12;
          have_&year.%sysfunc(putn(&i., z2.))(rename=(flag_&i.=flag) where=(flag='Y'))
        %end;
        ; /* the semicolon ending the set statement */
        /* other processing if desired */
      run;
    %mend;
    %flagged(year=2012);

If there is a chance that you will run on an incomplete year, you can achieve the desired result with vtable/vcolumn. The loop then runs over the datasets actually found (&sqlobs) and takes the month from the dataset name:

    %macro flagged(year=, libname=);
      proc sql noprint;
        select memname into :dsname1-:dsname12
        from sashelp.vtable
        where libname="%upcase(&libname.)"
          and substr(memname, 1, 9)="HAVE_&year."
        ;
      quit;
      %let n=&sqlobs.;
      data want;
        set
        %do i=1 %to &n.;
          %let m=%sysfunc(inputn(%substr(&&dsname&i., 10, 2), 2.));
          &libname..&&dsname&i.(rename=(flag_&m.=flag) where=(flag='Y'))
        %end;
        ; /* the semicolon ending the set statement */
        /* other processing if desired */
      run;
    %mend;
    %flagged(year=2012, libname=work);

*edited immediately after seeing previous post noting that %sysfunc supports putn!

The renames are just so that you don't end up with flag_1 to flag_12 in your resulting dataset, each with a bunch of missing values.

Vincent

Vince28_Statcan · ‎08-29-2013

As Tom has mentioned, you have to think of the possible paths through your condition logic. I believe you are misunderstanding either because you don't backtrack far enough to realize when and how i can be 0, or because you are unaware of how the data vector (and thus the in: and out: variables) behaves behind the scenes at each iteration of the data step.

    if i=0 then i=i+1;
    if out(i)^=. then i=i+1;

For the first if condition to be true, and thus for i to be incremented, you have to have i=0. Now if you look at your code, when exactly can i be equal to 0? Well, you only set i=0 once per data step iteration, when first.id is true. That is, i=0 only when you encounter the very first row of a by group.

On top of that, the way your do until construct is made, the data step iterator loops each time you encounter a new by group. So you have to be aware that at each iteration of the data step, all variables that are not in a retain statement are set to missing and then a new vector is read. Since out: and in: are variables built throughout your data step and not read from a dataset, each time your data step loops, in: and out: are all set to missing. Thus, each time you enter a new by group and set i=0, in: and out: have all just been set to missing.

The resulting conclusion is that the conditions i=0 and out(i+1)^=. are mutually exclusive. They can never both be true at the same time. So adding the else has no impact on the final result. It only improves processing slightly because each time i=0, the other condition is no longer checked.

I gave it some time as I felt, like Tom, that it would be better for you to crack it by yourself. However, reading subsequent comments, I was afraid that the main blocking point was not knowing how the data step iterator behaves, which really isn't entirely natural and not exactly easy to derive from an example alone.

I hope this helps and that I did not ruin Tom's pedagogical plans on this.

Vincent
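The reset behaviour described above is easy to observe in isolation. This is only a sketch: it assumes the SASHELP.CLASS sample table is available, and y is a hypothetical variable that is assigned in the step rather than read from the input.

```sas
data _null_;
  set sashelp.class(obs=3);
  put _n_= y=;   /* y is missing at the top of every iteration */
  y = _n_;       /* assigned in the step, not read and not retained */
run;
```

The log shows y=. three times. Adding retain y; would instead carry the previous value forward, and the same rule applies to the in1-in6 and out1-out6 variables created by the array statements.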

Vince28_Statcan · ‎08-28-2013

Hi,

I'm sorry, I guess I pasted the wrong version. "desc" only contains the name of the last product, but your desired output is supposed to be in "result". The reason why it was empty/buggy is that there was no handling of blank padding. At every loop iteration, the concatenation was trying to append the additional info to 500 blanks and then truncating back to those 500 blanks.

Here's the fixed version, without removal of the trailing comma for now, although that isn't overly complicated to add.

    data want;
      length id $1 count $1 desc $20 result $500;
      if _N_=1 then do;
        declare hash myhash(dataset: 'lookuptable');
        myhash.defineKey('id');
        myhash.defineData('desc');
        myhash.defineDone();
      end;
      set have;
      do i=1 to 4; /* taking advantage of fixed length and 24 different codes - here only 4 for the example data */
        id = substr(string, i*2-1, 1);
        count = substr(string, i*2, 1);
        if count NE "0" then do;
          myhash.find();
          result=cat(trim(result), trim(desc), " ", trim(count), ",");
        end;
      end;
      drop desc count i id;
    run;

Sorry about that.

Vincent
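One way to avoid the trailing comma altogether is to build the list with CATX, which skips blank arguments and only inserts the delimiter between non-blank pieces. The sketch below is a variation on the step above, not the original code: the two-row lookuptable and have datasets are invented for illustration, and the loop runs over 2 codes instead of 4 to match them.

```sas
/* Hypothetical lookup table and input data */
data lookuptable;
  length id $1 desc $20;
  input id $ desc $;
  datalines;
A Apples
B Bananas
;

data have;
  length string $8;
  input string $;
  datalines;
A2B0
A1B3
;

data want;
  length id $1 count $1 desc $20 result $500;
  if _n_ = 1 then do;
    declare hash myhash(dataset: 'lookuptable');
    myhash.defineKey('id');
    myhash.defineData('desc');
    myhash.defineDone();
  end;
  set have;
  do i = 1 to 2;   /* two code/count pairs in the sample strings */
    id    = substr(string, i*2-1, 1);
    count = substr(string, i*2, 1);
    /* find() returns 0 when the key exists in the lookup table */
    if count ne '0' and myhash.find() = 0 then
      result = catx(',', result, catx(' ', desc, count));
  end;
  drop desc count i id;
run;
```

Checking the return code of find() also keeps a stale desc from a previous lookup out of the result when a code is missing from the lookup table.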

Vince28_Statcan · ‎08-28-2013

Could you provide sample data and a program to tinker with?

In theory, you can modify the template for the stat.lifetest.quartiles table to achieve the 10%, 20% etc. setup directly with the existing template, rather than rebuilding from scratch and calculating your intervals manually with the same method.

I'm not yet familiar enough with proc template, and I am still waiting for the proc template book to arrive back at the library at work, so I can't be of much further help. But if you can provide a small working set of data (it can be one of the sashelp datasets) with a basic proc lifetest, I would gladly tinker around and report my findings.

*at least that's what I understood from proc template

Vincent

Vince28_Statcan · ‎08-28-2013

Hi Robert,

Yes, both methods should achieve your desired results. The main reason why Tom suggested the transpose method is that the array method requires you to know, a priori, the maximum number of groups for any given day (so as to set the array dimensions). You could, for instance, not mind adding extra missing columns to your dataset and simply set a value "high enough" that it will never be reached, but that depends on your requirements and on what further processing you may want to do on this data.

The transpose method has the advantage of creating variables only for each distinct ID (as defined by the variables in the ID statement of the proc transpose, not the ID variable in your dataset). So for instance, suppose you used this for 2 distinct datasets (let's say year 2011 and year 2012), and in 2011 the maximum number of in/out pairs for an ID is 30 while in 2012 it is 42. The transpose method will give you a dataset with 61 columns for 2011 and 85 columns for 2012.

On the other hand, if you used the array method and set array in (50); array out (50); "just to be safe" because you didn't know, a priori, how many distinct events occurred for any given ID, then both the 2011 and 2012 data would yield a dataset with 101 columns. You would effectively have the last 40 (resp. 16) columns missing for all records in the 2011 (resp. 2012) resulting datasets.

So really, the approach to use depends on your task requirements and your a priori knowledge of your data.

To go back to the cnt=cnt-.1 and cnt=cnt+.1: looking back at the data you have provided, I realize it does not achieve what I wanted, as your cnt variable is not a counter of sequential events for an ID but rather a counter of sequential events for an ID/FLAG pair.

The underlying logic would've been something like this:

    101 IN  04Sep1989:7:30   1
    101 IN  04Sep1989:13:45  2
    101 IN  21SEP1989:17:55  3
    101 OUT 05SEP1989:7:15   1
    101 OUT 22SEP1989:06:00  2
    101 OUT 23SEP1989:06:00  12
    101 IN  24SEP1989:06:00  15

If you sort this by id time, you get

    101 IN  04Sep1989:7:30   1
    101 IN  04Sep1989:13:45  2
    101 OUT 05SEP1989:7:15   1
    101 IN  21SEP1989:17:55  3
    101 OUT 22SEP1989:06:00  2
    101 OUT 23SEP1989:06:00  12
    101 IN  24SEP1989:06:00  15

Now imagine that you had a different "counter" that represents the sequence of events within a given ID, as follows:

                                 NEWCNT
    101 IN  04Sep1989:7:30   1   1
    101 IN  04Sep1989:13:45  2   2
    101 OUT 05SEP1989:7:15   1   3
    101 IN  21SEP1989:17:55  3   4
    101 OUT 22SEP1989:06:00  2   5
    101 OUT 23SEP1989:06:00  12  6
    101 IN  24SEP1989:06:00  15  7

The logic of the +.1 and -.1 was that the created records with a missing date would allow you to sort by id newcnt and still get the appropriate ordering by time within an ID, even though you had a bunch of missing times.

    101 IN  04Sep1989:7:30   1   1
    101 OUT .                .   1.1
    101 IN  04Sep1989:13:45  2   2
    101 OUT 05SEP1989:7:15   1   3
    101 IN  21SEP1989:17:55  3   4
    101 OUT 22SEP1989:06:00  2   5
    101 IN  .                .   5.9
    101 OUT 23SEP1989:06:00  12  6
    101 IN  24SEP1989:06:00  15  7
    101 OUT .                .   7.1

Due to the way my data step is built for middle2, the IN records with missing time are output after their respective OUT records, so the way to rebuild the sequence of events with missing values would be through something similar. Again though, on review it will not work as intended, since your CNT variable is not built the way I had in mind when I wrote the code. Nonetheless, it is irrelevant to the proc transpose. It is merely a conceptual tool if you wanted to have a full sequence of events vertically rather than horizontally.

Vincent
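A per-ID sequence counter of the kind called NEWCNT above could be built roughly as follows. This is a sketch only: the dataset names have and seq are assumptions, and the sample rows are a subset of the values shown above, padded to full hh:mm:ss so that the datetime informat reads them.

```sas
/* Hypothetical sample data: id, flag, event time, per-flag counter */
data have;
  input id flag $ time :datetime18. cnt;
  format time datetime16.;
  datalines;
101 IN 04SEP1989:07:30:00 1
101 IN 04SEP1989:13:45:00 2
101 IN 21SEP1989:17:55:00 3
101 OUT 05SEP1989:07:15:00 1
101 OUT 22SEP1989:06:00:00 2
;

proc sort data=have;
  by id time;
run;

data seq;
  set have;
  by id;
  if first.id then newcnt = 0;   /* restart the counter for each ID */
  newcnt + 1;                    /* sum statement: newcnt is retained */
run;
```

After the sort, newcnt numbers the events 1 to 5 in time order within the ID, regardless of whether each event is an IN or an OUT.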

Vince28_Statcan · ‎08-28-2013

Hi Robert,

The reference to Tom in post #34 is how he handled cases where there would be 2 'OUT' in a sequence. However, it does not discuss handling the special case where the very first event has no 'IN' data. He has actually come up with the same alternative as I did in post #35.

As for the proc transpose: on a large dataset where "at least one occurrence" of, say, IN4 has data, the variable will be created. In the unlikely event that, for all IDs, an IN# or OUT# has all occurrences missing, you would need an additional data step to create a "dummy row". Basically, the variables are only created by the transpose procedure if the ID is encountered at least once in the data.

Here is one of many ways that you could use to create such dummy rows:

    data middle2;
      set middle;
      by id group;
      if first.group and last.group and flag='IN' then do;
        output;
        flag='OUT';
        time=.;
        cnt=cnt+.1;
        output;
      end;
      else if first.group and flag='OUT' then do;
        output;
        flag='IN';
        time=.;
        cnt=cnt-.1;
        output;
      end;
      else output; /* keep all other records unchanged */
    run;

    proc transpose data=middle2 out=want (drop=_name_);
      by id;
      id flag group;
      var time;
    run;

This will add a whole bunch of records with a missing time variable for given groups. Don't mind the +.1 and -.1 in cnt=cnt±.1; that was only for continuity, in case middle2 serves for anything besides the transpose. It allows you to redo a sort by id cnt and achieve the logical/sequential results. cnt could very well have been left unmodified or set to missing instead, and the proc transpose would've achieved the same result.

Vincent

Vince28_Statcan · ‎08-27-2013

Hi Robert,

I definitely overlooked that case. It should be easy to fix by adding a condition to increment the 'IN' case when 'OUT' already has a value.

    data want;
      do until (last.id);
        set have;
        by id;
        array in (6);
        array out (6);
        if first.id then i=0;
        if flag='IN' then do;
          i=i+1;
          in(i)=recorded_time;
        end;
        if flag='OUT' then do;
          if i=0 then i=i+1;
          if out(i)^=. then i=i+1;
          out(i)=recorded_time;
        end;
      end;
      keep id in: out: ;
      format in: out: datetime.;
    run;

I'm sorry I didn't think of it in the first reply. Basically, the idea was that Tom's code expected your data to always have, at the very least, one 'IN' before an 'OUT' for each ID. I fixed the array subscript issue but had forgotten to account for what happens when you read an 'IN' immediately after an 'OUT' as the first value.

I'm going to think about it some more to make sure I didn't overlook anything with this additional fix, but it will probably be faster for you to just test the code and come back if I missed something.

As I put it up, I realized I was still missing a case. I reverted a few things back towards Tom's code and changed the way to handle the special case where the first record for an ID is an 'OUT' flag. Reverts are bolded in the code. The net change from Tom's original code should be the if i=0 then i=i+1; line.

Vincent

Vince28_Statcan · ‎08-27-2013

This is because the array iterator begins at i=0 and is only incremented to 1 before subscripting the array when the flag for the first record of a by-group (ID) is "IN". It appears that you have cases in your data where in1=. and out1=some date. So basically, SAS attempts to retrieve out{0}, but array subscripts in SAS go from 1 to dim(arrayname) rather than starting from 0.

Real quick, without any testing, I believe you could fix your issue by changing the following segment

    array out (6); /* length of the array has to be adjusted to the maximum number of records per ID in the dataset */
    if first.id then i=0;
    if flag='IN' then do;
      i=i+1;
      in(i)=recorded_time;
    end;

to

    array out (6);
    if first.id then i=1;
    if flag='IN' then do;
      in(i)=recorded_time;
      i=i+1;
    end;

Notice the reordering as well as the change in the index starting point. At first glance, this change to the iterator start shouldn't affect the segment for flag='OUT'.

You may still get a subscript out of range error if you ever have more than 6 sets of data for a given ID. If that ever occurs, you can run a proc freq by ID and take the max value as a safe bet for your array dimensions.

Vincent
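The array size can also be derived from the data instead of being hard-coded. The sketch below uses proc sql rather than proc freq, which is a substitution on my part; the sample have dataset and the macro variable name maxn are assumptions for illustration.

```sas
/* Hypothetical sample data */
data have;
  input id flag $ recorded_time :datetime18.;
  datalines;
101 IN 04SEP1989:07:30:00
101 OUT 05SEP1989:07:15:00
101 IN 21SEP1989:17:55:00
102 IN 04SEP1989:08:00:00
;

proc sql noprint;
  /* largest number of records found for any single ID */
  select max(n) into :maxn
  from (select id, count(*) as n
        from have
        group by id);
quit;
%let maxn = &maxn;   /* strip the leading blanks left by INTO */
%put NOTE: safe array dimension is &maxn;
```

The arrays can then be declared as array in (&maxn); and array out (&maxn);. Counting all records per ID overestimates what is strictly needed, but it can never be too small.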

Vince28_Statcan · ‎08-27-2013

Hi,

you can use SASHELP.vcolumn to extract the names of all the columns of your dataset, in the appropriate order, into macro variables and then declare one array for each set of variables. This way, you can do comparisons in a do loop using the same array index (or i and i+1 etc., depending on your intentions).

A generic approach would be something like this:

    proc sql noprint;
      create table orderedcolumns as
      select libname, memname, name as varname, type as vartype, length as varlen, varnum
      from sashelp.vcolumn
      where libname = /* WORK or whatever other libname you use, in capital letters, quoted */
        and memname = /* your table name, again in capital letters, quoted */
      order by libname, memname, varnum
      ;
    quit;

    proc sql noprint;
      select varname into :favarnames separated by " "
      from orderedcolumns
      where libname = /* as above */ and memname = /* as above */ and substr(varname, 1, 2) = "FA"
      order by varnum
      ;
      select varname into :fbvarnames separated by " "
      from orderedcolumns
      where libname = /* as above */ and memname = /* as above */ and substr(varname, 1, 2) = "FB"
      order by varnum
      ;
      select varname into :fsvarnames separated by " "
      from orderedcolumns
      where libname = /* as above */ and memname = /* as above */ and substr(varname, 1, 2) = "FS"
      order by varnum
      ;
    quit;

    data want;
      set have;
      array FA {*} &favarnames;
      array FB {*} &fbvarnames;
      array FS {*} &fsvarnames;
      do i=1 to (dim(FA)-1); /* to prevent array subscript out of range */
        if FA{i} > FA{i+1} then do;
          /* do work here */
        end;
      end;
    run;

This is essentially what Ballardw said, with the addition of how to generate the 3 arrays without having to hard-type all variable names in the proper order (assuming your columns are ordered appropriately in your table, obviously).

Vincent

Vince28_Statcan · ‎08-26-2013

It is because the Do Until ... End; group allows you to manually control iterations within your data step. Doing so prevents the natural data step iterator from setting all values in the data vector to missing prior to reading the next record. As such, his array (7) does not get erased by the data step iterations, thanks to the manually controlled iteration over each BY group. Since he used variable naming for the arrays, they generate the in1-in7 and out1-out7 variables.

I believe that if you remove the keep statement and add an output; statement before the end; of the do until group, you can see the logic of building the arrays over time. It should help you understand his logic better, in my opinion, than the putlog statements do.

This is actually a very clever use of BY processing. Had I thought of using the sort by time/id, I would probably still have considered only the current and next record, leading to a further step with a proc transpose. I'm glad I followed this thread.

Thanks Tom

Vince
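A sketch of that suggestion, applied to the do until step posted elsewhere in this thread: the keep statement is dropped and an output; is added inside the loop. The four-row have dataset and the output name trace are invented for illustration.

```sas
/* Hypothetical sample data */
data have;
  input id flag $ recorded_time :datetime18.;
  datalines;
101 IN 04SEP1989:07:30:00
101 OUT 05SEP1989:07:15:00
101 IN 21SEP1989:17:55:00
101 OUT 22SEP1989:06:00:00
;

data trace;
  do until (last.id);
    set have;
    by id;
    array in (6);
    array out (6);
    if first.id then i = 0;
    if flag = 'IN' then do;
      i = i + 1;
      in(i) = recorded_time;
    end;
    if flag = 'OUT' then do;
      if i = 0 then i = i + 1;
      if out(i) ^= . then i = i + 1;
      out(i) = recorded_time;
    end;
    output;   /* one row per input record, to watch the arrays fill up */
  end;
  format in: out: recorded_time datetime.;
run;
```

Each row of trace keeps the values set by the earlier rows of the same ID, because the whole BY group is read within a single data step iteration.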

Date Last Visited	‎07-02-2019 05:06 PM

Re: How to import this SDMX-ML data from Statistics Canada in SAS?

Re: Using the XML Mapper Utility

Re: Analysis by row

Re: SAS converting character variables to numeric while exporting to C...

Re: SAS converting character variables to numeric while exporting to C...

Re: If then statement to case statement

Re: using %sysfunc(cat() )

Re: proc contents

Re: Sas merge help

Re: Sas merge help

Re: put statement - format used contained in a variable

Re: Comparing one dataset with another without merging (with the help ...

Re: Comparing one dataset with another without merging (with the help ...

Re: Is it possible to run Excel VBA code using SAS

Re: FORMAT function

Re: Attempt to %GLOBAL a name (NAME) which exists in a local environme...

Re: Unable to export data to local folders (PROC EXPORT in SAS EG)

Re: Make first letter capital only

Re: Removing duplicate pairs i.e keeping only unique values that weren...

Re: Macro error

Re: Converting date from " DATETIME20." format to SAS date format

Re: Converting date from " DATETIME20." format to SAS date format

Re: Converting date from " DATETIME20." format to SAS date format

Re: Identifying Date, Currency and Time variables

Re: Macros and Dates

Re: Macro loop for months

Re: Proc Transpose

Re: Parse a Alphanumeric String against a Table

Re: Proc Lifetest Output : is there a ODS Table for Deciles (10%, 20%,...

Re: Proc Transpose

Re: Proc Transpose

Re: Proc Transpose

Re: Proc Transpose

Re: using datastep variables to reference other columns

Re: Proc Transpose