About FreelanceReinh

FreelanceReinh · 11-11-2015

Hi Lainie,

Thank you so much. I take it that you had a check box to include that group content and, in fact, now I get a complete list of all my (by now) 25 posts. Great!

I'm not quite sure, though, whether the (two) German posts are counted in the number of "Replies" (an item in the "Private Statistics" on my profile page), as this number (23) still differs by two from the total number of posts (while the number of "Topics Started" and all other similar numbers are, correctly, zero). But maybe it's just that the underlying definition of "Reply" is different from what I think it is. In any case, this would be only a very minor inconsistency, so please don't waste your time on it if it is not very easy to correct.

Thanks again.

Best regards,
Reinhard

FreelanceReinh · 11-10-2015

There are various ways to do this. Perhaps one of the simplest is to add the following statement after your correct assignment statement:

    substr(test,11,1)=' ';

As you may have noticed, format datetime19. creates a leading blank. To get rid of that, you could do this:

    test=left(put(datetime(),datetime19.));
    substr(test,10,1)=' ';

Or this:

    test=put(datetime(),datetime19.-l);
    substr(test,10,1)=' ';
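For anyone who wants to try the last variant in isolation, here is a minimal, self-contained sketch. The fixed timestamp in variable DT is only an assumption for the sake of a reproducible result; in the original context the value came from datetime().

```sas
data _null_;
  length test $19;
  dt = '10NOV2015:14:30:00'dt;      /* fixed example value (an assumption) */
  test = put(dt, datetime19.-l);    /* left-aligned: 10NOV2015:14:30:00    */
  substr(test, 10, 1) = ' ';        /* overwrite the colon in position 10  */
  put test=;                        /* log: test=10NOV2015 14:30:00        */
run;
```

The SUBSTR pseudo-function on the left of the equals sign replaces a single character in place, so the length of TEST does not change.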

FreelanceReinh · 11-10-2015

Many thanks to both @kbaughma and @lvm!

kbaughma's initial post reminded me of the performance-related system options. I discovered that the MEMSIZE option on my machine was still set to its default of 2147483648 (=2G), which meant that only a small portion of my 14 GB of RAM (or 64 GB after deactivating the RAM disk) had been available to SAS. By simply setting MEMSIZE to MAX (during startup) my previously failed PROC MEANS step (see earlier post in this thread) ran without problems.

And much more: Now an ordinary PROC SUMMARY was able to cope with 640 million randomly generated observations (4.84 GB dataset) and calculated mean, min and, above all, median in less than 10 minutes -- without forcing me to resort to QMETHOD=P2 and its fluctuating results.

After this breakthrough I tried to push the limit even further and found that PROC HPSUMMARY achieved the same with 700 million observations (5.29 GB, 11 minutes, peak physical memory usage at about 53 GB), whereas PROC SUMMARY failed. However, with 720 million obs. the old warning reappeared with either procedure.

So, the improvement by PROC HPSUMMARY over PROC SUMMARY -- in single-machine mode! -- in terms of processable numbers of observations was somewhere between 0 and 12.5 percent. There seemed to be no significant difference regarding run time. Of course, in distributed mode a completely different picture is to be expected.
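The following sketch shows how one might check the MEMSIZE value that is in effect and then run a scaled-down version of such a test. The dataset name BIG, the variable X and the size of one million rows are illustrative assumptions, not the benchmark described above. MEMSIZE itself can only be set at startup, for example with -MEMSIZE MAX on the command line or in the configuration file.

```sas
/* Show the MEMSIZE value in effect for the current session. */
proc options option=memsize value;
run;

%put NOTE: MEMSIZE=%sysfunc(getoption(memsize));

/* Small-scale stand-in for the test data: name, variable and
   number of observations are illustrative assumptions only.  */
data big;
  call streaminit(2015);
  do i = 1 to 1e6;
    x = rand('uniform');
    output;
  end;
  drop i;
run;

/* Mean, minimum and median with the default quantile method. */
proc summary data=big;
  var x;
  output out=stats mean=mean min=min median=median;
run;
```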

FreelanceReinh · 11-09-2015

Isn't it necessary to declare AllDepts as a character variable (assuming that it is not contained in the input dataset)? If I run either of your data steps, my SAS complains about "invalid numeric data" and makes AllDepts a numeric variable.

My solution would be:

    proc sort data=input;
      by ID count;
    run;

    data output;
      do until(last.ID);
        set input;
        length AllDepts $46; /* to be adapted depending on max. length of Dept */
        by ID;
        AllDepts=catx(', ', AllDepts, Dept);
      end;
      drop Dept;
    run;

This applies the "DOW loop" technique (cf. http://www2.sas.com/proceedings/sugi28/099-28.pdf). The purpose of the unusual position of the (declarative) LENGTH statement is just to obtain the desired column order.
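To see the DOW loop at work, the two steps can be run against a few made-up records. The values of ID, count and Dept below are assumptions for illustration only; the real input dataset is not shown in the thread.

```sas
data input;                          /* made-up sample data */
  input ID count Dept :$10.;
  datalines;
1 2 HR
1 1 Sales
2 1 IT
;

proc sort data=input;
  by ID count;
run;

data output;
  do until(last.ID);                 /* one pass of the loop per BY group */
    set input;
    length AllDepts $46;
    by ID;
    AllDepts = catx(', ', AllDepts, Dept);
  end;                               /* implicit OUTPUT once per ID */
  drop Dept;
run;

proc print data=output noobs;
run;
```

With these records the result should contain one row per ID, with AllDepts holding "Sales, HR" for ID 1 and "IT" for ID 2. AllDepts is reset to missing at the start of each data step iteration, which is why the concatenation starts afresh for every BY group.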

FreelanceReinh · 11-09-2015

Yes, I also ran into an unexpected memory issue recently, with this innocuous PROC MEANS step:

    proc means data=tmt mean min median;
      var dttm;
    run;

Log message: "A shortage of memory has caused the quantile computations to terminate prematurely for QMETHOD=OS. ..."

Dataset TMT had about 39.5 million observations. This happened on a Windows 7 Pro 64-Bit workstation with an Intel(R) Xeon(TM) E5-1630v3 3.7GHz 10M CPU and 64 GB DDR4-2133 RAM. However, only about 14 GB of RAM were available to SAS at that time, because I am using RAM disk software which combines 50 GB of RAM with 100 GB of the first 256-GB SSD to form a 150-GB hybrid RAM disk. I am curious whether the issue would still occur when using the full 64 GB of RAM, but haven't tried yet.

My first idea would have been to upgrade the RAM of your computer (if this is possible), but lvm's suggestion about PROC HPMIXED sounds very promising.
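One documented way to avoid the memory-hungry default (QMETHOD=OS, which works on order statistics) is the QMETHOD=P2 option, which estimates quantiles with a fixed amount of memory at the price of approximate results. The sketch below uses a made-up stand-in for dataset TMT; its size and contents are assumptions.

```sas
data tmt;                            /* stand-in data; contents are made up */
  call streaminit(1);
  do i = 1 to 100000;
    dttm = '01JAN2015:00:00:00'dt + floor(rand('uniform') * 86400 * 365);
    output;
  end;
  drop i;
  format dttm datetime19.;
run;

/* Approximate median with fixed memory instead of order statistics. */
proc means data=tmt mean min median qmethod=p2;
  var dttm;
run;
```

Because P2 is an approximation, the reported median can differ slightly from the exact value obtained with QMETHOD=OS.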

FreelanceReinh · 11-09-2015

Hi Lainie,

Many thanks for this useful hint. Unfortunately, it does not work entirely accurately for me: I have written 20 posts so far (the first one on Oct 31), not counting the one you're reading. When I click on the link "20 Posts" on my "card", as you suggested, I get a list of only 18 posts. (Same, of course, with the "View all" link at the bottom of the list of "Latest Posts".) And there is no second page.

I was able to get a complete list of the topics (not the individual posts) I contributed to by searching for "Re:" (luckily, all my posts were answers!) and then restricting the expectedly huge list (of "about 40,000 discussions") by entering my user name into the field "By author".

It turned out that the two formerly missing posts were exactly those which I posted to the German "CoDe SAS" community. Strikingly, in the complete list of my 15 topics, these 2 posts (topics) have a "Group" icon (as if they were a different type of object), whereas the other 13 (corresponding to my 18 English posts) have a "Topic" icon (speech bubble), as shown below:

It would be great if the German posts could be included in the list of "all" posts (maybe by ticking some "include ..." checkbox) in order to have the list complete and consistent with the displayed number of posts.

Best regards,
Reinhard

[Edit 2019-04-30: Reattached screenshot, which had been deleted by mistake.]

FreelanceReinh · 11-09-2015

Hello DingDing,

First of all, you should be aware that there are four different styles of reading raw data with the INPUT statement: column, list, formatted and named input. These are described in the online documentation on the INPUT statement and in more detailed pages which are linked there. (It is even possible to mix two or more of these styles within a single INPUT statement.) Working in different industries during the past 18 years, I have mostly used list or formatted input, less frequently column input and only rarely named input.

What you are using in your example is called formatted input. To understand when and where (so-called) column pointer controls such as the "+1" are required, it is important to know how the column pointer (a kind of invisible "cursor") moves across the raw data while the INPUT statement is executed. To quote the documentation on formatted input: "The pointer moves the length that the informat specifies and stops at the next column."

Let's see what this means in practice by looking at your example (actually, RW9 did that already, while I was typing this lengthy post).

Detailed explanations for your INPUT statement "with +1":

At the start, the column pointer is located at the beginning of line 1. As you specified formatted input using informat $16. for reading variable NAME, the informat is applied to the first 16 characters ("columns") of line 1. This is because 16 is the length of that informat. Well, the first 16 characters contain the name "Alicia Grossman" (length=15) followed by a single blank. So, this is stored in variable NAME. (Apparently, the "$16." has been thoughtfully chosen in view of the longest name in the data, "Elizabeth Garcia".)

Next is AGE, read with informat 3., hence looking at the next 3 columns (no. 17 - 19). Thanks to the alignment in pumpkin.txt, all of these numbers are read correctly, in particular the "13" in line 1.

Now, the column pointer is located at column 20 ("between" columns 19 and 20 if you like), but there is nothing of interest in column 20, only a blank, in all lines of the .txt file. In order to read the next relevant portion of the line (the single character "c" in column 21) into variable TYPE with informat $1., the column pointer must be moved forward by 1 column. This is what the "+1" pointer control does.

Same situation after reading TYPE: The column pointer is located between columns 21 and 22, ready to continue reading at column 22. But the length-10 date value "10-28-2012" starts only in column 23! Therefore, the informat MMDDYY10. used for reading variable DATE would look at the wrong 10 columns (namely columns 22 - 31) if we didn't move the pointer again one position to the right (the second "+1")! Having read the date, the pointer rests between columns 32 and 33.

Finally, five variables remain to be read -- all with informat 4.1, which has length 4.

Side note: Please note that using a w.d informat such as 4.1 is risky if some values might not contain a decimal point. For example, if in your data the value 8.0 was written simply as 8, it would be read as 0.8 without further notice! This is because SAS would regard the rightmost d digits (here: d=1) as decimals. I strongly recommend using informat 4. instead (and informat w. with appropriate width w in general), because it recognizes decimal points and will not cause this potential error. [End of side note]

The content of columns 33 - 36, i.e. the number 7.8 preceded by a blank, is now available for reading, which is suitable content for the numeric variable SCORE1. Similarly, the remaining four blocks of four columns each (37 - 40, 41 - 44, 45 - 48 and 49 - 52) are read into variables SCORE2-SCORE5. The latter being the last variable in the INPUT statement, the pointer now moves to column 1 of line 2 and is ready for reading the next record in the same way.

Unlike the human eye, SAS is not at all confused by what looks like a "missing gap" between values 9.5 and 10.0 of "Jose Martinez". The characters " 9.5" and "10.0" belong to distinct blocks of columns: 37 - 40 and 41 - 44, respectively (see above). With formatted input there is no need to separate them.

Sometimes the use of pointer controls such as "+1" can be reduced by using correspondingly longer informats. You can see an example of this by comparing your first INPUT statement to that in RW9's first data step: He inserted a "+1" between NAME and AGE, because he reads AGE only with informat 2. rather than 3., so that the blank in column 17 must be skipped. You read this blank column as part of AGE, which does no harm and makes no difference to the stored numeric value.

I think that, given the above explanations, you can see not only why your INPUT statement "without +1" fails, but also exactly in which way it does so. You will see each of the missing and non-missing values in your erroneous result table "TEST" explained (cf. RW9's pertinent explanations) when you consider which columns are read into a certain variable and whether the content of these columns is valid data for that variable (check the SAS log for notes on "Invalid data"). However, I'm wondering if the SCORE3=510 for "Jose Martinez" results from the INPUT statement you quote. I obtain 0.51 instead (which is plausible, because at this point columns 39 - 42 are read into SCORE3 and these contain the characters ".510").

Please note that RW9 uses list input (more precisely: modified list input, namely modified by the informats he assigned by means of an INFORMAT statement) for the SCOREn variables in his second data step. So, this is an example of what I referred to as mixing input styles within a single INPUT statement (here: formatted input and list input). For the list input he needs the blanks between the score columns -- or other delimiters like the comma he uses in his third data step. The latter uses (partially modified) list input only, not formatted input.

[Minor edits of wording and formatting done.]
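The pointer movement described above can be tried out with a small self-contained sketch. The two in-stream records below are laid out in the columns described in this post (name in 1 - 16, age in 17 - 19, type in 21, date in 23 - 32, scores in 33 - 52); apart from the values quoted in the post, the score values are assumptions, and the dataset name CONTEST is arbitrary. The scores are read with informat 4. as recommended in the side note, and the second step demonstrates the 4.1 pitfall on two made-up values.

```sas
data contest;
  /* name: cols 1-16, age: 17-19, +1 skips col 20, type: col 21,
     +1 skips col 22, date: cols 23-32, scores: five 4-column blocks */
  input name $16. age 3. +1 type $1. +1 date mmddyy10.
        (score1-score5) (4.);
  format date mmddyy10.;
  datalines;
Alicia Grossman  13 c 10-28-2012 7.8 6.5 7.2 8.0 7.9
Jose Martinez     7 d 10-31-2012 8.9 9.510.0 9.7 9.0
;

proc print data=contest;
run;

/* Side note demo: the same four columns read with 4.1 and with 4. */
data _null_;
  input @1 with_d 4.1 @1 without_d 4.;
  put with_d= without_d=;
  datalines;
   8
 8.0
;
```

In the second step the first record should be reported as with_d=0.8 without_d=8, the second as with_d=8 without_d=8: the w.d informat only divides by a power of ten when the data contain no decimal point.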

FreelanceReinh · 11-09-2015

Sorry for being a pain, RW9, but I think you have inadvertently dropped the DLM="," option from your third data step, where it is required since the comma is not the default delimiter. Also, the LENGTH statement does not work if you put it after the INPUT statement: at this point the length "has already been set" (to the insufficient default of 8), as is pointed out by a warning in the log.
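Both points can be illustrated with a minimal sketch of comma-delimited list input. This is not RW9's data step (which is not quoted here); the dataset name SCORES and the two records are assumptions.

```sas
data scores;
  infile datalines dlm=',';   /* comma is not the default delimiter    */
  length name $16;            /* must precede INPUT to take effect     */
  input name age;             /* list input; NAME is already character */
  datalines;
Elizabeth Garcia,10
Jose Martinez,7
;

proc print data=scores;
run;
```

Since only the comma acts as a delimiter here, the blank inside each name is read as part of the value, and the LENGTH statement keeps "Elizabeth Garcia" from being truncated to 8 characters.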

FreelanceReinh · 11-09-2015

Hmm, when I test it (in a Windows SAS 9.4 session), the step with dlm="¬" fails (as it should, since the delimiter in the data is the blank): it produces "Invalid data" notes, missing values and lost observations. It works fine with dlm=" " (and without using the dlm= option). I had assumed the "¬" character was just the result of some strange display issue or was meant to indicate the blank between the quotation marks.

FreelanceReinh · 11-09-2015

Apologies, RW9, I overlooked that you certainly meant to read score1-score5 in your data steps. (And thanks for the "Like"! :-))

Moreover, I share DingDing's confusion about the uncommon delimiter "¬". This equals 'AC'x, but I think it should be just a blank ('20'x) to make your second data step work (using modified list input for the score variables). As this is the default delimiter, dlm=" " would actually be redundant.

FreelanceReinh · 11-09-2015

Hello RW9, you may want to insert a LENGTH statement for variable NAME in your third data step. Otherwise, NAME will be truncated to the default 8 characters.

FreelanceReinh · 11-07-2015

The question in line 1 is: "... what is the probability 4 will be male". The answer (0.431337...) could be obtained with this one-liner:

    %put %sysfunc(pdf(HYPE,4,50,40,5));

But in fact I, too, would be more interested in the question you switched to: what the chances are that "4 will be female." (The result is less pleasing, though: only 0.003964...)

Did you perhaps intend to calculate PDF('HYPER',4,50,40,5) in the first of the two data steps?
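Both probabilities can be reproduced in one data step. The setting assumed here (a group of 50 people, 40 of them male and 10 female, from which 5 are drawn) is inferred from the PDF arguments and the two results quoted above, since the original question is not reproduced in full. The COMB-based line is just a cross-check from the definition of the hypergeometric distribution.

```sas
data _null_;
  /* PDF('HYPER', x, N, R, n): N = population size, R = items in the
     category of interest, n = sample size, x = number drawn from R. */
  p_male   = pdf('HYPER', 4, 50, 40, 5);   /* 4 of the 5 drawn are male   */
  p_female = pdf('HYPER', 4, 50, 10, 5);   /* 4 of the 5 drawn are female */

  /* Cross-check of the first value from binomial coefficients. */
  check_male = comb(40, 4) * comb(10, 1) / comb(50, 5);

  put p_male= 8.6 p_female= 8.6 check_male= 8.6;
run;
```

P_MALE and CHECK_MALE should agree with the 0.431337... quoted above, and P_FEMALE with the 0.003964... (displayed rounded to six decimals).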

FreelanceReinh · 11-07-2015

Try this:

    data want;
      set have(rename=(description=d));
      length description $1; /* If your real items are not single characters, specify a sufficiently large length! */
      do i=1 to countw(d, ',');
        description=scan(d, i, ',');
        output;
      end;
      drop d i;
    run;
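To try the step without the original data, a small stand-in for dataset HAVE can be created first. The variable ID and the two records are made-up assumptions; only the variable name DESCRIPTION is taken from the code above.

```sas
data have;                           /* made-up sample input */
  input id description :$20.;
  datalines;
1 A,B,C
2 D
;

data want;
  set have(rename=(description=d));
  length description $1;             /* items are single characters here */
  do i = 1 to countw(d, ',');        /* one output row per list item     */
    description = scan(d, i, ',');
    output;
  end;
  drop d i;
run;

proc print data=want noobs;
run;
```

With these records WANT should contain four rows: three for ID 1 (A, B, C) and one for ID 2 (D).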

FreelanceReinh · 11-06-2015

What about this:

    data test;
      length c $12 d $8;
      input c $12.;
      d=put(input(c, anydtdte12.), yymmdd8.);
    cards;
    Nov 7, 1948
    May 27, 1947
    Feb 13, 1953
    ;

Or a bit shorter (if your dates are raw data):

    data test;
      length d $8;
      input c anydtdte12.;
      d=put(c, yymmdd8.);
    cards;
    Nov 7, 1948
    May 27, 1947
    Feb 13, 1953
    ;
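Which layout ends up in the character variable depends on the format and width used in the PUT function. The sketch below compares three related formats on a single date given as a SAS date literal; the variable names D8, D8N and D10 are arbitrary, and which layout is actually wanted is not stated in the thread.

```sas
data _null_;
  c = '07NOV1948'd;                  /* example date as a SAS date literal */
  length d8 d8n $8 d10 $10;
  d8  = put(c, yymmdd8.);            /* 48-11-07   (two-digit year)  */
  d8n = put(c, yymmddn8.);           /* 19481107   (no separators)   */
  d10 = put(c, yymmdd10.);           /* 1948-11-07 (four-digit year) */
  put d8= d8n= d10=;
run;
```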

FreelanceReinh · 11-06-2015

I think the GCHART procedure (requires SAS/GRAPH) allows for more flexibility regarding the bins of a histogram.

Example:

    data dataset;
      do _n_=1 to 200;
        var1=2500*ranuni(31416);
        output;
      end;
    run;

    proc gchart data=dataset;
      format var1 hist_fmt.;
      vbar var1 / midpoints=100 to 2100 by 200;
    run;
    quit;

Even without the FORMAT statement the resulting histogram would group the "extreme" values (>=2000) into the rightmost bin, but the label of that bin would then be displayed as "2100" rather than ">2000".

Please note that in the above example the exact value 2000 would be assigned to the ">2000" category. If this was an issue, you could extend your format definition to cover the whole range of possible VAR1 values and then use the DISCRETE option of the VBAR statement. Thus, the bins would correspond 1:1 to the format categories. In particular, only values >2000 would go into the ">2000" bin.

Example:

    proc format;
      value hist_fmt
        0<- 500     = ' 0<- 500'
        500<- 1000  = ' 500<-1000'
        1000<- 1500 = '1000<-1500'
        1500<- 2000 = '1500<-2000'
        2000<- high = '>2000'
      ;
    run;

    proc gchart data=dataset;
      format var1 hist_fmt.;
      vbar var1 / discrete;
    run;
    quit;

A warning about "not evenly spaced" intervals may be written to the log. There is an older thread about this warning: https://communities.sas.com/t5/SAS-GRAPH-and-ODS-Graphics/Suppress-WARNING-The-intervals-on-the-axis-labeled-x-are-not/td-p/194265

However, with the extended format you could also use PROC SGPLOT:

    proc sgplot data=dataset;
      format var1 hist_fmt.;
      vbar var1;
    run;

Re: where I can find my old posts?

Re: Convert Datetime to ddMONyyyybhh:mm:ss format

Re: Proc Mix insufficient memory issue

Re: Collapsing and concatenating rows in data set

Re: Proc Mix insufficient memory issue

Re: where I can find my old posts?

Re: how to do I know when to skip a column in the "input" statement?

Re: how to do I know when to skip a column in the "input" statement?

Re: how to do I know when to skip a column in the "input" statement?

Re: how to do I know when to skip a column in the "input" statement?

Re: how to do I know when to skip a column in the "input" statement?

Re: An easy Example of Hypergeometric Distribution

Re: How to divide the variable into rows

Re: date conversion

Re: group extreme values in a single bin in a histogram

SAS Analytics Explorers

CoDe SAS German