The answer to this depends on the exact details.
In particular, there are a few reasonable options here, depending on your system, the datasets, etc.
If the many datasets with 800+ variables are either:
a) identical in metadata (variable names, types, lengths), or
b) a combination of identically defined variables plus non-overlapping variable names,
then there is a simple solution: one data step that runs every dataset through the same lookup logic, loading the hash tables only once.
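A minimal sketch of that pattern, assuming a lookup table LOOKUP keyed on KEY and target datasets DS1-DS3 (all names hypothetical):

```sas
data want;
  /* Load the lookup hash once, on the first iteration only */
  if _n_ = 1 then do;
    length value 8;
    declare hash h(dataset: 'lookup');
    h.defineKey('key');
    h.defineData('value');
    h.defineDone();
    call missing(value);
  end;

  /* One SET statement reads every dataset, so the hash persists */
  set ds1 ds2 ds3 indsname=source;
  length dsname $ 41;
  dsname = source;        /* remember which dataset each row came from */

  if h.find() = 0;        /* keep rows that match the lookup */
run;
```

With option (b), the same step still works; variables that exist in only some of the datasets are simply missing on rows from the others.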
---
If your datasets are non-identical and have overlapping variable names that are not identically typed, then you can't easily do that. You might be able to with some creative renaming; I've done that, renaming every variable (through a macro) to DS_<var>, where DS was a prefix for that dataset, or the dataset name itself if it was short enough. It's just ... messy, and error-prone.
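Roughly, the rename list can be generated from DICTIONARY.COLUMNS; here DS1 in WORK is hypothetical, and this assumes the prefixed names stay within the 32-character limit:

```sas
/* Build a RENAME= list that prefixes every variable in DS1 */
proc sql noprint;
  select catx('=', name, cats('DS1_', name))
    into :renamelist separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'DS1';
quit;

data ds1_prefixed;
  set ds1(rename=(&renamelist));
run;
```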
You can use DOSUBL to generate data steps inside your main data step to do the matching work while keeping your hash tables persistent, but that's quite slow and probably no faster than just reloading the datasets into hash tables.
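A bare-bones version of that approach, assuming a hypothetical control table DATASET_LIST with one dataset name per row in DSNAME:

```sas
data _null_;
  set dataset_list;                  /* one row per dataset to process */
  length code $ 200;
  code = catx(' ',
    'data', cats('out_', dsname), ';',
    'set', dsname, ';',
    /* the matching work would be generated here */
    'run;');
  rc = dosubl(code);                 /* submits the generated step immediately */
  if rc ne 0 then putlog 'ERR' 'OR: DOSUBL failed for ' dsname=;
run;
```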
---
Third, you could consider formats instead of hash tables. Formats are persistent and quite fast to look up, and if you're only writing a couple of them per dataset, not slow to load either. They do sometimes have performance issues when the number of formats gets high, but in a server environment you might be okay there.
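A sketch of the format route, again assuming a hypothetical lookup table LOOKUP with character key KEY and label VALUE:

```sas
/* Turn the lookup table into a format via CNTLIN */
data cntlin;
  set lookup(rename=(key=start value=label));
  retain fmtname 'lookupf' type 'C';   /* character format $LOOKUPF. */
run;

proc format cntlin=cntlin;
run;

/* Formats persist for the session, so later steps just use PUT() */
data want;
  set ds1;
  length value $ 40;
  value = put(key, $lookupf.);
run;
```

Note that keys missing from the format fall through as their own value unless you add an HLO='O' row to define a default.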
---
Fourth, another probably-slow-but-maybe-worth-checking option is PROC DS2. That would let you have persistent hash tables, I believe, and process each of your datasets. DS2 tends to be slow, though, on simple data reads.
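A rough DS2 sketch of that idea; the table names are hypothetical, and I'm assuming DS2's SET statement will take a list of tables the way the data step's does:

```sas
proc ds2;
  data out (overwrite=yes);
    declare double value;            /* lookup result, hypothetical */
    declare package hash h();
    method init();
      /* Hash is loaded once for the whole data program */
      h.defineKey('key');
      h.defineData('value');
      h.dataset('lookup');
      h.defineDone();
    end;
    method run();
      set ds1 ds2 ds3;               /* the hash persists across all of them */
      if h.find() = 0 then output;
    end;
  enddata;
run;
quit;
```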
If you can give us a bit more information about your problem's full scope, we can probably help you find the specific solution that's most appropriate.