About 1162

1162 · ‎10-22-2008

I'm using Proc SurveySelect to select samples from a population. I want to select 1,300 samples with replacement. I also have four categories and I want to weight the categories so that more samples are selected from category 1 than category 4. I've selected the PPS_WR method and created a variable that contains the category weights. But I have one other criterion and I can't seem to find an option for it. Although I want to sample with replacement, I don't want any individual selected more than 4 times. In essence, I want a ceiling placed on the sampling with replacement. I'm already using the SIZE statement to specify the weight for each category. Any suggestions?
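
For reference, here is a minimal sketch of the setup described above. The dataset name POP, the weight variable CATWEIGHT, and the seed are placeholders made up for illustration. It does not impose the ceiling; it only checks how often each unit was picked, using the NumberHits variable that SURVEYSELECT writes to the output dataset for with-replacement methods.

[pre]
/* POP, CATWEIGHT and the seed are hypothetical placeholders */
proc surveyselect data=pop out=sample method=pps_wr n=1300 seed=20081022;
   size catweight;   /* category weight used as the size measure */
run;

/* Largest number of times any one unit was selected */
proc means data=sample max;
   var NumberHits;
run;
[/pre]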

1162 · ‎10-10-2008

I would start with something like this. Keep in mind that the only way to get all the observations and still be unique at the zip+naics level is if your first two tables have only one observation per zip+naics level and your second two tables have only one observation per zip level.

[pre]
proc sql;
   create table all as
   select *
   from year05d a,
        year06d b,
        year05t c,
        year06t d,
        lookup_table e   /* placeholder name for your look-up table */
   where a.zip   = b.zip
     and a.naics = b.naics
     and c.zip   = d.zip
     and c.zip   = a.zip
     and a.naics = e.naics;
quit;
[/pre]

Message was edited by: 1162

1162 · ‎10-10-2008

Did you set the alpha level in the PROC MIXED statement or in the LSMEANS statement? You can set it in both places.
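
A minimal sketch of both places, in case it helps. The dataset TRIAL and the variables Y and TRT are made-up placeholders. As far as I know, ALPHA= on the PROC MIXED statement applies to the confidence limits for the covariance parameter estimates (requested with CL), while ALPHA= on the LSMEANS statement applies to the confidence limits for the LS-means, so they may need to be set separately.

[pre]
/* TRIAL, Y and TRT are hypothetical names */
proc mixed data=trial cl alpha=0.10;   /* limits for covariance parameters */
   class trt;
   model y = trt;
   lsmeans trt / cl alpha=0.10;        /* limits for the LS-means */
run;
[/pre]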

1162 · ‎09-26-2008

According to your error message, you are running out of disk space, but I bet it has nothing to do with your external hard-drive. Although you're pointing to OUTLIB. as your destination, SAS doesn't actually write the file to that destination until the processing is complete. While SAS is processing the file, a temporary file is created. This temporary file is probably going onto your local C: drive. Try looking in Documents & Settings\ ... \ Local Settings \ Temp \ SAS Temporary Files. This is probably where you'll see a temporary file being generated while SAS is processing. As far as the solution, I think there are ways to point to another location for these temporary files. Another solution might be to break the job into 'chunks', process each of these 'chunks' and then append them together at the end.
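
To confirm where the temporary files are going, something like this should print the current WORK location to the log. It is only a quick check, not a fix: the WORK system option itself can only be set when SAS starts (in the configuration file or on the command line), so pointing it at a larger drive means changing the startup settings.

[pre]
/* Write the physical path of the WORK library to the log */
%put NOTE: WORK library is at %sysfunc(pathname(work));

/* Show the current value of the WORK system option */
proc options option=work;
run;
[/pre]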

1162 · ‎09-26-2008

In the REPEATED statement, there is an option called TYPE= that allows you to specify the covariance matrix. Try compound symmetry (CS) or autoregressive 1 (AR(1)). Autoregressive may suit you better since the correlation between two timepoints could be considered higher if the timepoints are closer together.
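
A rough sketch of what that looks like, assuming one row per subject per timepoint. The dataset LONGDATA and the variables Y, TRT, TIME and SUBJID are placeholders, not names from your data.

[pre]
/* LONGDATA, Y, TRT, TIME and SUBJID are hypothetical names */
proc mixed data=longdata;
   class subjid trt time;
   model y = trt time trt*time;
   /* swap type=ar(1) for type=cs to try compound symmetry */
   repeated time / subject=subjid type=ar(1);
run;
[/pre]

Running it once with each structure and comparing the fit statistics (AIC, BIC) in the output is one way to choose between them.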

1162 · ‎09-26-2008

It's been a while since I've done a repeated measures analysis, but I suspect you have to specify the covariance matrix in your RANDOM statement. Try looking at the TYPE= option. Maybe the UN@AR(1) structure?

1162 · ‎09-26-2008

"It seems that SAS has to scan the whole SAS file in order to pick the few relevant fields (running the same code with fewer variables in the table decreases time immensely)." Have you tried creating an index on your SAS dataset? That might speed things up.

[pre]
PROC DATASETS LIBRARY=libname NOLIST;
   MODIFY dataset;
   INDEX CREATE varname;
RUN;
QUIT;
[/pre]

1162 · ‎09-23-2008

Looks like I answered my own question . . . I can refer to them as '0D'x and '0A'x in the TRANSLATE function.
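
For anyone who finds this later, a small sketch of what I mean. RAW and NOTES are placeholder names for the extracted dataset and the text field. TRANSLATE swaps each carriage return or line feed for a blank; COMPRESS is an alternative if you would rather drop the characters altogether.

[pre]
/* RAW and NOTES are hypothetical names */
data clean;
   set raw;
   /* replace each CR ('0D'x) or LF ('0A'x) with a blank */
   notes_spaced = translate(notes, '  ', '0D0A'x);
   /* or remove them entirely */
   notes_stripped = compress(notes, '0D0A'x);
run;
[/pre]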

1162 · ‎09-23-2008

I'm using SAS to extract data from a database. Some of the fields I'm grabbing are text fields that have newline and carriage return characters embedded. I tried to use TRANWRD to remove these characters, but I may not be doing it correctly. When I import this data into MS Excel, I see that the special characters appear as '\x0a\x0d'. SAS displays this as a pair of boxes. What is the proper way to remove these newline and carriage return characters from a text field?

1162 · ‎08-27-2008

Cynthia, Thanks for your reply. A combination of your suggestions gave me exactly what I wanted. If I sort my dataset and use FIRST.STORE, I can set a flag for each new store. Then I just count the flags instead of the stores in the report. Here's my solution:

[pre]
data tmp;
   input Store :$1. Day :$2. Visitors :8.;
   cards;
A 01 123
A 02 234
A 03 345
B 01 987
B 02 876
B 03 765
;
run;

proc sort data=tmp;
   by store;
run;

data tmp;
   set tmp;
   by store;
   if first.store then Distinct = 1;
run;

proc report data=tmp nowindows;
   columns Distinct,N Visitors,Sum;
run;
[/pre]

1162 · ‎08-27-2008

By the way, I think the CASE statement is specific to PROC SQL and doesn't work in DATA step processing (as you've already discovered).
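
In the DATA step, the closest equivalent is SELECT/WHEN (or a chain of IF/THEN/ELSE). Here is a side-by-side sketch using SASHELP.CLASS; the age cutoffs and group labels are arbitrary and only there for illustration.

[pre]
/* PROC SQL: CASE expression */
proc sql;
   create table grp_sql as
   select name, age,
          case
             when age < 13 then 'child'
             when age < 16 then 'teen'
             else 'older'
          end as agegrp
   from sashelp.class;
quit;

/* DATA step: SELECT/WHEN does the same job */
data grp_ds;
   set sashelp.class(keep=name age);
   length agegrp $5;
   select;
      when (age < 13) agegrp = 'child';
      when (age < 16) agegrp = 'teen';
      otherwise       agegrp = 'older';
   end;
run;
[/pre]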

1162 · ‎08-27-2008

Thanks for the clarification. I think I understand the issue better. I think your problem is similar to one I ran into and has to do with whether the query is actually being processed on the database server or on your SAS machine.

When you run the query with the actual key value, it runs quickly because the entire query is passed to the database where it is processed and only the relevant records are passed back to SAS which then puts the results into a table. When you run the query with a subquery that uses an existing SAS dataset, that query can't be entirely passed to the database because of the reference to the SAS dataset. In this case, the database sends back the entire table B (all 20 million rows) and SAS does the processing of subsetting this table with the SAS dataset created in query A.

One way you can see this is by watching the temporary file that is being created while the query is running. You'll probably see it grow to some very large size, and then near the end it will be replaced with a smaller file representing the merge of table B with IPJOIN.

Subqueries work efficiently when they can be passed entirely to the database for processing. Whenever possible, I don't recommend using SAS datasets in subqueries to databases unless the database tables are small. On the other hand, I've found that using macro variables in the query doesn't hurt performance.

Can you try this query? It uses the query you used to build IPJOIN as the subquery for IPBET, but doesn't use any SAS datasets. The side benefit is that your programming will be more efficient: one less query to the database and one less SAS dataset created. I have a feeling this will run faster. Let us know how it works.

[pre]
PROC SQL;
   CREATE TABLE IPBET AS
   SELECT client_key, ip_addr
   FROM tableB
   WHERE client_key IN
      (SELECT DISTINCT client_key
       FROM tableA
       WHERE acct_no = &acctno);
QUIT;
[/pre]
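
If you do need to start from a SAS dataset such as IPJOIN, one option in line with the macro-variable point above is to load the keys into a macro variable and use that list in the WHERE clause. This is only a sketch under some assumptions: client_key is numeric, IPJOIN has at least one row, and the list is short enough to fit in a macro variable (the limit is 65,534 characters). Whether the IN list actually gets passed to the database depends on your setup, so it is worth timing.

[pre]
/* Build a comma-separated list of keys from the SAS dataset */
proc sql noprint;
   select distinct client_key
      into :keylist separated by ','
      from ipjoin;
quit;

/* The query now refers only to the database table */
/* (fails if IPJOIN was empty, since KEYLIST is then undefined) */
proc sql;
   create table ipbet as
   select client_key, ip_addr
   from tableB
   where client_key in (&keylist);
quit;
[/pre]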

1162 · ‎08-25-2008

If I'm reading the two queries correctly, it looks to me like query A is looking for a single account number while query B is looking for all account numbers. In query A, is the macro variable &acctno a single number (I think it has to be)? It doesn't look like query B has any filters that would result in a single account number being returned from the subquery. Depending on the number of accounts, this could be the difference between returning 2,000 rows and 20,000,000 rows. If this isn't the issue, I've always found it useful to try the same query in another application. This can sometimes narrow down whether the difference is due to the query or due to the application. I've run into situations where queries run much quicker in another application than in SAS (and vice versa).

1162 · ‎08-20-2008

The issues are a little odd (in my opinion), but I'll try to explain. I'm querying a Sybase database. The database has a query restriction of 5 minutes which the administrator won't budge on. Their solution to larger queries is to have me write the output to a flat file which I can import into SAS. Somehow, this output to a flat file does not have the time restriction, but the code is written to run on a Sybase application, not SAS. The query is the same except that these temporary options are defined. The problem is that I can't incorporate their solution into an automated script. My goal was to try to set the options in the PROC SQL statement so that I could generate the flat file and then automatically import it, all within SAS.

1162 · ‎08-20-2008

In SQL, you can use count(distinct X) to report on the number of distinct records. I'm wondering if something similar can be done with PROC REPORT. In the example below, there are two stores which I can report using PROC SQL, but PROC REPORT makes it look like there were six stores. Any idea how I can make PROC REPORT print the number of distinct stores? The easy answer is to just use SQL, but in reality, I'm producing a much more complex table and I would really like to use the PROC REPORT procedure if at all possible.

[pre]
data tmp;
   input Store :$1. Day :$2. Visitors :8.;
   cards;
A 01 123
A 02 234
A 03 345
B 01 987
B 02 876
B 03 765
;
run;
[/pre]

[pre]
proc sql;
   select count(distinct Store) "Number of Stores",
          sum(Visitors) "Visitors"
   from tmp;
quit;
[/pre]

[pre]
proc report data=tmp nowindows;
   columns Store,N Visitors,Sum;
run;
[/pre]

Date Last Visited	‎09-01-2015 07:11 AM

Re: proc surveyfreq

Re: Leaving a space between two datasets

Re: Reading data file but only bring in numeric values

Re: Problem using CARDS statement with tab delimited data

Problem using CARDS statement with tab delimited data

Layering Plot, Legend, and Annotate Info

Re: SUm By Dates

Re: SUm By Dates

Re: Calculating Percentiles

Calculating Percentiles

Proc SurveySelect

Re: Proc sql join

Re: alpha values in PROC MIXED

Re: Out of Resource Problems

Re: Repeated measure ANOVA using PROC MIXED

Re: Covariance structure and proc MIXED

Re: SAS Sql subquery performance problem

Re: Removing Special Characters

Removing Special Characters

Re: Count Distinct in Proc Report?

Re: Regarding CASE statement in BASE SAS

Re: SAS Sql subquery performance problem

Re: SAS Sql subquery performance problem

Re: Proc SQL and options

Count Distinct in Proc Report?