About daradanye

daradanye · 05-05-2024

Hello, I have a dataset for which I need to calculate rolling averages and rolling sums. The file is attached. What I intend to do is calculate the rolling average/sum of several variables over the window from day -380 to day -20 (I also require at least 50 non-missing observations in the window). The code is below. However, there are some very large numbers in the output file, starting from around row 800, and the output is not consistent with my manual calculation. I guess it might be due to the missing observations. I would like to know what I missed here. Any help is appreciated.

```sas
proc expand data=test2 out=test2;
  by permno;
  id date;
  convert logret   = logret_new   / transformout=(lag 20 movsum 360 trim 70);
  convert turnover = turnover_new / transformout=(lag 20 movave 360 trim 70);
  convert ret      = std          / transformout=(lag 20 movstd 360 trim 70);
run;
```
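One likely cause: PROC EXPAND interpolates missing values by default (METHOD=SPLINE), which can produce extreme values around gaps, so adding `method=none` to each CONVERT statement may be worth trying first. Also, the LAG and MOVSUM widths count periods of the series interval rather than calendar days, and TRIM 70 only blanks a fixed number of rows at the ends of each series; it does not enforce a minimum count of non-missing values. The sketch below is a different technique (a PROC SQL self-join, not PROC EXPAND) that uses calendar days t-380 to t-20 and keeps a statistic only when at least 50 non-missing values fall in the window. It assumes `permno` and `date` uniquely identify rows of `test2` and that `date` is a SAS date; the output name `rolling` and the column name `ret_std` are hypothetical.

```sas
/* Rolling window over calendar days t-380 .. t-20 within each PERMNO.      */
/* COUNT(col), SUM, MEAN and STD all skip missing values, so the CASE       */
/* expressions return missing unless 50+ non-missing values are present.    */
proc sql;
  create table rolling as
  select a.permno,
         a.date,
         case when count(b.logret)   >= 50 then sum(b.logret)    else . end
              as logret_new,
         case when count(b.turnover) >= 50 then mean(b.turnover) else . end
              as turnover_new,
         case when count(b.ret)      >= 50 then std(b.ret)       else . end
              as ret_std
  from test2 as a
       left join test2 as b
         on  a.permno = b.permno
         and b.date between a.date - 380 and a.date - 20
  group by a.permno, a.date
  order by a.permno, a.date;
quit;
```

The self-join compares every row with every other row of the same PERMNO, so it is slower than PROC EXPAND on large panels, but the window definition is explicit and easy to check against a manual calculation.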

daradanye · 04-10-2024

Thanks for the reply. Following your suggestions, I did the following, but it still does not work.

```sas
data sg_to_match;
  set sg_to_match;
  if atq = . then delete;
run;

proc sql;
  create table sg_to_match as
  select *, sum(treat) as sumtreat, sum(control) as sumcontrol
  from sg_to_match
  group by fips1, visitday
  order by fips1, visitday;
quit;

data sg_to_match;
  set sg_to_match;
  if sumtreat = 0 then delete;
  if sumcontrol = 0 then delete;
run;
```

daradanye · 04-10-2024

Hi, I would like to use PROC PSMATCH and have the treatment and control observations matched on the same day and in the same county, without replacement. Below is my code:

```sas
proc psmatch data=sg_to_match region=cs;
  by fips1 visitday;
  class treat;
  psmodel treat = atq;
  match method=optimal(k=1) caliper=1;
  output out(obs=match)=Outgs lps=_Lps matchid=_MatchID;
run;
```

But I got the error messages below:

```
WARNING: The maximum likelihood estimates for the logistic regression model might not exist. The maximum likelihood estimates are based on the last maximum likelihood iteration.
ERROR: Floating Point Zero Divide.
ERROR: Termination due to Floating Point Exception
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.OUTGS may be incomplete. When this step was stopped there were 0 observations and 13 variables.
WARNING: Data set WORK.OUTGS was not replaced because this step was stopped.
```

Is it because some county-days have no variation in treat/control? Is there any way I can automatically skip those county-days? Thanks!
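A minimal sketch of filtering before matching, assuming `treat` is coded 1 (treated) / 0 (control), which the follow-up post's `sum(treat)` / `sum(control)` suggests. It keeps only county-days that contain at least one treated and one control firm and some variation in `atq`, since a per-group logistic model has nothing to estimate otherwise. Very small groups can still show perfect separation, so this may reduce but not rule out the warning. The data set name `sg_matchable` is hypothetical; `Treated='1'` is the PSMODEL option that names the treated level.

```sas
/* Keep county-days (FIPS1 x VISITDAY) with both groups present and a   */
/* non-zero standard deviation of ATQ (STD of a single row is missing,  */
/* and a missing value never compares greater than 0).                  */
proc sql;
  create table sg_matchable as
  select *
  from sg_to_match
  where not missing(atq)
  group by fips1, visitday
  having sum(treat = 1) >= 1
     and sum(treat = 0) >= 1
     and std(atq) > 0
  order by fips1, visitday;
quit;

proc psmatch data=sg_matchable region=cs;
  by fips1 visitday;
  class treat;
  psmodel treat(Treated='1') = atq;
  match method=optimal(k=1) caliper=1;
  output out(obs=match)=outgs lps=_lps matchid=_matchid;
run;
```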

daradanye · 12-15-2023

Hi, I'd like to use PROC EXPAND to generate a moving maximum over the window between lag 3 and lag 5. For example, the data I have look like this:

```
id  year  amt
1   1991  1
1   1992  2
1   1993  3
1   1994  4
1   1995  5
1   1996  6
1   1997  7
1   1998  8
1   1999  9
```

I want to generate a new column, maxamt, equal to max(lag 5, lag 4, lag 3), like this:

```
id  year  amt  maxamt
1   1991  1
1   1992  2
1   1993  3
1   1994  4
1   1995  5
1   1996  6    3
1   1997  7    4
1   1998  8    5
1   1999  9    6
```

Can this be done with PROC EXPAND? Thanks!
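A DATA step with the LAGn functions is a straightforward alternative to PROC EXPAND for this window. This sketch uses the sample rows from the post and assumes the data are sorted by `id` and `year` with one row per year; the helper variables `l3`, `l4`, `l5` and `n` are hypothetical and dropped at the end.

```sas
/* Sample data from the post */
data have;
  input id year amt;
  datalines;
1 1991 1
1 1992 2
1 1993 3
1 1994 4
1 1995 5
1 1996 6
1 1997 7
1 1998 8
1 1999 9
;

/* MAXAMT = max(amt at lag 3, lag 4, lag 5) within each ID.              */
/* LAGn is called on every row so its queue stays in sync; the counter N */
/* leaves MAXAMT missing until five earlier rows of the same ID exist,   */
/* so lags never reach back into the previous ID.                        */
data want;
  set have;
  by id;
  l3 = lag3(amt);
  l4 = lag4(amt);
  l5 = lag5(amt);
  if first.id then n = 0;
  n + 1;
  if n > 5 then maxamt = max(l3, l4, l5);
  drop l3 l4 l5 n;
run;
```

For the sample, 1996 through 1999 get 3, 4, 5 and 6, and the first five years stay missing, matching the desired table.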

daradanye · 09-14-2023

I need to send the dataset to my colleague. With the variables in the proper order, it is easier for him to handle, since he works with the dataset in different software.

daradanye · 09-14-2023

Thanks! One thing I forgot to mention: the lengths of some string variables differ between A and B (they are longer in A). With this code, I assume the lengths from B will be kept. Is there any way to keep the variable lengths from dataset A but follow the variable order in dataset B? Thanks.

daradanye · 09-13-2023

Hi, I have a question about reordering variables. I have one dataset called A and another called B. The variables in A are exactly the same as in B, but their order is different. I would like to reorder the variables in A to match the order in B. I know that RETAIN can be used to reorder variables, but I have many variables, so listing them one by one is not practical. Thanks in advance for the help!
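A sketch of the RETAIN approach without typing the variable list: read B's column order from DICTIONARY.COLUMNS into a macro variable and put it in a RETAIN statement before SET A. RETAIN only fixes each variable's position; its type and length still come from A, which also addresses the length question in the follow-up. The two demo DATA steps are hypothetical stand-ins (A has a longer `name`); skip them when running against your real A and B. The macro variable name `b_order` is also hypothetical.

```sas
/* Hypothetical demo data: same variables, different order and lengths. */
data a;
  length name $ 20 x 8 y 8;
  name = 'a long name value'; x = 1; y = 2;
run;

data b;
  length y 8 name $ 10 x 8;
  y = 2; name = 'short'; x = 1;
run;

/* Variable names of WORK.B in their stored order (VARNUM). */
proc sql noprint;
  select name into :b_order separated by ' '
  from dictionary.columns
  where libname = 'WORK' and memname = 'B'
  order by varnum;
quit;

/* RETAIN places the variables in B's order; SET A supplies their */
/* type and length, so NAME keeps length $20 from A.              */
data a_reordered;
  retain &b_order;
  set a;
run;
```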

daradanye · 09-22-2022

Hi, I have a dataset like this:

```
ID1    ID2
33567  23258:1， 33567:4， 55765:10
11267  20135:2，55367:5， 54765:1
```

What I want to do is: (1) determine whether ID2 contains ID1; (2) if so, extract the number after it. This is what I want:

```
ID1    ID2                              Num
33567  23258:1， 33567:4， 55765:10     4
11267  20135:2，55367:5， 54765:1       0
```

It seems that SCAN and INDEX cannot work directly here. It would be great if someone could help. Thanks.
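A sketch using a Perl regular expression built per row from ID1: it looks for ID1 followed by a colon and captures the digits after it. The `(^|\D)` prefix keeps 33567 from matching inside a longer number such as 133567, and because any non-digit counts as a separator, the mix of ASCII and full-width commas in the post should not matter. The demo rows use ASCII commas and assume ID1 is numeric (CATS also accepts a character ID1); the variable `re` is a hypothetical helper.

```sas
/* Demo rows based on the post (ASCII commas used here). */
data have;
  length id2 $ 60;
  id1 = 33567; id2 = '23258:1, 33567:4, 55765:10'; output;
  id1 = 11267; id2 = '20135:2, 55367:5, 54765:1';  output;
run;

data want;
  set have;
  /* Pattern like /(^|\D)33567:(\d+)/ ; capture group 2 holds the number. */
  re = prxparse(cats('/(^|\D)', id1, ':(\d+)/'));
  if prxmatch(re, id2) then num = input(prxposn(re, 2, id2), best32.);
  else num = 0;
  call prxfree(re);   /* release the per-row regex */
  drop re;
run;
```

For the two demo rows this gives Num = 4 and Num = 0, as in the desired output.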

daradanye · 09-19-2022

Hi, I have many TSV files to import into SAS. The files are located in different subfolders whose names consist of numbers and an underscore. For example, the subfolder 6500_317 contains five TSV files: 8491_2020, 8611_2020, 5491_2020, 7453_2020 and 2312_2020. I am wondering how I can import all files in all subfolders into SAS and name each resulting data set data_<TSV file name>. I would appreciate it very much if someone could help. Thanks!
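A sketch using the DOPEN/DREAD directory functions: the first DATA step lists every `.tsv` file one level below a root folder, and the second generates one PROC IMPORT per file with CALL EXECUTE. The root path in `%let root` is a hypothetical placeholder, and the sketch assumes only one subfolder level, file names that form valid SAS names once prefixed with `data_`, and unique file names across subfolders (a repeated name would overwrite the earlier data set).

```sas
%let root = /path/to/root;   /* hypothetical: folder holding 6500_317 etc. */

/* Step 1: list *.tsv files in each subfolder of &root. */
data tsv_files(keep=fullpath dsname);
  length fref1 fref2 $ 8 subdir fname $ 256 fullpath $ 1024 dsname $ 32;
  rc = filename(fref1, "&root");        /* blank FREF1 -> generated fileref */
  d1 = dopen(fref1);
  if d1 > 0 then do;
    do i = 1 to dnum(d1);
      subdir = dread(d1, i);
      rc = filename(fref2, catx('/', "&root", subdir));
      d2 = dopen(fref2);                /* 0 when SUBDIR is a file */
      if d2 > 0 then do;
        do j = 1 to dnum(d2);
          fname = dread(d2, j);
          if lowcase(scan(fname, -1, '.')) = 'tsv' then do;
            fullpath = catx('/', "&root", subdir, fname);
            dsname   = cats('data_', scan(fname, 1, '.'));
            output;
          end;
        end;
        rc = dclose(d2);
      end;
      rc = filename(fref2);             /* deassign before the next folder */
    end;
    rc = dclose(d1);
  end;
  rc = filename(fref1);
run;

/* Step 2: one PROC IMPORT (tab-delimited) per file found. */
data _null_;
  set tsv_files;
  call execute(catx(' ', 'proc import datafile=', quote(strip(fullpath)),
                    'out=', dsname, 'dbms=tab replace; getnames=yes; run;'));
run;
```

CATX with a blank separator is used in step 2 because CATS would strip the spaces between the generated PROC IMPORT options.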

daradanye · 09-11-2022

Hello, I have a number of Excel files named file_a, file_b, ..., file_h. I'd like to use a macro %DO loop to import them. Following the code in this link, I wrote the code below. I also tried adding % before the PROC IMPORT statement, and that does not work either. It would be great if someone could help here. Thanks.

```sas
%let list = a b c d e f g h;
%local i next_name;
%let i=1;
%do %while (%scan(&list, &i) ne );
  %let next_name = %scan(&list, &i);
  proc import out=jobpost&next_name
              datafile="H:\bdata\file_&next_name..csv"
              dbms='' replace;
  run;
  %let i = %eval(&i + 1);
%end;
```
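A likely fix, sketched under the assumption that the files really are `.csv` as in the DATAFILE= path: %LOCAL and %DO loops are only valid inside a macro definition, so the loop needs to be wrapped in %MACRO/%MEND, and DBMS= needs an actual value instead of `''`. The macro name `import_all` is hypothetical; if the files are true Excel workbooks, the extension would be `.xlsx` and DBMS=XLSX instead.

```sas
/* The %DO loop and %LOCAL must live inside a macro definition. */
%macro import_all(list=a b c d e f g h);
  %local i next_name;
  %let i = 1;
  %do %while (%scan(&list, &i) ne );
    %let next_name = %scan(&list, &i);
    /* Two dots: the first ends the macro variable, the second is literal. */
    proc import out=jobpost&next_name
                datafile="H:\bdata\file_&next_name..csv"
                dbms=csv replace;
      getnames=yes;
    run;
    %let i = %eval(&i + 1);
  %end;
%mend import_all;
```

Calling `%import_all()` would then create JOBPOSTA through JOBPOSTH, and a different list can be passed as `%import_all(list=a b)`.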

daradanye · 08-04-2022

/scratch/ is the root of the path. I have several earlier steps that import data from a folder in /scratch, and they work.

daradanye · 08-04-2022

Hi, I am running a SAS program on a cloud server (which I believe is a Unix/Linux system). Basically, I want to import and clean every CSV file in a directory; all the file names start with "log". I found that this support note works on my local computer: http://support.sas.com/kb/41/880.html However, when I run it on the cloud server, it does not work. My code on the server is as follows:

```sas
filename DIRLIST pipe 'dir "/scratch/dg/log/SAS/log*.csv" /b ';

data dirlist;
  infile dirlist lrecl=200 truncover;
  input file_name $100.;
run;

data edgar.dirlist;
  set dirlist;
run;

data _null_;
  set dirlist end=end;
  count+1;
  call symputx('read'||put(count,4.-l), cats('/scratch/dg/log/SAS/',file_name));
  call symputx('dset'||put(count,4.-l), scan(file_name,1,'.'));
  if end then call symputx('max',count);
run;

options mprint symbolgen;

%macro readin;
  %do i=1 %to &max;
    data seclog;
      infile "&&read&i" delimiter=',' MISSOVER DSD lrecl=32767 firstobs=2;
      informat ip $15.;
      informat date yymmdd10.;
      informat time anydtdtm40.;
      format ip $15.;
      format date yymmdd10.;
      format time datetime.;
      input ip $ date time;
    run;
  %end;
%mend readin;

%readin;
run;
```

The error message is as follows:

```
MPRINT(READIN): infile "/scratch/dg/log/SAS/dir: cannot access '/scratch/dg/log/SAS/log*.csv': No such file or directory" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
```

When I opened DIRLIST, I found that there are two variables: the variable name is file_name, but its value is "dir: cannot access '/scratch/dg/log/SAS/log*.csv': No such file or directory". I would appreciate it very much if someone could help here.
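The error text suggests the Linux `dir` command received the pattern literally: the double quotes stop the shell from expanding `log*.csv`, and `/b` is a Windows DIR switch that Linux treats as another path. A minimal sketch of a Linux-style listing, assuming a POSIX shell and that XCMD is enabled (the pipe evidently ran, so it seems to be): change into the folder and let the shell expand the wildcard, so only bare file names come back and the existing `cats('/scratch/dg/log/SAS/', file_name)` still builds the full path.

```sas
/* Unquoted wildcard so the shell expands it; ls -1 prints one bare */
/* file name per line (no directory prefix, because of the cd).     */
filename dirlist pipe 'cd /scratch/dg/log/SAS && ls -1 log*.csv';

data dirlist;
  infile dirlist lrecl=200 truncover;
  input file_name $100.;
run;
```

Separately, each pass of %READIN writes to the same SECLOG data set; using `data &&dset&i;` inside the loop would keep one data set per file, since the DSET macro variables are already created.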

daradanye · 03-14-2022

Hi, I want to expand data from this:

```
ID  startdate  enddate    num
1   2/10/2010  2/16/2010  [0,1,0,2,4,7,0]
2   3/11/2010  3/17/2010  [0,1,3,0,1,2,0]
```

to this:

```
ID  date       num
1   2/10/2010  0
1   2/11/2010  1
1   2/12/2010  0
1   2/13/2010  2
1   2/14/2010  4
1   2/15/2010  7
1   2/16/2010  0
2   3/11/2010  0
2   3/12/2010  1
2   3/13/2010  3
2   3/14/2010  0
2   3/15/2010  1
2   3/16/2010  2
2   3/17/2010  0
```

I looked at PROC EXPAND, but it does not seem to work for this example. It would be great if someone could give me some suggestions. Thanks.
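PROC EXPAND works on time series rather than on a list stored in a character column, so a DATA step is likely simpler here. This sketch reads the two sample rows from the post, assumes `num` holds exactly one value per day from `startdate` to `enddate`, and uses a hypothetical helper variable `value` that is renamed to `num` in the output.

```sas
/* Sample rows from the post; NUM is read as a character string. */
data have;
  input id startdate :mmddyy10. enddate :mmddyy10. num :$40.;
  format startdate enddate mmddyy10.;
  datalines;
1 2/10/2010 2/16/2010 [0,1,0,2,4,7,0]
2 3/11/2010 3/17/2010 [0,1,3,0,1,2,0]
;

/* One output row per day; day k takes the k-th list element.       */
/* '[,]' makes brackets and commas all act as SCAN delimiters.      */
/* KEEP= is applied before RENAME=, so VALUE is kept, then renamed. */
data want(keep=id date value rename=(value=num));
  set have;
  format date mmddyy10.;
  do date = startdate to enddate;
    value = input(scan(num, date - startdate + 1, '[,]'), best32.);
    output;
  end;
run;
```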

daradanye · 05-03-2021

Hi, I am working on the following problem. The original dataset looks like this:

```
Group  Name
A      Cox & Wilson, PC, CPA
A      Cox & Wilson, PC, CPAs
A      Cox & Wilson, PC, CPAs
A      Jonathan D. Liner
A      Jonathan Liner
B      Memphis Light, Gas & Water
B      Memphis Light, Gas & Water
B      Memphis, Light, Gas and Water
C      Homer Electric Assn
C      Homer Electric Association
C      Homer Electric Association
```

What I want to do is categorize similar names within each group. The dataset I want looks like this:

```
Group  Name                           NameGroup
A      Cox & Wilson, PC, CPA          1
A      Cox & Wilson, PC, CPAs         1
A      Cox & Wilson, PC, CPAs         1
A      Jonathan D. Liner              2
A      Jonathan Liner                 2
B      Memphis Light, Gas & Water     3
B      Memphis Light, Gas & Water     3
B      Memphis, Light, Gas and Water  3
C      Homer Electric Assn            4
C      Homer Electric Association     4
C      Homer Electric Association     4
```

My initial thought was to compute the string distance (Levenshtein) between each pair of names and then apply a clustering method. Then I realized there might be some difficulties:

1. I have a lot of observations, so computing the distance for every pair, even within a group, can be time-consuming.
2. The clustering method I am familiar with is K-means, but K-means requires prespecifying the number of clusters. I am wondering whether any clustering methods work better for this problem.

I am also wondering whether there are more convenient ways to do this. It would be great if someone could help out here.
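A cheap heuristic rather than a clustering algorithm: sort names within each group, then start a new NameGroup whenever the Levenshtein distance (SAS COMPLEV) to the previous sorted name, divided by the longer name's length, exceeds a threshold. This compares each name only with its neighbour, so it avoids the all-pairs cost, but it will miss similar names that do not sort next to each other (for example, a leading "The"). The 0.3 threshold is an assumption tuned to the sample rows from the post, and `prev` is a hypothetical helper variable.

```sas
/* Sample rows from the post; '|' delimiter so names keep their spaces. */
data have;
  infile datalines dlm='|';
  length group $ 1 name $ 60;
  input group name;
  datalines;
A|Cox & Wilson, PC, CPA
A|Cox & Wilson, PC, CPAs
A|Cox & Wilson, PC, CPAs
A|Jonathan D. Liner
A|Jonathan Liner
B|Memphis Light, Gas & Water
B|Memphis Light, Gas & Water
B|Memphis, Light, Gas and Water
C|Homer Electric Assn
C|Homer Electric Association
C|Homer Electric Association
;

proc sort data=have out=sorted;
  by group name;
run;

/* New cluster at each GROUP, or when the normalized edit distance to */
/* the previous sorted name is above 0.3. NAMEGROUP is a sum variable, */
/* so it is retained and numbered across all groups.                   */
data want;
  set sorted;
  by group;
  length prev $ 60;
  retain prev;
  if first.group then namegroup + 1;
  else if complev(upcase(strip(name)), upcase(strip(prev)))
          / max(lengthn(name), lengthn(prev)) > 0.3 then namegroup + 1;
  prev = name;
  drop prev;
run;
```

On the sample rows this reproduces NameGroup 1 through 4 from the desired table; on real data the threshold would need checking against a hand-labelled subset.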

Online Status: Offline
Date Last Visited: 05-09-2024 03:11 PM

PROC EXPAND weird output

Re: psmatch by group

psmatch by group

Proc expand maximum number between lagged 3 and lagged 5 windows

Re: Reorder variables in one dataset based on another dataset

Re: Reorder variables in one dataset based on another dataset

Re: Reorder variables in one dataset based on another dataset

Reorder variables in one dataset based on another dataset

Extract value of a string after a specific string

Import all TSV files in all subfolders

Re: Remove leading and trailing zeros from character field

Re: Cluster similar strings into group

Re: Cluster similar strings into group

Re: Import txt two delimiters, with one delimiter is line break

Re: Import txt two delimiters, with one delimiter is line break

Extract value of a string after a specific string

PROC EXPAND weird output

Re: psmatch by group

psmatch by group

Proc expand maximum number between lagged 3 and lagged 5 windows

Re: Reorder variables in one dataset based on another dataset

Re: Reorder variables in one dataset based on another dataset

Re: Reorder variables in one dataset based on another dataset

Reorder variables in one dataset based on another dataset

Extract value of a string after a specific string

Import all TSV files in all subfolders

loop import csv file

Re: Retrieve and clean all files in a directory one by one

Retrieve and clean all files in a directory one by one

Expand data

Cluster similar strings into group