About Eileen1496

Eileen1496 · ‎08-19-2024

@Tom @PaigeMiller Hi Tom and Paige, thanks for all the comments. It took me a while to reply because I suspected SAS had somehow entered a frozen state, since my code was only submitted but not run. So I restarted SAS, and it took me a while to rerun the previous temp data sets. Then my remote computer had some problems that were not fixed until now. I added the debug option at the beginning and reran the nested loop, and it turns out it works as expected! The debug line is very helpful for me.

Eileen1496 · ‎04-17-2024

I don't have this line, but it worked fine before. What is the function of this line? Without it, might the program still run but not write a log? Can you point me to which Windows event I should look for? Thanks a lot!

Eileen1496 · ‎03-30-2024

Hi Ahmed, thanks for the thoughts. I didn't know about memory being allocated to each task before. I guess I submitted 99 tasks with 99 big datasets and did not have enough memory, so they started to wait in line. Thanks!

SASKiwi · ‎03-29-2024

Please post your complete SAS log, not just the errors, so we can see the code you are running. It looks like you haven't defined TASK10 anywhere. I'd expect to see a %LET statement defining it like so (note: replace MySASComputeServer with the actual name of your server):

%let task10 = MySASComputeServer;
signon task10;

Eileen1496 · ‎03-03-2024

These are really helpful (and, for me, easy-to-understand checks)! Thanks!

SASKiwi · ‎03-03-2024

@Eileen1496 - SAS licenses are not necessarily based on number of cores any more. Consult your SAS administrator, SAS software owner or SAS account manager for more details.

Eileen1496 · ‎03-03-2024

Hi Paige and everyone, sorry, I do not know how to reply in a way that everyone can see. Essentially, what I want to do is identify the users who stay at one firm for less than one year (my data is career history data).

data user10_ind0_tag;
    set user10_ind0;
    start_date_dt = input(startdate, yymmdd10.);
    end_date_dt = input(enddate, yymmdd10.);
    enddate_imputed = coalesce(end_date_dt, input('2022-12-31', yymmdd10.));
    duration_in_days = intck('DAY', start_date_dt, enddate_imputed);
    miss_end = missing(enddate);
    miss_start = missing(startdate);
    turnover = (duration_in_days < 365);
run;

I have 16 files, each about 40GB with around 82924025 rows, and the following columns: uid, pid, company name raw, company url, company cleaned name, company priname, company name, ultimate company name (of the various company names, I will keep just the raw name, cleaned name, and ultimate company name), location raw, region, country, state, mas, startdate, enddate, jobtitle raw, mapped_role, job category, role_k150, role_k500, rolek1000, code1, code 2 (used to link to external data), ticker, wexchange, naics, naics_desc, rcid, frcid, senority, rn, salary.

I would like to keep even the raw variables to verify the data accuracy. Sometimes a worker lists their job as an independent company, which means they are self-employed, but the data provider puts this person in a company called independent inc. I keep the raw values in order to identify such cases.

I tried to cut the data into small pieces because the other user on the server is taking 70%-98% of the memory. I felt that sometimes my code was not processing because my data could not be read into the remaining memory, so I tried to make the pieces smaller. I appreciate all your further help.

mkeintz · ‎12-15-2023

You didn't provide sample data in the form of a working data step, so this program isn't (fully) tested:

data want (drop=_: rcid_: frcid_:);
    id=_n_;
    length source $5;
    set have;
    by gvkey year;
    retain _cik 'cik ' _cusip 'cusip' _gvkey 'gvkey' ;
    array src _cik _cusip _gvkey ;
    array rcid_data rcid_cik rcid_cusip rcid_gvkey ;
    array frcid_data frcid_cik frcid_cusip frcid_gvkey ;
    do over src;
        source=src;
        rcid=rcid_data;
        frcid=frcid_data;
        output;
    end;
run;

PaigeMiller · ‎11-05-2023

@Eileen1496 wrote:
Hi PaigeMiller, Thank you! I also thought about this, but I'm worried about it for two reasons:
1. The dataset after combining them all together is too large.
2. So if I run any code on the large data set, it would take a long time, and if something went wrong, I would have to wait a long time to realize that and redo everything.
Hence, I want to try splitting it into small datasets. But yes, in this case, combining and applying my corrected code works.

@Eileen1496 Please explain the background and context of what you are doing, including the background of the actual large problem you are trying to solve. Do not limit the discussion to simply programming issues. I can't help further if I don't know what you are doing and why you are doing it.

If you want to test code without waiting a long time, you can use the large data set with data set options such as OBS=1000, which will run your code on the first 1000 data points to see if it works. Splitting data up will, in the long run, take a lot longer (for the computer to run, and to program properly) than working with just one data set.
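As a rough illustration of the OBS= suggestion, the sketch below applies the option to the data set named elsewhere in this feed (user10_ind0). It assumes startdate and enddate are character values in yyyy-mm-dd form, as the INPUT calls in that post imply; the output name user10_ind0_test is hypothetical.

```sas
/* Sketch only: try the logic on the first 1000 rows before a full run.  */
/* user10_ind0 and its columns come from the thread; the output data set */
/* name user10_ind0_test is a made-up placeholder.                       */
data user10_ind0_test;
    set user10_ind0 (obs=1000);   /* stop reading after 1000 observations */
    start_date_dt = input(startdate, yymmdd10.);
    end_date_dt   = input(enddate, yymmdd10.);
    /* impute a missing end date with 31 Dec 2022, as in the original post */
    duration_in_days = intck('DAY', start_date_dt,
                             coalesce(end_date_dt, '31DEC2022'd));
    turnover = (duration_in_days < 365);
run;
```

Once the small run looks right, removing the (obs=1000) option runs the same step on the full data set.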

Eileen1496 · ‎11-04-2023

I put them entirely on a local drive, and it is way faster!

Eileen1496 · ‎09-28-2023

Getting a frequency table and then deciding on my own list is good advice! Indeed, when I tried matching with tfidf in Python before, even two firms with different names got a very high score as long as they had one word in common. I need to figure this out later.

GPatel · ‎09-25-2023

You can try this. Suppose the abnormal characters in your SAS file are in the "Comments" column/field. Use the code below to remove/replace the weird/abnormal characters:

Comments=KCVT(compress( Comments,,'kw'), 'wlatin1', 'UTF-8');

or

Comments =compress(Comments, , 'kw');
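To see what the 'kw' modifiers do, here is a minimal sketch on a made-up string: 'k' tells COMPRESS to keep rather than remove the listed characters, and 'w' adds the printable characters to that list, so non-printable characters are dropped. The variable names raw and clean and the test value are assumptions for illustration.

```sas
/* Sketch only: raw and clean are hypothetical names for this demo. */
data _null_;
    /* build a string with an embedded control character (hex 07, bell) */
    raw = 'abc' || '07'x || 'def';
    /* keep only printable characters; the control character is dropped */
    clean = compress(raw, , 'kw');
    put clean=;
run;
```

The same call can be applied to the Comments variable inside a DATA step that reads the affected data set.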

Eileen1496 · ‎08-06-2023

Thank you, it indeed solves the problem!


Re: Problem with nested macro do loop

Re: Problem with nested macro do loop

Problem with nested macro do loop

Re: SAS Batch running stuck while interactive mode works

Re: SAS Batch running stuck while interactive mode works

Re: SAS Batch running stuck while interactive mode works

SAS Batch running stuck while interactive mode works

Re: Why my CONNECT is not working well?

Why my CONNECT is not working well?

Re: how to inherit the temporary work folder when using SAS/CONNECT

Re: Why my CONNECT is not working well?

Re: SAS not working fully

Re: Problem when using order by

Re: Problem with nested macro do loop

Re: SAS Batch running stuck while interactive mode works

Re: Why my CONNECT is not working well?

Re: how to inherit the temporary work folder when using SAS/CONNECT

Re: How do I know how to adapt my code to proc ds2 structure?

Re: SAS Licensing Model - New Question

Re: split large data into small ones without knowing how many rows, an...

Re: How to reshape multiple variables with different suffix?

Re: Nested Do loop to read different data files and save different dat...

Re: SAS not working fully

Re: How to drop words with no real meaning in a string?

Re: Abnormal characters in sas file

Re: Problem when using order by