About ErikLund_Jensen

ErikLund_Jensen · ‎04-01-2019

Hi @elli444

I think this small modification (checking for indexw > 1) will work. The problem with your code is that indexw ignores blanks. Even if you specify indexw(' AKA '), it treats it as 'AKA', so the condition will also be true if name_last starts with AKA.

data have;
  infile datalines truncover;
  input name_last $char50.;
  datalines;
Hansen AKA Hans
AKA Dummy
Saint George AKA Dragon
;

data want;
  set have;
  if indexw(name_last,'AKA') > 1 then do;
    name_last=substr(name_last,1,indexw(name_last,'AKA')-1);
  end;
run;

ErikLund_Jensen · ‎04-01-2019

Hi everybody

I have a problem with reading external files, and I hope that somebody with a better understanding of encoding issues can help me out. The files are exported as delimited files from an external system outside our control. They are UTF-8 encoded but contain some strings with a different encoding. We use SAS 9.4M5 on a Linux grid with system encoding Latin9.

I made a small test file (attached). It looks like this in the vi editor, and that is what I want as output:

"813"#"Afsluttet"#"Klavs Hansen"#"Elev ½ tid)"
"445"#"I gang"#"UU Nordvestsjælland"#"SSH´er"
"427"#"Afbrudt"#"Systemoverførsel"#"VVS´er"

When I read it into SAS with UTF-8 encoding, the national characters æ and ø in field 3 come out correct, but I run into problems with the special characters in field 4:

%let file = /sasdata/udvk/data_beskyt/ungevejledning_beskyt/1_grunddata/eksterne_filer/test.txt;
filename ind "&file" encoding="utf-8";

data test;
  infile ind dsd dlm="#" truncover;
  informat id 8. status $char30. source $char30. udd $char30.;
  input id status source udd;
run;

NOTE: The infile IND is:
      Filename=/sasdata/udvk/data_beskyt/ungevejledning_beskyt/1_grunddata/eksterne_filer/test.txt,
      Owner Name=sasbatch, Group Name=torg-odk-sas9-etl,
      Access Permission=-rw-r--r--,
      Last Modified=01. april 2019 18:20:34,
      File Size (bytes)=141

WARNING: A character that could not be transcoded has been replaced in record 1.
WARNING: A character that could not be transcoded has been replaced in record 2.
WARNING: A character that could not be transcoded has been replaced in record 3.
NOTE: 3 records were read from the infile IND.
      The minimum record length was 43.
      The maximum record length was 46.
NOTE: The data set WORK.TEST has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time 0.02 seconds
      cpu time 0.00 seconds

Note that the problematic characters are lost; they are all translated to hex 1A. The characters are double-byte with hex values C2BD or C2B4 in this example. There are several others in the real data, but all of the same type.

If I read the files with the system encoding (latin9), I get the two bytes, so they could be handled in the program, but that way I also get all the valid UTF-8 characters as two bytes.

The files are large, millions of records, and delivered daily, and every now and then we will get new double-byte characters, both valid and invalid in UTF-8, so it would be an endless maintenance task to read the files with latin encoding and identify and change all double-byte characters. So that is not really an option. But because the vi editor can display all characters correctly, I think it should be possible in SAS also, so I must be missing something.

All suggestions will be highly appreciated.
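One way to narrow this down outside SAS is to check, character by character, whether the session encoding has a slot for what the file contains. The following is a minimal Python sketch, not a SAS solution. It assumes that the SAS Latin9 session encoding corresponds to ISO-8859-15 (Python's `iso8859_15` codec); the helper name `fits_latin9` is made up for illustration.

```python
# Characters taken from the test file: two national letters that read fine,
# and the two characters that end up as hex 1A.
samples = ["\u00e6", "\u00f8", "\u00bd", "\u00b4"]  # ae-ligature, o-slash, one half, acute accent

def fits_latin9(ch: str) -> bool:
    """Return True if the character can be encoded in ISO-8859-15 (Latin-9)."""
    try:
        ch.encode("iso8859_15")
        return True
    except UnicodeEncodeError:
        return False

for ch in samples:
    # Show the UTF-8 byte sequence next to the Latin-9 check.
    utf8_hex = ch.encode("utf-8").hex().upper()
    status = "fits Latin-9" if fits_latin9(ch) else "not in Latin-9"
    print(ch, utf8_hex, status)
```

Under that assumption, C2BD and C2B4 are well-formed UTF-8 for ½ and ´, but neither character exists in ISO-8859-15, which would be consistent with the transcoding warnings in the log.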

ErikLund_Jensen · ‎04-01-2019

Hi @elli444

This happens if AKA is the first word in name_last. Then indexw returns 1, and the third argument to substr becomes 0, which is illegal. What should happen in this case: should name_last be left unchanged, or set to empty?

ErikLund_Jensen · ‎03-29-2019

Hi @eabc0351

You should in principle be able to hold 3 generations of your 600GB data set in your allocated 2TB. But depending on your process, space for an extra copy (.lck extension) may be necessary, and in that case there is only space for 2 generations.

Depending on the content of the big data set, compression can work wonders. In my work a data set is often reduced to anything from 20 to 60% of the uncompressed size, and computing time is not substantially increased. Try both compression algorithms, compress=yes and compress=binary, and see what happens.

How much space is allocated to the SAS work/utilloc libraries in your installation? This might be a bottleneck with such large data sets.

ErikLund_Jensen · ‎03-28-2019

Hi @SR2019

Linux SAS cannot handle native Excel file formats, only xlsx files, and there is no OLE support. But if your spreadsheet is xlsx, you can read it with proc import or the xlsx libname engine, and write it with proc export, the xlsx libname engine or ODS, the same way as in Windows. There is no difference.

Usually, the difficult part is getting access to the files, because spreadsheets normally reside on Windows machines in places where the Linux system can't see them, so it is necessary to FTP them to Linux or move them to a Windows folder that is mounted on the Linux system.

If it is not meant for production there is also SAS Enterprise Guide, and for production work there is the SAS PC Files Server, which might make things a little easier, but I have no experience with it and don't miss it, because both FTP and mount work well for us, at least as well as the match between SAS and Excel usually works. These two are not meant for each other!

ErikLund_Jensen · ‎03-28-2019

Hi @chithra

Here is a slightly different way of doing just the same as Kurtbremser's code. It uses the point= option instead of firstobs=2. It is in no way better; I just post it because I want to promote use of the point= option, because it is so useful for look-ahead and look-back, also in more complicated cases.

data want (drop= lastyear nextyear i);
  set have;
  by id scn;
  nextrec = _N_ + 1;
  lastyear = year;
  output;
  if not last.scn then do;
    set have (keep=year rename=(year=nextyear)) point=nextrec;
    do i = lastyear + 1 to nextyear - 1;
      year = i;
      value = 0;
      output;
    end;
  end;
run;
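For readers who want to see the look-ahead idea separately from the SAS syntax, here is a rough Python sketch of the same logic: write the current row, peek at the next row in the same id/scn group, and emit zero-valued rows for the years in between. The sample rows and the function name `fill_missing_years` are invented for illustration and are not taken from the thread.

```python
def fill_missing_years(rows):
    """Copy rows and insert value=0 rows for years missing within an (id, scn) group."""
    out = []
    for i, row in enumerate(rows):
        out.append(dict(row))
        # Look ahead one record, like reading record _N_ + 1 with point=.
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        same_group = nxt is not None and (nxt["id"], nxt["scn"]) == (row["id"], row["scn"])
        if same_group:
            for year in range(row["year"] + 1, nxt["year"]):
                out.append({**row, "year": year, "value": 0})
    return out

# Hypothetical input, sorted by id, scn and year.
have = [
    {"id": 1, "scn": "A", "year": 2010, "value": 5},
    {"id": 1, "scn": "A", "year": 2013, "value": 7},
    {"id": 2, "scn": "A", "year": 2011, "value": 1},
]
filled = fill_missing_years(have)
for row in filled:
    print(row)
```

As in the data step, the look-ahead is skipped on the last record of a group, so a gap is never filled across two different id/scn combinations.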

ErikLund_Jensen · ‎03-27-2019

This will also give the "Invalid argument to substr" note, if the string "DEAD" is not found.

ErikLund_Jensen · ‎03-27-2019

@Alexxxxxxx

The note is caused by an illegal third parameter to substr. If the string "DEAD" is not found, find returns 0, and then your substring will be substr(name,1,-1), which is illegal. This removes the notes:

data want(drop=found);
  set have;
  found = find(name,'DEAD');
  if found > 1 then name=substr(name,1,found-1);
run;

If found = 0 (or found = 1, where the name starts with DEAD), then name remains unchanged.
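The guard can be illustrated in a few lines of Python as well. This is only a sketch of the same idea, with one difference to keep in mind: Python's `str.find` is 0-based and returns -1 when nothing is found, whereas the SAS find function is 1-based and returns 0. The function name `cut_before` and the sample strings are hypothetical.

```python
def cut_before(name: str, marker: str = "DEAD") -> str:
    """Keep the text before the marker; leave the name alone if the marker
    is missing or sits at the very start."""
    pos = name.find(marker)  # 0-based position, -1 if not found
    # pos > 0 here plays the role of found > 1 in the data step.
    return name[:pos] if pos > 0 else name

for sample in ["SMITH DEAD 1999", "JONES", "DEAD END"]:
    print(repr(cut_before(sample)))
```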

ErikLund_Jensen · ‎03-27-2019

Hi @ybz12003

Your date value is still 01JAN8888. The format prints it as 88/88/8888, but your check is against the value, not the formatted value, so it should work if you changed your check to

data want;
  set have;
  array datelist(*) &dateVars;
  do i=1 to dim(datelist);
    if datelist(i) = '01jan8888'd;
  end;
run;

ErikLund_Jensen · ‎03-26-2019

Hi @daradanye

The number 21. is taken care of by the good modifications to my code by @ChrisNZ. I would expect more problems to pop up with a full input data set: either something more that should be removed, or too much cleaning, where a meaningful part of a company name is removed. What would happen if the next company name in your data is Century 21?

As you will probably need more modifications to the code, you should acquire some regex knowledge. The basics are covered in the very good tip sheet https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf

ErikLund_Jensen · ‎03-25-2019

Hi @daradanye

The following code works with your data. But there might be other cases where something not covered here should be removed. The first prxchange keeps anything before the last hyphen, the next removes a separate trailing word containing only periods, percent signs or digits, the third removes anything within parentheses, and the last takes care of a comma left over in the second record.

data have;
  infile datalines truncover;
  input line $char100.;
  datalines;
Dresser-Rand International B.V. 100.0 - Netherlands
Becker CPA Review Limited (2), Corporation - Israel
Union Planters National Bank (a)(1) 99.90% - USA
21. Hypercom Horizon, Inc - Missouri, USA
El Paso Energy Service Company 100.0000 - Delaware, USA
;
run;

data want (drop=w);
  set have;
  length company w $80.;
  w = prxchange('s/(.*)-(.*$)/$1/',-1,trim(line));
  w = prxchange('s/(.*)\s([\d\.%]*$)/$1/',-1,trim(w));
  w = prxchange('s/\(.*\)/ /',-1,trim(w));
  company = prxchange('s/\s,\s//',-1,trim(w));
run;
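To experiment with the four substitutions without a SAS session, they can be replayed with Python's `re` module. This is a sketch only: Python and PRX regular expressions are both Perl-style but not identical, `rstrip()` stands in for SAS trim, and the third pattern is written as `\(.*\)` on the assumption that it is meant to match a parenthesised group, as the description says. The function name `clean` is made up for illustration.

```python
import re

lines = [
    "Dresser-Rand International B.V. 100.0 - Netherlands",
    "Becker CPA Review Limited (2), Corporation - Israel",
    "Union Planters National Bank (a)(1) 99.90% - USA",
    "21. Hypercom Horizon, Inc - Missouri, USA",
    "El Paso Energy Service Company 100.0000 - Delaware, USA",
]

def clean(line: str) -> str:
    """Apply the four substitutions in the same order as the data step."""
    w = re.sub(r"(.*)-(.*$)", r"\1", line.rstrip())        # keep text before the last hyphen
    w = re.sub(r"(.*)\s([\d.%]*$)", r"\1", w.rstrip())     # drop a trailing word of digits, periods, %
    w = re.sub(r"\(.*\)", " ", w.rstrip())                 # replace parenthesised text with a blank
    return re.sub(r"\s,\s", "", w.rstrip()).rstrip()       # remove a comma left between blanks

for line in lines:
    print(repr(clean(line)))
```

Printing the results with `repr` makes leftover blanks visible, which is useful when deciding whether another cleaning step is needed for new input.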

ErikLund_Jensen · ‎03-25-2019

Hi @Seb_A_Sanders

I think your only way out of the problem is to use SQL pass-through. This way you can handle table and column names longer than 32 characters. See the following code, which is a subset of working production code cut down to 3 variables:

options validvarname=V7;

proc sql;
  connect using xkmdne as sconn;
  create table &_OUTPUT (label="archive_NEXUS2_medicine_medication_additional_information") as
    select
      medication_id length=8,
      administratively_deleted_datetim length=8 format=nldatm20. label="administratively_deleted_datetime",
      last_past_fmk_dosage_period_end_ length=8 format=ddmmyyd10. label="last_past_fmk_dosage_period_end_date"
    from connection to sconn (
      select
        medication_id,
        administratively_deleted_datetime as administratively_deleted_datetim,
        last_past_fmk_dosage_period_end_date as last_past_fmk_dosage_period_end_
      from archive_NEXUS2_medicine_medication_additional_information
    );
  disconnect from sconn;
quit;

Explanation: Make a connection to the relevant SQL database using a previously assigned libname to the database. Create a new SAS table with the name abbreviated to 32 chars, and use the original SQL table name as data set label. Select from the connection to the database, and use the original SQL column names as variable labels. The connection to the database gives the result of the second select, where long column names are abbreviated to something useful. This select is sent to the database for execution, so all long names are kept on the SQL Server side, and SAS sees only your abbreviations.
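With many long column names, the alias list for the inner select could be generated instead of typed. The sketch below is one hypothetical way to do that in Python: it cuts each name to the first 32 characters, which happens to reproduce the abbreviations used above, and stops if two names would collide. The helper `alias_clause` is not part of any SAS tooling.

```python
SAS_NAME_MAX = 32  # maximum length of a SAS variable name

def alias_clause(columns):
    """Build the select list, adding 'long as short' where a name exceeds 32 characters."""
    seen = {}
    items = []
    for col in columns:
        short = col[:SAS_NAME_MAX]
        if short in seen:
            # Plain truncation is not enough here; a hand-made abbreviation is needed.
            raise ValueError(f"{col} and {seen[short]} both shorten to {short}")
        seen[short] = col
        items.append(col if short == col else f"{col} as {short}")
    return ",\n".join(items)

columns = [
    "medication_id",
    "administratively_deleted_datetime",
    "last_past_fmk_dosage_period_end_date",
]
clause = alias_clause(columns)
print(clause)
```

Simple truncation is only a starting point; when it produces collisions or unreadable names, choosing the abbreviations by hand as in the production code is the safer route.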

ErikLund_Jensen · ‎03-24-2019

@uspanchal

Would you care to explain your problem? It might be possible to simulate a data step using I/O functions in a macro, but it would be downright stupid to do it, so what are you trying to achieve?

ErikLund_Jensen · ‎03-21-2019

- as I told you earlier

ErikLund_Jensen · ‎03-21-2019

Hi @Pabster

I assume that your data are SAS tables. In SAS, a date value is the number of days since 01jan1960. You can display it with a format, and you can also assign the format to the variable in the data set, but the format does not change the value, only the way it is displayed.

You can use the SAS year function to convert the value on the fly in your comparison, like xx <= year(DLDTE). This makes the comparison operator work against the extracted year and not the date value. You could also prepare the data by adding an extra variable holding the year value, using the same function.
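The difference between the stored value and the displayed value can be made concrete with a small Python sketch. It only mirrors the day-count convention described above (days since 1 January 1960); the function names `to_sas_date` and `from_sas_date` are made up for illustration.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # day 0 in the SAS date convention

def to_sas_date(d: date) -> int:
    """Number of days between 01jan1960 and the given date."""
    return (d - SAS_EPOCH).days

def from_sas_date(n: int) -> date:
    """Calendar date corresponding to a day count."""
    return SAS_EPOCH + timedelta(days=n)

dldte = to_sas_date(date(2019, 3, 21))   # the stored value is just a number
print(dldte)
print(2019 <= dldte)                     # comparing a year with the raw day count
print(2019 <= from_sas_date(dldte).year) # comparing a year with the extracted year
```

Comparing a year directly with the day count compares two unrelated numbers, which is why the year has to be extracted first, as year(DLDTE) does in SAS.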


Re: SAS Metadata: list of all attributes you can use in METADATA_GETAT...

Re: SAS Metadata: list of all attributes you can use in METADATA_GETAT...

Re: Split name vertically

Re: How to flag the last AVALC = 'Y' record prior to this AVALC='N' re...

Re: The use of last.id by multiple groups

Re: combine datasets even when results are missing

Re: combine datasets even when results are missing

Re: How do i parse specific number combinations from a field in a tabl...

Re: How do i parse specific number combinations from a field in a tabl...

Re: Pseudonymization of sensitive data

Re: How to convert DATETIME to ISO8601 format ?

Re: create macro vars-End of month

Re: Flag the next value of a variable

Re: SAS Programming 1_Lesson 5_p105a03.sas: repeating the same code do...

Re: Write SAS Time fields to Excel - loses time format

Re: combine datasets even when results are missing

Re: Pseudonymization of sensitive data

Re: SAS Connect To ODBC with Big Query issue

Re: SAS code for assigning different values of race for an individual ...

Re: Unable to open a particular job in SAS DI Studio

Re: Invalid third argument to function SUBSTR

Encoding problem while reading delimited file

Re: Invalid third argument to function SUBSTR

Re: Historical Datasets Without Multiple Permanent Datasets

Re: Excel and SAS GRID

Re: Add consecutive numbers in between of the dataset

Re: delete the strings beginning at 'DEAD'

Re: delete the strings beginning at 'DEAD'

Re: How to inquire specific date?

Re: SAS scan(trim) and regex

Re: SAS scan(trim) and regex

Re: Handling / Differentiating identically named variables in source t...

Re: how to compare two numeric value with macro code using

Re: PROC SQL Merging error

Re: PROC SQL Merging error
