Thursday, 31 March 2016

Generate Page Numbers in X of Y Format in ODS RTF Output

 STEP 1. CREATE A PAGE VARIABLE IN THE FINAL DATASET BEFORE USING THE PROC REPORT PROCEDURE

A numeric variable containing page number values will be used in the PROC REPORT procedure to generate Page X of Y. To get page number values, first you have to figure out what is the maximum number of rows each page body (excluding title, footnote, and free margin space areas) can contain. The best way to get this number is to use the PROC PEPORT procedure to test it. In theory, if there are no wrapped columns, no blank rows, and assuming that each observation occupies only one row in the output, the total number of pages should be equal to the number of total rows divided by the maximum row number of a page if there is no remainder, or to that value +1 if there is a remainder. However, almost every RTF output contains blank rows at least for cosmetic purposes. Some RTF outputs such as most of listings have wrapped columns, which means that one single observation in the SAS dataset could be displayed in multiple rows depending on the maximum length of the variable in that    observation. As a result, how many rows each page contains will vary. This makes the job to create page numbers much more complicated.
The following three steps have been developed to get the page number:
1). Get the number of wraps for each wrapped column with the CEIL and LENGTH functions.
 wrapx = CEIL(LENGTH(colx) /Nx) ;
wrapx stands for the number of wraps for column x.
 Colx stands for column x.
Nx stands for the maximum number of characters column x can have. For example, if column x is 35
characters long, then replace Nx with 35.

2). Find out the maximum wrap count with the MAX function:
maxwrap = MAX(wrap1, wrap2, , …) ;

3). Calculate the page number.

DATA fin;
SET xyz ;
BY sitesubj ;
RETAIN pg 0 pgcnt 0 ;
IF FIRST.sitesubj THEN pgcnt = pgcnt + maxwrap + 1 ;
ELSE pgcnt = pgcnt + maxwrap ;
IF pgcnt > pnum THEN DO;
Pgcnt = maxwrap + 1;
Pg = pg + 1;
END;
RUN;

pg is the page number variable.
pnum stands for the maximum number of rows that each page body can contain. For example, if the
maximum row number that each page body can contain is 30, then replace pnum with 30.

STEP 2. CREATE A MACRO VARIABLE THAT CARRIES A VALUE FOR THE TOTAL NUMBER OF PAGES


The PROC SQL procedure can be used to create a macro variable - &tpg that carries the maximum number of pages:
PROC SQL NOPRINT;
SELECT max(pg) into :tpg
FROM fin ;
QUIT;


STEP 3. GENERATE PAGE NUMBERS IN X of Y FORMAT IN ODS RTF OUTPUTS


The following macro generates RTF output.
ODS RTF FILE = “xyz.rtf” ;
 %MACRO Rept ;
   %DO i = 1 %to &tpg ;
    TITLE1 J=L "xyz Inc." J=C "Confidential" J=R "Page &i of &tpg" ;
 PROC REPORT DATA = fin (where=(pg= &i)) missing nowd headline headskip;
 COLUMN pg col01 col02 col03 col04 ;
 DEFINE pg / order order=internal noprint ;
 DEFINE col01 / … ;
                ……
 BREAK after pg / page ;
 RUN ;
   %END ;
 %MEND;
 %Rept ;
ODS RTF CLOSE ;


CONCLUSION
SAS’s position on RTF is that the format is intended for modification. However, this is not the case for drug submission. Once the final batch run is finished, the outputs are not supposed to be modified any more. The approach to generate page numbers in X of Y format as discussed above is straightforward without having to deal with the Microsoft Word field code information. A drawback is that the number of rows in each page may not be even. Some pages may contain more rows with less white space whereas other pages may have more white space.
However, the benefi

Monday, 28 March 2016

Proc SQL Tricks:1

Draw a horizontal histogram :

A histogram visualizes the distribution pattern of a variable. PROC SQL can draw a horizontal histogram
by showing the frequency bars with a few asterisks for each level of the variable age.

procsql;
 selectage, repeat('*',count(*)*4) as Frequency
  fromSashelp.class
   groupby age
    orderby age;
quit;


OUTPUT:

Tuesday, 22 March 2016

Many-to-Many MERGE

Sample data:
data one;
input ID $ c1 $ n1;
cards;
A aa 11
A bb 12
A cc 13
B dd 14
;
data two;
input ID $ n2;
cards;
A 21
A 22
B 23
B 24
;
Code:
data combined;
merge one two;
by ID;
run;
Result:
ID    c1    n1    n2

A aa 11 21
bb 12 22
cc 13 22

B dd 14 23
dd 14 24
Usually such a merge is undesirable, but not always. In some situations, the problem is to exclude the repetitions caused by the "inheritance" of values (notice 22, dd, and 14 in the example). Here is a simple solution:
data combined;
merge one two;
by ID;
output;
call missing(of _all_);
run;
Result:
ID    c1    n1    n2

A aa 11 21
bb 12 22
cc 13 .

B dd 14 23
. 24
Of course if the DATA step is more complicated and involves additional variables which are RETAINed, the process performed by the CALL MISSING statement has to be selective.
Note: This technique is applicable to one-to-many or many-to-one situations as well as to many-to-many.

Monday, 21 March 2016

2. DATA STEP

   
Used for: Names the SAS data set and creates the dataset.
In a Data step the following can be performed.
  •  defining the variables
  •  read input files
  •  assign values to the variables,
  •  creating new variables,
  •  merging two or more data sets
  •  formatting and labeling variables
  •  assignment of missing values.



If a variable is used in a SAS program but not initialized then SAS automatically assigns a missing value to it.
Numeric missing values are represented by a single period (.). Character missing values are represented by a single
blank enclosed in quotes (' ').

Syntax:
DATA <SOMENAME>;

The DATA step starts with the DATA statement.
The data set name should be 1-32 characters and must begin with a letter or underscore.
Any text within <> represents optional material or a user defined name or value.

Example:
DATA EMPDAT;

The following SAS statements can be used in a DATA step.

2.1 INFILE AND INPUT STATEMENT

Used for: INFILE is used to read external files (such as mainframe file, text files, comma delimited files etc).
INPUT is used to define names and order of variables for the SAS dataset.

Syntax:
INFILE file-specification <options>;
INPUT variable variable_type column(s);

Example1:

DATA EMPFL;
 INFILE 'c:\emp\external\emp1.dat';
 INPUT
 @001 EMPNO $CHAR6.
 @007 NAME $CHAR15.
 @022 AGE 3.;
 RUN;

In this example the external file “emp1.dat” which is stored in the location “c:\emp\external” will be read and copied into SAS file EMPFL. Three variables EMPNO, NAME and AGE will be read.

2.2 SET STATEMENT

Used for: Reads one or more existing SAS datasets
Syntax:

SET <SAS dataset name> <(OBS=n)>;

Where “n” is the number of observations you want to read from the file.
Example1:

Copies first 100 records of OLDFILE1 to NEWFILE.

DATA NEWFILE;
 SET OLDFILE1 (OBS=100);
RUN;

Example2:

Concatenate two datasets.
Input SAS datasets:
DATA COMMON;
 SET NWEST SWEST;
RUN;


2.3 IF/THEN; ELSE; STATEMENT

Used for: Used for conditional checking. Used in a data step.

Syntax:

IF expression THEN statement;
<ELSE statement;>
Example:
IF LANG='Spanish' or LANG='French' THEN
NEWLANG='NotEngl';
ELSE
NEWLANG='English';

In this example ‘Spanish’ is a character constant.

Example 2:

IF status=‘M' AND type=1 THEN  count=count+1;

2.4 SUBSETTING “IF” STATEMENT

Used for: To subset, or take a portion of the data set

Syntax:

IF expression;

Where expression is any SAS expression.
Example:

DATA FORGNER;
 IF LANG= ‘ENGLISH’ AND TAX >= 20000;
RUN;

In the above example the subsetting IF statements will select all observations where LANG equals ‘ENGLISH’ and
where TAX greater than and equal to 20000.
Notice that values for character variables must be enclosed in quotes and values must match exactly including case.

2.5 LIBNAME STATEMENT

Used for: Associates a libref with a SAS library. It’s kind of location pointer.
This is generally used when we want to save the SAS dataset in a permanent location. When a libref is not
associated with SAS dataset, SAS assumes it is created or read from SAS work area which is temporary in nature.

Syntax:

LIBNAME libref <engine>'('SAS-data-library-1' <,..'SAS-data-library-n'> ) ';

Example:
In windows –

LIBNAME EMPLIB1 ‘C:\data\SAS\EMPDATA’;

The path ‘C:\data\SAS\EMPDATA’ will be assigned to libref EMPLIB1. Say there is a SAS dataset EMPDAT in that

library, it can be access as,
DATA EMPFILE1;
 SET EMPLIB1.EMPDAT;
RUN;
In Unix –

LIBNAME CLAIM ‘\data\CL\CLAIM\INPUT’;
DATA CLAIM.LOSS_HISTORY;
 SET LOSS_TEMP;
RUN;
In this example the temporary file LOSS_TEMP will be stored in a permanent location ‘\data\CL\CLAIM\INPUT’
with a new name LOSS_HISTORY.

2.6 MERGE STATEMENT
Used for: Joins corresponding observations from two or more SAS data sets.
Syntax:

DATA new-sasdataset;
 MERGE SAS dataset name-1 SAS dataset name-2 ... SAS dataset name-n;
 BY var1 var2 varm;
RUN;

Input data sets must be sorted by the same BY variables before you can merge them.
Example:
Merge the two files SURVY and NAMES by the variable NAME.

DATA NEW;
 MERGE NAMES
 SURVY;
 BY NAME;
RUN;



1. SAS and its Rukes

1.1 WHAT IS SAS?

Originally SAS was an abbreviation for Statistical Analysis System, now SAS stands on its own.
Useful for the following types of task:
  •  Data entry, retrieval, and management
  •  Report writing and graphics
  •  Statistical and mathematical analysis


1.2 UNDERSTANDING TERMS IN A SAS DATA STRUCTURE

  •  DATA VALUE: The basic unit of information.
  •  VARIABLE or COLUMN: A set of data values that describes a given attribute. 2 main types of variable


types: numeric and character
  •  OBSERVATION or ROW: All the data values associated with a particular record.
  •  DATA SET or TABLE: It is a collection of observation.



1.3 RULES FOR SAS NAMES

  •  SAS variable/variable names must be between 1 and 32 characters long.
  •  The first character must be a letter or an underscore.
  •  Characters after the first may be letters, digits or underscores.
 Example: NAME, _NAME_, FILE1, _NULL_ etc.


1.4 RULES FOR SAS STATEMENTS


  •  SAS statements may begin in any column of the line.
  •  SAS statements end with a semicolon (;).
  •  Some SAS statements consist of more than one line of commands.
  •  A SAS statement may continue over more than one line.


One or more blanks should be placed between items in SAS statements. If the items are special characters such as '=', '+', '$', the blanks are not necessary.

There are two major building blocks in SAS.
  •  DATA Step
  •  PROC Step