CDSP-k2 Assembler

Memory model and assembler directives

In addition to assembling the instruction set mnemonics (listed in the instruction set table), the CDSP-k2 assembler interprets a number of directives that define how memory is allocated in the three modules: DataRAM, ProgramROM, and ConstantsROM. These directives are .org, .seg, .num, .cst, and .var.

The ProgramROM Memory

The .org directive

The op-codes of an assembly program are assembled at successive memory locations in the ProgramROM, while the constants that appear in the program are allocated in the ConstantsROM. The complete machine code program is therefore stored in two separate memory spaces.
The .org directive controls how the op-codes are allocated in the ProgramROM. As the assembly language program lines are assembled, the generated op-codes are placed at successive addresses using a ProgramROM allocation pointer, P_allocp, which starts at address 0x0. Each processor instruction occupies exactly one ProgramROM location, so P_allocp is incremented by one after each op-code is allocated. The .org directive sets P_allocp to a specified value, which lets the programmer control where the program op-codes are placed.

The .org syntax is:
.org ProgramROM_Address

The program following after a .org line in the assembler file is allocated starting with location ProgramROM_Address.

Example

   .org 0x10;   the program allocation pointer is set to 0x10
   nop;         this instruction is placed at address 0x10 in the ProgramROM
   .org 0x20;   the program allocation pointer is set to 0x20
   nop;         this instruction is placed at address 0x20 in the ProgramROM

Remark

The current version of the assembler does not check for allocation collisions in the ProgramROM; the programmer has to ensure that a ProgramROM location is not overwritten by improper usage of the .org directive.
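
Because the assembler does not perform this check, it can help to reason about the allocation with a small model. The following Python sketch is an illustration only, not the assembler's implementation: the ProgramRom class and its method names are hypothetical. It models P_allocp as described above and raises an error when an op-code would land on an occupied location.

```python
# Minimal model of ProgramROM allocation (hypothetical helper, not assembler code).
class ProgramRom:
    def __init__(self):
        self.p_allocp = 0x0      # allocation pointer starts at address 0x0
        self.cells = {}          # address -> mnemonic

    def org(self, address):
        """Model of '.org address': move the allocation pointer."""
        self.p_allocp = address

    def emit(self, mnemonic):
        """Place one op-code and advance the pointer by one location."""
        if self.p_allocp in self.cells:
            raise ValueError(f"ProgramROM collision at {self.p_allocp:#x}")
        address = self.p_allocp
        self.cells[address] = mnemonic
        self.p_allocp += 1
        return address


rom = ProgramRom()
rom.org(0x10)
first = rom.emit("nop")     # placed at 0x10
rom.org(0x20)
second = rom.emit("nop")    # placed at 0x20
rom.org(0x10)               # improper .org: points back at an occupied location
try:
    rom.emit("nop")
    collided = False
except ValueError:
    collided = True         # the model flags what the assembler would overwrite
```

In the real assembler the third nop would silently replace the op-code at 0x10; the model only makes the overlap visible.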

The ConstantsROM Memory

Some of the processor instructions require a numeric argument that cannot be embedded in the instruction itself; these instructions use a relatively small field in the instruction code to specify the ConstantsROM address of their numeric argument.
This arrangement provides a high code density for two reasons:
First, the instruction op-codes are no longer tied to the size of the processor data word, which allows a highly efficient usage of the op-code bits.
Second, because the instruction arguments reside in a physically (and logically) different memory space, an argument value can be stored only once even if it is used in more than one place in the program. For example, there will be only one copy of the constant Zero in the memory, although this is a very frequently used constant; every instruction with a Zero argument uses just a few bits to point to it in the ConstantsROM.
However, this arrangement also limits the number of different explicit immediate numeric constants that a program can use. A CDSP-k2 program can have at most 256 explicit constants because the constant-referencing bit field in the instruction code is 8 bits wide.

A program can use far more than 256 constants, but not as immediate arguments in assembler instructions: ConstantsROM addresses 256 and above can only be accessed via the indexed addressing mode that uses the x0 pointer register.

To allow efficient algorithm implementations, the ConstantsROM can be physically divided into partitions, each accommodating a different word length. This makes it possible, for example, to use full-precision constants as instruction arguments while storing lower-precision tables efficiently in a short-word partition.
This memory organization is supported by the assembler via multiple constants allocation pointers. The assembler allows the declaration of ConstantsROM memory allocation pointers, and provides ways to explicitly specify which pointer to use when allocating specific constants. The assembler directives that support this ConstantsROM memory organization model are .seg, .num, and .cst.

The .seg directive

The .seg directive declares a ConstantsROM memory allocation pointer using the following syntax:

   .seg %SegmentName [SegmentMinAddress SegmentMaxAddress]

When this directive is assembled, a memory segment named SegmentName is declared in the ConstantsROM, starting at address SegmentMinAddress and ending at SegmentMaxAddress; a segment allocation pointer is also defined at this stage and initialized with the lower limit address of the segment (i.e. SegmentMinAddress).
A memory segment cannot be re-declared (i.e. the limits of the segment can only be defined once in the assembler source and cannot be further modified).
Any program (that uses at least one constant) has to declare and define at least one ConstantsROM memory segment.

Example

.seg %KSeg0 [0x00 0xff] ;declare a ConstantsROM segment starting at address 0 and
                        ;ending at address 255, i.e. it will contain all the immediate
                        ;numeric constants used in the program; also declare an
                        ;allocation pointer KSeg0 and initialize it with 0.

Remark

Multiple segments can overlap, and one can even have multiple segments declared as containing the same memory address range. This can be useful if one needs more than one allocation pointer in the same address range.

The .num directive

The .num directive is used to specify in which ConstantsROM segment the immediate instruction arguments are to be placed. Before using any instruction that has an immediate numeric argument, one has to declare a segment and map the immediate constants to that segment via the .num directive.

Example

.seg %KSeg0 [0x00 0x0f]  ;declare a 16-word segment KSeg0 between 0x00 and 0x0f
.num %KSeg0              ;specify that immediate numeric arguments are to be
                         ;allocated in segment KSeg0
     lda r0,12:34        ;the immediate complex numeric constant 12:34 will be
                         ;placed at address 0x00 in the ConstantsROM and KSeg0's
                         ;allocation pointer will be incremented to address 0x01
     lda r1,56:78        ;the 56:78 constant will be placed at address 0x01, etc

The assembler automatically detects multiple occurrences of immediate numeric constants inside a segment, and only keeps one copy for a constant. For example, consider the following instruction coming in continuation of the above code:

      lda r2,12:34   ;the 12:34 constant will not be re-allocated; the instruction
                     ;op-code will reference the above-allocated constant at
                     ;ConstantsROM memory location 0x00.

The .num directive can be used at any point in the assembler program; each time it is encountered it instructs the assembler that, starting with the next line, constants will be allocated in the specified segment, using the specified segment's allocation pointer.

Example

.num %Segment_2      ;start allocating explicit numeric constants in Segment_2
     lda r0,0.123    ;0.123 is stored at whatever address the Segment_2
                     ;allocation pointer was pointing; after that the allocation
                     ;pointer is incremented by one.

The above .num directive causes the immediate constants to be allocated using the Segment_2 allocation pointer. Thus the "0.123" constant that follows is placed in Segment_2, at whatever address the Segment_2 allocation pointer was indicating. Constants continue to be allocated in Segment_2 until a new .num directive specifies a different segment, or until the segment allocation pointer overflows (in which case the assembler reports an error).

Constants that are not declared inside the same segment are not subject to the "constants merging" optimization, because the algorithm might rely on having a specific representation (precision, etc.) of a constant when declaring it in a specific segment. Thus, for example, if a Zero constant appears in the source file inside the scope of one ConstantsROM segment (say KSeg0) and then again in the scope of another segment (say KSeg1), each segment will store its own single copy of the Zero constant.

Example

   .seg %KSeg0 [0x00 0x0f]
   .seg %KSeg1 [0x10 0x1f]

   .num %KSeg0
   lda r0,0 ;store constant Zero at address 0x00 (i.e. beginning of segment KSeg0)

   .num %KSeg1
   lda r0,0 ;store constant Zero at address 0x10 (i.e. beginning of segment KSeg1)
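
The per-segment merging rule can be sketched with a small Python model. This is a minimal illustration under stated assumptions, not the assembler's actual algorithm: the Segment class and the place method are hypothetical names, and constants are represented by their source text.

```python
# Hypothetical model of one ConstantsROM segment with "constants merging".
class Segment:
    def __init__(self, name, min_addr, max_addr):
        self.name = name
        self.max_addr = max_addr
        self.allocp = min_addr    # pointer starts at the segment's lower limit
        self.known = {}           # constant text -> address, used for merging

    def place(self, constant):
        """Return the address of the constant, allocating it only if needed."""
        if constant in self.known:
            return self.known[constant]        # merged: reuse the existing copy
        if self.allocp > self.max_addr:
            raise OverflowError(f"segment {self.name} is full")
        address = self.allocp
        self.known[constant] = address
        self.allocp += 1
        return address


kseg0 = Segment("KSeg0", 0x00, 0x0f)
kseg1 = Segment("KSeg1", 0x10, 0x1f)

a = kseg0.place("12:34")    # first use: allocated at 0x00
b = kseg0.place("56:78")    # allocated at 0x01
c = kseg0.place("12:34")    # same segment: merged with the copy at 0x00
z0 = kseg0.place("0")       # Zero in KSeg0
z1 = kseg1.place("0")       # Zero again, but in KSeg1: a separate copy at 0x10

tiny = Segment("Tiny", 0x20, 0x21)   # two-word segment to show overflow
tiny.place("1")
tiny.place("2")
try:
    tiny.place("3")
    overflowed = False
except OverflowError:
    overflowed = True
```

Merging is keyed per segment, so the two Zero constants end up at different addresses, matching the example above.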

The .num command also allows changing the allocation pointer that is used to allocate the constants using the following syntax:

   .num %SegmentName @NewSegmentAllocationPointerValue

This directive simply changes the value of the SegmentName allocation pointer.

The assembler also supports changing the allocation segment locally, for a single assembler line. This is useful when most of the immediate numeric constants in a program share a common format (precision, etc.) while a few constants need a special format. In this case one segment contains most of the immediate constants, and one or more additional segments store the special-format immediate constants. To select an allocation segment locally, precede the constant with the segment name:

   %Segment Constant

Example

   .seg %GeneralConstSeg [0x00 0xdf]
   .seg %SpecialConstSeg0 [0xe0 0xef]
   .seg %SpecialConstSeg1 [0xf0 0xff]

   .num %GeneralConstSeg

   lda r0,100               ;place 100 at address 0x00
   lda r1,%SpecialConstSeg0 101 ;place 101 at address 0xe0
   lda r2,102               ;place 102 at address 0x01
   lda r3,%SpecialConstSeg1 103 ;place 103 at address 0xf0
   lda r4,104               ;place 104 at address 0x02
   lda r5,%SpecialConstSeg0 105 ;place 105 at address 0xe1

Remark 1

The assembler detects constants allocation collisions. Whenever an attempt is made to place a constant at a ConstantsROM address that has already been allocated, an error message is issued.
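
The interaction between the current .num segment, the per-line segment override, and collision detection can be sketched as follows. The Python model below is an assumption-laden illustration: the ConstantsRom class and its methods are hypothetical, and constants merging is deliberately left out to keep the sketch short.

```python
# Hypothetical model of .seg / .num / per-line segment override (no merging).
class ConstantsRom:
    def __init__(self):
        self.limits = {}      # segment name -> (min address, max address)
        self.pointers = {}    # segment name -> allocation pointer
        self.used = {}        # address -> constant text
        self.current = None   # segment selected by the last .num

    def seg(self, name, lo, hi):
        if name in self.limits:
            raise ValueError(f"segment {name} re-declared")
        self.limits[name] = (lo, hi)
        self.pointers[name] = lo

    def num(self, name, address=None):
        """Model of '.num %name <@address>'."""
        self.current = name
        if address is not None:
            self.pointers[name] = address

    def immediate(self, constant, segment=None):
        """Allocate an immediate; 'segment' models the '%Segment Constant' form."""
        name = segment or self.current
        address = self.pointers[name]
        if address > self.limits[name][1]:
            raise OverflowError(f"segment {name} is full")
        if address in self.used:
            raise ValueError(f"ConstantsROM collision at {address:#x}")
        self.used[address] = constant
        self.pointers[name] = address + 1
        return address


rom = ConstantsRom()
rom.seg("GeneralConstSeg", 0x00, 0xdf)
rom.seg("SpecialConstSeg0", 0xe0, 0xef)
rom.seg("SpecialConstSeg1", 0xf0, 0xff)
rom.num("GeneralConstSeg")
placed = [
    rom.immediate("100"),
    rom.immediate("101", "SpecialConstSeg0"),
    rom.immediate("102"),
    rom.immediate("103", "SpecialConstSeg1"),
    rom.immediate("104"),
    rom.immediate("105", "SpecialConstSeg0"),
]

rom.num("GeneralConstSeg", 0x00)   # move the pointer back onto an allocated word
try:
    rom.immediate("200")
    collided = False
except ValueError:
    collided = True
```

The placed list reproduces the addresses given in the example comments (0x00, 0xe0, 0x01, 0xf0, 0x02, 0xe1), and the final allocation is rejected because address 0x00 is already occupied.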

Remark 2

Given the above description of the .num directive, the complete syntax is:

   .num %Segment <@Address>

The .cst directive

The .cst directive is used to explicitly declare constants that do not (directly) appear as instruction arguments. Its main uses are to declare isolated constants at specific memory locations and to declare constants tables. The .cst syntax is (the <…> designates optional fields):

   .cst <%Segment> <@Address:> <Name> = <Value>

The %Segment field has a meaning similar to that in the .num directive: it specifies the segment in which the constants explicitly declared with the .cst directive will be placed, starting with the current .cst line. Following (and including) this directive, Segment becomes the "current" segment for explicit constants allocation.
As with the .num directive, the @Address field specifies a new value for the current Segment allocation pointer. When the directive is encountered, the Address value is assigned to the Segment allocation pointer. From this point on, all explicit constants are allocated at successive addresses starting at Address, inside Segment.
The Name field can be used to name the designated constants memory location.
The Value field specifies the constant to be placed in the current segment, at the location specified by the current segment's allocation pointer. To allow memory models that use "shadow" addresses to map (mirror) constants from one memory location to another, Value need not be specified (it is also an optional field).

Example

 .seg %n0 [0x00 0xff]    ;declare a segment for immediate numeric constants
 .seg %k0 [0x100 0x10f]  ;declare a segment for explicit constants
 .seg %k1 [0x110 0x11f]  ;declare another segment for explicit constants
 .num %n0                ;now actually specify n0 to store immediate constants
 .cst %k0                ;now actually specify k0 to store explicit constants
 .cst k_0x00 = 0x00      ;define constant k_0x00 as being 0x00, and place it at
                         ;address 0x100 (at the beginning of segment k0)
 .cst k_0x01 = 0x01      ;similarly, define and place constant k_0x01 at address
                         ;0x101 (i.e. next address in segment k0)
 .cst @0x10a: = 0x0a     ;change the allocation pointer of the current segment k0
                         ;to address 0x10a, and place an unnamed constant 0x0a
 .cst %k1 k_0x10 = 0x10  ;change the explicit constants allocation segment to k1,
                         ;declare and define k_0x10, and place it at location 0x110
                         ;(the beginning of segment k1)
 .cst kM                 ;declare without defining a kM constant at location 0x111
 .cst %k0 = 0x1b         ;switch back to allocating explicit constants in segment k0,
                         ;and place an unnamed constant 0x1b at address 0x10b
                         ;(where the k0 allocation pointer remained after the last
                         ;allocation).
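
The addresses in the example above can be traced with a short Python model. It is a sketch only: the CstAllocator class is hypothetical, and it assumes that a .cst line with neither a Name nor a Value (such as ".cst %k0") only selects the segment or moves the pointer without consuming a location.

```python
# Hypothetical model of the .cst directive's allocation behavior.
class CstAllocator:
    def __init__(self):
        self.pointers = {}    # segment name -> allocation pointer
        self.current = None   # current segment for explicit constants
        self.symbols = {}     # constant name -> ConstantsROM address
        self.rom = {}         # address -> value (None = declared, not defined)

    def seg(self, name, lo, hi):
        self.pointers[name] = lo   # upper limit not checked in this sketch

    def cst(self, segment=None, address=None, name=None, value=None):
        """Model of '.cst <%Segment> <@Address:> <Name> = <Value>'."""
        if segment is not None:
            self.current = segment
        if address is not None:
            self.pointers[self.current] = address
        if name is None and value is None:
            return None            # assumption: nothing is allocated
        where = self.pointers[self.current]
        self.rom[where] = value
        if name is not None:
            self.symbols[name] = where
        self.pointers[self.current] = where + 1
        return where


asm = CstAllocator()
asm.seg("k0", 0x100, 0x10f)
asm.seg("k1", 0x110, 0x11f)
asm.cst(segment="k0")                                     # .cst %k0
trace = [
    asm.cst(name="k_0x00", value=0x00),                   # .cst k_0x00 = 0x00
    asm.cst(name="k_0x01", value=0x01),                   # .cst k_0x01 = 0x01
    asm.cst(address=0x10a, value=0x0a),                   # .cst @0x10a: = 0x0a
    asm.cst(segment="k1", name="k_0x10", value=0x10),     # .cst %k1 k_0x10 = 0x10
    asm.cst(name="kM"),                                   # .cst kM
    asm.cst(segment="k0", value=0x1b),                    # .cst %k0 = 0x1b
]
```

The trace list holds 0x100, 0x101, 0x10a, 0x110, 0x111, and 0x10b, in agreement with the comments in the example; kM is recorded with no value, which corresponds to a declared but undefined constant.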

Remark

The address and value fields must be expressions that can be evaluated to a constant at assembly time; if a constant value cannot be calculated for these fields the assembler will issue an error message.

The DataRAM Memory

The processor has two ways of addressing the DataRAM memory: direct and indirect. Like with the constants addressing, the direct DataRAM access is restricted to a small number of 2*256=512 memory locations at any given moment. These two 256-word memory banks are the zero-page bank and the bar-based biased-address bank. Since only 8 bits are allocated for storing a memory location in the op-code, the op-code size is not related to the total amount of physical memory available in the system.
As described in the CDSP-k2 programming model, a "bar" base address register is used to allow direct access to banks of 256 words of memory that can reside anywhere in the physical DataRAM. The assembler supports this 256-word blocks structure with the .var directive.

The .var directive

The .var directive gives an alias to the contents of DataRAM memory locations. The assembler keeps a single internal DataRAM allocation pointer, D_allocp, that is used for variable declarations. The complete .var syntax is:

   .var <@Address:> <VarName> <Size>

The @Address: field, if present, instructs the assembler to assign the Address value to D_allocp; subsequently declared variables will thus be allocated at and after Address.
The VarName field, if present, becomes the alias for the contents of the DataRAM memory location pointed to by D_allocp.
If the VarName field is present, the Size field instructs the assembler to allocate Size DataRAM memory words for the variable, i.e. to increment D_allocp by Size following the .var directive. If the Size field is absent, a one-word variable declaration is assumed and D_allocp is incremented by 1.
If VarName is not present, the Size field cannot be present.
The D_allocp DataRAM allocation pointer is an 8-bit pointer, i.e. only 8-bit addresses can be specified for the declared variables. To allow access to the whole DataRAM memory space, these addresses have to be interpreted as relative displacements with respect to the bar register. For example, after declaring a V_Zero variable as being located at address 0 (zero), this variable name can be used either for directly accessing the zero-page address 0 (with the instruction argument V_Zero), or to access location 0 of a 256-word block that has its base address at location bar-128 (with the instruction argument bar:V_Zero).

Example

   .var @0x00: V_0x00  ;declare a symbol of type variable, assign it value 0x00

   lda r0,V_0x00       ;load r0 with DataRAM location 0
   lds.bar 0x100       ;load the base address register with 0x100
   lda r1,bar:V_0x00   ;load r1 with DataRAM location bar + 0x00 = 0x100

Memory Operands Assembler Syntax

Following is a summary of the assembler syntax for the various memory operands.
The CDSP-k2 assembler accepts four types of direct memory addressing operands: direct ConstantsROM location addressing using a numeric argument (equivalent to the common immediate addressing mode on most processors), direct ConstantsROM location addressing using a symbolic constant argument (declared with .cst), direct DataRAM location addressing using the explicit @ syntax (@address), and direct DataRAM location addressing using symbolic variables (declared with .var).

The ConstantsROM addressing using the numeric operand syntax (equivalent to the common immediate addressing mode on most processors) has been discussed in previous sections. When an immediate numeric value appears as an assembler instruction operand, the constant is allocated in the ConstantsROM (using allocation pointers as previously described), and a reference to the allocated constant is embedded in the instruction op-code.

Example

   lda r0,123:456  ;the 123:456 complex constant is placed (or, if already allocated,
                   ;is found) in the ConstantsROM, and its address is embedded in
                   ;the instruction op-code.

The ConstantsROM addressing using pre-declared and allocated constants (as described in previous sections) allows the usage of symbolic names for the contents of a ConstantsROM location. The constant's address is embedded in the instruction op-code.

Example

   .cst K_Const = 100  ;place constant 100 using the current ConstantsROM alloc pointer
   lda r0,K_Const      ;load r0 with the value of K_Const (100); the assembler is
                       ;embedding the K_Const address in the op-code.

The DataRAM addressing using explicit memory location specification accesses data based on the explicit address in the instruction operand. The address can be interpreted (depending on the addressing mode) as zero-page address or as a bar-biased address. The address (that is always 8 bits wide) is embedded in the op-code.

Example

   lda r0,@0x10     ;load r0 with the contents of memory location 0x10
   lda r1,@bar:1    ;load r1 with the contents of memory location (bar+1)
   lda r2,@bar:-1   ;load r2 with the contents of memory location (bar-1)

The DataRAM addressing using symbolic variable names provides a way to use a uniform syntax for all data references. If a symbol is declared as variable (with the .var directive), all occurrences of the symbol will signify that the contents of the specified memory location is to be used (rather than the value of the symbol itself, which is an actual memory address). This holds true for both zero-page and bar-based addressing.
One can create blocks of variables in DataRAM memory by declaring the variables in each block as starting at a given address (for example 0), and then interpreting the variables as offsets to a block base that is loaded in the bar register. The offsets can be negative numbers, so a negative address for a variable is valid (but it can only be meaningfully used in conjunction with the bar register).

Example

   .var @100: V_Var   ;set the variables allocation pointer to 100 and allocate
                      ;the V_Var variable at address 100 (the V_Var symbol
                      ;itself is assigned the value 100)
   .var @-100: V_VarM ;set the variables allocation pointer to -100 and define
                      ;the V_VarM symbol as being -100
   lds.bar 1000       ;make the bar register point at address 1000
   lda r0,V_Var       ;load r0 with the contents of DataRAM location 100
   lda r1,bar:V_Var   ;load r1 with the contents of DataRAM location 1100
   lda r2,bar:V_VarM  ;load r2 with the contents of DataRAM location 900
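
The address arithmetic in this example can be checked with a few lines of Python. The effective_address function is a hypothetical helper written for this illustration; it does not model the 8-bit range limits of the operand field.

```python
# Hypothetical helper: DataRAM address selected by a direct operand.
def effective_address(offset, bar=None):
    """Zero-page operands use the symbol value directly; bar: operands add it to bar."""
    if bar is None:
        return offset        # zero-page: the symbol value is the address
    return bar + offset      # bar-based: the symbol value is a signed offset


V_Var = 100      # .var @100: V_Var
V_VarM = -100    # .var @-100: V_VarM
bar = 1000       # lds.bar 1000

zero_page = effective_address(V_Var)          # lda r0,V_Var
above_bar = effective_address(V_Var, bar)     # lda r1,bar:V_Var
below_bar = effective_address(V_VarM, bar)    # lda r2,bar:V_VarM
```

The three results are 100, 1100, and 900, matching the comments in the example.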

Other Assembler Directives and Functions

The .blk and .klb directives

These two directives are used to define "blocks" of assembler source code. Blocks can be used to group code, data, or both into logical modules. The outermost block level is implicitly called MAIN. Blocks cannot be nested or overlapped (there is only one level of block definitions).
All identifiers declared inside one block can be accessed from within that same block using their declared name; in order to access any identifiers that are declared in another block, a qualified notation has to be used (the qualified identifier syntax is <BlockName>.<IdentifierName>).

Example

   ExteriorBlockLabel:
;     […]
   bra ExteriorBlockLabel           ;the label needs not be qualified when
                                    ;accessed within the same block (MAIN)

   .blk DataBlock                   ;block directive used to group variables
   .equ BaseAddress 100             ;define a BaseAddress for this data block
   .var @0:
   .var v0                          ;the variables' addresses will be interpreted
   .var v1                          ;as offsets with respect to the BaseAddress
   .klb DataBlock

   .blk SubrBlock
   Subr:
;     […]
   lda r0, &MAIN.ExteriorBlockLabel ;the full qualified label name has to be used here
   lds.bar DataBlock.BaseAddress    ;load variable block's base address into bar
   lda r1, bar:DataBlock.v1         ;access the v1 variable inside DataBlock
;     […]
   .klb SubrBlock

   jsr SubrBlock.Subr               ;the full qualified Subr identifier has to be
                                    ;used at this point. For using an unqualified
                                    ;identifier for the function call, the identifier
                                    ;should have been placed outside the block,
                                    ;right before the .blk directive.

Remark

The block identifiers have their name space separate from the other types of identifiers, so they are allowed to have the same name as other identifiers without conflict.

The .equ directive

The .equ directive can be used to give an alias to a constant numerical expression (that can be evaluated at assembly-time). It does not allocate any space in any of the CDSP memory modules. The complete .equ syntax is:

   .equ Symbol ConstExpression

Following this .equ directive all occurrences of Symbol in the assembler file will be replaced by the assembler with the value of ConstExpression.

Example

   .equ Expr123 123    ;the Expr123 symbol is assigned the 123 value.

Labels

Labels can be inserted at any point in the assembler file, and they are assigned the current value of the ProgramROM allocation pointer (see previous sections). Two or more labels can be placed successively without any assembler code between them, in which case they all have the same value. The complete label syntax is:

Label_001: <Instruction>

A label can be placed alone on a line, or it can precede an assembler instruction.

Example

Loop:   nop     ;declare (and define) the Loop label symbol
    bra Loop    ;use the Loop label as immediate branch-instruction argument.

Remark

The label symbols can only be used as arguments for instructions in the branch group, or their symbol-value (the address they appear at) can be obtained by dereferencing.

The $ reference

During the assembly process, the $ symbol designates the "current" ProgramROM allocation pointer, i.e. the location at which the instruction in the current assembler line will be placed (see also the .org directive). This provides an alternative to defining labels: ProgramROM locations can be specified relative to the current assembler line, which is especially convenient when coding short-range branches.

Example

;  [...]        ; test code sequence: if test is true r0 should be incremented
   brf $+2      ; branch to location <current>+2 if previous test failed
   add r0,1     ; if previous test returned true r0 is incremented
;  [...]        ; continue

Remark

Any constant-value expression can be used to specify the offset for the $-based (relative) form of branches ($+<Expression>); in the absolute format, however, only a label identifier is allowed (i.e. an explicit absolute numeric address is not allowed as a branch target).

The & de-referencing operator

The dereferencing operator can be used to obtain the actual value of a pointer symbol. A pointer symbol used on its own refers to the contents of the memory location it points at; dereferencing it yields the pointer value itself (the address).

Example

   .var @10: V0
   .cst K0 = 11
   .equ Q0 12

   .org 1000
Label:
   lda r0,&Label    ;r0 will be loaded with the address of this instruction (1000)
   lda r1,&V0       ;r1 will be loaded with the address of variable V0 (10)
   lda r2,&K0       ;r2 will be loaded with the ConstantsROM address of K0 (not its value, 11)
   lda r3,&Q0       ;illegal dereferencing: Q0 is not a pointer symbol

Numeric Expressions

Constant Expressions

The assembler supports a set of constant-value expressions consisting of basic arithmetic operators applied to numbers and symbols. Only numeric constants and symbols declared with the .equ directive can be used directly as constants in the expressions; labels, .cst-declared constants, and .var-declared variables can only have their address taken and used in constant expressions (see the & operator).
The supported operators are, in order of decreasing precedence, [unary '+', '-'], [the complex number constructor ':'], ['bool', address-of '&', one's complement '~'], [binary '*'], [binary '+', '-'], [arithmetic shift '<<', '>>'], [binary logic '&', '|', '^'], and the parentheses '(', ')'.

Example

   .cst k0 = 100
   .var v0
   .equ e0 10
   lda r0, &k0+e0
   lda r1, &v0+64
   lda r2, k0+v0+e0; illegal: cannot use any of constant data, label,
                   ; or variable values in an expression without dereferencing

Remark

Because the precedence of the unary '+' and '-' operators is greater than the precedence of the ':' operator (complex number constructor), expressions consisting of arithmetic operations on complex numbers that have signed components can be written in a straightforward way:

   lda r0, -10:20;            -> (-10: 20)
   lda r0,  10:20 + -30:-40;  -> ( 10: 20) + (-30:-40)
   lda r0,  10:20 -  30: 40;  -> ( 10: 20) - ( 30: 40) -> ( 10: 20) + (-30:-40)
   lda r0,  10:20 -  30:-40;  -> ( 10: 20) - ( 30:-40) -> ( 10: 20) + (-30: 40)
   lda r0,  10:20 - -30:-40;  -> ( 10: 20) - (-30:-40) -> ( 10: 20) + ( 30: 40)
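
The groupings above can be verified numerically. The following Python sketch uses the built-in complex type and a hypothetical cplx helper that parses the a:b notation; it assumes the first component is the real part, although the component-wise results do not depend on that choice.

```python
# Hypothetical helper: parse the assembler's "a:b" notation into a Python complex.
def cplx(text):
    first, second = text.split(":")
    return complex(int(first), int(second))   # assumption: first component is real


results = [
    cplx("10:20") + cplx("-30:-40"),   # 10:20 + -30:-40
    cplx("10:20") - cplx("30:40"),     # 10:20 -  30: 40
    cplx("10:20") - cplx("30:-40"),    # 10:20 -  30:-40
    cplx("10:20") - cplx("-30:-40"),   # 10:20 - -30:-40
]
```

The four results are (-20:-20), (-20:-20), (-20:60), and (40:60), which is what the right-hand groupings in the remark evaluate to.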

Conditional Assembly

Conditional assembly directives

The assembler supports conditional assembly through the following set of commands:

   .if BOOL_ConstExpression
;     [assembler statements]
   .else
;     [assembler statements]
   .endif

The BOOL_ConstExpression can contain numbers and symbols defined with the .equ directive.
In addition to the operators accepted in constant expressions, the BOOL_ConstExpression can contain the following relational and boolean operators, in decreasing order of precedence:
[less than "<", greater than ">", less than or equal "<=", greater than or equal ">=", equal "==", not equal "!="], [boolean and "&&", boolean or "||"].
These operators have lower precedence than all the arithmetic operators previously mentioned.

Example

   .equ DEBUG_LEVEL 1   ;set up a conditional assembly parameter
;  [...]
   .if DEBUG_LEVEL >= 1
      jsr debug_all     ;insert a routine call only used during the debug process
   .endif
;  [...]